I'm importing a large number of objects from external sources. The objects themselves are fairly simple, just 5-10 attributes, but there are more than 100.000 of them.
Are there any tried-and-true suggestions for how to handle the import process? The data I use comes from external sources (which are out of my control) and it needs to be refreshed periodically.
Based on my testing of the import feature, this is too much for the CSV importer. I can get 25.000 objects imported/updated from CSV, but the process is still quite slow. How much memory should be given to the JVM? Does anybody have any real-life experiences to share? Is the process memory or CPU bound?
On Linux, splitting the incoming CSV data file into separate chunks is easy. What I find impractical is that after the split I must either a) create separate duplicate import configurations for each chunk (5-10, depending on how large the individual chunks are), or b) run some kind of loop that copies/symlinks each chunk in turn to one known filename which the import process knows to look for, and then switches to the next chunk after the import. With cron driving the loop plus scheduled imports in Insight, this might just be doable, although somewhat annoying as a long-term solution.
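For illustration, option b) could be sketched roughly like this in Python. This is only a minimal sketch under assumptions: the function names `split_csv` and `stage_chunk`, the chunk size, and the file naming are all made up here, and nothing in it talks to Insight itself. It splits the source file into chunks that each repeat the header row, and copies one chunk at a time to the fixed filename the import configuration would point at.

```python
import csv
import os
import shutil
from pathlib import Path


def split_csv(source, out_dir, rows_per_chunk=20000):
    """Split `source` into numbered chunk files, repeating the header in each.

    Returns the list of chunk paths in order. The chunk size is an arbitrary
    assumption; tune it to whatever the importer handles comfortably.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    out = None
    writer = None
    with open(source, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return chunk_paths  # empty input, nothing to split
        try:
            for i, row in enumerate(reader):
                if i % rows_per_chunk == 0:
                    # Start a new chunk file and write the header first.
                    if out is not None:
                        out.close()
                    path = out_dir / f"chunk_{len(chunk_paths):03d}.csv"
                    out = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    chunk_paths.append(path)
                writer.writerow(row)
        finally:
            if out is not None:
                out.close()
    return chunk_paths


def stage_chunk(chunk_path, target):
    """Copy one chunk to the fixed filename the import configuration reads.

    The copy goes to a temporary name first and is then renamed, so the
    importer never sees a half-written file.
    """
    target = Path(target)
    tmp = target.with_name(target.name + ".tmp")
    shutil.copyfile(chunk_path, tmp)
    os.replace(tmp, target)
```

A cron job could then call `stage_chunk` with the next chunk before each scheduled import run. Keeping track of which chunk is next (for example in a small state file) is left out of the sketch.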
Should I consider switching to a different importer altogether? Would it somehow make the process faster and less involved if I first imported the CSV as "raw data" into an external db from which the Insight db importer would do its job? I don't think there are any benefits to be gained from using the JSON importer as that is file based just like the CSV importer is.
Hi Tomi,
I would look into the following documentation for performance and tuning regarding Insight: https://documentation.riada.se/insight/latest/system-requirements
Let us know if you find that useful.
Best Regards
Alexander
It is useful, yes. We will look deeper and try to find an optimal solution.
Just out of curiosity, is the db import any less CPU/memory hungry? Do I gain anything by creating a temporary db table from which Insight could do the importing? The original CSV would be quite easy (and fast!) to dump into a fresh table (i.e. drop table xxx; import into new table xxx from CSV) each time the external data source produces a fresh set of data.
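The "drop table, reload from CSV" step could look roughly like the sketch below. It uses SQLite purely because it ships with Python; the database the Insight db importer would actually read from is probably something else, and the table name `import_staging`, the function names, and the all-TEXT columns are assumptions made for illustration.

```python
import csv
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier so header names with spaces still work."""
    return '"' + name.replace('"', '""') + '"'


def reload_staging_table(conn, csv_path, table="import_staging"):
    """Drop and recreate `table`, then load every row of the CSV into it.

    Column names come from the CSV header and every column is stored as
    TEXT. Returns the number of rows in the table afterwards.
    """
    tbl = quote_ident(table)
    with open(csv_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        columns = ", ".join(f"{quote_ident(c)} TEXT" for c in header)
        placeholders = ", ".join("?" for _ in header)
        # The connection context manager commits on success and rolls
        # back the pending inserts if loading fails part-way.
        with conn:
            conn.execute(f"DROP TABLE IF EXISTS {tbl}")
            conn.execute(f"CREATE TABLE {tbl} ({columns})")
            conn.executemany(
                f"INSERT INTO {tbl} VALUES ({placeholders})", reader
            )
    return conn.execute(f"SELECT COUNT(*) FROM {tbl}").fetchone()[0]
```

Calling `reload_staging_table(sqlite3.connect("staging.db"), "export.csv")` (both filenames hypothetical) each time a new export arrives would replace the previous contents in full. Whether importing from such a table is any lighter on CPU or memory than the CSV importer is exactly the open question here; the sketch only covers the loading side.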
Hi Tomi,
I don't know if there is any difference between using the CSV and the DB import; in the end, they both end up creating the same data to import. I think what's more important is to look at the system requirements.
Best Regards
Alexander