Issues importing large amounts of records in batch job
Posted: August 16th, 2022, 10:38 pm
I was talking with Matt and we both had situations where we needed to import many records and ran into design issues in the batch job.
I know the recommended approach is to save to db (described in this forum post). Below is a summary of the approaches we tried and how they turned out.
Ideally every approach should be workable in some form, though approaches 1 and 3 would not be recommended for data that could change mid-import.
Can I get some feedback on a setup for approaches 1 and 2 that would not run into these issues?
Approach #1: RunQuery Paging
Using the paging feature of RunQuery by setting Paging = True and # of records per page = 50.
Then set up the canvas to loop back to RunQuery until EOF = True.
This approach meant issuing multiple queries against JDE.
Failed: it hung at ~30k records and caused the whole batch server to hang.
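The loop shape above can be sketched in plain Python. This is only an illustration of the paging pattern, not the platform's API: run_query here is a hypothetical stand-in for the RunQuery step, and DATA stands in for the JDE result set.

```python
# Illustrative sketch of the Approach #1 paging loop.
# run_query() is a hypothetical stand-in for the RunQuery step, not a real API.
DATA = list(range(130))  # fake source data standing in for JDE records


def run_query(page, size):
    """Return one page of records plus an EOF flag (hypothetical stand-in)."""
    start = page * size
    chunk = DATA[start:start + size]
    eof = start + size >= len(DATA)
    return chunk, eof


def import_all(size=50):
    """Loop back to run_query until EOF = True, one query per page."""
    page, imported, eof = 0, [], False
    while not eof:
        records, eof = run_query(page, size)  # one query against the source per page
        imported.extend(records)              # stand-in for per-record processing
        page += 1
    return imported
```

Note that every iteration fires a fresh query against the source system, which is presumably why the backend struggled around ~30k records.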
Approach #2: Save to Db
Using the save to db feature of a RunQuery and retrieving using the CreateServiceNodeFromDatabase.
Platform3 App1: pulling 50 records at a time, saving to database, ECC timeout 2400000 (40min), Data Size 9.77 KB.
Tried to pull in over 10k records and it would time out. Eventually the requirements changed and more filters were added so only 2k records were needed, so this was never solved.
Platform3 App2: same variables: 50 records at a time, saving to database, ECC timeout 2400000 ms (40 min), data size 9.77 KB.
Tried to pull in 16k records, but got an ECC timeout message:
An error occurred in the ItemQuery RunQuery service: No message received from ECC for application : 100093 (Connector Application : 100053). Timeout: 2400000 ms
Attempt 3: 50 records at a time, data size 1 MB, 10-minute timeout.
Needed to pull in 191k records; even after lowering it to 10k it still timed out.
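For what it's worth, the save-to-db idea can be sketched generically: fetch a small chunk, persist it to a staging table, and commit per chunk so a timeout or crash doesn't lose everything already pulled. This is only an illustration under assumptions: fetch_chunk is a hypothetical stand-in for the RunQuery pull, and sqlite3 stands in for whatever database the platform's save-to-db feature writes to.

```python
# Illustrative sketch of the save-to-db pattern.
# fetch_chunk() is a hypothetical stand-in for the RunQuery pull;
# sqlite3 stands in for the platform's staging database.
import sqlite3


def fetch_chunk(source, offset, size):
    """Pretend source pull: return the next slice of records."""
    return source[offset:offset + size]


def stage_records(source, size=50, db=":memory:"):
    """Pull records in chunks and persist each chunk before fetching the next."""
    conn = sqlite3.connect(db)
    conn.execute("CREATE TABLE IF NOT EXISTS staging (id INTEGER PRIMARY KEY)")
    offset = 0
    while True:
        chunk = fetch_chunk(source, offset, size)
        if not chunk:
            break
        conn.executemany("INSERT INTO staging (id) VALUES (?)",
                         [(r,) for r in chunk])
        conn.commit()  # commit per chunk so progress survives a failure
        offset += size
    count = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
    conn.close()
    return count
```

The per-chunk commit is the point: even if the overall pull times out partway through, the staged rows are already durable and the next run only needs the remainder.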
Approach #3: Run Batch Job at regular frequency and resume progress
Run 1k records at a time, let the batch job run every minute, and resume at the last ID.
Keep track of the last ID that was processed and filter with ID > lastId in the query, with a TOP 1000.
This worked.
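The resume-at-last-ID pattern above (keyset pagination with a checkpoint) can be sketched as follows. This is an illustration only: sqlite3 stands in for the real source (hence LIMIT rather than TOP), the items table and run_batch are hypothetical, and in production each call would be one scheduled run of the batch job rather than a tight loop.

```python
# Illustrative sketch of Approach #3: keyset pagination with a last-ID checkpoint.
# sqlite3, the items table, and run_batch() are stand-ins, not the real setup.
import sqlite3


def run_batch(conn, last_id, batch_size=1000):
    """One scheduled run: process up to batch_size rows with ID > last_id."""
    rows = conn.execute(
        "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    ).fetchall()
    for (record_id,) in rows:
        pass  # stand-in for the per-record import work
    # Return the new checkpoint; unchanged means nothing was left to process.
    return rows[-1][0] if rows else last_id


# Usage sketch: 2,500 fake records, processed 1k at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 2501)])

last_id = 0
while True:
    new_last = run_batch(conn, last_id)  # in production, one run per minute
    if new_last == last_id:
        break  # caught up: no rows beyond the checkpoint
    last_id = new_last
```

Because each run is a small, self-contained query that resumes from a stored checkpoint, no single request ever has to survive the full result set, which is presumably why this approach avoided the timeouts the other two hit.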