Scribe Insight’s newest release has added a Bulk Operation mode to improve the performance at which we send data into Dynamics CRM and Salesforce.Com. An overview of the upgrade is here. For data stores that support bulk operations (such as Dynamics CRM and Salesforce), you can now configure DTS files to specify bulk processing with basic insert, update, and delete. Now you can now set bulk operations for individual steps within a DTS file. Prior to Insight Version 7.6, the Bulk Insert feature was set for the entire DTS file, rather than for each step.
In December when this was released, I was so excited to see our focus was on speed. So over the holidays I upgraded some of my virtual machine images to see what kind of performance improvements one can make by upgrading Scribe along with several other sub systems. I just want to take it out and let her “stretch out her legs”, lets see how fast we could go. So below is the first step of the performance improvements.
- First I set up a SQL database with 50,000 contacts and built a simple DTS to insert these contacts into Microsoft Dynamics CRM 2011 as Leads. By the way if you are looking for some test data I found a great resource for generic contact lists. http://www.briandunning.com/sample-data/ these are random names mixed with random addresses with properly formatted e-mail and web addresses. He has built many file sizes from 500 to 5 million contact names, perfect for our purposes. I created a table in SQL and copied the CSV file into the SQL table.
- Next I built a DTS to insert these into Dynamics CRM as Leads. No formatting of data, no lookups, just moving the data in from the table as fast as possible. I felt that this would help me see the raw changes. Using our previous version I was able to insert 50,000 records in 26.93 minutes, that is 31 rows per second with this DTS, on this image.
Before we go any further I need to mention that integration performance is determined by many factors. The use of Lookups and external variables, target application speed, use of database indexes, query syntax, Step Control pre operation conditions, step order of operations, available RAM on the Server, and wind speed all impact the performance of a DTS. Well maybe not wind speed, but suffice to say there are a lot of factors impacting the performance inserting thousands and millions of records. So these results are specific to my environment. So in this instance the only variable was the Scribe upgrade, your “actual mileage will vary”.
- After upgrading to Scribe Insight 7.6 I opened up the DTS, I got the following indication that the DTS is upgraded. I click OK and move to the next step of configuration.
- Opening up upgraded DTS, I went to the Configure Steps area, selected the insert step and set insert to Bulk Mode, 1000 rows. You can max this out at 1000 rows if you think your data will process error free. If it encounters an error it will revert to non-bulk mode and will step through each row individually until it gets to the end of the batch size. So as this was purely inserting with no parent or child objects it is good to set it to the maximum value.
- I ran the DTS from the Workbench. Happily the rows click past. 16 minutes later all 50,000 records have been inserted into CRM.
When we look at the performance I see that we are slightly over 40 % faster than the previous version at 52 rows per second.
My performance results are as follow:
| DTS Version | Source | Rows | Total Time – Insert |
| 7.5.2 | SQL 2008 R2 | 50,000 | 26.935 |
| 7.6 | SQL 2008 R2 | 50,000 | 16.007 |
|
Improvements |
41% |
Improving the core functional speed is a great way to start 2013. When I look at the way the DTS needs to work to gain these performance improvements, some rethinking of the integration patterns and methods may be needed. When bulk operations are called for it will be best used when your steps are of a simple design. These would be when building migrations where all the data is inserted. When doing SQL based integrations having separate DTS's for Inserting and Update operations may be called for. Also where extensive parent child operations are being used, like a sales orders with header and details in 1 DTS, this bulk setting will do little to improve performance. Further, if the source is a single XML file where only one record is being integrated, then bulk will not improve the performance in any noticeable way.
As these are only preliminary results running on a virtual machine set up on my laptop, not an ideal environment, I am excited to see the impact of some of the other areas where I can improve my integration performance. I will be tweaking RAM settings, upgrading to SQL 2012, changing Queue Priority settings, and performing several other tweaks to the system, all of which I will share on the Blog. Next, I am going to upgrade the underlying database to SQL 2012 to see how that impacts performance.
