Usually, when we talk about performance optimization, it is in the context of ongoing data integrations. There are some shared concepts involved in optimizing one-time migrations, though there are a few things that are different about migrations. This post is dedicated to performance optimization in a one-time data migration scenario.
Migration performance is important since there is generally a requirement of a specific cut over date, and missing that date could have risks for a project. If an ongoing data integration simply takes a little longer than anticipated to process transactions it usually does not carry the same type of risk to your project. This blog post covers the major areas that can impact migration performance and gives you some tips and tricks to speed up a migration.
Hardware
The first place to start is with hardware. Scribe Insight can be installed almost anywhere; the requirements to get it running are minimal. However, you should still take care when planning where you install your Scribe Insight server and workstations. Keep in mind where Scribe will run the migration from. Ideally, it is best to have Scribe Insight on its own dedicated server with at least 2 GB of RAM and dual CPUs. This can scale upwards for larger migrations, but it's a good starting point. You can use a local instance of Scribe Insight on your laptop to design DTS files, but we do not recommend using your laptop to run your migration.
Another piece of information to think about with hardware is resource contention. Again, Scribe has a small enough footprint that it can be installed and running on the same machine as other apps, but there are risks associated with that type of setup. For example, let's imagine you installed Scribe on the same server as SQL Server, Dynamics CRM, and Exchange. When the migration kicks off, Scribe will start trying to use local resources to run the DTS file. Now, if Scribe is reading/writing to Dynamics CRM, the Microsoft API will start being utilized, consuming additional resources. Next, since Scribe is reading/writing to something that uses SQL Server, we will see the SQL Server server start to consume memory. You can see how this sets up a situation where multiple processes are trying to use the same server resources. If this server is underpowered, this situation can very quickly begin to negatively affect the performance of the migration, and the corresponding applications.
Minimize Target Operations
Now let's take a closer look at what Scribe is doing while it is processing data. The biggest goal for performance with Scribe is to minimize target database operations. When Scribe makes a call to do an action against a target (insert/update/delete/seek) it makes a request and waits for a response from that system to find out if it was a success or a failure. So by making sure you are not needlessly asking for actions to occur, you can help minimize this waiting period. This is the type of change that may not give you perceptible changes to the run time of a few records, but when you multiply small increases in performance across many rows of data, these small performance improvements can make a big improvement in the time it takes to run a migration.
With this concept in mind of reducing the amount of DB hits, we have some Scribe features built in to assist you. Some of these are fairly common, while others are a little less known, but can still provide a lot of help.
The first of these is the Cross Reference Keys feature. When importing data, you will generally need to do some foreign key association, which can cost you a lot of DB hits looking to relate on those foreign keys. Using Scribe's built in Cross Reference Keys feature allows you to store those keys locally in the SCRIBEINTERNAL database for fast lookups. Keeping lookups local is a great way to reduce the number of target operations, while not losing any functionality of the migration.
Another great feature is the use of Calculated Variables. These give you a way to generate a value once, and use it over again in the same DTS file. A great example of this would be if you use the same DBLOOKUP to pull back an AccountID multiple times in one DTS file. Every time that DBLOOKUP happens, it's an additional database operation. Instead of wasting several database operations, you can do that lookup once, store the return value in a Calculated Variable, and then use that variable over and over again to get the ID value. It's a simple way to get the data you need, but not waste time with extra database hits.
There are some other features that can save you some overhead such as Automatic Foreign Key assignment and message processor caching. These are built in features of Scribe that require no additional setup by you in order to use, other than knowing they exist. By knowing what Scribe can do, you can better design your jobs to take advantage of these built in features.
Migration Design
When people set up migrations, at first the only approach that seems feasible is simply a point to point migration (all accounts à accounts, contacts à contacts, etc). But, there are actually several different approaches you can take using some of the features and functions of Scribe to your advantage. These design tactics should help you reduce the amount of time it takes to run your migration.
The first design tactic is to run the migration in stages using a net change pattern. Let's say, for example, you have a lot of lead time to design and test this migration, but the actual amount of time you have to cut over to production once you begin the migration is very short. You can run the migration and process the majority of the data several days before the cutover date. Then on the cutover date, you can run the migration again using the :LastRunDateTime system variables. That way, you have a much smaller subset of data to process to complete the migration on the cutover date.
Another method is to make use of the multi-threading capability in Scribe. The ideal way of doing this is by using queue based integration processes. Don't let the word integration fool you; you can use these features for your migration. Queue based integrations immediately give you up to 63 threads, depending on your license, to run data in parallel, boosting your throughput. When working with queue based integrations, the standard method of loading the queue with data is using some kind of publisher. But, when running migrations people tend to like having a little more control over what data is being run at what time, so instead of using a publisher to load your queue, you can make use of a DTS that has a target connection of the XML adapter using the Queue output option. This way, you can load the queue whenever you want, control how many records are loaded, have a running status about how many records are in there, and have the ability to easily change the source query in your DTS file to meet your needs.
If you have already designed a migration solution and converting to queue based integration seems a little much to do at this point, there is another way to get multi-threading out of Scribe. You can run multiple Workbenches at the same time. As long as your data can be segmented in a way that you can avoid running the same records twice, this can be an easy method to get jobs to run in parallel. The biggest concern for this is that you have sufficient hardware to support the multiple Workbenches running at the same time. If you do not, you run the risk of actually hurting performance by overworking the computer. The Workbench does not have any thread awareness, unlike the message processors, which know when there are not enough resources to start up another message processor.
If the computer where your Scribe server in installed cannot support this type of multi-threaded load, you could also install multiple Scribe workstations on other computers, giving each a portion of the migration to do. The most important thing to know is that you must segment your data, or run the risk of creating duplicates in your target system. You can build it in using features like Cross Reference Keys Locking to allow Scribe to know when it's processing a duplicate records in another process, and to create logic to avoid just inserting it again.
So, that's a glimpse into some of the tips, tricks, and design considerations that can help you design your migration for the fastest performance. There are many ways to optimize Scribe, so do not be shy of trying out some of these ideas. If you happen to find some particularly great design features or a few tricks you use, please share it with us.
If you would like to see videos where we explain the concepts presented in this blog post in more detail, check out the Scribe Inside Track service, which gives you access to a vBook called "Performance." A subscription to the Inside Track gets you access to premium content including deep dive educational videos, monthly interactive expert panel discussions, and a quarterly Inside Track newsletter. Learn more here - http://www.scribesoft.com/InsideTrack/.
