Loading an Operational Data Store
In my previous Operational Data Store (ODS) post, I discussed how to design an ODS. In this post, I will describe how to optimize loading it.
The first question to answer concerns the timeliness of the data. Do users need real-time data, or can the data be refreshed on a less frequent periodic basis (hourly, daily, etc.)? The business units need to provide this answer, and their budget will influence it: what is the return on investment (ROI) of a real-time refresh versus an hourly update of the ODS? Remember, the higher the update frequency, the more difficult and expensive synchronization becomes.
There are a couple of ways to extract changes from the transactional systems. You can use the old, reliable method of selecting directly from the transaction system tables, using a date field that records when each row was modified. This method puts a great deal of additional stress on the transaction system, consuming CPU and memory and degrading its performance. You could instead put triggers on the tables for which you want to perform change data capture (CDC), but again there is a performance cost on the transactional system. Finally, in both scenarios you need to know the application database design and, in some cases, modify the workings of the database by adding triggers and procedures to capture changes. These triggers and procedures can be wiped out when upgrades to the application database are implemented.
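To make the first approach concrete, here is a minimal Python sketch of a timestamp-driven pull. Everything named here is hypothetical: the source orders table, its last_modified column, the ODS etl_watermark bookkeeping table, and the connection strings. I assume a PostgreSQL source via psycopg2 only for illustration; any database driver follows the same pattern.

```python
from datetime import datetime, timezone

import psycopg2  # assumed driver; any DB-API connector works the same way

# Hypothetical connection strings for the source system and the ODS.
SOURCE_DSN = "host=src-db dbname=app user=etl"
ODS_DSN = "host=ods-db dbname=ods user=etl"

def extract_changes():
    batch_start = datetime.now(timezone.utc)
    with psycopg2.connect(SOURCE_DSN) as src, psycopg2.connect(ODS_DSN) as ods:
        with src.cursor() as src_cur, ods.cursor() as ods_cur:
            # Read the high-water mark left by the previous successful load.
            ods_cur.execute(
                "SELECT last_loaded_at FROM etl_watermark WHERE table_name = 'orders'"
            )
            (watermark,) = ods_cur.fetchone()

            # This query runs on the transaction system itself -- the CPU,
            # memory, and I/O it consumes is the overhead described above.
            src_cur.execute(
                "SELECT order_id, customer_id, amount, last_modified "
                "FROM orders WHERE last_modified > %s",
                (watermark,),
            )

            # Upsert each changed row into the ODS copy of the table
            # (assumes a unique constraint on ods_orders.order_id).
            for row in src_cur:
                ods_cur.execute(
                    "INSERT INTO ods_orders (order_id, customer_id, amount, last_modified) "
                    "VALUES (%s, %s, %s, %s) "
                    "ON CONFLICT (order_id) DO UPDATE SET "
                    "customer_id = EXCLUDED.customer_id, "
                    "amount = EXCLUDED.amount, "
                    "last_modified = EXCLUDED.last_modified",
                    row,
                )

            # Advance the watermark to the start of this batch so rows
            # modified mid-run are picked up again on the next pass.
            ods_cur.execute(
                "UPDATE etl_watermark SET last_loaded_at = %s WHERE table_name = 'orders'",
                (batch_start,),
            )
        ods.commit()
```

Note that the SELECT above runs against the live transaction system, which is exactly where the extra load comes from, and that the pattern silently misses hard deletes, since a deleted row never shows up in a timestamp query.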
The better solution is to use a tool that captures the transaction log files from the application database and applies those changes to the ODS where applicable. Tools like Oracle GoldenGate, ApexSQL, and Debezium/Kafka solutions read the database log files to perform CDC. The CDC processing runs on the ODS servers, thereby avoiding any additional CPU and memory usage on the source application database.
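As a rough illustration of the consuming side of a Debezium/Kafka pipeline, the sketch below reads change events from a Kafka topic and applies them to the ODS. The topic, table names, and connection details are hypothetical; the envelope fields (payload, op, before, after) follow Debezium's default JSON event format, where by default topics are named server.schema.table.

```python
import json

import psycopg2
from kafka import KafkaConsumer  # kafka-python; Debezium publishes to Kafka topics

# Hypothetical topic and connection details for illustration.
consumer = KafkaConsumer(
    "appdb.public.orders",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")) if m else None,
)
ods = psycopg2.connect("host=ods-db dbname=ods user=etl")

for message in consumer:
    event = message.value
    if event is None:  # tombstone record emitted after a delete; skip it
        continue
    payload = event["payload"]
    op = payload["op"]  # 'c' = insert, 'u' = update, 'd' = delete, 'r' = snapshot read
    with ods.cursor() as cur:
        if op in ("c", "u", "r"):
            # 'after' holds the full row image following the change.
            row = payload["after"]
            cur.execute(
                "INSERT INTO ods_orders (order_id, customer_id, amount) "
                "VALUES (%(order_id)s, %(customer_id)s, %(amount)s) "
                "ON CONFLICT (order_id) DO UPDATE SET "
                "customer_id = EXCLUDED.customer_id, amount = EXCLUDED.amount",
                row,
            )
        elif op == "d":
            # 'before' holds the row image as it was prior to the delete.
            cur.execute(
                "DELETE FROM ods_orders WHERE order_id = %(order_id)s",
                payload["before"],
            )
    ods.commit()
```

Because the events come from the transaction log rather than from queries or triggers, this consumer also sees deletes, and all of the apply work happens on the ODS side.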