Oracle Tips and Tricks to Enhance ETL Loads
Some of the biggest problems I ran into as an ETL developer involved loading large tables. Large loads are an issue because a nightly load or full refresh can run into core business hours, leaving the data unavailable to customers.
The first step I would consider is optimizing both the select and insert statements, for which we can use the /*+ parallel */ hint to read and write in parallel. However, be careful with that hint: you can hit your CPU limits, which slows the query down instead of speeding it up, so understanding the environment is crucial. Then, to further optimize the inserts, the /*+ append */ hint is vital and can save you time writing to the table.
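As a sketch of both hints together (the table and column names here are hypothetical), a parallel direct-path insert-select might look like:

```sql
-- Enable parallel DML for this session (required for parallel inserts).
ALTER SESSION ENABLE PARALLEL DML;

-- APPEND requests a direct-path write above the high-water mark;
-- PARALLEL(..., 4) requests a degree of 4 -- tune this to your CPU budget.
INSERT /*+ APPEND PARALLEL(tgt, 4) */ INTO tgt_orders tgt
SELECT /*+ PARALLEL(src, 4) */ *
FROM   src_orders src
WHERE  src.load_date >= TRUNC(SYSDATE) - 1;

COMMIT;  -- direct-path loaded rows are not queryable until the commit
```

Note that after a direct-path insert, the session must commit before it can even query the table it loaded.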
Indexing fields that are joined on in source tables, as well as any columns used to filter the data being ETL'ed, can significantly enhance the performance of the select. Disabling indexes, triggers, and constraints on the target tables and re-enabling them after the ETL completes can improve the performance of the writes; however, this should not always be done, and the indexes must be rebuilt periodically. Partitioning the target tables allows you to run multiple inserts into different partitions simultaneously. You can also achieve asynchronous ETL in Oracle by using DBMS_SCHEDULER.
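A hedged sketch of the disable/load/rebuild cycle, plus a DBMS_SCHEDULER job for an asynchronous per-partition load (all object names, and the load_pkg.load_partition procedure, are hypothetical):

```sql
-- Disable constraints, triggers, and indexes before the bulk load.
ALTER TABLE tgt_orders DISABLE CONSTRAINT tgt_orders_fk;
ALTER TABLE tgt_orders DISABLE ALL TRIGGERS;
ALTER INDEX tgt_orders_ix UNUSABLE;

-- ... bulk load runs here ...

-- Rebuild and re-enable once the load completes.
ALTER INDEX tgt_orders_ix REBUILD PARALLEL 4;
ALTER TABLE tgt_orders ENABLE ALL TRIGGERS;
ALTER TABLE tgt_orders ENABLE CONSTRAINT tgt_orders_fk;

-- One scheduler job per partition gives an asynchronous load;
-- load_pkg.load_partition is a hypothetical loader procedure.
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name   => 'LOAD_PART_P01',
    job_type   => 'PLSQL_BLOCK',
    job_action => 'BEGIN load_pkg.load_partition(''P01''); END;',
    enabled    => TRUE);
END;
/
```

Loading while an index is UNUSABLE relies on the SKIP_UNUSABLE_INDEXES setting, which defaults to TRUE in modern Oracle releases.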
Change Data Capture (CDC), which is not a well-known technology among many developers, is a handy tool. CDC is achieved in Oracle using Oracle LogMiner, which essentially reads the transaction logs and provides all the Inserts, Updates, and Deletes on each table since the last time the logs were read. As a result, users can perform close to real-time updates to tables. However, there is a catch with CDC: if the system does a lot of bulk transactions, there are not enough REDO logs, and/or mining is not done often enough, you may miss transactions because the REDO logs get overwritten. Therefore, an on-demand or weekly data refresh must be done to pick up any missed transactions. An alternative method is Oracle's exchange partition, which allows you to load the data into a separate table and then swap that table in as a partition. After taking these steps, you can get the refreshed data in almost instantaneously without affecting the user community.
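A minimal sketch of both approaches, with hypothetical object names and log path (exact LogMiner options vary by Oracle version; the CONTINUOUS_MINE option, for instance, was desupported in 19c):

```sql
-- LogMiner: register a redo/archive log and mine it for changes.
BEGIN
  DBMS_LOGMNR.ADD_LOGFILE(
    logfilename => '/u01/arch/arch_1_1234.log',   -- hypothetical path
    options     => DBMS_LOGMNR.NEW);
  DBMS_LOGMNR.START_LOGMNR(
    options => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
END;
/

-- Each row describes one captured change, with the redo SQL to replay it.
SELECT scn, operation, sql_redo
FROM   v$logminer_contents
WHERE  seg_owner = 'ETL_SRC' AND table_name = 'ORDERS';

EXEC DBMS_LOGMNR.END_LOGMNR;

-- Exchange partition: load a staging table offline, then swap it into
-- the partitioned target as a near-instant dictionary operation.
ALTER TABLE tgt_sales
  EXCHANGE PARTITION p_2024_01 WITH TABLE stg_sales_new
  INCLUDING INDEXES WITHOUT VALIDATION;
```

The exchange itself moves no data, which is why users see the refresh almost instantly.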
Final hints for ETL developers to consider: estimate the size of the target table and how fast it will grow. You can then create tables with the INITIAL and NEXT extents set to match your predictions. When loading in Oracle, if a table has to extend its storage multiple times during the load, it slows things down. For example, if I anticipate that the first load is 2GB and will grow 100MB nightly, my INITIAL extent would be 2GB and my NEXT would be 100MB. Also ask your DBAs to increase the REDO log size and the block size of the database, especially for data warehouse databases.
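The sizing example above can be sketched as a storage clause (names and sizes are illustrative; in a locally managed tablespace with AUTOALLOCATE, Oracle treats these values as guidance rather than exact extent sizes):

```sql
CREATE TABLE tgt_orders (
  order_id   NUMBER,
  order_date DATE,
  amount     NUMBER(12,2)
)
STORAGE (INITIAL 2G NEXT 100M)   -- ~2GB first load, ~100MB nightly growth
TABLESPACE dw_data;              -- hypothetical tablespace name
```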
About the Author
Alex Franjie is Section Manager/Senior Data Architect at BigBear.ai and has 10 years of IT experience designing and implementing innovative business solutions. He has a wide range of skills and experiences in enterprise IT system solutioning for on-prem, cloud, and hybrid environments. In addition, he specializes in database design, including the creation of Data Warehouses.