Late Arriving Facts

Late-arriving facts are transactions that reach the warehouse after a delay. For Type I data warehouses, where dimension changes overwrite the existing record, there is no real impact. For Type II warehouse models there is a minor challenge the ETL developer must remember to handle: the late-arriving transaction must be associated with the dimensional attributes that were in effect at the time of the transaction.

If we have a Type I warehouse, or can always be sure that fact data arrives at the same time as the related dimensional data, then we can use the following simple query. It finds the currently active record for the dimension's natural ID, which is carried on the fact record:

SELECT dimensional_key
FROM dimension_table
WHERE dimension_natural_id = {natural_id from fact}
AND dimension_actv_rcrd_fl = 1;
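As a minimal sketch, the lookup can be run with Python's built-in sqlite3 module, which stands in here for the warehouse database. The table contents, the CUST-42 ID, and the lookup_current_key helper are all hypothetical. The ? bind parameter takes the place of the {natural_id from fact} placeholder.

```python
import sqlite3

# Hypothetical dimension table; column names follow the query above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dimension_table (
        dimensional_key        INTEGER PRIMARY KEY,  -- surrogate key
        dimension_natural_id   TEXT NOT NULL,        -- business (natural) ID
        dimension_actv_rcrd_fl INTEGER NOT NULL      -- 1 = currently active record
    )
""")
conn.executemany(
    "INSERT INTO dimension_table VALUES (?, ?, ?)",
    [(101, "CUST-42", 0),   # older, superseded record
     (102, "CUST-42", 1)],  # current record
)

def lookup_current_key(conn, natural_id):
    """Return the surrogate key of the active record for natural_id, or None."""
    row = conn.execute(
        """SELECT dimensional_key
             FROM dimension_table
            WHERE dimension_natural_id = ?
              AND dimension_actv_rcrd_fl = 1""",
        (natural_id,),
    ).fetchone()
    return row[0] if row else None

print(lookup_current_key(conn, "CUST-42"))  # → 102
```

This lookup always returns the current record. That is correct only when the fact is as new as the dimension data.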

Figure 1

In a Type II data warehouse, however, we cannot assume that the active dimensional record is the correct record for the fact. We therefore need to modify the ETL workflow process (Figure 1) to account for dimensional data that may have changed since the “old” fact occurred.

To address this, we add a check when associating the dimensional keys with the fact table. We must find the dimensional key whose active-record start and end dates bracket the transaction date (Figures 2 and 3). This ensures the fact is tied to the data as it was at the time of the transaction.
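The selection rule can be sketched in plain Python before turning it into SQL. The sketch assumes each version carries an inclusive start and end date, and that the still-active version uses 9999-12-31 as an open-ended end date. The DimVersion type, key_as_of helper, and sample history are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class DimVersion:
    dim_key: int      # surrogate key of this version
    natural_id: str   # business key shared by all versions
    start_dt: date    # first day this version was active
    end_dt: date      # last day this version was active (inclusive)

# Hypothetical history for one customer whose attributes changed on 2023-06-01.
history = [
    DimVersion(101, "CUST-42", date(2020, 1, 1), date(2023, 5, 31)),
    DimVersion(102, "CUST-42", date(2023, 6, 1), date(9999, 12, 31)),  # open-ended
]

def key_as_of(versions, natural_id, trnsctn_dt):
    """Return the dim_key whose [start_dt, end_dt] range contains trnsctn_dt."""
    matches = [v.dim_key for v in versions
               if v.natural_id == natural_id
               and v.start_dt <= trnsctn_dt <= v.end_dt]
    if len(matches) > 1:
        # Overlapping ranges mean the dimension history itself is inconsistent.
        raise ValueError(f"overlapping versions for {natural_id}: {matches}")
    return matches[0] if matches else None

# A fact dated before the change arrives after version 102 became current.
print(key_as_of(history, "CUST-42", date(2023, 3, 15)))  # → 101
print(key_as_of(history, "CUST-42", date(2023, 7, 1)))   # → 102
```

Raising on multiple matches is a design choice. Overlapping date ranges usually point to a bug in the dimension load, and raising surfaces it instead of silently picking a version.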


Figure 2


Figure 3

The query needed to find this record is slightly different. Instead of selecting the currently active record, it selects the record whose date range contains the transaction date:

SELECT dim_key
FROM dimension_table
WHERE dim_natural_id = {natural_id from fact}
AND {trnsctn_dt from fact} BETWEEN dim_actv_rcrd_strt_dt AND dim_actv_rcrd_end_dt;
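The sketch below runs this point-in-time query in SQLite and compares it with an active-flag lookup for a late-arriving fact. It rests on several assumptions:

- Dates are stored as ISO-8601 'YYYY-MM-DD' text, so string comparison matches chronological order.
- The current version's end date is the sentinel '9999-12-31'.
- The dim_actv_rcrd_fl column, the sample rows, and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dimension_table (
        dim_key               INTEGER PRIMARY KEY,
        dim_natural_id        TEXT NOT NULL,
        dim_actv_rcrd_strt_dt TEXT NOT NULL,     -- ISO-8601 'YYYY-MM-DD'
        dim_actv_rcrd_end_dt  TEXT NOT NULL,     -- inclusive; '9999-12-31' = still active
        dim_actv_rcrd_fl      INTEGER NOT NULL   -- hypothetical active flag
    )
""")
conn.executemany(
    "INSERT INTO dimension_table VALUES (?, ?, ?, ?, ?)",
    [(101, "CUST-42", "2020-01-01", "2023-05-31", 0),
     (102, "CUST-42", "2023-06-01", "9999-12-31", 1)],
)

def key_for_fact(conn, natural_id, trnsctn_dt):
    """Point-in-time lookup: the version active on the fact's transaction date."""
    row = conn.execute(
        """SELECT dim_key
             FROM dimension_table
            WHERE dim_natural_id = ?
              AND ? BETWEEN dim_actv_rcrd_strt_dt AND dim_actv_rcrd_end_dt""",
        (natural_id, trnsctn_dt),
    ).fetchone()
    return row[0] if row else None

def current_key(conn, natural_id):
    """Active-flag lookup, shown only for comparison."""
    row = conn.execute(
        """SELECT dim_key
             FROM dimension_table
            WHERE dim_natural_id = ?
              AND dim_actv_rcrd_fl = 1""",
        (natural_id,),
    ).fetchone()
    return row[0] if row else None

# A March 2023 fact that arrives after the June 2023 dimension change.
print(key_for_fact(conn, "CUST-42", "2023-03-15"))  # → 101 (historical version)
print(current_key(conn, "CUST-42"))                 # → 102 (would misattribute the fact)
```

BETWEEN is inclusive at both ends, so this sketch closes each version one day before the next one starts. If your end dates instead equal the next version's start date, a fact dated on the day of the change would match two rows.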

Indexing Tip: For best performance, index only the natural identifier on the dimension. Including the date columns in the index will not improve performance. It only makes the index larger and therefore less efficient. Because each natural ID typically has only a few versions, the natural-ID index alone narrows the search to a handful of rows, and checking their dates is cheap. Remember that dimensions are supposed to be wide and shallow. If you have a rapidly changing dimension, you will need to find a way to eliminate the attributes causing the dimensional change.
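One way to check how a database uses such an index is to inspect its query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in for your warehouse's equivalent. The index name idx_dim_natural_id and the table layout are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dimension_table (
        dim_key               INTEGER PRIMARY KEY,
        dim_natural_id        TEXT NOT NULL,
        dim_actv_rcrd_strt_dt TEXT NOT NULL,
        dim_actv_rcrd_end_dt  TEXT NOT NULL
    )
""")
# Index only the natural identifier, per the tip above.
conn.execute("CREATE INDEX idx_dim_natural_id ON dimension_table (dim_natural_id)")

plan = conn.execute(
    """EXPLAIN QUERY PLAN
       SELECT dim_key
         FROM dimension_table
        WHERE dim_natural_id = ?
          AND ? BETWEEN dim_actv_rcrd_strt_dt AND dim_actv_rcrd_end_dt""",
    ("CUST-42", "2023-03-15"),
).fetchall()

# Each plan row is (id, parent, notused, detail); keep only the detail text.
details = [row[-1] for row in plan]
for d in details:
    print(d)
```

The plan only shows that the lookup seeks on the natural-ID index and filters the dates on the few matching rows. It does not by itself prove that a composite index would be slower. That comparison is worth making against your own data volumes.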

In conclusion, make sure you know your data. Profiling your data and fully understanding your customer’s business processes are critical to a successful data warehouse implementation.

In my next blog post, I will discuss the challenge of late-arriving dimensional records and their impact on the accuracy of the data in the data warehouse.
