BigBear.ai
  • Home
  • Industries
    • Academia
    • Government
    • Healthcare
    • Manufacturing
  • Solutions
    • Cyber
    • Data Analytics
    • Enterprise Planning and Logistics
    • Intelligent Automation
    • Modeling Solutions
    • Professional Services
  • Company
    • About
    • Investor Relations
    • Partners
    • Team
  • Careers
    • Benefits
    • Culture
    • Explore Jobs
    • Military and Veterans
    • Applicant Login
    • Employee Login
  • Resources
    • Blog
    • Events
    • Newsroom
    • Resource Library
    • Online Store
  • Contact
Search

Home Data Modeling Joining Fact Tables

Blog

Joining Fact Tables

Jim McHugh
February 20, 2019
  • Share
  • Share

Joining fact tables can be done but there are some inherent risks so you need to be careful when joining fact tables is required. In the following simple scenario, we have a fact of authors to articles and a separate fact of articles to pageviews. Our customer has asked for the ability to 1) find the authors who have provided the most content in a given time period, find the articles which have the greatest number of pageviews for a given time period and 3) find the authors with the highest number of pageviews in a given time period. Our data model can be seen here:

To answer the first two questions is done by using a simple select statement joining a single fact table with the date dimension and the associated other associated dimensions to retrieve the answer.

TOP 10 AUTHORS BY ARTICLE COUNT

TOP 10 ARTICLES BY MONTH

The third requirement will compel us to join two fact tables to get the answer. Remember to join a fact table you need at least one common dimension shared between the facts to accomplish this task. In this scenario, DIM_ARTICLE will be the common dimension which we will use.

TOP 10 AUTHORS FOR 2019

These simple joins work well on small and medium sized fact tables but as the fact tables increase in size your performance may suffer and therefore you may want to be prepared to address these challenges.

There are two other specific issues to be aware of and avoid, Chasm Traps and Fan Traps.

Chasm traps are queries that require measures from more than one fact table. In the data model below we see 2 separate fact tables, one for sales and one for returns. A common request may be to show the net profit (sales – returns) for the 4th quarter of 2018. We may be tempted to join these two dimensions on DIM_DATE and get the results of the sum of sales and the sum of returns.

Let’s walk through the example by running each query separately. First, let’s get the total sales for each store for the 3rd quarter of 2018 using the following query:

RESULTS

Store IDStore NameUnits SoldGross Sales
2NEW YORK210918,532.55
7PORTLAND209418,205.14
6PHILADELPHIA205818,117.64
4WASHINGTON206217,948.92
9SAN FRANCISCO204017,647.64
5SEATTLE203417,638.31
1ONLINE205317,599.21
8BOSTON201017,286.39
3LOS ANGELES196217,008.89

Next, let’s get the total refunds for each store for the 3rd quarter of 2018 using the following query:

RESULTS

Store IDStore NameUnits ReturnedGross Returns
6PHILADELPHIA68602.66
3LOS ANGELES67580.03
7PORTLAND64569.28
2NEW YORK61526.16
4WASHINGTON62526.05
9SAN FRANCISCO58510.32
8BOSTON52442.92
1ONLINE48414.57
5SEATTLE43382.70

Finally, let’s put both of these together into a single query with the following query:

RESULTS

Store
ID
Store NameUnits
Sold
Gross
Sales
Units
Returned
Gross
Returns
7PORTLAND12140105,680.48800769,802.46
6PHILADELPHIA11718102,838.25794269,391.35
2NEW YORK11692101,895.82773267,121.34
1ONLINE1154599,375.88762866,735.25
5SEATTLE1144199,165.43748564,984.80
4WASHINGTON1141499,132.33760766,445.23
9SAN FRANCISCO1143499,110.30769766,808.76
8BOSTON1144498,324.45760966,681.70
3LOS ANGELES1125097,627.69744264,708.52

The results are inflated due to the cross join which will occurs between the fact tables when trying to get a group function result.

To avoid the chasm trap you need to separate the functions into the base SQL statements we originally designed to verify our results, adding zeros for the columns from the other table, and then union the statements together, wrap them in a SQL statement.

RESULTS

Store NameUnits SoldGross SalesUnits ReturnedGross ReturnsNet Units SoldNet Sales
NEW YORK210918,53361526.16204818,006
PORTLAND209418,20564569.28203017,636
PHILADELPHIA205818,11868602.66199017,515
WASHINGTON206217,94962526.05200017,423
SEATTLE203417,63843382.70199117,256
ONLINE205317,59948414.57200517,185
SAN FRANCISCO204017,64858510.32198217,137
BOSTON201017,28652442.92195816,843
LOS ANGELES196217,00967580.03189516,429

Fan traps occur when there is a that require measures from more than one fact table in a master detail relationship. An example of this is an order and order detail relationship.

For this example, our customer has given us a requirement to calculate the average order item. If we try to sum the order value from the fact order table and then sum the quantity of items ordered from the order detail to help us get the average the order value may be inflated due to the cartesian join between the fact tables. Let’s start with a simple select between the customer dimension and the FACT_ORDER table.

RESULTS

Customer IDCustomer NameOrder Value
[email protected]Emmy Cuevas353.80

But when we add in the FACT_ORDER_DTL to the query the result changes in the sum from the FACT_ORDER table.

RESULTS

Customer IDCustomer
Name
Order
Value
QtyDetail
Order
Value
[email protected]Emmy Cuevas381.7038353.80

Again, this will be exacerbated by the additional number of line items in the FACT_ORDER_DTL to the cross join between the fact tables.

The way to address this challenge can be done in multiple ways. The first is to ignore the FACT_ORDER (parent) when summing information from the FACT_ORDER_DTL (child). If you rely only on the detail table you can be assured your answer will be correct.

Another way to address this challenge is to adjust the design by removing the FACT_ORDER table and inserting the ORDER_ID as a degenerate dimensional value into the FACT_ORDER_DTL.

Posted in Data Modeling, Data Warehousing.
BigBear.ai
  • Home
  • Industries
  • Solutions
  • Company
  • Careers
  • Blog
  • Investor Relations
  • Contact
  • Twitter
  • Facebook
  • Linkedin
  • Google My business for BigBear.ai
1-410-312-0885
[email protected]
  • Privacy Policy
  • Terms of Use
  • Accessibility
  • Site Map
© BigBear.ai 2023
We value your privacy
We use cookies on our website to give you the most relevant experience by remembering your preferences and repeat visits. By clicking “Accept”, you consent to the use of ALL the cookies.
Privacy Policy | Do not sell my personal information
AcceptCookie Settings
Manage Consent

Cookies Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary
Always Enabled
Necessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.
CookieDurationDescription
cookielawinfo-checkbox-analytics11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional11 monthsThe cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
JSESSIONIDsessionThe JSESSIONID cookie is used by New Relic to store a session identifier so that New Relic can monitor session counts for an application.
viewed_cookie_policy11 monthsThe cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
Functional
Functional cookies help to perform certain functionalities like sharing the content of the website on social media platforms, collect feedbacks, and other third-party features.
CookieDurationDescription
__atuvc1 year 1 monthAddThis sets this cookie to ensure that the updated count is seen when one shares a page and returns to it, before the share count cache is updated.
__atuvs30 minutesAddThis sets this cookie to ensure that the updated count is seen when one shares a page and returns to it, before the share count cache is updated.
Analytics
Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics the number of visitors, bounce rate, traffic source, etc.
CookieDurationDescription
_ga2 yearsThe _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_ga_NK4L4Q320Q2 yearsThis cookie is installed by Google Analytics.
_gat_gtag_UA_163894009_21 minuteSet by Google to distinguish users.
_gid1 dayInstalled by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
at-randneverAddThis sets this cookie to track page visits, sources of traffic and share counts.
CONSENT2 yearsYouTube sets this cookie via embedded youtube-videos and registers anonymous statistical data.
uvc1 year 1 monthSet by addthis.com to determine the usage of addthis.com service.
Advertisement
Advertisement cookies are used to provide visitors with relevant ads and marketing campaigns. These cookies track visitors across websites and collect information to provide customized ads.
CookieDurationDescription
f5avraaaaaaaaaaaaaaaa_session_sessionbusinesswire.com cookie
loc1 year 1 monthAddThis sets this geolocation cookie to help understand the location of users who share the information.
VISITOR_INFO1_LIVE5 months 27 daysA cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
YSCsessionYSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devicesneverYouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt-remote-device-idneverYouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
Save & Accept