BigBear.ai
  • Home
  • Industries
    • Academia
    • Government
    • Healthcare
    • Manufacturing
  • Solutions
    • Cyber
    • Data Analytics
    • Enterprise Planning and Logistics
    • Intelligent Automation
    • Modeling Solutions
    • Professional Services
  • Company
    • About
    • Investor Relations
    • Partners
    • Team
  • Careers
    • Benefits
    • Culture
    • Explore Jobs
    • Military and Veterans
    • Applicant Login
    • Employee Login
  • Resources
    • Blog
    • Events
    • Newsroom
    • Resource Library
    • Online Store
  • Contact
Search

Home Business Intelligence Bridge Tables – Deep Dive

Blog

Bridge Tables – Deep Dive

Jim McHugh
June 3, 2020
  • Share
  • Share

This is the third and final blog a part of my deep dive series for dimensional modeling. Today, I will focus on Bridge Tables in this post.

Bridge tables are dimensional tables needed to address the many to many relationships between facts and dimensions or dimensions and multi-valued attributes you may come across when modeling your star schema.

Multi-Value Dimensional Attributes

Let’s start with the dimensions containing descriptive attributes that can have multiple values. For example, an employee skills relationship. In a relational model (Figure 1), the employee skills would be a single lookup table with a joiner table to show the relationship between the employee and the skills that they possess.

Figure 1. Relational Logical Model

 

Although this works perfectly in relational modeling, this will not work in dimensional modeling as one to many (1::M) relationships do not work the same way in a dimensional model. The reason why we try to eliminate the 1::M relationships in dimensional models, the detrimental impact on the performance of queries. Therefore we have to remove the 1::M relationships in dimensional models, but how?

One way in which we do this is through denormalization. To do this using the relational model above, we would add a flag attribute for each skill in the skill table.

Figure 2. Denormalized Employee/Skill Model

 

This is a perfectly acceptable solution if the list of skills is relatively static. The challenge with this solution is that it doesn’t scale when I add attributes dynamically. If I were to add ten new skills to the skill table in the relational model, I would need to add 10 new flag attributes into the DIM_EMPLOYEE table manually and I need to adjust the ETL workflow to be able to handle these new attributes.

The preferred dimensional modeling solution would be to use an attribute bridge table to connect the list of skills, a separate dimension table (outrigger), to the employee dimension by creating groups of skills. Although the model looks similar to the relational model, there are subtle differences between the dimensional logical model in Figure 3 vs. the relational model depicted in Figure 1. As you can see in Figure 3, the bridge table (BR_SKILL_GRP) will contain all of the combinations of the skills included in the outrigger (OR_SKILL) table with each combination of skills given a unique SKILL_GRP_KEY. The employee dimension would contain a specific skill group key that would uniquely identify the group of skills the employee possesses.

Figure 3. Dimensional Logical Model

 

A group of five skills would have 32 unique SKILL_GRP_KEYs and a total of 81 records. The data would look like Figure 4.

 

Figure 4

This solution does add complexity to the ETL loading process as we need to ensure the correct SKILL_GRP_KEY is entered in the DIM_EMPLOYEE table, and the proper skill group key is associated with the appropriate employee. Also, this solution may cause query performance issues and challenges when using some BI tools. Custom reporting solutions may be needed to properly filter and group results when using this bridge and outrigger solution.

Multi-Value Dimensions

The other type of many-to-many relationships we encounter in dimensional modeling is when several dimension records are associated with a single fact record. The challenge is similar to the multi-value attributes. In both models, we need to eliminate the redundancy in the model to ensure we do not overcount our fact records while providing a way to group and filter our results accurately.

Having worked in publications in the past, I like to use the article pageview fact as our example. The article pageviews count the number of times an article was viewed, but we also want to see which authors have the most pageviews. The challenge is that most articles are written by more than one author. How can we accurately count the total pageviews by author?

Figure 5 shows the relational, logical model between the Arthur, Article, and Pageviews.

 

Figure 5.

Now we want to take this information into a dimensional model for reporting. If we add an author key directly to the FACT_ARTICLE_PAGEVIEW as depicted in Figure 6, we will overcount any article that has multiple authors because the relationship between article and author is many to many.

 

Figure 6.

Because of this many to many relationship we need to find a way to eliminate the author from the fact table while retaining the ability to provide counts by authors. To do this, we will need to create a bridge to group authors for each article. The good thing about this example is that the list of authors rarely changes after publication, so once the author group is created for a specific article, it shouldn’t change. Figure 7. shows the dimensional model that would optimally be able to answer the question of which author had the most pageviews. I also added an allocation percentage in the event the customer wanted to give each author a portion of each pageview based on the number of authors.

 

Figure 7.

In conclusion, bridge tables both add and subtract complexity from our dimensional data models. The models expand, our ETL becomes a little more complicated, and the queries will have more tables to join, but the result will be accurate, and that is what our customers expect.

I hope you found this three-part deep-dive series on dimensional modeling challenges informative and useful.

Posted in Business Intelligence, Data Modeling, Data Warehousing.
BigBear.ai
  • Home
  • Industries
  • Solutions
  • Company
  • Careers
  • Blog
  • Investor Relations
  • Contact
  • Twitter
  • Facebook
  • Linkedin
  • Google My business for BigBear.ai
1-410-312-0885
[email protected]
  • Privacy Policy
  • Terms of Use
  • Accessibility
  • Site Map
© BigBear.ai 2023
We value your privacy
We use cookies on our website to give you the most relevant experience by remembering your preferences and repeat visits. By clicking “Accept”, you consent to the use of ALL the cookies.
Privacy Policy | Do not sell my personal information
AcceptCookie Settings
Manage Consent

Cookies Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary
Always Enabled
Necessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.
CookieDurationDescription
cookielawinfo-checkbox-analytics11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional11 monthsThe cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance11 monthsThis cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
JSESSIONIDsessionThe JSESSIONID cookie is used by New Relic to store a session identifier so that New Relic can monitor session counts for an application.
viewed_cookie_policy11 monthsThe cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
Functional
Functional cookies help to perform certain functionalities like sharing the content of the website on social media platforms, collect feedbacks, and other third-party features.
CookieDurationDescription
__atuvc1 year 1 monthAddThis sets this cookie to ensure that the updated count is seen when one shares a page and returns to it, before the share count cache is updated.
__atuvs30 minutesAddThis sets this cookie to ensure that the updated count is seen when one shares a page and returns to it, before the share count cache is updated.
Analytics
Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics the number of visitors, bounce rate, traffic source, etc.
CookieDurationDescription
_ga2 yearsThe _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_ga_NK4L4Q320Q2 yearsThis cookie is installed by Google Analytics.
_gat_gtag_UA_163894009_21 minuteSet by Google to distinguish users.
_gid1 dayInstalled by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
at-randneverAddThis sets this cookie to track page visits, sources of traffic and share counts.
CONSENT2 yearsYouTube sets this cookie via embedded youtube-videos and registers anonymous statistical data.
uvc1 year 1 monthSet by addthis.com to determine the usage of addthis.com service.
Advertisement
Advertisement cookies are used to provide visitors with relevant ads and marketing campaigns. These cookies track visitors across websites and collect information to provide customized ads.
CookieDurationDescription
f5avraaaaaaaaaaaaaaaa_session_sessionbusinesswire.com cookie
loc1 year 1 monthAddThis sets this geolocation cookie to help understand the location of users who share the information.
VISITOR_INFO1_LIVE5 months 27 daysA cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
YSCsessionYSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devicesneverYouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt-remote-device-idneverYouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
Save & Accept