Data Warehouse Design Methodologies

Jim McHugh
November 9, 2016

Two traditional data warehouse design methodologies came of age in the 1990s: Bill Inmon’s top-down Atomic Data Warehouse and Ralph Kimball’s bottom-up Dimensional Data Warehouse. These methodologies have been used over the past 20 years to create informational data stores for organizations seeking to leverage their data for corporate gain. In this article, we will compare and contrast these two methodologies.

Atomic Data Warehouse – Bill Inmon

Bill Inmon’s Atomic Data Warehouse approach is strategic in nature: it seeks to capture all of the enterprise’s data in 3rd Normal Form (3NF) and store that atomic data in the data warehouse. Inmon defines a data warehouse as a subject-oriented, non-volatile, time-variant, and integrated data source. Let’s break down each of these descriptors of Inmon’s data warehouse; a small schema sketch follows the list.

  • Subject-Oriented – the data is organized so that records related by subject area are linked together.
  • Non-Volatile – once data is entered it is never updated or deleted; all data is retained for future reporting needs.
  • Time-Variant – because the data is non-volatile and reporting is time-based, data already in the warehouse cannot be modified; new records must be added to reflect changes in the data over time.
  • Integrated – data is sourced from most, if not all, of the enterprise’s information systems and organized in a consistent and unified manner.
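
To make these descriptors concrete, here is a minimal sketch, in Python with SQLite and hypothetical table and column names, of how a single customer subject area could be stored in an insert-only, time-variant fashion:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Subject-oriented: all customer data lives together in the customer subject area.
# Non-volatile / time-variant: rows are never updated or deleted; every change
# arrives as a new row with its own effective date.
cur.execute("""
    CREATE TABLE customer (
        customer_id      INTEGER,
        source_system    TEXT,   -- integrated: which operational system supplied the row
        customer_name    TEXT,
        customer_segment TEXT,
        effective_date   TEXT,   -- when this version of the record became true
        PRIMARY KEY (customer_id, effective_date)
    )
""")

# The same customer captured at two points in time, from two source systems.
cur.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?)",
    [
        (101, "CRM", "Acme Corp", "SMB",        "2016-01-01"),
        (101, "ERP", "Acme Corp", "Enterprise", "2016-06-01"),  # a new row, not an update
    ],
)

# "As of" reporting simply picks the latest row at or before the reporting date.
cur.execute("""
    SELECT customer_name, customer_segment
    FROM customer
    WHERE customer_id = 101 AND effective_date <= '2016-03-31'
    ORDER BY effective_date DESC
    LIMIT 1
""")
print(cur.fetchone())  # ('Acme Corp', 'SMB')
```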

Inmon then creates data marts – subject- or department-focused subsets of the data warehouse – each designed to address the data and reporting needs of a targeted group of business users.

There are several benefits to this model. The first is that all of the corporate data is completely documented. This process provides the organization with a complete view of its processes, products/services, customers, vendors, etc. This documentation is invaluable because, in most cases, every system up to this point has been launched in isolation, and this is often the first time the organization truly defines the different processes, products, and parties with whom it interacts on a consistent basis.

Second, the data is efficiently stored in 3rd Normal Form in a single repository. Because most transactional systems already define their data in 3NF, retrieving that data and storing it in the warehouse is a little easier.

Finally, the data is readily available for extraction into data marts for the business users. Now that the data is fully defined and efficiently stored, the warehousing team can build the data mart for the business unit. These data marts are designed to let the business unit answer its questions quickly and efficiently. They also provide the user with the detail data supporting the data mart as well as the lineage of that data. This supporting information and data lineage is often critical to the acceptance and functional usage of the data mart by the business user.

There are also several challenges that this framework poses to the organization. First is the time-consuming task of documenting and defining the complete repository for the entire organization. This design methodology is a long process that, although invaluable, requires the data warehousing IT team to work closely with the business users to ensure the authoritative data is captured and stored in the correct data structure. Therefore, a great deal of time will elapse between project kick-off and the initial data mart deliverable. This “deliverable gap” has been the demise of many data warehousing projects; depending on the size of the organization and the experience of the data warehousing team, the initial discovery and documentation step may never be completed before the project is canceled.

A second challenge is the lack of flexibility this model provides. If the organization adds a new business unit or a new application, or offers a significantly different product or service, it will need to go back and modify the existing data model to accurately document and define the new state of the business, while maintaining the subject-oriented, non-volatile, time-variant, and integrated aspects of the data already stored in the warehouse. This, too, has often called into question the value of a data warehouse.

Finally, there is substantial ETL processing necessary to transform the data warehouse data into a data mart suitable for business consumption. This protracted processing can delay the delivery of the data to the business user. Sometimes the delays in moving data from the source system into the data warehouse and finally into the data mart do not meet the needs of the business user, and alternative solutions or shortcuts are researched and invoked instead.
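
As a rough, hypothetical illustration of that extra step, the sketch below (Python with SQLite, table names invented for the example) flattens 3NF warehouse tables into a single wide mart table on each load:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical 3NF warehouse tables: orders reference customers and products by key.
cur.executescript("""
    CREATE TABLE wh_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
    CREATE TABLE wh_product  (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE wh_order    (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                              product_id INTEGER, order_date TEXT, amount REAL);

    INSERT INTO wh_customer VALUES (1, 'Acme Corp', 'East');
    INSERT INTO wh_product  VALUES (10, 'Widget', 'Hardware');
    INSERT INTO wh_order    VALUES (500, 1, 10, '2016-10-01', 250.0);
""")

# The ETL step: join and flatten the normalized tables into one wide mart table
# that business users can query without understanding the 3NF model.
cur.execute("""
    CREATE TABLE mart_sales AS
    SELECT o.order_id, o.order_date, c.customer_name, c.region,
           p.product_name, p.category, o.amount
    FROM wh_order o
    JOIN wh_customer c ON c.customer_id = o.customer_id
    JOIN wh_product  p ON p.product_id  = o.product_id
""")

print(cur.execute("SELECT * FROM mart_sales").fetchall())
```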

Dimensional Data Warehouse – Ralph Kimball

Ralph Kimball’s methodology is more tactical in nature and is the antithesis of Inmon’s methodology. Kimball’s definition of a data warehouse is “a copy of transaction data specifically structured for query and analysis.” He believes you should start at the tactical level by focusing on the data mart first, thereby providing immediate value to the business users. Kimball’s data warehouse is simply the collection of these data marts taken as a whole, and his approach concerns itself only with the data needed for the data marts.

Kimball’s data marts consist of source data converted from 3NF into a dimensional model. This dimensional model consists of facts and dimensions. Facts are calculated measures about entities at a specified point in time. Dimensions hold the descriptive attributes of those entities by which the measures are grouped.

The major benefit of Kimball’s approach and the use of dimensional modeling is the speed with which the business user can derive value from the data mart, along with the flexibility this modeling offers. Data marts can usually be defined, designed, and delivered in less than 120 days – and in a majority of cases in less than 90 days – from the availability of the data. This model also allows facts or dimensions to be easily expanded with new measures or with additional information describing the entity. Finally, this modeling technique calls for preprocessing and storing the data so that aggregations of the fact data can easily be sliced and diced by the dimensional columns by business users with little to no IT help. This combination of speed, agility, scalability, and understandability is needed in today’s rapidly changing business environment.
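
A minimal sketch of such a dimensional (star-schema) model and the kind of slice-and-dice query it enables, again in Python with SQLite and hypothetical table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
    -- Dimensions: descriptive attributes the business uses to group and filter.
    CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
    CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);

    -- Fact: numeric measures captured at a point in time, keyed to the dimensions.
    CREATE TABLE fact_sales (
        date_key     INTEGER REFERENCES dim_date(date_key),
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        sales_amount REAL
    );

    INSERT INTO dim_date     VALUES (20161101, '2016-11', 2016), (20161201, '2016-12', 2016);
    INSERT INTO dim_customer VALUES (1, 'Acme Corp', 'East'), (2, 'Globex', 'West');
    INSERT INTO dim_product  VALUES (10, 'Widget', 'Hardware'), (11, 'Gadget', 'Hardware');

    INSERT INTO fact_sales VALUES (20161101, 1, 10, 250.0),
                                  (20161101, 2, 11, 400.0),
                                  (20161201, 1, 11, 150.0);
""")

# Slice and dice: aggregate the fact measures by any combination of dimension columns.
for row in cur.execute("""
    SELECT d.month, c.region, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_date     d ON d.date_key     = f.date_key
    JOIN dim_customer c ON c.customer_key = f.customer_key
    GROUP BY d.month, c.region
"""):
    print(row)
```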

The main challenge of the Kimball methodology is the lack of enterprise focus in the data warehouse. For example, different data marts may have similar but non-conforming (inconsistent) dimensions, as each may be derived from different sources. Each data mart can therefore provide a different answer to a standard enterprise question, such as “How many customers do we have?”, depending on the source system from which it derived its customers.
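
As a contrived illustration of that risk (Python with SQLite, names invented for the example), two marts whose customer dimensions come from different source systems can answer the same question differently:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two data marts, each with its own customer dimension sourced from a different system.
cur.executescript("""
    CREATE TABLE sales_mart_dim_customer   (customer_id TEXT);  -- sourced from the CRM
    CREATE TABLE billing_mart_dim_customer (customer_id TEXT);  -- sourced from the ERP

    INSERT INTO sales_mart_dim_customer   VALUES ('C-001'), ('C-002'), ('C-003');
    INSERT INTO billing_mart_dim_customer VALUES ('1001'), ('1002');  -- different keys, different coverage
""")

# "How many customers do we have?" yields two different answers.
print(cur.execute("SELECT COUNT(*) FROM sales_mart_dim_customer").fetchone()[0])    # 3
print(cur.execute("SELECT COUNT(*) FROM billing_mart_dim_customer").fetchone()[0])  # 2
```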

Conclusion

In the end, both of these data warehousing methodologies provide intrinsic value to the enterprise. Some organizations want to focus on the strategic and therefore choose the Inmon methodology, while others need the speed and agility of the Kimball method. Still others want the best of both worlds and create a hybrid of the two.

I hope you now have a solid, high-level understanding of these methodologies and can make an informed choice for your own data warehousing effort. I will provide more detailed information about how to implement these methodologies in future blog posts.

Posted in Data Modeling, Data Warehousing.