Data Warehousing in the Cloud
Traditional data warehouses were often built as custom reporting data structures on on-premises servers. These servers required a team of DBAs, storage engineers, operating system administrators, and network engineers to keep the infrastructure running, and the hardware itself quickly became obsolete. The ability to scale or replace your data warehouse infrastructure to handle a spike in data volume or a sudden storage need was limited by the speed of your procurement team. The cloud has changed all of that.
Cloud data warehouses let you scale both vertically (larger compute resources) and horizontally (more compute nodes), so you can scale up during periods of peak processing and scale back down once the peak has passed. The cost savings that come from this elasticity, along with the resiliency and redundancy of the cloud, are a primary reason many companies are moving their data warehouses to the cloud.
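To make the cost argument concrete, here is a minimal Python sketch that compares a day of fixed, peak-sized capacity with an elastic schedule that scales up only for a nightly batch window. The capacity units, the three-hour peak, and the per-unit rate are all hypothetical numbers chosen for illustration, not real cloud pricing.

```python
# Hypothetical illustration: capacity sizes and rates are made-up units,
# not pricing from any real cloud provider.

def daily_cost(hourly_capacity, rate_per_unit_hour):
    """Sum the cost of the capacity provisioned in each hour of a day."""
    return sum(units * rate_per_unit_hour for units in hourly_capacity)

# Assumed workload: 4 units of compute during a 3-hour nightly batch
# window, and 1 unit for the remaining 21 hours of ad hoc querying.
peak_units, base_units = 4, 1
peak_hours = 3
rate = 2.0  # hypothetical cost per unit-hour

# On-prem style: capacity must be sized for the peak all day long.
fixed = [peak_units] * 24

# Elastic cloud style: scale up for the batch window, then back down.
elastic = [peak_units] * peak_hours + [base_units] * (24 - peak_hours)

fixed_cost = daily_cost(fixed, rate)      # 4 * 24 * 2.0 = 192.0
elastic_cost = daily_cost(elastic, rate)  # (4 * 3 + 1 * 21) * 2.0 = 66.0
print(f"fixed: {fixed_cost}, elastic: {elastic_cost}, saved: {fixed_cost - elastic_cost}")
```

Under these assumed numbers the elastic schedule costs about a third of the fixed one; actual savings depend on how spiky your workload is and on your provider's pricing model.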
Another reason to move your data warehouse to the cloud is to take advantage of a data lake. You might ask what a data lake has to do with a data warehouse. A data warehouse is a database of structured data used to answer known questions, such as the metrics an organization uses to measure its success. A data lake, by contrast, is a massive repository of raw, often unstructured data with no predefined use. So what is the benefit of storing the data warehouse alongside the data lake? Aren't they used for separate tasks?

Yes and no. I see a data lake as a place for innovation and self-service. I have often mined the data in my data lake to discover new and valuable KPIs for my organization. Once I find this information, I want to store it in my warehouse, which makes the data lake an input to the warehouse. Keeping the two co-located also streamlines my ETL process, because I can use the speed and scalability of the cloud to process this data. I see the data lake as an extension of my data warehouse: a source of new KPIs and a place to keep detailed supporting information for the warehouse.
In conclusion, think of your data lake as an extension of your data warehouse.