The death of the Enterprise Data Warehouse (EDW) has been proclaimed more than once during its long tenure. Throughout its life, business and technical teams have run into challenges with the EDW's flexibility, adaptability, scalability, and performance. Many vendors are still working on these problems today, but they are reshaping the capabilities of warehousing rather than looking for a replacement.
At first glance, big data seemed to many like the last straw for data warehousing, since it was speculated that the technology would not meet expectations for volume, variety, and velocity. Because of this, organizations have looked for ways to move off traditional data warehouse technology into environments such as data lakes. However, data warehousing, in its more modern rendition, has still proven itself as a tool for meeting information needs when the purpose is well defined. Data lakes, meanwhile, are acting not as a replacement but as a complementary piece that serves a different set of purposes.
Data Warehouse vs Data Lake
A data warehouse targets a business user with a specific purpose in mind, resulting in a processed, refined data structure. In contrast, a data lake targets users and developers, such as data scientists, who want to perform more exploratory tasks, resulting in data stored in its raw form with an emphasis on high accessibility.
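One way to picture the contrast is the small Python sketch below. It is only an illustration under stated assumptions: the sample events, the field names, and the three-column "warehouse" shape are all hypothetical, and plain Python lists stand in for real storage. The warehouse path refines each record into a fixed structure and drops anything that does not fit, while the lake path keeps every record as it arrived and lets an explorer ask what is in there.

```python
import json

# Hypothetical raw events, kept verbatim the way a data lake would store them.
RAW_EVENTS = [
    '{"order_id": 1, "region": "east", "amount": "19.99", "notes": "gift"}',
    '{"order_id": 2, "region": "WEST", "amount": "5.00"}',
    '{"order_id": 3, "region": "east", "amount": "n/a", "clickstream": [1, 2]}',
]


def to_warehouse_row(raw):
    """Refine one raw event into a fixed (order_id, region, amount) tuple.

    Returns None when the record cannot be made to fit that structure.
    """
    record = json.loads(raw)
    try:
        return (
            int(record["order_id"]),
            str(record["region"]).lower(),
            float(record["amount"]),
        )
    except (KeyError, ValueError):
        return None


def load_warehouse(raw_events):
    """Warehouse path: keep only processed, refined rows."""
    rows = (to_warehouse_row(raw) for raw in raw_events)
    return [row for row in rows if row is not None]


def explore_lake(raw_events):
    """Lake path: read the raw data as-is and count which fields appear."""
    counts = {}
    for raw in raw_events:
        for key in json.loads(raw):
            counts[key] = counts.get(key, 0) + 1
    return counts


warehouse_rows = load_warehouse(RAW_EVENTS)
field_counts = explore_lake(RAW_EVENTS)
print(warehouse_rows)
print(field_counts)
```

Note what each side gives up: the refined rows are ready for a specific report but have lost the `notes` and `clickstream` fields, while the raw events keep everything but leave the cleanup to whoever reads them.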
Where We Are with Data Warehouse Today
The day-to-day decision-support capabilities provided by the data warehouse will remain vital. However, to stay relevant in its modernized role, it will need several new and changing characteristics, such as:
- It must be able to handle a larger array of heterogeneous data, both semi-structured and structured, inside and outside of database storage. While this isn’t completely new in theory, the model is shifting towards supporting JSON and XML in object storage. Redshift Spectrum and Azure PolyBase are two prime examples of supporting an object storage-centric architecture on data platforms.
- Depending on the organization, it may no longer play the role of a central location for data. Data lakes could be a better environment for that job in the long run. The industry is still working towards proving that.
- Where possible, it should run in the cloud to help resolve long-standing scalability, deployment, and administration issues.
- It will need to keep evolving to support fast ELT (extract, load, transform) data processing instead of legacy ETL processes and tools. In the cloud, this speeds up processing while lowering costs.
- Data warehousing technology will need to adapt to DataOps practices by making it easy and efficient for developers to move, import, and export data and its structures in a unified manner.
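To make the ELT point in the list above a little more concrete, here is a minimal Python sketch of the pattern. It rests on a few assumptions: an in-memory SQLite database stands in for a cloud warehouse engine, and the table names (`staging_orders`, `sales_by_region`) and sample rows are hypothetical. The idea it shows is the ordering: raw data is loaded first with no cleaning, and the transformation then runs as SQL inside the engine rather than in a separate tool beforehand.

```python
import sqlite3

# Hypothetical raw rows, loaded exactly as extracted (everything is text).
RAW_ROWS = [
    ("1", "east", "19.99"),
    ("2", "WEST", "5.00"),
    ("3", "east", ""),  # missing amount, handled during the transform step
]


def run_elt(raw_rows):
    """Load raw rows into a staging table, then transform them with SQL."""
    # Assumption: SQLite stands in for the warehouse engine.
    conn = sqlite3.connect(":memory:")

    # Load: land the data untouched in a staging table.
    conn.execute(
        "CREATE TABLE staging_orders (order_id TEXT, region TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", raw_rows)

    # Transform: clean and aggregate inside the engine.
    conn.execute(
        """
        CREATE TABLE sales_by_region AS
        SELECT LOWER(region) AS region,
               COUNT(*) AS orders,
               ROUND(SUM(CAST(amount AS REAL)), 2) AS total_amount
        FROM staging_orders
        WHERE amount <> ''
        GROUP BY LOWER(region)
        """
    )

    result = conn.execute(
        "SELECT region, orders, total_amount FROM sales_by_region ORDER BY region"
    ).fetchall()
    conn.close()
    return result


summary = run_elt(RAW_ROWS)
print(summary)
```

In a legacy ETL flow, the lowercasing, casting, and filtering would happen in a separate tool before anything reached the warehouse. Here the staging table keeps the raw rows available, so the transform can be changed and re-run without extracting the data again.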
Modern modular data platforms will leverage the data warehouse as one of the most proven and long-standing components in the analytics ecosystem. We’ve gone through many data warehouse iterations to get where we are today, and it’ll take a couple more for us to fully realize the coexistence of the data lake and the data warehouse.
Rely on the Experts
Whether you’re thinking about modernizing your current data warehouse or aren’t sure where to start implementing one, Indellient’s experts can help you plan and architect the best solution for your specific needs and budget. Send me an email or connect with me on LinkedIn to continue the conversation!