An enterprise data warehouse (EDW) is a database, or collection of databases, that centralizes a business's information from multiple sources and applications and makes it available for analytics and use across the organization. EDWs can be housed in an on-premise server or in the cloud.
Advantages of organizing an EDW:
The maintenance of an enterprise data warehouse solution is advantageous to an organization for many different reasons. Commonly, this kind of data collection and storage is thought of from a marketing or customer relations perspective, and that is certainly one piece of the puzzle.
That is not the only utility of a data warehouse, however. It can also help to make sense of seemingly random pieces of data which are coming into the organization through various inputs, and it can save valuable time by aggregating that information automatically. Organizations are likely to be better positioned for future growth when their data is organized in such a systemic, automated fashion.
Enterprise Data Warehouse concepts and functions
These pillars define a warehouse as a technological phenomenon:
Nonvolatile. Once placed in a warehouse, the data is never deleted from it. The data can be manipulated, modified, or updated due to source changes, but it’s never meant to be erased, at least by the end users. As we speak about historical data, deletions are counterproductive for analytical purposes. Yet general revisions may occur once in a few years to get rid of irrelevant data.
Reflects the source data. EDW sources data from its original storage spaces like Google Analytics, CRMs, IoT devices, etc. If the data is scattered across multiple systems, its unmanageable. So, the purpose of EDW is to provide the likeness of the original source data in a single repository. As there is always new, relevant data generated both inside and outside the company, the flow of data requires a dedicated infrastructure to manage it before it enters a warehouse.
Serves as the ultimate storage. An enterprise data warehouse is a unified repository for all corporate business data ever occurring in the organization.
Stores structured data. The data stored in an EDW is always standardized and structured. This makes it possible for the end users to query it via BI interfaces and form reports. And this is what makes a data warehouse different from a data lake. Data lakes are used to store unstructured data for analytical purposes. But unlike warehouses, data lakes are used more by data engineers/scientists to work with big sets of raw data.
Subject-oriented data. The main focus of a warehouse is business data that can relate to different domains. To understand what the data relates to, it’s always structured around a specific subject called a data model. An example of a subject can be a sales region or total sales of a given item. Additionally, metadata is added to explain in detail where every piece of information comes from.
Time-dependent. The data collected is usually historical data, because it describes past events. To understand when and for how long a certain tendency took place, most stored data is usually divided into time periods.
Enterprise Data Warehouse Components
There are a lot of instruments used to set up a warehousing platform. Following are few component and their functions:
Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) layer. These are the tools that perform actual connection with source data, its extraction, and loading to the place where it will be transformed. Transformation unifies data format. ETL and ELT approaches differ in that in ETL the transformation is done before EDW, in a staging area. ELT is a more modern approach that handles all the transformation in a warehouse.
Staging area. In the case of ETL, the staging area is the place data is loaded before EDW. Here, it will be cleaned and transformed to a given data model. The staging area may also include tooling for data quality management.
DW database. The data is finally loaded into the storage space. In ELT, it might still take some transformation here. But, at that stage, all the general changes will be applied, so the data will be loaded in its final model(s). As we mentioned, data warehouses are most often relational databases. DW will also include a database management system and additional storage for metadata.
Meta-data module. Put simply, metadata is data about data. These are the explanations that give hints for users/administrators of what subject/domain this information relates to. This data can be technical meta (e.g. initial source), or business meta (e.g. region of sales). All the meta is stored in a separate module of EDW and is managed by a metadata manager.
Reporting layer. These are tools that give end users access to data. Also called BI interface, this layer will serve as a dashboard to visualize data, form reports, and pull separate pieces of information.