BI, Business

Data Warehousing – A Process Overview

Data warehouse is a very common term that you will come across in the BI field. A data warehouse is simply a data store in its most basic definition. This data store is used for analysis and reporting and not for operational use.

The data warehouse typically has 3 layers: staging, integration and access layers. The staging layer usually contains a raw data dump from an operational database. The integration layer is where the raw data is restructured and links the data in a meaningful manner. The access layer is used for presentation and is usually for business clients.

The data warehouse stores all the business data for a corporation. This can be subdivided and stored into smaller stores known as Data Marts. Data Marts are typically subdivided by business function or logical reporting categories. If a single large Data Warehouse is implemented then this is known as centralised approach. The data is centrally stored and is accessed by all business functions. Implementing multiple data marts is a decentralised approach; as there is no single store with information from all business functions.

It’s beginning to sound like the two approaches are very different but they have a similar purpose. In fact, Ralph Kimball and Bill Inmon are the authors behind the decentralised and centralised approaches respectively. Inmon defined the idea of centralised storage and a top down approach. A top down approach starts with consideration of all the requirements at an enterprise level without breakdown by business function. Kimball defined a bottom up approach leading to decentralised storage with multiple Marts. A bottom up approach starts with individual business functions and their specific requirements. After all the marts are built these are then linked together to deliver an enterprise-wide solution.

Even though these two approaches to building a DW are different both involve 3 common stages: Integration, Analysis and Reporting. Integration and Analysis both involve manipulation of data, whereas Reporting is more about presentation of the data. It is possible to perform further analysis in the reporting layer.

The tools used for development usually support these three layers. One of the industry leading tool sets is provided by Microsoft. The SQL Server with Business Intelligence Development Studio (BIDS) supports the development of the above mentioned layers. The SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) together form the MS BI stack.

SSIS is used to get data from an operational database and transform it into a dimensional or ER data store. SSAS allows building of a multi-dimensional Cube using the new DW model and perform calculations and aggregations. SSRS then allows generating reports from the Cube with some graphics.

In the next post we will take a closer look at the dimensional model (Kimball) for data warehousing and define some basic terms.

Standard