Don't you want to read? Try listening to the article in audio mode 🎧
At a time when company decisions are increasingly becoming Data Driven, it is essential to have one or more sources of information that can be used to implement business strategies and processes to be able to maximise productivity and ROI (Return on investment). Data Warehouses meet this need by providing a basis on which to operate more effectively through Business Intelligence tools. So, let's analyse these particular systems and try to understand why they are beneficial even when you have to manage large amounts of data.

Data Warehouse: what it is

In the field of Data Management, Data Warehouses are systems supporting Business Intelligence that allow for in-depth analysis procedures and address data-based Decision-Making processes. The information contained therein can come from different sources, therefore Data Warehouses often feature historical data, data deriving from company activity (or operational data), data collected from external sources, data generated by the operation of applications, log files, and much more, all in a centralised collection. However, these systems should not be confused with databases and were not designed for processing but to offer answers to queries. For this reason, when talking about Data Warehouse, we also refer to a read-only database. From the point of view of composition, they are not monolithic structures and include some elements required for their proper functioning, such as:
  • an RDBMS (Relational Database Management System) with which to manage the data storage and querying processes;
  • ELT tool (Extract, Load, Transform) with which to extract, load and transform data in preparation for analysis;
  • Data Mining tools that allow and facilitate data extraction;
  • Data Analysis applications;
  • reporting tools;
  • platforms for data presentation.
With the progress of Artificial Intelligence technologies, and the growing availability of Big Data, algorithms for training Machine Learning models for analysis, tools that enable different graphical data representations that are generated based on different analysis criteria, have also gained particular importance. William H. Inmon, credited as the father of Data Warehouse and one of the world's leading experts in the field of Data Warehouse Design, argues that these systems must be at the same time:
  • subject-oriented, that is allowing to analyse data on the basis of specific topics;
  • non-volatile, therefore able to guarantee data stability and non-modifiability;
  • integrated, that is designed to generate consistency despite the different types of managed data and their multiple sources;
  • time-variant, helping to identify the updates that have involved data throughout their history.
It is therefore clear how much Data Warehouses can be useful both for storing and enhancing data.

Examples of Data Warehouse architecture

There is no single architecture for Data Warehouses, but different variants developed according to the needs of a company or any other entity that wishes to take advantage of these particular systems for data collection and analysis. However, it is possible to describe some commonly-used architectures:

Sandbox

It is an architecture based on the creation of private areas that ensure a high level of data protection and security. Thanks to them, stored information can be explored more quickly because there is no lack of requirement to comply with constraints placed by Data Warehouse protocols or its formal rules.

Basic architecture

Unlike what happens with the Sandbox, it provides a single central repository in which all data flows into it, regardless of their type (raw data, metadata, summaries...) and source. Access is also guaranteed to all users who, among the various purposes, can use data for Data Mining, analysis and reporting.

Basic architecture with a staging area

In this case, all data is stored through the Data Warehouse only after being selected, by removing non relevant information, and then pre-processed. Hence there is a need to define a temporary management area for data preparation before its inclusion into the system. Alternatively, it is still possible to clean up and process the information during the programming stage.

Hub and spoke

An architecture characterised by the presence of a Data Mart, this is a structured database with a subset of data on a specific topic. Typically, a Data Mart is represented by the section of a Data Warehouse and it is used to meet specific requirements, such as the need to manage data of an individual business division (marketing, sales, administration...). The Data Mart acts as an intermediary level between the repository and users, so data flows into it only when they are ready to be used. Clearly, multiple Data Marts can be associated with a single system.

How is a Data Warehouse powered?

Where does all stored data through the Data Warehouses come from? Generally, it is possible to distinguish the sources used in internal and external sources. Taking as an example the reality of a structured company, the internal data sources are for example databases, CMS (Content Management System) used for content management, ERP (Enterprise Resource Planning) to which reference is made for corporate functions and business processes, as well as CRM (Customer Relationship Management) for managing relationships with customers. In the same way, e-mail messages and any type of file produced during the operation of an organisation can also be considered internal data. On the other hand, external data sources are considered to be outsourced services, such as those provided through Cloud infrastructures, but also third-party platforms such as social networks and suppliers' websites. In the same category it is possible to include all the external applications with which the company must interact for its work. Consider for example financial services agencies that through Web services connect their CMRs to those of the credit institution with which they operate.

Business Intelligence and Data Warehouse

Data Warehouses can play an important role in supporting Business Intelligence because they help to make rational decisions based on data. In fact, they represent a source of historical and operating information which, when used for analysis, allows to identify trends, anticipate market changes, and adapt strategies, budgets and investments based on accurate evaluation and forecasts. To enhance the data made available by the Data Warehouses, it is necessary to develop analytical skills, also by taking a specific training course such as the Business Data Analysis Master offered by Talent Garden. The master course was specifically designed to acquire in-depth skills on the functioning of Data Driven businesses. Dedicated to Sales Managers, Marketing Managers, Product Managers and Analysts with at least 3 years of work experience, it is divided into four weekends of live online learning sessions and two weekends of in presence lessons at the Talent Garden Calabiana, in Milan. These are occasions during which all aspects of the use of data in defining business strategies can be discovered.

Conclusions

Data Warehouses are systems that allow you to allocate data through different sources in order to have useful information for Decision Making. There are different Data Warehouse structures. Industry professionals therefore have the task of selecting and managing the most suitable one for their organisation and support for Business Intelligence.
Article updated on: 09 August 2023
Talent Garden
Written by
Talent Garden, Digital Skills Academy

Keep reading

How to make a company data driven. Is data at the heart of strategic decisions?

A data driven company? Yes, digital transformation can transform a business and all the decisions of company managers ...

What Data Engineers do, how to become one and how much they earn

To define the professional figure of the Data Engineer it is useful to refer to Big Data, that is the amount of ...

How to become a Chief Risk Officer (or Risk Manager)

What is Risk Management: the definition Risk management is a task that includes all identification processes, ...

What is Data Scraping and what are its applications for Data Analysis

Data Scraping is an extremely wide-ranging subject that covers several contexts, from content optimization for search ...