From the course: Implementing a Data Warehouse with SQL Server 2022

Data warehouse core concepts

- [Instructor] Let's kick off this course with an explanation about what a data warehouse is. A data warehouse plays a critical role in an enterprise's business intelligence system. In a typical scenario, important business data is initially collected in operational or production databases. These can be separate databases for customer transactions and sales data, or human resources and employment records, or corporate financial information. They can even be external data sources maintained by a vendor or a government regulator. All of this data is consolidated through an ETL process that extracts or copies the data from the disparate source databases, transforms it into a common format and arrangement, and then loads or writes it into a single destination database. This single repository is the data warehouse. Then once everything is brought into a common, unified location, the data can be used for analysis and reporting with the knowledge that everyone is pulling business insights from the same set of information. The data warehouse represents the unified truth that everyone in the enterprise can agree on. Data warehouses are designed to collect data over long periods of time, and they aggregate and format that data so that it can better support the analysis and reporting requirements of the enterprise. In this way, they differ from the transactional databases that are used for real-time data collection. However, data warehouses use the same table structures with attribute columns and data types and relationships. They also use the same underlying database engine. In our case, SQL Server. And the same query language, SQL. The difference is is that they only receive data through a standardized method, the extract, transform, and load process. And once loaded, that data doesn't change very often. Where production databases store information that might be entered and modified by customers or employees or automated sensors or any number of different data entry points, the ETL process alone is used to populate the data warehouse. This allows it to be able to organize the data, make it more consistent and predictable across the various inputs, and aggregate it into meaningful groups that support further analysis. The ETL is typically run on a regular schedule most often nightly, so that the data warehouse can support consistent analyses for everyone in the organization between the loads. So what are the benefits of incorporating a data warehouse into your enterprise business intelligence strategy? The first is that it relieves system resource pressure from your operational databases by removing reporting and analysis workloads. This allows your primary database to focus on the transactional needs of the business, and keep those resources dedicated to writing and updating records in order to keep up with the real-time workload of the business. Implementing a data warehouse also helps remove issues related to database isolation level locking contention, which can be caused when complex analysis queries are run against the database, preventing database updates when the query executes. Next, a data warehouse presents a single, consistent point of contact for all analysis and reporting needs, regardless of the original source of the data. This makes queries easier to write and maintain and greatly simplifies the process of gaining insight for users and decision makers across the organization. Data warehouses also present an opportunity to tune the database for analysis query performance. Because a data warehouse database is solving an inherently different problem from the operational database, different decisions can be made about the structure of tables and relationships, and the types of resources dedicated to the system. We're going to talk a lot about this benefit throughout the course. And finally, data warehouses allow an enterprise to maintain historical data, even when the operational databases do not. Again, this relieves pressure from the operational databases by allowing them to focus strictly on the current state of the data. As you can see, a data warehouse is just another database in the enterprise, but they serve a different purpose from the relational databases that you might be used to working with. The primary goal of a data warehouse is to support the decision-making needs of the enterprise. And as such, they require a different approach to their design.

Contents