Data Warehouses, Data Lakes, Data Marts, Databases

Written by the Datamine team | Jan 10, 2023 10:24:12 PM

Data Warehouses, Data Lakes, Data Marts, & Databases

What's the difference?

There are many different data storage terms, and they can quickly get confusing. To find the best data storage solution, check out the definitions of each, how they compare, and the best use cases.

WHAT IS A DATABASE?

A database collects structured or unstructured data and is typically controlled by a database management system (DBMS). In many instances, databases are used as repositories for data related to a specific situation – for example, a cloud database like Amazon Web Services (AWS).

New data that comes into the database is processed, organised, managed, updated, and stored in tables to make processing and data querying efficient. Most databases use structured query language (SQL) for writing and querying data.

More characteristics of databases:

RDBMS-specific – strict rules around data storage and organisation
NoSQL-specific – flexible data storage.
Single-purpose – handles one process
Used for online transaction processing (OLTP)

Types of Databases

There are many types of databases depending on how an organisation plans to use the data. Here are three databases and their purpose:

Distributed database - Consists of two or more files located at different sites (for example, multiple computers or networks).
Nonrelational (NoSQL) database - Stores and manipulates unstructured and semi-structured data.
Relational databases - Organises data as a set of tables with columns and rows, providing the most flexible and efficient way to access structured information.

How does a database work?

In a relational database, each row and column contain unique data. For example, your row might list customer information such as name, email and job role, and your column would set out a specific type of information relevant to all customers.

Use case of a database

Consider a retailer using a POS application that reads and writes real-time transaction information to the database at a reasonable speed. The retailer wouldn’t want their POS system freezing for one minute in the middle of making a sale (so there aren’t any intensive analysis processes run on it to prevent this).

A database could be as simple as a single access database table but could be a normalised transactional database, a graph database storing relationships, or a NoSQL database.

WHAT IS A DATA WAREHOUSE?

A data warehouse is a collection of already modelled or structured data that can be analysed using different visualisation tools - tables, keys, indexes, views, and types - to make data-driven business decisions in near to real-time.

Data warehouses like BigQuery, Amazon RedShift, and Snowflake are created through data integration, such as databases, using an operational data store (ODS) and the automation process called ETL (extract, transform, load). So, for example, multiple departments can contribute data sets into a single database, and then structure the information (using ETL) to help decision-makers delve into metrics and insights that optimize business performance.

More characteristics of a Data Warehouse:

Stores current and historical data – old data stays intact even when new information is updated.
Captures data from multiple disparate databases.
Works with ODS to house normalised, cleaned data.
Reports and dashboards use data from data warehouses.

Data Warehouses Have a 3-Tier Architecture

Bottom tier - The bottom tier includes a data warehouse server and a relational database system that collects, cleans and transforms data from multiple data sources through the ETL process.
Middle tier - The middle tier includes an online analytical processing (OLAP) server to provide for easy data grouping, aggregation and joining. The type of OLAP model used depends on the current type of database system.
Top tier - The top tier has a user interface or reporting tool enabling business data analysis.

How does a data warehouse like BigQuery or Snowflake work?

A data warehouse contains multiple databases. Within each database, data is stored in tables and columns. You can add a description of the data, such as an integer, data field or string. Tables can be organised inside a schema or folder. Query tools use the schema to determine which data tables to access and analyse.

Use Case of a Data Warehouse

You would primarily read from a data warehouse, while the source of their data is typically a transactional database for fast writing of data.

You might consider a data warehouse if you’re interested in storing history for a lot of structured data, you don’t want to use cloud technology, and you have some known query patterns that align well with a star schema for reporting.

Often data marts are built from the data held in the data warehouse in a top-down approach for governance and performance reasons.

WHAT IS A DATA mART?

Like a data warehouse, a data mart will maintain and house cleaned data ready for analysis. However, the visibility scope is limited to subject-oriented data to support a specific business unit, e.g. the marketing department. This means the business unit can avoid sifting through an ocean of irrelevant and non-essential data to generate the reporting it needs.

It’s like a mini data warehouse that holds a smaller, more focused subset of data important to a single team or select group of business users.

More characteristics of a Data Mart:

Security – because a data mart only contains data specific to a department, it significantly reduces the chance of data misuse and the potential for conflicting reports.
Speed – there’s less volume in a data mart, meaning less processing – and faster queries.

The 3 Types of Data Mart:

Dependent data mart - Created from an existing data warehouse, often referred to as the top-down approach, a dependent data mart stores all business data in one location. When analysis is required, it withdraws a defined set of data.
Independent data mart - An independent data mart is created without a data warehouse and focuses on one business function or particular workload. The data is released from internal or external data sources, refined, and then loaded into the data mart for business analysis.
Hybrid data mart - A hybrid data mart aggregates data from a data warehouse and other operational data sources. It has the speed and focus of a dependent data mart and enterprise-level integration. It’s suitable for businesses with multiple databases requiring fast reporting turnaround.

How does a Data Mart work?

Building a data mart is a complex procedure that includes multiple technologies and tools. The result is a physical database that enables an organisation to uncover in-depth insights quicker than they would with a broader data warehouse data set.

Use Case of a Data Mart

You would consider a data mart if you’re interested in storing history for a lot of structured data for a specific business area. Also, you don’t want to embrace cloud technology and have some known query patterns that align well with a star schema for reporting. Finally, you may combine several data marts to build a data warehouse in a bottom-up approach.

For example, your marketing team wants a dashboard that uses some, but not all, of the aggregated data in your data warehouse. The data mart contains relevant data the marketing team need, so dashboards refresh a lot faster without compromising access to other confidential data.

What is a Data Lake?

A data lake casts a much wider net than a database or data warehouse to capture any raw and processed data an organisation considers valuable for future use.

The data lake will extract data from multiple sources and process it like a data warehouse. Data scientists can also use the data for analytics and report creation. However, the technology used in a data lake is much more complex.

More characteristics of a data lake:

Collects all data from many disparate sources over an extended period.
Processes, cleans, and stores data in the data lake without an ODS.

How does a Data Lake work?

A data lake holds big data in its native, raw format. Several technologies and applications are needed to process and analyse the information, as well as data analysts experienced in complex data, programming languages and data science techniques. For this reason, data lakes are often used with machine learning to extract required outputs.

Use Case of a Data Lake

You’re a retailer that needs to store large amounts of raw data of all different types, e.g. POS transactions, images, and pdfs from various sources. This might be because you’re unsure what to analyse at a later date, or you’re in an industry where you need to store different information for regulatory purposes.

Data lakes typically take advantage of cheaper cloud storage and are a good source for data analytics. For example, you would consider a data lake if you’ve reached capacity in your existing databases and want to retain a large volume of information that is proving difficult to manage. A data lake brings that information together in one place so you can apply some governance.

Database versus a Data Warehouse

Understand the key differences between a database and a data warehouse.

Database	Data Warehouse
Used to capture transactional data from one primary source Holds current data only Captures fundamental daily operations	Summarised data from various sources is used for data analysis Holds current and historical data Helps provide insight into overall business operations

Data Warehouse versus a Data Mart

Here are the key differences between a data warehouse and a data mart.

Data Warehouse	Data Mart
Contains the cleaned, normalised data from all business units of an organisation Gets its data from databases	Gets its data from the data warehouse Contains essential data from a specific subgroup, e.g. a single business unit, for faster, more effective data analysis

Data Mart versus a Data Lake

See the key differences between a data mart and a data lake.

Data Mart	Data Lake
Contains a small subset of filtered, structured essential data for a specific business unit Better for fast analysis of specific data	Contains all the raw, unfiltered data from an enterprise Better for broader, deeper analysis of raw data An all-in-one solution that combines a data warehouse, database and data mart

Data Warehouse versus a Data Lake

What are the key differences between a data warehouse and a data lake?

Data Warehouse	Data Lake
Stores cleaned data for analysis and reporting Uses an ODS from transactional systems Designed for operational users who need to generate analytical reports	Stores all data for an organisation Extracts data from all data types and channels Designed for deep analysis that goes beyond a data warehouse

Database versus a Data Mart

Now, the key differences between a database and a data mart.

Database	Data Mart
Transactional data repository Captures data from one subject Data is raw and unprocessed The first step in the data ETL process Users don’t interact with data directly	Analytical data repository Captures data from multiple subjects Data is processed and validated for reporting The last step in the ETL process Users interact with data directly

Database versus a Data Lake

Lastly the key differences between a database and a data lake.

Database

Data Lake

Captures transactional data from one subject
Stores traditional data – text, numeric
No data cleaning capabilities
Exports data to an ODS
The first step in the ETL process

Provides data integration from many databases and other data sources

Stores all types of data – images, sound files, PDFs
Implements data cleaning procedures
All data processing happens in the data lake
Handles all aspects of the ETL process

Are you getting the best value from your data?

Your organisation could be sitting on a potential goldmine of data. But figuring out how to store and process that data so you can extract those all-important analytics can sometimes get in the way of unearthing the golden nuggets.

Chat with the experts who build enterprise data management solutions daily – use the contact form to get in touch.

View full post