There are many different data storage terms, and they can quickly get confusing. To find the best data storage solution, check out the definitions of each, how they compare, and the best use cases.
A database collects structured or unstructured data and is typically controlled by a database management system (DBMS). In many instances, databases are used as repositories for data related to a specific situation – for example, a cloud database like Amazon Web Services (AWS).
New data that comes into the database is processed, organised, managed, updated, and stored in tables to make processing and data querying efficient. Most databases use structured query language (SQL) for writing and querying data.
More characteristics of databases:
RDBMS-specific – strict rules around data storage and organisation
NoSQL-specific – flexible data storage.
Single-purpose – handles one process
Used for online transaction processing (OLTP)
There are many types of databases depending on how an organisation plans to use the data. Here are three databases and their purpose:
In a relational database, each row and column contain unique data. For example, your row might list customer information such as name, email and job role, and your column would set out a specific type of information relevant to all customers.
Consider a retailer using a POS application that reads and writes real-time transaction information to the database at a reasonable speed. The retailer wouldn’t want their POS system freezing for one minute in the middle of making a sale (so there aren’t any intensive analysis processes run on it to prevent this).
A database could be as simple as a single access database table but could be a normalised transactional database, a graph database storing relationships, or a NoSQL database.
A data warehouse is a collection of already modelled or structured data that can be analysed using different visualisation tools - tables, keys, indexes, views, and types - to make data-driven business decisions in near to real-time.
Data warehouses like BigQuery, Amazon RedShift, and Snowflake are created through data integration, such as databases, using an operational data store (ODS) and the automation process called ETL (extract, transform, load). So, for example, multiple departments can contribute data sets into a single database, and then structure the information (using ETL) to help decision-makers delve into metrics and insights that optimize business performance.
More characteristics of a Data Warehouse:
Stores current and historical data – old data stays intact even when new information is updated.
Captures data from multiple disparate databases.
Works with ODS to house normalised, cleaned data.
Reports and dashboards use data from data warehouses.
A data warehouse contains multiple databases. Within each database, data is stored in tables and columns. You can add a description of the data, such as an integer, data field or string. Tables can be organised inside a schema or folder. Query tools use the schema to determine which data tables to access and analyse.
You would primarily read from a data warehouse, while the source of their data is typically a transactional database for fast writing of data.
You might consider a data warehouse if you’re interested in storing history for a lot of structured data, you don’t want to use cloud technology, and you have some known query patterns that align well with a star schema for reporting.
Often data marts are built from the data held in the data warehouse in a top-down approach for governance and performance reasons.
Like a data warehouse, a data mart will maintain and house cleaned data ready for analysis. However, the visibility scope is limited to subject-oriented data to support a specific business unit, e.g. the marketing department. This means the business unit can avoid sifting through an ocean of irrelevant and non-essential data to generate the reporting it needs.
It’s like a mini data warehouse that holds a smaller, more focused subset of data important to a single team or select group of business users.
More characteristics of a Data Mart:
Security – because a data mart only contains data specific to a department, it significantly reduces the chance of data misuse and the potential for conflicting reports.
Speed – there’s less volume in a data mart, meaning less processing – and faster queries.
Building a data mart is a complex procedure that includes multiple technologies and tools. The result is a physical database that enables an organisation to uncover in-depth insights quicker than they would with a broader data warehouse data set.
You would consider a data mart if you’re interested in storing history for a lot of structured data for a specific business area. Also, you don’t want to embrace cloud technology and have some known query patterns that align well with a star schema for reporting. Finally, you may combine several data marts to build a data warehouse in a bottom-up approach.
For example, your marketing team wants a dashboard that uses some, but not all, of the aggregated data in your data warehouse. The data mart contains relevant data the marketing team need, so dashboards refresh a lot faster without compromising access to other confidential data.
A data lake casts a much wider net than a database or data warehouse to capture any raw and processed data an organisation considers valuable for future use.
The data lake will extract data from multiple sources and process it like a data warehouse. Data scientists can also use the data for analytics and report creation. However, the technology used in a data lake is much more complex.
More characteristics of a data lake:
Collects all data from many disparate sources over an extended period.
Processes, cleans, and stores data in the data lake without an ODS.
A data lake holds big data in its native, raw format. Several technologies and applications are needed to process and analyse the information, as well as data analysts experienced in complex data, programming languages and data science techniques. For this reason, data lakes are often used with machine learning to extract required outputs.
You’re a retailer that needs to store large amounts of raw data of all different types, e.g. POS transactions, images, and pdfs from various sources. This might be because you’re unsure what to analyse at a later date, or you’re in an industry where you need to store different information for regulatory purposes.
Data lakes typically take advantage of cheaper cloud storage and are a good source for data analytics. For example, you would consider a data lake if you’ve reached capacity in your existing databases and want to retain a large volume of information that is proving difficult to manage. A data lake brings that information together in one place so you can apply some governance.
Understand the key differences between a database and a data warehouse.
Database | Data Warehouse |
|
|
Data Warehouse versus a Data Mart
Here are the key differences between a data warehouse and a data mart.
Data Warehouse |
Data Mart |
|
|
Data Mart versus a Data Lake
See the key differences between a data mart and a data lake.
Data Mart |
Data Lake |
|
|
Data Warehouse versus a Data Lake
What are the key differences between a data warehouse and a data lake?
Data Warehouse |
Data Lake |
|
|
Database versus a Data Mart
Now, the key differences between a database and a data mart.
Database |
Data Mart |
|
|
Database versus a Data Lake
Lastly the key differences between a database and a data lake.
Database |
Data Lake |
|
|
Your organisation could be sitting on a potential goldmine of data. But figuring out how to store and process that data so you can extract those all-important analytics can sometimes get in the way of unearthing the golden nuggets.
Chat with the experts who build enterprise data management solutions daily – use the contact form to get in touch. We’ll get back to you within one business day.