Types of Data Warehouse, Architecture & Differences With Database and Data Lake

Types of Data Warehouses
Reading Time : 10 mins

If you have read our previous article “Why You Need a Data Warehouse in Dental Clinics & DSO”, you already have an understanding of data warehouses. Just as a refresher, a data warehouse is something that stores heaps of data on an organization and provides analytical insights along with business intelligence solutions. In this article, we are going to dig a little deeper and understand what types of data warehouses are available, and which one is the best option for a DSO or dental practice.

Types of Data Warehouses 

Data warehouses are an extremely efficient way of getting analytics and displaying product data, but they come in all shapes and forms. You will find 3 types of data warehouses with certain differences in functionalities. Each of them caters to the requirements of certain businesses and users. Now, those 3 types of data warehouses include the following.

  • Enterprise Data Warehouse (EDW) 
  • Operational Data Store (ODS) 
  • Data Mart 

As mentioned earlier, each data warehouse type will bring in different features and benefits. Let us understand those data warehouse types and how it helps a particular organization.

Enterprise Data Warehouse (EDW) 

An enterprise data warehouse is a centralized database that helps you get a clear picture of the entire organization. It offers a foundation for decision support across an organization and assists with the visualization and analysis of the data collected.

You can classify your information according to departments or locations, allowing for quick access to relevant information in one central location. It also offers the largest single database which makes it a perfect choice for businesses.

Components of Enterprise Data Warehouse (EDW) 

An EDW typically includes 

  • Sources of information from transactional and operational systems (ERP, CRM, finance apps, IoT devices, mobile and online systems). 
  • A location for gathering and cleansing data. 
  • An interactive summary of key metrics like querying, reporting, and sharing in a presentation or access space. 
  • A variety of APIs or tool integrations for data (BI software, ingestion and ETL tools, etc.) 

If you are wondering why EDW is the best choice for a DSO or dental practice, there are multiple reasons. One simple reason is that it allows the extraction of data from multiple locations. Want to know another best thing?

We at BPK Tech provide DSOs and solo dental practices with an industry-best enterprise data warehouse to get business intelligence reports. Using such information, an organization can make data-driven decisions to improve efficiency and overall growth. Our data warehouse will bring in multiple PMS under one platform to automate it all. The best feature is that it works with the no-touch deployment of PMS.

Operational Data Store 

An operational data store (ODS) is another kind of data warehouse. An ODS is an added element to your enterprise-wide data warehouse (EDW) that touches almost every business aspect of your company. One of its purposes is to serve as a central database for operational reporting and business intelligence tools, acting as the backbone for consistent reports across the entire company. With that, a company can make the right decisions.

Storing employee records, maintaining them and other reports is easier in ODS as it is constantly updated in real-time.

Data Mart 

The third kind of data warehouse is Data Mart. A data mart is derived from a data warehouse (an actual subdivision of it), that is designed to be used by a specific team, such as finance or sales. It is subject-oriented, meaning it will provide the relevant data that these individuals need and also act as a ready reference for them so that they don’t have to go through an entire massive data warehouse to find what they’re looking for.

The real-time availability of all this relevant information means that the people involved in the processing and internal operations can rapidly act on the new information available instead of getting bogged down trying to find variables from across the whole data pool.

Data Warehouse Architecture 

As familiar as you may be with the term “data warehouse;” you may want to think of data warehouse architecture as a specific type of product in its own right. Architecture, as we all know, is the design of a structure. This is true for a data warehouse as well.  

No matter what structure your company chooses for its ultimate product, it is still necessary to go through the laborious process of organizing information and making sure it ends up easily digestible by your visiting customers. The architecture of a data warehouse is determined by the organization’s specific needs.  

A common data warehouse architecture includes the following things. 

Simple Architecture  

Almost all data warehouses have a central repository where metadata, summary, and raw data are stored. Data sources feed the repository, and end users access it for analysis, reporting, and mining. You can call this single-tier architecture.  

Simple Architecture, but with a staging area 

It is necessary to clean and process operational data before storing it in the warehouse. It is possible to do this programmatically, but many data warehouses also include a staging area so that data can be prepared before entering the warehouse. 

Hub and Spoke Model 

Organizations can customize their data warehouses in a hub-and-spoke model to serve different lines of business by adding data marts between their central repository and end users (acting like nodes between multiple spokes). Data is moved to the appropriate data mart when it is ready to be used. 

Sandboxes 

The purpose of a sandbox is to provide companies with a secure and private space to look for new data sets as quickly as possible. Along with that, data analysis methods cannot be governed by data warehouse protocols and restrictions. 

Well, so far, we have talked about data warehouse types and architecture. While discussing all these things, you have heard database quite a few times. This might lead you to wonder, what’s a database then? This is quite a common question many people ask when they talk about a data warehouse. We can provide you with a simple one-line answer, but that’s not useful now, is it? So let us discuss the difference in a bit more detail.  

Difference Between a Data Warehouse and Database 

In simple terms, a database is nothing but a collection of various kinds of data placed in storage. It could be cloud storage or a physical one. This may sound a little like a data warehouse but one key difference makes the two poles apart. The difference is analytics.

Data warehouses provide insights and business solutions from various historical data, whereas a database is storage space. A data warehouse uses Online Analytical Processing (OLAP), whereas a database uses Online Transactional Processing (OLTP). When it comes to usage, the database uses ER modeling techniques for design. Again, this isn’t in a data warehouse, rather data modeling techniques are used.

Just like data warehouse types, there are a few different types of databases. The downside of using a database is that it is restricted in storage to one application only. You will not find this restriction in a data warehouse as multiple applications’ data is stored here. This is a significant reason why many organizations prefer data warehouses over simple databases.

A data warehouse is also better optimized and organized for a business requirement. They offer more flexibility and options. That’s the reason you will also find them in dental practices and DSO.

Difference between a Data Warehouse and Data Lake 

A Data Lake is a storage archive that can store large amounts of data from multiple sources. Unlike a data warehouse that holds only structured data, a data lake can also store unstructured data. It is also a centralized storage archive but is flexible in terms of the kind of data it can store. Other data types include semi-structured and structured data. In simple terms, Data Lake is a more advanced version of a data warehouse.

The key advantage of using this type of technology is that you can easily connect multiple systems. This makes it much simpler to create complex reports and analyze them – preserving your valuable time. Now, let’s talk about some key differences between a data warehouse and a data lake.

Data Warehouse Data Lake
Long established, conventional approach. Modern approach
Contains only cleansed data through ETL Contains both raw data and cleansed data.
Storage is shared and expansion costs are high.  Here, storage is not shared and expansion is cheap.
Schema in Write model, since it’s structured. Schema in Read model because unstructured.
Teradata, Redshift, & Vertica are some examples of Data Warehouses. Snowflake, Athena, BigQuery, & Hive are some examples of Data Lake.

In terms of data quality, a data lake contains raw data as it takes all kinds of data, whereas a data warehouse has organized data. We know that a data warehouse’s main function is to provide analytics to make the right decisions for a business, so what is the main function of a data lake? Well, it is to store huge amounts of data, but more cost-effectively. This is another major difference between a data warehouse and a data lake.

A Few Other Differences

Another question may pop into your head about how data lake users can access data and utilize it when it is unstructured or semi-structured.

There’s one simple answer. Machine Learning, Big data, exploratory analytics, operational analytics, etc. So, it’s more than one I guess, my bad. While data in a data warehouse is in tabular format, which is easily accessible by SQL queries, Data Lake uses machine learning and Big Data analytic techniques to access and use the data.

As listed in the table, another difference between a data warehouse and a data lake is the Schema. A data warehouse is in write model for a schema. Usually, this means schemas are written and designed before the implementation of a data warehouse (sometimes also at the time of analysis). Still, for a data lake, the schema is on read so it is usually written at the time of analysis.

Interestingly, many data warehouses are now being updated with all the advantages that a data lake carries. So, there is not much reason to choose between either option when both can do the same job.

Factors to Consider While Evaluating a Data Lake or a Data Warehouse

As mentioned earlier in the table, there are various data warehouses and data lakes in the market. So, here we are going to list out a few factors (in no specific order) that will help you evaluate a data warehouse and a data lake.

  1. Data Loading
  2. Querying
  3. Query Performance Scaling
  4. Pricing
  5. Deployment

Data Loading

The data loading in a data warehouse or a data lake is one of the major factors to determine how good a data warehouse/data lake is. Data loading tells us whether it is easy or difficult to load data into a data warehouse and a data lake. Similarly, it also tells us about the querying difficulty. 

Formats like CSV, Parquet, ORC, AVRO, and JSON are the supported & popular formats for a data lake/warehouse. Most of the time data ingestion is done from the cloud storage where raw data is kept. The data loading part is different for each data warehouse/lake. Some use APIs to load the data (like Redshift), while others use an external table (like Snowflake & Hive). Then there’s BigQuery which loads data directly from the Google Cloud storage.

Querying

When you are done loading the data, it needs to be used for querying, and only then the raw data is useful for analytics. Various data warehouses/lakes use different techniques to run queries on the data. Various data warehouses/lakes use different query engines for this. Redshift uses ParAccel, Hive uses OSS Apache Hive, and Athena uses Presto. 

Query Performance Scaling

A data warehouse/lake should be able to scale independently, which is one of the other evaluating factors. Snowflake, Athena, and Hive are some of the data warehouses that can scale independently whereas BigQuery and Redshit don’t do that. The better the independent scalability, the easier it is to manage them.

Pricing

The pricing factor is different for every data warehouse or data lake. For example, Snowflake pricing starts at $2 per hour and has a 1 minute based billing. Redshift has an hourly basis pricing that depends on the provisioned server’s capacity. Similarly, each one has a different pricing model.

Deployment

This is probably the most important evaluating factor for a data lake/warehouse. You’ll see 3 kinds of deployment modes, Server-less, Cloud/On-premise, and SaaS. Many businesses choose between these 3 deployment models based on their requirements and business model. 

If you use a SaaS-based deployment model, you can get space through a subscription-based model and there’s no need to provide any space manually on a server. When it comes to the Server-less model, it is scalable on its own without any management. Cloud and On-premise though, require management and provision of space.

Wrapping Up 

In this article, we have covered a lot of things. From types of data warehouses to data warehouse architecture, and its difference with database and data lakes. We have discussed all these things, some briefly and others in a bit more detail. You might be looking for some answers to questions like which one is better for a dental practice. A data warehouse, database, or data lake. Well, the answer depends on your business requirements, but most of the time it’s a data warehouse. 

Again, an enterprise data warehouse is the best option for a DSO or any solo dental practice. Since you have come this far, you might as well check out BPK Tech’s data warehouse services and get one for your business. Better yet, hop on a call using this consultation link so we can talk business.  

Submit your response

Your email address will not be published. Required fields are marked *