Dataflow vs. Datamart, or maybe even Dataflow vs. Datamart vs. Dataset – which comparison of Power BI components makes the most sense? Which one should go through your mind when designing your Business Intelligence solution?
While the term Dataset must be familiar to all those who spend more than 5 minutes in Power BI Service, Dataflows and Datamarts have not yet settled down in the Business Intelligence world.
Datamarts are still in preview mode, giving users the possibility of trying it out and providing their feedback. Nevertheless, Dataflows were released a few years back and, looking at their qualities, they should get definitely more attention that they do.
Once you read this article, you will know:
- what is a Power BI Dataflow and a Datamart,
- what are benefits of Power BI Dataflows and Datamarts,
- what is the difference between Power BI Dataflow and Dataset,
- when to use Power BI Dataflows and Datamarts,
- will Power BI Datamarts replace Dataflows, Datasets, Power BI Desktops, or data platforms,
- what is the difference between Dataset, Dataflow and Datamart,
- which should you use – Dataset vs. Dataflow vs. Datamart?
Power BI Dataflows
What is a Power BI Dataflow?
Power BI Dataflow is a data transformation service, running in the cloud independently from a report or Dataset. It leverages the Power Query engine, using Power Query online and UI (User Interface) to perform data modifications.
In Power BI Service, you can create Dataflows in a workspace. Once you save and publish it, data transformations run in the cloud.
What are Power BI Dataflows benefits?
The true power of Dataflows lies in their independence from Datasets. As a result, data transformation component in Dataflows won’t affect loading times of Power BI reports. Nonetheless, there are also other benefits of Dataflows that make them powerful capabilities.
Power BI Dataflows can easily integrate with various data sources such as Dynamics 365, Salesforce, and other external sources. They allow you to connect, clean, and transform data into a unified, consistent view that can be used across multiple reports and dashboards.
Data transformation and enrichment
Data transformations and enrichment capabilities of Power BI Dataflows are the same as those made in Power BI Desktops. Similarly, you can easily:
raw data into meaningful insights for target audience.
You can also enrich your data by including additional information from external sources to add more context to your analysis.
As already mentioned, Dataflows in Power BI allow you to create reusable data entities that can be easily shared across different reports, Datasets, dashboards, or teams. As a result, it can help you to limit the redundancy of data, promote consistency, and reduce data silos.
Power BI Dataflows can also automate repetitive tasks, like data cleansing. Since you can do this once and then use it in different reports, they save you a significant amount of time and effort.
What is more, Dataflows provide built-in security features that ensure data privacy and compliance with organizational policies and regulations.
Power BI Dataflows also turn out to be highly scalable, being able to handle large volumes of data with ease. As a result, data is processed quickly and doesn’t become a bottleneck.
Finally, Power BI Dataflows provides version control. It allows you to easily track changes to a data model and revert to a previous version if needed.
What is the difference – Power BI Dataflow vs. Dataset?
Looking at the definition and benefits, you might think that Dataflows are like Datasets.
In fact, there are a few similarities between data transformations in Dataflows and on Datasets in Power BI desktop, among others:
- Power Query engine,
- M language for scripts,
- running data transformations online when published.
Nevertheless, the major difference between them is that Power BI Datasets load data into Datasets directly, whereas Dataflows to Azure Data Lake, Dataverse or other storage is available.
Dataflows are created in the cloud – in Power BI Service in a workspace, and run independently from any Power BI Dataset. Therefore, one Dataflow can be loaded and used in many different Datasets and tables. As a result, it provides a re-usable ETL (Extract-Transform-Load) component, whereas a Dataset provide a modeling layer.
As a result, Dataflows can be a replacement for your Power Query layer to improve the performance of your report, whereas Dataset can be a replacement for your modeling, relationships, and DAX expressions.
The best practice then would be to create Dataflows that access the data source directly, and Datasets that access the data from Dataflows. It will help you in improving the performance of your report by decreasing the loading and refreshing times.
Let’s summarize the key differences between Datasets and Dataflows in Power BI before we move on to Dataflows use cases in companies.
|Data is loaded directly into a Dataset||Data is loaded into Azure Data Lake, Dataverse or other storage available in the future|
|Modeling layer||ETL layer|
|Replace modeling, relationship, and DAX layer||Replace Power Query layer|
|Access the data from Dataflows when possible (best practice)||Directly access data sources|
|Impact the loading and refreshing times by running transformations when report is refreshed||Improve the performance of a report by running transformations independently|
When to use Power BI Dataflows?
Once we know the power of Power BI Dataflows, let’s move from theory to practice and check when you should implement Dataflows in your Business Intelligence pipeline.
Use case 1: One table in multiple reports
As already mentioned, a perfect use case for Power BI Dataflows are data tables that are used in multiple reports. Instead of getting data from the source and performing data processing in the Desktop each time, you can use one Dataflow.
If you need to use the same table in another report, with a connection with other tables not in the first file, you would have to reproduce the Power Query transformations by copying and pasting the M Script into a new report.
A good candidate can be, for example, a Date dimension. It needs to be used in sales, inventory, operations and finance reports. Nonetheless, instead of copying the script for Date dimension into four, you can easily create one Dataflow and then load it to all of them.
As a result, you do the transformation once and you can be sure that there is consistency in Dates across all reports. Therefore, if you can easily detect re-usable table or queries across multiple files, they are probably good candidates for Power BI Dataflow.
Use case 2: Different schedules or refresh times
Another great use case for Power BI Dataflows can be a report with data tables that have completely different schedule times and options.
For example, in your model there can be tables, like:
- a sales table, that need to be refreshed each day,
- mapping tables, like a products table, that needs to be updated only once per month.
If you store both in a Power BI Dataset, you are forced to refresh your report each day as otherwise it will display outdated information. Unfortunately, such refresh also entails, almost always, unchanged mapping table, which can increase the loading time.
Luckily, if you implement Dataflows, you can set up a mechanism to refresh a mapping table once a month, applying all the transformation needed, and then load it into a report.
As a result, your report can be updated daily with the latest sales data and, once a month, updated with information about products coming from a Dataflow.
Use case 3: Centralized data warehouse
Due to the rise of usage of Power BI and other self-service technologies, numerous organizations have started to implement BI systems without a centralized data warehouse.
A centralized warehouse is a specifically designed database to store data in the predefined format. In cases of building traditional BI systems, setting it up should always be an integral element of the process:
- extract-transform-load (ETL) process will extract data from sources and load it into the warehouse,
- all reporting solutions will use the warehouse as the single source of truth.
Dataflows can serve as an important element of building a centralized data warehouse for a Power BI solution.
To implement this, you can start by creating the structure using Power Query and storing data in output tables. These tables will serve as a centralized data warehouse for your pbix files. You can also set up an Azure Data Lake Storage with certain tables structure and make your Dataflows load data into these tables.
Power BI Datamarts
Datamarts are a major new capability coming to Power BI that can be used in a public preview.
What is a Power BI Datamart?
In brief, a Power BI Datamart can be seen as an upgraded extension a Dataflow. It is built upon qualities of a Dataflow, but with extra common scenarios added that would not entirely fit into the simplicity upon which Dataflows were created.
Truth be told, Power BI Datamart is rather a container of components, instead of a single object like a Dataflow. Once you create one, you are also creating:
- Azure SQL Database,
Therefore, Datamarts also allow users to perform data transformations using a Power Query online and create reusable shareable data tables in Power BI workspace.
Nonetheless, Datamarts come with extra elements – one of the most crucial is data storage in Azure SQL Database.
As a result, they assist in bridging the gap between business users and IT. They allow users to store and explore data loaded in a fully managed database. They provide a simple and optionally no-code way to:
- ingest data from various sources,
- extract, transform and load (ETL) the data using Power Query online,
- load the data into an Azure SQL database, requiring no tuning or optimization.
Once you load data into a Datamart, you can also define relationships or policies for Business Intelligence and analysis.
Datamarts automatically create a Power BI Dataset or semantic model, that can be used to generate Power BI reports and dashboards. You are also able to query a Datamart using a T-SQL endpoint or a visual experience.
Source: Understand datamarts (preview) – Power BI | Microsoft Learn
What are Datamarts benefits?
As already mentioned, Datamarts come with all the qualities of Dataflows.
One of their downsides, and differences, as compared to Dataflows is the storage limit, which is 100 GB in preview for Datamarts, and limitless for Dataflows.
Apart from that, they can equip businesses and users with even more capabilities to enhance data analytics and the decision-making process.
Trustworthiness of data
Once created, Datamarts provide a single source of truth when it comes to specific subjects or departments. As a result, teams have a collective view of the data and are able to concentrate on getting insights, making decisions and acting, instead of sharing spreadsheets, wondering which data is most accurate.
Single place to manage
Datamart user interface (UI) is web-based, so you don’t have to:
- install Power BI Desktop, SSMS (SQL Server Management Studio), or Visual Studio,
- use any specific operating systems.
Once you get data from data sources, you can build an entire Power BI solution using the same UI in Power BI Service.
Row-level security support
Unlike Dataflows, Power BI Datamarts support row-level security (RLS). When you create a Datamart, you can define row-level security rules restricting what users can see when they are connected to the Datamart. Consequently, you can define filters within roles and ensure better security of data.
Self-service analytics solutions
Another great advantage of Power BI Datamarts is the self-service usage. Once you have the right permissions, you can easily query the Datamart’s underlying SQL database using SQL queries and perform ad-hoc analysis.
Therefore, all authorized users can easily perform relational database analytics or exploratory data analysis in SSMS, Excel, or Azure Data Studio without the need for a database administrator.
Data transformations with no code
What is more, Datamarts provide better capabilities for:
- data ingestion,
- data preparation (like merges, joins) to create star schema data model,
- data exploration
with SQL, with optionally no-code experience.
Completeness of the BI Ecosystem in holistic cloud experience
Power BI Datamarts bring together:
- data transformation component using Dataflows (and Power Query online),
- data warehousing using Azure SQL Database,
- data modeling using Power BI Dataset.
Therefore, they provide a complete four-layer implementation in Power BI. They allow to build semantic models in a holistic cloud experience, where basic modeling features like DAX measures, row-level security or relationships can be created in the web editor.
When to use Power BI Datamarts?
Use case 1: Larger scale than Datasets
When you build a solution, for which you need a larger scale than you can achieve solely with Power BI Datasets, you can think about building one or multiple Datamarts to generate smaller reusable artifacts that you can use across multiple entities.
Although it may seem a lot like Dataflows, with Datamarts you can:
- enhance your solution with additional features, for example RLS,
- get an auto-generated Dataset that is directly connected to your Datamart, which enables you to create new reports on top of it.
What is more, other users can use your Datamart as a starting point and enhance it with their own data. There is a built-in connector to Power BI Datamarts in Power BI Desktop which allows you to connect to the auto-generated Dataset. As a result, users can benefit from pre-created relationships and measures that you have already defined.
Use case 2: End-to-end semantic models
Another great use case for Power BI Datamarts is building end-to-end semantic models. They allow developers to create end-to-end solutions independently from other tools or IT teams.
Using Datamarts allows you to eradicate the need of managing orchestration between Dataflows and Datasets through auto-generated Datasets. What is more, in the same time, delivering visual experiences for querying data and ad-hoc backed by Azure SQL Database.
Use case 3: Centralized data warehouse
Although Power BI Dataflows can serve as an integral element of building a centralized data warehouse, Power BI Datamarts, loading the data directly to an Azure SQL Database, can be a much better solution for this purpose.
With Datamarts, users can build a data warehouse with the data transformation layer in a way that is meaningful for future projects or colleagues easily using Power BI. The solution will be governed by the Power BI service, so the implementation will be faster, and users will not depend on other teams, like a BI Team, that can implement a process for certifying Datamarts and decrease their backlog.
As a result, Power BI Datamarts empower users and their organization to implement faster Power BI solutions in a fully-governed structure.
Will Power BI Datamart replace Dataflow, Dataset, Power BI Desktop, or data platform?
Although Datamarts are built upon the Dataflows, they are not a replacement for them. In fact, they fulfill different needs. You will still tap into Dataflows to load centralized dimensions to reuse them in multiple data models across your Power BI ecosystem and easily share with others.
Similarly, a Datamart is also not a replacement for a Power BI Dataset, which comes with aggregations, more complex DAX expressions, and object-level security.
Moreover, Datamarts won’t also replace Power BI Desktops. Although Datamarts are powerful, they don’t have features like user-defined aggregations, more complex DAX expressions, or security setups that are integral elements of Desktops.
Last but not least, a Power BI Datamart won’t replace your existing data platform or warehouse. Datamarts are intended for self-service needs and have lower scalability for large volumes.
Moreover, they are delivered in SaaS offering, whereas data platforms come as PaaS offering. As a result, in data platforms engineers and architects are able to define pipelines for data ingestion, configure security requirements, which is not possible in Datamarts.
What is the difference – Dataset vs. Dataflow vs. Datamart?
Let’s summarize the key differences between Datasets, Dataflows and Datamarts.
|Data is loaded directly into a Dataset||Data is loaded into Azure Data Lake, Dataverse or other storage available||Data is loaded into an Azure SQL Database|
|Modeling layer||ETL layer||Business Transformation Layer|
|Replace modeling, relationship, and DAX layer||Replace Power Query layer||Dataset and Dataflow are elements of a Datamart|
|Access the data from Dataflows when possible (best practice)||Directly access data sources||Use Dataflows to access data sources|
|Impact the loading and refreshing times by running transformations when report is refreshed||Improve the performance of Power BI report by running transformations independently from the report||Enable all business users to self-service and get faster insights into data, removing dependency on other departments, for example IT|
Which should I use Dataset vs. Dataflow vs. Datamart?
Which one to use – Dataset, Dataflow or Datamart is not an easy question. The answer depends on numerous conditions and on the outcome you need to achieve.
If the implementation stage includes:
- data ingestion from the source,
- data transformations,
- data load,
- creating DAX expressions,
Power BI Datamarts enable you to build such a solution in one combined structure.
On the other hand, there are situations where data transformations are more complex. As a result, you should isolate them from the data source to ensure that minimal changes are required is the source changes. In such cases, you should think about including Dataflows in the transformation layer.
As for Datasets, they themselves are already important components. They are able to handle large volumes of data, add user-defined aggregations or more complex DAX expressions. For example, if you implement an architecture that is done through a service like Azure Data Factory or Azure Synapse, data models can be built using Power BI Dataset with calculations on top of it.
All in all, when designing your Power BI Solution architecture, it is worthwhile to be familiar with capabilities of all Power BI components available. If implemented accordingly, they can:
- improve the performance of your report,
- empower business users with the self-service usage and the possibility of ad-hoc data analysis,
- release you from doing repetitive tasks.
Therefore, keep up with the latest additions and implement them in your solutions to always stay at the top of your game. And your reports and dashboards as well.