Azure Synapse vs Azure Data Factory vs Power BI dataflows – what are the similarities and differences? 

BI & Data Science Analyst at 10 Senses

Data Engineers may use multiple tools for data transformation activities.  

In this article we will focus on Microsoft stack dataflows. We will then discover similarities and differences between dataflows in Azure Synapse, Azure Data Factory, and Power BI.  

Table of contents: 

  1. What is Azure Data Factory,  
  2. What is Azure Synapse,  
  3. What are Power BI dataflows, 
  4. What are the similarities between Azure Synapse, Azure Data Factory and Power BI dataflows, 
  5. What are the differences between Azure Synapse, Azure Data Factory and Power BI dataflows? 

What is Azure Data Factory? 

Azure Data Factory (ADF), introduced in 2015, is a Microsoft cloud ETL service allowing Data Engineers to integrate various, even disparate, data sources. Using Data Factory, you create scheduled pipelines in a code-free manner to manage data you have on-premises or in the cloud in data stores of various kinds (for example, Azure SQL database, Azure Blob storage, Azure Date Lake Storage).

In ADF, dataflows enable you to pull data into the ADF runtime, manipulate it, and then write transformed data back to a destination. They are similar to the concept of dataflows in SQL Server Integration Services (SSIS), but more scalable and flexible. 

A few of ADF’s key features are: 

  1. Real-time integration,  
  2. Possibility to chunk data,  
  3. Parallel processing.  

Mapping dataflows in ADF 

Creating mapping dataflows in Azure Data Factory

Source: https://docs.microsoft.com/en-us/azure/synapse-analytics/concepts-data-flow-overview 

When it comes to dataflows, with Azure Data Factory, you can create mapping dataflows.  

They are visually designed data transformations executed as activities within the Azure Data Factory pipeline. These data pipelines use Apache Spark clusters and can be operationalized using already existing ADF scheduling, flow, control and monitoring capabilities. 

A great advantage of creating dataflows in ADF is the fact that it is a code-free visual transformation layer using Azure Databricks clusters in the back. 

Creating flows in ADF is all about dragging-and-dropping. It makes building a data pipeline easy and quick. Also, you can clean data without worrying about managing server clusters or writing code.  

Power Query in ADF 

It is also worth mentioning that ADF allows you to build interactive Power Query mashups natively in the tool and then execute those at scale inside an ADF pipeline. Then, ADF can automatically scale it out and operationalize data wrangling, which is a part of data transformation, using ADF’s dataflow Spark environment.  

The User Interface (UI) offered in ADF Power Query is identical to Power BI UI, so it can be easily used by Power BI users. 

What is Azure Synapse Analytics? 

Azure Synapse Analytics is an analytics service combining data analysis, visualization, integration, and orchestration, along with predictive analytics user experiences in one single platform. It provides Data Engineers with the opportunity to query data, using serverless on-demand, or provisioned resources – at scale. 

Azure Synapse Analytics Architecture

Source: https://docs.microsoft.com/en-us/azure/synapse-analytics/overview-what-is 

Mapping dataflows in Azure Synapse Analytics

Azure Synapse was introduced in 2019 and has inherited most of its data integration and orchestration capabilities from ADF. Consequently, you can use it to create mapping dataflows without writing any code and using a drag-and-drop feature, similarly to ADF. 

Nevertheless, Azure Synapse has been equipped with additional data integration key components, such as: 

  1. Spark notebooks allowing you to use notebooks with Python, Scala, and .NET code as an element of the data integration pipelines. This element resembles the Databricks Notebook included in ADF. 
  2. Spark job definitions working like Spark notebooks, except for the fact they require script files located in the storage account. It is similar to Databricks Python activity in ADF. 

Microsoft has added these features to replace the need for using Databricks for Data Engineering tasks. Nevertheless, the Databricks parts coming from ADF are still available in Synapse as well.  

Moreover, Synapse offers complete T-SQL-based analytics and hybrid data integration. 

What are Power BI dataflows? 

Finally, Power BI capabilities can also be utilized for data transformation and data driven workflows creation.  

Power BI dataflows are Power Query processes that run in the cloud with the same set of data preparation functionalities, data source connectors, transformations, and gateways. The dataflows are created and managed in the online Power BI service existing next to Power BI datasets, dashboards, and reporting in a Power BI workspace. 

Dataflows are created using Power Query Online. Once you create them, the “M” scripts are available for review or for changes, but you do not need to write any line of code by yourself. It makes creating dataflows in Power Bi a code-free solution, just like Azure Synapse and ADF. 

With Power BI dataflows, you can develop ETL processes which can be used to connect with business data from multiple data sources. Data imported by Power BI dataflows is stored in Azure Data Lake (Gen2), which is known for having massive scalability.  

Common Data Models (CDM) mapping

You can also map data to Microsoft’s Common Data Model (CDM) or define schemas that align with your source data. Datasets can be generated and refreshed on a schedule from dataflows, then used to visualize data.  

Range of services contributing to and leveraging data from CDM folders in a data lake (Azure Synapse is an evolution of Azure SQL DW)

Range of services contributing to and leveraging data from CDM folders in a data lake (Azure Synapse is an evolution of Azure SQL DW)

Source: https://powerbi.microsoft.com/pl-pl/blog/power-bi-dataflows-and-azure-data-lake-storage-gen2-integration-preview/

Consequently, Power BI dataflows provide the possibility of building semantic models on top of their data. Dataflows are already stored in CDM folders, so the integration between Power BI and Azure Data Lake allows all authorized users or services to easily leverage dataflow data using CDM folders as a shared standard.  

What is more, authorized people are also able to create and store CDM folders in their organizations’ Azure Storage account and then simply add them to Power BI as a dataflow.  

What are the similarities between Azure Synapse, Azure Data Factory and Power BI dataflows? 

Data transformation enablement 

All in all, mapping dataflows in Azure Synapse or Azure Data Factory and Power BI dataflows are various paths that Microsoft has created for data integration, preparation, and transformation.  

As a result, all of them can be used for creating effective data movement activities, defining data transformation by Data Engineers, and other users.   

Dataflows are serverless, fast and cost-efficient service supporting data processing for data analysis, machine learning, or data warehousing.  

They define data transformation activities that are part of the ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes, which are about actions, such as: 

  • extract data from flat files, or relational tables,  
  • transform data, 
  • load the data.  

Consequently, ETL and ELT processes along with dataflows are an integral part of a Data Engineer’s job whose primary role is to process data and make data ready for consumption and analysis.  

No coding required 

All these tools provide Data Engineers and business users with the possibility to exploit data integration capabilities in a code-free manner, as they provide a graphical user interface that doesn’t require writing a single line of code.  

No need for ETL server maintenance 

What is more, all of them remove the need to maintain ETL servers, or integration applications as they allow authorized users to focus on the data-driven tasks at hand. They also allow users to schedule, monitor and manage dataflow jobs. 

Simplicity of dataflow creation 

Finally, the dataflows in all these tools are simple to build. When it comes to Azure Synapse and Azure Data Factory, mapping dataflows are created in the same manner using a built-in drag-and-drop function. Consequently, creating dataflows with these tools is easy and user-friendly.  

Nevertheless, creating Power BI dataflows is also a simple process. The visual web-based Power Query editor is embedded in the Power BI Service with step-by-step manuals on how to create dataflows. 

What are the differences between Azure Synapse, Azure Data Factory and Power BI dataflows? 

Different audience 

First, Azure Synapse and Azure Data Factory are targeted towards Data Engineers, or other corporate IT resources focused on data integration within an organization. In opposition, Power BI dataflows are directed at business users, unlocking for them data preparation capabilities for citizen data integrators. 

Flexibility of features 

The different audiences when it comes to Azure tools and Power BI dataflows users also result in Power BI dataflows being more limited in scope with the editor being streamlined for serving its purpose.  

Consequently, Power BI dataflows should rather be treated as an addition to an analytics environment instead of a replacement for data warehouse architecture, or ETL procedures. 

Meanwhile, ADF and Azure Synapse are compound and broad tools that interplay into numerous other services. As a result, they offer more complexity and their features are focused on deeply technical users, especially those focused on creating organizational level data consolidation and processing.  

Selection of features in ADF and Azure Synapse 

Azure Synapse data integration pipelines are based on the same concepts as ADF-linked services, datasets, activities, and triggers. Nevertheless, there are a few differences between these tools. 

 

CategoryFeatureAzure Data FactoryAzure Synapse Analytics
Integration Runtime Using SSIS and SSIS Integration Runtime ✓ ✓  
Public preview 
  Support for Cross-region Integration Runtime (Data Flows) ✓ ✗ 
  Integration Runtime Sharing 
 Can be shared across different data factories 
✗ 
Pipelines Activities SSIS Package Activity ✓ ✓ 
Public preview 
  Support for Power Query Activity ✓ ✗ 
Template Gallery and Knowledge center Solution Templates ✓ 
Azure Data Factory Template Gallery 
✓ 
Synapse Workspace Knowledge center 
GIT Repository Integration GIT Integration ✓ ✓ 
Monitoring Monitoring of Spark Jobs for Data Flow ✗ ✓ 
Leverage the Synapse Spark pools 

Source: https://docs.microsoft.com/en-us/azure/synapse-analytics/data-integration/concepts-data-factory-differences

As you can see in the above table, the main differences between Azure Synapse and ADF lie in integration runtime, Power Query activities and monitoring. 

Azure Synapse doesn’t support cross-region integration runtime, integration runtime sharing across data factories, and Power Query activities, while ADF doesn’t allow monitoring of Spark jobs for dataflows. 

Nevertheless, Azure Synapse with built-in Spark notebooks guarantee the flexibility to use Python, Scala, and .NET code. Consequently, you can easily perform more advanced data transformations and data movement. 

It is also worth mentioning that using SSIS, SSIS Integration runtime and SSIS package execution activity in Azure Synapse are now available only in public preview mode.  

Summary – Azure Synapse vs Azure Data Factory vs PowerBi dataflows

All in all, creating dataflows that define data transformation actions, and are an element of ETL, or ELT processes, is an integral part of Data engineers’ job. Microsoft stack offers tools such as Azure Data Factory, Azure Synapse, or Power BI for building effective dataflows.  

All these tools are cloud-based, simple to use without coding skills required. Nevertheless, Azure Synapse and Azure Data Factory are advanced tools specifically designed for Data Engineering needs. Therefore, creating dataflows in Power BI should not be treated as a replacement for data warehouse architecture, or ETL procedures.  

On the other hand, data integration concepts are similar within Azure Synapse and Azure Data Factory. Nevertheless, ADF disposes of features such as SSIS execution package activity, or integration runtime sharing, that are still in preview mode in Azure Synapse. In contrast, Azure Synapse has deeply integrated Apache Spark, the possibility to easily monitor Spark dataflows, and allows for hybrid data integration.  

Talk to our expert

Are you looking for expert skills for your next data project?

Or maybe you need seasoned data scientists to extract value from data?

Fill out the contact form and we will respond as soon as possible.