What is Microsoft Fabric and how will it revolutionize the data world?

BI & Data Science Analyst at 10 Senses

Microsoft’s announcement of Microsoft Fabric has reverberated around the world of data analytics, data engineering, and data science. You can read about the difference between Data Engineering and Data Science here.

The tool is already available in public preview mode. Microsoft promises that it will “reshape how everyone accesses, manages, and acts on data and insights by connecting every data source and analytics service together—on a single, AI-powered platform”*.

Let’s check:

  • what Microsoft Fabric is,
  • what Microsoft Fabric’s core components are,
  • how Microsoft Fabric can revolutionize the data world.

What is Microsoft Fabric?

Microsoft Fabric is an end-to-end analytics solution with full-service capabilities backed by a shared analytics platform, promising robust data security, governance, and compliance.

To understand Microsoft Fabric accurately, think of it not as a single data platform but as a suite of individual analytical tools and services that work together to give users a unified analytics experience.

The unified analytics experience may sound familiar. It is how Azure Synapse Analytics was first introduced a few years ago. Nevertheless, with Fabric, Microsoft takes the unification of data analytics to the next level.

While Azure Synapse Analytics integrates Data Engineering, Data Science, Data Warehousing, and Real-Time Data Analytics capabilities, Microsoft Fabric goes further and brings data integration, business intelligence, and a single shared storage layer into the same experience.

What is Microsoft Fabric - a suite of core data workloads

Microsoft Fabric is built upon Microsoft’s trio of analytics products (Data Factory, Azure Synapse Analytics, and Power BI), together with unified storage (OneLake) and data governance with Microsoft Purview. On top of that, there will be an action platform (Data Activator), which is not yet available in the preview.

Consequently, Microsoft Fabric is a suite of seven core data workloads:

  • Data Factory for data integration,
  • Synapse Data Engineering for data transformation,
  • Synapse Data Science for data modeling,
  • Synapse Data Warehouse for data warehousing,
  • Synapse Real-Time Analytics for data analysis in real time,
  • Power BI for translating data into actionable insights with business intelligence capabilities,
  • Data Activator for data monitoring.

Now that we know the basics of Fabric, let’s explore what exactly each component of Microsoft Fabric brings to the table.

What are Microsoft Fabric’s core components?

OneLake

OneLake is the storage layer of Microsoft Fabric: all compute engines in Fabric automatically store their data there in the open-source Delta Lake format. The prefix “One” is the key here, as Microsoft emphasizes that it is ONE single storage for all the Fabric workloads from the list above.

OneLake is built on top of Azure Data Lake Storage (ADLS) and covers both structured and unstructured data. It is centrally governed and open to collaboration. Data is secured and governed in one place while remaining discoverable to users who have access across your organization.

Summing up, OneLake:

  • provides a single, integrated environment for data practitioners and the business to collaborate on data projects,
  • saves time by eliminating the need to move and copy data between different systems and data teams to create Power BI reports.
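
Because OneLake is built on top of ADLS, data in it can be addressed with ADLS-style paths. The sketch below builds such a path in Python. The URI layout follows the form described in Microsoft's OneLake documentation as far as we know, so verify it against the current docs before relying on it. The `onelake_uri` helper and the workspace, lakehouse, and table names are hypothetical.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, item: str, item_type: str, path: str) -> str:
    """Build an ADLS-style (abfss) URI for a file or folder in OneLake.

    Assumed layout: abfss://<workspace>@<host>/<item>.<item_type>/<path>
    """
    return f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}/{path.lstrip('/')}"


# Hypothetical workspace, lakehouse, and table names, for illustration only.
uri = onelake_uri("SalesWorkspace", "SalesLakehouse", "Lakehouse", "Tables/orders")
```

The point of the single path scheme is that every engine can refer to the same copy of the data instead of keeping its own.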

Data Factory

Azure Data Factory can move massive volumes of data, whereas Power Query dataflows give users a comprehensive, low-code technology for transforming it.

Data Factory in Microsoft Fabric combines the best of both, providing users with scalability and transformation power in one place.

If you’re already familiar with Azure Data Factory, data pipelines in Microsoft Fabric will be immediately familiar as well. They use the same architecture of connected activities to define a process that can include multiple kinds of data processing tasks and control flow logic. You can run pipelines interactively in the Microsoft Fabric user interface, or schedule them to run automatically.

Pipelines in Microsoft Fabric encapsulate a sequence of activities that perform data movement and processing tasks. Users can create pipelines to:

  • define activities that transfer and transform data (for example, users can create Dataflows Gen2, which are built on the Power Query engine),
  • orchestrate these activities through control flow activities that manage branching, looping, and other typical processing logic.
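
The idea of activities connected by control flow can be sketched in plain Python. This is only an illustration of the concept, not the API of Microsoft Fabric or Azure Data Factory, where pipelines are defined in the user interface. Every name below (`copy_activity`, `transform_activity`, `run_pipeline`) is hypothetical.

```python
from __future__ import annotations

# Hypothetical stand-ins for pipeline activities; none of these names
# come from Microsoft Fabric or Azure Data Factory.


def copy_activity(source: list[dict]) -> list[dict]:
    """Data movement: copy rows from a source into a staging list."""
    return [dict(row) for row in source]


def transform_activity(rows: list[dict]) -> list[dict]:
    """Data transformation: add a computed total to every row."""
    return [{**row, "total": row["quantity"] * row["unit_price"]} for row in rows]


def run_pipeline(sources: dict[str, list[dict]]) -> dict[str, list[dict]]:
    """Control flow: loop over all sources and branch on whether data arrived."""
    results = {}
    for name, source in sources.items():  # looping ("for each" source)
        staged = copy_activity(source)
        if not staged:  # branching: skip the transform for empty sources
            results[name] = []
            continue
        results[name] = transform_activity(staged)
    return results


pipeline_output = run_pipeline({
    "sales_eu": [{"id": 1, "quantity": 3, "unit_price": 5}],
    "sales_us": [],
})
```

Here the loop plays the role of a "for each" control flow activity and the `if` plays the role of a condition, while the two functions stand in for data movement and transformation activities.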

Synapse Data Engineering

Synapse Data Engineering equips users with the ability to build a Lakehouse on top of OneLake, along with pipelines that load data into it. You can ingest data into the Lakehouse using Shortcuts or data integration methods.

Lakehouses are not just for storing data, however, but also for managing tables. With Synapse, you can achieve better performance and management across the data lakehouse.

Synapse Data Science

Microsoft Fabric - Data Engineering experience

As for Data Science, there is a suite of features and tools across the entire Microsoft Fabric that can be useful in data science projects. As a result, Microsoft Fabric enables:

  • using Data Wrangler for data exploration and analysis,
  • building machine learning models (ML models),
  • creating and running experiments using MLflow,
  • training machine learning models,
  • using cognitive services and large language models,
  • making predictions.

Synapse Data Warehouse

Microsoft Fabric’s data warehouse centralizes and organizes data from different departments, systems, and databases into a single, unified view for analysis and reporting purposes.

Fabric’s data warehouse is built on the Lakehouse, so its data is stored in Delta format and can be queried with SQL. It enables data analysts, engineers, and scientists to work together to create and query a data warehouse that is optimized for their specific needs.

Summing up, Fabric’s data warehouse is:

  • a fully managed, scalable, and highly available relational data warehouse that supports full transactional T-SQL capabilities,
  • a place to store and query data in the Lakehouse,
  • a tool that puts you in full control of creating tables and of loading, transforming, and querying data using either the Fabric portal or T-SQL commands.
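
The create, load, transform, and query workflow from the last point can be sketched with plain SQL. The snippet below uses Python's built-in `sqlite3` module purely as a local stand-in. SQLite is not T-SQL and not Fabric's warehouse engine, and the table and column names are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a local stand-in for a warehouse.
conn = sqlite3.connect(":memory:")

# 1. Create a table (hypothetical schema).
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")

# 2. Load data.
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("EU", 100), ("EU", 50), ("US", 70)],
)

# 3. Transform: build an aggregated table from the raw one.
conn.execute(
    "CREATE TABLE sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)

# 4. Query the result.
totals = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
conn.close()
```

In a real Fabric warehouse the same four steps would be expressed in T-SQL or through the Fabric portal, but the shape of the work is the same.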

Synapse Real-Time Analytics

Microsoft Fabric also empowers users to analyze data in real time, for example in IoT (Internet of Things) and log analytics scenarios. Synapse Real-Time Analytics is a fully managed service that is optimized for streaming time-series data.

Real-Time Analytics experience in Microsoft Fabric

With Real-Time Analytics, you’re able to get consistent performance searching all types of data (structured, unstructured, and semi-structured) at scale.

It works with event streaming technologies: events are loaded into a KQL database or a Lakehouse, can feed ML models, and the results are finally visualized in Power BI reports.
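
To make “streaming time-series data” more concrete, here is a minimal sketch of one typical operation on such data: averaging sensor readings per fixed time window. In Fabric, this kind of question would normally be asked of a KQL database. The Python below only illustrates the logic, and the event shape and the `tumbling_average` helper are assumptions made for this example.

```python
from collections import defaultdict


def tumbling_average(events, window_seconds):
    """Average `value` per device and per fixed, non-overlapping time window.

    `events` is an iterable of (timestamp_seconds, device_id, value) tuples.
    Returns {(window_start, device_id): average_value}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, device_id, value in events:
        # Align each event to the start of the window it falls into.
        window_start = timestamp - (timestamp % window_seconds)
        key = (window_start, device_id)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical IoT temperature readings: (timestamp, device, temperature).
readings = [
    (0, "sensor-a", 20.0),
    (30, "sensor-a", 22.0),
    (65, "sensor-a", 25.0),
    (10, "sensor-b", 18.0),
]
averages = tumbling_average(readings, window_seconds=60)
```

With 60-second windows, the first two readings of `sensor-a` fall into the same window and are averaged together, while the third starts a new window.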

Business Intelligence capabilities with Power BI

Microsoft Power BI doesn’t need any introduction, as it is a leading business intelligence tool. Nevertheless, Microsoft Fabric comes with something extra for Power BI users: a new connection type, DirectLake, which results from coupling Power BI with OneLake.

DirectLake loads Parquet-formatted files directly from the data lake, without querying a Lakehouse endpoint or importing a duplicate of the data into a Power BI dataset.

Today, when you choose the DirectQuery option, you must be aware of its inherent drawbacks and limitations. With DirectLake, the Power BI Analysis Services engine can do the same as the SQL and Spark engines: it can access the data in OneLake directly and use it from there, without data movement, while obtaining the same performance level as in Import mode.

As a result, DirectLake will be a fast-path way to load the data from the lake straight to the Power BI engine.

Data Activator

Data Activator is a new tool that will be an important part of the Microsoft Fabric suite. It will be a data-event-trigger tool that will assist in automating actions based on data.

For example, you will be able to create a trigger that fires when the value of a specific measure in a Power BI dataset goes above or below a certain threshold.
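
The threshold example above can be sketched in a few lines of Python. Since Data Activator is not yet available, this is not its API. It is only an illustration of the idea of acting on data, and the `ThresholdTrigger` class and the `daily_sales` measure are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ThresholdTrigger:
    """Hypothetical trigger: calls `action` when a value leaves [low, high]."""

    low: float
    high: float
    action: Callable[[str], None]

    def check(self, measure_name: str, value: float) -> bool:
        """Run the action and return True if the value is out of range."""
        if value < self.low:
            self.action(f"{measure_name} fell below {self.low}: {value}")
            return True
        if value > self.high:
            self.action(f"{measure_name} rose above {self.high}: {value}")
            return True
        return False


# Collect alerts in a list; a real action could send an email or start a flow.
alerts = []
trigger = ThresholdTrigger(low=10.0, high=100.0, action=alerts.append)

fired = [trigger.check("daily_sales", value) for value in (50.0, 120.0, 5.0)]
```

Only the second and third values fall outside the allowed range, so only they trigger the action.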

Data Activator capabilities

Source: Driving actions from your data with Data Activator | Microsoft Fabric Blog | Microsoft Fabric

The feature is not yet available in preview mode.

Microsoft Purview

Finally, Microsoft Purview enables users to govern, protect, and manage their data. From now on, the Purview catalog will be able to scan the entire set of Microsoft Fabric artifacts, not just Power BI.

How can Microsoft Fabric revolutionize the data world?

As you can see, Microsoft Fabric is an all-in-one analytics solution. Each of Fabric’s components is powerful by itself, but when brought together, they are a huge step toward the augmented democratization of data.

Let’s go through the key concepts that served as a guide in building the Microsoft Fabric structure and their impact on Microsoft Fabric’s power to revolutionize the data world.

Simplicity

Simplicity was the core concept behind building Microsoft Fabric.

With numerous products and services in the data stack (Microsoft alone offers over 30 products, and there are other providers of data solutions as well), delivering a successful project was a highly complicated task for a data analytics department.

Consequently, data teams had to learn not only the specifics of these tools but also different licensing plans and features.

To solve this issue and make things easier for users, Microsoft created one unified and complete analytics solution to bring all the data tools together. As a result, users don’t have to worry about technology or licensing individual elements of data pipelines. Instead, they can focus on the results for their organizations.

Software as a Service (SaaS)

Microsoft Fabric is a SaaS product. Consequently, there is no need to worry about the hardware, complex infrastructure, or administration tasks. Instead, you can use the power of instant provisioning and simple onboarding, transferring focus to the analysis of data itself.

As a result, organizations or businesses can quickly get valuable insights as data integration and optimization can occur automatically.

Having one SaaS product for the whole company also helps to remove data silos and the need for users to access multiple systems, which enhances collaboration between data practitioners.

Centralization

As already mentioned in the OneLake section, Microsoft Fabric is based upon the idea of ONE.

All organizational data artifacts are handled in ONE place by ONE tenant. As a result, there is seamless integration between various data artifacts, and compliance and security can be managed centrally rather than separately for each tool.

Low-to-no-code

Low-to-no-code functionality has already empowered many users of the Power Platform, which is a SaaS offering of its own.

While the low-code platform maintains scale and integrity for data science, data warehousing, data ingestion, data preparation, and Microsoft’s analytics tools, it also offers many visual alternatives to the code that previously blocked many users from going further.

This was an issue especially in large enterprises, where a limited number of professional developers with coding skills had to meet the requirements of numerous business analysts. The concept was to empower business users with self-service capabilities that do not require using an SQL endpoint or writing an SQL query.

In Microsoft Fabric, low-to-no-code is present in each component of the suite, enabling all users to explore data, easily create reports in Power BI, and make data-driven decisions.

Lakes everywhere

OneLake, lakehouse, Delta Lake, and DirectLake are all important capabilities in Microsoft Fabric:

  • OneLake is a unified data lake that serves as the storage in Microsoft Fabric, with data stored as Delta files,
  • Lakehouse is a data architecture platform for storing, managing, and analyzing structured and unstructured data,
  • Delta Lake is an open-source storage layer used in OneLake,
  • DirectLake is a new fast-path connection type to load the data from the lake straight into the Power BI engine.

What is more, the data lake is the core of Microsoft Fabric. It enables numerous applications to access raw data in the most efficient way possible. It significantly reduces data duplication and movement across multiple services.

By being centered around a data lake and having one storage, OneLake, Microsoft Fabric empowers various personas in an organization to work simultaneously, providing reliable insights to the business user.

Empowering various data personas

Speaking of personas, Microsoft Fabric removes data silos and the necessity to access multiple systems, facilitating collaboration between data professionals.

In Microsoft Fabric, all data specialists work together on the same SaaS product. It enables different data personas to provision and run different types of Fabric workloads and jobs without the need for pre-approval or planning. There is a single copy of the data, and all users work from it.

As a result, the tool empowers:

  • Data engineers to easily curate data models and expand knowledge with data science techniques,
  • Data analysts to avoid the mundane task of performing extensive downstream data transformations before creating a Power BI report, to see the lineage, and to connect to data with DirectLake,
  • Data scientists to easily integrate data science techniques and use interactive Power BI reports to provide data-driven insights.

Security and governance at the top level

Security and governance are also built into Microsoft Fabric.

Admins can set up security and access control so that only authorized users can access sensitive data. On top of that, Microsoft Fabric has capabilities such as data lineage and sensitivity labels (highly confidential, confidential, etc.) to protect data artifacts.

Admins can also use the admin center as a platform for all administrative tasks, such as configuring tenant settings, reviewing performance and usage metrics, or managing networking.

As you can see, Microsoft Fabric is a powerful analytics platform that has the potential to revolutionize the data world. Although it may seem that all the platform elements were already there and Microsoft used rebranding in a very clever way, the addition and modification of specific concepts and features can really change the rules of the analytics game.

It is high time to check out for yourself what Microsoft Fabric offers. You don’t need to have Power BI Premium capacity to get access, but you do need a valid Power BI license. If you don’t have one, you can sign up for a free Power BI license by following the instructions here.

Remember, though, that Microsoft Fabric is still in preview mode. The platform has only just been announced and opened up for users to try out. As a result, there will be improvements in the upcoming months, so things can take an even more interesting turn soon.

*Source: Data Analytics | Microsoft Fabric
