Data maturity for process mining

Co-founder and CEO at 10 Senses

One of the distinctive features of process mining is that it is based on collected data. We don’t rely on ideas about a process, or solely on our expertise. With automatic process discovery at hand, we don’t have to model and draw anything manually. This work is done by algorithms that detect and build process models. Our main task is to set the algorithms’ parameters correctly and to analyze the outcomes.

Nevertheless, since algorithms perform the task of building models, the main bottleneck in process mining analyses is the availability and quality of data – in brief, its maturity.

Collecting data has always been a tedious task. For process mining, however, data matters even more than usual: low data quality or availability is a significant obstacle to building process models and analyzing them afterwards. This makes data maturity crucial in process mining analytics projects.

 

5 levels of data maturity

The IEEE Task Force on Process Mining, the group responsible for developing process mining methodology, recognized the growing importance of data maturity and decided to systematize the understanding of this subject. As a result, it prepared and published materials on the topic, which can be found in the Process Mining Manifesto.

The data maturity typology is based on five levels and provides detailed information on what formal requirements must be met for each level. In practice, there are several features that determine data maturity and the goal for data is to meet them as much as possible:

  • Automatic data collection: a common difficulty in implementing process mining is that many organizations do not collect information about the activities carried out in the process at all, or collect it only on paper (for example in notes or overview books). Ultimately, we want event data to be saved automatically by IT systems.
  • Completeness: to build a process model that reflects reality, the data should cover as many of the activities involved in the process as possible and contain as few gaps as possible.
  • Consistent definitions for process activities: in process mining we cannot allow too much freedom and ambiguity in the activity definitions. Therefore, we should always have a closed list of terms for process elements.
  • Credibility: data must reflect the reality and actual activities performed in the company for the process models and their analyses to be correct.
  • Consistent ontologies at the organization level: if all IT systems use the same concepts to describe reality, we can easily integrate data from different sources and build more complete process models.
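Some of the features listed above can be turned into simple numbers. The snippet below is only a minimal sketch, assuming a tiny in-memory event log for an invoice process: the `Event` structure, the `ALLOWED_ACTIVITIES` list, and both function names are hypothetical and serve purely as an illustration. It measures completeness as the share of events with no missing fields, and consistency of definitions as the share of events whose activity name comes from the closed list.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical closed list of activity names for an invoice process.
ALLOWED_ACTIVITIES = {"Receive invoice", "Approve invoice", "Pay invoice"}


@dataclass
class Event:
    case_id: Optional[str]
    activity: Optional[str]
    timestamp: Optional[str]  # ISO 8601 string, kept as text for simplicity


def completeness(events):
    """Share of events that have a case id, an activity and a timestamp."""
    if not events:
        return 0.0
    complete = sum(1 for e in events if e.case_id and e.activity and e.timestamp)
    return complete / len(events)


def vocabulary_conformance(events, allowed):
    """Share of events whose activity name is on the closed list."""
    if not events:
        return 0.0
    conforming = sum(1 for e in events if e.activity in allowed)
    return conforming / len(events)


events = [
    Event("c1", "Receive invoice", "2023-01-02T09:00:00"),
    Event("c1", "Approve invoice", "2023-01-03T10:30:00"),
    Event("c2", "invoice recvd", "2023-01-04T08:15:00"),  # free-text activity name
    Event("c2", "Pay invoice", None),                     # missing timestamp
]

print(completeness(events))                                # 0.75
print(vocabulary_conformance(events, ALLOWED_ACTIVITIES))  # 0.75
```

Measures like these do not replace the formal maturity levels, but tracking them over time can show whether the data is moving in the right direction.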


It is commonly believed that process mining can be performed starting from data at maturity level 3. Such a suggestion is included, for example, in the Process Mining Manifesto. In general, this is true, but you can also attempt process mining analysis with data at a lower level of maturity: data from Levels 1 and 2 can be processed so that it can be used effectively in the analysis.

What is more, you can also find examples of process mining carried out on completely unstructured data, like video recordings. However, you should keep in mind that the use of low-maturity data requires careful processing, is time-consuming, and is associated with a higher risk of errors.

Data maturity is critical to process mining

As already mentioned, the quality and availability of data are essential to process analysis. Unfortunately, there is a tendency to ignore this aspect.

In many cases, end users of process mining analyses focus primarily on specific results, such as process visualizations, result summaries, or filtering capabilities, and treat other concepts, like data maturity, as abstract issues of little practical importance. That is why it is so important to educate the end users of process mining analyses from the very beginning. In the end, while process mining analysis is the goal, it must be based on data that is available and of good quality. This should be obvious to all parties involved in a project.

It is also worth mentioning that reaching data maturity is not a one-off event. As a company adopts process mining tools more widely, analytical work on creating process models can be expected to intertwine with initiatives aimed at increasing data maturity.

What could such initiatives to increase data maturity look like? The first step is simply to collect the information you need. Although this may seem trivial, in practice it is not always obvious. Currently, a lot of information in the form of logs is deleted from IT systems on a regular basis. Consequently, it is worth considering whether we are deleting valuable data that could help us in the analysis and optimization of processes.

Moreover, introducing consistent categories may also be helpful. Initially, this can involve only very simple changes, such as modifying data entry forms (for example, so that an employee has to select a value from a list and cannot type it in manually). In the end, however, it should lead to the harmonization of the ontologies of terms used by all IT systems in the company.
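The idea of a closed list can be illustrated with a short sketch. The `Activity` enumeration and its values below are hypothetical, as are the helper names; the point is only that a form offers a fixed set of choices and that free-text input is rejected instead of being stored.

```python
from enum import Enum


class Activity(Enum):
    """Hypothetical closed list of activity names offered in a data entry form."""
    RECEIVE_INVOICE = "Receive invoice"
    APPROVE_INVOICE = "Approve invoice"
    PAY_INVOICE = "Pay invoice"


def form_choices():
    """Values a form would present in its drop-down list."""
    return [a.value for a in Activity]


def parse_activity(raw: str) -> Activity:
    """Return the matching Activity, or raise ValueError for free-text input."""
    return Activity(raw.strip())


print(form_choices())
print(parse_activity("Approve invoice"))

try:
    parse_activity("invoice approved by boss")
except ValueError as error:
    print("rejected:", error)
```

Rejecting unknown values at entry time is stricter than cleaning them up later, but it keeps ambiguous activity names out of the event log in the first place.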

Often, we don’t have access to the software and cannot modify it to generate the data the way we need it. In such cases, building an ETL system would be the solution. Such a system would integrate data from different systems into one common database, or into many databases that all use the same set of names for events and their parameters. It would also be important to use data quality measures (we wrote more about quality measures, among others, here: How does automatic process discovery in process mining work?).
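The transformation step of such an ETL system could look roughly like the sketch below. It is an illustration under stated assumptions, not a reference design: the two source systems ("erp" and "billing"), their field names, and the mapping tables are all hypothetical. Each source gets its own small adapter that renames fields, translates local event codes into the common vocabulary, and parses timestamps into one format.

```python
from datetime import datetime

# Hypothetical mappings from source-specific names to the common vocabulary.
ERP_ACTIVITY_MAP = {"INV_RCV": "Receive invoice", "INV_APR": "Approve invoice"}
BILLING_ACTIVITY_MAP = {"payment sent": "Pay invoice"}


def from_erp(row):
    """Convert one row of the hypothetical ERP export to the common schema."""
    return {
        "case_id": row["doc_no"],
        # An unmapped code raises KeyError, so unknown events are not silently kept.
        "activity": ERP_ACTIVITY_MAP[row["event_code"]],
        "timestamp": datetime.strptime(row["ts"], "%d.%m.%Y %H:%M"),
        "source": "erp",
    }


def from_billing(row):
    """Convert one row of the hypothetical billing export to the common schema."""
    return {
        "case_id": row["invoice_id"],
        "activity": BILLING_ACTIVITY_MAP[row["action"].lower()],
        "timestamp": datetime.fromisoformat(row["time"]),
        "source": "billing",
    }


def build_event_log(erp_rows, billing_rows):
    """Merge both sources into one event log ordered by case and time."""
    events = [from_erp(r) for r in erp_rows] + [from_billing(r) for r in billing_rows]
    return sorted(events, key=lambda e: (e["case_id"], e["timestamp"]))


erp_rows = [
    {"doc_no": "INV-1", "event_code": "INV_APR", "ts": "03.01.2023 10:30"},
    {"doc_no": "INV-1", "event_code": "INV_RCV", "ts": "02.01.2023 09:00"},
]
billing_rows = [
    {"invoice_id": "INV-1", "action": "Payment sent", "time": "2023-01-05T14:00:00"},
]

log = build_event_log(erp_rows, billing_rows)
for event in log:
    print(event["case_id"], event["timestamp"].isoformat(), event["activity"])
```

Keeping the mapping tables separate from the adapter code means that a new local event code can be added without touching the transformation logic.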

 

Data governance initiatives

At this point, it can be seen that data maturity for process mining has a lot to do with data governance. Actually, it is a slice of data governance but focused on a specific type of data, and its specific use.

Keeping in mind the importance of data governance, we, as 10 Senses, have been involved in the activities of the Polish branch of the Data Management Association (DAMA). One of the key goals of DAMA is to increase awareness and raise standards for working with data. This has a direct impact on the area of data maturity which is so important for process mining projects.

If you want to improve the maturity of data in your company for the purposes of analysis and process optimization, please contact us.

This article may well raise questions about process mining tools and analysis. If you would like to know more, please contact us or read our other articles.
