Introduction to Explainable AI (Explainable Artificial Intelligence or XAI)

BI & Data Science Analyst at 10 Senses

Artificial Intelligence (AI) has made remarkable strides in recent years. Models have not only reached unprecedented levels of accuracy but also deliver strong performance across a wide range of applications. No wonder ChatGPT and Microsoft Copilot are getting such extensive coverage.

As a result, more and more organizations are deploying AI and advanced analytics in their daily operations, business processes, and decision automation. Nevertheless, adopting an AI system raises numerous challenges for these companies when it comes to security, transparency, and accountability.

These days, phrases like trustworthy AI or responsible AI appear more and more frequently. Companies need to understand the decision-making processes behind AI systems to appropriately trust machine learning techniques and deep learning models.

Crucially, they need explanations that are accurate and understandable to humans, not only to AI practitioners and machine learning engineers. That is how the Explainable Artificial Intelligence (XAI) domain emerged.

Explainable AI is an area that tries to demystify the reasons behind the decisions and predictions of AI systems. It aims to deliver explanations that are as accurate as possible and expressed in plain human language, making AI explainable without sacrificing the efficiency that an AI system brings to the table.

Once you read this article, you will know:

  • What Explainable AI (XAI) is,
  • Why we need Explainable AI (XAI),
  • What the core concepts of Explainable AI (XAI) are,
  • What the main Explainable AI (XAI) techniques are.

What is Explainable AI (XAI)?

In brief, Explainable AI is an emerging domain that is focused on different methods and techniques aimed at making the results of various AI applications easily understandable by a human end user.

The idea of Explainable AI contrasts with black-box models in machine learning (ML), where even the designers themselves cannot easily explain why the AI produced a specific solution or made a certain decision.

XAI aims to increase AI explainability, within the limits of current knowledge, by:

  • delivering accurate explanations of how AI systems operate,
  • allowing organizations to effectively manage AI services in-house.

Why do we need Explainable AI (XAI)?

As already mentioned, more and more businesses plan to leverage, or already leverage, AI in their daily operations. Consequently, they need to let ML models make impactful decisions, and this cannot happen if the models do not inspire confidence.

Explainable AI comes to the rescue here, and it is essential for several reasons. Explainable Artificial Intelligence:

  • Increases machine learning model transparency and builds trust by providing insights into how AI models arrive at their decisions,
  • Supports compliance with regulatory requirements in domains such as finance, healthcare, and legal systems,
  • Helps to debug models and improve their performance by making it easier to identify and fix biases, errors, or vulnerabilities,
  • Fosters fairness and ethics by addressing potential ethical concerns in the model’s predictions, for example, discriminatory behavior in risk assessment,
  • Develops human-AI collaboration by allowing humans to understand, interact with, and correct model decisions,
  • Increases machine learning model adoption and acceptance by making models more transparent and their predictions easier to justify,
  • Educates users, stakeholders, and domain experts about AI models and their limitations, bridging the gap between technical users and non-experts.

Overall, Explainable AI is vital for responsible and accountable AI deployment. It promotes fairness, transparency, and trust in AI systems and encourages the adoption of responsible AI technologies across various sectors.

[Figure: Explainable AI (XAI) - a system with XAI vs. a system without XAI]

What are the Explainable AI (XAI) core concepts?

Explainable AI relies on various methods and machine learning techniques that bridge the gap between an AI model’s complex computations and human comprehension. To do so, it builds on a few specific ideas. The core concepts of Explainable AI are:

  • Interpretability is the capability of AI models to generate understandable explanations for their outputs. Human users should be capable of grasping the reasoning behind the model’s predictions, actions, and recommendations, which is enabled by Explainable AI.
  • Transparency involves increasing the visibility and comprehensibility of the inner workings of AI models. As a result, Explainable AI entails delivering insights into the model’s architecture, parameters, and training data.
  • Trustworthiness is an essential aspect of Explainable Artificial Intelligence, especially in the case of sensitive applications (like healthcare or banking systems). It involves building confidence among human users in the AI system’s decision-making capabilities and making sure that the results are reliable and unbiased.

What are Explainable AI (XAI) techniques?

There are numerous explainability techniques and machine learning algorithms for explaining AI systems, and they can be grouped by different criteria. If we consider the stage at which the explainability method is applied, we can organize Explainable AI algorithms into two broad categories:

  • Self-interpretable models are models that can be directly read and interpreted by a human being; the model itself is the explanation.
  • Post-hoc explanations are explanations, usually generated by separate software tools, that describe or approximate how the algorithm works, provided it can be queried for outputs on chosen inputs.

Self-interpretable models

As already mentioned, self-interpretable models are themselves the explanation: they explain the entire model globally. They can also be traced for each individual input, so simulating the model on a given input provides a local explanation for that decision.

The most popular self-interpretable models are:

  • Decision Trees,
  • Linear Regression,
  • Logistic Regression,
  • Naive Bayes Classifier,
  • K-Nearest Neighbors (KNN),
  • Rule-based Models,
  • Generalized Additive Models (GAM).

These self-interpretable models are valuable in scenarios where interpretability and transparency are critical for understanding the model’s impactful decisions and for visually investigating its behavior. Nevertheless, they are usually less accurate than complex black-box models explained post hoc; there is a trade-off between making the model more exact and making it more meaningful to humans.
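As a quick illustration, a shallow decision tree can be printed as a set of human-readable if/else rules. The minimal sketch below assumes scikit-learn and its bundled Iris dataset.

```python
# A shallow decision tree is its own explanation: the fitted model can be
# printed as a set of human-readable if/else rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

print(export_text(tree, feature_names=iris.feature_names))
```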

Post-hoc explanations

Post-hoc explanations are commonly used for deep learning models. We can group them into local and global explanations: local ones explain a single decision or a subset of decisions, while global ones build interpretable models that approximate the non-interpretable model as a whole.

Moreover, in certain cases, a global explanation can also yield local explanations by being evaluated on specific inputs, thereby explaining those individual predictions.

Local explanations

As already stated, Explainable AI local explanations deal with a subset of inputs. The most common kind of local explanation is a per-decision or single-decision explanation, which produces an explanation for the algorithm’s output or decision on a single input point.

There are a few commonly used local explanation algorithms, for example:

  • LIME (Local Interpretable Model-agnostic Explanations),
  • Shapley Values,
  • Counterfactual Explanations,
  • ICE (Individual Conditional Expectation).

LIME

LIME approximates the behavior of the underlying black-box model locally using a simpler, interpretable model to provide explanations.

It starts from a single decision, samples points in the neighborhood of the input, fits an interpretable model that represents the local decision, and, finally, uses that model to provide per-feature explanations. By default the surrogate is a simple weighted linear model (ridge regression in the reference implementation), but decision trees and other interpretable models can also be used.

When it comes to images, LIME splits each image into superpixels. It then queries the model on perturbed versions of the image in which random subsets of superpixels are masked out (replaced with a chosen color) and observes how the prediction changes; the superpixels that influence the prediction most form the explanation.
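For tabular data, a minimal sketch of a per-decision LIME explanation could look as follows (assuming the open-source lime package together with scikit-learn and its bundled Iris dataset; the model and the number of features shown are illustrative):

```python
# Per-decision LIME explanation for tabular data.
# Assumes the open-source `lime` package (pip install lime) and scikit-learn.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
clf = RandomForestClassifier(random_state=0).fit(iris.data, iris.target)

explainer = LimeTabularExplainer(
    iris.data,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    mode="classification",
)

# Sample perturbed points around one instance, query the black-box model on
# them, fit a local interpretable surrogate, and report per-feature weights.
explanation = explainer.explain_instance(iris.data[0], clf.predict_proba, num_features=4)
print(explanation.as_list())
```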

Shapley Values

In Shapley Values, a prediction can be explained by treating each feature value as a player in a game where the prediction is the payout. Shapley values are a method from coalitional game theory that tells us how to fairly distribute the “payout” among the features.

The players are the feature values that work together to receive the gain (= predict a certain value). The Shapley value is the average marginal contribution of a feature value across all possible coalitions.

SHAP is a practical method for computing (approximate) Shapley values for machine learning models. It provides feature attributions and explanations for individual predictions using sampling-based and model-specific approximations, making it scalable and applicable to complex models.
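A minimal sketch of computing SHAP values for a single prediction, assuming the open-source shap package, scikit-learn, and a tree ensemble trained on the bundled diabetes dataset, could look like this:

```python
# SHAP values for one prediction. Assumes the `shap` package (pip install shap).
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# shap.Explainer dispatches to a suitable algorithm (a tree explainer here).
explainer = shap.Explainer(model, X)
shap_values = explainer(X)

# Per-feature contributions to one individual prediction.
shap.plots.waterfall(shap_values[0])
```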

[Figure: SHAP decision plot]

Source: decision plot — SHAP latest documentation

Counterfactual Explanations

Counterfactual explanations provide a “what-if” analysis by generating a new data point that is as close as possible to a given instance but receives a different outcome. They aim to explain why an AI model made a particular prediction by showing what changes in the input data would lead to a different result, which also lets you compare model predictions.

It is worth mentioning, though, that some systems generate several counterfactual instances as a single explanation. For example, if a loan application is rejected by a credit scoring model, a counterfactual explanation might show the smallest changes to the applicant’s features (such as a higher income or lower existing debt) that would result in the application being accepted, thereby revealing which attributes were crucial for the rejection.
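To illustrate the idea only (dedicated libraries such as DiCE or Alibi implement far more principled searches), the toy sketch below perturbs one feature at a time on a synthetic dataset until the model’s decision flips:

```python
# Toy counterfactual search: nudge one feature at a time until the decision
# flips. Purely illustrative; real tools use principled optimization.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

x = X[0].copy()
original = clf.predict([x])[0]

deltas = np.linspace(-3.0, 3.0, 121)
deltas = deltas[np.argsort(np.abs(deltas))]      # try the smallest changes first
best = None
for j in range(X.shape[1]):
    for delta in deltas:
        candidate = x.copy()
        candidate[j] += delta
        if clf.predict([candidate])[0] != original:
            if best is None or abs(delta) < abs(best[1]):
                best = (j, delta)
            break                                # smallest flip found for feature j

if best is not None:
    j, delta = best
    print(f"Changing feature {j} by {delta:+.2f} flips the prediction from class {original}.")
else:
    print("No single-feature change in the searched range flips the prediction.")
```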

ICE (Individual Conditional Expectation)

Individual Conditional Expectation provides insights into the predictions of machine learning models at the level of individual data points. ICE generates a separate curve for each instance by varying the feature of interest over a range of values while keeping that instance’s other feature values fixed.

By observing how the model’s prediction changes under these perturbations, ICE helps to understand the decision-making process for each specific data point, enhancing interpretability on a per-instance basis.
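In practice, ICE curves can be drawn with scikit-learn’s PartialDependenceDisplay; the sketch below (using the bundled diabetes dataset for illustration) plots one curve per instance for a single feature:

```python
# ICE curves with scikit-learn: one line per instance showing how the
# prediction changes as a single feature ("bmi") is varied.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

PartialDependenceDisplay.from_estimator(model, X, features=["bmi"], kind="individual")
plt.show()
```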

[Figure: ICE plot of the predicted probability of cervical cancer by age (one line = one woman). For most women the predicted probability increases with age; for some women with a predicted probability above 0.4, it changes little at higher ages.]

Source: 9.1 Individual Conditional Expectation (ICE) | Interpretable Machine Learning (christophm.github.io)

Global explanations

Global explanations are post-hoc explanations for the entire algorithm. Usually, they entail producing a global, interpretable summary of the behavior of the algorithm or AI system. There are multiple algorithms used for global explanations, among them:

  • Partial Dependence Plots (PDPs),
  • Accumulated Local Effects (ALE) Plots,
  • Feature Interaction,
  • PFI (Permutation Feature Importance),
  • Sensitivity Analysis.

Partial Dependence Plots (PDPs)

Partial Dependence Plots present the marginal change of the predicted response when a feature (the value of a particular input column) changes. They are useful for determining whether the relationship between a feature and the response is linear or more complex.

Partial Dependence Plots visualize the marginal effect of a specific feature on the model’s predictions while averaging out the effects of all other features. PDPs provide an overview of how the model’s output changes with variations in a particular feature, allowing us to understand the impact of that feature across the entire dataset.
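Using the same scikit-learn utility as in the ICE sketch above, a PDP is produced by averaging instead of plotting individual curves; the sketch below (with the bundled diabetes dataset and illustrative feature choices) also adds a two-way plot for a feature pair:

```python
# PDPs with scikit-learn: the average effect of "bmi" on the prediction,
# plus a two-way plot for the ("bmi", "bp") pair.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

PartialDependenceDisplay.from_estimator(
    model, X, features=["bmi", ("bmi", "bp")], kind="average"
)
plt.show()
```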

[Figure: PDP of cancer probability for the interaction of age and number of pregnancies, showing an increase in predicted cancer probability around age 45.]

Source: 8.1 Partial Dependence Plot (PDP) | Interpretable Machine Learning (christophm.github.io)

Accumulated Local Effects (ALE) Plots

Accumulated Local Effects describe how features influence the predictions of a machine learning model on average. They provide a way to visualize the relationship between a feature and the model’s predictions, showing how the average prediction changes as the feature of interest varies while accounting for the distribution of the other features.

This helps to understand the impact of a single feature on the model’s predictions by highlighting non-linear relationships, and it remains reliable even when features are correlated, a situation in which PDPs can be misleading.
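A sketch of ALE plots, assuming the open-source alibi package (API details may vary between versions), could look like this:

```python
# ALE plots, assuming the open-source `alibi` package (pip install alibi).
import matplotlib.pyplot as plt
from alibi.explainers import ALE, plot_ale
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

data = load_diabetes()
model = GradientBoostingRegressor(random_state=0).fit(data.data, data.target)

# ALE explains a prediction function rather than the model object itself.
ale = ALE(model.predict, feature_names=data.feature_names)
explanation = ale.explain(data.data)

plot_ale(explanation, features=[2])   # feature index 2 is "bmi"
plt.show()
```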

Feature Interaction

Feature interaction analysis (for example, Friedman’s H-statistic) uncovers interactions between features across the entire dataset or model. It provides insights into how combinations of features jointly affect the model’s predictions on a broader scale.

PFI (Permutation Feature Importance)

PFI provides insights into the importance of features at a model-wide level, offering a holistic view of how each feature influences the model’s predictions across the entire dataset.

In PFI, the importance of each feature is assessed by evaluating the impact of randomly permuting the values of that feature across all instances in the dataset. By analyzing the model performance drop when the feature is permuted, PFI quantifies the global influence of each feature on the model’s predictions.
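With scikit-learn, permutation feature importance is available out of the box; the sketch below (using the bundled diabetes dataset for illustration) ranks features by the average drop in the model’s score when their values are shuffled:

```python
# Permutation feature importance with scikit-learn on held-out data.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Shuffle each feature several times and measure the average drop in the score.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, drop in sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {drop:.3f}")
```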

[Figure: permutation feature importance for bike count predictions with a support vector machine. Temperature was the most important feature, while holiday was the least important.]

Source: 8.5 Permutation Feature Importance | Interpretable Machine Learning (christophm.github.io)

As a global explanation, PFI does not focus on individual data points or predictions. Instead, it aims to understand the overall behavior and decision-making process of the model with respect to the features it considers. This information is valuable in gaining insights into the significance of different features in the model’s decision-making process, thus helping in model interpretability and feature selection.

Sensitivity Analysis

Sensitivity analysis involves perturbing the input features systematically to observe how these changes affect the machine learning model’s predictions on a global level. It helps understand the robustness and stability of the model’s decisions.
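As a toy illustration of the idea (dedicated tools such as SALib offer more rigorous, variance-based methods), the sketch below nudges each feature by a fraction of its standard deviation and measures the average change in the model’s predictions:

```python
# Toy global sensitivity analysis: perturb each feature systematically and
# record the average effect on the model's output.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

data = load_diabetes()
X, y = data.data, data.target
model = GradientBoostingRegressor(random_state=0).fit(X, y)
baseline = model.predict(X)

for j, name in enumerate(data.feature_names):
    perturbed = X.copy()
    perturbed[:, j] += 0.5 * X[:, j].std()       # systematic nudge of one feature
    change = np.abs(model.predict(perturbed) - baseline).mean()
    print(f"{name}: mean |prediction change| = {change:.2f}")
```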

It is worth mentioning that XAI algorithms such as SHAP and LIME can be used for global explanations as well. SHAP values provide a unified framework for interpreting feature contributions to individual predictions, and they can be aggregated to explain global model behavior.
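For example, per-instance SHAP values can be aggregated into a global summary plot; the sketch below assumes the same shap and scikit-learn setup as the earlier local example:

```python
# Aggregating SHAP values into a global view of the model.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

shap_values = shap.Explainer(model, X)(X)

# A beeswarm (summary) plot shows each feature's contributions across the
# whole dataset, turning per-prediction attributions into a global picture.
shap.plots.beeswarm(shap_values)
```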

[Figure: example of a SHAP summary plot used as a global explanation.]

Source: Using SHAP Values to Explain How Your Machine Learning Model Works | by Vinícius Trevisan | Towards Data Science

LIME, on the other hand, approximates complex machine learning models with simpler explainable models fitted on samples of the data and aggregates these local explanations to gain insight into global model behavior.

All in all, Explainable AI helps companies place appropriate trust in AI applications and address the concerns they raise. As Artificial Intelligence expands and becomes more prevalent in our daily lives, ever more machine learning models will appear before we are able to thoroughly understand the structures behind the current ones.

Consequently, it is essential to be able to interpret these ML models in order to develop more efficient and reliable ones. Explainable AI, with the various algorithms described above, can help to build trust in AI (including black-box models), optimize model performance, provide interactive explanations, manage the fairness of models, and prevent potential discrimination.
