The Holistic AI Library's New Explainability Module: Demystifying Black-Box Models

September 8, 2023
Authored by
Kleyton da Costa
Machine Learning Researcher at Holistic AI
Cristian Munoz
Machine Learning Researcher at Holistic AI
Franklin Cardenoso Fernandez
Researcher at Holistic AI

Holistic AI are proud to present our new explainability module, the latest addition to the open-source Holistic AI Library. This development marks another significant step in our ongoing mission to enhance the transparency and security of machine learning models.

Since the initial release of the Holistic AI Library, we have focused our efforts on gathering metrics and methods that not only improve model quality but also provide clear insights into their inner workings. And in the coming weeks, we will introduce more functions and application scenarios for explainability.

Our team also recently introduced the metrics and bias mitigation module. We have since received valuable feedback from the community, highlighting the importance of not only eliminating bias but also understanding the underlying logic behind model decisions.

This is why we are excited to introduce the explainability module, which offers a range of powerful tools to demystify the "black boxes" of machine learning models. And, keeping in mind our commitment to continuous improvement, we hope that the community can collaborate by sharing new ideas and enhancement suggestions.

In this article, we highlight the key features of the new explainability module within the library. In general, there are three main strategy types for exploring a model's interpretability in binary classification or regression tasks:

  • Permutation feature importance.
  • Surrogate feature importance.
  • LIME feature importance.

In this post, we focus on Surrogate feature importance, using the classic Diabetes dataset to illustrate the techniques offered by the Holistic AI Library.
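
For comparison with the first strategy in the list above, the sketch below shows permutation importance on the same dataset. It uses scikit-learn's `permutation_importance` rather than the Holistic AI Library, and the model, split, and random seeds are illustrative assumptions.

```python
from sklearn.datasets import load_diabetes
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load the data as a DataFrame so that the feature names are kept.
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0  # fixed seed: an assumption for reproducibility
)

model = LinearRegression().fit(X_train, y_train)

# Shuffle each feature in turn and record the average drop in the R^2 score.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Rank features from most to least important.
ranking = sorted(
    zip(X.columns, result.importances_mean), key=lambda item: item[1], reverse=True
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A feature whose shuffling barely changes the score contributes little to the model's predictions on the held-out data.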

Opening the machine learning black box

The literature classifies AI-based models in three groups: white-box, grey-box, or black-box models. White-box models are inherently explainable and transparent in terms of their internal functioning. In other words, through model construction, the user can interpret which model "actions" are responsible for the output.

Generally, these models have low accuracy when applied in complex contexts, but they have the advantage of high transparency. Linear regression, logistic regression, and decision trees are common examples of white-box models.

On the other hand, models classified as grey-box are those in which there is partial access to the internal functioning of the model. The ideal scenario for using these models, therefore, is when the user doesn't need to understand the entire model's functioning to accept its decisions. Reinforcement learning and neural networks with a hidden layer are examples of grey-box models.

Finally, black-box models are those with hidden functionality for users. The only information available when using a black-box model is the input data and the output data. Apart from these two elements, nothing else is humanly understandable, including the learning process that the model follows to generate results. Neural networks with multiple hidden layers (including variations like LSTMs), support vector machines, and random forests are examples of black-box models.

Explainability for regression tasks with Diabetes dataset

The "Diabetes dataset" is a popular dataset in the machine learning community and is often used for practising regression algorithms and predictive modelling. The dataset has ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements), which were obtained for each of 442 diabetes patients, as well as the response of interest, a quantitative measurement of disease progression one year after the baseline was recorded. In this study, the target variable therefore measures how the disease develops in patients.

We begin by loading the necessary libraries and the data using the sklearn library, as detailed below.

# import libraries
from sklearn.datasets import load_diabetes
import matplotlib.pyplot as plt
import pandas as pd

dataset = load_diabetes() # load dataset
X = dataset.data # features
y = dataset.target # target
feature_names = dataset.feature_names # feature names
X = pd.DataFrame(X, columns=feature_names) # convert to dataframe

Visualising the histograms of the variables, we can observe that the data has undergone a transformation. More specifically, all feature variables have been mean-centred and scaled by the standard deviation times the square root of n_samples (i.e., the sum of squares of each column totals 1).
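
The scaling described above can be checked numerically. The following is a minimal sketch using NumPy directly on the scikit-learn data; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_diabetes

X = load_diabetes().data  # scaled features, shape (442, 10)

column_means = X.mean(axis=0)
column_sum_of_squares = (X ** 2).sum(axis=0)

# Each column should be mean-centred ...
print(np.allclose(column_means, 0.0, atol=1e-6))
# ... and have a sum of squares of 1.
print(np.allclose(column_sum_of_squares, 1.0, atol=1e-6))
```

Because the columns are centred, a unit sum of squares is the same as dividing each column by its population standard deviation times the square root of the number of samples.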

X.hist(bins=10, figsize=(10, 10), color = 'mediumslateblue') 

Furthermore, we can also observe the correlation between variables through a heatmap. This type of analysis assists us in identifying a high correlation between certain variables. For instance, there is a high and positive correlation between the serum 1 and serum 2 variables. In general, variables related to blood serum exhibit higher correlations among themselves, which could indicate a relationship between biochemical processes.
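
The same correlations can be read off numerically. The sketch below uses plain pandas rather than the library's plotting helper and lists the most strongly correlated feature pairs; in the scikit-learn version of the dataset the serum measurements are named s1 to s6.

```python
import numpy as np
from sklearn.datasets import load_diabetes

X = load_diabetes(as_frame=True).data  # DataFrame with named columns

corr = X.corr()  # Pearson correlation matrix

# Keep the upper triangle only so that each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Sort pairs by the strength of the correlation, regardless of sign.
pairs = pairs.sort_values(key=abs, ascending=False)
print(pairs.head(5))
```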

from holisticai.bias.plots import correlation_matrix_plot 

correlation_matrix_plot(X, target_feature='age', size = (12,7)) 

As the data is already standardised, only simple preprocessing is needed after this brief characterisation. We separate the data into training and testing sets, train a Linear Regression model, and make predictions on the test data.

# simple preprocessing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) # train test split
model = LinearRegression() # instantiate model
model.fit(X_train, y_train) # fit model
y_pred = model.predict(X_test) # compute predictions

After we have completed the model training, we can compute accuracy metrics using the function `regression_efficacy_metrics`, which is available in the Holistic AI Library. The results observed in the table indicate that the model has reasonable performance for this dataset.

# compute efficacy metrics 

from holisticai.efficacy.metrics import regression_efficacy_metrics 

regression_efficacy_metrics(y_test, y_pred) 
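
As a cross-check, common regression metrics can also be computed with scikit-learn alone. This is a sketch under the assumption that RMSE, MAE, and R² are representative; it does not claim to reproduce the exact set of metrics returned by `regression_efficacy_metrics`, and the fixed `random_state` is added here only for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Collect a few standard regression metrics on the held-out data.
metrics = {
    "RMSE": float(np.sqrt(mean_squared_error(y_test, y_pred))),
    "MAE": float(mean_absolute_error(y_test, y_pred)),
    "R2": float(r2_score(y_test, y_pred)),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```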

Normally, this would be the final step in an analysis of machine learning models: we have the data, a model, prediction results, and accuracy metrics. However, it is necessary to go beyond this and seek additional information that assists us in addressing the fundamental questions in the scientific process. In other words, the ‘why’ and the ‘how’.

How to interpret model outputs

As practitioners of machine learning, it's essential to dig deeper and comprehend not only what the model predicts, but also why and how it arrives at those predictions. This entails unraveling the underlying patterns, relationships, and decision pathways that the model learned from the data. By doing so, we can achieve a higher level of understanding, trust, and reliability in the outcomes provided by the model.

This quest for deeper insights drives us to explore the interpretability and explainability of machine learning models. Techniques that illuminate the feature contributions, highlight decision influences, and uncover the model's inner workings enable us to bridge the gap between complex algorithms and human comprehension. As we move forward, the integration of explainability not only empowers us to make more informed decisions based on the model's output but also ensures the accountability and ethical robustness of AI systems.

Surrogate feature importance

Moving on to the Surrogacy Efficacy Score, this technique is designed specifically to gain insights into complex "black box" models, which are often challenging to interpret. Examples include deep neural networks or ensemble models, which are powerful but lack transparency in their decision-making process.

To address this issue, the Surrogacy Efficacy Score relies on creating interpretable surrogate models. It starts by training a more interpretable model – such as a decision tree – to approximate the behaviour of the complex black-box model. This surrogate model is constructed by partitioning the input data based on the values of specific features and creating simple rules to mimic the original model's predictions.

The training process for the surrogate model involves minimising the loss between the predictions of the black-box model and the surrogate model. By achieving a close resemblance between the two models' predictions, the surrogate model effectively acts as an interpretable proxy for the black-box model. This surrogate can then be analysed and inspected to understand how the complex model makes decisions based on different feature values.
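
The procedure described above can be sketched with scikit-learn alone. The random forest standing in for the black box, the tree depth of 3, and the `smape` helper are illustrative assumptions, not the library's internal implementation.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Stand-in "black box" (an assumption; any fitted regressor would do).
black_box = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
black_box_pred = black_box.predict(X)

# The surrogate is trained on the black box's predictions, not on y,
# so it learns to imitate the model rather than the data.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, black_box_pred)
surrogate_pred = surrogate.predict(X)


def smape(reference, approximation):
    """Hypothetical helper: symmetric mean absolute percentage error, in [0, 2]."""
    denominator = (np.abs(reference) + np.abs(approximation)) / 2.0
    return float(np.mean(np.abs(reference - approximation) / denominator))


print(f"Surrogate error vs. black box: {smape(black_box_pred, surrogate_pred):.3f}")
print(export_text(surrogate, feature_names=list(X.columns)))
```

The lower the error between the two sets of predictions, the more faithfully the printed rules describe the black-box model's behaviour.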

With the explainability module, it's possible to calculate surrogate feature importance and their corresponding metrics using the "surrogate" strategy within the Explainer class. Just as observed in the case of permutation, we can obtain metric results and visualise feature importance.

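# surrogate feature importance
from holisticai.explainability import Explainer

explainer = Explainer(based_on='feature_importance',
                      strategy_type='surrogate',
                      model_type='regression',
                      model = model,
                      x = X,
                      y = y)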

A quick way to compute explainability metrics using the Holistic AI Library is to call the metrics function from the explainer object.

explainer.metrics()


Note: A complete explanation of these metrics can be found in this paper.

An initial analysis of metrics results:

  • Fourth Fifths: this metric shows that 10% of features properly explain the model output.
  • Importance Spread Divergence: this metric shows the entropy of global feature importance. This metric is of interest when we compare with another model.
  • Importance Spread Ratio: shows that the feature importance is concentrated in few features. This happens because the metric result is close to 0 (high importance concentration) instead of close to 1 (uniform importance spread).
  • Global Explainability Score: this metric measures how easy the partial dependence curves are to interpret. In this case, the result shows that the partial dependence curves are easy to interpret. More information about partial dependence plots can be found here.
  • Surrogate Efficacy Regression: this metric shows that the surrogate model (a simple decision tree with depth equal to 3) has a symmetric mean absolute error equal to 33%. This indicates that the surrogate model only loosely fits the real model.
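
To make the concentration metrics more concrete, here is a minimal sketch of one plausible formulation. It assumes that the spread ratio is a normalised Shannon entropy and that Fourth Fifths is the fraction of features needed to reach 80% of the total importance. The function names are hypothetical, and the library's exact definitions may differ.

```python
import numpy as np


def spread_ratio(importances):
    """Normalised Shannon entropy: near 0 = concentrated, 1 = uniform."""
    p = np.asarray(importances, dtype=float)
    p = p / p.sum()
    nonzero = p[p > 0]  # treat 0 * log(0) as 0
    entropy = np.sum(nonzero * np.log(1.0 / nonzero))
    return float(entropy / np.log(len(p)))


def fraction_for_80_percent(importances):
    """Smallest fraction of features whose importance sums to at least 80%."""
    p = np.sort(np.asarray(importances, dtype=float))[::-1]
    cumulative = np.cumsum(p / p.sum())
    # A small tolerance guards against floating-point error in the cumulative sum.
    n_needed = int(np.searchsorted(cumulative, 0.8 - 1e-12)) + 1
    return n_needed / len(p)


concentrated = [0.85] + [0.15 / 9] * 9  # one dominant feature out of ten
uniform = [1.0] * 10                    # all ten features equally important

print(spread_ratio(concentrated), fraction_for_80_percent(concentrated))
print(spread_ratio(uniform), fraction_for_80_percent(uniform))
```

Under these assumptions, a single dominant feature among ten gives a fraction of 10% and a spread ratio well below 1, whereas uniform importances give a spread ratio of 1.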

One way to visualise the feature importance results for the model outcomes is through a bar chart that displays the variables that contributed the most to the prediction. To produce it, you can simply call bar_plot from the explainer object.

explainer.bar_plot()


The next plot shows the decision process of the surrogate model in a comprehensible and digestible way. This is a simple binary tree that can be easily interpreted by humans.

explainer.visualization('Decision Tree graphviz') 

We can also visualise the feature distribution at each decision block in more detail to deepen our understanding of the model outputs.

img_tree = explainer.visualization('Decision Tree dtreeviz') 


In this article, we have introduced the Holistic AI Library’s new explainability module. With this novel tool, it is possible to forensically explore the outcomes of your machine learning models and devise more transparent solutions in both research and production environments.

We have demonstrated how it is feasible to compute feature importance, both on a global and local scale. Additionally, we have illustrated how to calculate relevant metrics and visualise information pertaining to the model outcomes. By incorporating this module into your workflow, you can gain deeper insights into the inner workings of your models, thus fostering better understanding and trust in the predictions they generate.

With the capability to explore feature contributions and interpret the decision-making processes, you can enhance not only the interpretability of your models but also the reliability of their results. This advancement is poised to facilitate more informed decision-making, enable effective model debugging, and encourage collaboration and knowledge sharing among the data science community.

As the field of AI continues to evolve, transparent insights into model behaviour will undoubtedly play an essential role in building more ethical and effective AI solutions.

DISCLAIMER: This blog article is for informational purposes only. This blog article is not intended to, and does not, provide legal advice or a legal opinion. It is not a do-it-yourself guide to resolving legal issues or handling litigation. This blog article is not a substitute for experienced legal counsel and does not provide legal advice regarding any situation or employer.
