Enhancing Transparency in AI: Explainability Metrics via SHAP Feature Importance with Holistic AI Open-Source Library

March 5, 2024
Authored by
Kleyton da Costa
Machine Learning Researcher at Holistic AI

In our ongoing series on Enhancing Transparency in AI, we delve into the crucial aspect of comprehending machine learning model outcomes through insightful explainability metrics. For an overview of why explainability metrics are important for transparent and trustworthy AI, check out our guide on explainability metrics using the Holistic AI Library. For a deeper dive into our original research on measuring explainability in machine learning, explore our research paper on explainability.

In this article, we shed light on the metrics derived from SHAP feature importance, providing a comprehensive understanding of your model’s performance.

Shapley Additive Explanations (SHAP)

Additive feature attribution methods are well established in the literature and have numerous applications, and several models adhere to this additive attribution principle. The Shapley Additive Explanations (SHAP) method leverages Shapley values to calculate feature attributions. To explain this, let’s define two key concepts:

  • Set function (v): A function that assigns a value to any subset (coalition) of features.
  • Shapley value (sp_i) for feature ‘i’: Measures the contribution of feature ‘i’ to the model’s output by considering all possible coalitions of the remaining features. SHAP computes this as a weighted average of the feature’s marginal contributions, as sketched in the example below.
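To make the weighted average concrete, below is a minimal, purely illustrative sketch that computes a Shapley value by brute force over all coalitions. It is not part of the Holistic AI library, and the toy set function and feature names are invented for the example.

# illustrative brute-force Shapley value (not part of the Holistic AI library)
from itertools import combinations
from math import factorial

def shapley_value(v, features, i):
    # weighted average of feature i's marginal contribution over all coalitions
    n = len(features)
    others = [f for f in features if f != i]
    value = 0.0
    for size in range(n):
        for subset in combinations(others, size):
            s = set(subset)
            weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            value += weight * (v(s | {i}) - v(s))
    return value

# toy set function: the 'model output' is just the sum of the features present
sample = {'lsat': 0.6, 'gpa': 0.3, 'age': 0.1}
v = lambda s: sum(sample[f] for f in s)
print(shapley_value(v, list(sample), 'lsat'))  # contribution of 'lsat' (≈ 0.6)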

SHAP explainability metrics using the Holistic AI Library

The first step in implementing the explainability metrics is to load the libraries and read the data. Remember that, in this case, the library is installed as follows:

pip install -q holisticai[explainability] 

After installing the HAI library, we need to load the dataset and split the data into train and test sets. In this tutorial, we use the Law School dataset. We have used this dataset previously with the bias metrics from our library, but here we focus on explainability.


# load the dataset
from holisticai.datasets import load_dataset
df, group_a, group_b = load_dataset(dataset='law_school', preprocessed=True, as_array=False)

# select features (X) and target (y) data
import pandas as pd

X = df.drop(columns=['target'])
y = df['target']

# create a train-test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

The goal with this dataset is to predict the binary attribute ‘bar’ (whether a student passes the law school bar exam). We can use a machine learning model for this classification task, and in this tutorial we use a simple logistic regression model:


# import the Logistic Regression model
from sklearn.linear_model import LogisticRegression

# instantiate and fit the model
model = LogisticRegression()
model.fit(X_train, y_train)

# make predictions
y_pred = model.predict(X_test)

# compute classification efficacy metrics
from holisticai.efficacy.metrics import classification_efficacy_metrics
classification_efficacy_metrics(y_test, y_pred)

As can be seen in the figure below, based on the performance metrics computed in the last step of the code above, the logistic regression model performed reasonably well.

Explainer class

The Explainer class is used to compute metrics and generate the plots associated with them. Several parameters matter for a successful setup. The “based_on” parameter defines the type of strategy that will be used; in this case, we use strategies based on feature importance. The “strategy_type” parameter selects the specific strategy, namely SHAP. Additionally, we need to define the model type (binary_classification), the model object, the features used in training (X_train), and the targets used in training (y_train).


# import the Explainer class and instantiate SHAP feature importance
from holisticai.explainability import Explainer

shap_explainer = Explainer(based_on='feature_importance',
                           strategy_type='shap',
                           model_type='binary_classification',
                           model=model,
                           x=X_train,
                           y=y_train)

After instantiating the explainability object for the model results, we can compute the metrics. With the HAI library, this process is simplified through the metrics method. In this example, we use the parameter detailed=True to visualize the results for labels 0 and 1.


# compute metrics 
shap_explainer.metrics(detailed=True) 

As can be seen below, the values computed are relatively small and close to the target value of 0.

Another important tool that can be accessed through the Explainer object is plotting. As an example, the following code snippets show the bar plot with the feature importance ranking and the box plots for data stability and feature stability.


# plot the feature importance ranking 
shap_explainer.bar_plot() 


# plot feature stability and data stability 
shap_explainer.show_importance_stability() 

Summary

In this tutorial, we focused on the SHAP feature importance tool within the Holistic AI library’s explainability module. We learned how to calculate explainability metrics and generate visualizations that reveal the key factors influencing a model’s predictions. Remember, these techniques offer the valuable benefit of adaptability, allowing you to apply them across diverse datasets and scenarios.
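As an illustration of that adaptability, the brief sketch below reuses the same Explainer workflow with a different scikit-learn model. The random forest is only an example choice, and we assume the Explainer accepts any fitted classifier in the same way it accepted the logistic regression above.

# illustrative only: the same SHAP Explainer workflow with a different model
# (assumes the Explainer accepts any fitted scikit-learn classifier)
from sklearn.ensemble import RandomForestClassifier
from holisticai.explainability import Explainer

rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train)

rf_explainer = Explainer(based_on='feature_importance',
                         strategy_type='shap',
                         model_type='binary_classification',
                         model=rf_model,
                         x=X_train,
                         y=y_train)

# same metrics and plots as before
rf_explainer.metrics(detailed=True)
rf_explainer.bar_plot()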

Dive in!

Interested in exploring explainable AI further? Reach out for a demo of our AI governance platform, read our paper on explainability metrics for AI, or start exploring metrics on your own data using the Holistic AI Library.

