Mitigating Bias in Recommender Systems with Holistic AI

Authored by

Researcher at Holistic AI

Published on

Jul 28, 2023

How to mitigate bias in recommender systems with Holistic AI

To ensure recommender systems provide equitable and unbiased outcomes, it is essential to measure and mitigate bias. The Holistic AI Python package supports both steps: it lets users quantify bias in machine learning models and apply algorithms to mitigate it. With this toolkit, we can work towards fairer and more inclusive platforms.

In this tutorial, we show how to train a basic recommender system, calculate its bias metrics with the holisticai package, and apply a mitigator so that the new results can be compared with our baseline. We will use the well-known "Last FM Dataset" from the holisticai library. This dataset, which includes user information such as sex and country, records which artists each user downloaded. The objective of the recommender system is to suggest artists based on user interactions.

Building our baseline

First, we must import the packages required for our bias analysis and mitigation. You will need to have the holisticai package and its dependencies installed on your system. In a notebook, you can install them by running:


!pip install holisticai[methods] 

# Base Imports 

import pandas as pd 

import numpy as np 

import matplotlib.pyplot as plt

from holisticai.datasets import load_last_fm

The dataset that we will use is the "Last FM Dataset", a publicly available dataset that contains a set of artists that were downloaded by users. It includes personal information about the user, specifically sex and country of origin. A user can download more than one artist. We will use the column "score", which contains only 1s for counting the interactions.


bunch = load_last_fm() 

df = bunch["frame"] 

df.head()

Next, we preprocess the dataset before feeding it into the model. We define a function that cleans the dataset, builds the user-item pivot matrix, and extracts the protected groups for a given feature. Note that this function also replaces the "score" column with random integers between 1 and 4:


def preprocess_lastfm_dataset(df, protected_attribute, user_column, item_column):
    """Clean the data, build the user-item pivot matrix and extract the protected groups."""
    from holisticai.utils import recommender_formatter

    df_ = df.copy()
    # Overwrite the all-ones "score" column with random integers in [1, 4]
    df_['score'] = np.random.randint(1, 5, len(df_))
    # Encode the protected attribute as a boolean (True for 'm')
    df_[protected_attribute] = df_[protected_attribute] == 'm'
    df_ = df_.drop_duplicates()

    # Create the pivot matrix (users x items) and the per-user protected attribute
    df_pivot, p_attr = recommender_formatter(
        df_,
        users_col=user_column,
        groups_col=protected_attribute,
        items_col=item_column,
        scores_col='score',
        aggfunc='mean',
    )
    return df_pivot, p_attr


df_pivot, p_attr = preprocess_lastfm_dataset(df, 'sex', 'user', 'artist')
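To make the shape of the pivot matrix concrete, here is a minimal sketch of the same idea in plain pandas. It does not call `recommender_formatter`; the toy users, artists and variable names (`toy`, `toy_pivot`, `toy_p_attr`) are made up for illustration, and the library's own output may differ in detail.

```python
import pandas as pd

# Toy interaction log; the user and artist names are made up for illustration.
toy = pd.DataFrame({
    "user":   ["u1", "u1", "u2", "u3", "u3"],
    "sex":    [True, True, False, True, True],
    "artist": ["a", "b", "a", "b", "c"],
    "score":  [1, 1, 1, 1, 1],
})

# One row per user, one column per artist; missing interactions become NaN.
toy_pivot = toy.pivot_table(index="user", columns="artist",
                            values="score", aggfunc="mean")

# One protected-attribute value per user, aligned with the pivot rows.
toy_p_attr = toy.groupby("user")["sex"].first().loc[toy_pivot.index].to_numpy()

print(toy_pivot)
print(toy_p_attr)
```

Each row of the pivot is one user's interaction vector, which is the representation the collaborative filtering step works on.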

Model training

There are many ways to recommend artists to users. We will use item-based collaborative filtering, a simple and intuitive approach. This method bases its recommendations on similarities between items: for each user, it suggests the artists most similar to those the user has already interacted with.

To do that, we first define some utility functions that select and encode these recommendations:


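def items_liked_by_user(data_matrix, u):
    """Return the indices of the items user u has interacted with."""
    return np.nonzero(data_matrix[u])[0]


def recommended_items(data_matrix, similarity_matrix, u, k):
    """Return the indices of the k items most similar to those user u liked."""
    liked = items_liked_by_user(data_matrix, u)
    arr = np.sum(similarity_matrix[liked, :], axis=0)
    arr[liked] = 0  # do not recommend items the user already has
    return np.argsort(arr)[-k:]


def explode(arr, num_items):
    """Turn a list of item indices into a 0/1 vector of length num_items."""
    out = np.zeros(num_items)
    out[arr] = 1
    return out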
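The following toy run sketches how these helpers behave on a tiny matrix. It is self-contained and uses a hypothetical helper name (`top_k_for_user`) and made-up data rather than the Last FM matrix; similarity is computed as plain dot products between item columns.

```python
import numpy as np

# Toy user-item matrix (3 users x 4 items); 1 means the user interacted with the item.
toy_matrix = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
], dtype=float)

# Item-item similarity as plain dot products between item columns.
toy_sim = toy_matrix.T @ toy_matrix

def top_k_for_user(matrix, sim, u, k):
    """Score every item by its summed similarity to the items user u liked."""
    liked = np.nonzero(matrix[u])[0]
    scores = sim[liked, :].sum(axis=0)
    scores[liked] = 0            # never re-recommend an item the user already has
    return np.argsort(scores)[-k:]

recs_user0 = top_k_for_user(toy_matrix, toy_sim, u=0, k=2)

# Same idea as explode(): item indices -> 0/1 row.
one_hot = np.zeros(toy_matrix.shape[1])
one_hot[recs_user0] = 1
print(sorted(recs_user0.tolist()), one_hot)
```

User 0 already has items 0 and 1, so only items 2 and 3 can be recommended; both share one co-occurring user with the liked items.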

Next, we fill the missing entries of the pivot table with zeros, compute the item-item similarities, and build a new pivot table that holds the top 10 recommendations for each user.


from sklearn.metrics.pairwise import linear_kernel

# Treat missing interactions as 0
data_matrix = df_pivot.fillna(0).to_numpy()
# Item-item similarity; linear_kernel returns raw dot products between item columns
cosine_sim = linear_kernel(data_matrix.T, data_matrix.T)

# Top-10 recommendations per user, encoded as 0/1 rows
new_recs = [
    explode(recommended_items(data_matrix, cosine_sim, u, 10), len(df_pivot.columns))
    for u in range(df_pivot.shape[0])
]

new_df_pivot = pd.DataFrame(new_recs, columns=df_pivot.columns)
new_df_pivot.head()
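One detail worth noting: although the variable is called `cosine_sim`, a linear kernel computes raw dot products, which equal cosine similarity only when the item vectors are length-normalised. The sketch below uses made-up vectors and plain NumPy to show the difference; scikit-learn's `cosine_similarity` would give the normalised variant. Whether normalisation is desirable here is a design choice the tutorial does not discuss.

```python
import numpy as np

# Two toy item vectors: item A has many interactions, item B has few.
items = np.array([
    [1, 1, 1, 1, 0, 0],   # item A
    [1, 1, 0, 0, 0, 0],   # item B
], dtype=float)

dot_sim = items @ items.T                       # what a linear kernel computes
norms = np.linalg.norm(items, axis=1, keepdims=True)
cos_sim = (items / norms) @ (items / norms).T   # length-normalised version

print(dot_sim)
print(cos_sim.round(3))
```

With raw dot products, items with many interactions get larger similarity scores overall, which can favour already popular items in the recommendations.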

Finally, we obtain our recommendation matrix:


mat = new_df_pivot.replace(0,np.nan)

Measuring the bias

With the new recommendation matrix at hand, we can now calculate various metrics of fairness for recommender systems. In this example, we will cover item_based metrics by using the recommender_bias_metrics function:


from holisticai.bias.metrics import recommender_bias_metrics 

df_baseline = recommender_bias_metrics(mat_pred=mat, metric_type='item_based') 

df_baseline

Above, we have computed all item_based metrics for the recommender bias task in a single call. For instance, the Average Recommendation Popularity is 5609, meaning that on average a user will be recommended an artist that has 5609 total interactions.

Besides the calculated metric values, the function also returns a reference value for each metric, which is the value an ideally fair model would achieve. This makes it easier to judge how far the predictions are from fair behaviour on each metric.

For our analysis, we are interested in the two following metrics:

Aggregate Diversity: Given a matrix of scores, this function computes the recommended items for each user and returns the proportion of recommended items out of all possible items. A value of 1 is desired.

GINI index: Measures the inequality across the frequency distribution of the recommended items. An algorithm that recommends each item the same number of times (uniform distribution) will have a Gini index of 0, while one with extreme inequality will have a Gini of 1.
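As a rough illustration of these two definitions, the sketch below computes both quantities on toy recommendation matrices with NumPy. The function names are hypothetical and the formulas are the textbook versions; the exact computation inside holisticai (for example, its normalisation) is assumed, not verified, to match.

```python
import numpy as np

def aggregate_diversity(rec_matrix):
    """Share of items that are recommended to at least one user."""
    recommended = ~np.isnan(rec_matrix)
    return recommended.any(axis=0).mean()

def gini_index(rec_matrix):
    """Gini coefficient of how often each item is recommended."""
    counts = np.sort((~np.isnan(rec_matrix)).sum(axis=0)).astype(float)
    n = counts.size
    ranks = np.arange(1, n + 1)
    return (2 * np.sum(ranks * counts)) / (n * counts.sum()) - (n + 1) / n

# Toy recommendation matrices: NaN = not recommended, 1 = recommended.
nan = np.nan
concentrated = np.array([[1, nan, nan, nan],
                         [1, nan, nan, nan]])
even = np.array([[1, 1, nan, nan],
                 [nan, nan, 1, 1]])

print(aggregate_diversity(concentrated), gini_index(concentrated))
print(aggregate_diversity(even), gini_index(even))
```

When every user receives the same single item, only a quarter of the catalogue is covered and the Gini index is high; when recommendations are spread evenly, coverage is complete and the Gini index drops to 0.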

Mitigating the bias

Since the model's metrics are far from the desired values, we now apply a strategy to mitigate its bias.

Mitigation strategies fall into three categories: "pre-processing", "in-processing" and "post-processing". The holisticai library contains algorithms from each of these categories, and all are compatible with the Scikit-learn package. So, if you are familiar with Scikit-learn, you will have no issues using the library.

Here, we will use the "Two-sided fairness" method, an in-processing algorithm that maps the fair recommendation problem to a fair allocation problem. This method is agnostic to the specifics of the data-driven model (that estimates the product-customer relevance scores), making it more scalable and easier to adapt.
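To give an intuition for what "fair allocation" means here, the following is a deliberately simplified sketch and not the library's `FairRec` implementation. It assumes a two-phase scheme: users first take turns picking their most relevant item while each item has a limited number of guaranteed slots, and the remaining slots are then filled by relevance. The function name, the exposure formula and the toy relevance scores are all assumptions made for illustration.

```python
import numpy as np

def toy_two_sided_allocation(relevance, rec_size, mms_fraction):
    """Simplified round-robin allocation; a sketch, not the FairRec algorithm itself."""
    n_users, n_items = relevance.shape
    # Assumed minimum exposure guaranteed to each item (hypothetical simplification).
    min_exposure = int(np.floor(mms_fraction * n_users * rec_size / n_items))
    capacity = np.full(n_items, min_exposure)
    recs = [[] for _ in range(n_users)]

    # Phase 1: users take turns picking their best item that still has capacity.
    progress = True
    while progress and capacity.sum() > 0:
        progress = False
        for u in range(n_users):
            if len(recs[u]) >= rec_size:
                continue
            for i in np.argsort(-relevance[u]):
                if capacity[i] > 0 and int(i) not in recs[u]:
                    recs[u].append(int(i))
                    capacity[i] -= 1
                    progress = True
                    break

    # Phase 2: fill the remaining slots with each user's most relevant unused items.
    for u in range(n_users):
        for i in np.argsort(-relevance[u]):
            if len(recs[u]) >= rec_size:
                break
            if int(i) not in recs[u]:
                recs[u].append(int(i))
    return recs

# Toy case: all four users rank the four items identically.
relevance = np.tile([0.9, 0.6, 0.3, 0.1], (4, 1))
fair_recs = toy_two_sided_allocation(relevance, rec_size=2, mms_fraction=0.5)
exposure = np.bincount(np.concatenate(fair_recs), minlength=relevance.shape[1])

# Plain top-k for comparison: everyone gets the same two items.
naive_recs = np.argsort(-relevance, axis=1)[:, :2]
naive_exposure = np.bincount(naive_recs.ravel(), minlength=relevance.shape[1])
print(exposure, naive_exposure)
```

In this toy case plain top-k never shows the two least relevant items, while the allocation sketch gives every item at least one recommendation at some cost in relevance for individual users.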

To perform the mitigation with this method, we fit it on the data matrix computed earlier.


from holisticai.bias.mitigation import FairRec

fr = FairRec(rec_size=10, MMS_fraction=0.5)
fr.fit(data_matrix)

# Convert each user's recommended item indices into a 0/1 row
recommendations = fr.recommendation
new_recs = [explode(recommendations[key], len(df_pivot.columns)) for key in recommendations.keys()]

new_df_pivot_db = pd.DataFrame(new_recs, columns=df_pivot.columns)
mat = new_df_pivot_db.replace(0, np.nan).to_numpy()

# Store the mitigated metrics so they can be compared with the baseline
df_tsf = recommender_bias_metrics(mat_pred=mat, metric_type='item_based')
df_tsf

We can observe that the mitigator improves the "Aggregate Diversity" metric, which now reaches its reference value, and that the remaining metrics also improve. Let's now compare them with our baseline.

Results comparison

Now that we have seen how to apply the bias mitigator, we compare its results with the baseline implemented earlier to analyse how the metrics have changed.


# Keep the baseline values, the mitigated values and a single reference column
result = pd.concat([df_baseline, df_tsf], axis=1).iloc[:, [0, 2, 1]]
result.columns = ['Baseline', 'Mitigator', 'Reference']
result
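The column selection above can be easier to follow on a toy example. The sketch below assumes that each metrics table has a value column followed by a reference column (named "Value" and "Reference" here as an assumption); the numbers are placeholders and are not results from the Last FM experiment.

```python
import pandas as pd

# Placeholder numbers only; they are NOT results from the Last FM experiment.
idx = ["Aggregate Diversity", "GINI index"]
toy_baseline = pd.DataFrame({"Value": [0.2, 0.9], "Reference": [1, 0]}, index=idx)
toy_mitigated = pd.DataFrame({"Value": [1.0, 0.5], "Reference": [1, 0]}, index=idx)

# Columns after concat: Value, Reference, Value, Reference (positions 0 to 3).
toy_result = pd.concat([toy_baseline, toy_mitigated], axis=1).iloc[:, [0, 2, 1]]
toy_result.columns = ["Baseline", "Mitigator", "Reference"]
print(toy_result)
```

Selecting positions 0, 2 and 1 keeps both value columns and one copy of the reference column, which is identical in the two tables.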

From the table above, we can see that although some of the metrics are still far from their ideal values, applying this method to the data yields an improvement over our baseline.

Summary

In this tutorial, we have shown how the holisticai library can be used to measure the bias present in recommender systems with the recommender_bias_metrics function, which returns the calculated values for a range of metrics together with their reference values.

We have also shown how to mitigate bias with the "Two-sided fairness" technique. This in-processing method maps the fair recommendation problem to a fair allocation problem and is agnostic to the model that estimates the relevance scores.

By walking through concrete examples of how to quantify and reduce bias in a recommender system, we have demonstrated the feasibility and importance of promoting algorithmic fairness.
