A major challenge in using artificial intelligence (AI) for decision-making is that it might pick up and repeat any biases found in the data it was trained on, leading to biased results.
This is where bias mitigation tools come into play, including tools for the monitoring and controlling of experiments conducted with models.
To help practitioners run reproducible experiments that are checked for discriminatory bias, this tutorial introduces an integration between MLflow and the Holistic AI library. Through this integration, we aim to facilitate a more responsible implementation of AI. We also provide a deployment setup with FastAPI, so that models trained with bias mitigation methods can be served in practice.
MLflow is an open-source platform for managing the life cycle of AI models. It enables data scientists to organize and manage experiments, track metrics, store artifacts, and manage AI models, in addition to providing production monitoring.
These capabilities help with many of the largest challenges in implementing AI models, including management, monitoring, and result reproducibility. Life cycle management tools like MLflow are crucial for limiting and tracking bias and promoting reproducibility.
The first step is to import the mlflow library and set the tracking URI. In this case, we will use the local server at http://127.0.0.1:8080.
To start the MLflow server, run the following command in the terminal:
The dataset that we will use is the “Adult” dataset from the UCI Machine Learning Repository. This publicly available dataset contains information about the age, education, marital status, race, and gender of individuals from the United States. The objective is to predict whether an individual’s income will be above or below $50K per year.
Source: Holistic AI Datasets
In this step, we split the dataset into training and test sets. It is important to ensure that the groups are represented in the same proportion in both sets.
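One way to keep group proportions aligned is a stratified split. The sketch below uses scikit-learn's train_test_split on synthetic data; the column names (group_a, income_above_50k), the sample size, and the choice to stratify on the combination of group and label are illustrative assumptions, not the tutorial's actual preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Adult dataset (values are random, not real data).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),
    "hours_per_week": rng.integers(10, 60, size=n),
    "group_a": rng.random(n) < 0.3,               # hypothetical protected-group flag
    "income_above_50k": rng.integers(0, 2, size=n),
})

# Stratify on group membership combined with the label, so that both are
# represented in (almost) the same proportion in the two sets.
strata = df["group_a"].astype(int).astype(str) + "_" + df["income_above_50k"].astype(str)

train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=strata, random_state=42
)

print(train_df["group_a"].mean(), test_df["group_a"].mean())
```

Stratifying on the group alone would also balance group membership; adding the label keeps the outcome rate per group comparable across the two sets.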
In the next code snippet, we explore the implementation of bias mitigation techniques in machine learning models using the HolisticAI framework. The code demonstrates how to leverage a variety of mitigations including:
By splitting the data into distinct groups and incorporating HolisticAI’s pipeline, the code ensures the fair treatment of different demographic groups during both training and evaluation.
Leveraging Scikit-learn’s Logistic Regression as the base model and the HolisticAI pipeline, the following code encapsulates the trained model within a custom class, MyModel, ensuring compatibility with MLflow standards.
The next code snippet presents a versatile and organized function, train_model, designed for training and evaluating machine learning models within the MLflow framework.
This function allows users to seamlessly incorporate bias mitigation into the training pipeline by specifying a mitigator parameter.
The function not only predicts a given input but also computes and logs essential metrics, including accuracy, disparate impact, statistical parity, and a table artifact with classification bias metrics. Furthermore, it logs model parameters, bias evaluation results, and the trained model itself, providing a comprehensive and transparent record of the model’s performance and bias characteristics.
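To make the logged metrics concrete, the sketch below computes accuracy, statistical parity, and disparate impact by hand with NumPy. It uses the common definitions based on each group's rate of positive predictions (difference and ratio of group_a versus group_b); the helper names are hypothetical, and the HolisticAI implementations may differ in details.

```python
import numpy as np


def success_rate(y_pred, group):
    """Share of positive predictions among the rows selected by a boolean mask."""
    return float(np.mean(y_pred[group]))


def bias_report(group_a, group_b, y_pred, y_true):
    """Accuracy plus two group-fairness metrics for binary predictions."""
    sr_a = success_rate(y_pred, group_a)
    sr_b = success_rate(y_pred, group_b)
    return {
        "accuracy": float(np.mean(y_pred == y_true)),
        "statistical_parity": sr_a - sr_b,  # 0 means equal selection rates
        "disparate_impact": sr_a / sr_b,    # 1 means equal selection rates
    }


# Toy example: four individuals in each group.
group_a = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
group_b = ~group_a
y_pred = np.array([1, 0, 0, 0, 1, 1, 0, 0])
y_true = np.array([1, 0, 0, 1, 1, 1, 0, 1])

report = bias_report(group_a, group_b, y_pred, y_true)
print(report)
```

In this toy example, group_a receives a positive prediction a quarter of the time and group_b half of the time, which is why the ratio falls below 1 and the difference is negative.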
This approach facilitates reproducibility and thorough analysis, aligning with best practices in machine learning model development and evaluation.
To save the model’s and mitigator’s results we define a simple loop that iterates over a dictionary containing different mitigators, and for each mitigator, it invokes the train_model function with the corresponding name and mitigator settings.
The image below shows the results of the trained models. With the results in the MLflow UI, we can compare the performance of the models and the bias metrics.
We can also compare the results of the models using the mlflow API. The charts below show the results of different metrics for the models trained with and without bias mitigation.
In the next code snippet, we dive into the practical aspect of deploying and utilizing a machine learning model that has been logged and saved using MLflow. The process begins with loading a previously saved model, specifically the one trained with the ‘correlation_remover’ mitigator, which is retrieved using its unique run identifier.
The model is then loaded as a PyFuncModel, making it compatible with MLflow’s PyFunc API. Subsequently, predictions are made on a Pandas DataFrame, simulating real-world input data for the model. The data frame includes features from the test set along with corresponding group information. The predictions are then printed, showcasing how to seamlessly apply a previously trained model to new data.
To access the saved model, we need the run_id and model_name of the experiment. This information is available in the MLflow UI.
This code snippet highlights the ease of model deployment and prediction using MLflow, demonstrating the practical utility of the platform in the machine learning development lifecycle.
A FastAPI application is set up to serve predictions from a machine learning model trained and logged using MLflow.
First, we need to create a file called app.py. Use the following command on the terminal:
The script begins by configuring the MLflow tracking URI and loading the pre-trained model with the correlation_remover mitigator. FastAPI is then initialized, and a /predict route is defined to handle HTTP POST requests for making predictions.
Input data is expected to conform to a specific format, validated using a Pydantic model named InputData. The route’s function processes incoming data, converting it into a Pandas DataFrame, and generates predictions using the loaded MLflow model. Predictions are returned as a JSON response, and exception handling is implemented to manage potential errors, providing informative error messages with appropriate HTTP status codes.
To run the application we can use the following command on the terminal:
Finally, the FastAPI application is run on the development server, making the machine learning model accessible at http://127.0.0.1:8000/predict for real-time predictions through a user-friendly API. After running the application, we can access the API documentation at http://127.0.0.1:8000/docs. The image below shows our API waiting for input data.
This code demonstrates the integration of FastAPI and MLflow for deploying machine learning models trained with bias mitigation from the Holistic AI library, with a focus on ease of use and real-time prediction.
Now you can make predictions using the FastAPI application. To do this, you can use the following command on the terminal:
Next, import the essential libraries, including requests for making HTTP requests and pandas for handling data.
This step covers the client-side interaction with the FastAPI web application that hosts the machine learning model. First, the input data is prepared so that it mirrors the structure used during model training and testing. This includes augmenting the DataFrame with ‘group_a’ and ‘group_b’ columns.
The input data is then converted into JSON format using Pandas’ to_json method. Subsequently, a POST request is made to the local FastAPI server (‘http://127.0.0.1:8000/predict’) with the prepared JSON data.
The response from the server is captured, and predictions are extracted from the returned JSON content. This code provides a practical example of how to interact with a deployed machine learning model using client-side scripting.
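The client-side steps can be sketched as follows. The request body format (a data field holding a list of records), the helper names, and the example columns are assumptions for illustration; the body must match whatever schema the server's InputData model actually defines.

```python
import json

import pandas as pd
import requests


def build_payload(df):
    """Turn a DataFrame into a JSON-serializable request body."""
    # to_json handles NumPy and boolean types; json.loads gives plain Python objects.
    records = json.loads(df.to_json(orient="records"))
    return {"data": records}


def request_predictions(url, payload, timeout=10):
    """POST the payload to the prediction endpoint and return the parsed JSON."""
    response = requests.post(url, json=payload, timeout=timeout)
    response.raise_for_status()
    return response.json()


# Example input with the group columns added (values are made up).
example = pd.DataFrame({
    "age": [39, 50],
    "group_a": [True, False],
    "group_b": [False, True],
})
payload = build_payload(example)

# With the FastAPI application running locally:
# result = request_predictions("http://127.0.0.1:8000/predict", payload)
# print(result["predictions"])
```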
In conclusion, the presented code not only highlights the seamless integration of MLflow and FastAPI but also emphasizes the incorporation of the HolisticAI library for mitigating bias in machine learning models.
The use of MLflow enables effective model tracking, management, and deployment, ensuring transparency and reproducibility in the machine learning development lifecycle. FastAPI’s modern design and automatic OpenAPI and JSON Schema generation provide an efficient platform for building robust APIs, facilitating real-time predictions.
Additionally, the code showcases the integration of HolisticAI, a library designed to address bias in models. By incorporating bias measurement and mitigation techniques, such as the ‘correlation_remover’ used in the showcased example, developers can improve the fairness of their machine learning models and better account for ethical considerations.
This approach, combining MLflow, FastAPI, and HolisticAI, serves as a comprehensive guide for deploying and consuming bias-aware machine learning models, promoting responsible and inclusive AI practices in production environments.
While integrating the Holistic AI Library for bias tracking and mitigation is a crucial step, it's just the beginning of your journey towards comprehensive AI governance. To further build on this success, explore our 360-degree AI governance, regulatory, and compliance platform. Our platform offers a more holistic approach, encompassing every aspect of AI deployment and management.
Schedule a consultation with our team today to discover how you can leverage our full suite of tools and services to ensure your AI systems are not only bias-free but also fully compliant with the latest regulations and best practices in AI governance. Let us help you lead the way in responsible AI implementation.
DISCLAIMER: This blog article is for informational purposes only. This blog article is not intended to, and does not, provide legal advice or a legal opinion. It is not a do-it-yourself guide to resolving legal issues or handling litigation. This blog article is not a substitute for experienced legal counsel and does not provide legal advice regarding any situation or employer.