MPF: Aligning and Debiasing Language Models post Deployment via Multi Perspective Fusion

The study finds that large language models amplify political bias when repeatedly trained on their own synthetic outputs, a process distinct from model collapse that calls for its own targeted mitigations.

This paper examines how large language models change when repeatedly trained on their own synthetic data. Focusing on political bias, the authors show that models such as GPT-2 gradually drift toward one side of the political spectrum, particularly the right, as training cycles accumulate. They test several methods to control this bias and find that it persists even when model collapse is prevented. Their analysis also indicates that bias amplification and model collapse arise from different mechanisms and therefore require different solutions.
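The recursive setup can be pictured as a loop: generate a synthetic corpus from the current model, retrain on that corpus, and probe the political lean of the result before the next round. The sketch below is a minimal toy illustration of that loop, not the authors' code: the "model" is reduced to a single lean score, and `generate_texts`, `fit_to_corpus`, and `measure_bias` are hypothetical stand-ins; the small per-generation skew is an artificial assumption used only to make cumulative drift visible.

```python
import random

# Toy stand-in for the recursive-training setup described above.
# A real experiment would fine-tune GPT-2 on its own generations each
# round; here the "model" is just a political-lean score in [-1, 1]
# (negative = left, positive = right) that is refit to its own samples.

def generate_texts(lean: float, n: int = 500) -> list[float]:
    """Sample a synthetic corpus; each 'text' is summarised by a lean score.
    The +0.02 skew is an artificial assumption standing in for whatever
    asymmetry drives the drift reported in the paper."""
    return [max(-1.0, min(1.0, random.gauss(lean, 0.4) + 0.02)) for _ in range(n)]

def fit_to_corpus(samples: list[float]) -> float:
    """'Fine-tune' the toy model: its new lean is the mean of its training data."""
    return sum(samples) / len(samples)

def measure_bias(lean: float) -> float:
    """Hypothetical bias probe; a real study would score generated text with a
    political-compass style classifier."""
    return lean

random.seed(0)
lean = 0.05  # small initial tilt
for generation in range(1, 11):
    corpus = generate_texts(lean)          # model generates its own training data
    # A mitigation hook (filtering or rebalancing the corpus) would go here.
    lean = fit_to_corpus(corpus)           # retrain on the synthetic corpus
    print(f"generation {generation:2d}: measured lean = {measure_bias(lean):+.3f}")
```

Swapping the toy stand-ins for a real generator, a fine-tuning step, and a political-lean classifier recovers the general shape of the experiment the summary describes, with the bias measurement repeated after every training cycle.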
