MPF: Aligning and Debiasing Language Models post Deployment via Multi Perspective Fusion

The study finds that large language models amplify political bias when repeatedly trained on their own synthetic outputs, a process distinct from model collapse that calls for its own targeted mitigations.

This paper examines how large language models change when repeatedly trained on their own synthetic data. Focusing on political bias, the authors show that models such as GPT-2 gradually drift toward one side of the political spectrum, particularly the right, as training cycles accumulate. They test several methods to control this bias and find that it persists even when model collapse is prevented. Their analysis also indicates that bias amplification and model collapse arise from different mechanisms and therefore require different solutions.
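The recursive setup can be pictured as a loop: generate a synthetic corpus from the current model, retrain on that corpus, and probe the political lean of the result before the next round. The sketch below is a minimal toy illustration of that loop, not the authors' code: the "model" is reduced to a single lean score, and `generate_texts`, `fit_to_corpus`, and `measure_bias` are hypothetical stand-ins; the small per-generation skew is an artificial assumption used only to make cumulative drift visible.

```python
import random

# Toy stand-in for the recursive-training setup described above.
# A real experiment would fine-tune GPT-2 on its own generations each
# round; here the "model" is just a political-lean score in [-1, 1]
# (negative = left, positive = right) that is refit to its own samples.

def generate_texts(lean: float, n: int = 500) -> list[float]:
    """Sample a synthetic corpus; each 'text' is summarised by a lean score.
    The +0.02 skew is an artificial assumption standing in for whatever
    asymmetry drives the drift reported in the paper."""
    return [max(-1.0, min(1.0, random.gauss(lean, 0.4) + 0.02)) for _ in range(n)]

def fit_to_corpus(samples: list[float]) -> float:
    """'Fine-tune' the toy model: its new lean is the mean of its training data."""
    return sum(samples) / len(samples)

def measure_bias(lean: float) -> float:
    """Hypothetical bias probe; a real study would score generated text with a
    political-compass style classifier."""
    return lean

random.seed(0)
lean = 0.05  # small initial tilt
for generation in range(1, 11):
    corpus = generate_texts(lean)          # model generates its own training data
    # A mitigation hook (filtering or rebalancing the corpus) would go here.
    lean = fit_to_corpus(corpus)           # retrain on the synthetic corpus
    print(f"generation {generation:2d}: measured lean = {measure_bias(lean):+.3f}")
```

Swapping the toy stand-ins for a real generator, a fine-tuning step, and a political-lean classifier recovers the general shape of the experiment the summary describes, with the bias measurement repeated after every training cycle.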
