The study finds that large language models like GPT-2 can grow more politically biased, in this case increasingly right-leaning, when trained repeatedly on their own synthetic outputs. This bias amplification can occur even while overall model quality remains stable, meaning bias can worsen without “model collapse.” The researchers show that different sets of neurons drive bias and collapse, so the two problems need separate solutions; mitigating one will not fix the other.
Model collapse occurs when a language model’s performance degrades after repeated training on its own synthetic data. While this issue is well studied, far less attention has been paid to bias amplification, in which models progressively reinforce political or social biases across training generations.
To study this, we built a benchmark using U.S. political news and tested it on GPT-2. We found that with each training cycle, the model became more politically skewed, especially toward right-leaning bias.
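To make the setup concrete, the following is a minimal sketch of a recursive self-training loop with a bias probe, not the authors’ exact pipeline. It assumes the Hugging Face transformers library, a few hypothetical news-style prompts, and a placeholder `political_lean_score` function standing in for the paper’s U.S. political news benchmark.

```python
# Hedged sketch: generate -> score political lean -> retrain on own outputs.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token
model = GPT2LMHeadModel.from_pretrained("gpt2")

def political_lean_score(texts):
    """Hypothetical placeholder: return a lean score in [-1, 1] per text
    (-1 = left, +1 = right). The paper uses its own news-based benchmark."""
    return [0.0 for _ in texts]

def generate_synthetic(prompts, n_tokens=80):
    """Sample continuations from the current model to form the next training set."""
    outputs = []
    for p in prompts:
        ids = tok(p, return_tensors="pt").input_ids
        gen = model.generate(ids, max_new_tokens=n_tokens, do_sample=True,
                             top_p=0.9, pad_token_id=tok.eos_token_id)
        outputs.append(tok.decode(gen[0], skip_special_tokens=True))
    return outputs

def finetune_one_epoch(texts, lr=5e-5):
    """One pass of causal-LM fine-tuning on the synthetic texts."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for t in texts:
        batch = tok(t, return_tensors="pt", truncation=True, max_length=256)
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()
    model.eval()

prompts = ["The senator announced", "In a statement, the White House said"]
for generation in range(3):                      # repeated self-training cycles
    synthetic = generate_synthetic(prompts)      # model's own outputs
    lean = sum(political_lean_score(synthetic)) / len(synthetic)
    print(f"generation {generation}: mean lean {lean:+.2f}")
    finetune_one_epoch(synthetic)                # retrain on synthetic data
```

In the study, the lean score measured at each cycle is what drifts rightward as the loop repeats.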
We tried three strategies (Overfitting, Preservation, and Accumulation), but none stopped the bias from growing, even when model collapse was controlled. Our analysis showed that different groups of neurons drive collapse and amplification, suggesting they arise from separate mechanisms.
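One way to picture the three regimes is by how each one assembles the training corpus for the next cycle. The sketch below is an illustrative interpretation, not the paper’s exact recipe: the strategy names are from the text, but the specific data mixes are assumptions.

```python
# Hedged sketch of how each regime might build the next generation's training set.
from typing import List

def next_training_set(strategy: str,
                      real_data: List[str],
                      past_synthetic: List[List[str]],
                      new_synthetic: List[str]) -> List[str]:
    """Assemble the corpus for the next fine-tuning cycle (illustrative only).

    strategy:
      - "overfitting":  train only on the newest synthetic outputs
      - "preservation": keep the original human-written data alongside the
                        newest synthetic outputs
      - "accumulation": keep the original data plus every synthetic generation so far
    """
    if strategy == "overfitting":
        return list(new_synthetic)
    if strategy == "preservation":
        return list(real_data) + list(new_synthetic)
    if strategy == "accumulation":
        past = [t for gen in past_synthetic for t in gen]
        return list(real_data) + past + list(new_synthetic)
    raise ValueError(f"unknown strategy: {strategy}")

# Example: one generation under each regime
real = ["human-written article A", "human-written article B"]
history = [["gen-0 synthetic text"]]
new = ["gen-1 synthetic text"]
for s in ("overfitting", "preservation", "accumulation"):
    print(s, "->", len(next_training_set(s, real, history, new)), "documents")
```

Even under mixes like these, which differ mainly in how much original data is retained, the study found that political bias kept amplifying.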
This work highlights that bias amplification is not just a side effect of collapse. It needs its own targeted solutions to ensure LLMs stay reliable and fair.