Model collapse occurs when a language model’s performance degrades after it is repeatedly trained on its own synthetic data. While this failure mode is well studied, far less attention has been paid to bias amplification, in which a model progressively reinforces the political or social biases already present in its outputs.
To study this, we built a benchmark from U.S. political news and used it to evaluate GPT-2. With each cycle of training on its own generated text, the model became more politically skewed, especially toward right-leaning bias.
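To make the setup concrete, here is a minimal sketch of this kind of iterative self-training loop. It is illustrative only, not the authors' code: `fine_tune` and `political_bias_score` are hypothetical placeholders standing in for the paper's training procedure and bias benchmark.

```python
# Sketch of an iterative synthetic-data training loop (illustrative only).
# `fine_tune` and `political_bias_score` are hypothetical placeholders for
# the paper's training setup and political-bias benchmark.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def generate_corpus(model, tokenizer, prompts, max_new_tokens=128):
    """Sample synthetic articles from the current model generation."""
    texts = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        output = model.generate(
            **inputs,
            do_sample=True,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
        texts.append(tokenizer.decode(output[0], skip_special_tokens=True))
    return texts

def political_bias_score(texts):
    """Placeholder: average political lean of the corpus in [-1, 1] (left to right)."""
    return 0.0  # plug in the political-bias benchmark here

def fine_tune(model, tokenizer, corpus):
    """Placeholder: causal-LM fine-tuning on the synthetic corpus."""
    return model  # plug in a standard training loop / Trainer here

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
prompts = ["The senator announced", "In Washington today,"]

for cycle in range(5):  # successive cycles of training on the model's own output
    corpus = generate_corpus(model, tokenizer, prompts)
    print(f"cycle {cycle}: bias = {political_bias_score(corpus):+.2f}")
    model = fine_tune(model, tokenizer, corpus)
```

The key point the loop illustrates is that each generation is trained on text sampled from the previous one, so any drift in the bias score compounds across cycles.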
We tried three mitigation strategies: Overfitting, Preservation, and Accumulation. None of them stopped the bias from growing, even when model collapse itself was kept under control. A neuron-level analysis showed that largely distinct groups of neurons drive collapse and amplification, indicating that the two arise from separate mechanisms.
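To illustrate that last point, comparing the neuron groups implicated in each phenomenon can be as simple as an overlap measure. The sketch below is our own illustration, not the paper's analysis code, and the index sets are made-up examples.

```python
# Illustrative only: measure how much two groups of neurons overlap.
# In the paper's setting, the two sets would be the neurons identified as
# driving collapse and bias amplification, respectively.
def jaccard_overlap(a: set[int], b: set[int]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; 0.0 means fully disjoint groups."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

collapse_neurons = {12, 87, 203, 415, 608}        # hypothetical example indices
amplification_neurons = {44, 190, 512, 777, 901}  # hypothetical example indices

print(f"overlap = {jaccard_overlap(collapse_neurons, amplification_neurons):.2f}")
# A low overlap is consistent with collapse and amplification arising from
# separate mechanisms.
```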
This work highlights that bias amplification is not merely a side effect of model collapse; it needs its own targeted solutions to keep LLMs reliable and fair.

