Control Reinforcement Learning: Interpretable Token-Level Steering of LLMs via Sparse Autoencoder Features

Academic Paper

Get the insights you need to stay compliant and competitive in the evolving AI landscape.

https://cdn.prod.website-files.com/6305e5d52c28356b4fe71bac/6a0eddd27d199a7856d3517f_2602.10437v3_compressed.pdf