Leon Eshuijs
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year (chart): 1 in 2026
Top categories
ML ×1, Crypto ×1
Research Timeline
2026
Safety Training Modulates Harmful Misalignment Under On-Policy RL, But Direction Depends on Environment Design
This paper investigates how on-policy reinforcement learning (RL) affects LLM safety. It finds that safety training modulates harmful misalignment, but that the direction of the effect depends on specific features of the environment design.