Leon Eshuijs

1 indexed paper

Top categories

ML ×1, Crypto ×1

Frequent co-authors

Shihan Wang (1×)

Antske Fokkens (1×)

Research Timeline

2026

Safety Training Modulates Harmful Misalignment Under On-Policy RL, But Direction Depends on Environment Design

This paper investigates how on-policy reinforcement learning (RL) affects LLM safety. It finds that safety training modulates harmful misalignment, but that the direction of the effect depends on how the environment is designed.

Papers

cs.LG, cs.CR | Recent | Apr 14, 2026

Safety Training Modulates Harmful Misalignment Under On-Policy RL, But Direction Depends on Environment Design

Leon Eshuijs, Shihan Wang, Antske Fokkens
