ArXivCSExplorer

Leon Eshuijs

1 indexed paper

Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year: 2026 (1)

Top categories

ML ×1, Crypto ×1

Frequent co-authors

Shihan Wang (1×)
Antske Fokkens (1×)

Research Timeline

2026
Safety Training Modulates Harmful Misalignment Under On-Policy RL, But Direction Depends on Environment Design

This paper investigates how on-policy reinforcement learning (RL) affects LLM safety. It finds that safety training modulates harmful misalignment, but that the direction of this effect depends strongly on specific features of the environment design.


Papers

cs.LG, cs.CR · Recent · Apr 14, 2026

Safety Training Modulates Harmful Misalignment Under On-Policy RL, But Direction Depends on Environment Design

Leon Eshuijs, Shihan Wang, Antske Fokkens

