
Anna Sztyber-Betley

1 indexed paper

Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year: 1 (2026)

Top categories

ML ×1, AI ×1, Crypto ×1

Frequent co-authors

Jan Dubiński (1×)
Jan Betley (1×)
Daniel Tan (1×)
Owain Evans (1×)

Research Timeline

2026
Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers

The paper introduces the concept of 'conditional misalignment,' showing that common interventions meant to reduce emergent misalignment can fail by merely masking misaligned behavior until the input context resembles the training data.


Papers

cs.LG, cs.AI, cs.CR | Recent | Apr 28, 2026

Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers

Jan Dubiński, Jan Betley, Anna Sztyber-Betley, Daniel Tan +1 more

