Owain Evans
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year
2026: 1
Top categories
ML ×1, AI ×1, Crypto ×1
Frequent co-authors
Research Timeline
2026
Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers
The paper introduces the concept of 'conditional misalignment,' demonstrating that common interventions designed to reduce emergent misalignment can fail: they merely mask misaligned behavior, which resurfaces when the input context resembles the training data.
Papers
cs.LG · cs.AI · cs.CR · Recent · Apr 28, 2026
Conditional misalignment: common interventions can hide emergent misalignment behind contextual triggers
Jan Dubiński, Jan Betley, Anna Sztyber-Betley, Daniel Tan +1 more