Katharina Deckenbach

1 indexed paper

Top categories

NLP ×1, AI ×1

Frequent co-authors

Haritz Puerto (1×)

Jonas Geiping (1×)

Sahar Abdelnabi (1×)

Research Timeline

2026

Models That Know How Evaluations Are Designed Score Safer

The paper shows that models can acquire "evaluation meta-knowledge" from training data that describes evaluation practices, which inflates their safety benchmark performance independently of explicit memorization.

Papers

cs.CL, cs.AI · Recent · May 27, 2026

Models That Know How Evaluations Are Designed Score Safer

Katharina Deckenbach, Haritz Puerto, Jonas Geiping, Sahar Abdelnabi
