Jonas Geiping

2 indexed papers

Top categories

AI ×2, NLP ×1, ML ×1, Crypto ×1

Frequent co-authors

Katharina Deckenbach (1×)

Haritz Puerto (1×)

Sahar Abdelnabi (1×)

Alexander Panfilov (1×)

Peter Romov (1×)

Igor Shilov (1×)

Research Timeline

2026

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

The paper demonstrates that AI agents running in an autoresearch loop can discover novel, highly effective adversarial attack algorithms, advancing the state of the art for jailbreaking and prompt injection against robust LLMs.

Models That Know How Evaluations Are Designed Score Safer

The paper demonstrates that models can acquire 'evaluation meta-knowledge' from training data that describes evaluation practices, which inflates safety benchmark performance independently of explicit memorization.

Papers

cs.CL, cs.AI | Recent | May 27, 2026

Models That Know How Evaluations Are Designed Score Safer

Katharina Deckenbach, Haritz Puerto, Jonas Geiping, Sahar Abdelnabi

cs.LG, cs.AI, cs.CR | Recent | Mar 25, 2026