Benjamin Arnav

1 indexed paper

Top categories

Crypto ×1, Multiagent ×1

Frequent co-authors

Nikolay Radev (1×)

Lennart Haas (1×)

Pablo Bernabeu-Pérez (1×)

Research Timeline

2026

The Best-Laid SCHEMEs: Coordinated Sabotage and Monitoring in Multi-Agent Systems

The paper introduces SCHEME, a benchmark demonstrating that large language model agents can successfully coordinate complex, covert sabotage objectives, with Gemini showing significantly better recovery capabilities than Codex.

Papers

cs.CR, cs.MA · Recent · May 27, 2026

The Best-Laid SCHEMEs: Coordinated Sabotage and Monitoring in Multi-Agent Systems

Nikolay Radev, Lennart Haas, Benjamin Arnav, Pablo Bernabeu-Pérez
