Marius Hobbhahn
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year
Top categories
NLP ×1 · AI ×1 · ML ×1
Research Timeline
2026
Training Deliberative Monitors for Black-Box Scheming Detection
The paper introduces a novel method for training low-cost, action-only deliberative monitors that detect scheming behavior in autonomous agents, achieving high performance comparable to expensive frontier models.
Papers
cs.CL · cs.AI · cs.LG · Recent · May 28, 2026
Training Deliberative Monitors for Black-Box Scheming Detection
Aditya Sinha, Akshat Naik, Victor Gillioz, Simon Storf +4 more