Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Koushik Sen

Koushik Sen

1 indexed paper

Recent (6 mo)
1
With code
0
Influential cites
0
Benchmarked
0

Publications per year

1
26

Top categories

AI×1Crypto×1

Frequent co-authors

Hao Wang1×
Hanchen Li1×
Qiuyang Mang1×
Alvin Cheung1×
Dawn Song1×

Research Timeline

2026
Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

The paper introduces BenchJack, an automated red-teaming system that systematically audits popular AI agent benchmarks, revealing numerous reward-hacking exploits and demonstrating a method to significantly improve benchmark robustness.

Highlighted terms show continued research focus across papers

Papers

cs.AIcs.CRRecentMay 12, 2026

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

Hao Wang, Hanchen Li, Qiuyang Mang, Alvin Cheung +2 more

The paper introduces BenchJack, an automated red-teaming system that systematically audits popular AI agent benchmarks, revealing numerous reward-hacking exploits and demonstrating a method to signifi…

View →