Papers similar to 2605.22568v1

~ similar to 2605.22568v1· 20 results

cs.AIcs.CRRecentMay 12, 2026

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

Hao Wang, Hanchen Li, Qiuyang Mang, Alvin Cheung +2 more

The paper introduces BenchJack, an automated red-teaming system that systematically audits popular AI agent benchmarks, revealing numerous reward-hacking exploits and demonstrating a method to signifi…

View →

cs.CRcs.AIcs.SERecentMay 21, 2026

Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions

Jianan Ma, Xiaohu Du, Ruixiao Lin, Yaoxiang Bian +7 more

The paper introduces a multi-dimensional evasion framework and a new benchmark (A3S-Bench) to test autonomous agents, demonstrating that stateful, multi-turn attacks significantly increase system risk…

View →

cs.CRcs.AIRecentMay 26, 2026

Lessons from Penetration Tests on Large-Scale Agent Systems

Kevin Eykholt, Dhilung Kirat, Xiaokui Shu, Jiyong Jang +2 more

The paper reports on penetration tests conducted on proprietary, large-scale AI agent systems, finding that security vulnerabilities persist despite stricter development standards.

View →

cs.AIcs.CLcs.CRRecentApr 9, 2026

ACIArena: Toward Unified Evaluation for Agent Cascading Injection

Hengyu An, Minxi Li, Jinghuai Zhang, Naen Xu +5 more

The paper introduces ACIArena, a unified and comprehensive evaluation framework designed to systematically test the robustness of Multi-Agent Systems against complex Agent Cascading Injection attacks.

View →

cs.CRcs.CLRecentMay 14, 2026

Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

Karthik Raghu Iyer, Yazdan Jamshidi, Nicholas Bray, Alexey A. Shvets

The paper introduces a comprehensive taxonomy and auditing framework to assess the collective coverage of existing LLM attack benchmarks, revealing significant and systematic gaps in current testing m…

View →

cs.AIcs.CRRecentMay 11, 2026

From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World

Pedro Conde, Henrique Branquinho, Valerio Mazzone, Bruno Mendes +2 more

The paper introduces a novel, practical evaluation protocol that shifts the assessment of AI pentesting agents from simple task completion to validated, open-ended vulnerability discovery in complex,…

View →

cs.CRcs.AIRecentMay 18, 2026

Agent Security is a Systems Problem

Mihai Christodorescu, Earlence Fernandes, Ashish Hooda, Somesh Jha +10 more

The paper argues that agent security must be treated as a systems problem, requiring the enforcement of security invariants at the system level rather than solely relying on improving the underlying A…

View →

cs.AIRecentMay 27, 2026

Benchmarking AI for low-resource contexts: Thinking beyond leaderboards

Aakash Pant, Kavya Shah, Apoorv Agnihotri, Sneha Nikam +2 more

The paper critiques current AI benchmarking practices for low-resource settings, arguing that evaluation must shift focus from isolated model performance to the holistic performance of the deployed sy…

View →

cs.CRcs.AIRecentMar 30, 2026

Design Principles for the Construction of a Benchmark Evaluating Security Operation Capabilities of Multi-agent AI Systems

Yicheng Cai, Mitchell John DeStefano, Guodong Dong, Pulkit Handa +4 more

This paper proposes a set of design principles and a conceptual benchmark (SOC-bench) to systematically evaluate the blue team operational capabilities of multi-agent AI systems in autonomous Security…

View →

cs.CRcs.AIRecentApr 21, 2026

Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps

Alankrit Chona, Igor Kozlov, Ambuj Kumar

The paper introduces a challenging benchmark for LLM agents to perform unsupervised threat hunting on raw Windows event logs, finding that current frontier models perform poorly and are not ready for…

View →

cs.AIcs.CRRecentMay 6, 2026

AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use

Chenglin Yang

AgentTrust is a novel runtime safety layer that intercepts and evaluates AI agent tool calls before execution, achieving high accuracy in detecting unsafe actions across complex and obfuscated scenari…

View →

cs.CRcs.AIcs.LGRecentJun 3, 2026

CyberGym-E2E: Scalable Real-World Benchmark for AI Agents' End-to-End Cybersecurity Capabilities

Tianneng Shi, Robin Rheem, Dongwei Jiang, Mona Wang +12 more

The paper introduces CyberGym-E2E, a large-scale, end-to-end benchmark designed to comprehensively evaluate AI agents' capabilities across the entire lifecycle of real-world software vulnerability dis…

View →

cs.CRcs.AIRecentMay 22, 2026

When the Manual Lies: A Realistic Benchmark to Evaluate MCP Poisoning Attacks for LLM Agents

Shi Liu, Xuehai Tang, Xikang Yang, Liang Lin +3 more

This paper introduces a new benchmark to test Tool Description Poisoning (TDP) attacks on LLM agents, demonstrating that even advanced models like GPT-4o are highly vulnerable and that current defense…

View →

cs.CRcs.AIRecentJun 3, 2026

Search-Time Contamination in Deep Research Agents: Measuring Performance Inflation in Public Benchmark Evaluation

Yongjie Wang, Xinyue Zhang, Kunhong Yao, Zhiwei Zeng +3 more

The paper introduces the concept of Search-Time Contamination (STC), demonstrating that deep research agents can leak information from public benchmarks via web search, leading to an overestimation of…

View →

cs.CRRecentMay 2, 2026

Toward a Principled Framework for Agent Safety Measurement

Shuyi Lin, Anshuman Suri, Alina Oprea, Cheng Tan

The paper introduces BOA, a novel framework that measures agent safety by exhaustively searching the entire in-budget trajectory space, thereby identifying unsafe behaviors missed by traditional sampl…

View →

cs.CRcs.AIRecentApr 3, 2026

A Systematic Security Evaluation of OpenClaw and Its Variants

Yuhang Wang, Haichang Gao, Zhenxing Niu, Zhaoxiang Liu +3 more

The paper systematically evaluates six OpenClaw-series AI agent frameworks, demonstrating that these agentized systems possess significant security vulnerabilities that are distinct from and more seve…

View →

cs.CYcs.CRRecentMay 20, 2026

Backchaining Loss of Control Mitigations from Mission-Specific Benchmarks in National Security

Matteo Pistillo, Samantha Faraone, Joshua Herman

The paper proposes a novel, empirical methodology called 'backchaining' to derive and prioritize Loss of Control (LoC) mitigations by analyzing the errors an AI system makes on mission-specific nation…

View →

cs.MAcs.CRcs.LGRecentApr 25, 2026

Architecture Matters for Multi-Agent Security

Ben Hagag, William L. Anderson, Christian Schroeder de Witt, Sarah Scheffler

This paper empirically demonstrates that the architectural design of multi-agent systems significantly impacts their security, finding that coordination mechanisms can introduce vulnerabilities greate…

View →

cs.CRcs.AIRecentMay 19, 2026

Measuring Safety Alignment Effects in Autonomous Security Agents

Isaac David, Arthur Gervais

The study evaluates how safety alignment affects autonomous security agents using a comprehensive trace-based benchmark, finding that while less-restricted models show gains, these effects are not uni…

View →

cs.CRcs.AIRecentApr 30, 2026

Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study

Luyao Xu, Xiang Chen

This paper provides a systematic, layered review of security risks and defense strategies for autonomous agent frameworks, using OpenClaw as a case study to address the current lack of integrated rese…

View →