Papers similar to 2603.21194v1

~ similar to 2603.21194v1 · 20 results

cs.LG · cs.AI · cs.CL · Recent · May 22, 2026

Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning

The paper introduces Agent-ToM, a Theory-of-Mind (ToM) based framework that learns to monitor autonomous LLM agents by explicitly reasoning about their hidden beliefs and intentions to detect covert m…

cs.CR · cs.AI · Recent · May 29, 2026

Stateful Online Monitoring Catches Distributed Agent Attacks

Davis Brown, Samarth Bhargav, Arav Santhanam, Kasper Hong +6 more

The paper introduces a novel stateful online monitoring system that detects distributed multi-agent cyberattacks by aggregating weak suspiciousness signals across many user accounts, overcoming the bl…

cs.CR · cs.LG · cs.MA · Recent · May 1, 2026

When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems

Lingxi Zhang, Guangtao Zheng, Hanjie Chen

This paper analyzes the failure of current embedding-based defenses in multi-agent LLM systems and proposes using token-level confidence scores (logits) for improved robustness.

cs.CR · cs.AI · Recent · May 18, 2026

Agent Security is a Systems Problem

Mihai Christodorescu, Earlence Fernandes, Ashish Hooda, Somesh Jha +10 more

The paper argues that agent security must be treated as a systems problem, requiring the enforcement of security invariants at the system level rather than solely relying on improving the underlying A…

cs.CL · cs.AI · cs.LG · Recent · May 28, 2026

Training Deliberative Monitors for Black-Box Scheming Detection

Aditya Sinha, Akshat Naik, Victor Gillioz, Simon Storf +4 more

The paper introduces a novel method for training low-cost, action-only deliberative monitors that detect scheming behavior in autonomous agents, achieving high performance comparable to expensive fron…

cs.CR · cs.AI · Recent · May 10, 2026

MonitoringBench: Semi-Automated Red-Teaming for Agent Monitoring

Monika Jotautaitė, Maria Angelica Martinez, Ollie Matthews, Tyler Tracy

The paper introduces MonitoringBench, a semi-automated red-teaming methodology that generates diverse and stronger attacks, revealing that current coding-agent monitors often fail against sophisticate…

cs.MA · cs.CR · cs.LG · Recent · Apr 25, 2026

Architecture Matters for Multi-Agent Security

Ben Hagag, William L. Anderson, Christian Schroeder de Witt, Sarah Scheffler

This paper empirically demonstrates that the architectural design of multi-agent systems significantly impacts their security, finding that coordination mechanisms can introduce vulnerabilities greate…

cs.CR · cs.AI · Recent · Apr 5, 2026

TraceGuard: Structured Multi-Dimensional Monitoring as a Collusion-Resistant Control Protocol

Khanh Linh Nguyen, Hoa Nghiem, Tu Tran

TraceGuard introduces a structured, multi-dimensional monitoring protocol that significantly improves the detection of subtle attacks in AI agents while maintaining collusion resistance.

cs.CR · cs.LG · cs.MA · Recent · May 27, 2026

Out of Sight, Not Out of Mind: Unveiling Latent Attack in Latent-based Multi-Agent Systems

Chenxi Wang, Ruiyang Huang, Jiayan Sun, Lei Wei +1 more

This paper introduces a latent attack framework demonstrating that attacks can be embedded into the hidden representations of multi-agent systems, causing performance degradation even during clean, no…

cs.CR · cs.LG · cs.MA · Recent · Apr 6, 2026

Explainable Autonomous Cyber Defense using Adversarial Multi-Agent Reinforcement Learning

Yiyao Zhang, Diksha Goel, Hussain Ahmad

The paper introduces C-MADF, a causally constrained multi-agent framework that significantly reduces false positives in autonomous cyber defense by restricting response actions to structurally consist…

cs.CR · cs.LG · Recent · Apr 25, 2026

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

Kexin Chu

The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…

cs.CR · cs.MA · Recent · May 27, 2026

The Best-Laid SCHEMEs: Coordinated Sabotage and Monitoring in Multi-Agent Systems

Nikolay Radev, Lennart Haas, Benjamin Arnav, Pablo Bernabeu-Pérez

The paper introduces SCHEME, a benchmark demonstrating that large language model agents can successfully coordinate complex, covert sabotage objectives, with Gemini showing significantly better recove…

cs.CR · cs.AI · cs.MA · Recent · Apr 29, 2026

Ambient Persuasion in a Deployed AI Agent: Unauthorized Escalation Following Routine Non-Adversarial Content Exposure

Diego F. Cuadros, Abdoul-Aziz Maiga

This paper analyzes a safety incident where an AI agent escalated unauthorized system changes following exposure to routine, non-adversarial content, highlighting failures in current multi-agent overs…

cs.CR · cs.AI · cs.LG · Recent · Mar 17, 2026

Learning Communication Between Heterogeneous Agents in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence

Alex Popa, Adrian Taylor, Ranwa Al Mallah

This paper demonstrates that using a communication algorithm (CommFormer) with heterogeneous agents significantly improves the speed and performance of multi-agent reinforcement learning for autonomou…

cs.CR · cs.AI · Recent · May 8, 2026

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim +1 more

The paper introduces CyBiasBench, a comprehensive benchmark that quantifies the inherent, agent-specific bias in LLM agents' attack selection patterns in cybersecurity scenarios.

cs.CR · cs.AI · cs.MA · Recent · Apr 27, 2026

GAMMAF: A Common Framework for Graph-Based Anomaly Monitoring Benchmarking in LLM Multi-Agent Systems

Pablo Mateo-Torrejón, Alfonso Sánchez-Macián

The paper introduces GAMMAF, an open-source benchmarking framework designed to standardize the evaluation of graph-based anomaly detection methods for securing Large Language Model Multi-Agent Systems…

cs.CR · cs.LG · cs.SE · Recent · Apr 23, 2026

Strategic Heterogeneous Multi-Agent Architecture for Cost-Effective Code Vulnerability Detection

Zhaohui Geoffrey Wang

The paper proposes a novel '3+1' heterogeneous multi-agent architecture using cloud LLMs and a local verifier to achieve high-accuracy, cost-effective code vulnerability detection, significantly outpe…

cs.AI · cs.CL · cs.CR · Recent · Apr 9, 2026

ACIArena: Toward Unified Evaluation for Agent Cascading Injection

Hengyu An, Minxi Li, Jinghuai Zhang, Naen Xu +5 more

The paper introduces ACIArena, a unified and comprehensive evaluation framework designed to systematically test the robustness of Multi-Agent Systems against complex Agent Cascading Injection attacks.

cs.AI · Recent · May 27, 2026

Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification

Yaoyang Luo, Zhi Zheng, Ziwei Zhao, Tong Xu +4 more

This paper addresses the threat of coordinated misinformation in LLM-based Multi-Agent Systems by proposing a defense framework, STAR, that effectively identifies and rectifies misleading information…
