~ similar to 2606.02282· 20 results
This paper addresses the critical need for trustworthy LLMs in science by proposing a comprehensive, multi-layered defense framework and methodology to evaluate unique scientific vulnerabilities.
Xianyou Li, Weiran Yan, Yichao Wu, Penghao Liang +3 more
This paper introduces a failure-aware observability framework to diagnose wasted computation in multi-agent LLM systems by mapping recurring failure modes to online trace signals.
Jinhu Qi, Muzhi Li, Jiahong Liu, Yuqin Shu +8 more
This survey provides a comprehensive, practical guide to ensuring the trustworthiness of complex, autonomous agentic AI systems by focusing on safety, robustness, privacy, and system security.
Md Nakhla Rafi, Md Ahasanuzzaman, Dong Jae Kim, Zhijie Wang +1 more
FALAT is a diagnostic framework that treats failure attribution in complex LLM agent trajectories as a dependency-guided search problem, successfully identifying both the responsible agent and the dec…
Haoyu Wang, Zibo Xiao, Yedi Zhang, Christopher M. Poskitt +1 more
The paper proposes SafeClaw-R, a novel framework that enforces safety as a system-level invariant over the execution graph to mitigate the high safety and security risks inherent in autonomous multi-a…
Hengyu An, Minxi Li, Jinghuai Zhang, Naen Xu +5 more
The paper introduces ACIArena, a unified and comprehensive evaluation framework designed to systematically test the robustness of Multi-Agent Systems against complex Agent Cascading Injection attacks.
The paper introduces and measures 'accidental meltdown,' a new type of unsafe agent behavior triggered by benign environmental errors, finding that such meltdowns occur frequently and often involve hi…
Sen Fang, Weiyuan Ding, Zhezhen Cao, Zhou Yang +1 more
AEGIS is a novel multi-agent framework that grounds vulnerability reasoning by reconstructing per-variable dependency chains over a Code Property Graph, achieving state-of-the-art performance on the P…
Xiang Liu, Sa Song, Zhaowei Zhang, Huiying Lan +5 more
The paper introduces Agora, a domain-aware multi-agent framework that successfully detects deep, previously unknown logic bugs in complex consensus protocols, outperforming existing LLM-based analysis…
Qi Hu, Yifeng Tang, Qinghua Wang, Lanyang Zhao +6 more
The paper introduces SABER, a new benchmark that evaluates the operational safety of LLM coding agents in complex, stateful project environments, finding that current models have a high rate of harmfu…
The paper introduces Post-Deterministic Distributed Systems (PDDS) as a new model to coordinate autonomous infrastructure where participants, including stochastic agents, produce divergent reasoning p…
The paper introduces SafeAudit, a meta-audit framework that systematically enumerates test cases and uses a quantitative metric to uncover significant residual unsafe behaviors in LLM agents that exis…
Zhen Huang, Zhihuang Liu, Mengxuan Luo, Weishang Wu +1 more
The paper proposes a novel attack paradigm demonstrating how compromising a single robot in an LLM-controlled multi-robot system can rapidly propagate malicious intent to cause coordinated unsafe acti…
Xunguang Wang, Yuguang Zhou, Qingyue Wang, Zongjie Li +4 more
This paper introduces a novel framework, the Reasoning Safety Monitor, to detect and prevent logical inconsistencies and adversarial manipulations within the internal reasoning steps of large language…
This paper analyzes failure modes in collaborative visual reasoning systems, demonstrating that naive shared workspaces can amplify hallucinations and proposing diagnostics for improving communication…
Xuwei Ding, Skylar Zhai, Linxin Song, Jiate Li +5 more
The paper introduces OS-BLIND, a benchmark demonstrating that current safety evaluations fail to detect critical vulnerabilities in computer-use agents when user instructions are benign, showing high…
Zhenhao Xu, Wenhan Chang, Yichuan Chen, Yuxin Fang +2 more
The paper proposes Safety Context Injection (SCI), an inference-time framework that prepends a structured external risk report to protect Large Reasoning Models (LRMs) against sophisticated jailbreaks…
The paper argues that current 'on-the-fly' AI agent design lacks necessary software engineering rigor and proposes an 'AI Workflow Store' to provide hardened, reusable, and reliable agent workflows.
The paper introduces BOA, a novel framework that measures agent safety by exhaustively searching the entire in-budget trajectory space, thereby identifying unsafe behaviors missed by traditional sampl…
The paper introduces a self-healing agentic orchestrator that significantly improves the reliability of tool-augmented LLM systems by treating failure as a bounded runtime control problem, achieving h…