Similar to 2605.30834 · 20 results
Xianyou Li, Weiran Yan, Yichao Wu, Penghao Liang +3 more
This paper introduces a failure-aware observability framework to diagnose wasted computation in multi-agent LLM systems by mapping recurring failure modes to online trace signals.
Md Nakhla Rafi, Md Ahasanuzzaman, Dong Jae Kim, Zhijie Wang +1 more
FALAT is a diagnostic framework that treats failure attribution in complex LLM agent trajectories as a dependency-guided search problem, successfully identifying both the responsible agent and the dec…
The paper demonstrates a semantic denial-of-service attack against LLM-controlled robots by injecting short, safety-plausible phrases into the audio channel, causing the robot to halt or disrupt execu…
This paper analyzes failure modes in collaborative visual reasoning systems, demonstrating that naive shared workspaces can amplify hallucinations and proposing diagnostics for improving communication…
The paper identifies a 'deployment-safety gap' in Vision-Language-Action (VLA) policies, showing that identical model checkpoints can result in physically different and unsafe robot actions due to act…
Yuefeng Peng, Mingzhe Li, Kejing Xia, Renhao Zhang +1 more
This paper presents the first systematic study of membership inference attacks (MIAs) against Vision-Language-Action (VLA) models, demonstrating that these models are highly vulnerable to privacy brea…
Zhen Huang, Zhihuang Liu, Mengxuan Luo, Weishang Wu +1 more
The paper proposes a novel attack paradigm demonstrating how compromising a single robot in an LLM-controlled multi-robot system can rapidly propagate malicious intent to cause coordinated unsafe acti…
Haoyuan Shi, Xiancong Ren, Yingji Zhang, Qinfan Zhang +8 more
VLA-Trace is a diagnostic framework that analyzes Vision-Language-Action (VLA) models by tracing their internal representations and external behaviors, revealing that while these models are good at vi…
The paper analyzes how runtime safety enforcement impacts the performance of multi-step LLM agents, finding that while safety mechanisms can block unsafe actions, they impose a significant performance…
The paper introduces a diagnostic framework to determine if World-Action Models (WAMs) provide genuinely actionable behavioral improvements beyond simply achieving task success, finding that WAMs ofte…
Xian Qi Loye, Qinglin Su, Zhexin Zhang, Shiyao Cui +4 more
The paper introduces RUBAS, a rubric-based reinforcement learning framework that improves agent safety by providing fine-grained, multi-dimensional rewards for complex tool-use scenarios.
Minyang Hu, Bo Yang, Zhinuo Zhou, Jiachen Liang +3 more
The paper introduces RedundancyBench, a new benchmark for detecting unnecessary steps in LLM agent trajectories, finding that this task is highly complex and difficult to solve.
The paper introduces Agent-ToM, a Theory-of-Mind (ToM) based framework that learns to monitor autonomous LLM agents by explicitly reasoning about their hidden beliefs and intentions to detect covert m…
The paper identifies and measures a critical failure mode where LLM agents violate policies by losing or corrupting directive-bearing state during the process of assembling the decision context, and p…
Jiaming Wang, Ziteng Feng, Jiangtao Wu, Ruihao Li +7 more
The paper introduces TELBench and the DRIFT framework to enable fine-grained, span-level error localization in deep-research agents, significantly improving the ability to pinpoint exactly where an ag…
The paper addresses the difficulty of using general vision-language models (VLMs) for fine-grained driver behavior recognition by creating a new, richly described dataset and demonstrating that fine-t…
The paper introduces Temporal Logit Observability (TLO), a training-free diagnostic that analyzes the decoding process to reveal the temporal patterns of LLM safety failures, showing that failure mech…
Tianzhuo Yang, Zihan Shen, Zirui Mi, Zhaoyi Zhang +6 more
The paper introduces MiraBench, a new benchmark that evaluates the action-conditioned reliability of robotic world models, finding that visual fidelity is insufficient and that optimism bias is a perv…
The paper proposes an algorithmic method using conformal prediction to formally certify high-probability safety for Belief-Space Neural Safety Filters (BeliefSF), significantly improving safety guaran…
Jingtao He, Hongliang Lu, Xiaoyun Qiu, Yixuan Wang +1 more
The paper introduces a structured multi-level visual perturbation framework to systematically analyze how dependent VLA-based driving behavior is on visual information, revealing uneven visual groundi…