~ similar to 2605.28807· 20 results
Jinhu Qi, Muzhi Li, Jiahong Liu, Yuqin Shu +8 more
This survey provides a comprehensive, practical guide to ensuring the trustworthiness of complex, autonomous agentic AI systems by focusing on safety, robustness, privacy, and system security.
The paper proves that standard runtime enforcement mechanisms cannot detect systematic behavioral drift in autonomous agents, proposing a new Invariant Measurement Layer (IML) that restores observabil…
The paper proposes the Policy-Execution-Authorization (PEA) architecture, a separation-of-powers system designed to structurally enforce goal integrity in AI agents, moving safety from a probabilistic…
The paper demonstrates that many instruction-tuned language models suffer from 'silent commitment failure,' meaning they can produce confidently incorrect outputs without any warning signal, and intro…
The paper argues that traditional identity-based reputation mechanisms are structurally inapplicable to language model agents because their mutable, modular nature makes them ontologically dissociativ…
The paper proposes an engineering framework, inspired by metamaterials physics, to quantify institutional coordination and predict civilizational stability in the age of AI.
The paper introduces Governed MCP, a kernel-resident gateway that enforces comprehensive, robust tool governance for AI agents' privileged tool calls, significantly improving safety beyond userspace m…
The paper argues that Agentic AI fundamentally breaks the historical security tradeoff between deception fidelity and scale, necessitating a shift from authenticating actors to evaluating actions.
Shiping Chen, Qin Wang, Guangsheng Yu, Xu Wang +1 more
This paper systematizes the security challenges of open agentic systems, concluding that while attack characterization is mature, the field lacks robust guidelines for operational governance, memory i…
The paper proposes a framework to model moral reasoning as an ethical distribution (ethical pluralism) rather than a single binary judgment, achieving high classification accuracy by integrating norma…
Yuyan Bu, Haowei Li, Qirui Zheng, Bowen Dong +6 more
The paper introduces SPADE-Bench, a new benchmark designed to rigorously evaluate 'agent deception'—the divergence between an agent's reported plan and its actual executed actions—which is a critical…
Xucong Wang, Ziyu Ma, Yong Wang, Yuxiang Ji +4 more
This paper proposes a new method for agentic Reinforcement Learning called Agentic Procedural Policy Optimization (APPO) that improves tool-use capabilities by assigning credit to fine-grained decisio…
Lecheng Yan, Ruizhe Li, Xicheng Han, Wenxi Li +4 more
The paper introduces a new security benchmark and framework to defend LLM agents against 'cognitive poisoning,' where malicious tools build trust through benign feedback before executing a harmful fin…
The paper demonstrates that for edge-native SLMs used in decentralized governance, simpler, intuitive reasoning (System 1) is significantly more robust and efficient than complex, iterative deliberati…
Xiwen Chen, Wenhui Zhu, Jingjing Wang, Peijie Qiu +12 more
S-SPPO introduces a dual-space semantic calibration framework to stabilize Self-Play Preference Optimization (SPPO), preventing policy degeneration when preference oracles assign overly confident wins…
The paper introduces Sovereign Agentic Loops (SAL), a control-plane architecture that decouples LLM reasoning from system execution to enhance safety and reliability in real-world AI agents.
This study demonstrates that instruction-tuned language model agents exhibit robust, group-contingent in-group bias, structurally mimicking human social biases, even when standard action logs fail to…
The paper introduces an outer-loop AI agent that autonomously redesigns LLM policy-synthesis pipelines for multi-agent social dilemmas, demonstrating that the optimal pipeline structure depends critic…
Yan Wang, Zhixuan Chu, Zihao Xue, Zhen Bi +8 more
The paper introduces ConsisGuard, a framework that addresses the 'deliberation-to-enforcement gap' in LLM guardrails by ensuring that the reasoning process is faithfully and consistently translated in…
The paper introduces Distributed Sentinel, a zero-trust architecture that prevents Context-Fragmented Violations (CFVs) in multi-agent systems by propagating security state across departmental boundar…