~ similar to 2604.16913v1· 20 results
The Cognitive Firewall is a hybrid edge-cloud defense architecture that significantly reduces the attack success rate of Indirect Prompt Injection against browser-based AI agents by combining local vi…
Xunguang Wang, Yuguang Zhou, Qingyue Wang, Zongjie Li +4 more
This paper introduces a novel framework, the Reasoning Safety Monitor, to detect and prevent logical inconsistencies and adversarial manipulations within the internal reasoning steps of large language…
The paper proves that platform-deterministic inference is a necessary and sufficient condition for trustworthy AI, establishing that AI trust fundamentally relies on consistent arithmetic.
The paper proposes using an LLM aggregator that analyzes complete reasoning traces, demonstrating that trace-level synthesis is superior to traditional consensus methods like majority voting for solvi…
The paper proposes DySCo, a dynamic trust-aware sparse consensus mechanism, to efficiently manage communication in multi-agent LLM systems by selectively connecting agents based on real-time value, th…
Erchi Wang, Pengrun Huang, Eli Chien, Om Thakkar +3 more
The paper introduces DPrivBench, a new benchmark to test whether large language models (LLMs) can automate the complex reasoning required to verify differential privacy guarantees for algorithms.
Jiling Zhou, Aisvarya Adeseye, Seppo Virtanen, Antti Hakkala +1 more
The paper proposes a structured prompt engineering framework to enhance the integrity and reliability of Chain-of-Thought (CoT) reasoning in LLMs, demonstrating significant improvements in security-se…
Doguhuan Yeke, Yanming Zhou, Leo Y. Lin, Hongyu Cai +2 more
The paper introduces RoboJailBench, the first standardized evaluation framework for assessing adversarial jailbreak attacks and defenses in embodied AI systems like robots.
The paper proposes replacing individual agent autonomy with a structured 'social contract' and institutional Separation of Power (SoP) to mitigate systemic failures and deceptive behavior in multi-age…
This paper simulates the Argumentative Theory of Reasoning (ATR) using multi-agent debate among LLMs, demonstrating that collective adversarial discourse significantly enhances truth-seeking performan…
The paper introduces Sovereign Agentic Loops (SAL), a control-plane architecture that decouples LLM reasoning from system execution to enhance safety and reliability in real-world AI agents.
Benlong Wu, Weiming Zhang, Kejiang Chen, Han Fang +1 more
The paper introduces an executable Proof-Constrained Action (ePCA) framework that secures AI agents by forcing them to formalize their intentions into first-order logical constraints, achieving provab…
Benlong Wu, Weiming Zhang, Kejiang Chen, Han Fang +1 more
The paper introduces a formal, logically constrained framework, ePCA, to secure advanced AI agents by forcing them to translate natural language intentions into first-order logical constraints before…
The paper introduces a lightweight, sampling-based cryptographic protocol for verifiable AI inference that drastically reduces proving overhead from minutes to milliseconds by leveraging statistical p…
Yunfeng Xia, Chao Li, Lei Li, Chenhao Zhang +3 more
The paper systematizes the interaction between autonomous AI agents and blockchain platforms using a bidirectional trust framework, identifying significant gaps in current standards and proposing a ta…
The paper introduces a certified purity architecture that strengthens governance in cognitive workflow systems by replacing insufficient runtime checks with cryptographically attested structural guara…
Philip Huff, Dakota Dale, Harshith Guduru, Rohan Singh +1 more
The paper proposes a system that operationalizes cybersecurity governance frameworks by integrating them with attack-path modeling and Deep Reinforcement Learning to generate practical, resource-const…
The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…
The paper proposes the Policy-Execution-Authorization (PEA) architecture, a separation-of-powers system designed to structurally enforce goal integrity in AI agents, moving safety from a probabilistic…
Zhenhao Xu, Wenhan Chang, Yichuan Chen, Yuxin Fang +2 more
The paper proposes Safety Context Injection (SCI), an inference-time framework that prepends a structured external risk report to protect Large Reasoning Models (LRMs) against sophisticated jailbreaks…