~ similar to 2603.25022v1· 20 results
The paper proposes the Policy-Execution-Authorization (PEA) architecture, a separation-of-powers system designed to structurally enforce goal integrity in AI agents, moving safety from a probabilistic…
The paper analyzes the failure modes of current AI containment methods when the agent itself is the adversary, deriving five necessary architectural requirements for durable safety.
The paper proposes the Energetic Paradigm, a model-agnostic architectural framework that allows states to maintain decision sovereignty and control over military AI systems, even when using proprietar…
Xinyu Liu, Darryl Cherian Jacob, Yang Zhou, Jindong Wang +1 more
The OISD framework improves language model reasoning by distilling on-policy predictive signals from the final output layer to intermediate representations, leading to substantial improvements on math…
Gaetan Narozniak, Gérard Biau, Rémi Munos, Ahmad Rammal +1 more
The paper introduces Feedback Distillation, a novel training method that uses a language model's privileged feedback to provide token-level supervision, significantly improving complex reasoning tasks…
The paper empirically characterizes 'shadow AI'—the unsanctioned use of frontier AI in critical infrastructure—as a systemic threat that erodes established assurance and security controls.
The paper proposes a novel, empirical methodology called 'backchaining' to derive and prioritize Loss of Control (LoC) mitigations by analyzing the errors an AI system makes on mission-specific nation…
Sangwoo Park, Woongyeong Yeo, Seanie Lee, Yumin Choi +5 more
The paper proposes SELFCI, a complementary self-distillation framework that effectively balances the privacy requirements of Contextual Integrity (CI) with the utility of large language models, outper…
The paper benchmarks current frontier computer-using agents against hand-crafted attacks, finding that while they are highly safe in browser tasks, this safety does not generalize to other domains lik…
Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa +4 more
The paper proposes a vision for system-level defenses against indirect prompt injection attacks targeting AI agents, emphasizing structured control and human oversight.
Benlong Wu, Weiming Zhang, Kejiang Chen, Han Fang +1 more
The paper introduces an executable Proof-Constrained Action (ePCA) framework that secures AI agents by forcing them to formalize their intentions into first-order logical constraints, achieving provab…
Benlong Wu, Weiming Zhang, Kejiang Chen, Han Fang +1 more
The paper introduces a formal, logically constrained framework, ePCA, to secure advanced AI agents by forcing them to translate natural language intentions into first-order logical constraints before…
The paper introduces Trajectory-aware OPD (TOPD), a method that uses near-future trajectory information to improve On-Policy Distillation by accurately identifying and guiding true reasoning divergenc…
This paper introduces the Data-Model Compatibility (DMC) metric to quantify how suitable a dataset is for reasoning distillation, showing that optimizing data selection using DMC significantly improve…
The paper develops a formal theory to analyze how throughput changes in AI-enhanced cybersecurity pipelines when stage capacities are perturbed by multipliers.
The paper demonstrates that extended pure neural reasoning fails on complex, deterministic state-tracking tasks beyond a certain 'Deterministic Horizon,' necessitating the integration of external tool…
Shuqiang Wang, Wei Cao, Jiaqi Weng, Jialing Tao +3 more
The paper proposes a black-box attack using a hierarchical genetic algorithm to induce 'overthinking' in Large Reasoning Models, demonstrating that this vulnerability can cause significant resource ex…
The paper demonstrates that for edge-native SLMs used in decentralized governance, simpler, intuitive reasoning (System 1) is significantly more robust and efficient than complex, iterative deliberati…
The paper proposes a taxonomy of 20 hardware-level governance mechanisms for AI compute, finding that the most critical mechanisms needed for international treaty verification are currently the least…
Yan Wang, Zhixuan Chu, Zihao Xue, Zhen Bi +8 more
The paper introduces ConsisGuard, a framework that addresses the 'deliberation-to-enforcement gap' in LLM guardrails by ensuring that the reasoning process is faithfully and consistently translated in…