Papers similar to 2603.25022v1 · 20 results

cs.AI, cs.CR · Recent · Apr 26, 2026

Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture

The paper proposes the Policy-Execution-Authorization (PEA) architecture, a separation-of-powers system designed to structurally enforce goal integrity in AI agents, moving safety from a probabilistic…

cs.CR · Recent · Apr 25, 2026

When the Agent Is the Adversary: Architectural Requirements for Agentic AI Containment After the April 2026 Frontier Model Escape

Richard Joseph Mitchell

The paper analyzes the failure modes of current AI containment methods when the agent itself is the adversary, deriving five necessary architectural requirements for durable safety.

cs.CY, cs.AI, cs.CR · Recent · Mar 26, 2026

Preserving Decision Sovereignty in Military AI: A Trade-Secret-Safe Architectural Framework for Model Replaceability, Human Authority, and State Control

Peng Wei, Wesley Shu

The paper proposes the Energetic Paradigm, a model-agnostic architectural framework that allows states to maintain decision sovereignty and control over military AI systems, even when using proprietar…

cs.LG, cs.AI, cs.CV · Recent · May 27, 2026

OISD: On-Policy Internal Self-Distillation of Language Models

Xinyu Liu, Darryl Cherian Jacob, Yang Zhou, Jindong Wang +1 more

The OISD framework improves language model reasoning by distilling on-policy predictive signals from the final output layer to intermediate representations, leading to substantial improvements on math…

cs.AI · Recent · May 29, 2026

Distilling LLM Feedback for Lean Theorem Proving

Gaetan Narozniak, Gérard Biau, Rémi Munos, Ahmad Rammal +1 more

The paper introduces Feedback Distillation, a novel training method that uses a language model's privileged feedback to provide token-level supervision, significantly improving complex reasoning tasks…

cs.CR, cs.CY · Recent · May 23, 2026

From Frontier to Shadow AI: A Simmering Threat to Assurance and Security in Critical Infrastructure

Mohan Baruwal Chhetri, Shahroz Tariq, Tooba Aamir, Marthie Grobler +2 more

The paper empirically characterizes 'shadow AI'—the unsanctioned use of frontier AI in critical infrastructure—as a systemic threat that erodes established assurance and security controls.

cs.CY, cs.CR · Recent · May 20, 2026

Backchaining Loss of Control Mitigations from Mission-Specific Benchmarks in National Security

Matteo Pistillo, Samantha Faraone, Joshua Herman

The paper proposes a novel, empirical methodology called 'backchaining' to derive and prioritize Loss of Control (LoC) mitigations by analyzing the errors an AI system makes on mission-specific nation…

cs.LG, cs.AI, cs.CR · Recent · May 18, 2026

It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs

Sangwoo Park, Woongyeong Yeo, Seanie Lee, Yumin Choi +5 more

The paper proposes SELFCI, a complementary self-distillation framework that effectively balances the privacy requirements of Contextual Integrity (CI) with the utility of large language models, outper…

cs.CR, cs.AI, cs.CL · Recent · Jun 3, 2026

Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming

Nicholas Saban

The paper benchmarks current frontier computer-using agents against hand-crafted attacks, finding that while they are highly safe in browser tasks, this safety does not generalize to other domains lik…

cs.CR, cs.AI · Recent · Mar 31, 2026

Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa +4 more

The paper proposes a vision for system-level defenses against indirect prompt injection attacks targeting AI agents, emphasizing structured control and human oversight.

cs.AI, cs.CR · Recent · May 28, 2026

Provably Secure Agent Guardrail

Benlong Wu, Weiming Zhang, Kejiang Chen, Han Fang +1 more

The paper introduces an executable Proof-Constrained Action (ePCA) framework that secures AI agents by forcing them to formalize their intentions into first-order logical constraints, achieving provab…

cs.AI, cs.CR · Recent · May 28, 2026

Provably Secure Agent Guardrail

Benlong Wu, Weiming Zhang, Kejiang Chen, Han Fang +1 more

The paper introduces a formal, logically constrained framework, ePCA, to secure advanced AI agents by forcing them to translate natural language intentions into first-order logical constraints before…

cs.CL, cs.AI · Recent · May 29, 2026

Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance

Yuxuan Jiang, Francis Ferraro

The paper introduces Trajectory-aware OPD (TOPD), a method that uses near-future trajectory information to improve On-Policy Distillation by accurately identifying and guiding true reasoning divergenc…

cs.AI · Recent · May 28, 2026

Tailoring the Curriculum: Student-Centered Reasoning Distillation via Dynamic Data-Model Compatibility

Jiahao Huang, Fei Cheng, Junfeng Jiang, Akiko Aizawa

This paper introduces the Data-Model Compatibility (DMC) metric to quantify how suitable a dataset is for reasoning distillation, showing that optimizing data selection using DMC significantly improve…

cs.CR · Recent · Mar 20, 2026

Constraint Migration: A Formal Theory of Throughput in AI Cybersecurity Pipelines

Surasak Phetmanee

The paper develops a formal theory to analyze how throughput changes in AI-enhanced cybersecurity pipelines when stage capacities are perturbed by multipliers.

cs.AI, cs.CL, cs.LG · Recent · May 29, 2026

The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary

Dongxin Guo, Jikun Wu, Siu Ming Yiu

The paper demonstrates that extended pure neural reasoning fails on complex, deterministic state-tracking tasks beyond a certain 'Deterministic Horizon,' necessitating the integration of external tool…

cs.CR, cs.AI · Recent · May 13, 2026

Inducing Overthink: Hierarchical Genetic Algorithm-based DoS Attack on Black-Box Large Language Reasoning Models

Shuqiang Wang, Wei Cao, Jiaqi Weng, Jialing Tao +3 more

The paper proposes a black-box attack using a hierarchical genetic algorithm to induce 'overthinking' in Large Reasoning Models, demonstrating that this vulnerability can cause significant resource ex…

cs.AI, cs.CL, cs.CR · Recent · Apr 18, 2026

The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus

Syed Muhammad Aqdas Rizvi

The paper demonstrates that for edge-native SLMs used in decentralized governance, simpler, intuitive reasoning (System 1) is significantly more robust and efficient than complex, iterative deliberati…

cs.CR, cs.CY · Recent · Apr 6, 2026

Hardware-Level Governance of AI Compute: A Feasibility Taxonomy for Regulatory Compliance and Treaty Verification

Samar Ansari

The paper proposes a taxonomy of 20 hardware-level governance mechanisms for AI compute, finding that the most critical mechanisms needed for international treaty verification are currently the least…

cs.CL · Recent · May 29, 2026

ConsisGuard: Aligning Safety Deliberation with Policy Enforcement in LLM Guardrails

Yan Wang, Zhixuan Chu, Zihao Xue, Zhen Bi +8 more

The paper introduces ConsisGuard, a framework that addresses the 'deliberation-to-enforcement gap' in LLM guardrails by ensuring that the reasoning process is faithfully and consistently translated in…
