Papers similar to 2604.24083v1

~ similar to 2604.24083v1· 20 results

cs.AIcs.CLcs.CRRecentApr 27, 2026

An Information-Geometric Framework for Stability Analysis of Large Language Models under Entropic Stress

The paper proposes a novel information-geometric framework to analyze LLM stability by integrating task utility, external entropy, and internal structural proxies, showing this composite score improve…

View →

cs.CRcs.AIcs.LGRecentApr 1, 2026

Safety, Security, and Cognitive Risks in World Models

Manoj Parmar

This paper surveys the risks associated with world models, proposing a unified threat model and demonstrating adversarial attacks that show world models require rigorous safety standards comparable to…

View →

cs.LGcs.AIcs.CRRecentApr 8, 2026

When Safety Geometry Collapses: Fine-Tuning Vulnerabilities in Agentic Guard Models

Ismail Hossain, Sai Puppala, Jannatul Ferdaus, Md Jahangir Alam +3 more

The paper demonstrates that fine-tuning safety guard models on benign data can catastrophically collapse their safety alignment, proposing Fisher-Weighted Safety Subspace Regularization (FW-SSR) to ac…

View →

cs.CReess.SYRecentMay 14, 2026

On the (non-)resilience of encrypted controllers to covert attacks

Philipp Binfet, Janis Adamek, Moritz Schulze Darup

The paper demonstrates that standard homomorphic encryption (HE) schemes are insufficient to guarantee integrity in networked control systems (NCS) against covert attacks, proposing instead a verifiab…

View →

cs.CRcs.AIRecentMar 28, 2026

SafetyDrift: Predicting When AI Agents Cross the Line Before They Actually Do

Aditya Dhodapkar, Farhaan Pishori

The paper introduces SafetyDrift, a predictive model that forecasts when AI agents will violate safety protocols by analyzing the cumulative risk across sequences of individually safe actions.

View →

cs.CRRecentMay 1, 2026

Composable Post-Quantum Security for FADEC-Coupled Dual-Spool Turbofan Cyber-Physical Systems

Faruk Alpay, Taylan Alpay

The paper develops a unified mathematical framework to analyze the interaction between post-quantum security, real-time communication constraints, and closed-loop stability in safety-critical turbofan…

View →

cs.CRcs.LGRecentMay 19, 2026

Latent Geometry as a Structural Monitor: Eigenspace Alignment for Anomaly Detection in Anonymity Networks

Vaibhav Chhabra

The paper proposes using geometric metrics, specifically eigenspace alignment, to monitor the structural integrity of large behavioral populations, demonstrating its effectiveness in detecting network…

View →

cs.ROcs.AIcs.LGRecentJun 1, 2026

Permissive Safety Through Trusted Inference: Verifiable Belief-Space Neural Safety Filters for Assured Interactive Robotics

Haimin Hu

The paper proposes an algorithmic method using conformal prediction to formally certify high-probability safety for Belief-Space Neural Safety Filters (BeliefSF), significantly improving safety guaran…

View →

cs.AIcs.CReess.SYRecentMay 4, 2026

Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense

Kerri Prinos, Lilianne Brush, Cameron Denton, Zhanqi Wang +4 more

The paper proposes a tool-mediated LLM architecture for autonomous cyber defense, formally proving its stability and demonstrating that it significantly reduces an attacker's expected payoff in real-w…

View →

cs.AIRecentMay 31, 2026

Recognize Your Orchestrator: An Entropy Dynamics Perspective for LLM Multi-Agent Systems

Junze Zhu, Weihao Chen, Xuanwang Zhang, Zhen Wu +1 more

The paper proposes an Entropy Dynamics framework to analyze the stability and failure modes of centralized orchestration in Multi-Agent Systems, identifying a 'Reasoning Trap' where complex reasoning…

View →

cs.CRcs.AIcs.CLRecentApr 4, 2026

Safety, Security, and Cognitive Risks in State-Space Models: A Systematic Threat Analysis with Spectral, Stateful, and Capacity Attacks

Manoj Parmar

This paper provides the first systematic threat analysis of State-Space Models (SSMs) in safety-critical applications, introducing novel attack classes and formal metrics to quantify their security an…

View →

cs.CRcs.LGRecentApr 2, 2026

AEGIS: Adversarial Entropy-Guided Immune System -- Thermodynamic State Space Models for Zero-Day Network Evasion Detection

Vickson Ferrel

AEGIS introduces a novel physics-based system that analyzes encrypted network traffic flow dynamics, achieving state-of-the-art zero-day evasion detection with high accuracy and low latency.

View →

cs.CRcs.LGquant-phRecentMay 19, 2026

Quantum Machine Learning for Cyber-Physical Anomaly Detection in Unmanned Aerial Vehicles: A Leakage-Free Evaluation with Proxy-Audited Feature Sets

Carlos A. Durán Paredes, Javier E. León Calderón, Nicolás Sánchez Perea, Germán Darío Díaz +1 more

The paper evaluates quantum machine learning for detecting anomalies in UAVs using a rigorous, leakage-free methodology, showing that a hybrid XGBoost + Data Reuploading classifier performs well, part…

View →

cs.CRcs.AIcs.CLRecentApr 3, 2026

An Independent Safety Evaluation of Kimi K2.5

Zheng-Xin Yong, Parv Mahajan, Andy Wang, Ida Caspary +11 more

The paper conducts a preliminary safety evaluation of the open-weight LLM Kimi K2.5, finding that while it is highly capable, it exhibits concerning dual-use risks, particularly regarding CBRNE misuse…

View →

cs.CRcs.DCeess.SYRecentApr 15, 2026

Digital Guardians: The Past and The Future of Cyber-Physical Resilience

Saurabh Bagchi, Hyunseung Kim, Tarek Abdelzaher, Homa Alemzadeh +19 more

This survey provides a comprehensive, systematic roadmap for achieving cyber-physical system (CPS) resilience by integrating five interconnected themes: system-wide properties, handling data scarcity…

View →

cs.AIcs.CRRecentApr 26, 2026

Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture

Rong Xiang

The paper proposes the Policy-Execution-Authorization (PEA) architecture, a separation-of-powers system designed to structurally enforce goal integrity in AI agents, moving safety from a probabilistic…

View →