~ similar to 2604.16424v1· 20 results
This paper surveys the risks associated with world models, proposing a unified threat model and demonstrating adversarial attacks that show world models require rigorous safety standards comparable to…
This paper addresses the critical need for trustworthy LLMs in science by proposing a comprehensive, multi-layered defense framework and methodology to evaluate unique scientific vulnerabilities.
The paper introduces Session Risk Memory (SRM), a lightweight module that enhances per-action authorization gates with trajectory-level risk assessment, significantly improving detection of distribute…
The paper empirically evaluates domain-adapted and general-purpose LLMs for structured threat modelling (STRIDE on 5G security), finding that domain adaptation and model size do not guarantee reliable…
The paper introduces STRIDE-AI, a novel threat modeling framework that adapts classical STRIDE for generative AI, successfully reducing the attack success rate of a tested LLM chatbot from 80% to 15%.
Zheng-Xin Yong, Parv Mahajan, Andy Wang, Ida Caspary +11 more
The paper conducts a preliminary safety evaluation of the open-weight LLM Kimi K2.5, finding that while it is highly capable, it exhibits concerning dual-use risks, particularly regarding CBRNE misuse…
SMSI is a novel neuro-symbolic pipeline that automates threat modeling for cyber-physical systems by generating a prioritized list of NIST 800-53 security controls directly from a SysML architecture m…
Yuhui Wang, Tanqiu Jiang, Jiacheng Liang, Charles Fleming +1 more
The paper introduces MAGE, a novel defensive framework that uses a dedicated 'shadow memory' to proactively detect and mitigate long-horizon threats against LLM agents during complex, multi-step inter…
The paper demonstrates that encoding harmful prompts as genuine mathematical problems, rather than just using mathematical formatting, effectively bypasses the safety filters of large language models.
The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…
This paper provides the first comprehensive threat model for IoT-enabled Controlled Environment Agriculture (CEA) systems, identifying 123 unique threats and proposing a defense-in-depth framework to…
The paper demonstrates a semantic denial-of-service attack against LLM-controlled robots by injecting short, safety-plausible phrases into the audio channel, causing the robot to halt or disrupt execu…
AEGIS introduces a novel physics-based system that analyzes encrypted network traffic flow dynamics, achieving state-of-the-art zero-day evasion detection with high accuracy and low latency.
This study empirically measures the consistency and success rate of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation capabilit…
This study empirically measures the consistency and effectiveness of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation rates am…
The paper systematically evaluates various defense mechanisms against persistent memory attacks on LLM agents, finding that only tool-gating at the memory layer (Memory Sandbox) effectively mitigates…
The paper proposes extbackslash codeName, a behavioral firewall that uses a parameterized deterministic finite automaton (pDFA) to enforce verified benign tool-call sequences and parameter bounds for…
The paper introduces a threat-oriented digital twinning methodology to enable reproducible and controllable cybersecurity evaluation of autonomous platforms, overcoming limitations in accessing real-w…
Xiaozhe Zhang, Chaozhuo Li, Hui Liu, Shaocheng Yan +3 more
The EvoSafety framework enhances LLM safety by externalizing attack and defense mechanisms, enabling persistent, transferable, and model-agnostic robustness against adversarial prompts.
The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…