~ similar to 2603.24857v1· 20 results
The paper demonstrates that current defenses against malicious fine-tuning of foundation models are insufficient because they only address fixed attacks, and introduces a unified adaptive attack that…
The paper introduces Landseer, a modular framework designed to systematically evaluate and compose multiple machine learning defenses to address complex, real-world security requirements.
This paper provides the first comprehensive review of threats and defenses specifically targeting on-device AI inference, revealing a significant imbalance where certain attack types, like adversarial…
The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…
This paper introduces a novel Function Hijacking Attack (FHA) that manipulates the tool selection process of agentic models, demonstrating a robust and context-agnostic threat to function calling LLMs…
The paper introduces MCPSHIELD, a comprehensive formal security framework that systematically characterizes and provides a defense-in-depth architecture for the rapidly adopted but insecure Model Cont…
The paper introduces XFED, a novel non-collusive model poisoning attack that demonstrates the feasibility of compromising Federated Learning systems without requiring coordination among attackers, byp…
The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…
This paper provides a systematic, layered review of security risks and defense strategies for autonomous agent frameworks, using OpenClaw as a case study to address the current lack of integrated rese…
The paper introduces AutoMIA, a novel framework that uses LLM agents to automate the discovery and implementation of Membership Inference Attacks (MIAs), achieving state-of-the-art performance by syst…
This paper proposes a gap-prioritization framework to bridge the gap between theoretical cyber attack prediction research and practical operational deployment by identifying critical implementation hu…
The paper introduces the concept of 'authenticity debt'—the institutional liability from deploying unverified AI content—and proposes a layered reference architecture combining cryptographic provenanc…
The paper introduces the concept of 'authenticity debt'—the institutional liability from deploying unverified AI content—and proposes a layered reference architecture combining cryptographic provenanc…
The paper demonstrates that simpler, shallower Deep Neural Network architectures with reduced features and ReLU activations can inherently improve the robustness of ML-NIDS against gradient-based adve…
Qingwen Zeng, Zhenghao Zhao, Yitian Yang, Yiqi Zhu +5 more
This paper proposes a unified, lifecycle-centric framework and a detailed taxonomy to survey and analyze novel, finance-specific attack surfaces and vulnerabilities in AI systems used within the finan…
This paper demonstrates that current AI model extraction defenses, which assume attacks come from single sources, are easily bypassed by coordinated, distributed threat actors.
The paper demonstrates that advanced capabilities, such as jailbreaking large language models and finding software vulnerabilities, can be achieved effectively at zero cost by coordinating multiple sm…
Yuhang Wang, Haichang Gao, Zhenxing Niu, Zhaoxiang Liu +3 more
The paper systematically evaluates six OpenClaw-series AI agent frameworks, demonstrating that these agentized systems possess significant security vulnerabilities that are distinct from and more seve…
Jinghuai Zhang, Yetian He, Kunlin Cai, Han Zhao +2 more
RogueMerge introduces a unified framework to robustly attack LLM model merging by addressing the challenges of autoregressive decoding, unknown merging configurations, and prompt generalization, signi…
The paper introduces CSTM-Bench, a comprehensive benchmark and evaluation framework demonstrating that standard session-bound AI guardrails fail against sophisticated, cross-session attacks that accum…