~ similar to 2603.16960v2· 20 results
This paper systematically analyzes the high cross-architecture transferability of physical adversarial attacks on Vision-Language Models (VLMs) used in autonomous driving, demonstrating that attacks e…
This survey provides a comprehensive taxonomy and vulnerability-centric analysis of adversarial attacks targeting Multimodal Large Language Models (MLLMs), offering an explanatory framework for enhanc…
The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…
The paper demonstrates that adversarial examples can be used to manipulate Vision-Language Models (VLMs) into confidently providing authoritative but incorrect information, a process termed 'AI author…
This paper introduces a dual-layer side-channel attack framework that exploits the variable workload introduced by dynamic image preprocessing in local Vision-Language Models (VLMs) to infer sensitive…
The paper establishes a theoretical information-theoretic bound proving that for Vision-Language-Action (VLA) models, capability and robustness cannot both be arbitrarily high, quantifying the trade-o…
Youness Bouchari, Matteo Boffa, Marco Mellia, Idilio Drago +2 more
The paper re-evaluates LLM agents on CTFs, finding that while general-purpose agents like claude-code are strong baselines, specialized, modular architectures significantly improve performance and con…
Yuefeng Peng, Mingzhe Li, Kejing Xia, Renhao Zhang +1 more
This paper presents the first systematic study of membership inference attacks (MIAs) against Vision-Language-Action (VLA) models, demonstrating that these models are highly vulnerable to privacy brea…
Doguhuan Yeke, Yanming Zhou, Leo Y. Lin, Hongyu Cai +2 more
The paper introduces RoboJailBench, the first standardized evaluation framework for assessing adversarial jailbreak attacks and defenses in embodied AI systems like robots.
Ye Sun, Xin Wang, Jiaming Zhang, Yifeng Gao +6 more
DarkLLM introduces a novel framework that uses a Large Language Model (LLM) to translate natural language instructions into flexible, latent adversarial attack vectors, demonstrating a systemic vulner…
This paper addresses the vulnerability of existing LLM safety monitors to adaptive attackers and proposes activation watermarking, a technique that significantly improves detection robustness against…
The paper introduces a quality-diversity evolutionary framework that evolves interpretable attack strategies, successfully discovering distinct and systematic vulnerabilities in major LLMs like GPT-4o…
The paper introduces a quality-diversity evolutionary framework that discovers diverse, interpretable vulnerabilities in large language models by evolving attack strategies at the semantic level, reve…
Siyuan Li, Zehao Liu, Xi Lin, Qinghua Mao +5 more
CoopGuard is a novel stateful, multi-round defense framework using cooperative agents to significantly reduce the success rate of evolving adversarial attacks against Large Language Models.
The paper proposes a general-purpose pipeline to train automated red teaming models capable of generating attacks for arbitrary adversarial goals, overcoming the limitations of current methods that ar…
Xiaozhe Zhang, Chaozhuo Li, Hui Liu, Shaocheng Yan +3 more
The EvoSafety framework enhances LLM safety by externalizing attack and defense mechanisms, enabling persistent, transferable, and model-agnostic robustness against adversarial prompts.
This paper analyzes the security of LLM-based autonomous agents by drawing parallels to operating system security, finding that while some vulnerabilities are inherent, many can be mitigated using est…
This paper demonstrates that LLM cascade systems, designed for efficiency, are vulnerable to targeted adversarial attacks that simultaneously degrade both performance and cost-efficiency.
The paper empirically evaluates domain-adapted and general-purpose LLMs for structured threat modelling (STRIDE on 5G security), finding that domain adaptation and model size do not guarantee reliable…
The paper systematically maps LLM agent vulnerabilities by testing 10,000 prompt variations, finding that 'goal reframing' language is the primary trigger for exploitation, rather than broad adversari…