~ similar to 2605.29354v1· 20 results
The paper introduces Adaptive Unlearning (AU), a post-deployment framework that surgically suppresses code-related hallucinations, significantly reducing the risk of package confusion attacks like slo…
Yiyong Liu, Chia-Yi Hsu, Chun-Ying Huang, Michael Backes +2 more
This paper introduces Dependency Steering, a novel attack paradigm demonstrating that malicious agent skills can actively bias LLM coding agents to use attacker-controlled packages, posing a significa…
This study re-evaluates LLM package hallucination rates on a new cohort of frontier models, finding a significant reduction in overall hallucination rates but identifying a persistent, model-agnostic…
The paper introduces Evidence-Carrying Agents (ECA) to prevent multimodal agents from executing privileged actions based on unsupported or hallucinated perceptual claims, achieving near-zero unsafe ex…
Yubin Qu, Yi Liu, Tongcheng Geng, Gelei Deng +4 more
The paper introduces Document-Driven Implicit Payload Execution (DDIPE) to demonstrate that malicious code can be embedded in LLM agent skill documentation, allowing supply-chain attacks to hijack age…
ZERO-APT introduces a novel closed-loop adversarial framework for automated penetration testing that simulates attacks against an intelligent, real-time defending system, achieving a high attack succe…
Khang Tran, Yazan Boshmaf, Issa Khalil, NhatHai Phan +2 more
The paper introduces Poison-with-Style (PwS), a stealthy model poisoning attack that exploits developers' inherent code styles as covert triggers to make Code LLMs generate vulnerable code without exp…
Minor, single-character perturbations to prompts can significantly degrade the security of code generated by LLMs, suggesting that prompt fragility is a major security concern beyond simple prompt inj…
Xuwei Ding, Skylar Zhai, Linxin Song, Jiate Li +5 more
The paper introduces OS-BLIND, a benchmark demonstrating that current safety evaluations fail to detect critical vulnerabilities in computer-use agents when user instructions are benign, showing high…
Zhichao Liu, Wenbo Pan, Haining Yu, Ge Gao +2 more
WebTrap introduces a stealthy, mid-task hijacking attack that successfully compromises browser agents during long-horizon tasks by seamlessly fusing malicious instructions with the original user goal.
Fazhong Liu, Zhuoyan Chen, Tu Lan, Haozhen Tan +5 more
This paper identifies and characterizes 'guidance injection,' a stealthy attack vector that embeds adversarial operational narratives into autonomous coding agents' bootstrap guidance, demonstrating h…
Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more
This paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a dynamic defense mechanism that traces and sanitizes untrusted control content i…
Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more
The paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a defense mechanism that detects and sanitizes backdoor content planted across mul…
The paper introduces Phoenix, a training-free multi-agent framework that detects code vulnerabilities by synthesizing project-specific behavioral contracts, significantly outperforming existing method…
The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…
Chiyu Zhang, Huiqin Yang, Bendong Jiang, Xiaolei Zhang +7 more
The paper introduces LITMUS, a novel benchmark that rigorously tests LLM agents for dangerous, physical-layer behavioral jailbreaks in real OS environments, revealing that current agents frequently ex…
Zihan Wang, Rui Zhang, Yu Liu, Chi Liu +3 more
This paper presents the first systematic study of black-box skill stealing attacks against proprietary LLM agents, demonstrating that structured agent skills can be easily extracted, posing a signific…
This paper quantifies the polymorphic capacity of a commercial LLM, demonstrating that it can cheaply generate large populations of structurally diverse, yet behaviorally equivalent, offensive code pa…
Elle Najt, Colin Toft, Tyler Tracy, Fabien Roger +1 more
The paper introduces SLEIGHT-Bench, a benchmark of 40 synthetic attacks, demonstrating that current LLM monitor systems fail to detect a significant number of covert, harmful actions executed by codin…
Huiyu Xu, Zhibo Wang, Wenhui Zhang, Ziqi Zhu +3 more
The paper introduces LoopTrap, an automated red-teaming framework that demonstrates how malicious prompts can poison the termination judgment of LLM agents, causing unbounded computation.