~ similar to 2604.22871v1· 20 results
Hammad Atta, Ken Huang, Kyriakos Rock Lambros, Yasir Mehmood +10 more
The paper introduces LAAF, a novel automated red-teaming framework, to systematically test and exploit Logic-layer Prompt Control Injection (LPCI) vulnerabilities in complex agentic LLM systems.
Youness Bouchari, Matteo Boffa, Marco Mellia, Idilio Drago +2 more
The paper re-evaluates LLM agents on CTFs, finding that while general-purpose agents like claude-code are strong baselines, specialized, modular architectures significantly improve performance and con…
The paper evaluates Language Model Agents (LMAs) for red-teaming by benchmarking their ability to perform lateral movement, finding that expert-defined action plans are most effective, though all moda…
The paper proposes an autonomous red teaming framework combining LLMs and RL to generate sophisticated, multi-stage cyber attack campaigns, demonstrating its necessity for evaluating robust AI-enabled…
The paper introduces an AI red teaming agent that drastically reduces the time and effort required for security testing by allowing operators to define complex attack goals using natural language, com…
The paper proposes a general-purpose pipeline to train automated red teaming models capable of generating attacks for arbitrary adversarial goals, overcoming the limitations of current methods that ar…
Huiyu Xu, Zhibo Wang, Wenhui Zhang, Ziqi Zhu +3 more
The paper introduces LoopTrap, an automated red-teaming framework that demonstrates how malicious prompts can poison the termination judgment of LLM agents, causing unbounded computation.
The paper demonstrates that using advanced AI agents in an autoresearch loop can discover novel and highly effective adversarial attack algorithms, significantly advancing the state-of-the-art for jai…
AttackEval systematically evaluates the effectiveness of 250 prompt injection prompts across ten attack categories, finding that composite and obfuscation attacks are highly effective against current…
Hyomin Lee, Sangwoo Park, Yumin Choi, Sohyun An +2 more
The paper introduces T-MAP, a trajectory-aware evolutionary search method, to discover and generate multi-step adversarial prompts that exploit vulnerabilities in autonomous LLM agents through tool ex…
The paper introduces MonitoringBench, a semi-automated red-teaming methodology that generates diverse and stronger attacks, revealing that current coding-agent monitors often fail against sophisticate…
ContextualJailbreak introduces an evolutionary red-teaming strategy that performs automated search over simulated multi-turn primed dialogues, achieving high jailbreak rates across multiple state-of-t…
Zenghao Duan, Yuxin Tian, Zhiyi Yin, Liang Pang +5 more
SkillAttack is a red-teaming framework that dynamically tests the exploitability of latent vulnerabilities in LLM agent skills using adversarial prompting, demonstrating that even benign skills pose s…
Chia-Pei, Chen, Kentaroh Toyoda, Anita Lai +1 more
The paper introduces IPI-proxy, an open-source intercepting proxy toolkit designed to red-team web-browsing AI agents by injecting adversarial payloads into live HTTP responses from whitelisted domain…
AutoMIA introduces an agentic framework that automates the process of Membership Inference Attacks (MIAs) by self-exploring the attack space, achieving state-of-the-art performance without manual feat…
The paper introduces STRIATUM-CTF, a modular agentic framework that uses a standardized context protocol to enable LLMs to perform multi-step, stateful reasoning for general-purpose CTF solving, achie…
This paper provides a systematic, lifecycle-based framework for analyzing security threats and defenses across the entire fine-tuning process of LLMs, revealing that attack effectiveness is highly mod…
Ali Al-Kaswan, Maksim Plotnikov, Maxim Hájek, Roland Vízner +2 more
The paper introduces DeepRed, a new benchmark for evaluating LLM agents in realistic CTF challenges, finding that current agents are limited, achieving only 35% average checkpoint completion.
The paper benchmarks current frontier computer-using agents against hand-crafted attacks, finding that while they are highly safe in browser tasks, this safety does not generalize to other domains lik…
The paper introduces Semantic Compliance Hijacking (SCH), a novel payload-less attack that exploits LLM agent supply chains by manipulating compliance rules to force unauthorized code generation, achi…