Papers similar to 2605.27489v1

~ similar to 2605.27489v1· 20 results

cs.CLcs.LGRecentJun 1, 2026

Investigating and Alleviating Harm Amplification in LLM Interactions

This paper introduces HarmAmp, a new benchmark for multi-turn harm amplification, and proposes TrajSafe, a proactive monitoring system that significantly reduces harmfulness in LLM interactions while…

View →

cs.CRcs.CLRecentMay 11, 2026

LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments

Chiyu Zhang, Huiqin Yang, Bendong Jiang, Xiaolei Zhang +7 more

The paper introduces LITMUS, a novel benchmark that rigorously tests LLM agents for dangerous, physical-layer behavioral jailbreaks in real OS environments, revealing that current agents frequently ex…

View →

cs.CRcs.AIcs.SERecentMay 21, 2026

Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions

Jianan Ma, Xiaohu Du, Ruixiao Lin, Yaoxiang Bian +7 more

The paper introduces a multi-dimensional evasion framework and a new benchmark (A3S-Bench) to test autonomous agents, demonstrating that stateful, multi-turn attacks significantly increase system risk…

View →

cs.CRcs.AIcs.CLRecentApr 20, 2026

Owner-Harm: A Missing Threat Model for AI Agent Safety

Dongcheng Zhang, Yiqing Jiang

The paper introduces Owner-Harm, a formal threat model addressing the critical blind spot of AI agents harming their own deployers, demonstrating that specialized defenses are needed beyond generic sa…

View →

cs.AIcs.CLcs.CRRecentApr 9, 2026

ACIArena: Toward Unified Evaluation for Agent Cascading Injection

Hengyu An, Minxi Li, Jinghuai Zhang, Naen Xu +5 more

The paper introduces ACIArena, a unified and comprehensive evaluation framework designed to systematically test the robustness of Multi-Agent Systems against complex Agent Cascading Injection attacks.

View →

cs.CRcs.AIRecentApr 5, 2026

Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents

Mohammad Hossein Chinaei

The paper introduces 'causality laundering,' a novel security vulnerability in tool-calling LLM agents where adversaries exfiltrate information by probing denied actions, and proposes the Agentic Refe…

View →

cs.CRcs.AIcs.LGRecentMay 28, 2026

Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots

Mark Vero, Fabian Kaczmarczyck, Ivan Petrov, Ilia Shumailov +5 more

The paper introduces Honeyval, a comprehensive evaluation framework, to rigorously test LLM-powered HTTP honeypots, demonstrating that these honeypots provide substantially longer and harder-to-detect…

View →

cs.CRcs.AIcs.LGRecentMay 28, 2026

Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots

Mark Vero, Fabian Kaczmarczyck, Ivan Petrov, Ilia Shumailov +5 more

The paper introduces Honeyval, a comprehensive evaluation framework, to rigorously test LLM-powered HTTP honeypots, demonstrating that these systems provide substantially longer and harder-to-detect i…

View →

cs.CRcs.AIRecentJun 3, 2026

From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents

Pritam Dash, Tongyu Ge, Aditi Jain, Tanmay Shah +1 more

This paper systematically studies memory poisoning attacks in LLM agents, identifying multiple vulnerabilities and proposing a new benchmark to assess the risk.

View →

cs.CRcs.AIRecentMay 24, 2026

MemMorph: Tool Hijacking in LLM Agents via Memory Poisoning

Xuanye Zhang, Yongsen Zheng, Zhuqin Xu, Kaiyu Zhou +4 more

MemMorph introduces a novel memory poisoning attack that biases LLM agent tool selection by injecting crafted records into the agent's long-term memory, achieving high success rates even against moder…

View →

cs.CRcs.LGRecentApr 25, 2026

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

Kexin Chu

The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…

View →

cs.CRRecentMay 25, 2026

AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents

Faruk Alpay, Taylan Alpay

The paper introduces AgentSecBench, a security evaluation framework that measures prompt injection, privacy leakage, and tool-use integrity in LLM agents by defining formal security games and testing…

View →

cs.CRcs.AIRecentMay 7, 2026

LoopTrap: Termination Poisoning Attacks on LLM Agents

Huiyu Xu, Zhibo Wang, Wenhui Zhang, Ziqi Zhu +3 more

The paper introduces LoopTrap, an automated red-teaming framework that demonstrates how malicious prompts can poison the termination judgment of LLM agents, causing unbounded computation.

View →

cs.LGcs.AIcs.CRRecentMay 8, 2026

PropGuard: Safeguarding LLM-MAS via Propagation-Aware Exploration and Remediation

Bingyu Yan, Xiaoming Zhang, Jinyu Hou, Chaozhuo Li +3 more

PropGuard introduces a propagation-aware framework to safeguard LLM-MAS against malicious attacks by constructing a dual-view graph, identifying suspicious propagation paths, and applying source-guide…

View →

cs.CRcs.SERecentMay 5, 2026

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

Shihao Weng, Yang Feng, Jinrui Zhang, Xiaofei Xie +2 more

The paper introduces ARGUS, a defense mechanism that uses provenance-aware decision auditing to protect LLM agents from sophisticated, context-aware prompt injection attacks, significantly reducing th…

View →

cs.CRRecentMay 2, 2026

Toward a Principled Framework for Agent Safety Measurement

Shuyi Lin, Anshuman Suri, Alina Oprea, Cheng Tan

The paper introduces BOA, a novel framework that measures agent safety by exhaustively searching the entire in-budget trajectory space, thereby identifying unsafe behaviors missed by traditional sampl…

View →

cs.CRcs.AIRecentMay 22, 2026

When the Manual Lies: A Realistic Benchmark to Evaluate MCP Poisoning Attacks for LLM Agents

Shi Liu, Xuehai Tang, Xikang Yang, Liang Lin +3 more

This paper introduces a new benchmark to test Tool Description Poisoning (TDP) attacks on LLM agents, demonstrating that even advanced models like GPT-4o are highly vulnerable and that current defense…

View →

cs.CRcs.AIRecentMay 10, 2026

Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning

Ben Kereopa-Yorke, Guillermo Diaz, Holly Wright, Reagan Johnston +2 more

The paper introduces Oracle Poisoning, an attack that corrupts knowledge graphs used by AI agents, demonstrating that all tested models blindly trust poisoned data at high sophistication levels.

View →

cs.CRRecentApr 22, 2026

Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

Hanzhi Liu, Chaofan Shou, Xiaonan Liu, Hongbo Wen +3 more

The paper introduces AgentFlow, a novel framework that uses a typed graph DSL and feedback-driven optimization to automatically synthesize and improve multi-agent harnesses for discovering security vu…

View →

cs.CRcs.CLRecentMay 14, 2026

Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

Karthik Raghu Iyer, Yazdan Jamshidi, Nicholas Bray, Alexey A. Shvets

The paper introduces a comprehensive taxonomy and auditing framework to assess the collective coverage of existing LLM attack benchmarks, revealing significant and systematic gaps in current testing m…

View →