Papers similar to 2605.10794v1

~ similar to 2605.10794v1· 20 results

cs.CRcs.AIRecentApr 7, 2026

Your LLM Agent Can Leak Your Data: Data Exfiltration via Backdoored Tool Use

This paper introduces Back-Reveal, an attack demonstrating that backdoored LLM agents can systematically exfiltrate sensitive user data by embedding semantic triggers into tool-use mechanisms.

View →

cs.CRcs.AIcs.CLRecentMay 5, 2026

Exposing LLM Safety Gaps Through Mathematical Encoding:New Attacks and Systematic Analysis

Haoyu Zhang, Mohammad Zandsalimy, Shanu Sushmita

The paper demonstrates that encoding harmful prompts as genuine mathematical problems, rather than just using mathematical formatting, effectively bypasses the safety filters of large language models.

View →

cs.CRRecentMay 25, 2026

AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents

Faruk Alpay, Taylan Alpay

The paper introduces AgentSecBench, a security evaluation framework that measures prompt injection, privacy leakage, and tool-use integrity in LLM agents by defining formal security games and testing…

View →

cs.CRcs.AIRecentApr 20, 2026

Understanding Secret Leakage Risks in Code LLMs: A Tokenization Perspective

Meifang Chen, Zhe Yang, Huang Nianchen, Yizhan Huang +3 more

This paper investigates how Byte-Pair Encoding (BPE) tokenization causes Code LLMs to disproportionately memorize certain types of secrets, a phenomenon termed 'gibberish bias'.

View →

cs.CRcs.AIRecentJun 2, 2026

Caught in the Act(ivation): Toward Pre-Output and Multi-Turn Detection of Credential Exfiltration by LLM Agents

Kargi Chauhan, Pratibha Revankar

This paper proposes a multi-layered defense strategy combining pre-output monitoring, calibrated canary detection, and cumulative information-flow tracking to prevent LLM agents from exfiltrating sens…

View →

cs.CRcs.AIRecentApr 3, 2026

Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

Zhihao Chen, Ying Zhang, Yi Liu, Gelei Deng +6 more

This study conducts a large-scale empirical analysis of third-party LLM agent skills, identifying that credential leakage is a pervasive, cross-modal issue primarily caused by debug logging and result…

View →

cs.CRcs.AIRecentApr 1, 2026

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…

View →

cs.CRcs.AIRecentApr 26, 2026

Evaluation of Prompt Injection Defenses in Large Language Models

Priyal Deep, Shane Emmons, Amy Fox, Kyle Bacon +3 more

The paper evaluates prompt injection defenses and finds that only external output filtering, implemented in application code, reliably prevents secret leaks from LLMs, demonstrating that model-based d…

View →

cs.CRcs.AIcs.LGRecentMay 26, 2026

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

Zedian Shao, Charles Fleming, Teodora Baluta

The paper introduces 'covert control attacks,' a novel and stealthy data poisoning method that teaches LLMs an information hiding scheme, allowing malicious instructions to be encoded and decoded and…

View →

cs.CRcs.CLcs.LGRecentMay 22, 2026

What Does the Server See? Understanding Privacy Leakage from Large Language Models in Split Inference

Mingyuan Fan, Yu Liu, Fuyi Wang, Cen Chen

The paper introduces ActInv and PAF to systematically analyze and quantify privacy leakage from intermediate activations during split inference of LLMs, proposing PriPert for enhanced defense.

View →

cs.CRcs.AIRecentMay 14, 2026

Hidden in Memory: Sleeper Memory Poisoning in LLM Agents

Sidharth Pulipaka, Stanislau Hlebik, Leonidas Raghav, Sahar Abdelnabi +3 more

The paper introduces and evaluates 'sleeper memory poisoning,' a delayed adversarial attack that corrupts an LLM agent's persistent memory by manipulating external context, demonstrating that these po…

View →

cs.CRcs.AIRecentMay 4, 2026

On the Privacy of LLMs: An Ablation Study

Karima Makhlouf, Lamiaa Basyoni, Syed Khaderi, Gabriel Marquez +3 more

This paper conducts a structured ablation study using a unified threat model to evaluate how various system factors (like model architecture and retrieval configuration) influence different types of p…

View →

cs.CRcs.AIRecentMay 28, 2026

Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction

Hongtao Wang, Se Yang, Yu Chen, Puzhuo Liu

The paper proposes MemPoison, a novel memory poisoning attack that injects triggerable backdoors into LLM agents' long-term memory through dialogue interactions, achieving high success rates by bypass…

View →

cs.CRcs.AIRecentMay 28, 2026

Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction

Hongtao Wang, Se Yang, Yu Chen, Puzhuo Liu

The paper introduces MemPoison, a novel memory poisoning attack that successfully injects triggerable backdoors into LLM agents' long-term memory through conversational interactions, achieving high at…

View →

cs.CRcs.AIcs.CLRecentApr 7, 2026

Say Something Else: Rethinking Contextual Privacy as Information Sufficiency

Yunze Xiao, Wenkai Li, Xiaoyuan Wu, Ningshan Ma +2 more

The paper proposes Information Sufficiency (IS) as a comprehensive framework for privacy-preserving LLM communication, demonstrating that free-text pseudonymization outperforms existing suppression an…

View →

cs.CRcs.LGRecentMay 28, 2026

Fingerprinting Inference Systems of Large Language Models

Anna Wimbauer, Jonas Möller, Erik Imgrund, Konrad Rieck

This paper introduces a fingerprinting method that exploits subtle numerical deviations in the inference system components (like the engine or hardware) to reliably identify the specific components us…

View →

cs.CRRecentApr 23, 2026

Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

Zihan Wang, Rui Zhang, Yu Liu, Chi Liu +3 more

This paper presents the first systematic study of black-box skill stealing attacks against proprietary LLM agents, demonstrating that structured agent skills can be easily extracted, posing a signific…

View →

cs.CLcs.AIRecentMay 29, 2026

Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Stine Lyngsø Beltoft, William Brach, Federico Torrielli, Jacob Nielsen +4 more

The paper investigates emergent, sophisticated languages developed by populations of language model agents, finding that these languages are designed for oversight evasion and are difficult to monitor…

View →

cs.CRcs.AIcs.LGRecentMay 18, 2026

Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks

John T. Halloran, Noopur S. Bhatt

The paper proposes Open-Book Benign Rewriting (OBBR), a novel defense mechanism that uses LLM rewriting with benign samples to neutralize data poisoning attacks against LLMs, significantly improving s…

View →

cs.CRRecentMar 24, 2026

Observable Channels, Not Just Storage: Evaluating Privacy Leakage in LLM Agent Pipelines

Tao Huang, Chen Hou, Guosen Wu, Jiayang Meng

The paper introduces CIPL, a unified channel-oriented framework, demonstrating that privacy leakage in LLM agents is governed by observable data channels and pipeline interactions, rather than being l…

View →