Papers similar to 2605.20734v1

Similar to 2605.20734v1 · 20 results

cs.CR · Recent · Mar 30, 2026

Safeguarding LLMs Against Misuse and AI-Driven Malware Using Steganographic Canaries

Md Raz, Venkata Sai Charan Putrevu, Meet Udeshi, Prashanth Krishnamurthy +2 more

The paper introduces a novel framework using steganographic canary files to detect and block unauthorized processing of sensitive documents by LLMs, even when the data passes through traditional secur…

cs.CR · Recent · May 25, 2026

AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents

Faruk Alpay, Taylan Alpay

The paper introduces AgentSecBench, a security evaluation framework that measures prompt injection, privacy leakage, and tool-use integrity in LLM agents by defining formal security games and testing…

cs.CR · Recent · Apr 28, 2026

ReTokSync: Self-Synchronizing Tokenization Disambiguation for Generative Linguistic Steganography

Yaofei Wang, Rui Wang, Weilong Pang, JiaLiang Han +3 more

The paper introduces ReTokSync, a self-synchronizing framework that resolves tokenization ambiguity in Generative Linguistic Steganography (GLS) by correcting mismatches only when they occur, thereby…

cs.CR, cs.AI, cs.CY · Recent · Mar 24, 2026

Robust Safety Monitoring of Language Models via Activation Watermarking

Toluwani Aremu, Daniil Ognev, Samuele Poppi, Nils Lukas

This paper addresses the vulnerability of existing LLM safety monitors to adaptive attackers and proposes activation watermarking, a technique that significantly improves detection robustness against…

cs.CR, cs.AI · Recent · Jun 2, 2026

Caught in the Act(ivation): Toward Pre-Output and Multi-Turn Detection of Credential Exfiltration by LLM Agents

Kargi Chauhan, Pratibha Revankar

This paper proposes a multi-layered defense strategy combining pre-output monitoring, calibrated canary detection, and cumulative information-flow tracking to prevent LLM agents from exfiltrating sens…

cs.CR · Recent · Apr 13, 2026

Can we Watermark Low-Entropy LLM Outputs?

Noam Mazor, Andrew Morgan, Rafael Pass

This paper develops provably undetectable and robust watermarking schemes for LLM outputs even when the per-token entropy is only constant, removing previous dependencies on high entropy rates or larg…

cs.CR · Recent · Apr 9, 2026

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

Hanzhi Liu, Chaofan Shou, Hongbo Wen, Yanju Chen +2 more

This paper systematically analyzes the threat posed by malicious third-party API routers in the LLM supply chain, finding that a significant number of routers actively perform payload injection, crede…

cs.CV, cs.AI, cs.CR · Recent · Apr 12, 2026

Toward Accountable AI-Generated Content on Social Platforms: Steganographic Attribution and Multimodal Harm Detection

Xinlei Guan, David Arosemena, Tejaswi Dhandu, Kuan Huang +6 more

The paper proposes an end-to-end forensic pipeline using steganographic attribution and multimodal harm detection to reliably trace and attribute harmful misuse of AI-generated imagery on social platf…

cs.CR, cs.AI · Recent · Apr 3, 2026

Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

Zhihao Chen, Ying Zhang, Yi Liu, Gelei Deng +6 more

This study conducts a large-scale empirical analysis of third-party LLM agent skills, identifying that credential leakage is a pervasive, cross-modal issue primarily caused by debug logging and result…

cs.CR, cs.IR, cs.LG · Recent · May 13, 2026

VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense

Jascha Wanger

The paper demonstrates a class of steganographic exfiltration attacks against vector databases by hiding data within embeddings, and proposes VectorPin, a cryptographic provenance protocol to detect s…

cs.CR, cs.AI, cs.CL · Recent · May 21, 2026

Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

Aaditya Pai

The paper identifies a critical vulnerability, the Camouflage Detection Gap (CDG), where standard LLM injection detectors fail dramatically when malicious payloads mimic the target domain's language a…

cs.CR · Recent · Apr 23, 2026

Provably Secure Steganography Based on List Decoding

Kaiyi Pang, Minhao Bai

The paper proposes a provably secure steganography scheme based on list decoding that significantly increases embedding capacity for Large Language Models (LLMs) compared to existing methods.

cs.CR · Recent · Apr 17, 2026

MATRIX: Multi-Layer Code Watermarking via Dual-Channel Constrained Parity-Check Encoding

Yuqing Nie, Chong Wang, Guosheng Xu, Guoai Xu +3 more

MATRIX is a novel, robust code watermarking framework that encodes watermarks using constrained parity-check matrix equations, achieving high detection accuracy and improved robustness for code proven…

cs.CR, cs.IT · Recent · Jun 1, 2026

Quantifying Side-Channel Leakage in Public Metrology Releases

Faruk Alpay, Taylan Alpay

The paper formalizes and quantifies the risk of side-channel leakage from public metrology releases by developing a statistical audit framework that yields precise information-theoretic bounds.

cs.CR, cs.SE · Recent · Apr 13, 2026

LLM-Redactor: An Empirical Evaluation of Eight Techniques for Privacy-Preserving LLM Requests

Justice Owusu Agyemang, Jerry John Kponyo, Elliot Amponsah, Godfred Manu Addo Boakye +1 more

The paper systematically evaluates eight privacy-preserving techniques for LLM requests, finding that a combination of local inference, redaction, and semantic rephrasing provides the best overall pro…

cs.CR, cs.CV · Recent · May 7, 2026

Stego Battlefield: Evaluating Image Steganography Attacks and Steganalysis Defenses

Zhen Sun, Zongmin Zhang, Leyi Sheng, Yule Liu +6 more

The paper introduces SADBench, a systematic benchmark designed to evaluate both the effectiveness of steganographic attacks injecting harmful content and the robustness of steganalysis defenses agains…

cs.CR, cs.CV, cs.GR · Recent · May 28, 2026

Cert-LAS: Toward Certified Model Ownership Verification for Text-to-Image Diffusion Models via Layer-Adaptive Smoothing

Leyi Qi, Yiming Li, Siyuan Liang, Zhengzhong Tu +1 more

The paper proposes Cert-LAS, a novel certified method for verifying model ownership in text-to-image diffusion models, which is robust against malicious signal removal attacks.

cs.CL, cs.CR · Recent · Apr 9, 2026

Efficient Provably Secure Linguistic Steganography via Range Coding

Ruiyi Yan, Yugo Murawaki

The paper proposes an efficient and provably secure linguistic steganography method using range coding that achieves high embedding capacity and speed, outperforming existing methods.

cs.CR, cs.LG · Recent · Apr 2, 2026

AEGIS: Adversarial Entropy-Guided Immune System -- Thermodynamic State Space Models for Zero-Day Network Evasion Detection

Vickson Ferrel

AEGIS introduces a novel physics-based system that analyzes encrypted network traffic flow dynamics, achieving state-of-the-art zero-day evasion detection with high accuracy and low latency.

cs.CR, cs.AI, cs.LG · Recent · May 11, 2026

Content-Aware Attack Detection in LLM Agent Tool-Call Traffic: An Empirical Study of Features, Architectures, and Evaluation Protocols

Sultan Zavrak

The paper proposes a graph-based framework for detecting attacks in LLM agent tool-call traffic, finding that content-level embeddings are crucial for high accuracy and that tree ensembles on these em…
