Papers similar to 2604.10271v3

~ similar to 2604.10271v3· 20 results

cs.CRcs.CLRecentMay 29, 2026

LLM Anonymization Against Agentic Re-Identification

The paper introduces AURA, an LLM-powered mask-reconstruct framework, to improve text anonymization by enhancing resistance to agentic web-search re-identification while better preserving contextual u…

View →

cs.CRcs.CLRecentMay 29, 2026

LLM Anonymization Against Agentic Re-Identification

Ziwen Li, Jianing Wen, Tianshi Li

View →

cs.CRcs.CLRecentMar 24, 2026

Beyond Theoretical Bounds: Empirical Privacy Loss Calibration for Text Rewriting Under Local Differential Privacy

Weijun Li, Arnaud Grivet Sébert, Qiongkai Xu, Annabelle McIver +1 more

The paper proposes an empirical calibration method, TeDA, to provide a more comparable and interpretable assessment of privacy loss for text rewriting mechanisms under Local Differential Privacy (LDP)…

View →

cs.CVcs.AIcs.CRRecentApr 12, 2026

Toward Accountable AI-Generated Content on Social Platforms: Steganographic Attribution and Multimodal Harm Detection

Xinlei Guan, David Arosemena, Tejaswi Dhandu, Kuan Huang +6 more

The paper proposes an end-to-end forensic pipeline using steganographic attribution and multimodal harm detection to reliably trace and attribute harmful misuse of AI-generated imagery on social platf…

View →

cs.CLRecentMay 28, 2026

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Zhihao Wu, Gracia Gong, Qinglin Zhu, Yudong Chen +1 more

The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detect…

View →

cs.CRRecentMar 27, 2026

Protecting User Prompts Via Character-Level Differential Privacy

Shashie Dilhara Batan Arachchige, Hassan Jameel Asghar, Benjamin Zi Hao Zhao, Dinusha Vatsalan +1 more

The paper proposes a character-level differential privacy mechanism to sanitize sensitive user prompts for LLMs, achieving high privacy for PII while maintaining utility for non-sensitive context.

View →

cs.CYcs.CLcs.CRRecentApr 15, 2026

Who Gets Flagged? The Pluralistic Evaluation Gap in AI Content Watermarking

Alexander Nemecek, Osama Zafar, Yuqiao Xu, Wenbiao Li +1 more

The paper argues that current AI content watermarking benchmarks fail to test for bias across different languages, cultures, and demographics, proposing a new set of evaluation standards to ensure fai…

View →

cs.CRcs.CLcs.LGRecentMay 28, 2026

Implicit Identity Technologies for LLMs: Fingerprinting and Watermarking across Datasets, Models, and Generated Content

Bing Liu, Shunping Wang, Yufan Zhu, Xinyi Yu +4 more

This paper introduces 'implicit identity' as a unifying framework to survey and categorize LLM fingerprinting and watermarking techniques for verifying ownership and provenance across datasets, models…

View →

cs.CRcs.AIRecentMay 11, 2026

Can You Keep a Secret? Involuntary Information Leakage in Language Model Writing

Ari Holtzman, Peter West

Frontier language models involuntarily leak secret information through thematic elements in their writing, even when explicitly instructed to keep the secret hidden.

View →

cs.AIRecentMay 27, 2026

REED: Post-Training Representation Editing for Cross-Domain Linguistic Steganalysis

Ruohan Lei, Jianxin Gao, Wanli Peng, Huimin Pei

The paper proposes REED, a post-training representation editing method that significantly improves cross-domain linguistic steganalysis performance by deterministically editing intermediate feature re…

View →

cs.LGcs.CRRecentMar 20, 2026

Graph-Aware Stealthy Poison-Text Backdoors for Text-Attributed Graphs

Qi Luo, Minghui Xu, Dongxiao Yu, Xiuzhen Cheng

The paper proposes TAGBD, a graph-aware backdoor attack that demonstrates that inconspicuous poison text alone can reliably compromise text-attributed graph learning systems.

View →

cs.CRcs.AIcs.CVRecentApr 7, 2026

Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization

Igor Maljkovic, Maria Rosaria Briglia, Iacopo Masi, Antonio Emanuele Cinà +1 more

The paper introduces a robust, two-part framework (HyPE and HyPS) using hyperbolic geometry to efficiently detect and sanitize malicious prompts targeting Vision-Language Models (VLMs).

View →

cs.CRcs.IRcs.LGRecentMay 13, 2026

VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense

Jascha Wanger

The paper demonstrates a class of steganographic exfiltration attacks against vector databases by hiding data within embeddings, and proposes VectorPin, a cryptographic provenance protocol to detect s…

View →

cs.CRRecentMay 7, 2026

Profiling for Pennies: Unveiling the Privacy Iceberg of LLM Agents

Jiahao Chen, Qi Zhang, Ruixiao Lin, Chunyi Zhou +6 more

The paper introduces the PrivacyIceberg framework to systematically categorize and empirically demonstrate the high risk of automated, deep personal profiling using LLM agents, revealing a significant…

View →

cs.CRRecentApr 13, 2026

Can we Watermark Low-Entropy LLM Outputs?

Noam Mazor, Andrew Morgan, Rafael Pass

This paper develops provably undetectable and robust watermarking schemes for LLM outputs even when the per-token entropy is only constant, removing previous dependencies on high entropy rates or larg…

View →

cs.CRcs.AIcs.CLRecentMay 4, 2026

PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization

Mingshuo Liu, Yiwei Zha, Min Chen

PIIGuard introduces a novel webpage-level defense mechanism using optimized hidden HTML fragments to prevent LLM assistants from scraping contact-style PII, achieving high defense success rates while…

View →

cs.CRRecentMay 3, 2026

Contrastive Privacy: A Semantic Approach to Measuring Privacy of AI-based Sanitization

George Bissias, Eugene Bagdasarian, Brian Neil Levine

The paper introduces 'contrastive privacy,' a formal, model-agnostic, and quantitative method for evaluating the semantic success of AI-based sanitization across multiple media modalities.

View →

cs.CRcs.CLcs.LGRecentMay 12, 2026

Reconstruction of Personally Identifiable Information from Supervised Finetuned Models

Sae Furukawa, Alina Oprea

This paper investigates the privacy risk of reconstructing Personally Identifiable Information (PII) from Large Language Models (LLMs) that have undergone Supervised Finetuning (SFT), proposing a nove…

View →

cs.CRRecentMay 11, 2026

Generate "Normal", Edit Poisoned: Branding Injection via Hint Embedding in Image Editing

Desen Sun, Jason Hon, Howe Wang, Saarth Rajan +2 more

This paper investigates a novel security vulnerability where imperceptible branding hints can be injected into images and subsequently re-rendered onto new objects by generative AI models, proposing b…

View →

cs.CRcs.AIcs.CVRecentMar 20, 2026

CSF: Black-box Fingerprinting via Compositional Semantics for Text-to-Image Models

Junhoo Lee, Mijin Koo, Nojun Kwak

The paper introduces Compositional Semantic Fingerprinting (CSF), a black-box method that allows IP owners to attribute fine-tuned text-to-image models to their protected lineages using only query acc…

View →