Papers similar to 2605.02977v1

~ similar to 2605.02977v1· 20 results

cs.CVcs.AIcs.CRRecentApr 12, 2026

Toward Accountable AI-Generated Content on Social Platforms: Steganographic Attribution and Multimodal Harm Detection

Xinlei Guan, David Arosemena, Tejaswi Dhandu, Kuan Huang +6 more

The paper proposes an end-to-end forensic pipeline using steganographic attribution and multimodal harm detection to reliably trace and attribute harmful misuse of AI-generated imagery on social platf…

View →

cs.CRRecentMar 27, 2026

Protecting User Prompts Via Character-Level Differential Privacy

Shashie Dilhara Batan Arachchige, Hassan Jameel Asghar, Benjamin Zi Hao Zhao, Dinusha Vatsalan +1 more

The paper proposes a character-level differential privacy mechanism to sanitize sensitive user prompts for LLMs, achieving high privacy for PII while maintaining utility for non-sensitive context.

View →

cs.CVcs.AIcs.CRRecentApr 10, 2026

Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection

Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong

The paper introduces ImageProtector, a user-side method that embeds an imperceptible perturbation into images to prevent Multi-modal Large Language Models (MLLMs) from analyzing and extracting sensiti…

View →

cs.CYcs.CLcs.CRRecentApr 15, 2026

Who Gets Flagged? The Pluralistic Evaluation Gap in AI Content Watermarking

Alexander Nemecek, Osama Zafar, Yuqiao Xu, Wenbiao Li +1 more

The paper argues that current AI content watermarking benchmarks fail to test for bias across different languages, cultures, and demographics, proposing a new set of evaluation standards to ensure fai…

View →

cs.CRcs.AIcs.CVRecentMar 20, 2026

CSF: Black-box Fingerprinting via Compositional Semantics for Text-to-Image Models

Junhoo Lee, Mijin Koo, Nojun Kwak

The paper introduces Compositional Semantic Fingerprinting (CSF), a black-box method that allows IP owners to attribute fine-tuned text-to-image models to their protected lineages using only query acc…

View →

cs.CRcs.CVRecentApr 17, 2026

TwoHamsters: Benchmarking Multi-Concept Compositional Unsafety in Text-to-Image Models

Chaoshuo Zhang, Yibo Liang, Mengke Tian, Chenhao Lin +5 more

This paper introduces TwoHamsters, a new benchmark that rigorously tests Multi-Concept Compositional Unsafety (MCCU) in text-to-image models, demonstrating that current state-of-the-art models and saf…

View →

cs.CRcs.AIcs.CLRecentMay 4, 2026

PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization

Mingshuo Liu, Yiwei Zha, Min Chen

PIIGuard introduces a novel webpage-level defense mechanism using optimized hidden HTML fragments to prevent LLM assistants from scraping contact-style PII, achieving high defense success rates while…

View →

cs.CRRecentMay 9, 2026

Removing the Watermark Is Not Enough: Forensic Stealth in Generative-AI Watermark Removal

Yevin Nikhel Goonatilake, Giuseppe Ateniese

The paper demonstrates that current AI watermark removal techniques fail to achieve true forensic stealth, as the removal process often leaves behind detectable signals that distinguish the output fro…

View →

cs.CRcs.AIRecentMar 18, 2026

WebPII: Benchmarking Visual PII Detection for Computer-Use Agents

Nathan Zhao

The paper introduces WebPII, a novel, large-scale synthetic benchmark for detecting personally identifiable information (PII) in web screenshots, and demonstrates a model (WebRedact) that significantl…

View →

cs.CVcs.CRRecentMay 11, 2026

What Concepts Lie Within? Detecting and Suppressing Risky Content in Diffusion Transformers

Chenyu Zhang

The paper proposes AHV-D&S, a novel training-free inference-time safeguard that detects and suppresses risky content in Diffusion Transformers (DiTs) by quantifying token sensitivity across attention…

View →

cs.CRRecentMar 30, 2026

Safeguarding LLMs Against Misuse and AI-Driven Malware Using Steganographic Canaries

Md Raz, Venkata Sai Charan Putrevu, Meet Udeshi, Prashanth Krishnamurthy +2 more

The paper introduces a novel framework using steganographic canary files to detect and block unauthorized processing of sensitive documents by LLMs, even when the data passes through traditional secur…

View →

cs.CRcs.AIcs.CVRecentApr 7, 2026

Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization

Igor Maljkovic, Maria Rosaria Briglia, Iacopo Masi, Antonio Emanuele Cinà +1 more

The paper introduces a robust, two-part framework (HyPE and HyPS) using hyperbolic geometry to efficiently detect and sanitize malicious prompts targeting Vision-Language Models (VLMs).

View →

cs.CRcs.CLRecentMay 22, 2026

Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

Kieu Dang, Phung Lai, NhatHai Phan, Yelong Shen +1 more

The paper proposes SAFESEAL, a novel key-conditioned watermarking framework that embeds robust, provider-specific watermarks into LLM outputs with minimal semantic distortion, effectively protecting i…

View →

cs.CRcs.CVRecentMay 10, 2026

On the Generation and Mitigation of Harmful Geometry in Image-to-3D Models

Yule Liu, Yilong Yang, Jiale Teng, Hanze Jia +10 more

The paper systematically measures the risk of current image-to-3D models generating harmful geometries, finding that these models are effective at reconstruction and existing safeguards are insufficie…

View →

cs.CVcs.AIcs.CRRecentMar 29, 2026

Towards Context-Aware Image Anonymization with Multi-Agent Reasoning

Robert Aufschläger, Jakob Folz, Gautam Savaliya, Manjitha D Vidanalage +2 more

The paper introduces CAIAMAR, a multi-agent reasoning framework that achieves context-aware and high-fidelity anonymization of personally identifiable information (PII) in street imagery, significantl…

View →

cs.CRcs.AIRecentMay 1, 2026

A Sentence Relation-Based Approach to Sanitizing Malicious Instructions

Soumil Datta, Melissa Umble, Daniel S. Brown, Guanhong Tao

The paper introduces SONAR, a prompt sanitization framework that uses natural language inference metrics to identify and remove malicious instructions injected into LLM prompts, achieving near-zero at…

View →

cs.CLRecentMay 28, 2026

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Zhihao Wu, Gracia Gong, Qinglin Zhu, Yudong Chen +1 more

The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detect…

View →

cs.CRcs.SIRecentApr 20, 2026

SoK: Analysis of Privacy Risks and Mitigation in Online Propaganda Detection through the PROMPT Framework

Dhiman Goswami, Al Nahian Bin Emran, Md Hasan Ullah Sadi, Sanchari Das

The paper introduces the PROMPT framework to systematically analyze and mitigate privacy risks in online propaganda detection pipelines, demonstrating that current widely used methods are often non-co…

View →

cs.CRcs.LGRecentMay 7, 2026

FedAttr: Towards Privacy-preserving Client-Level Attribution in Federated LLM Fine-tuning

Su Zhang, Junfeng Guo, Heng Huang

FedAttr introduces a novel client-level attribution protocol for Federated Learning (FL) that accurately identifies which clients trained on watermarked data while maintaining strong privacy guarantee…

View →

cs.CRRecentMar 27, 2026

Not All Entities are Created Equal: A Dynamic Anonymization Framework for Privacy-Preserving RAG

Xinyuan Zhu, Zekun Fei, Enye Wang, Ruiqi He +4 more

The paper proposes TRIP-RAG, a dynamic anonymization framework that selectively anonymizes sensitive entities in knowledge bases used for RAG, significantly improving utility while maintaining strong…

View →