Papers similar to 2605.01834v1

~ similar to 2605.01834v1· 20 results

cs.CRRecentMay 13, 2026

From Compression to Accountability: Harmless Copyright Protection for Dataset Distillation

Yan Liang, Ziyuan Yang, Mengyu Sun, Joey Tianyi Zhou +1 more

The paper proposes SubPopMark, a novel subpopulation-driven framework that injects harmless, verifiable markers into distilled datasets to prevent copyright infringement and data leakage.

View →

cs.CRcs.AIRecentMay 15, 2026

Asking Back: Interaction-Layer Antidistillation Watermarks

Guang Yang, Amir Ghasemian, Fengchen Liu, Zhong Wang +2 more

The paper proposes interaction-layer antidistillation watermarks by embedding behavioral markers into the system prompt, which successfully track knowledge distillation even when paraphrasing attacker…

View →

cs.CLRecentMay 28, 2026

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Zhihao Wu, Gracia Gong, Qinglin Zhu, Yudong Chen +1 more

The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detect…

View →

cs.CYcs.CLcs.CRRecentApr 15, 2026

Who Gets Flagged? The Pluralistic Evaluation Gap in AI Content Watermarking

Alexander Nemecek, Osama Zafar, Yuqiao Xu, Wenbiao Li +1 more

The paper argues that current AI content watermarking benchmarks fail to test for bias across different languages, cultures, and demographics, proposing a new set of evaluation standards to ensure fai…

View →

cs.CRcs.CVRecentMay 16, 2026

Watermarks Attack Watermarks: Re-Watermarking as a Generic Removal Strategy

Maria Bulychev, Neil G. Marchant, Benjamin I. P. Rubinstein

The paper proposes a simple, generic attack strategy—re-watermarking—that reliably suppresses existing watermarks, demonstrating that watermarks can be used to attack other watermarks.

View →

cs.CRcs.CVRecentApr 14, 2026

Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling

Zida Li, Jun Li, Yuzhe Sha, Ziqiang Li +2 more

The paper introduces SET, a robust input-level backdoor detection framework that detects hidden malicious triggers in text-to-image diffusion models by analyzing systematic differences in how benign a…

View →

cs.CRcs.CVRecentMay 2, 2026

Checkerboard: A Simple, Effective, Efficient and Learning-free Clean Label Backdoor Attack with Low Poisoning Budget

Yi Yang, Jinyang Huang, Binbin Liu, Feng-Qi Cui +4 more

The paper introduces Checkerboard, a novel, learning-free clean-label backdoor attack that efficiently poisons training data to compromise model integrity with minimal poisoning budget.

View →

cs.CRRecentApr 12, 2026

DuCodeMark: Dual-Purpose Code Dataset Watermarking via Style-Aware Watermark-Poison Design

Yuchen Chen, Yuan Xiao, Chunrong Fang, Zhenyu Chen +1 more

DuCodeMark introduces a robust, dual-purpose watermarking technique that embeds ownership signals into code datasets, ensuring protection across both source-code generation and decompilation tasks.

View →

cs.CRRecentMay 4, 2026

VertMark: A Unified Training-Free Robust Watermarking Framework for Vertical Domain Pre-trained Language Models

Cong Kong, Xin Cheng, Zhaoxia Yin, Shuai Li +2 more

VertMark introduces a novel, unified, and training-free framework to embed robust watermarks into vertical domain pre-trained language models (VPLMs) for copyright protection across multiple specializ…

View →

cs.CRcs.AIcs.CYRecentMay 13, 2026

Watermarking Should Be Treated as a Monitoring Primitive

Toluwani Aremu, Nils Lukas, Jie Zhang

The paper argues that watermarking must be viewed as a monitoring primitive, introducing an observer-based threat model that shows even zero-bit watermarking can enable entity-level attribution throug…

View →

cs.CRRecentApr 29, 2026

Differentially Private Contrastive Learning via Bounding Group-level Contribution

Kecen Li, Chen Gong, Zinan Lin, Tianhao Wang +1 more

The paper proposes DP-GCL, a novel differentially private contrastive learning framework that improves representation learning on sensitive data by bounding gradient dependency through localized group…

View →

cs.CRRecentMay 3, 2026

Contrastive Privacy: A Semantic Approach to Measuring Privacy of AI-based Sanitization

George Bissias, Eugene Bagdasarian, Brian Neil Levine

The paper introduces 'contrastive privacy,' a formal, model-agnostic, and quantitative method for evaluating the semantic success of AI-based sanitization across multiple media modalities.

View →

cs.CRRecentMar 17, 2026

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Guangsheng Zhang, Huan Tian, Leo Zhang, Tianqing Zhu +3 more

This paper systematically revisits and expands the threat model for backdoor attacks on semantic segmentation, proposing a unified framework (BADSEG) that demonstrates severe, previously overlooked vu…

View →

cs.CRcs.CLRecentMay 22, 2026

Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

Kieu Dang, Phung Lai, NhatHai Phan, Yelong Shen +1 more

The paper proposes SAFESEAL, a novel key-conditioned watermarking framework that embeds robust, provider-specific watermarks into LLM outputs with minimal semantic distortion, effectively protecting i…

View →

cs.CRRecentApr 15, 2026

BackFlush: Knowledge-Free Backdoor Detection and Elimination with Watermark Preservation in Large Language Models

Jagadeesh Rachapudi, Ritali Vatsi, Pranav Singh, Praful Hambarde +1 more

BackFlush introduces a novel, knowledge-free framework that detects and eliminates unknown backdoor attacks in LLMs while simultaneously preserving existing watermarks, achieving high detection rates…

View →

cs.CRcs.AIcs.CVRecentMar 31, 2026

Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning

Kavindu Herath, Joshua Zhao, Saurabh Bagchi

This paper proposes SABLE, a method for generating semantically meaningful and in-distribution backdoor triggers for federated learning, demonstrating that such attacks remain a potent and practical t…

View →

cs.CRcs.AIcs.LGRecentMay 26, 2026

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

Zedian Shao, Charles Fleming, Teodora Baluta

The paper introduces 'covert control attacks,' a novel and stealthy data poisoning method that teaches LLMs an information hiding scheme, allowing malicious instructions to be encoded and decoded and…

View →

cs.LGcs.CRRecentMay 27, 2026

Density-aware Sample-specific Attack

Qiyuan Wang, Yao Li, Raymond K. W. Wong

This paper proposes a density-aware attack that constructs triggers by placing poisoned samples in low-density regions of the clean data distribution, achieving high attack success rates even after st…

View →

cs.IRcs.CRRecentApr 26, 2026

Green-Red Watermarking for Recommender Systems

Lei Zhou, Min Gao, Zongwei Wang, Yibing Bai +1 more

The paper proposes GREW, a novel Green-REd Watermarking framework that embeds ownership signals into recommender systems' intrinsic ranking process without requiring synthetic data, achieving robust p…

View →

cs.CRcs.LGRecentMay 19, 2026

Awakening the Hydra: Stabilizing Multi-Concept Backdoor Injection in Text-to-Image Diffusion Models

Kai Wang, Jiale Zhang, Chengcheng Zhu, Chuang Ma +1 more

The paper proposes Hydra, a framework to stabilize and control the injection of multiple, conflicting backdoor triggers into text-to-image diffusion models, ensuring high attack reliability while main…

View →