~ similar to 2605.30501· 19 results
Kieu Dang, Phung Lai, NhatHai Phan, Yelong Shen +1 more
The paper proposes SAFESEAL, a novel key-conditioned watermarking framework that embeds robust, provider-specific watermarks into LLM outputs with minimal semantic distortion, effectively protecting i…
Hanbo Huang, Xuan Gong, Yiran Zhang, Hao Zheng +1 more
The paper introduces RLSpoofer, a lightweight, black-box reinforcement learning attack that demonstrates the fragile resilience of current LLM watermarking schemes by achieving a high spoofing success…
PASA introduces a robust, semantic-level watermarking technique that embeds and detects watermarks in the latent embedding space, successfully resisting semantic-invariant attacks like paraphrasing.
Shuhao Zhang, Yuli Chen, Jiale Han, Bo Cheng +1 more
The paper proposes Adaptive Stealing (AS), a novel and more robust watermark stealing algorithm that dynamically selects optimal attack perspectives to significantly increase the efficiency of comprom…
XMark introduces a novel multi-bit watermarking technique that reliably embeds binary messages into LLM-generated text while maintaining high text quality and robust performance even with limited toke…
This paper develops provably undetectable and robust watermarking schemes for LLM outputs even when the per-token entropy is only constant, removing previous dependencies on high entropy rates or larg…
Tom Sander, Hongyan Chang, Tomáš Souček, Tuan Tran +9 more
TextSeal is a novel, non-overhead, and robust watermark for LLMs that enables accurate provenance tracking and detection of unauthorized use even after model distillation.
The paper introduces LUNA, a linguistically adaptive watermarking technique that achieves high detection accuracy across diverse languages while maintaining minimal text distortion, outperforming exis…
The paper introduces BREW, a novel framework that significantly improves the reliability of multi-bit text watermarking for LLMs by replacing flawed decoding-centric methods with a designated two-stag…
Leyi Qi, Yiming Li, Siyuan Liang, Zhengzhong Tu +1 more
The paper proposes Cert-LAS, a novel certified method for verifying model ownership in text-to-image diffusion models, which is robust against malicious signal removal attacks.
The paper proposes a novel binomial multibit LLM watermarking scheme that encodes every bit of a payload at every token position, achieving superior message accuracy and robustness compared to existin…
Alexander Nemecek, Osama Zafar, Yuqiao Xu, Wenbiao Li +1 more
The paper argues that current AI content watermarking benchmarks fail to test for bias across different languages, cultures, and demographics, proposing a new set of evaluation standards to ensure fai…
The paper introduces SeedHijack, a novel, undetectable supply-chain attack that biases LLM watermarking signals by hijacking the underlying Pseudo-Random Number Generator (PRNG) without altering the g…
The paper introduces SeedHijack, a novel, undetectable supply-chain attack that biases LLM watermarking signals by hijacking the underlying PRNG, thereby amplifying the watermark without altering the…
The paper analyzes the robustness of current LLM watermarking schemes against various text modifications, concluding that watermarks can be removed with reasonable effort.
Zikang Ding, Junhao Li, Suling Wu, Junchi Yao +2 more
The paper proposes Functional Subspace Watermarking (FSW), a robust method that embeds ownership signals into a stable, low-dimensional functional subspace of LLMs, significantly improving detection a…
Cong Kong, Xin Cheng, Zhaoxia Yin, Shuai Li +2 more
VertMark introduces a novel, unified, and training-free framework to embed robust watermarks into vertical domain pre-trained language models (VPLMs) for copyright protection across multiple specializ…
The paper argues that watermarking must be viewed as a monitoring primitive, introducing an observer-based threat model that shows even zero-bit watermarking can enable entity-level attribution throug…
The paper proposes a novel global sketch-based watermarking technique for diffusion language models that controls the entire sequence's statistics, offering an order-agnostic and context-independent a…