Zhong Wang
2 indexed papers
Research Timeline
The paper proposes interaction-layer antidistillation watermarks, which embed behavioral markers in the system prompt and can track knowledge distillation even when paraphrasing attacks strip traditional token-level signals.
The paper introduces SHADOWMASK, the first systematic backdoor attack targeting Masked Diffusion Language Models (MDLMs), demonstrating near-100% attack success while preserving clean model utility.
Papers
Backdooring Masked Diffusion Language Models