~ similar to 2605.18829v1· 19 results
The paper introduces WaveGuard, a frequency-aware, single-pass defense framework that safeguards text-to-image models by injecting structured, imperceptible perturbations into generated images, thereb…
This paper investigates the phenomenon of 'copying' in Distribution Matching Distillation (DMD), finding that high-dimensional distillation causes student models to spontaneously reproduce the teacher…
The paper introduces and evaluates bounded behavioral indistinguishability, showing that while LoRA distillation improves semantic similarity, it does not guarantee that the student model is behaviora…
Yan Liang, Ziyuan Yang, Mengyu Sun, Joey Tianyi Zhou +1 more
The paper proposes SubPopMark, a novel subpopulation-driven framework that injects harmless, verifiable markers into distilled datasets to prevent copyright infringement and data leakage.
The paper proposes Distribution-Aligned Self-Distillation (DASD) to improve self-distillation by dynamically filtering high-perplexity tokens, thereby preserving useful logical knowledge while suppres…
Yanjiang Liu, Jie Lou, Xinyan Guan, Yuqiu Ji +6 more
The paper introduces Lookahead Group Reward (&) to combat Supervision Fidelity Decay (SFD) in on-policy distillation, significantly improving student model performance on long reasoning tasks.
Paul Jünger, Justin Lovelace, Linxi Zhao, Dongyoung Go +1 more
The paper introduces SARDI, a novel, training-free framework that uses low-confidence 'lookahead' tokens generated during the denoising process of discrete diffusion language models to dynamically gui…
DASH introduces a dual-branch distillation framework to effectively compress class-conditional diffusion models by independently supervising both score branches, significantly preserving guidance fide…
Max Hartman, Vidhata Jayaraman, Moulik Choraria, Yash Savani +1 more
The paper introduces TraceGuard, a detectability-aware antidistillation method that identifies and poisons 'thought anchors'—sparsely critical sentences—to degrade student model learning without makin…
Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang +5 more
The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of wh…
The paper introduces ImageProtector, a user-side method that embeds an imperceptible perturbation into images to prevent Multi-modal Large Language Models (MLLMs) from analyzing and extracting sensiti…
This paper proposes a density-aware attack that constructs triggers by placing poisoned samples in low-density regions of the clean data distribution, achieving high attack success rates even after st…
Ziyang Zheng, Zeju Li, Xiangyu Wen, Jianyuan Zhong +4 more
The paper reframes context distillation as a latent memory management problem, proposing a modular framework using LoRA adapters and a Self-Gating mechanism for efficient, selective memory retrieval a…
The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…
Weak self-training on synthetic data can amplify a language model's existing capabilities, but this effect is strictly dependent on the compatibility between the source and student models, not on the…
The paper introduces DiffusionHijack, a supply-chain backdoor attack that compromises the PRNG used by diffusion models to deterministically control generated images, which is successfully mitigated b…
The paper introduces Asymmetric Langevin Unlearning (ALU), a novel framework that uses public data to significantly reduce the utility loss typically associated with certified machine unlearning, enab…
Guang Yang, Amir Ghasemian, Fengchen Liu, Zhong Wang +2 more
The paper proposes interaction-layer antidistillation watermarks by embedding behavioral markers into the system prompt, which successfully track knowledge distillation even when paraphrasing attacker…