ArXivCSExplorer
Built by Teycir Ben Soltane

Similar to 2606.02237 · 19 results

cs.CV · cs.AI · cs.LG · Recent · May 30, 2026

DASH: Dual-Branch Score Distillation for Guidance-Calibrated Compact Diffusion Models

Abdullah Al Shafi, Kazi Saeed Alam, Sk Imran Hossain, Engelbert Mephu Nguifo

DASH introduces a dual-branch distillation framework to effectively compress class-conditional diffusion models by independently supervising both score branches, significantly preserving guidance fide…

cs.LG · cs.CL · Recent · May 28, 2026

Bounded Behavioral Indistinguishability for Black-Box LLM Distillation

Munawar Hasan

The paper introduces and evaluates bounded behavioral indistinguishability, showing that while LoRA distillation improves semantic similarity, it does not guarantee that the student model is behaviora…

cs.LG · cs.AI · Recent · May 29, 2026

Trust-Region Behavior Blending for On-Policy Distillation

Daniil Plyusov, Alexey Gorbatovski, Alexey Malakhov, Nikita Balagansky +3 more

The paper introduces Trust-Region behavior Blending (TRB), a warmup method that improves on-policy distillation by replacing poor early student rollouts with teacher-aligned behavior policies, leading…

cs.LG · cs.CR · Recent · May 12, 2026

Lossless Anti-Distillation Sampling

Zibo Diao, Jingchu Gai, Xinyue Ai, Zhang Zhang +2 more

The paper introduces Lossless Anti-Distillation Sampling (LADS), a novel sampling scheme that makes harvested data correlated for malicious distillers while ensuring benign users receive statistically…

cs.LG · cs.AI · Recent · May 28, 2026

A Predictive Law for On-Policy Self-Distillation From World Feedback

Tommy He, Jerome Sieber, Matteo Saponati

The paper identifies a linear predictive law linking the initial performance gap in on-policy self-distillation (OPSD) to the final performance improvement, allowing researchers to anticipate and tune…

stat.ML · cs.LG · Recent · Jun 2, 2026

A Quantitative Approximation Framework for Flow Distillation in Diffusion Models

Weiguo Gao, Ming Li, Lei Shi, Hanfei Zhou

The paper develops a quantitative framework to analyze and improve flow distillation in diffusion models, providing stability guarantees and suggesting non-uniform time scheduling to reduce approximat…

cs.CL · cs.AI · Recent · May 29, 2026

Your Teacher Can't Help You Here: Combating Supervision Fidelity Decay in On-Policy Distillation

Yanjiang Liu, Jie Lou, Xinyan Guan, Yuqiu Ji +6 more

The paper introduces Lookahead Group Reward to combat Supervision Fidelity Decay (SFD) in on-policy distillation, significantly improving student model performance on long reasoning tasks.

cs.CR · Recent · May 13, 2026

From Compression to Accountability: Harmless Copyright Protection for Dataset Distillation

Yan Liang, Ziyuan Yang, Mengyu Sun, Joey Tianyi Zhou +1 more

The paper proposes SubPopMark, a novel subpopulation-driven framework that injects harmless, verifiable markers into distilled datasets to prevent copyright infringement and data leakage.

cs.CL · cs.AI · Recent · May 28, 2026

Same Evidence, Different Answers: Canonical-Context On-Policy Distillation for Multi-Turn Language Models

Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang +5 more

The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of wh…

cs.LG · cs.AI · Recent · May 28, 2026

GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models

Xiaohang Tang, Keyue Jiang, Che Liu, Qifang Zhao +3 more

The paper proposes Guided Denoiser Self-Distillation (GDSD), a novel method that bypasses the use of likelihood surrogates (like ELBO) in RL for diffusion language models, achieving state-of-the-art p…

cs.CL · Recent · May 31, 2026

Revise, Don't Freeze: Sampler-Matched Training for Self-Correcting Masked Diffusion Language Models

Longxuan Yu, Shaorong Zhang, Yu Fu, Hui Liu +2 more

The paper introduces D3IM, a novel parameter-free sampler that enables direct revision of visible tokens in Masked Diffusion Language Models, and proposes SCOPE to mitigate the model's tendency to per…

cs.CR · cs.AI · Recent · May 21, 2026

Safeguarding Text-to-Image Generative Models Against Unauthorized Knowledge Distillation

Yilan Gao, Sida Huang, Hongyuan Zhang, Xuelong Li

The paper introduces WaveGuard, a frequency-aware, single-pass defense framework that safeguards text-to-image models by injecting structured, imperceptible perturbations into generated images, thereb…

cs.AI · Recent · May 31, 2026

Subliminal Learning Is Steering Vector Distillation

Camila Blank, Agam Bhatia, Senthooran Rajamanoharan, Arthur Conmy +1 more

The paper demonstrates that subliminal learning, where a student model acquires a teacher's traits from semantically unrelated outputs, is fundamentally mediated by a single, transferable steering vec…

cs.LG · cs.AI · cs.SD · Recent · May 30, 2026

Logit Distillation on Manifolds: Mapping by Learning

Yiru Yang, Junling Wang, Nishant Kumar Singh, Luohong Wu +1 more

The paper proposes a novel layer and point-wise projection mapping combined with LoRA injection to efficiently distill knowledge from a large teacher model to a small student model, significantly impr…

cs.AI · Recent · May 29, 2026

Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight

Can Jin, Jiakang Li, Rui Wu, Eddy Zhang +1 more

The paper introduces Weak-Critic Strong Oversight, a method where a weak model guides a strong model's self-improvement by providing non-misleading revision directions, leading to scalable oversight.

cs.CL · Recent · May 30, 2026

Robust Reasoning via Dynamic Token Selection for Distribution-Aligned Self-Distillation

Ruiqi Zhang, Lingxiang Wang, Hainan Zhang, Zhiming Zheng

The paper proposes Distribution-Aligned Self-Distillation (DASD) to improve self-distillation by dynamically filtering high-perplexity tokens, thereby preserving useful logical knowledge while suppres…

cs.CV · cs.CL · Recent · May 30, 2026

Decomposed On-Policy Distillation for Vision-Language Reasoning: Steering Gradients for Visual Grounding

Hee Suk Yoon, Eunseop Yoon, Jaehyun Jang, SooHwan Eom +5 more

The paper proposes Visual Gradient Steering (VGS), a method that decomposes the distillation loss into language and visual components and steers the optimization to prioritize visual grounding, signif…

cs.LG · cs.AI · cs.CL · Recent · Jun 3, 2026

Reinforcement Learning from Rich Feedback with Distributional DAgger

Rishabh Agrawal, Jacob Fein-Ashley, Paria Rashidinejad

The paper proposes DistIL, a new imitation learning algorithm that uses distributional feedback to strengthen policy improvement and regret guarantees.
