20 results for “self-distillation”
CS papers onlyHybrid search: Keyword + semantic, ranked by combined score.ⓘ
Want pure semantic search? Try claim verification →
The paper identifies a linear predictive law linking the initial performance gap in on-policy self-distillation (OPSD) to the final performance improvement, allowing researchers to anticipate and tune…
Zibo Diao, Jingchu Gai, Xinyue Ai, Zhang Zhang +2 more
The paper introduces Lossless Anti-Distillation Sampling (LADS), a novel sampling scheme that makes harvested data correlated for malicious distillers while ensuring benign users receive statistically…
DASH introduces a dual-branch distillation framework to effectively compress class-conditional diffusion models by independently supervising both score branches, significantly preserving guidance fide…
Ziyang Zheng, Zeju Li, Xiangyu Wen, Jianyuan Zhong +4 more
The paper reframes context distillation as a latent memory management problem, proposing a modular framework using LoRA adapters and a Self-Gating mechanism for efficient, selective memory retrieval a…
This paper investigates the phenomenon of 'copying' in Distribution Matching Distillation (DMD), finding that high-dimensional distillation causes student models to spontaneously reproduce the teacher…
Gaetan Narozniak, Gérard Biau, Rémi Munos, Ahmad Rammal +1 more
The paper introduces Feedback Distillation, a novel training method that uses a language model's privileged feedback to provide token-level supervision, significantly improving complex reasoning tasks…
Sangwoo Park, Woongyeong Yeo, Seanie Lee, Yumin Choi +5 more
The paper proposes SELFCI, a complementary self-distillation framework that effectively balances the privacy requirements of Contextual Integrity (CI) with the utility of large language models, outper…
The paper proposes Distribution-Aligned Self-Distillation (DASD) to improve self-distillation by dynamically filtering high-perplexity tokens, thereby preserving useful logical knowledge while suppres…
Xuewei Yang, Jiachen Yu, Jie Wu, Shaoning Sun +2 more
The paper introduces Temperature-Scaled On-Policy Self-Distillation (TS-OPSD), a novel method that internalizes temperature-based policy reheating into model parameters to combat entropy collapse in r…
Yaocheng Zhang, Jiajun Chai, Yuqian Fu, Songjun Tu +6 more
This paper proposes two horizon-control strategies, Progressive OPD (POPD) and Truncated OPD (TOPD), demonstrating that full rollouts are often unnecessary for On-Policy Distillation, leading to signi…
Hanyang Zhao, Haoxian Chen, Han Lin, Genta Indra Winata +2 more
The paper introduces OPD+, a corrected on-policy distillation framework that mathematically proves the bias of standard stop-gradient methods and improves the stability and performance of knowledge tr…
Xiaohang Tang, Keyue Jiang, Che Liu, Qifang Zhao +3 more
The paper proposes Guided Denoiser Self-Distillation (GDSD), a novel method that bypasses the use of likelihood surrogates (like ELBO) in RL for diffusion language models, achieving state-of-the-art p…
Jiazhen Huang, Xiao Chen, Xiao Luo, Yong Dai +2 more
The paper proposes Skill-Conditioned Gated Self-Distillation (SGSD), a novel framework that uses retrieved, potentially noisy skills to guide LLM reasoning, achieving state-of-the-art performance on m…
The paper develops a quantitative framework to analyze and improve flow distillation in diffusion models, providing stability guarantees and suggesting non-uniform time scheduling to reduce approximat…
Xinyu Liu, Darryl Cherian Jacob, Yang Zhou, Jindong Wang +1 more
The OISD framework improves language model reasoning by distilling on-policy predictive signals from the final output layer to intermediate representations, leading to substantial improvements on math…
The paper introduces Trust-Region behavior Blending (TRB), a warmup method that improves on-policy distillation by replacing poor early student rollouts with teacher-aligned behavior policies, leading…
Shuaike Li, Kai Zhang, Xianquan Wang, Jiachen Liu +1 more
The paper introduces Causal Editing (CODE), a new paradigm that improves knowledge updates in LLMs by grounding fact injection in causal narratives, drastically reducing self-refutation rates.
The paper proposes a theoretical framework, called constraint-coupled reasoning, to make AI models less susceptible to knowledge distillation by coupling high-level capabilities to internal stability…
Yiru Yang, Junling Wang, Nishant Kumar Singh, Luohong Wu +1 more
The paper proposes a novel layer and point-wise projection mapping combined with LoRA injection to efficiently distill knowledge from a large teacher model to a small student model, significantly impr…