Similar to 2606.00995 · 19 results
The paper demonstrates that the phenomenon of 'subliminal learning,' where behavioral traits are transmitted between language models, is not a fundamental learning mechanism but rather a fragile artif…
This paper investigates the phenomenon of 'copying' in Distribution Matching Distillation (DMD), finding that high-dimensional distillation causes student models to spontaneously reproduce the teacher…
Hee Suk Yoon, Eunseop Yoon, Jaehyun Jang, SooHwan Eom +5 more
The paper proposes Visual Gradient Steering (VGS), a method that decomposes the distillation loss into language and visual components and steers the optimization to prioritize visual grounding, signif…
Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang +5 more
The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of wh…
Weak self-training on synthetic data can amplify a language model's existing capabilities, but this effect is strictly dependent on the compatibility between the source and student models, not on the…
Yanjiang Liu, Jie Lou, Xinyan Guan, Yuqiu Ji +6 more
The paper introduces Lookahead Group Reward to combat Supervision Fidelity Decay (SFD) in on-policy distillation, significantly improving student model performance on long reasoning tasks.
The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without r…
Bo Wang, Jia Ni, Mengnan Zhao, Zhan Qin +1 more
This paper systematically investigates unlearnable examples (UEs) across diverse training paradigms, finding that existing UEs fail under pretraining-finetuning (PF) settings, and proposes Shallow Sem…
The paper demonstrates that content suppression techniques used in language models only mask prohibited content at the output level, failing to eliminate the underlying concepts from the model's inter…
The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…
The paper introduces the Triangulated Preference Shift score, an automated, curation-free metric to quantify systematic lexical biases introduced into Large Language Models during the preference-learn…
This paper introduces a 'Sleep' paradigm for machine learning models to continually learn and transfer knowledge.
The paper introduces and evaluates bounded behavioral indistinguishability, showing that while LoRA distillation improves semantic similarity, it does not guarantee that the student model is behaviora…
Linfeng Liu, Tiffany Zhan, Louie Hong Yao, Saptarshi Ghosh +1 more
The paper demonstrates that the internal signals governing figurative language generation are reusable across multiple languages, showing that a steering direction learned in one language can effectiv…
Ioannis Prokopiou, Pantelis Vikatos, Maximos Kaliakatsos-Papakostas, Theodoros Giannakopoulos +1 more
The paper proposes an inference-time activation steering framework, utilizing orthogonalization, to achieve fine-grained, deterministic control over discrete musical attributes like Pitch and Duration…
The paper introduces Trajectory-aware OPD (TOPD), a method that uses near-future trajectory information to improve On-Policy Distillation by accurately identifying and guiding true reasoning divergenc…
Qi Liu, Mingdi Sun, Yongyi He, Zhi Zheng +4 more
The paper proposes EKSFT, a selective fine-tuning method that masks high-entropy or high-KL divergence tokens during Supervised Fine-Tuning (SFT) to prevent distribution shift and improve subsequent R…
This paper systematically evaluates LLMs' ability to infer pragmatic meaning from non-verbal responses, finding that accuracy drops significantly relative to verbal inputs.