Papers similar to 2606.00995

~ similar to 2606.00995· 19 results

cs.AIcs.LGRecentMay 30, 2026

Subliminal Learning is a LoRA Artifact

Todd Nief, Harvey Yiyun Fu, Mark Muchane, Ari Holtzman

The paper demonstrates that the phenomenon of 'subliminal learning,' where behavioral traits are transmitted between language models, is not a fundamental learning mechanism but rather a fragile artif…

View →

cs.LGRecentJun 1, 2026

Why Are DMD Students Lazy? Understanding the Copying Behavior in Few-Step Distillation

Shucheng Li, Iolo Jones, Alexander Tong, Michael M. Bronstein

This paper investigates the phenomenon of 'copying' in Distribution Matching Distillation (DMD), finding that high-dimensional distillation causes student models to spontaneously reproduce the teacher…

View →

cs.CVcs.CLRecentMay 30, 2026

Decomposed On-Policy Distillation for Vision-Language Reasoning: Steering Gradients for Visual Grounding

Hee Suk Yoon, Eunseop Yoon, Jaehyun Jang, SooHwan Eom +5 more

The paper proposes Visual Gradient Steering (VGS), a method that decomposes the distillation loss into language and visual components and steers the optimization to prioritize visual grounding, signif…

View →

cs.CLcs.AIRecentMay 28, 2026

Same Evidence, Different Answers: Canonical-Context On-Policy Distillation for Multi-Turn Language Models

Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang +5 more

The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of wh…

View →

cs.CLcs.AIcs.LGRecentMay 29, 2026

Not All Synthetic Data Is Yours to Learn From

Sina Alemohammad, Li Chen, Richard G. Baraniuk, Zhangyang Wang

Weak self-training on synthetic data can amplify a language model's existing capabilities, but this effect is strictly dependent on the compatibility between the source and student models, not on the…

View →

cs.CLcs.AIRecentMay 29, 2026

Your Teacher Can't Help You Here: Combating Supervision Fidelity Decay in On-Policy Distillation

Yanjiang Liu, Jie Lou, Xinyan Guan, Yuqiu Ji +6 more

The paper introduces Lookahead Group Reward (&) to combat Supervision Fidelity Decay (SFD) in on-policy distillation, significantly improving student model performance on long reasoning tasks.

View →

cs.AIRecentMay 29, 2026

Closed-Loop Neural Activation Control in Vision-Language-Action Models

Abhijith Babu, Ramneet Kaur, Nathaniel D. Bastian, Olivera Kotevska +4 more

The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without r…

View →

cs.LGcs.AIcs.CRRecentApr 18, 2026

Channel-Level Semantic Perturbations: Unlearnable Examples for Diverse Training Paradigms

Bo Wang, Jia Ni, Mengnan Zhao, Zhan Qin +1 more

This paper systematically investigates unlearnable examples (UEs) across diverse training paradigms, finding that existing UEs fail under pretraining-finetuning (PF) settings, and proposes Shallow Sem…

View →

cs.CLcs.AIRecentMay 27, 2026

The Attentional White Bear Effect in Transformer Language Models

Rebecca Ramnauth, Brian Scassellati

The paper demonstrates that content suppression techniques used in language models only mask prohibited content at the output level, failing to eliminate the underlying concepts from the model's inter…

View →

cs.CLcs.LGRecentMay 30, 2026

Towards Lightweight Reliability: Using Soft Prompts for Hallucination Mitigation in Large Language Models

S M Tahmid Siddiqui, Akib Jawad Ononto, Anoop Singhal, Latifur Khan

The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…

View →

cs.CLcs.AIRecentMay 29, 2026

Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning

Xiaoyang Ming, Jose Hernandez, Thomas Stephan Juzek

The paper introduces the Triangulated Preference Shift score, an automated, curation-free metric to quantify systematic lexical biases introduced into Large Language Models during the preference-learn…

View →

cs.LGcs.AIRecentJun 2, 2026

Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

Ali Behrouz, Farnoosh Hashemi, Vahab Mirrokni

This paper introduces a 'Sleep' paradigm for machine learning models to continually learn and transfer knowledge.

View →

cs.LGcs.CLRecentMay 28, 2026

Bounded Behavioral Indistinguishability for Black-Box LLM Distillation

Munawar Hasan

The paper introduces and evaluates bounded behavioral indistinguishability, showing that while LoRA distillation improves semantic similarity, it does not guarantee that the student model is behaviora…

View →

cs.CLRecentMay 28, 2026

Cross-Lingual Steering for Figurative Language Generation

Linfeng Liu, Tiffany Zhan, Louie Hong Yao, Saptarshi Ghosh +1 more

The paper demonstrates that the internal signals governing figurative language generation are reusable across multiple languages, showing that a steering direction learned in one language can effectiv…

View →

cs.SDcs.AIcs.IRRecentMay 29, 2026

Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation

Ioannis Prokopiou, Pantelis Vikatos, Maximos Kaliakatsos-Papakostas, Theodoros Giannakopoulos +1 more

The paper proposes an inference-time activation steering framework, utilizing orthogonalization, to achieve fine-grained, deterministic control over discrete musical attributes like Pitch and Duration…

View →

cs.CLcs.AIRecentMay 29, 2026

Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance

Yuxuan Jiang, Francis Ferraro

The paper introduces Trajectory-aware OPD (TOPD), a method that uses near-future trajectory information to improve On-Policy Distillation by accurately identifying and guiding true reasoning divergenc…

View →

cs.AIRecentMay 28, 2026

Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models

Qi Liu, Mingdi Sun, Yongyi He, Zhi Zheng +4 more

The paper proposes EKSFT, a selective fine-tuning method that masks high-entropy or high-KL divergence tokens during Supervised Fine-Tuning (SFT) to prevent distribution shift and improve subsequent R…

View →

cs.CLcs.AIRecentJun 1, 2026

Unveiling the Limits of Large Language Models in Inferring Pragmatic Meaning from Non-Verbal Responses

Sugyeong Eo, Heuiseok Lim

This paper systematically evaluates LLMs' ability to infer pragmatic meaning from non-verbal responses, finding that their accuracy significantly drops compared to verbal inputs.

View →