~ similar to 2606.02398· 20 results
Xiangtao Meng, Wenyu Chen, Chuanchao Zang, Xinyu Gao +4 more
This paper systematically measures and explains how sequential model defenses can conflict, finding that 38.9% of ordered defense sequences cause measurable risk exacerbation due to anti-aligned param…
This paper introduces Anchored Weight Decay (AWD), a regularization technique that effectively prevents prior-task forgetting during LLM fine-tuning with Evolution Strategies (ES), positioning ES as a…
The paper challenges the conclusion that LLMs lack reasoning by demonstrating that reported performance drops on GSM-Symbolic are often statistically weak and partially attributable to dataset biases,…
Magnus Jørgenvåg, David Kaczér, Lasse Ruttert, Marvin Gülhan +2 more
This paper demonstrates that reinforcement learning (RL) can cause emergent misalignment (EM) in open-weight models, showing that even seemingly harmless or natural reward signals can induce significa…
Yuduo Li, Xiaofeng Shi, Qian Kou, Longbin Yu +1 more
RAFT proposes a two-stage framework combining data refinement and adaptive distillation to improve domain-specific fine-tuning while mitigating the loss of general model capabilities.
This paper analyzes failure modes in collaborative visual reasoning systems, demonstrating that naive shared workspaces can amplify hallucinations and proposing diagnostics for improving communication…
Yue Cheng, Jiajun Zhang, Xiaohui Gao, Weiwei Xing +2 more
This paper investigates the non-monotonic role of sample difficulty in Reinforcement Learning with Verifiable Reward (RLVR), finding that medium-difficulty problems provide the most balanced and benef…
The paper formally addresses the challenging question of cross-domain transferability of latent predictive models by proposing a structured framework that quantifies the relationship between source an…
Xinyu Liu, Darryl Cherian Jacob, Yang Zhou, Jindong Wang +1 more
The OISD framework improves language model reasoning by distilling on-policy predictive signals from the final output layer to intermediate representations, leading to substantial improvements on math…
Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more
The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…
Tong Ye, Hang Yu, Tengfei Ma, Xuhong Zhang +5 more
The paper introduces DOMINO, a novel inductive framework that synthesizes domain-specific data for LLMs using only reference examples, significantly improving performance on challenging, implicitly de…
Ran Liu, Min Yu, Mingqi Liu, Jianguo Jiang +6 more
The paper introduces AdvCL, a framework that repurposes adversarial perturbations as a geometric control signal to stabilize continual learning in large language models, significantly reducing forgett…
CARE-RL introduces a framework combining protocol-aware reward generation and capability-aware optimization to effectively mitigate cross-domain conflicts in multi-domain reinforcement learning for LL…
Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang +5 more
The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of wh…
The paper introduces Reasoning in Memory (RiM), a latent reasoning method that replaces autoregressive token generation with fixed memory blocks to enable compute-efficient internal working memory for…
Qi Liu, Mingdi Sun, Yongyi He, Zhi Zheng +4 more
The paper proposes EKSFT, a selective fine-tuning method that masks high-entropy or high-KL divergence tokens during Supervised Fine-Tuning (SFT) to prevent distribution shift and improve subsequent R…
The paper demonstrates that encoding harmful prompts as genuine mathematical problems, rather than just using mathematical formatting, effectively bypasses the safety filters of large language models.
Zeli Su, Zhankai Xu, Tianlei Chen, Longfei Zheng +3 more
The paper introduces DistractionIF, a benchmark showing that larger LLMs are paradoxically less robust to benign, instruction-like noise in reference text, suggesting reinforcement learning can restor…
The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…
Daize Dong, Junlin Chen, Haolong Jia, Jiawei Wu +8 more
The paper proposes Predictive Routing Replay (PR2) to stabilize reinforcement learning on Mixture of Experts (MoE) LLMs by predicting and incorporating short-horizon router evolution during training a…