~ similar to 2605.29303· 18 results
The paper introduces an entropy-aware masking strategy for Masked Language Modeling (MLM) that targets informative and uncertain tokens, achieving up to a 5% performance improvement on GLUE scores.
This paper introduces Anchored Weight Decay (AWD), a regularization technique that effectively prevents prior-task forgetting during LLM fine-tuning with Evolution Strategies (ES), positioning ES as a…
The paper introduces Canopy Entropy ($ ext{CE}^ ext{*}$), a novel metric that quantifies generation uncertainty across the entire output space, demonstrating that fine-tuning improves information conv…
Divergence Decoding (DD) is a novel, effective, and inexpensive method that uses auxiliary models to steer LLM logits during inference, enabling the removal of memorized sensitive data without signifi…
Wenhang Shi, Yiren Chen, Shuqing Bian, Zhe Zhao +4 more
The paper introduces State-Adaptive Prompt Optimization (SAPO), a novel training strategy that treats prompts as dynamic variables to achieve robust fine-tuning, significantly mitigating catastrophic…
Yiwei Zhang, Jeremiah Birrell, Reza Ebrahimi, Rouzbeh Behnia +2 more
The paper proposes WARDEN, a distributionally robust adversarial training framework that significantly reduces LLM vulnerability to adversarial attacks by dynamically reweighting hard adversarial exam…
The paper introduces a novel, transferable learned attack (LT-MIA) that detects a universal 'signature of memorization' in language models, achieving high accuracy across diverse model architectures (…
The paper introduces Involuntary In-Context Learning (IICL), an effective few-shot pattern completion attack that can bypass safety alignments in large language models, achieving a 24.0% bypass rate a…
This paper provides a systematic, lifecycle-based framework for analyzing security threats and defenses across the entire fine-tuning process of LLMs, revealing that attack effectiveness is highly mod…
Zihan Liu, Yizhen Wang, Rui Wang, Xiu Tang +1 more
This survey provides a comprehensive, structured taxonomy of split learning techniques for fine-tuning Large Language Models (LLMs), covering model optimization, system efficiency, and privacy preserv…
Tong Ye, Hang Yu, Tengfei Ma, Xuhong Zhang +5 more
The paper introduces DOMINO, a novel inductive framework that synthesizes domain-specific data for LLMs using only reference examples, significantly improving performance on challenging, implicitly de…
Magnus Jørgenvåg, David Kaczér, Lasse Ruttert, Marvin Gülhan +2 more
This paper demonstrates that reinforcement learning (RL) can cause emergent misalignment (EM) in open-weight models, showing that even seemingly harmless or natural reward signals can induce significa…
The paper proposes DOA, a training-free attention policy that leverages self-attention in decoder-only SpeechLLMs to achieve high-quality, low-latency simultaneous long-form translation without requir…
Xinyu Liu, Darryl Cherian Jacob, Yang Zhou, Jindong Wang +1 more
The OISD framework improves language model reasoning by distilling on-policy predictive signals from the final output layer to intermediate representations, leading to substantial improvements on math…
Jian Mu, Tianyi Lin, Chengwei Qin, Zhongxiang Dai +1 more
DRIFT proposes a novel framework that efficiently optimizes LLMs for multi-turn interactions by decoupling rollout from optimization, allowing the use of weighted supervised fine-tuning to match the p…
The paper introduces the Triangulated Preference Shift score, an automated, curation-free metric to quantify systematic lexical biases introduced into Large Language Models during the preference-learn…
Zhexin Hu, Li Wang, Xiaohan Wang, Jiajun Chai +3 more
ZipRL introduces an adaptive context compression framework that significantly improves the performance and efficiency of LLMs in complex, multi-turn agent tasks by combining multi-granularity compress…