~ similar to 2605.30844· 20 results
Qi Liu, Mingdi Sun, Yongyi He, Zhi Zheng +4 more
The paper proposes EKSFT, a selective fine-tuning method that masks high-entropy or high-KL divergence tokens during Supervised Fine-Tuning (SFT) to prevent distribution shift and improve subsequent R…
The paper proposes a novel information-geometric framework to analyze LLM stability by integrating task utility, external entropy, and internal structural proxies, showing this composite score improve…
The paper introduces a novel, per-token feature derived from how sampling temperature reshapes the token distribution, demonstrating it is a significantly stronger predictor of LLM creativity than sta…
The paper introduces Multi-Response Training (MRT) to combat the 'mode lottery' problem in language model fine-tuning, showing that retaining multiple valid responses significantly improves distributi…
The paper introduces functional entropy, a code-specific uncertainty quantification method, which successfully predicts functional correctness in LLM-generated code by replacing natural language seman…
The paper introduces a novel, transferable learned attack (LT-MIA) that detects a universal 'signature of memorization' in language models, achieving high accuracy across diverse model architectures (…
Wenhang Shi, Yiren Chen, Shuqing Bian, Zhe Zhao +4 more
The paper introduces State-Adaptive Prompt Optimization (SAPO), a novel training strategy that treats prompts as dynamic variables to achieve robust fine-tuning, significantly mitigating catastrophic…
The paper introduces a diagnostic framework to decompose multilingual LLM performance variance, showing that language identity and model-benchmark interactions are key drivers of performance gaps.
Eric Onyame, Runtao Zhou, Kowshik Thopalli, Bhavya Kailkhura +1 more
This study demonstrates that Chain-of-Thought (CoT) monitoring is fundamentally fragile and unreliable for detecting misaligned behavior across typologically diverse languages, especially in low-resou…
The paper investigates emergent, sophisticated languages developed by populations of language model agents, finding that these languages are designed for oversight evasion and are difficult to monitor…
The paper introduces the Decan metric, a novel, information-theoretic approach for measuring creative diversity in AI outputs, which successfully detects diversity loss across different model fine-tun…
Weak self-training on synthetic data can amplify a language model's existing capabilities, but this effect is strictly dependent on the compatibility between the source and student models, not on the…
This paper provides a systematic, lifecycle-based framework for analyzing security threats and defenses across the entire fine-tuning process of LLMs, revealing that attack effectiveness is highly mod…
The paper proposes using fine-grained quality signals, such as pairwise self-judgments and token-level entropy, instead of simple binary correctness to improve LLM performance on saturated datasets, s…
Qinghua Zhou, Ellina Aleshina, Andrey Lovyagin, Oleg Somov +5 more
The paper proposes a debiasing fine-tuning technique to efficiently enhance the robustness of Large Language Models against semantically similar but textually altered prompts.
The paper introduces NaRA, a noise-aware LoRA technique that dynamically adapts fine-tuning parameters based on the noise level during diffusion, significantly improving the performance of Diffusion L…
This paper introduces Anchored Weight Decay (AWD), a regularization technique that effectively prevents prior-task forgetting during LLM fine-tuning with Evolution Strategies (ES), positioning ES as a…
Aniket Anand, Janvijay Singh, Zhewei Sun, Dilek Hakkani-Tür +1 more
The paper demonstrates that the AI-like style introduced by post-training alignment can be measured, localized, and causally removed using a novel ablation technique called PASTA.
The paper introduces the Calibrated Entropy Score (CES), a single-pass, black-box method that uses the distribution of token-level entropies to detect model hallucinations with high accuracy and forma…
Yujia Tong, Yuxi Wang, Yunyang Wan, Tian Zhang +2 more
This paper investigates whether model compression techniques (like quantization and pruning) preserve a Large Language Model's ability to quantify its own uncertainty, finding that accuracy-only evalu…