~ similar to 2606.00571· 18 results
Yuanjian Xu, Jianing Hao, Wanbo Zhang, Zhong Li +1 more
The paper proposes DiReCT, a novel framework that treats data selection during LLM annealing as a constrained optimization problem based on the spectral geometry of the loss landscape, achieving state…
Boqian Wu, Qiao Xiao, Patrik Okanovic, Tomasz Sternal +5 more
This paper introduces a new scaling law for sparse language models trained with limited data, demonstrating that sparsity can significantly improve performance and delay data saturation during multi-e…
The paper introduces and analyzes several novel data appraisal metrics, including the Vendi Score and matrix spectral functions, demonstrating that efficient optimization techniques make these metrics…
The paper proposes a novel, practical upper bound to estimate the worst-case performance of medical prediction models on the target population, even when the selection bias mechanism and target data a…
The paper proposes a Bayesian meta-learner to accurately predict the distribution of Alzheimer's disease progression scores for individuals, outperforming existing methods, especially for long-term pr…
Yalun Dai, Yangyu Huang, Tongshen Yang, Yonghan Wang +7 more
This paper proposes four guidelines and two novel data ordering methods (STR and SAW) to systematically optimize data organization, significantly enhancing the stability and performance of LLM trainin…
Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more
The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…
Tim Nielen, Sameer Ambekar, Johannes Kiechle, Daniel M. Lang +1 more
This paper identifies prediction bias, a failure mode of entropy minimization in test-time adaptation, and proposes Distribution Shift Bias Reduction (DSBR) to stabilize adaptation and prevent model c…
The paper proposes a novel online learning algorithm that achieves an interval regret bound scaling with gradient variation, providing strong theoretical guarantees for non-stationary environments.
Tong Ye, Hang Yu, Tengfei Ma, Xuhong Zhang +5 more
The paper introduces DOMINO, a novel inductive framework that synthesizes domain-specific data for LLMs using only reference examples, significantly improving performance on challenging, implicitly de…
The paper proposes pretraining a Perceiver-style in-context learner on synthetic data to solve Multiple Instance Learning (MIL) tasks efficiently in the low-label regime.
This paper analyzes Best-of-$N$ preference data, deriving explicit reward targets for independent-reference variants and establishing design principles for choosing $N$ and the base distribution to op…
This paper introduces survey sampling techniques to estimate or minimize empirical pairwise loss functions, showing that targeting informative pairs significantly reduces computational cost while main…
The paper introduces an Item Response Theory (IRT)-based indicator that effectively identifies likely mislabeled items in existing LLM benchmarks, revealing systematic errors in labeling and model spe…
Weak self-training on synthetic data can amplify a language model's existing capabilities, but this effect is strictly dependent on the compatibility between the source and student models, not on the…
The paper introduces TSVD, a novel framework that efficiently pre-trains LLMs by enforcing both low rank and strict weight orthonormality, achieving performance comparable to full-parameter models wit…
Divergence Decoding (DD) is a novel, effective, and inexpensive method that uses auxiliary models to steer LLM logits during inference, enabling the removal of memorized sensitive data without signifi…