~ similar to 2606.06494· 17 results
This paper develops a unified spectral analysis framework to explain how knowledge transfer (KT) works across different machine learning regimes, such as Knowledge Distillation and Weak-to-Strong gene…
The paper introduces TSVD, a novel framework that efficiently pre-trains LLMs by enforcing both low rank and strict weight orthonormality, achieving performance comparable to full-parameter models wit…
Dongjun Kim, Adrian de Wynter, Huancheng Chen, Heasung Kim +1 more
The paper introduces FoLoRA, a novel optimization framework that uses a generalized Rayleigh quotient to achieve a superior balance between adapting foundation models to specific tasks and preserving…
This paper provides the first non-vacuous generalization analysis for the Stochastic Variance Reduced Gradient (SVRG) method by establishing sharp, data-dependent algorithmic stability bounds, thereby…
The paper proposes Self-Adaptive Monotonic Normalization (SAMN), a hyperparameter-friendly method that improves long-tailed recognition by enforcing monotonicity on per-class weight norms without requ…
The paper introduces TRACER, a novel regularization framework that uses Weighted Moving Average (WMA) distillation to robustly finetune multimodal models, mitigating catastrophic forgetting and improv…
Ran Liu, Min Yu, Mingqi Liu, Jianguo Jiang +6 more
The paper introduces AdvCL, a framework that repurposes adversarial perturbations as a geometric control signal to stabilize continual learning in large language models, significantly reducing forgett…
CSULoRA is a post-hoc method that corrects trained LoRA adapters by estimating a safety-aligned subspace and solving a penalized minimum-change problem to attenuate unsafe update directions while pres…
The paper proposes FOAM, an adaptive damping method that stabilizes the Shampoo optimization algorithm by dynamically controlling damping and eigendecomposition frequency, thereby reducing staleness-i…
Xiaosong Han, Ke Chen, Xindi Dai, Di Liang +6 more
TRACE proposes a novel method to mitigate catastrophic forgetting in continual LLM fine-tuning by identifying and isolating a small, task-specific subset of essential parameters for each task.
Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more
The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…
The paper introduces the Terminal Representation (TR), a novel, lower-dimensional, and structurally distinct formulation for encoding reward-weighted trajectories in RL that bypasses the need for eige…
Zhi Zhou, Ming Yang, Shi-Yu Tian, Kun-Yang Yu +2 more
The paper establishes the first theoretical framework for analyzing the learnability of Test-Time Adaptation (TTA) under non-stationary data streams by introducing Recovery Complexity, which quantifie…
This paper introduces Anchored Weight Decay (AWD), a regularization technique that effectively prevents prior-task forgetting during LLM fine-tuning with Evolution Strategies (ES), positioning ES as a…
Rishit Dagli, Abir Harrasse, Luke Zhang, Florent Draye +3 more
This paper proposes a new framework called STRIDE for training data attribution in Large Language Models.
Hui Yang, Daiwei He, Kevin Jiang, Taejin Park +19 more
The paper introduces a novel paradigm where a fine-tuned LLM acts as an ancillary predictor to forecast likely advertisers, significantly improving ad recommendation systems by augmenting candidate ge…
Han Liu, Shanghao Shi, Yevgeniy Vorobeychik, Chongjie Zhang +1 more
This paper demonstrates that adversarial perturbations possess a low-rank structure, and proposes a two-step method to leverage this property to significantly improve the efficiency and effectiveness…