Similar to 2606.02365 · 17 results
This paper provides the first non-vacuous generalization analysis for the Stochastic Variance Reduced Gradient (SVRG) method by establishing sharp, data-dependent algorithmic stability bounds, thereby…
The paper introduces a Jacobian-based spectral audit to evaluate neural operators, demonstrating that standard prediction error metrics fail to capture crucial local dynamical structures and operator…
Dongjun Kim, Adrian de Wynter, Huancheng Chen, Heasung Kim +1 more
The paper introduces FoLoRA, a novel optimization framework that uses a generalized Rayleigh quotient to achieve a superior balance between adapting foundation models to specific tasks and preserving…
Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more
The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…
Amirpasha Hedayat, Ali Mohaghegh, Laura Balzano, Cheng Huang +1 more
The paper introduces a history-aware adaptive Reduced-Order Model (ROM) framework using incremental Singular Value Decomposition (iSVD) that maintains accuracy for online dynamics far beyond the initi…
The paper introduces Singularity-aware Adam (S-Adam), a novel optimizer that stabilizes deep learning training in non-smooth loss landscapes by dynamically damping updates based on local geometric ins…
The paper analyzes a new class of asynchronous adaptive first-order optimization methods and proves their stochastic convergence rate is O(1/√t) for non-convex functions.
TailLoR is a new parameter-efficient finetuning method that uses the singular bases of pre-trained weights to learn low-rank updates, specifically penalizing updates along dominant directions to impro…
The paper develops a quantitative framework to analyze and improve flow distillation in diffusion models, providing stability guarantees and suggesting non-uniform time scheduling to reduce approximat…
The paper proposes using a Physics-Informed Neural Network (PINN) residual as an efficient, physics-guided indicator to guide adaptive mesh refinement (AMR) for classical finite-difference PDE solvers…
The paper introduces a novel diffusion posterior sampling method that stabilizes and accelerates data-consistent sampling by replacing hand-tuned guidance weights with a per-noise-level, curvature-gui…
The paper systematically characterizes column-level activation sparsity across various diffusion model architectures, demonstrating that element-level sparsity metrics significantly overestimate the a…
The paper introduces SB-ECC, a novel score-based decoder that models error correction as continuous-time denoising, achieving state-of-the-art performance across various code families and noise levels…
STAB is a novel specification-driven pipeline that generates test cases exposing algorithmic bottlenecks by combining constraint-bound maximization and adversarial structure injection, significantly i…
DASH introduces a dual-branch distillation framework to effectively compress class-conditional diffusion models by independently supervising both score branches, significantly preserving guidance fide…
Yuanjian Xu, Jianing Hao, Wanbo Zhang, Zhong Li +1 more
The paper proposes DiReCT, a novel framework that treats data selection during LLM annealing as a constrained optimization problem based on the spectral geometry of the loss landscape, achieving state…
This paper provides a comprehensive generalization analysis of Stochastic Gradient Descent with Momentum (SGDM) by establishing tight, on-average model stability bounds that show SGDM can generalize w…