~ similar to 2605.29547 · 17 results
This paper provides the first non-vacuous generalization analysis for the Stochastic Variance Reduced Gradient (SVRG) method by establishing sharp, data-dependent algorithmic stability bounds, thereby…
The paper proposes FOAM, an adaptive damping method that stabilizes the Shampoo optimization algorithm by dynamically controlling damping and eigendecomposition frequency, thereby reducing staleness-i…
The paper proposes using pseudo-sensitivities, derived from adjoint sensitivity fields, as an optimal conditioning signal in a Bernoulli flow-matching framework to significantly improve the out-of-dis…
Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more
The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…
The paper introduces a Jacobian-based spectral audit to evaluate neural operators, demonstrating that standard prediction error metrics fail to capture crucial local dynamical structures and operator…
The paper introduces Inconsistency-Aware Minimization (IAM), a novel training objective that uses a label-free measure called local inconsistency to improve model generalization, particularly in semi-…
Mengnan Zhao, Lihe Zhang, Bo Wang, Tianhang Zheng +2 more
The paper proposes a Distribution-aware Dynamic Guidance (DDG) strategy to mitigate catastrophic overfitting and the robustness-accuracy trade-off inherent in Fast Adversarial Training (FAT) by dynami…
Yuxin Wang, Yuanzhe Hu, Xiaokun Zhong, Xiaopeng Wang +6 more
This paper analyzes the multi-regime behavior of Scientific Machine Learning (SciML) models, finding that optimization effectiveness is regime-specific and that failure modes require a unified, regime…
This paper establishes an exact mathematical correspondence between training and inference in deep learning and the solution of Hamilton-Jacobi partial differential equations, unifying multiple theore…
The paper investigates applying Riemannian optimization techniques to low-rank matrix parameters for deep learning, but finds that the proposed methods do not conclusively outperform the AdamW baselin…
The paper proposes a novel online learning algorithm that achieves an interval regret bound scaling with gradient variation, providing strong theoretical guarantees for non-stationary environments.
Johanna Menn, Miriam Kober, Paul Brunzema, David Stenger +1 more
The paper introduces local Preferential Bayesian Optimization (PBO) methods that adapt high-dimensional Bayesian Optimization techniques, such as trust-region and derivative-informed local search, to…
The paper introduces Diversity-inducing Initialization (DivIn), a novel method that improves image diversity by re-weighting the initial noise selection based on the guidance potential, thereby mitiga…
Wenhao Lan, Shan Li, Xinhua Lai, Meiqi Wu +3 more
The paper investigates how dynamic adversarial fine-tuning (R2D2) reorganizes the internal mechanisms (refusal geometry) of safety-aligned language models, finding that it shifts the optimal refusal c…
Ran Liu, Min Yu, Mingqi Liu, Jianguo Jiang +6 more
The paper introduces AdvCL, a framework that repurposes adversarial perturbations as a geometric control signal to stabilize continual learning in large language models, significantly reducing forgett…
This paper provides a comprehensive generalization analysis of Stochastic Gradient Descent with Momentum (SGDM) by establishing tight, on-average model stability bounds that show SGDM can generalize w…
The paper introduces SORA, an adaptive adversarial training method that dynamically adjusts perturbation sizes to prevent Catastrophic Overfitting, achieving state-of-the-art robustness and clean accu…