Papers similar to 2605.29547

Similar to 2605.29547 · 17 results

cs.LG, cs.AI · Recent · May 27, 2026

Learning Theory of the SVRG: Generalization and Convergence Analysis

This paper provides the first non-vacuous generalization analysis for the Stochastic Variance Reduced Gradient (SVRG) method by establishing sharp, data-dependent algorithmic stability bounds, thereby…

cs.LG, cs.AI · Recent · Jun 1, 2026

FOAM: Frequency and Operator Error-Based Adaptive Damping Method for Reducing Staleness-Oriented Error for Shampoo

Kyunghun Nam, Sumyeong Ahn

The paper proposes FOAM, an adaptive damping method that stabilizes the Shampoo optimization algorithm by dynamically controlling damping and eigendecomposition frequency, thereby reducing staleness-i…

cs.LG, cs.AI, cs.CE · Recent · Jun 1, 2026

On the Generalization in Topology Optimization via Sensitivity-Conditioned Bernoulli Flow Matching

Mohammad Rashed, Duarte F. Valoroso Madeira, Babak Gholami, Caglar Guerbuez +2 more

The paper proposes using pseudo-sensitivities, derived from adjoint sensitivity fields, as an optimal conditioning signal in a Bernoulli flow-matching framework to significantly improve the out-of-dis…

cs.LG, cs.AI · Recent · May 30, 2026

Memory-Efficient LLM Training with Dynamic Sparsity: From Stability to Practical Scaling

Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more

The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…

math.NA, cs.LG · Recent · Jun 1, 2026

Spectral Audit of In-Context Operator Networks

Zhiwei Gao, Liu Yang, George Em Karniadakis

The paper introduces a Jacobian-based spectral audit to evaluate neural operators, demonstrating that standard prediction error metrics fail to capture crucial local dynamical structures and operator…

cs.LG, cs.AI · Recent · May 29, 2026

Inconsistency-Aware Minimization: Improving Generalization with Unlabeled Data

Hee-Sung Kim, Hyeonseong Kim, Sungyoon Lee

The paper introduces Inconsistency-Aware Minimization (IAM), a novel training objective that uses a label-free measure called local inconsistency to improve model generalization, particularly in semi-…

cs.LG, cs.CR · Recent · Apr 27, 2026

Mitigating Error Amplification in Fast Adversarial Training

Mengnan Zhao, Lihe Zhang, Bo Wang, Tianhang Zheng +2 more

The paper proposes a Distribution-aware Dynamic Guidance (DDG) strategy to mitigate catastrophic overfitting and the robustness-accuracy trade-off inherent in Fast Adversarial Training (FAT) by dynami…

cs.LG, cs.AI, physics.comp-ph · Recent · May 27, 2026

Unveiling Multi-regime Patterns in SciML: Distinct Failure Modes and Regime-specific Optimization

Yuxin Wang, Yuanzhe Hu, Xiaokun Zhong, Xiaopeng Wang +6 more

This paper analyzes the multi-regime behavior of Scientific Machine Learning (SciML) models, finding that optimization effectiveness is regime-specific and that failure modes require a unified, regime…

cs.LG, cs.AI, math.DS · Recent · May 27, 2026

The Hamilton-Jacobi Theory of Deep Learning

Jose Marie Antonio Miñoza, Erika Fille T. Legara, Christopher P. Monterola

This paper establishes an exact mathematical correspondence between training and inference in deep learning and the solution of Hamilton-Jacobi partial differential equations, unifying multiple theore…

cs.LG · Recent · Jun 1, 2026

Riemannian Gradient Descent for Low-Rank Architectures

Nicholas Knight

The paper investigates applying Riemannian optimization techniques to low-rank matrix parameters for deep learning, but finds that the proposed methods do not conclusively outperform the AdamW baselin…

cs.LG, stat.ML · Recent · Jun 2, 2026

Online Learning with Gradient-Variation Interval Regret

Yan-Feng Xie, Shuche Wang, Peng Zhao, Zhi-Hua Zhou

The paper proposes a novel online learning algorithm that achieves an interval regret bound scaling with gradient variation, providing strong theoretical guarantees for non-stationary environments.

cs.LG, stat.ML · Recent · Jun 1, 2026

Local Preferential Bayesian Optimization

Johanna Menn, Miriam Kober, Paul Brunzema, David Stenger +1 more

The paper introduces local Preferential Bayesian Optimization (PBO) methods that adapt high-dimensional Bayesian Optimization techniques, such as trust-region and derivative-informed local search, to…

cs.CV, cs.AI · Recent · Jun 1, 2026

Initialization is Half the Battle: Generating Diverse Images from a Guidance Potential Posterior

Xiang Li, Dianbo Liu, Kenji Kawaguchi

The paper introduces Diversity-inducing Initialization (DivIn), a novel method that improves image diversity by re-weighting the initial noise selection based on the guidance potential, thereby mitiga…

cs.LG, cs.CL, cs.CR · Recent · Apr 29, 2026

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry

Wenhao Lan, Shan Li, Xinhua Lai, Meiqi Wu +3 more

The paper investigates how dynamic adversarial fine-tuning (R2D2) reorganizes the internal mechanisms (refusal geometry) of safety-aligned language models, finding that it shifts the optimal refusal c…

cs.LG, cs.AI · Recent · Jun 1, 2026

Repurposing Adversarial Perturbations for Continual Learning: From Defense to Active Alignment

Ran Liu, Min Yu, Mingqi Liu, Jianguo Jiang +6 more

The paper introduces AdvCL, a framework that repurposes adversarial perturbations as a geometric control signal to stabilize continual learning in large language models, significantly reducing forgett…

cs.LG, cs.AI · Recent · May 27, 2026

Stochastic Gradient Descent with Momentum is Algorithmically Stable

Yunwen Lei, Zimeng Wang, Xiaoming Yuan

This paper provides a comprehensive generalization analysis of Stochastic Gradient Descent with Momentum (SGDM) by establishing tight, on-average model stability bounds that show SGDM can generalize w…

cs.LG, cs.AI, cs.CV · Recent · May 30, 2026

SORA: Free Second-Order Attacks in Fast Adversarial Training

Mazdak Teymourian, Ramtin Moslemi, Farzan Rahmani, Mohammad Hossein Rohban

The paper introduces SORA, an adaptive adversarial training method that dynamically adjusts perturbation sizes to prevent Catastrophic Overfitting, achieving state-of-the-art robustness and clean accu…
