ArXivCSExplorer
Built with and by Teycir Ben Soltane•

Similar to 2605.28513 · 16 results

cs.LG · cs.AI · Recent · May 27, 2026

Stochastic Gradient Descent with Momentum is Algorithmically Stable

Yunwen Lei, Zimeng Wang, Xiaoming Yuan

This paper provides a comprehensive generalization analysis of Stochastic Gradient Descent with Momentum (SGDM) by establishing tight, on-average model stability bounds that show SGDM can generalize w…

cs.LG · cs.AI · math.OC · Recent · May 28, 2026

Singularity-aware Optimization via Randomized Geometric Probing: Towards Stable Non-smooth Optimization

Ruoran Xu, Borong She, Xiaobo Jin, Qiufeng Wang

The paper introduces Singularity-aware Adam (S-Adam), a novel optimizer that stabilizes deep learning training in non-smooth loss landscapes by dynamically damping updates based on local geometric ins…

cs.CV · Recent · Jun 1, 2026

VISReg: Variance-Invariance-Sketching Regularization for JEPA training

Haiyu Wu, Randall Balestriero, Morgan Levine

VISReg introduces a novel regularization technique that combines variance control with a Sliced-Wasserstein-based sketching objective to stabilize self-supervised learning, achieving state-of-the-art…

cs.LG · stat.ML · Recent · Jun 2, 2026

Online Learning with Gradient-Variation Interval Regret

Yan-Feng Xie, Shuche Wang, Peng Zhao, Zhi-Hua Zhou

The paper proposes a novel online learning algorithm that achieves an interval regret bound scaling with gradient variation, providing strong theoretical guarantees for non-stationary environments.

cs.AI · math.OC · Recent · Jun 1, 2026

Stochastic convergence of parallel asynchronous adaptive first-order methods

Serge Gratton, Philippe L. Toint

The paper analyzes a new class of asynchronous adaptive first-order optimization methods and proves their stochastic convergence rate is O(1/√t) for non-convex functions.

cs.LG · cs.AI · math.OC · Recent · May 28, 2026

A Unified Framework for Gradient Aggregation in Multi-Objective Optimization

Zeou Hu, Kelvin Ho, Yaoliang Yu

The paper introduces a unified theoretical framework for gradient aggregation in multi-objective optimization, establishing convergence rates and sufficient conditions for achieving Pareto stationarit…

cs.LG · cs.AI · Recent · Jun 1, 2026

FOAM: Frequency and Operator Error-Based Adaptive Damping Method for Reducing Staleness-Oriented Error for Shampoo

Kyunghun Nam, Sumyeong Ahn

The paper proposes FOAM, an adaptive damping method that stabilizes the Shampoo optimization algorithm by dynamically controlling damping and eigendecomposition frequency, thereby reducing staleness-i…

cs.LG · cs.AI · cs.CV · Recent · May 30, 2026

SORA: Free Second-Order Attacks in Fast Adversarial Training

Mazdak Teymourian, Ramtin Moslemi, Farzan Rahmani, Mohammad Hossein Rohban

The paper introduces SORA, an adaptive adversarial training method that dynamically adjusts perturbation sizes to prevent Catastrophic Overfitting, achieving state-of-the-art robustness and clean accu…

cs.RO · Recent · Jun 3, 2026

X4Val: Learning Neural Surrogates for Variance-Reduced Policy Evaluation

Rachel Luo, Michael Watson, Apoorva Sharma, Heng Yang +5 more

This paper introduces X4Val, a framework for variance-reduced real-world metric estimation using non-paired, multi-domain data.

cs.LG · cs.AI · Recent · May 29, 2026

GPU Forecasters: Language Models as Selective Surrogates for Kernel Runtime Optimization

Zaid Khan, Justin Chih-Yao Chen, Jaemin Cho, Elias Stengel-Eskin +1 more

This paper demonstrates that Large Language Models (LLMs) can serve as accurate and selective surrogates for costly GPU kernel performance measurements, significantly expanding the search space for op…

cs.LG · cs.CR · Recent · Jun 1, 2026

Near-Optimal Pure Machine Unlearning for Smooth Strongly Convex Losses

Matthew Regehr, Gautam Kamath, Andrew Lowy

The paper establishes tight upper and lower bounds on the statistical cost of approximate machine unlearning for smooth strongly convex losses, showing that the optimal unlearning rate depends critica…

cs.LG · cs.AI · Recent · May 28, 2026

Score Broadcast and Decorrelation: A General Framework for Broadcast-Based Credit Assignment

Mustafa Uzun, Mete Erdogan, Cengiz Pehlevan, Alper T. Erdogan

The paper introduces Score Broadcast and Decorrelation (SBD), a general theoretical framework that unifies broadcast-based credit assignment across various differentiable loss functions by leveraging…

cs.CR · cs.DS · Recent · Apr 30, 2026

Variational and Majorization Principles in Lattice Reduction

Javier Blanco-Romero, Florina Almenares Mendoza

The paper uses majorization theory to analyze lattice reduction, showing that local swaps smooth the Gram-Schmidt profile and deriving variational and telescoping identities for the worst-case profile…

cs.LG · stat.ML · Recent · Jun 1, 2026

Local Preferential Bayesian Optimization

Johanna Menn, Miriam Kober, Paul Brunzema, David Stenger +1 more

The paper introduces local Preferential Bayesian Optimization (PBO) methods that adapt high-dimensional Bayesian Optimization techniques, such as trust-region and derivative-informed local search, to…

cs.LG · cs.CR · Recent · May 7, 2026

Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds

Marten van Dijk, Murat Bilgehan Ertan

The paper provides a tight, transparent, and closed-form analysis of the trade-off function for Differentially Private SGD using random shuffling, significantly improving upon previous methods and est…

cs.LG · cs.AI · Recent · May 31, 2026

What Makes a Strong Model? A Unified Spectral Analysis of Knowledge Transfer over High-dimensional Linear Regression

Wendao Wu, Fangqing Zhang, Haihan Zhang, Cong Fang

This paper develops a unified spectral analysis framework to explain how knowledge transfer (KT) works across different machine learning regimes, such as Knowledge Distillation and Weak-to-Strong gene…
