Papers similar to 2605.28517 · 17 results

cs.LG · cs.AI · Recent · May 27, 2026

Learning Theory of the SVRG: Generalization and Convergence Analysis

This paper provides the first non-vacuous generalization analysis for the Stochastic Variance Reduced Gradient (SVRG) method by establishing sharp, data-dependent algorithmic stability bounds, thereby…

cs.AI · math.OC · Recent · Jun 1, 2026

Stochastic convergence of parallel asynchronous adaptive first-order methods

Serge Gratton, Philippe L. Toint

The paper analyzes a new class of asynchronous adaptive first-order optimization methods and proves that their stochastic convergence rate is $O(1/\sqrt{t})$ for non-convex functions.

cs.LG · cs.AI · math.OC · Recent · May 28, 2026

Singularity-aware Optimization via Randomized Geometric Probing: Towards Stable Non-smooth Optimization

Ruoran Xu, Borong She, Xiaobo Jin, Qiufeng Wang

The paper introduces Singularity-aware Adam (S-Adam), a novel optimizer that stabilizes deep learning training in non-smooth loss landscapes by dynamically damping updates based on local geometric ins…

cs.LG · Recent · Jun 1, 2026

Riemannian Gradient Descent for Low-Rank Architectures

Nicholas Knight

The paper investigates applying Riemannian optimization techniques to low-rank matrix parameters for deep learning, but finds that the proposed methods do not conclusively outperform the AdamW baselin…

cs.LG · cs.CR · Recent · May 7, 2026

Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds

Marten van Dijk, Murat Bilgehan Ertan

The paper provides a tight, transparent, and closed-form analysis of the trade-off function for Differentially Private SGD using random shuffling, significantly improving upon previous methods and est…

cs.CR · cs.DS · Recent · Apr 30, 2026

Variational and Majorization Principles in Lattice Reduction

Javier Blanco-Romero, Florina Almenares Mendoza

The paper uses majorization theory to analyze lattice reduction, showing that local swaps smooth the Gram-Schmidt profile and deriving variational and telescoping identities for the worst-case profile…

cs.CL · Recent · May 29, 2026

Towards Efficient LLMs Annealing with Principled Sample Selection

Yuanjian Xu, Jianing Hao, Wanbo Zhang, Zhong Li +1 more

The paper proposes DiReCT, a novel framework that treats data selection during LLM annealing as a constrained optimization problem based on the spectral geometry of the loss landscape, achieving state…

cs.LG · cs.AI · Recent · Jun 1, 2026

FOAM: Frequency and Operator Error-Based Adaptive Damping Method for Reducing Staleness-Oriented Error for Shampoo

Kyunghun Nam, Sumyeong Ahn

The paper proposes FOAM, an adaptive damping method that stabilizes the Shampoo optimization algorithm by dynamically controlling damping and eigendecomposition frequency, thereby reducing staleness-i…

stat.ML · cs.LG · Recent · Jun 2, 2026

A Quantitative Approximation Framework for Flow Distillation in Diffusion Models

Weiguo Gao, Ming Li, Lei Shi, Hanfei Zhou

The paper develops a quantitative framework to analyze and improve flow distillation in diffusion models, providing stability guarantees and suggesting non-uniform time scheduling to reduce approximat…

cs.LG · cs.CR · Recent · Jun 3, 2026

DP-MacAdam: Differentially Private Mechanism with Adaptive Clipping and Adaptive Momentum

Naima Tasnim, Lalitha Sankar, Oliver Kosut

The paper proposes DP-MacAdam, a novel differentially private optimization algorithm that simultaneously uses adaptive gradient clipping and momentum, achieving improved model accuracy over existing m…

cs.CR · Recent · May 15, 2026

Rethinking the Security of DP-SGD: A Corrected Analysis of Differentially Private Machine Learning

Wenhao Wang, Shujie Cui, Hui Cui, Xingliang Yuan

This paper corrects the theoretical analysis of DP-SGD by identifying that common implementations, which use batch averaging, result in weaker privacy guarantees than previously reported.

cs.LG · cs.AI · Recent · May 27, 2026

Efficient Pre-Training of LLMs through Truncated SVD Layers

Kaivan Kamali, Kajetan Schweighofer, Hormoz Shahrzad, Olivier Francon +2 more

The paper introduces TSVD, a novel framework that efficiently pre-trains LLMs by enforcing both low rank and strict weight orthonormality, achieving performance comparable to full-parameter models wit…

cs.CV · cs.AI · cs.LG · Recent · May 30, 2026

DASH: Dual-Branch Score Distillation for Guidance-Calibrated Compact Diffusion Models

Abdullah Al Shafi, Kazi Saeed Alam, Sk Imran Hossain, Engelbert Mephu Nguifo

DASH introduces a dual-branch distillation framework to effectively compress class-conditional diffusion models by independently supervising both score branches, significantly preserving guidance fide…

cs.LG · cs.CR · stat.ML · Recent · Apr 7, 2026

Optimal Rates for Pure $\varepsilon$-Differentially Private Stochastic Convex Optimization with Heavy Tails

Andrew Lowy

The paper characterizes the minimax optimal excess-risk rate for pure $\varepsilon$-DP stochastic convex optimization with heavy-tailed gradients, providing an algorithm that achieves this rate.

cs.LG · cs.AI · Recent · May 30, 2026

Memory-Efficient LLM Training with Dynamic Sparsity: From Stability to Practical Scaling

Qiao Xiao, Boqian Wu, Patrik Okanovic, Tomasz Sternal +5 more

The paper introduces Sparse Memory-Efficient Training (SMET), a method that stabilizes and optimizes Dynamic Sparse Training (DST) for large language models, enabling stable and memory-efficient spars…

cs.LG · cs.AI · cs.CV · Recent · May 28, 2026

How Much Is a Dataset Worth? Scaling Laws, the Vendi Score, and Matrix Spectral Functions

Jeff A. Bilmes, Gantavya Bhatt, Arnav M. Das

The paper introduces and analyzes several novel data appraisal metrics, including the Vendi Score and matrix spectral functions, demonstrating that efficient optimization techniques make these metrics…

cs.LG · cs.AI · cs.CE · Recent · Jun 1, 2026

On the Generalization in Topology Optimization via Sensitivity-Conditioned Bernoulli Flow Matching

Mohammad Rashed, Duarte F. Valoroso Madeira, Babak Gholami, Caglar Guerbuez +2 more

The paper proposes using pseudo-sensitivities, derived from adjoint sensitivity fields, as an optimal conditioning signal in a Bernoulli flow-matching framework to significantly improve the out-of-dis…
