20 results for “Manifold Power Iteration”
CS papers only. Hybrid search: keyword + semantic, ranked by combined score.
Senmiao Wang, Tiantian Fang, Haoran Zhang, Yushun Zhang +3 more
This paper proposes a preconditioning layer for stable weight conditioning in LLM training.
This paper proposes a router redesign for Mixture-of-Experts models that uses Manifold Power Iteration to align router rows with the principal singular directions of their associated experts.
The paper investigates applying Riemannian optimization techniques to low-rank matrix parameters for deep learning, but finds that the proposed methods do not conclusively outperform the AdamW baselin…
The paper introduces Singularity-aware Adam (S-Adam), a novel optimizer that stabilizes deep learning training in non-smooth loss landscapes by dynamically damping updates based on local geometric ins…
This paper provides the first non-vacuous generalization analysis for the Stochastic Variance Reduced Gradient (SVRG) method by establishing sharp, data-dependent algorithmic stability bounds, thereby…
The paper introduces a non-intrusive variant of index-aware learning for solving differential-algebraic equations (DAEs), ensuring that learned solutions maintain physical consistency like Kirchhoff's…
The paper introduces a computational framework using Hodge zero-modes to track the geometry of topological features in parameter-dependent data, providing metrics like curvature and holonomy to quanti…
This paper shows that verifying global parameter identifiability for linear ODE models is NP-hard, establishing a computational complexity boundary for the field.
The paper proposes using pseudo-sensitivities, derived from adjoint sensitivity fields, as an optimal conditioning signal in a Bernoulli flow-matching framework to significantly improve the out-of-dis…
The paper analyzes a new class of asynchronous adaptive first-order optimization methods and proves their stochastic convergence rate is $O(1/\sqrt{t})$ for non-convex functions.
The paper proposes FOAM, an adaptive damping method that stabilizes the Shampoo optimization algorithm by dynamically controlling damping and eigendecomposition frequency, thereby reducing staleness-i…
Yiru Yang, Junling Wang, Nishant Kumar Singh, Luohong Wu +1 more
The paper proposes a novel layer and point-wise projection mapping combined with LoRA injection to efficiently distill knowledge from a large teacher model to a small student model, significantly impr…
TailLoR is a new parameter-efficient finetuning method that uses the singular bases of pre-trained weights to learn low-rank updates, specifically penalizing updates along dominant directions to impro…
The paper introduces and explores Truly Linear FPT (TLFPT), a complexity class defined by $O(n) + f(k)$, demonstrating that it is a strict subset of standard Linear FPT and providing new algorithms fo…
The paper introduces a differentially private manifold denoising framework that allows noisy, non-private query points to be corrected using sensitive reference data while providing formal $(\varepsil…
The paper analyzes low-degree estimation thresholds for recovering hidden signals in planted hypergraphs and tensor PCA, establishing sharp phase transitions and providing polynomial-time recovery alg…
This paper analyzes the computational complexity of evaluating recurrent functions, showing that the complexity depends heavily on how the input offsets are encoded and the structure of the recurrence…
Ei Hmue Khine, Yao Li, Jiebao Sun, Shengzhu Shi +2 more
The paper proposes Latent Geometric Chords (LGC) and LGC-H, a novel method that navigates decision boundaries using curvature-aware geometric search within a semantic manifold to generate high-fidelit…
The paper introduces a subgrid marching tetrahedra scheme that accurately recovers complex, intersection-free manifold meshes from tetrahedral grids, overcoming limitations of classic marching methods…
The paper introduces TSVD, a novel framework that efficiently pre-trains LLMs by enforcing both low rank and strict weight orthonormality, achieving performance comparable to full-parameter models wit…