~ similar to 2606.02328· 18 results
The paper introduces TSVD, a novel framework that efficiently pre-trains LLMs by enforcing both low rank and strict weight orthonormality, achieving performance comparable to full-parameter models wit…
Ei Hmue Khine, Yao Li, Jiebao Sun, Shengzhu Shi +2 more
The paper proposes Latent Geometric Chords (LGC) and LGC-H, a novel method that navigates decision boundaries using curvature-aware geometric search within a semantic manifold to generate high-fidelit…
The paper analyzes congruence-based neural architectures for classifying positive-definite matrices, demonstrating that common semi-orthogonality constraints severely limit the model's expressivity.
The paper proposes a semi-relaxed Gromov-Wasserstein objective to estimate the latent connectivity structure of large-scale networks, achieving statistically consistent and efficient recovery of the u…
This paper provides the first non-vacuous generalization analysis for the Stochastic Variance Reduced Gradient (SVRG) method by establishing sharp, data-dependent algorithmic stability bounds, thereby…
Han Liu, Shanghao Shi, Yevgeniy Vorobeychik, Chongjie Zhang +1 more
This paper demonstrates that adversarial perturbations possess a low-rank structure, and proposes a two-step method to leverage this property to significantly improve the efficiency and effectiveness…
The paper introduces Curvature-Conditioned Query (CCQ), a novel read-time contraction mechanism that improves linear attention's performance on long-context and retrieval tasks by incorporating the ge…
The paper proposes Group Rank-Constrained Deep Matrix Completion (Group RC-DMC), a novel framework that jointly leverages low-rank structure and attention-based modeling to provide accurate group reco…
The paper uses majorization theory to analyze lattice reduction, showing that local swaps smooth the Gram-Schmidt profile and deriving variational and telescoping identities for the worst-case profile…
The paper introduces Singularity-aware Adam (S-Adam), a novel optimizer that stabilizes deep learning training in non-smooth loss landscapes by dynamically damping updates based on local geometric ins…
Yuming Zhao, Junhui Hou, Qijian Zhang, Jia Qin +1 more
The paper introduces PRISM, a novel representation learning framework that learns isometric embeddings by explicitly modeling the intrinsic geodesic metric of 3D surfaces, achieving superior performan…
The paper introduces the Vector Network (VN), a novel recurrent architecture that replaces fixed weight matrices with reusable weight atoms, enabling superior compositional generalization by making st…
This paper establishes an exact mathematical correspondence between training and inference in deep learning and the solution of Hamilton-Jacobi partial differential equations, unifying multiple theore…
The paper introduces and analyzes several novel data appraisal metrics, including the Vendi Score and matrix spectral functions, demonstrating that efficient optimization techniques make these metrics…
CART introduces a parameter-efficient recurrent transformer architecture that reuses a core block multiple times, but its performance does not surpass a dense baseline, suggesting that weight sharing…
Yifei Zuo, Dhruv Pai, Zhichen Zeng, Alec Dewulf +2 more
The paper introduces Parallax, a scalable and numerically stable parameterized Local Linear Attention mechanism that significantly improves LLM performance and efficiency compared to existing methods…
The paper proposes $D^3$, a dynamic graph-constrained scheduling framework that optimizes LLM training order by modeling sample interactions as a dynamic influence graph.
Yuanjian Xu, Jianing Hao, Wanbo Zhang, Zhong Li +1 more
The paper proposes DiReCT, a novel framework that treats data selection during LLM annealing as a constrained optimization problem based on the spectral geometry of the loss landscape, achieving state…