Papers similar to 2606.02328

~ similar to 2606.02328· 18 results

cs.LGcs.AIRecentMay 27, 2026

Efficient Pre-Training of LLMs through Truncated SVD Layers

Kaivan Kamali, Kajetan Schweighofer, Hormoz Shahrzad, Olivier Francon +2 more

The paper introduces TSVD, a novel framework that efficiently pre-trains LLMs by enforcing both low rank and strict weight orthonormality, achieving performance comparable to full-parameter models wit…

View →

cs.CVcs.CRcs.LGRecentMay 29, 2026

Latent Geometric Chords for Query-Efficient Decision-Based Adversarial Attacks

Ei Hmue Khine, Yao Li, Jiebao Sun, Shengzhu Shi +2 more

The paper proposes Latent Geometric Chords (LGC) and LGC-H, a novel method that navigates decision boundaries using curvature-aware geometric search within a semantic manifold to generate high-fidelit…

View →

cs.LGRecentJun 1, 2026

Expressivity of congruence-based architectures for DNNs on positive-definite matrices

Antonin Oswald, Estelle Massart

The paper analyzes congruence-based neural architectures for classifying positive-definite matrices, demonstrating that common semi-orthogonality constraints severely limit the model's expressivity.

View →

cs.LGmath.STstat.MERecentJun 1, 2026

Network Learning with Semi-relaxed Gromov-Wasserstein

Charles Dufour, Ulysse Naepels, Leonardo V. Santoro

The paper proposes a semi-relaxed Gromov-Wasserstein objective to estimate the latent connectivity structure of large-scale networks, achieving statistically consistent and efficient recovery of the u…

View →

cs.LGcs.AIRecentMay 27, 2026

Learning Theory of the SVRG: Generalization and Convergence Analysis

Yunwen Lei, Zimeng Wang, Xiaoming Yuan

This paper provides the first non-vacuous generalization analysis for the Stochastic Variance Reduced Gradient (SVRG) method by establishing sharp, data-dependent algorithmic stability bounds, thereby…

View →

cs.LGcs.CRRecentApr 30, 2026

Low Rank Adaptation for Adversarial Perturbation

Han Liu, Shanghao Shi, Yevgeniy Vorobeychik, Chongjie Zhang +1 more

This paper demonstrates that adversarial perturbations possess a low-rank structure, and proposes a two-step method to leverage this property to significantly improve the efficiency and effectiveness…

View →

cs.CLcs.LGRecentMay 31, 2026

Don't Read Everything: A Curvature-Conditioned Query for Linear Attention

Dong Le, Thong Nguyen, Cong-Duy Nguyen, Anh Tuan Luu

The paper introduces Curvature-Conditioned Query (CCQ), a novel read-time contraction mechanism that improves linear attention's performance on long-context and retrieval tasks by incorporating the ge…

View →

cs.IRcs.AIRecentJun 1, 2026

Rank-Constrained Deep Matrix Completion for Group Recommendation

Mubaraka Sani Ibrahim, Lehel Csató, Isah Charles Saidu

The paper proposes Group Rank-Constrained Deep Matrix Completion (Group RC-DMC), a novel framework that jointly leverages low-rank structure and attention-based modeling to provide accurate group reco…

View →

cs.CRcs.DSRecentApr 30, 2026

Variational and Majorization Principles in Lattice Reduction

Javier Blanco-Romero, Florina Almenares Mendoza

The paper uses majorization theory to analyze lattice reduction, showing that local swaps smooth the Gram-Schmidt profile and deriving variational and telescoping identities for the worst-case profile…

View →

cs.LGcs.AImath.OCRecentMay 28, 2026

Singularity-aware Optimization via Randomized Geometric Probing: Towards Stable Non-smooth Optimization

Ruoran Xu, Borong She, Xiaobo Jin, Qiufeng Wang

The paper introduces Singularity-aware Adam (S-Adam), a novel optimizer that stabilizes deep learning training in non-smooth loss landscapes by dynamically damping updates based on local geometric ins…

View →

cs.CVRecentJun 1, 2026

From Extrinsic to Intrinsic: Geodesic-Guided Representation Learning for 3D Geometric Data

Yuming Zhao, Junhui Hou, Qijian Zhang, Jia Qin +1 more

The paper introduces PRISM, a novel representation learning framework that learns isometric embeddings by explicitly modeling the intrinsic geodesic metric of 3D surfaces, achieving superior performan…

View →

cs.LGcs.AIRecentMay 27, 2026

Learning Compositional Latent Structure with Vector Networks

Niclas Pokel, Benjamin F. Grewe

The paper introduces the Vector Network (VN), a novel recurrent architecture that replaces fixed weight matrices with reusable weight atoms, enabling superior compositional generalization by making st…

View →

cs.LGcs.AImath.DSRecentMay 27, 2026

The Hamilton-Jacobi Theory of Deep Learning

Jose Marie Antonio Miñoza, Erika Fille T. Legara, Christopher P. Monterola

This paper establishes an exact mathematical correspondence between training and inference in deep learning and the solution of Hamilton-Jacobi partial differential equations, unifying multiple theore…

View →

cs.LGcs.AIcs.CVRecentMay 28, 2026

How Much Is a Dataset Worth? Scaling Laws, the Vendi Score, and Matrix Spectral Functions

Jeff A. Bilmes, Gantavya Bhatt, Arnav M. Das

The paper introduces and analyzes several novel data appraisal metrics, including the Vendi Score and matrix spectral functions, demonstrating that efficient optimization techniques make these metrics…

View →

cs.LGcs.CLRecentMay 31, 2026

CART: Context-Anchored Recurrent Transformer -- A Parameter-Efficient Architecture with Learned Stability

Chad A. Capps

CART introduces a parameter-efficient recurrent transformer architecture that reuses a core block multiple times, but its performance does not surpass a dense baseline, suggesting that weight sharing…

View →

cs.LGcs.AIcs.CLRecentMay 27, 2026

Parallax: Parameterized Local Linear Attention for Language Modeling

Yifei Zuo, Dhruv Pai, Zhichen Zeng, Alec Dewulf +2 more

The paper introduces Parallax, a scalable and numerically stable parameterized Local Linear Attention mechanism that significantly improves LLM performance and efficiency compared to existing methods…

View →

cs.CLcs.AIRecentMay 29, 2026

D$^3$: Dynamic Directional Graph-Constrained Data Scheduling for LLM Training

Yuanjian Xu, Jianing Hao, Guang Zhang, Zhong Li

The paper proposes $D^3$, a dynamic graph-constrained scheduling framework that optimizes LLM training order by modeling sample interactions as a dynamic influence graph.

View →

cs.CLRecentMay 29, 2026

Towards Efficient LLMs Annealing with Principled Sample Selection

Yuanjian Xu, Jianing Hao, Wanbo Zhang, Zhong Li +1 more

The paper proposes DiReCT, a novel framework that treats data selection during LLM annealing as a constrained optimization problem based on the spectral geometry of the loss landscape, achieving state…

View →