Similar to 2606.11104 · 18 results
This paper analyzes the computational complexity of verifying feedforward neural networks when their weights are restricted to finite-width arithmetic, finding that verification remains NP-complete fo…
Arnaud Descours, Arnaud Guillin, Geoffrey Lacour, Manon Michel +2 more
This paper develops a novel, computationally efficient method to quantify the uncertainty in wide neural network predictions by characterizing the limiting random fluctuations using stochastic evoluti…
The paper proposes FOAM, an adaptive damping method that stabilizes the Shampoo optimization algorithm by dynamically controlling damping and eigendecomposition frequency, thereby reducing staleness-i…
Zhi Zhou, Ming Yang, Shi-Yu Tian, Kun-Yang Yu +2 more
The paper establishes the first theoretical framework for analyzing the learnability of Test-Time Adaptation (TTA) under non-stationary data streams by introducing Recovery Complexity, which quantifie…
The paper analyzes the algorithmic complexity of finding collisions in single-layer binary neural networks, establishing that the collision resistance depends critically on the activation function's t…
This paper establishes a large deviation principle for the generalization error of interpolating classifiers in the overparametrized regime.
The paper analyzes language generation and identification in the limit under bounded memory, showing that memory constraints significantly alter learnability, particularly affecting achievable density…
The scaling exponent in neural scaling laws is not fixed but systematically depends on the optimizer used, with preconditioned optimizers generally yielding steeper scaling.
The paper analyzes a new class of asynchronous adaptive first-order optimization methods and proves their stochastic convergence rate is O(1/√t) for non-convex functions.
Boqian Wu, Qiao Xiao, Patrik Okanovic, Tomasz Sternal +5 more
This paper introduces a new scaling law for sparse language models trained with limited data, demonstrating that sparsity can significantly improve performance and delay data saturation during multi-e…
The paper introduces Automatically Differentiable Nonlinear Tensor Networks (ADNTNs) to achieve massive, structured compression of deep neural networks, demonstrating compression ratios up to 77,000x…
The paper introduces and analyzes several novel data appraisal metrics, including the Vendi Score and matrix spectral functions, demonstrating that efficient optimization techniques make these metrics…
The study finds that specific, interpretable neuron populations (Rosetta Neurons) exhibit predictable, scale-dependent changes in selectivity and specialization as neural models grow larger.
The paper establishes information-theoretic lower bounds for stochastic optimization using low-bit gradients by reducing the problem to compressed Gaussian mean estimation, yielding sharp bounds on co…
Tianren Zhang, Xiangxin Li, Minghao Xiao, Guanyu Chen +1 more
The paper introduces polynomial representations as a quantitative, distribution-aware metric for measuring model simplicity, demonstrating that the effective degree of this representation is a superio…
The paper establishes new hardness amplification results for Learning Parity with Noise (LPN) and its sparse variants, showing that solving the problem on a small fraction of instances implies solving…
The paper analyzes congruence-based neural architectures for classifying positive-definite matrices, demonstrating that common semi-orthogonality constraints severely limit the model's expressivity.