Similar to 2606.02490 · 18 results
The paper provides a unified algebraic framework to determine the formal language expressivity of recurrent neural language models, resolving conflicts in existing literature by linking expressivity t…
The paper investigates applying Riemannian optimization techniques to low-rank matrix parameters for deep learning, but finds that the proposed methods do not conclusively outperform the AdamW baselin…
The paper formalizes the problem of representation identifiability in supervised learning, showing that a representation property is identifiable if and only if it is constant across all possible fact…
The paper proposes a unified framework for designing efficient and expressive token mixing layers by separating the direct and recurrent influences of inputs, allowing for a principled trade-off betwe…
The paper introduces Automatically Differentiable Nonlinear Tensor Networks (ADNTNs) to achieve massive, structured compression of deep neural networks, demonstrating compression ratios up to 77,000x…
The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…
The paper introduces partial multi-neuron relaxation, a novel verification technique that selectively computes tight linear bounds for a small subset of neurons to improve the efficiency and tightness…
Jiafu Huang, Chao Peng, Chenyang Xu, Zhengfeng Yang +6 more
The paper proposes using an auxiliary reconstruction task, specifically one that captures intra-state feature dependencies, to improve the quality of state representations learned by the encoder in ne…
This paper establishes an exact mathematical correspondence between training and inference in deep learning and the solution of Hamilton-Jacobi partial differential equations, unifying multiple theore…
This study empirically benchmarks classical and quantum machine learning models for image recognition, finding that while quantum models offer superior accuracy and resource efficiency at high dimensi…
This paper analyzes the computational complexity of verifying feedforward neural networks when their weights are restricted to finite-width arithmetic, finding that verification remains NP-complete fo…
Senmiao Wang, Tiantian Fang, Haoran Zhang, Yushun Zhang +3 more
This paper proposes a preconditioning layer for stable weight conditioning in LLM training.
The paper introduces BRo-JEPA, a latent world model that successfully learns modular arithmetic (such as addition modulo 10) by explicitly imposing the circular structure of the problem into the latent s…
Clark Hash is a stateless, deterministic quantization method that significantly reduces the storage size of neural embeddings while maintaining high accuracy for cosine similarity search.
The paper systematically investigates the conditions under which linear layers in AES-like ciphers avoid related-differential structures, proving that the MDS property is necessary and identifying spe…
Canyixing Cui, Tao Wu, Xingping Xian, Xiao-Ke Xu +2 more
GJDNet proposes a joint disentanglement framework to enhance the robustness of Graph Neural Networks against adversarial attacks by simultaneously stabilizing node representations and decision boundar…
The paper analyzes the expressivity of padded transformers, proving that their computational power is primarily determined by model depth and numeric precision, rather than attention type or width.