~ similar to 2606.01765 · 20 results
The paper proposes a unified framework for designing efficient and expressive token mixing layers by separating the direct and recurrent influences of inputs, allowing for a principled trade-off betwe…
This paper analyzes the computational complexity of verifying feedforward neural networks when their weights are restricted to finite-width arithmetic, finding that verification remains NP-complete fo…
The paper proposes CYKNN, a novel recurrent neural network architecture that directly encodes the CYK parsing algorithm, demonstrating superior performance over large language models on syntactic pars…
The paper analyzes congruence-based neural architectures for classifying positive-definite matrices, demonstrating that common semi-orthogonality constraints severely limit the model's expressivity.
The paper analyzes the expressivity of padded transformers, proving that their computational power is primarily determined by model depth and numeric precision, rather than attention type or width.
The paper introduces BRo-JEPA, a latent world model that successfully learns modular arithmetic (like addition modulo 10) by explicitly imposing the circular structure of the problem into the latent s…
The paper demonstrates that positional encodings are not necessary for transformers to achieve universal computation, showing that the inherent mechanism of sliding context windows already provides su…
The paper analyzes a fragment of Higher-Order Datalog, showing that restricting recursion to a linear form shifts its expressive power from time complexity to space complexity, specifically capturing…
The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…
This paper proposes Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely.
This paper proposes Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely.
The paper analyzes language generation and identification in the limit under bounded memory, showing that memory constraints significantly alter learnability, particularly affecting achievable density…
The paper introduces an automatic numeric-remapping attack to test the robustness of LLMs on arithmetic word problems, finding that LLMs remain sensitive to small numeric changes in datasets like GSM8…
The paper demonstrates that encoding harmful prompts as genuine mathematical problems, rather than just using mathematical formatting, effectively bypasses the safety filters of large language models.
The paper challenges the conclusion that LLMs lack reasoning by demonstrating that reported performance drops on GSM-Symbolic are often statistically weak and partially attributable to dataset biases,…
This paper analyzes the computational complexity of evaluating recurrent functions, showing that the complexity depends heavily on how the input offsets are encoded and the structure of the recurrence…
The paper evaluates LLM reasoning on Boolean satisfiability (SAT) problems, concluding that conventional metrics are misleading and proposing a paired-formula protocol with Accurate Differentiation Ra…
Jiafu Huang, Chao Peng, Chenyang Xu, Zhengfeng Yang +6 more
The paper proposes using an auxiliary reconstruction task, specifically one that captures intra-state feature dependencies, to improve the quality of state representations learned by the encoder in ne…
Xin Su, Dawid Majchrowski, Fangyuan Yu, Vanshil Atul Shah +4 more
The paper introduces Hybrid Verified Decoding, a method that predicts the acceptance length of a cache draft to intelligently select between cache verification and model-based drafting, achieving sign…
The paper systematically explores a vast design space of cryptographic Boolean networks by formalizing six structural constraints, finding that optimal designs result from sparse, mutually compatible…