~ similar to 2605.31421· 20 results
The paper proposes a unified framework for designing efficient and expressive token mixing layers by separating the direct and recurrent influences of inputs, allowing for a principled trade-off betwe…
This study benchmarks token-optimized formats (TOON and TRON) against JSON in end-to-end agentic AI systems, finding that TRON significantly reduces token overhead with minimal performance degradation…
The paper introduces CFGzip, an offline token space compression technique that significantly reduces the computational overhead of constrained decoding, making complex grammar enforcement feasible at…
The paper provides a unified algebraic framework to determine the formal language expressivity of recurrent neural language models, resolving conflicts in existing literature by linking expressivity t…
The paper proposes SubFit, a novel compression technique that achieves superior LLM compression by replacing non-contiguous, submodule-level components (Attention and FeedForward) with lightweight res…
The paper demonstrates that the location and nature of state encoding in sequence models are not fixed architectural traits but are highly dependent on the specific task, showing that the encoding pro…
Jiafu Huang, Chao Peng, Chenyang Xu, Zhengfeng Yang +6 more
The paper proposes using an auxiliary reconstruction task, specifically one that captures intra-state feature dependencies, to improve the quality of state representations learned by the encoder in ne…
Marko Kojic, Ivan Bondyrev, Aral de Moor, Joseph Shtok +5 more
Mellum 2 is an open-weight 12B Mixture-of-Experts (MoE) language model specialized for software engineering, achieving performance competitive with larger models while maintaining the efficiency of a…
Zhenting Qi, Susanna Maria Baby, Stefanie Anna Baby, Kan Yuan +4 more
The paper investigates the limits of self-evolution in LLM reasoning under closed-loop settings, finding that while self-improvement is significant, it consistently falls short of perfect oracle super…
This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.
Yifei Zuo, Dhruv Pai, Zhichen Zeng, Alec Dewulf +2 more
The paper introduces Parallax, a scalable and numerically stable parameterized Local Linear Attention mechanism that significantly improves LLM performance and efficiency compared to existing methods…
PrunePath introduces a budget-adaptive structured sparsification framework that efficiently prunes Feed-forward networks in large language models, achieving hardware-friendly sparsity and measurable s…
The paper challenges the conclusion that LLMs lack reasoning by demonstrating that reported performance drops on GSM-Symbolic are often statistically weak and partially attributable to dataset biases,…
The paper introduces CoRP, a gradient-free operator that consolidates the benefits of ensemble-based post-training methods into a single, deployable model update, significantly improving performance w…
Clark Hash is a stateless, deterministic quantization method that significantly reduces the storage size of neural embeddings while maintaining high accuracy for cosine similarity search.
Aniket Anand, Janvijay Singh, Zhewei Sun, Dilek Hakkani-Tür +1 more
The paper demonstrates that the AI-like style introduced by post-training alignment can be measured, localized, and causally removed using a novel ablation technique called PASTA.
Tong Ye, Hang Yu, Tengfei Ma, Xuhong Zhang +5 more
The paper introduces DOMINO, a novel inductive framework that synthesizes domain-specific data for LLMs using only reference examples, significantly improving performance on challenging, implicitly de…
The paper introduces a novel, transferable learned attack (LT-MIA) that detects a universal 'signature of memorization' in language models, achieving high accuracy across diverse model architectures (…
This paper proposes Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely.
This paper proposes Supervised Memory Training (SMT), a method for training nonlinear RNNs that sidesteps recurrent credit propagation entirely.