~ similar to 2606.00532· 19 results
DenseSteer is a training-free inference-time framework that improves the math reasoning capabilities of small language models by steering their internal representations toward a 'Dense Reasoning' patt…
AXIOM is a trust-first neuro-symbolic execution architecture that ensures verifiable mathematical reasoning by strictly separating language model interpretation from deterministic computation, achievi…
The paper introduces Contrastive Reflection (CORE), a novel non-parametric method that rapidly improves language model reasoning by distilling contrasts between successful and unsuccessful problem att…
The paper introduces an automatic numeric-remapping attack to test the robustness of LLMs on arithmetic word problems, finding that LLMs remain sensitive to small numeric changes in datasets like GSM8…
The paper presents Tahoe, a system that optimizes Text-to-SQL performance through dynamic data management and hint learning.
The paper proposes Distribution-Aligned Self-Distillation (DASD) to improve self-distillation by dynamically filtering high-perplexity tokens, thereby preserving useful logical knowledge while suppres…
The paper introduces OCC-RAG, a family of compact, task-specialized Small Language Models (SLMs) designed to achieve highly faithful, multi-hop question answering grounded strictly in provided context…
Guoxin Ma, Yibing Liu, Chengzhengxu Li, Yu Liang +6 more
The paper introduces Thinking as Compression (TaC), a novel paradigm showing that the inherent reasoning process of a large language model can naturally compress long context inputs, outperforming ded…
Yubo Gao, Haotian Wu, Hong Chen, Junquan Huang +7 more
The paper introduces Hierarchical Adaptive Budgeter (HAB), a framework that improves LLM reasoning efficiency by adaptively allocating computational resources to match the intrinsic complexity of both…
Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang +1 more
The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.
ThinkSwitch introduces a low-compute co-training procedure that distills the reasoning benefit of large language models into weights, significantly improving performance on specific reasoning tasks.
The paper introduces CosmicFish-HRM, a compact language model that achieves adaptive reasoning by dynamically allocating computational effort through a Hierarchical Reasoning Module (HRM), showing tha…
Zixuan Zhu, Yitong Hu, Yong Dai, Junfeng Fang +3 more
The paper introduces Unified Context Evolution (UCE), a gradient-free framework that externalizes and manages agent experience into a typed, evolving library, significantly improving performance on mu…
Zilin Xiao, Qi Ma, Chun-cheng Jason Chen, Xintao Chen +3 more
This paper proposes a post-training framework called Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT) to teach language models to reason by analogy.
Marko Kojic, Ivan Bondyrev, Aral de Moor, Joseph Shtok +5 more
Mellum 2 is an open-weight 12B Mixture-of-Experts (MoE) language model specialized for software engineering, achieving performance competitive with larger models while maintaining the efficiency of a…
Wanying Ren, Xin Song, Futing Wang, Guoxiu He +1 more
The paper theoretically analyzes the limitations of parameter-based knowledge editing and empirically demonstrates that these methods consistently damage core LLM capabilities compared to retrieval-ba…
Zhenting Qi, Susanna Maria Baby, Stefanie Anna Baby, Kan Yuan +4 more
The paper investigates the limits of self-evolution in LLM reasoning under closed-loop settings, finding that while self-improvement is significant, it consistently falls short of perfect oracle super…
This paper investigates how different types of compressed reasoning data (Explicit, Composed, Implicit CoT) affect LLM performance during post-training, finding that the choice of compression and subs…