~ similar to 2605.28398· 20 results
The paper evaluates LLM reasoning on Boolean satisfiability (SAT) problems, concluding that conventional metrics are misleading and proposing a paired-formula protocol with Accurate Differentiation Ra…
The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…
Yuxin Wang, Jiahao Lu, Qifeng Wu, Shicheng Fang +4 more
AdaptR1 is a novel Reinforcement Learning framework that adaptively manages reasoning effort at every step of multi-hop Question Answering, significantly reducing unnecessary computational cost withou…
ThinkSwitch introduces a low-compute co-training procedure that distills the reasoning benefit of large language models into weights, significantly improving performance on specific reasoning tasks.
The paper introduces CosmicFish-HRM, a compact language model that achieves adaptive reasoning by dynamically allocating computational effort through a Hierarchical Reasoning Module (HRM), showing tha…
The paper introduces Reasoning in Memory (RiM), a latent reasoning method that replaces autoregressive token generation with fixed memory blocks to enable compute-efficient internal working memory for…
The paper introduces CodeGolf Bench, a novel multi-language benchmark using code golf to measure LLMs' ability to generate highly concise and efficient code, showing that reasoning models significantl…
The paper introduces Chunk-Level Guided Generation, a training-free method that uses an off-the-shelf large language model (LLM) as a process scorer to guide small model generation, achieving performa…
Yubo Gao, Haotian Wu, Hong Chen, Junquan Huang +7 more
The paper introduces Hierarchical Adaptive Budgeter (HAB), a framework that improves LLM reasoning efficiency by adaptively allocating computational resources to match the intrinsic complexity of both…
Yang He, Xiao Ding, Bibo Cai, Yufei Zhang +4 more
DeepTool introduces a novel Process-Supervised Reinforcement Learning framework to enhance Tool-Integrated Reasoning by explicitly supervising and rewarding intermediate, interleaved deliberation step…
The paper proposes a hybrid reasoning framework where Large Language Models (LLMs) generate code to encode complex optimization problems into a preference-based Maximum Satisfiability (MaxSAT) format,…
The paper introduces LinTree, a method that explicitly structures the search history of LLM reasoning traces using parent pointers, significantly improving task performance and search efficiency compa…
DenseSteer is a training-free inference-time framework that improves the math reasoning capabilities of small language models by steering their internal representations toward a 'Dense Reasoning' patt…
This paper investigates how different types of compressed reasoning data (Explicit, Composed, Implicit CoT) affect LLM performance during post-training, finding that the choice of compression and subs…
Renfei Dang, Xinye Wang, Zhejian Lai, Weilu Xu +4 more
The paper proposes RIEQE, a two-stage training framework that synergistically co-evolves implicit and explicit reasoning capabilities in Large Reasoning Models (LRMs) to significantly improve fine-gra…
Xiang Li, Jiwei Wei, Ke Liu, Yitong Qin +4 more
The eMoT framework enhances multi-step reasoning in LLMs by treating reasoning as an evolving memory, stabilizing performance through symbolic computation and structured refinement.
This survey provides a comprehensive analysis of Reasoning Language Model (RLM) adoption across 28 scientific disciplines, revealing significant disparities in RLM maturity across different scientific…
The paper introduces an Integrated, cross-Architecture Reasoning (IAR) framework to provide a unified and robust method for interpreting the opaque reasoning processes within Large Language Models.
Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin +1 more
The paper introduces ForeSci, a novel benchmark that evaluates LLM agents' ability to make forward-looking research judgments using only historical evidence, finding that explicit evidence organizatio…
The paper introduces an automatic numeric-remapping attack to test the robustness of LLMs on arithmetic word problems, finding that LLMs remain sensitive to small numeric changes in datasets like GSM8…