~ similar to 2605.29687· 20 results
The paper evaluates LLM reasoning on Boolean satisfiability (SAT) problems, concluding that conventional metrics are misleading and proposing a paired-formula protocol with Accurate Differentiation Ra…
Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang +1 more
The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.
The paper introduces TRAILS~, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…
The paper proposes an unsupervised Reinforcement Learning approach that enforces cross-lingual self-consistency to significantly enhance the multilingual reasoning capabilities of large language model…
The paper introduces LinTree, a method that explicitly structures the search history of LLM reasoning traces using parent pointers, significantly improving task performance and search efficiency compa…
The paper demonstrates that using Reinforcement Learning from Verifiable Rewards (RLVR) significantly improves small language models' functional correctness in code generation, particularly when combi…
Zhenting Qi, Susanna Maria Baby, Stefanie Anna Baby, Kan Yuan +4 more
The paper investigates the limits of self-evolution in LLM reasoning under closed-loop settings, finding that while self-improvement is significant, it consistently falls short of perfect oracle super…
The paper introduces CodeGolf Bench, a novel multi-language benchmark using code golf to measure LLMs' ability to generate highly concise and efficient code, showing that reasoning models significantl…
The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…
This paper introduces the first LLM-generated, domain-independent heuristics for symbolic AI planning, using evolutionary search to surpass the performance of hand-engineered state-of-the-art methods.
Haoyang Liu, Jie Wang, Boxuan Niu, Xiongwei Han +7 more
The paper introduces Opt-Verifier, a novel LLM-based framework that significantly improves the accuracy of automated optimization model generation by implementing dual-side verification from both stru…
Renfei Dang, Xinye Wang, Zhejian Lai, Weilu Xu +4 more
The paper proposes RIEQE, a two-stage training framework that synergistically co-evolves implicit and explicit reasoning capabilities in Large Reasoning Models (LRMs) to significantly improve fine-gra…
Marko Kojic, Ivan Bondyrev, Aral de Moor, Joseph Shtok +5 more
Mellum 2 is an open-weight 12B Mixture-of-Experts (MoE) language model specialized for software engineering, achieving performance competitive with larger models while maintaining the efficiency of a…
This paper unifies the fragmented field of Tree-of-Thoughts (ToT) reasoning by mapping LLM-based search processes onto a formal taxonomy derived from classical heuristic search theory.
This paper investigates how different types of compressed reasoning data (Explicit, Composed, Implicit CoT) affect LLM performance during post-training, finding that the choice of compression and subs…
The paper introduces an automatic numeric-remapping attack to test the robustness of LLMs on arithmetic word problems, finding that LLMs remain sensitive to small numeric changes in datasets like GSM8…
Xin Su, Dawid Majchrowski, Fangyuan Yu, Vanshil Atul Shah +4 more
The paper introduces Hybrid Verified Decoding, a method that predicts the acceptance length of a cache draft to intelligently select between cache verification and model-based drafting, achieving sign…
The paper proposes Luar, a framework that trains reasoning language models to selectively use English translation only when their direct understanding of a non-English input is unreliable, significant…
The paper introduces Contrastive Reflection (CORE), a novel non-parametric method that rapidly improves language model reasoning by distilling contrasts between successful and unsuccessful problem att…
The paper introduces CosmicFish-HRM, a compact language model that achieves adaptive reasoning by dynamically allocating computational effort through a Hierarchical Reasoning Module (HRM), showing tha…