ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2605.30327· 20 results

cs.CLcs.AIRecentJun 1, 2026

A Primer in Post-Training Reasoning Data: What We Know About How It Works

Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun +2 more

This paper synthesizes over 150 scattered studies and reports to provide the first comprehensive primer on post-training reasoning data, organizing the field around data objects, utility, construction…

View →
cs.AIcs.LGRecentJun 1, 2026

Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery

Ekaterina Alimaskina, Darya Rudas, Denis Shveykin, Gleb Molodtsov +2 more

The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…

View →
cs.LGcs.AIEmpiricalRecentJun 4, 2026

RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

Mykyta Ielanskyi, Kajetan Schweighofer, Lukas Aichberger, Sepp Hochreiter

This paper introduces RREDCoT, a method for approximating optimal reward redistribution in Chain-of-Thought reasoning language models without additional generation.

View →
cs.LGcs.AIEmpiricalRecentJun 4, 2026

RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

Mykyta Ielanskyi, Kajetan Schweighofer, Lukas Aichberger, Sepp Hochreiter

This paper introduces RREDCoT, a method for approximating optimal reward redistribution in Chain-of-Thought reasoning language models without additional generation.

View →
cs.AIRecentMay 27, 2026

HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs

Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang +1 more

The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.

View →
cs.AIRecentMay 27, 2026

DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes

Caijun Xu, Changyi Xiao, Zhongyuan Peng, Yixin Cao

DenoiseRL is a novel reinforcement learning framework that improves reasoning in large language models by optimizing directly from the failures and incorrect reasoning traces of weak models, eliminati…

View →
cs.CLcs.AIcs.LGRecentJun 1, 2026

Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning

Atoosa Chegini, Soheil Feizi

The paper introduces Chunk-Level Guided Generation, a training-free method that uses an off-the-shelf large language model (LLM) as a process scorer to guide small model generation, achieving performa…

View →
cs.CLcs.LGRecentJun 1, 2026

Unveiling the Entropy Dynamics of Chain-of-Thought Reasoning

Ting Xu, Xu He, Yupu Lu, Jiankai Sun +3 more

The paper analyzes the entropy dynamics of Chain-of-Thought (CoT) reasoning, identifying a transition from an exploratory Uncertainty Region to a stable Confidence Region, which enables superior early…

View →
cs.AIRecentMay 28, 2026

TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation

Yundong Kim, Heyoung Yang

The paper introduces TRACE, a novel metric that evaluates the logical structure of LLM reasoning (CoT) by integrating Toulmin's argumentation theory, demonstrating that sound reasoning structure corre…

View →
cs.AIRecentMay 27, 2026

Beyond Consensus: Trace-Level Synthesis in Mixture of Agents

Shreyas Fadnavis, Praitayini Kanakaraj, Felix Wyss

The paper proposes using an LLM aggregator that analyzes complete reasoning traces, demonstrating that trace-level synthesis is superior to traditional consensus methods like majority voting for solvi…

View →
cs.AIRecentMay 27, 2026

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Linas Nasvytis, Simon Jerome Han, Ben Prystawski, Satchel Grant +2 more

The paper introduces Contrastive Reflection (CORE), a novel non-parametric method that rapidly improves language model reasoning by distilling contrasts between successful and unsuccessful problem att…

View →
cs.AIRecentMay 27, 2026

Bridging the Detection-to-Abstention Gap in Reasoning Models under Insufficient Information

Renjie Gu, Jiaxu Li, Yihao Wang, Yun Yue +7 more

The paper addresses the 'detection-to-abstention gap' in reasoning models, where detecting insufficient information does not lead to abstention, by proposing a novel control framework that forces mode…

View →
cs.AIcs.CLcs.LGRecentMay 31, 2026

An Enigma of Artificial Reason: Investigating the Production-Evaluation Gap in Large Reasoning Models

Mingzhong Sun, Teresa Yeo, Armando Solar-Lezama, Tan Zhi-Xuan

This paper investigates the production-evaluation gap in Large Reasoning Models (LRMs), finding that while LRMs excel at generating solutions, they struggle significantly to evaluate flawed reasoning,…

View →
cs.AIRecentMay 27, 2026

Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs

Yue Cheng, Jiajun Zhang, Xiaohui Gao, Weiwei Xing +2 more

This paper investigates the non-monotonic role of sample difficulty in Reinforcement Learning with Verifiable Reward (RLVR), finding that medium-difficulty problems provide the most balanced and benef…

View →
cs.AIcs.CLcs.LGRecentMay 28, 2026

DenseSteer: Steering Small Language Models towards Dense Math Reasoning

Yang Ouyang, Shuhang Lin, Jung-Eun Kim

DenseSteer is a training-free inference-time framework that improves the math reasoning capabilities of small language models by steering their internal representations toward a 'Dense Reasoning' patt…

View →
cs.LGcs.AIRecentMay 30, 2026

The Paradox of Outcome Optimization: A Causal Information-Theoretic Bound on Reasoning Shortcuts in LLMs

Zihan Chen, Yiming Zhang, Wenxiang Geng, Zenghui Ding +1 more

The paper theoretically explains that optimizing LLMs solely on outcomes leads to brittle reasoning (Reward-Induced Manifold Collapse) by favoring low-complexity shortcuts, and proposes process-based…

View →
cs.AIcs.CLcs.LORecentMay 27, 2026

Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability

Leizhen Zhang, Shuhan Chen, Sheng Chen

The paper evaluates LLM reasoning on Boolean satisfiability (SAT) problems, concluding that conventional metrics are misleading and proposing a paired-formula protocol with Accurate Differentiation Ra…

View →
cs.CRcs.LGRecentMar 19, 2026

Towards Verifiable AI with Lightweight Cryptographic Proofs of Inference

Pranay Anchuri, Matteo Campanelli, Paul Cesaretti, Rosario Gennaro +3 more

The paper introduces a lightweight, sampling-based cryptographic protocol for verifiable AI inference that drastically reduces proving overhead from minutes to milliseconds by leveraging statistical p…

View →
cs.AIcs.CLcs.LORecentMay 27, 2026

Risk-Controlled Lean-as-Judge for Natural-Language Mathematical Reasoning

Pauline Bourigault, Xiaotong Ji, Matthieu Zimmer, Rasul Tutunov +1 more

The paper introduces COVCAL, a risk-controlled method that precisely determines when a partial formalization signal from an autoformalizer can be trusted to certify the correctness of natural-language…

View →
cs.LGcs.AIcs.CLRecentJun 3, 2026

Failed Reasoning Traces Tell You What Is Fixable (But Not by Reading Them)

Nizar Islah, Istabrak Abbes, Irina Rish, Sarath Chandar +1 more

This paper proposes a method to recover recoverability structure from failed traces of post-trained language models, enabling test-time routing and post-training analysis.

View →