~ similar to 2605.29288 · 20 results
The paper introduces TRACE, a novel metric that evaluates the logical structure of LLM reasoning (CoT) by integrating Toulmin's argumentation theory, demonstrating that sound reasoning structure corre…
The paper analyzes backtracking dynamics in long reasoning traces to distinguish between useful self-correction and unproductive revision, finding that correct reasoning exhibits early, isolated repai…
This paper investigates how different types of compressed reasoning data (Explicit, Composed, Implicit CoT) affect LLM performance during post-training, finding that the choice of compression and subs…
The paper introduces LinTree, a method that explicitly structures the search history of LLM reasoning traces using parent pointers, significantly improving task performance and search efficiency compa…
This paper evaluates the causal reasoning abilities of large language models and finds that they rely heavily on lexical pattern matching rather than structural reasoning.
Zhaoyang Jiang, Xuanqi Peng, Fei Teng, Zhizhong Fu +4 more
The paper demonstrates that while distilling large language models for medical QA can significantly improve final answer accuracy, this gain often comes at the cost of factual accuracy and detailed re…
ReasonOps introduces an unsupervised method to segment and analyze the common, compositional structure of LLM reasoning traces, discovering universal reasoning operators that predict model identity an…
LongTraceRL addresses long-context reasoning challenges by generating highly challenging training data and introducing a fine-grained rubric reward, significantly improving evidence-grounded reasoning…
Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang +1 more
The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.
This paper analyzes the internal decision-making process of large language models by tracking how the answer score changes across multiple internal computational steps (trajectories), finding that mod…
Yu-An Lu, Ci-Yang Tsai, Yu-Lin Tsai, Raluca Ada Popa +1 more
The paper introduces Reasoning Exposure Prompting (REP), a method that demonstrates that even when LLMs hide their internal reasoning steps from users, useful reasoning supervision can still be elicit…
Yu-An Lu, Ci-Yang Tsai, Yu-Lin Tsai, Raluca Ada Popa +1 more
The paper introduces Reasoning Exposure Prompting (REP), a method that demonstrates that even when LLMs hide internal reasoning traces from users, useful reasoning supervision can still be elicited th…
Renjie Gu, Jiaxu Li, Yihao Wang, Yun Yue +7 more
The paper addresses the 'detection-to-abstention gap' in reasoning models, where detecting insufficient information does not lead to abstention, by proposing a novel control framework that forces mode…
The paper introduces TRAILS, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…
The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…
COFT is a training-free decoding method that significantly reduces societal biases in large language model chain-of-thought reasoning by applying token-level fairness control at decode time.
This paper localizes the attention heads within LLMs responsible for specific reasoning steps, finding that specialized heads handle factual retrieval while higher layers manage global information int…
ThinkSwitch introduces a low-compute co-training procedure that distills the reasoning benefit of large language models into weights, significantly improving performance on specific reasoning tasks.
The paper introduces a novel framework to quantify faithful confidence expression (FC) in Large Reasoning Models (LRMs), finding that FC remains a significant and challenging reliability target for th…
Guancheng Tu, Xiangjun Fu, Suhao Yu, Yao Tang +4 more
This paper proposes NF-CoT, a latent reasoning framework that preserves the advantages of chain-of-thought in large language models.