~ similar to 2606.02011· 20 results
The paper introduces an automatic numeric-remapping attack to test the robustness of LLMs on arithmetic word problems, finding that LLMs remain sensitive to small numeric changes in datasets like GSM8…
This paper investigates how different types of compressed reasoning data (Explicit, Composed, Implicit CoT) affect LLM performance during post-training, finding that the choice of compression and subs…
ThinkSwitch introduces a low-compute co-training procedure that distills the reasoning benefit of large language models into weights, significantly improving performance on specific reasoning tasks.
Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang +1 more
The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.
The paper introduces a lightweight, sampling-based cryptographic protocol for verifiable AI inference that drastically reduces proving overhead from minutes to milliseconds by leveraging statistical p…
The paper introduces Chunk-Level Guided Generation, a training-free method that uses an off-the-shelf large language model (LLM) as a process scorer to guide small model generation, achieving performa…
Xunguang Wang, Yuguang Zhou, Qingyue Wang, Zongjie Li +4 more
This paper introduces a novel framework, the Reasoning Safety Monitor, to detect and prevent logical inconsistencies and adversarial manipulations within the internal reasoning steps of large language…
Renfei Dang, Xinye Wang, Zhejian Lai, Weilu Xu +4 more
The paper proposes RIEQE, a two-stage training framework that synergistically co-evolves implicit and explicit reasoning capabilities in Large Reasoning Models (LRMs) to significantly improve fine-gra…
The paper introduces an Integrated, cross-Architecture Reasoning (IAR) framework to provide a unified and robust method for interpreting the opaque reasoning processes within Large Language Models.
The paper analyzes backtracking dynamics in long reasoning traces to distinguish between useful self-correction and unproductive revision, finding that correct reasoning exhibits early, isolated repai…
The paper introduces Reasoning in Memory (RiM), a latent reasoning method that replaces autoregressive token generation with fixed memory blocks to enable compute-efficient internal working memory for…
Shuqiang Wang, Wei Cao, Jiaqi Weng, Jialing Tao +3 more
The paper proposes a black-box attack using a hierarchical genetic algorithm to induce 'overthinking' in Large Reasoning Models, demonstrating that this vulnerability can cause significant resource ex…
DenseSteer is a training-free inference-time framework that improves the math reasoning capabilities of small language models by steering their internal representations toward a 'Dense Reasoning' patt…
The paper demonstrates that for edge-native SLMs used in decentralized governance, simpler, intuitive reasoning (System 1) is significantly more robust and efficient than complex, iterative deliberati…
The paper proposes SLAT, a segment-level adaptive trimming framework, which efficiently reduces redundant reasoning in large language model CoT outputs by selectively suppressing segments with low mar…
The paper proposes a hybrid reasoning framework where Large Language Models (LLMs) generate code to encode complex optimization problems into a preference-based Maximum Satisfiability (MaxSAT) format,…
Nizar Islah, Istabrak Abbes, Irina Rish, Sarath Chandar +1 more
This paper proposes a method to recover recoverability structure from failed traces of post-trained language models, enabling test-time routing and post-training analysis.
The paper introduces CosmicFish-HRM, a compact language model that achieves adaptive reasoning by dynamically allocating computational effort through a Hierarchical Reasoning Module (HRM), showing tha…
Guoxin Ma, Yibing Liu, Chengzhengxu Li, Yu Liang +6 more
The paper introduces Thinking as Compression (TaC), a novel paradigm showing that the inherent reasoning process of a large language model can naturally compress long context inputs, outperforming ded…
This paper systematically studies how soft errors propagate during Large Language Model (LLM) inference using a novel fault-injection framework, providing critical insights and mitigation strategies f…