~ similar to 2606.02054· 20 results
The paper challenges the conclusion that LLMs lack reasoning by demonstrating that reported performance drops on GSM-Symbolic are often statistically weak and partially attributable to dataset biases,…
The paper introduces Reasoning in Memory (RiM), a latent reasoning method that replaces autoregressive token generation with fixed memory blocks to enable compute-efficient internal working memory for…
The paper introduces MemCog, a Memory-as-Cognition system that integrates memory access directly into the reasoning process, significantly improving agent performance, especially in proactive memory r…
Hyeonjeong Ha, Jeonghwan Kim, Cheng Qian, Jiayu Liu +6 more
MemGuard introduces a type-aware memory framework to prevent heterogeneous memory contamination in long-term memory-augmented LLMs, significantly improving memory reliability and efficiency.
This paper investigates how different types of compressed reasoning data (Explicit, Composed, Implicit CoT) affect LLM performance during post-training, finding that the choice of compression and subs…
Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang +1 more
The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.
This paper localizes the attention heads within LLMs responsible for specific reasoning steps, finding that specialized heads handle factual retrieval while higher layers manage global information int…
ThinkSwitch introduces a low-compute co-training procedure that distills the reasoning benefit of large language models into weights, significantly improving performance on specific reasoning tasks.
The paper introduces CosmicFish-HRM, a compact language model that achieves adaptive reasoning by dynamically allocating computational effort through a Hierarchical Reasoning Module (HRM), showing tha…
Yunhai Hu, Zining Liu, Xiangyang Yin, Tianhua Xia +4 more
DREAM-R is a novel framework that significantly enhances speculative reasoning in large multimodal models by optimizing draft generation alignment, introducing a robust verification mechanism, and ena…
The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…
This paper unifies the fragmented field of Tree-of-Thoughts (ToT) reasoning by mapping LLM-based search processes onto a formal taxonomy derived from classical heuristic search theory.
Shuqiang Wang, Wei Cao, Jiaqi Weng, Jialing Tao +3 more
The paper proposes a black-box attack using a hierarchical genetic algorithm to induce 'overthinking' in Large Reasoning Models, demonstrating that this vulnerability can cause significant resource ex…
Zhikai Pan, Chih-Ting Liao, Chunrui Liu, Xi Xiao +4 more
The paper introduces a multilingual benchmark (MentalMap) to test if LLMs build internal spatial world models from text, finding a universal 'L3 reasoning cliff' suggesting that text-only working memo…
Kevin Wang, Anna Thöni, Benjamin Kempinski, Bobby Cheng +49 more
The paper introduces Mindgames, a comprehensive multi-game arena for evaluating LLM agents' sustained social and strategic reasoning, demonstrating that current evaluations are limited by structural s…
The paper introduces Contrastive Reflection (CORE), a novel non-parametric method that rapidly improves language model reasoning by distilling contrasts between successful and unsuccessful problem att…
Jingjie Lin, Bingbing Wang, Zihan Wang, Zhengda Jin +3 more
The paper introduces RefMem-Bench, a new benchmark for measuring reflective memory in long-horizon dialogue, and proposes REMIND, a framework that significantly improves models' ability to synthesize…
This paper investigates the production-evaluation gap in Large Reasoning Models (LRMs), finding that while LRMs excel at generating solutions, they struggle significantly to evaluate flawed reasoning,…
Yubo Gao, Haotian Wu, Hong Chen, Junquan Huang +7 more
The paper introduces Hierarchical Adaptive Budgeter (HAB), a framework that improves LLM reasoning efficiency by adaptively allocating computational resources to match the intrinsic complexity of both…