ArXivCSExplorer
Built by Teycir Ben Soltane

Similar to 2606.01080 · 19 results

cs.AI · Recent · May 27, 2026

HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs

Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang +1 more

The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.

cs.AI · cs.LG · Recent · Jun 1, 2026

Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery

Ekaterina Alimaskina, Darya Rudas, Denis Shveykin, Gleb Molodtsov +2 more

The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…

cs.CL · Recent · May 29, 2026

AdaptR1: Reinforcement Learning Based Adaptive Interleaved Thinking in Multi-hop Question Answering

Yuxin Wang, Jiahao Lu, Qifeng Wu, Shicheng Fang +4 more

AdaptR1 is a novel Reinforcement Learning framework that adaptively manages reasoning effort at every step of multi-hop Question Answering, significantly reducing unnecessary computational cost withou…

cs.AI · cs.LG · Recent · May 27, 2026

Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training

Kohsei Matsutani, Gouki Minegishi, Takeshi Kojima, Yusuke Iwasawa +1 more

This paper investigates how different types of compressed reasoning data (Explicit, Composed, Implicit CoT) affect LLM performance during post-training, finding that the choice of compression and subs…

cs.AI · Recent · May 27, 2026

Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor

Guoxin Ma, Yibing Liu, Chengzhengxu Li, Yu Liang +6 more

The paper introduces Thinking as Compression (TaC), a novel paradigm showing that the inherent reasoning process of a large language model can naturally compress long context inputs, outperforming ded…

cs.CL · cs.AI · Recent · May 28, 2026

Unlocking the Working Memory of Large Language Models for Latent Reasoning

Lukas Aichberger, Sepp Hochreiter

The paper introduces Reasoning in Memory (RiM), a latent reasoning method that replaces autoregressive token generation with fixed memory blocks to enable compute-efficient internal working memory for…

cs.CL · Recent · May 29, 2026

Unlocking Fine-Grained Translation Quality Estimation in LRMs through Synergistically Evolving Implicit and Explicit Reasoning

Renfei Dang, Xinye Wang, Zhejian Lai, Weilu Xu +4 more

The paper proposes RIEQE, a two-stage training framework that synergistically co-evolves implicit and explicit reasoning capabilities in Large Reasoning Models (LRMs) to significantly improve fine-gra…

cs.CL · cs.LG · Empirical · Recent · Jun 4, 2026

Latent Reasoning with Normalizing Flows

Guancheng Tu, Xiangjun Fu, Suhao Yu, Yao Tang +4 more

This paper proposes NF-CoT, a latent reasoning framework that preserves the advantages of chain-of-thought in large language models.

cs.AI · cs.CL · Recent · May 28, 2026

ReasonOps: Operator Segmentation for LLM Reasoning Traces

Daniel Lee, Owen Queen, James Zou

ReasonOps introduces an unsupervised method to segment and analyze the common, compositional structure of LLM reasoning traces, discovering universal reasoning operators that predict model identity an…

cs.AI · Recent · May 27, 2026

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Linas Nasvytis, Simon Jerome Han, Ben Prystawski, Satchel Grant +2 more

The paper introduces Contrastive Reflection (CORE), a novel non-parametric method that rapidly improves language model reasoning by distilling contrasts between successful and unsuccessful problem att…

cs.AI · cs.CL · cs.LG · Recent · May 28, 2026

DenseSteer: Steering Small Language Models towards Dense Math Reasoning

Yang Ouyang, Shuhang Lin, Jung-Eun Kim

DenseSteer is a training-free inference-time framework that improves the math reasoning capabilities of small language models by steering their internal representations toward a 'Dense Reasoning' patt…

cs.AI · Recent · May 27, 2026

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

Yubo Li, Ramayya Krishnan, Rema Padman

The paper identifies a failure mode called unfaithful capitulation (UC), where reasoning models maintain a correct internal thought process (chain-of-thought) but output an incorrect final answer when…

cs.CL · Recent · May 30, 2026

OCC-RAG: Optimal Cognitive Core for Faithful Question Answering

Maksim Savkin, Mikhail Goncharov, Alexander Gambashidze, Alla Chepurova +6 more

The paper introduces OCC-RAG, a family of compact, task-specialized Small Language Models (SLMs) designed to achieve highly faithful, multi-hop question answering grounded strictly in provided context…

cs.CL · cs.AI · Recent · May 28, 2026

Same Evidence, Different Answers: Canonical-Context On-Policy Distillation for Multi-Turn Language Models

Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang +5 more

The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of wh…

cs.AI · Recent · Jun 1, 2026

eMoT: evolving Memory-of-Thought via Symbolic Anchoring and Memory Corrosion

Xiang Li, Jiwei Wei, Ke Liu, Yitong Qin +4 more

The eMoT framework enhances multi-step reasoning in LLMs by treating reasoning as an evolving memory, stabilizing performance through symbolic computation and structured refinement.

cs.CL · cs.AI · Recent · May 29, 2026

Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance

Yuxuan Jiang, Francis Ferraro

The paper introduces Trajectory-aware OPD (TOPD), a method that uses near-future trajectory information to improve On-Policy Distillation by accurately identifying and guiding true reasoning divergenc…

cs.AI · Recent · May 28, 2026

Tailoring the Curriculum: Student-Centered Reasoning Distillation via Dynamic Data-Model Compatibility

Jiahao Huang, Fei Cheng, Junfeng Jiang, Akiko Aizawa

This paper introduces the Data-Model Compatibility (DMC) metric to quantify how suitable a dataset is for reasoning distillation, showing that optimizing data selection using DMC significantly improve…
