ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2605.30219· 20 results

cs.AIRecentMay 28, 2026

Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

Ziyan Liu, Zhezheng Hao, Yeqiu Chen, Hong Wang +6 more

The paper introduces Metacognitive Memory Policy Optimization (MMPO), a novel memory training approach that optimizes LLM memory not based on final task success, but on minimizing epistemic uncertaint…

View →
cs.CLcs.AIRecentJun 2, 2026

Quantifying Faithful Confidence Expression in Large Reasoning Models

Areeb Gani, Asal Meskin, Gabrielle Kaili-May Liu, Arman Cohan

The paper introduces a novel framework to quantify faithful confidence expression (FC) in Large Reasoning Models (LRMs), finding that FC remains a significant and challenging reliability target for th…

View →
cs.CLcs.LGRecentJun 1, 2026

Unveiling the Entropy Dynamics of Chain-of-Thought Reasoning

Ting Xu, Xu He, Yupu Lu, Jiankai Sun +3 more

The paper analyzes the entropy dynamics of Chain-of-Thought (CoT) reasoning, identifying a transition from an exploratory Uncertainty Region to a stable Confidence Region, which enables superior early…

View →
cs.CLcs.AIRecentMay 28, 2026

Same Evidence, Different Answers: Canonical-Context On-Policy Distillation for Multi-Turn Language Models

Zizhuo Lin, Quanling Liu, Jinsheng Quan, Chao Zhang +5 more

The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of wh…

View →
cs.AIcs.CLcs.LGRecentMay 29, 2026

The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary

Dongxin Guo, Jikun Wu, Siu Ming Yiu

The paper demonstrates that extended pure neural reasoning fails on complex, deterministic state-tracking tasks beyond a certain 'Deterministic Horizon,' necessitating the integration of external tool…

View →
cs.AIRecentMay 31, 2026

Diagnosing LLM Arbitration Behavior over Pre-evidence Epistemic States in RAG-based Fact-Checking

Yuxi Sun, Wenbo Shang, Wei Gao, Xin Huang +1 more

The paper introduces a diagnostic testbed, PAVE, to evaluate how LLMs arbitrate between their internal knowledge and retrieved evidence during fact-checking, revealing that this arbitration is unrelia…

View →
cs.AIRecentMay 27, 2026

Confidence-Orchestrated Self-Evolution against Uncertain LLM Feedback

Bowen Wei, Nan Wang, Yuqing Zhou, Jinhao Pan +1 more

The paper proposes COSE, a method that uses an LLM's intrinsic confidence as an uncertainty signal to improve self-evolutionary training, achieving state-of-the-art performance on general reasoning an…

View →
cs.AIRecentMay 29, 2026

LinTree: Improving LLM Reasoning with Explicitly Structured Search Histories

Liwei Kang, Yee Whye Teh, Wee Sun Lee

The paper introduces LinTree, a method that explicitly structures the search history of LLM reasoning traces using parent pointers, significantly improving task performance and search efficiency compa…

View →
cs.AIcs.LGRecentMay 29, 2026

Diagnosing Failure Modes of Shared-State Collaboration in Resource-Constrained Visual Agents

Yunpeng Zhou

This paper analyzes failure modes in collaborative visual reasoning systems, demonstrating that naive shared workspaces can amplify hallucinations and proposing diagnostics for improving communication…

View →
cs.CLcs.AIRecentJun 1, 2026

A Primer in Post-Training Reasoning Data: What We Know About How It Works

Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun +2 more

This paper synthesizes over 150 scattered studies and reports to provide the first comprehensive primer on post-training reasoning data, organizing the field around data objects, utility, construction…

View →
cs.AIRecentMay 27, 2026

DREAM-R: Multimodal Speculative Reasoning with RL-Based Refined Drafting, Precise Verification, and Fully Parallel Execution

Yunhai Hu, Zining Liu, Xiangyang Yin, Tianhua Xia +4 more

DREAM-R is a novel framework that significantly enhances speculative reasoning in large multimodal models by optimizing draft generation alignment, introducing a robust verification mechanism, and ena…

View →
cs.AIcs.MARecentMay 29, 2026

MindZero: Learning Online Mental Reasoning With Zero Annotations

Shunchi Zhang, Jin Lu, Chuanyang Jin, Yichao Zhou +2 more

MindZero introduces a self-supervised reinforcement learning framework that trains multimodal large language models (MLLMs) for efficient and robust online mental reasoning without requiring explicit…

View →
cs.CLcs.AIRecentMay 28, 2026

Do Language Models Track Entities Across State Changes?

Zilu Tang, Qiao Zhao, Gabriel Franco, Derry Wijaya +3 more

The paper investigates how language models perform entity tracking across state changes and finds that LMs use a non-incremental, parallel aggregation strategy rather than maintaining a true internal…

View →
cs.AIRecentMay 27, 2026

DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes

Caijun Xu, Changyi Xiao, Zhongyuan Peng, Yixin Cao

DenoiseRL is a novel reinforcement learning framework that improves reasoning in large language models by optimizing directly from the failures and incorrect reasoning traces of weak models, eliminati…

View →
cs.LGcs.AIRecentJun 1, 2026

Policy and World Modeling Co-Training for Language Agents

Ning Lu, Baijiong Lin, Shengcai Liu, Jiahao Wu +8 more

The paper proposes PaW, a co-training framework that uses standard RL rollouts to provide auxiliary world model supervision directly during policy training, significantly improving language agent perf…

View →
cs.CRRecentMay 2, 2026

Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly

Igor Santos-Grueiro

The paper identifies and measures a critical failure mode where LLM agents violate policies by losing or corrupting directive-bearing state during the process of assembling the decision context, and p…

View →
cs.CLcs.AIcs.LGRecentMay 29, 2026

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li

LongTraceRL addresses long-context reasoning challenges by generating highly challenging training data and introducing a fine-grained rubric reward, significantly improving evidence-grounded reasoning…

View →
cs.AIcs.LGRecentJun 1, 2026

Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery

Ekaterina Alimaskina, Darya Rudas, Denis Shveykin, Gleb Molodtsov +2 more

The paper analyzes the failure modes of aggressive 2-bit quantization in large reasoning models, proposing lightweight controls like FP16 planning and loop rescue to restore accuracy and achieve pract…

View →
cs.AIRecentJun 1, 2026

An Abstract Worlds Semantic Framework for Belief Change Operators

Daniel Grimaldi, M. Vanina Martinez, Ricardo O. Rodriguez

The paper introduces Abstract Worlds Semantics (AWS), a set-theoretic framework that treats worlds as primitive elements to provide a unified and generalized analysis of various belief change models.

View →
cs.CLRecentMay 31, 2026

On the Generalization Gap in Self-Evolving Language Model Reasoning

Zhenting Qi, Susanna Maria Baby, Stefanie Anna Baby, Kan Yuan +4 more

The paper investigates the limits of self-evolution in LLM reasoning under closed-loop settings, finding that while self-improvement is significant, it consistently falls short of perfect oracle super…

View →