ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2606.13680· 20 results

cs.AIRecentMay 27, 2026

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Linas Nasvytis, Simon Jerome Han, Ben Prystawski, Satchel Grant +2 more

The paper introduces Contrastive Reflection (CORE), a novel non-parametric method that rapidly improves language model reasoning by distilling contrasts between successful and unsuccessful problem att…

View →
cs.CLcs.AIcs.LGRecentMay 29, 2026

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li

LongTraceRL addresses long-context reasoning challenges by generating highly challenging training data and introducing a fine-grained rubric reward, significantly improving evidence-grounded reasoning…

View →
cs.AIRecentMay 27, 2026

Plan Before Search: Search Agents Need Plan

Zhipeng Qian, Zihan Liang, Yufei Ma, Ben Chen +6 more

The paper introduces Plan, a structured agentic behavior that decomposes multi-hop questions into ordered sub-questions before retrieval, and proposes a self-bootstrapping paradigm to train it without…

View →
cs.IRcs.AIRecentMay 30, 2026

Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

Md Zarif Ul Alam, Alireza Salemi, Hamed Zamani

Critic-R introduces a novel framework that uses a critic model to provide natural language introspective feedback, significantly improving the performance of agentic search systems by optimizing retri…

View →
cs.CLRecentJun 1, 2026

When Knowledge Is Not Free: Cost-Aware Evidence Selection in Retrieval-Augmented Generation

Mingyan Wu, Han Yang, Omer Ben-Porat, Yftah Ziser

This paper introduces cost-aware Retrieval-Augmented Generation (RAG), demonstrating that fixed evidence selection is brittle and that adaptive, agentic controllers are necessary for effective knowled…

View →
cs.CLcs.AIRecentJun 2, 2026

QUBRIC: Co-Designing Queries and Rubrics for RL Beyond Verifiable Rewards

Rongzhi Zhang, Rui Feng, Zhihan Zhang, Jingfeng Yang +7 more

QUBRIC introduces a co-design framework that simultaneously optimizes queries and rubrics, overcoming the bottleneck of vague rubrics derived from open-ended questions, leading to significant gains in…

View →
cs.CLRecentMay 31, 2026

Beyond Topical Similarity: Contrastive Evidence Retrieval with Interpretable Attention Alignment in RAG

Francielle Vargas, João Robiatti, Diego Alves, Lucas Pascotti Valem +5 more

The paper introduces CERA, a novel contrastive retrieval framework that improves RAG factuality and interpretability by using subjectivity-based hard negative selection and an auxiliary attention alig…

View →
cs.CLcs.AIRecentMay 28, 2026

CRITIC-R1: Learning Structured Critics for Retrieval-Augmented Generation

Wenhan Xiao, Ziwei Zhang, Chuanyue Yu, Xingcheng Fu +3 more

CRITIC-R1 introduces a structured critic framework that treats RAG critique as an explicit error diagnosis problem using reinforcement learning, significantly improving answer quality over strong RAG…

View →
cs.CLRecentMay 29, 2026

MoG: Mixture of Experts for Graph-based Retrieval-Augmented Generation

Zheng Yuan, Chuang Zhou, Linhao Luo, Siyu An +3 more

MoG proposes a novel Mixture of Experts framework for graph-based RAG, which uses hub graphs to guide the sparse activation of domain-specific expert graphs, significantly improving retrieval accuracy…

View →
cs.AIcs.LGRecentMay 27, 2026

Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training

Kohsei Matsutani, Gouki Minegishi, Takeshi Kojima, Yusuke Iwasawa +1 more

This paper investigates how different types of compressed reasoning data (Explicit, Composed, Implicit CoT) affect LLM performance during post-training, finding that the choice of compression and subs…

View →
cs.CLRecentMay 29, 2026

Unlocking Fine-Grained Translation Quality Estimation in LRMs through Synergistically Evolving Implicit and Explicit Reasoning

Renfei Dang, Xinye Wang, Zhejian Lai, Weilu Xu +4 more

The paper proposes RIEQE, a two-stage training framework that synergistically co-evolves implicit and explicit reasoning capabilities in Large Reasoning Models (LRMs) to significantly improve fine-gra…

View →
cs.CLcs.AIRecentJun 1, 2026

A Primer in Post-Training Reasoning Data: What We Know About How It Works

Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun +2 more

This paper synthesizes over 150 scattered studies and reports to provide the first comprehensive primer on post-training reasoning data, organizing the field around data objects, utility, construction…

View →
cs.CLRecentMay 29, 2026

AdaptR1: Reinforcement Learning Based Adaptive Interleaved Thinking in Multi-hop Question Answering

Yuxin Wang, Jiahao Lu, Qifeng Wu, Shicheng Fang +4 more

AdaptR1 is a novel Reinforcement Learning framework that adaptively manages reasoning effort at every step of multi-hop Question Answering, significantly reducing unnecessary computational cost withou…

View →
cs.AIRecentMay 27, 2026

HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs

Yansong Ning, Mianpeng Liu, Jingwen Ye, Weidong Zhang +1 more

The paper introduces HRBench, a unified and comprehensive evaluation framework for systematically benchmarking and comparing various thinking-mode switching strategies in hybrid-reasoning LLMs.

View →
cs.CLRecentMay 30, 2026

Learning to Retrieve: Dual-Level Long-Term Memory for Text-to-SQL Agents

Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao +1 more

The paper proposes MERIT, a dual-level, multi-horizon memory retrieval framework that significantly improves the performance of interactive text-to-SQL agents by providing both global and local memory…

View →
cs.AIRecentMay 30, 2026

Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs

Jiakang Li, Guanyu Zhu, Can Jin, Chenxi Huang +7 more

The paper introduces Latent Reward Steering (LRS), an adaptive inference-time framework that implicitly improves the reasoning ability of LLMs by guiding the model's internal latent states based on a…

View →
cs.CLcs.AIRecentJun 1, 2026

Learning When to Translate for Multilingual Reasoning

Deokhyung Kang, Hyounghun Kim, Gary Geunbae Lee

The paper proposes Luar, a framework that trains reasoning language models to selectively use English translation only when their direct understanding of a non-English input is unreliable, significant…

View →
cs.AIRecentMay 27, 2026

DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes

Caijun Xu, Changyi Xiao, Zhongyuan Peng, Yixin Cao

DenoiseRL is a novel reinforcement learning framework that improves reasoning in large language models by optimizing directly from the failures and incorrect reasoning traces of weak models, eliminati…

View →
cs.CLRecentMay 31, 2026

Efficient RAG with Intent-Aware Retrieval and Semantics-Preserving Chunking

Fachrina Dewi Puspitasari, Chaoning Zhang, Jiaquan Zhang, Zhicheng Wang +5 more

The paper proposes InSemRAG, an enhanced RAG framework that improves retrieval accuracy and knowledge integrity by incorporating intent-aware retrieval and semantics-preserving chunking, achieving sta…

View →
cs.CLRecentMay 31, 2026

Deep Research as Rubric for Reinforcement Learning

Wangyi Mei, Zhouhong Gu, Zhenhan Bai, Yin Cai +8 more

The paper proposes Deep Research as Rubric (DR-rubric), a novel evidence-driven framework that treats rubric construction itself as a research problem to generate fine-grained, scalable reward signals…

View →