Papers similar to 2606.01697

Similar to 2606.01697 · 19 results

cs.CL · cs.IR · Empirical · Recent · Jun 10, 2026

uva-irlab-conv at SemEval-2026 Task 8: Multi-Turn RAG with Learned Sparse Retrieval and Listwise Reranking

Simon Lupart, Kidist Amde Mekonnen, Zahra Abbasiantaeb, Mohammad Aliannejadi

This paper proposes a multi-turn retrieval-augmented generation pipeline for conversational systems across four domains.

cs.CL · Recent · May 31, 2026

Efficient RAG with Intent-Aware Retrieval and Semantics-Preserving Chunking

Fachrina Dewi Puspitasari, Chaoning Zhang, Jiaquan Zhang, Zhicheng Wang +5 more

The paper proposes InSemRAG, an enhanced RAG framework that improves retrieval accuracy and knowledge integrity by incorporating intent-aware retrieval and semantics-preserving chunking, achieving sta…

cs.AI · cs.IR · Recent · May 28, 2026

Xetrieval: Mechanistically Explaining Dense Retrieval

Zhixin Cai, Jun Bai, Yang Liu, Jiaqi Li +6 more

Xetrieval introduces an embedding-level framework to mechanistically explain dense retrieval decisions by decomposing high-dimensional embeddings into sparse, human-interpretable features.

cs.CL · cs.AI · cs.IR · Recent · May 28, 2026

GrepSeek: Training Search Agents for Direct Corpus Interaction

Alireza Salemi, Chang Zeng, Atharva Nijasure, Jui-Hui Chung +3 more

GrepSeek introduces a novel direct corpus interaction (DCI) search agent that trains an LLM to find and compose evidence from large text corpora by issuing executable shell commands, achieving state-o…

cs.IR · cs.AI · cs.CL · Recent · May 29, 2026

On the impact of retrieved content representations in RAG Pipelines

Jonathan J Ross, Bevan Koopman, Anton van der Vegt, Guido Zuccon

The paper systematically compares multiple content representations for RAG pipelines and finds that answer retention—the ability of the representation to preserve the original answer-bearing content—i…

cs.IR · cs.AI · Recent · May 30, 2026

SkillPager: Query-Adaptive Intra-Skill Navigation via Semantic Node Retrieval

Zicai Cui, Zihan Guo, Weiwen Liu, Weinan Zhang

SkillPager is a novel two-stage framework that efficiently selects minimal, execution-sufficient context from large procedural skill documents by leveraging typed semantic nodes, significantly reducin…

cs.IR · cs.AI · Recent · May 30, 2026

Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

Md Zarif Ul Alam, Alireza Salemi, Hamed Zamani

Critic-R introduces a novel framework that uses a critic model to provide natural language introspective feedback, significantly improving the performance of agentic search systems by optimizing retri…

cs.IR · Empirical · Recent · Jun 10, 2026

CompRank: Efficient LLM Reranking via Token-Level Compression and Decoding-Free Scoring

Xuan Lu, Haohang Huang, Yingqi Fan, Junlong Tong +4 more

This paper proposes CompRank, a token-efficient reranking framework for large language models that reduces redundant computation and achieves strong reranking performance.

cs.AI · Recent · May 27, 2026

Plan Before Search: Search Agents Need Plan

Zhipeng Qian, Zihan Liang, Yufei Ma, Ben Chen +6 more

The paper introduces Plan, a structured agentic behavior that decomposes multi-hop questions into ordered sub-questions before retrieval, and proposes a self-bootstrapping paradigm to train it without…

cs.AI · cs.IR · cs.LG · Recent · May 28, 2026

CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval

Vaishali Senthil, Ashutosh Hathidara, Sebastian Schreiber

CoHyDE introduces an iterative co-training framework that jointly optimizes an LLM rewriter and a dense encoder, significantly improving tool retrieval accuracy for LLM agents, especially on vague que…

cs.CL · cs.IR · Recent · May 29, 2026

Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory

Han Zhang, Zihao Tang, Xin Yu, Xiao Liu +7 more

The paper introduces RHELM, a new benchmark designed to test LLMs' long-term memory by simulating realistic, complex, and evolving dialogues that integrate multiple heterogeneous data sources.

cs.CL · cs.AI · Empirical · Recent · Jun 11, 2026

Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

Zilin Xiao, Qi Ma, Chun-cheng Jason Chen, Xintao Chen +3 more

This paper proposes a post-training framework called Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT) to teach language models to reason by analogy.

cs.CL · Recent · May 31, 2026

ExpWeaver: LLM Agents Learn from Experience via Latent RAG

Tao Feng, Tianyang Luo, Jingjun Xu, Zhigang Hua +4 more

ExpWeaver introduces a novel framework for LLM agents to learn from past experiences using latent retrieval-augmented generation, achieving state-of-the-art performance while significantly improving t…

cs.CL · Recent · May 30, 2026

Chunking Methods on Retrieval-Augmented Generation - Effectiveness Evaluation Against Computational Cost and Limitations

Mateusz Śmigielski, Michał Rajkowski, Mateusz Zbrocki, Michał Bernacki-Janson +4 more

This study systematically evaluates a wide range of chunking methods for Retrieval-Augmented Generation (RAG) to assess their effectiveness and highlight the overlooked challenges associated with chun…

cs.IR · cs.AI · Recent · May 29, 2026

MIMO: Multilingual Information Retrieval via Monolingual Objectives

Youngjoon Jang, Seongtae Hong, Heuiseok Lim

The paper proposes MIMO, a two-stage framework that improves Multilingual Information Retrieval (MLIR) by stabilizing cross-lingual alignment and enhancing retrieval discrimination using a combination…

cs.IR · cs.AI · cs.LG · Recent · May 31, 2026

Test-Time Training for Zero-Resource Dense Retrieval Reranking

Shiyan Liu, Yichen Li

The paper proposes DART, a test-time adaptation method that enhances zero-resource dense retrieval reranking by adaptively tuning a bilinear scoring matrix using pseudo-positive and pseudo-negative ex…

cs.CR · cs.CL · cs.IR · Recent · May 27, 2026

SilentRetrieval: Hijacking Retrieval-Augmented Generation via Semantically-Preserving Adversarial Data Poisoning

Jiachen Qian

SilentRetrieval introduces a sophisticated, two-stage data poisoning attack that successfully hijacks Retrieval-Augmented Generation (RAG) systems by injecting adversarially crafted, yet highly fluent…

cs.IR · Empirical · Recent · Jun 10, 2026

Tail-Aware Adaptive-k: Query-Adaptive Context Selection for Retrieval-Augmented Generation

Ziyu Song, Jiaming Fang, Kuangyu Li, Tuo Xia +1 more

This paper proposes Tail-Aware Adaptive-k (TAA-k), a training-free framework for adaptive context selection in retrieval-augmented generation systems using Extreme Value Theory.

cs.CL · cs.AI · Recent · May 31, 2026

CA-BED: Conversation-Aware Bayesian Experimental Design

Daniel Arnould, Rashad Aziz, Zixuan Kang, Tanav Changal +4 more

CA-BED is a novel framework that improves LLM performance in interactive question-answering by integrating Bayesian Experimental Design to strategically select questions that maximize information gain…
