~ similar to 2606.02488· 20 results
The paper proposes GroundedCache, an evidence-validated cache router that significantly improves the safety of reusing cached semantic answers in RAG systems by requiring multiple gates to validate th…
This paper proposes a multi-turn retrieval-augmented generation pipeline for conversational systems across four domains.
Zhen Chen, Yibing Liu, Weihao Xie, Yu Liang +2 more
The paper proposes formulating RAG design as an architecture search problem and introduces RAISE, a comprehensive framework and benchmark for systematically optimizing RAG hyperparameters.
Joongmin Shin, Gyuho Shim, Jeongbae Park, Jaehyung Seo +1 more
HiKEY proposes a hierarchical, tree-based multimodal retrieval framework that significantly improves open-domain document question answering by addressing document routing and evidence fragmentation.
The paper proposes InSemRAG, an enhanced RAG framework that improves retrieval accuracy and knowledge integrity by incorporating intent-aware retrieval and semantics-preserving chunking, achieving sta…
Haochun Tang, Yuliang Yan, Jiahua Lu, Huaxiao Liu +1 more
The paper introduces R$^2$A, an adversarial attack that uses suffix optimization to mislead black-box LLM routers into consistently selecting expensive, high-capability models.
The paper proposes DART, a test-time adaptation method that enhances zero-resource dense retrieval reranking by adaptively tuning a bilinear scoring matrix using pseudo-positive and pseudo-negative ex…
Zhenghua Bao, Fengya Tian, Chris Zhang, Zhenjun Chen +2 more
OrcaRouter is a production-ready LLM router that uses a hybrid offline-online learning approach to efficiently select the best large language model for an incoming query, achieving high accuracy at lo…
The paper introduces 'Routing Hijacking,' a severe attack where malicious clients forge semantic profiles in Federated RAG systems to misroute target queries, and proposes a trust-aware post-routing f…
Chengcai Gao, Zhihong Sun, Xiaochuan Shi, Qiufeng Wang +1 more
The paper proposes BiRD, a bidirectional ranking defense mechanism that enhances the robustness of Retrieval-Augmented Generation (RAG) against adversarial attacks by analyzing the alignment between f…
The paper proposes the Sentinel-Strategist architecture, an adaptive defense mechanism that selectively deploys security measures in Retrieval-Augmented Generation (RAG) systems to significantly reduc…
Ziyu Song, Jiaming Fang, Kuangyu Li, Tuo Xia +1 more
This paper proposes Tail-Aware Adaptive-k (TAA-k), a training-free framework for adaptive context selection in retrieval-augmented generation systems using Extreme Value Theory.
肖代替了视觉令牌的永久删除,通过可恢复的路由来改进视觉语言模型的性能
SilentRetrieval introduces a sophisticated, two-stage data poisoning attack that successfully hijacks Retrieval-Augmented Generation (RAG) systems by injecting adversarially crafted, yet highly fluent…
The paper introduces Entity-Collision, a rigorous protocol that separates genuine retrieval lift from simple lexical overlap, demonstrating that embedder performance depends critically on the query ty…
The paper proposes a rigorous, fixed-budget, cluster-aware standard for LLM-as-a-judge evaluation of multi-hop RAG systems, demonstrating that current evaluation methods often overstate performance.
Daize Dong, Junlin Chen, Haolong Jia, Jiawei Wu +8 more
The paper proposes Predictive Routing Replay (PR2) to stabilize reinforcement learning on Mixture of Experts (MoE) LLMs by predicting and incorporating short-horizon router evolution during training a…
Gangmuk Lim, Wanyu Zhao, Brighten Godfrey, Jiaxin Shan +2 more
Lodestar is a novel online learning-based request routing system that significantly improves LLM inference efficiency by dynamically assigning incoming requests to the optimal GPU instance to minimize…
Hyesung Ji, Hyunah Yu, Jongmin Kim, Wonseok Choi +2 more
GPIR is a GPU-accelerated Private Information Retrieval (PIR) system that significantly boosts throughput by introducing a stage-aware hybrid execution model and optimizing data layouts for modern GPU…
Shenghao Ye, Yu Guo, Zhengheng Li, Shuangwu Chen +1 more
The paper proposes RoRo, a rubric-guided process reward framework that improves stepwise model routing by evaluating the quality of intermediate reasoning steps, leading to better performance and cost…