Similar to 2606.03151 · 19 results
Lixuan Guo, Yifei Wang, Tiansheng Wen, Aosong Feng +2 more
The paper introduces Single-stage Sparse Retrieval (SSR), a method that replaces computationally expensive vector clustering with sparse autoencoding to achieve highly efficient multi-vector retrieval…
Zhen Chen, Yibing Liu, Weihao Xie, Yu Liang +2 more
The paper proposes formulating RAG design as an architecture search problem and introduces RAISE, a comprehensive framework and benchmark for systematically optimizing RAG hyperparameters.
Onyx is a cost-efficient, disk-oblivious Approximate Nearest Neighbor (ANN) search system that significantly reduces both cost and latency compared to state-of-the-art methods.
The paper proposes a Privacy-Preserving Product-Quantization Approximate Nearest Neighbor (PPPQ-ANN) framework that achieves practical performance and strong privacy guarantees for large-scale nearest…
This paper empirically evaluates the performance of the Polars DataFrame engine running within Intel SGX2 enclaves, finding that while the overall security overhead is manageable, the performance is s…
Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao +1 more
The paper proposes MERIT, a dual-level, multi-horizon memory retrieval framework that significantly improves the performance of interactive text-to-SQL agents by providing both global and local memory…
Hyesung Ji, Hyunah Yu, Jongmin Kim, Wonseok Choi +2 more
GPIR is a GPU-accelerated Private Information Retrieval (PIR) system that significantly boosts throughput by introducing a stage-aware hybrid execution model and optimizing data layouts for modern GPU…
The elasticAI.explorer is an extensible, unified Python framework that simplifies hardware-aware Neural Architecture Search (NAS) by decoupling search space definition from model implementation and de…
The paper proposes GroundedCache, an evidence-validated cache router that significantly improves the safety of reusing cached semantic answers in RAG systems by requiring multiple gates to validate th…
The paper introduces Entity-Collision, a rigorous protocol that separates genuine retrieval lift from simple lexical overlap, demonstrating that embedder performance depends critically on the query ty…
The paper proposes SubFit, a novel compression technique that achieves superior LLM compression by replacing non-contiguous, submodule-level components (Attention and FeedForward) with lightweight res…
The paper proposes Dynamic Adapter Routing (DAR), a novel method that significantly improves continual multimodal retrieval by adaptively selecting and merging specialized adapters.
The paper proposes moving the query instead of the KV-cache during cross-instance attention, demonstrating that this approach is significantly cheaper than moving the cache, especially on modern GPU f…
The paper proposes DINOSAUR, a framework that incorporates embedding uncertainty into Approximate Nearest Neighbour search to improve retrieval for niche, long-tail content.
The paper introduces Rotary GPU, an exploratory execution approach demonstrating that large Mixture-of-Experts models can be run locally on consumer GPUs with limited VRAM, achieving usable decode thr…
Hanxi Li, Jianan Zhou, Jiale Lao, Yibo Wang +4 more
The paper introduces the Black-Hole Attack, a poisoning vulnerability that exploits geometric defects in high-dimensional embedding spaces to force malicious vectors into the top-k results of vector d…
The paper proposes DART, a test-time adaptation method that enhances zero-resource dense retrieval reranking by adaptively tuning a bilinear scoring matrix using pseudo-positive and pseudo-negative ex…
RASER introduces a family of cheap, router-based systems that selectively decide whether to perform expensive multi-hop retrieval, significantly reducing LLM token costs while maintaining state-of-the…
Yu Liu, Kun Peng, Wenxiao Zhang, Fangfang Yuan +3 more
Trans-RAG introduces a novel query-centric vector transformation technique to enable secure, efficient, and accurate cross-organizational retrieval in RAG systems without plaintext decryption.