Papers similar to 2606.03151

~ similar to 2606.03151· 19 results

cs.IRcs.AIcs.LGRecentMay 28, 2026

No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

Lixuan Guo, Yifei Wang, Tiansheng Wen, Aosong Feng +2 more

The paper introduces Single-stage Sparse Retrieval (SSR), a method that replaces computationally expensive vector clustering with sparse autoencoding to achieve highly efficient multi-vector retrieval…

View →

cs.AIRecentMay 28, 2026

RAISE: RAG Design as an Architecture Search Problem

Zhen Chen, Yibing Liu, Weihao Xie, Yu Liang +2 more

The paper proposes formulating RAG design as an architecture search problem and introduces RAISE, a comprehensive framework and benchmark for systematically optimizing RAG hyperparameters.

View →

cs.CRcs.AIRecentApr 22, 2026

Onyx: Cost-Efficient Disk-Oblivious ANN Search

Deevashwer Rathee, Jean-Luc Watson, Zirui Neil Zhao, G. Edward Suh +1 more

Onyx proposes a novel, cost-efficient disk-oblivious Approximate Nearest Neighbor (ANN) search system that significantly reduces both cost and latency compared to state-of-the-art methods.

View →

cs.CRRecentApr 20, 2026

Privacy-Preserving Product-Quantized Approximate Nearest Neighbor Search Framework for Large-scale Datasets via A Hybrid of Fully Homomorphic Encryption and Trusted Execution Environment

Shozo Saeki, Minoru Kawahara, Hirohisa Aman

The paper proposes a Privacy-Preserving Product-Quantization Approximate Nearest Neighbor (PPPQ-ANN) framework that achieves practical performance and strong privacy guarantees for large-scale nearest…

View →

cs.CRcs.DBRecentMay 20, 2026

Polars inside Intel SGX2 Enclaves: An Empirical Study of Confidential Analytical Query Processing

Wei Wang, Burns Smith, Kenny Leftin

This paper empirically evaluates the performance of the Polars DataFrame engine running within Intel SGX2 enclaves, finding that while the overall security overhead is manageable, the performance is s…

View →

cs.CLRecentMay 30, 2026

Learning to Retrieve: Dual-Level Long-Term Memory for Text-to-SQL Agents

Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao +1 more

The paper proposes MERIT, a dual-level, multi-horizon memory retrieval framework that significantly improves the performance of interactive text-to-SQL agents by providing both global and local memory…

View →

cs.CRcs.ARRecentApr 6, 2026

GPIR: Enabling Practical Private Information Retrieval with GPUs

Hyesung Ji, Hyunah Yu, Jongmin Kim, Wonseok Choi +2 more

GPIR is a GPU-accelerated Private Information Retrieval (PIR) system that significantly boosts throughput by introducing a stage-aware hybrid execution model and optimizing data layouts for modern GPU…

View →

cs.ARRecentMay 28, 2026

elasticAI.explorer: Towards a Unified End-to-End Framework for Hardware-Aware Neural Architecture Search

Natalie Maman, Florian Hettstedt, Andreas Erbslöh, Gregor Schiele

The elasticAI.explorer is an extensible, unified Python framework that simplifies hardware-aware Neural Architecture Search (NAS) by decoupling search space definition from model implementation and de…

View →

cs.CRcs.AIcs.CLRecentMay 26, 2026

Grounded Cache Routing for Retrieval-Augmented Generation: When Is It Safe to Reuse an Answer?

Syed Huma Shah

The paper proposes GroundedCache, an evidence-validated cache router that significantly improves the safety of reusing cached semantic answers in RAG systems by requiring multiple gates to validate th…

View →

cs.CLcs.AIcs.IRRecentMay 28, 2026

Entity-Collision: A Stratified Protocol for Attributing Retrieval Lift in Agent Memory

Youwang Deng

The paper introduces Entity-Collision, a rigorous protocol that separates genuine retrieval lift from simple lexical overlap, demonstrating that embedder performance depends critically on the query ty…

View →

cs.CLcs.AIRecentJun 1, 2026

From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression

Elia Cunegatti, Marcus Vukojevic, Erik Nielsen, Giovanni Iacca

The paper proposes SubFit, a novel compression technique that achieves superior LLM compression by replacing non-contiguous, submodule-level components (Attention and FeedForward) with lightweight res…

View →

cs.CVcs.AIRecentMay 29, 2026

Beyond Classification: Dynamic Adapter Routing for Continual Multimodal Retrieval

Alicja Dobrzeniecka, Filip Szatkowski, Sebastian Cygert, Szymon Lukasik +1 more

The paper proposes Dynamic Adapter Routing (DAR), a novel method that significantly improves continual multimodal retrieval by adaptively selecting and merging specialized adapters.

View →

cs.DCcs.AIcs.NIRecentMay 31, 2026

Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics

Bole Ma, Jan Eitzinger, Harald Köstler, Gerhard Wellein

The paper proposes moving the query instead of the KV-cache during cross-instance attention, demonstrating that this approach is significantly cheaper than moving the cache, especially on modern GPU f…

View →

cs.IRcs.LGstat.MLRecentJun 3, 2026

Distributional Approximate Nearest Neighbour Search for Uncertainty-Aware Retrieval

Olivier Jeunen

The paper proposes DINOSAUR, a framework that incorporates embedding uncertainty into Approximate Nearest Neighbour search to improve retrieval for niche, long-tail content.

View →

cs.PFcs.ARcs.DCRecentMay 27, 2026

Rotary GPU: Exploring Local Execution Paths for Large Mixture-of-Experts Models Under Limited GPU Memory

Myeong Jun Jo

The paper introduces Rotary GPU, an exploratory execution approach demonstrating that large Mixture-of-Experts models can be run locally on consumer GPUs with limited VRAM, achieving usable decode thr…

View →

cs.CRcs.DBRecentApr 7, 2026

Can You Trust the Vectors in Your Vector Database? Black-Hole Attack from Embedding Space Defects

Hanxi Li, Jianan Zhou, Jiale Lao, Yibo Wang +4 more

The paper introduces the Black-Hole Attack, a poisoning vulnerability that exploits geometric defects in high-dimensional embedding spaces to force malicious vectors into the top-k results of vector d…

View →

cs.IRcs.AIcs.LGRecentMay 31, 2026

Test-Time Training for Zero-Resource Dense Retrieval Reranking

Shiyan Liu, Yichen Li

The paper proposes DART, a test-time adaptation method that enhances zero-resource dense retrieval reranking by adaptively tuning a bilinear scoring matrix using pseudo-positive and pseudo-negative ex…

View →

cs.AIRecentJun 1, 2026

RASER: Recoverability-Aware Selective Escalation Router for Multi-Hop Question Answering

Yuyang Li, Zihe Yan, Tobias Käfer

RASER introduces a family of cheap, router-based systems that selectively decide whether to perform expensive multi-hop retrieval, significantly reducing LLM token costs while maintaining state-of-the…

View →

cs.CRcs.IRRecentApr 10, 2026

Trans-RAG: Query-Centric Vector Transformation for Secure Cross-Organizational Retrieval

Yu Liu, Kun Peng, Wenxiao Zhang, Fangfang Yuan +3 more

Trans-RAG introduces a novel query-centric vector transformation technique to enable secure, efficient, and accurate cross-organizational retrieval in RAG systems without plaintext decryption.

View →