"Document ranking" | ArxivCSExplorer

20 results for “Document ranking”

CS papers only

Hybrid search: Keyword + semantic, ranked by combined score.

cs.IR | Empirical | Recent | Jun 10, 2026

CompRank: Efficient LLM Reranking via Token-Level Compression and Decoding-Free Scoring

Xuan Lu, Haohang Huang, Yingqi Fan, Junlong Tong +4 more

This paper proposes CompRank, a token-efficient reranking framework for large language models that reduces redundant computation and achieves strong reranking performance.

cs.DC | cs.AI | cs.CL | Recent | Jun 1, 2026

Compliance-Scored Best-of-N Guardrail Orchestration for Multimodal Document Generation in Payments Dispute Defense

Nataraj Agaram Sundar, Tejas Morabia

The paper introduces a novel guardrail orchestration layer that improves the compliance and efficiency of high-stakes multimodal document generation by scoring multiple generated candidates against we…

cs.IR | cs.AI | Recent | May 30, 2026

SkillPager: Query-Adaptive Intra-Skill Navigation via Semantic Node Retrieval

Zicai Cui, Zihan Guo, Weiwen Liu, Weinan Zhang

SkillPager is a novel two-stage framework that efficiently selects minimal, execution-sufficient context from large procedural skill documents by leveraging typed semantic nodes, significantly reducin…

cs.IR | cs.AI | cs.LG | Recent | May 31, 2026

Test-Time Training for Zero-Resource Dense Retrieval Reranking

Shiyan Liu, Yichen Li

The paper proposes DART, a test-time adaptation method that enhances zero-resource dense retrieval reranking by adaptively tuning a bilinear scoring matrix using pseudo-positive and pseudo-negative ex…

cs.DC | cs.AI | cs.CL | Recent | Jun 1, 2026

Self-Conditioned Positional HNSW for Overlap-Aware Retrieval in Chunked-Document RAG Systems: Method and Industrial Evidence-Quality Audit

Nataraj Agaram Sundar, Tejas Morabia

The paper introduces Self-Conditioned Positional HNSW (SCP-HNSW), a method that modifies chunk embeddings and retrieval process to mitigate redundant evidence retrieval from overlapping document chunk…

cs.CR | cs.IR | Recent | May 19, 2026

BiRD: A Bidirectional Ranking Defense Mechanism for Retrieval Augmented Generation

Chengcai Gao, Zhihong Sun, Xiaochuan Shi, Qiufeng Wang +1 more

The paper proposes BiRD, a bidirectional ranking defense mechanism that enhances the robustness of Retrieval-Augmented Generation (RAG) against adversarial attacks by analyzing the alignment between f…

cs.AI | cs.IR | Recent | May 28, 2026

HiKEY: Hierarchical Multimodal Retrieval for Open-Domain Document Question Answering

Joongmin Shin, Gyuho Shim, Jeongbae Park, Jaehyung Seo +1 more

HiKEY proposes a hierarchical, tree-based multimodal retrieval framework that significantly improves open-domain document question answering by addressing document routing and evidence fragmentation.

cs.AI | cs.IR | Recent | May 28, 2026

Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth

Gaurav Sahu, Laurent Charlin, Christopher Pal

The paper introduces a Deep Research pipeline that significantly improves literature search recall and demonstrates that human-curated citation lists are often unreliable and do not serve as a true gr…

cs.CL | cs.IR | Empirical | Recent | Jun 10, 2026

uva-irlab-conv at SemEval-2026 Task 8: Multi-Turn RAG with Learned Sparse Retrieval and Listwise Reranking

Simon Lupart, Kidist Amde Mekonnen, Zahra Abbasiantaeb, Mohammad Aliannejadi

This paper proposes a multi-turn retrieval-augmented generation pipeline for conversational systems across four domains.

cs.CL | cs.AI | cs.CV | Recent | May 31, 2026

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

Minglai Yang, Xinyan Velocity Yu, Pengyuan Li, Xinyu Guo +21 more

The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that curre…

cs.CL | cs.AI | cs.LG | Recent | May 27, 2026

Enhancing BiGRU with a KAN Block for Legal Document Classification and Summarization

Ahmed Faizul Haque Dhrubo, Souvik Pramanik, Most. Aysha Siddika Sumona, Shahnewaz Siddique +3 more

The paper proposes a novel KAN-enhanced BiGRU architecture to improve legal document classification and summarization in a low-resource, multilingual setting using Bengali and English legal texts.

cs.DS | cs.DM | Theoretical | Recent | Jun 11, 2026

(Un)ranking Permutation Classes

Nathanaël Hassler, Vincent Vajnovszki

This paper presents methods for ranking and unranking permutations avoiding a pattern of length three in lexicographic or colexicographic order.

cs.IR | cs.AI | Recent | May 29, 2026

SPECTRA: Synthetic IR Test Collections with Relevance Oracles and Controlled Distractor Diagnostics

Eric Liang

The paper introduces SPECTRA, a scalable framework for generating large, synthetic, and controllable information retrieval test collections, demonstrating its ability to expose system scaling and fail…

cs.IR | Empirical | Recent | Jun 10, 2026

Tail-Aware Adaptive-k: Query-Adaptive Context Selection for Retrieval-Augmented Generation

Ziyu Song, Jiaming Fang, Kuangyu Li, Tuo Xia +1 more

This paper proposes Tail-Aware Adaptive-k (TAA-k), a training-free framework for adaptive context selection in retrieval-augmented generation systems using Extreme Value Theory.

cs.CR | cs.AI | Recent | May 19, 2026

Security Document Classification with a Fine-Tuned Local Large Language Model: Benchmark Data and an Open-Source System

Ivan Dobrovolskyi

The paper introduces TorchSight, an open-source local system using a fine-tuned Qwen 3.5 27B model that achieves high accuracy (95.0%) in classifying sensitive security documents without relying on ex…

cs.CL | cs.LG | Recent | May 29, 2026

Scaling Multi-Hop Training Data via Graph-Constrained Path Selection

Pengyu Chen, Yonggang Zhang, Mingming Chen, Jun Song +2 more

The paper proposes a graph-constrained approach to scale multi-hop training data by decoupling path discovery from path verbalization, significantly expanding the usable corpus size for LLMs.

cs.DL | cs.CL | Recent | May 31, 2026

Digging Up Citations: FOSSIL, a Dataset and Workflow for Reference Extraction in Law and the Humanities

Luca Foppiano, Christian Boulanger

The paper introduces FOSSIL, a new multilingual dataset and specialized workflow designed to significantly improve the extraction of citations embedded within complex footnotes common in law and human…

cs.CL | cs.AI | cs.CV | Recent | Jun 4, 2026

Benchmarking Open-Source Layout Detection Models for Data Snapshot Extraction from Institutional Documents

AJ Carl P. Dy, Aivin V. Solatorio

This paper introduces a new benchmark dataset and evaluation framework for 'data snapshot extraction,' focusing on identifying and localizing semantically meaningful analytical artifacts within operat…

cs.IR | cs.AI | cs.CL | Recent | May 29, 2026

Reading Between the Citations: A Typed Claim Network for Scientific Literature

Ning Ding, Sergio J. Rodríguez Méndez, Pouya G. Omran

The paper introduces a typed claim network that models cross-document references by explicitly labeling the stance (e.g., agreement, disagreement) of a citation, significantly improving downstream tas…
