Papers similar to 2606.03728

~ similar to 2606.03728· 18 results

cs.CLRecentMay 28, 2026

CanLegalRAGBench: Evaluating Retrieval-Augmented Generation on Canadian Case Law

Ethan Zhao, Maksym Taranukhin, Wei Cui, Moira Aikenhead +1 more

The paper introduces CanLegalRAGBench, a new Canadian legal QA benchmark, and evaluates RAG systems, finding that while open-source models are competitive, automatic evaluations struggle with nuanced…

View →

cs.AIRecentMay 28, 2026

Citation-Closure Retrieval and Per-Rule Attribution for Real-World Regulatory Compliance Question Answering

Yeong-Joon Ju, Seong-Whan Lee

The paper introduces RefWalk, a novel framework designed to improve regulatory compliance question answering by ensuring rigorous citation traceability and explicit per-rule attribution across complex…

View →

cs.CLcs.AIRecentMay 27, 2026

UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning

Volodymyr Ovcharov

The paper introduces UA-Legal-Bench, a comprehensive Ukrainian legal reasoning benchmark built from a massive judicial corpus, demonstrating that LLM performance is highly task-dependent and that simp…

View →

cs.CLcs.AIRecentMay 27, 2026

BenGER: Benchmarking LLM Systems on Subsumption-Based Legal Reasoning in German Law

Sebastian Nagl, Ann-Kristin Mayrhofer, Martin Heidebach, Aleyna Koçak +5 more

The paper introduces BenGER, a comprehensive benchmark for evaluating LLMs on German legal reasoning, demonstrating that closed-flagship models perform best and that human-AI co-creation significantly…

View →

cs.CLcs.DLRecentMay 30, 2026

Citation Grounding: Detecting and Reducing LLM Citation Hallucinations via Legal Citation Graphs

Volodymyr Ovcharov

The paper introduces Citation Grounding (CG), a novel metric and framework, to systematically detect and reduce the hallucination of legal citations by verifying LLM outputs against a massive, structu…

View →

cs.DLcs.AIcs.CLRecentMay 27, 2026

Verified Misguidance: Measuring Structural Citation Failures in Search-Augmented LLMs

Yongsik Seo, Wooseok Jeong, Eunyoung Kim, Hyeonseo Jang +1 more

The paper introduces CITETRACE, a large-scale dataset and evaluation framework that systematically measures structural citation failures in search-augmented LLMs, revealing a pattern called Verified M…

View →

cs.AIcs.IRRecentMay 28, 2026

Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth

Gaurav Sahu, Laurent Charlin, Christopher Pal

The paper introduces a Deep Research pipeline that significantly improves literature search recall and demonstrates that human-curated citation lists are often unreliable and do not serve as a true gr…

View →

cs.AIcs.IRRecentMay 28, 2026

Xetrieval: Mechanistically Explaining Dense Retrieval

Zhixin Cai, Jun Bai, Yang Liu, Jiaqi Li +6 more

Xetrieval introduces an embedding-level framework to mechanistically explain dense retrieval decisions by decomposing high-dimensional embeddings into sparse, human-interpretable features.

View →

cs.CLRecentMay 29, 2026

Bundesrecht: An Open Library and Corpus for German Statutory Reference Processing

Harshil Darji, Martin Heckelmann, Christina Kratsch, Gerard de Melo

The paper introduces 'bundesrecht,' an open-source, end-to-end pipeline for processing complex German statutory references, which parses, normalizes, and resolves raw citation strings into structured,…

View →

cs.CLcs.AIRecentMay 28, 2026

Multi-Legal-Bench: Evaluating LLMs on Legal Reasoning Across Jurisdictions, Languages, and Legal Traditions

Volodymyr Ovcharov

The paper introduces Multi-Legal-Bench, a novel cross-jurisdictional benchmark evaluating LLMs on five standardized legal reasoning tasks across six diverse countries, demonstrating that cross-lingual…

View →

cs.IRcs.AIcs.CLRecentMay 29, 2026

On the impact of retrieved content representations in RAG Pipelines

Jonathan J Ross, Bevan Koopman, Anton van der Vegt, Guido Zuccon

The paper systematically compares multiple content representations for RAG pipelines and finds that answer retention—the ability of the representation to preserve the original answer-bearing content—i…

View →

cs.IREmpiricalRecentJun 10, 2026

CompRank: Efficient LLM Reranking via Token-Level Compression and Decoding-Free Scoring

Xuan Lu, Haohang Huang, Yingqi Fan, Junlong Tong +4 more

This paper proposes CompRank, a token-efficient reranking framework for large language models that reduces redundant computation and achieves strong reranking performance.

View →

cs.CLcs.AIcs.MARecentMay 27, 2026

LegalGraphRAG: Multi-Agent Graph Retrieval-Augmented Generation for Reliable Legal Reasoning

Zerui Chen, Qinggang Zhang, Zhishang Xiang, Zhimin Wei +4 more

LegalGraphRAG introduces a multi-agent, hierarchical graph retrieval-augmented generation framework to overcome the limitations of traditional RAG in legal domains, achieving state-of-the-art reliable…

View →

cs.AIcs.LGRecentMay 28, 2026

When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs

Shuai Xiao, Su Liu, Weikai Zhou, Jialun Wu +3 more

Persona prompting does not universally improve LLM performance; instead, it systematically trades increased expertise depth for reduced clarity, making multi-metric evaluation essential.

View →

cs.CLcs.AIcs.LGRecentMay 30, 2026

Short-form Text Rewriting with Phi Silica

Divya Tadimeti, Shawn Pan, Sameera Lanka, Chenghui Zhou +1 more

This paper demonstrates that targeted adaptation of the small language model Phi Silica, using dataset curation and fine-tuning, significantly improves its performance in short-form text rewriting, na…

View →

cs.CLRecentMay 28, 2026

Generating and Refining Dynamic Evaluation Rubrics for LLM-as-a-Judge

Zijie Wang, Eduardo Blanco

The paper introduces a novel, training-free method to automatically generate fine-grained evaluation rubrics for LLM-as-a-Judge, and further proposes an iterative fine-tuning strategy that significant…

View →

cs.IRcs.AIcs.CYRecentMay 27, 2026

Whose Name Comes Up? III: Persona Prompting Effects in LLM-Based Scholar Recommendation

Annabella Sánchez-Guzmán, Lukas Eberhard, Denis Helic, Lisette Espín-Noboa

The paper proposes a comprehensive benchmark to systematically audit how varying persona prompts and model choices affect the technical quality and social representativeness of scholar recommendations…

View →

cs.CLRecentMay 31, 2026

Beyond Topical Similarity: Contrastive Evidence Retrieval with Interpretable Attention Alignment in RAG

Francielle Vargas, João Robiatti, Diego Alves, Lucas Pascotti Valem +5 more

The paper introduces CERA, a novel contrastive retrieval framework that improves RAG factuality and interpretability by using subjectivity-based hard negative selection and an auxiliary attention alig…

View →