~ similar to 2606.03728· 18 results
Ethan Zhao, Maksym Taranukhin, Wei Cui, Moira Aikenhead +1 more
The paper introduces CanLegalRAGBench, a new Canadian legal QA benchmark, and evaluates RAG systems, finding that while open-source models are competitive, automatic evaluations struggle with nuanced…
The paper introduces RefWalk, a novel framework designed to improve regulatory compliance question answering by ensuring rigorous citation traceability and explicit per-rule attribution across complex…
The paper introduces UA-Legal-Bench, a comprehensive Ukrainian legal reasoning benchmark built from a massive judicial corpus, demonstrating that LLM performance is highly task-dependent and that simp…
The paper introduces BenGER, a comprehensive benchmark for evaluating LLMs on German legal reasoning, demonstrating that closed-flagship models perform best and that human-AI co-creation significantly…
The paper introduces Citation Grounding (CG), a novel metric and framework, to systematically detect and reduce the hallucination of legal citations by verifying LLM outputs against a massive, structu…
Yongsik Seo, Wooseok Jeong, Eunyoung Kim, Hyeonseo Jang +1 more
The paper introduces CITETRACE, a large-scale dataset and evaluation framework that systematically measures structural citation failures in search-augmented LLMs, revealing a pattern called Verified M…
The paper introduces a Deep Research pipeline that significantly improves literature search recall and demonstrates that human-curated citation lists are often unreliable and do not serve as a true gr…
Zhixin Cai, Jun Bai, Yang Liu, Jiaqi Li +6 more
Xetrieval introduces an embedding-level framework to mechanistically explain dense retrieval decisions by decomposing high-dimensional embeddings into sparse, human-interpretable features.
The paper introduces 'bundesrecht,' an open-source, end-to-end pipeline for processing complex German statutory references, which parses, normalizes, and resolves raw citation strings into structured,…
The paper introduces Multi-Legal-Bench, a novel cross-jurisdictional benchmark evaluating LLMs on five standardized legal reasoning tasks across six diverse countries, demonstrating that cross-lingual…
The paper systematically compares multiple content representations for RAG pipelines and finds that answer retention—the ability of the representation to preserve the original answer-bearing content—i…
Xuan Lu, Haohang Huang, Yingqi Fan, Junlong Tong +4 more
This paper proposes CompRank, a token-efficient reranking framework for large language models that reduces redundant computation and achieves strong reranking performance.
Zerui Chen, Qinggang Zhang, Zhishang Xiang, Zhimin Wei +4 more
LegalGraphRAG introduces a multi-agent, hierarchical graph retrieval-augmented generation framework to overcome the limitations of traditional RAG in legal domains, achieving state-of-the-art reliable…
Shuai Xiao, Su Liu, Weikai Zhou, Jialun Wu +3 more
Persona prompting does not universally improve LLM performance; instead, it systematically trades increased expertise depth for reduced clarity, making multi-metric evaluation essential.
Divya Tadimeti, Shawn Pan, Sameera Lanka, Chenghui Zhou +1 more
This paper demonstrates that targeted adaptation of the small language model Phi Silica, using dataset curation and fine-tuning, significantly improves its performance in short-form text rewriting, na…
The paper introduces a novel, training-free method to automatically generate fine-grained evaluation rubrics for LLM-as-a-Judge, and further proposes an iterative fine-tuning strategy that significant…
The paper proposes a comprehensive benchmark to systematically audit how varying persona prompts and model choices affect the technical quality and social representativeness of scholar recommendations…
The paper introduces CERA, a novel contrastive retrieval framework that improves RAG factuality and interpretability by using subjectivity-based hard negative selection and an auxiliary attention alig…