~ similar to 2605.31338· 20 results
The paper introduces FOSSIL, a new multilingual dataset and specialized workflow designed to significantly improve the extraction of citations embedded within complex footnotes common in law and human…
The paper introduces BenGER, a comprehensive benchmark for evaluating LLMs on German legal reasoning, demonstrating that closed-flagship models perform best and that human-AI co-creation significantly…
The paper introduces Multi-Legal-Bench, a novel cross-jurisdictional benchmark evaluating LLMs on five standardized legal reasoning tasks across six diverse countries, demonstrating that cross-lingual…
The paper introduces a cross-encoder re-ranker trained on attribution scores to improve the retrieval of highly relevant citation passages for legal question answering, outperforming standard semantic…
The paper introduces Citation Grounding (CG), a novel metric and framework, to systematically detect and reduce the hallucination of legal citations by verifying LLM outputs against a massive, structu…
Ethan Zhao, Maksym Taranukhin, Wei Cui, Moira Aikenhead +1 more
The paper introduces CanLegalRAGBench, a new Canadian legal QA benchmark, and evaluates RAG systems, finding that while open-source models are competitive, automatic evaluations struggle with nuanced…
Zerui Chen, Qinggang Zhang, Zhishang Xiang, Zhimin Wei +4 more
LegalGraphRAG introduces a multi-agent, hierarchical graph retrieval-augmented generation framework to overcome the limitations of traditional RAG in legal domains, achieving state-of-the-art reliable…
The paper introduces UA-Legal-Bench, a comprehensive Ukrainian legal reasoning benchmark built from a massive judicial corpus, demonstrating that LLM performance is highly task-dependent and that simp…
The paper introduces a typed claim network that models cross-document references by explicitly labeling the stance (e.g., agreement, disagreement) of a citation, significantly improving downstream tas…
The paper introduces RefWalk, a novel framework designed to improve regulatory compliance question answering by ensuring rigorous citation traceability and explicit per-rule attribution across complex…
The paper proposes a novel KAN-enhanced BiGRU architecture to improve legal document classification and summarization in a low-resource, multilingual setting using Bengali and English legal texts.
The authors introduce Structured PubMed, a comprehensive corpus of section-labeled biomedical abstracts compiled from the complete PubMed database.
Junyu Lu, Qi Wei, Peishuo Zheng, Jie Zhang +5 more
The paper introduces Prosecution Decision Prediction (PDP), a new legal AI task that assesses prosecutorial review decisions, showing that current state-of-the-art LLMs perform significantly worse on…
This paper conducts a large-scale audit of human annotation reporting in NLP, finding that while reporting has improved, critical details needed to assess annotation validity, such as training and agr…
Junjie Chen, Yuxi Dong, Haitao Li, Weihang Su +4 more
The paper introduces LongJudgeBench, a new benchmark designed to evaluate the reliability of LLM judges specifically for complex, long-form output evaluation, revealing significant instability gaps in…
Minglai Yang, Xinyan Velocity Yu, Pengyuan Li, Xinyu Guo +21 more
The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that curre…
The paper introduces a Deep Research pipeline that significantly improves literature search recall and demonstrates that human-curated citation lists are often unreliable and do not serve as a true gr…
The paper benchmarks local, offline LLMs for confidential translation workflows, demonstrating that while they are viable for privacy-sensitive use, they generally lag behind top commercial NMT system…
This paper proposes a lightweight encoder-based MEL solution called FAST-MEL that meets three objectives: high linking accuracy, computational efficiency, and storage efficiency.
Yeqi Huang, Yue Chen, Yanwei Ye, Guanhao Su +1 more
The paper introduces Ryze, an automated system that synthesizes evidence-enriched Question-Answering (QA) pairs from raw biomedical papers, resulting in a specialized VLM (BioVLM-8B) that significantl…