~ similar to 2605.30966· 19 results
The paper introduces FOSSIL, a new multilingual dataset and specialized workflow designed to significantly improve the extraction of citations embedded within complex footnotes common in law and human…
The paper introduces a Deep Research pipeline that significantly improves literature search recall and demonstrates that human-curated citation lists are often unreliable and do not serve as a true gr…
The paper introduces a novel, scalable framework to monitor and classify dataset usage within research literature, addressing the current lack of infrastructure for tracking data citations.
The paper introduces 'bundesrecht,' an open-source, end-to-end pipeline for processing complex German statutory references, which parses, normalizes, and resolves raw citation strings into structured,…
The paper introduces RefWalk, a novel framework designed to improve regulatory compliance question answering by ensuring rigorous citation traceability and explicit per-rule attribution across complex…
The paper introduces TechGraphRAG, an advanced, agentic RAG framework that enhances technical literature reasoning by integrating multi-step query refinement, external database searching, and knowledg…
Yongsik Seo, Wooseok Jeong, Eunyoung Kim, Hyeonseo Jang +1 more
The paper introduces CITETRACE, a large-scale dataset and evaluation framework that systematically measures structural citation failures in search-augmented LLMs, revealing a pattern called Verified M…
The paper introduces a multilingual corpus and demonstrates that small, fine-tuned language models (SLMs) are highly effective for Citation Needed Detection (CND) in lower-resource languages, often ou…
The paper introduces a cross-encoder re-ranker trained on attribution scores to improve the retrieval of highly relevant citation passages for legal question answering, outperforming standard semantic…
Zheng Yuan, Chuang Zhou, Linhao Luo, Siyu An +3 more
MoG proposes a novel Mixture of Experts framework for graph-based RAG, which uses hub graphs to guide the sparse activation of domain-specific expert graphs, significantly improving retrieval accuracy…
Pengyu Chen, Yonggang Zhang, Mingming Chen, Jun Song +2 more
The paper proposes a graph-constrained approach to scale multi-hop training data by decoupling path discovery from path verbalization, significantly expanding the usable corpus size for LLMs.
The paper introduces Citation Grounding (CG), a novel metric and framework, to systematically detect and reduce the hallucination of legal citations by verifying LLM outputs against a massive, structu…
The paper introduces COMPOSE, a dual-graph framework that generates plausible future mathematical theorems by simultaneously conditioning a language model on both the scientific citation context and t…
Zerui Chen, Qinggang Zhang, Zhishang Xiang, Zhimin Wei +4 more
LegalGraphRAG introduces a multi-agent, hierarchical graph retrieval-augmented generation framework to overcome the limitations of traditional RAG in legal domains, achieving state-of-the-art reliable…
Pin Qian, Su Wang, Xiaoyuan Wang, Yihang Chen +6 more
The paper introduces FORCEBENCH, a new stress test designed to evaluate whether cited sources genuinely warrant the strength of a claim, revealing that standard citation evaluation methods often fail…
The authors introduce Structured PubMed, a comprehensive corpus of section-labeled biomedical abstracts compiled from the complete PubMed database.
The paper introduces TaDaS, a framework that analyzes large-scale text archives to measure professional sentiment, finding that while AI discussion among economists is initially negative, the trend sh…
This paper introduces GraphSteal, an attack framework demonstrating that Graph RAG systems can leak substantial portions of a hidden knowledge graph by treating them as structural oracles.
The paper introduces KnowledgeGain, a novel metric that measures the actual knowledge gained by readers from science news, and demonstrates its use in optimizing news generation to improve reader lear…