ArXivCSExplorer
Built by Teycir Ben Soltane

20 results for “biomedical literature processing”

CS papers only

Hybrid search: keyword + semantic, ranked by combined score.
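The page does not say how the combined score is computed. The following is a minimal sketch, assuming a linear blend: each result gets `alpha` times a keyword score plus `(1 - alpha)` times a semantic (cosine) score, and results are sorted by that sum. The helpers `keyword_score`, `cosine`, and `hybrid_rank`, the `alpha` weight, and the toy 2-D embeddings are all hypothetical illustrations, not the site's code. A crude term-overlap ratio stands in for a real keyword ranker such as BM25.

```python
import math

# Hypothetical helpers: the site's actual scoring functions are not documented.


def keyword_score(query: str, doc: str) -> float:
    """Share of query terms that occur in the document (a stand-in for BM25)."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & d_terms) / len(q_terms)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_rank(query, query_vec, docs, alpha=0.5):
    """Rank (doc_id, text, embedding) tuples by a weighted sum of both signals.

    alpha weights the keyword score; (1 - alpha) weights the semantic score.
    """
    scored = [
        (alpha * keyword_score(query, text) + (1 - alpha) * cosine(query_vec, vec), doc_id)
        for doc_id, text, vec in docs
    ]
    # Highest combined score first; sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


# Toy corpus with made-up 2-D embeddings.
docs = [
    ("b", "german statutory references", [0.0, 1.0]),
    ("a", "biomedical literature processing", [0.9, 0.1]),
]
print(hybrid_rank("biomedical literature", [1.0, 0.0], docs))  # → ['a', 'b']
```

Setting `alpha=1.0` reduces this sketch to keyword-only ranking and `alpha=0.0` to semantic-only ranking. Real hybrid systems often fuse ranks instead of raw scores (for example, reciprocal rank fusion) because keyword and embedding scores live on different scales.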


cs.IR · cs.CL · Dataset · Recent · Jun 9, 2026

A PubMed-Scale Dataset of Structured Biomedical Abstracts

Chia-Hsuan Chang, Haerin Song, Brian Ondov, Hua Xu

The authors introduce Structured PubMed, a comprehensive corpus of section-labeled biomedical abstracts compiled from the complete PubMed database.

cs.AI · Recent · May 30, 2026

Ryze: Evidence-Enriched Data Synthesis from Biomedical Papers

Yeqi Huang, Yue Chen, Yanwei Ye, Guanhao Su +1 more

The paper introduces Ryze, an automated system that synthesizes evidence-enriched Question-Answering (QA) pairs from raw biomedical papers, resulting in a specialized VLM (BioVLM-8B) that significantl…

cs.CL · cs.AI · Recent · Jun 1, 2026

AutoForest: Automatically Generating Forest Plots from Biomedical Studies with End-to-End Evidence Extraction and Synthesis

Massimiliano Pronesti, Angelo Miculescu, Mohsin Kapdi, Paul Flanagan +7 more

AutoForest is an end-to-end system that automatically generates publication-ready forest plots directly from biomedical papers, streamlining the labor-intensive process of meta-analysis.

cs.CL · Recent · May 28, 2026

AI for Monitoring and Classifying Data Used in Research Literature

Rafael Macalaba, Aivin V. Solatorio

The paper introduces a novel, scalable framework to monitor and classify dataset usage within research literature, addressing the current lack of infrastructure for tracking data citations.

cs.CL · cs.AI · cs.CV · Recent · May 31, 2026

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

Minglai Yang, Xinyan Velocity Yu, Pengyuan Li, Xinyu Guo +21 more

The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that curre…

cs.CL · cs.AI · Recent · May 29, 2026

Beyond Agreement: Scoring Panel-Surfaced Biomedical Entity Candidates for Curator Triage

Shuheng Cao, Ruiqi Chen, Renjie Cao, Zhenhao Zhang +2 more

The paper introduces BioConCal, a supervised scoring mechanism that evaluates biomedical NER candidates surfaced by multiple LLMs, significantly improving the quality of the candidate pool for human c…

cs.IR · cs.AI · cs.CL · Recent · May 29, 2026

Reading Between the Citations: A Typed Claim Network for Scientific Literature

Ning Ding, Sergio J. Rodríguez Méndez, Pouya G. Omran

The paper introduces a typed claim network that models cross-document references by explicitly labeling the stance (e.g., agreement, disagreement) of a citation, significantly improving downstream tas…

cs.CL · Recent · May 29, 2026

Bundesrecht: An Open Library and Corpus for German Statutory Reference Processing

Harshil Darji, Martin Heckelmann, Christina Kratsch, Gerard de Melo

The paper introduces 'bundesrecht,' an open-source, end-to-end pipeline for processing complex German statutory references, which parses, normalizes, and resolves raw citation strings into structured,…

cs.DL · cs.CL · Recent · May 31, 2026

Digging Up Citations: FOSSIL, a Dataset and Workflow for Reference Extraction in Law and the Humanities

Luca Foppiano, Christian Boulanger

The paper introduces FOSSIL, a new multilingual dataset and specialized workflow designed to significantly improve the extraction of citations embedded within complex footnotes common in law and human…

cs.CL · Recent · May 31, 2026

HypothesisMed: Inference-Time Answer Fusion and Structured Hypothesis-Space Reporting for Biomedical Question Answering

Md Motaleb Hossen Manik, Ge Wang

HypothesisMed introduces an inference-time pipeline for biomedical question answering that improves model reliability and structured output generation by fusing multiple model outputs and diagnosing t…

cs.CL · Recent · Jun 1, 2026

What to Format and How: A Benchmark and Workflow Approach for Document Formatting

Shihao Rao, Liang Li, Jiapeng Liu, Tong Lin +5 more

The paper introduces DocFormBench, a new benchmark for content-aware document formatting, and proposes DocFormFlow, a workflow that improves formatting accuracy and efficiency by decoupling target loc…

cs.CL · Recent · Jun 1, 2026

Towards Multidisciplinary Summarization of Hospital Stays: Efficient Sentence-Level Clinical Provenance Categorization

Baris Karacan, Vaibhav Bhargava, Barbara Di Eugenio, Natalie Parde +20 more

The paper introduces a supervised fine-tuning pipeline using large language models to accurately categorize sentence-level clinical provenance across multi-disciplinary hospital notes, demonstrating t…

cs.IR · cs.CL · Recent · May 29, 2026

Evaluating Factual Density in Multi-Source RAG: A Study in Medical AI Accuracy

Michael R. DeMarco

The paper introduces Factual Density (FD*), a novel retrieval signal that measures the proportion of verified facts, demonstrating that optimizing RAG retrieval based on this density significantly imp…

cs.CL · Recent · May 31, 2026

PMC-InterCPT: Rethinking Biomedical Interleaved Data for Multimodal Continued Pretraining

Guanghao Zhu, Zeyu Liu, Zhitian Hou, Pengkai Wang +8 more

The paper introduces PMC-InterCPT, a refined biomedical interleaved corpus that enhances multimodal continued pretraining by integrating figure-referencing body text alongside captions, leading to imp…

cs.AI · cs.IR · Recent · May 28, 2026

Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth

Gaurav Sahu, Laurent Charlin, Christopher Pal

The paper introduces a Deep Research pipeline that significantly improves literature search recall and demonstrates that human-curated citation lists are often unreliable and do not serve as a true gr…

cs.CL · Recent · May 31, 2026

UniD³: A Knowledge Graph-Enhanced RAG Framework for Drug-Disease Discovery and Reasoning

Qing Wang, Tianshi Liu, Minghao Zhou, Jialu Liang +4 more

UniD³ is a novel Knowledge Graph-enhanced RAG framework that processes vast biomedical literature to systematically extract, organize, and validate comprehensive drug-disease knowledge, achieving h…

cs.CL · cs.AI · Recent · Jun 1, 2026

KliniskVestBERT: BERT Model Specialised to Norwegian Clinical Texts

Christian Autenried, Cosimo Persia

This paper introduces KliniskVestBERT, a suite of BERT models specialized by pre-training on a large, diverse corpus of real-world Norwegian clinical texts, demonstrating superior performance for clin…

cs.CL · cs.AI · Recent · May 28, 2026

MedCase-Structured: A Text-to-FHIR Dataset for Benchmarking Diagnostic Reasoning in Clinically Realistic EHR Settings

Valentina Bui Muti, Eugénie Dulout, Ziquan Fu

The paper introduces MedCase-Structured, a synthetic, FHIR-formatted dataset designed to benchmark diagnostic reasoning in realistic EHR settings, showing that LLMs perform worse on structured data th…

cs.AI · Recent · Jun 1, 2026

An NLP-Driven Framework for Curriculum-Labor Market Alignment: Schema-Constrained LLM Extraction, ESCO-Anchored Semantic Matching, and Multi-Dimensional Gap Quantification

Sherzod Turaev, Mary John, Mamoun Awad, Nazar Zaki +1 more

The paper introduces a robust four-stage NLP framework that uses schema-constrained LLMs and ESCO vocabulary to accurately extract and align educational competencies with labor market demands, quantif…

cs.CL · Recent · May 31, 2026

Peacemaker at ATE-IT: Automatic term extraction from Italian text for waste management data using encoder model

Mahdi Bakhtiyarzadeh, Hadi Bayrami Asl Tekanlou, Jafar Razmara

The paper proposes a low-cost and interpretable fine-tuning extraction strategy for automatic term extraction, demonstrating consistent and balanced performance on the ATE Shared Task.
