Papers similar to 2605.29446

~ similar to 2605.29446· 20 results

cs.AIRecentMay 28, 2026

OmniMatBench: A Human-Calibrated Multimodal Reasoning Benchmark Across 19 Materials Science Subfields

Wanhao Liu, Jiaqing Xie, Qian Tan, Weida Wang +9 more

The paper introduces OmniMatBench, a comprehensive, human-calibrated multimodal reasoning benchmark covering 19 materials science subfields, revealing that current multimodal language models (MLLMs) h…

View →

cond-mat.mtrl-scics.CEcs.CLRecentMay 29, 2026

A Padding Method for Enhanced Encoding of Inorganic Structures with Varying Chemical Compositions

Thang Dang, Haderbache Amir, Tzanakakis Alexandros, Yoshimoto Yuta

The paper introduces a novel padding method that leverages crystal symmetry to enhance the encoding of complex inorganic structures, significantly improving the generation of stable, novel materials.

View →

cond-mat.mtrl-scics.ETcs.LGRecentJun 1, 2026

Towards Automated Discovery: A Review of Generative Models, Multimodal Learning and Closed-Loop Workflows in Inverse Materials Design

Anand Babu, Rogério Almeida Gouvêa, Gian-Marco Rignanese

This review surveys advanced techniques—including generative models, multimodal learning, and closed-loop workflows—for automated inverse materials design, enabling the targeted discovery of novel cry…

View →

cs.AIcond-mat.mtrl-sciRecentMay 29, 2026

Coupling Language Models with Physics-based Simulation for Synthesis of Inorganic Materials

Edward W. Staley, Tom Arbaugh, Michael Pekala, Alexander New +5 more

The paper proposes a novel hybrid framework that couples Large Language Models (LLMs) with simplified physics-based simulations to improve the synthesis planning of novel inorganic crystalline materia…

View →

cs.AIRecentMay 28, 2026

ProjectionBench: Evaluating Scientific Hypothesis Generation in LLMs Under Progressive Information Disclosure

A. J. Lew, Y. Cao, M. J. Buehler

The paper introduces ProjectionBench, a novel benchmark that progressively discloses information to evaluate LLMs' ability to generate scientific hypotheses, demonstrating that advanced models like GP…

View →

cs.CLRecentMay 30, 2026

ProtStructQA: A Denotation Threshold in Protein Structural Reasoning

Aravind Mandiga, Guoming Li, Jin Lu, Ismailcem Budak Arpinar +2 more

The paper introduces ProtStructQA, an executable benchmark that tests protein structural reasoning by requiring language models to generate measurable 3D coordinates, revealing a capability-dependent…

View →

cs.CERecentJun 1, 2026

Matter to Mechanism: A Benchmark for AI Co-Scientists in Materials and Battery Research

Shashwat Sourav, Tanjin. He, Maria K. Y. Chan, Anubhav Jain +1 more

The paper introduces 'Matter to Mechanism,' a novel benchmark designed to rigorously evaluate AI co-scientists' ability to generate plausible, mechanism-grounded solution hypotheses for complex materi…

View →

cs.CLcs.AIcs.CVRecentMay 31, 2026

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

Minglai Yang, Xinyan Velocity Yu, Pengyuan Li, Xinyu Guo +21 more

The paper introduces Dr. DocBench, a difficulty-aware, comprehensive benchmark designed to rigorously test expert-level and challenging document parsing capabilities for VLMs, demonstrating that curre…

View →

cs.LGcs.AIcs.CRRecentMay 28, 2026

NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

Anany Kotawala

The paper introduces NumLeak, a framework demonstrating that top-tier LLMs often exhibit high fidelity recall of specific public numeric benchmarks (like financial factors) due to memorization, which…

View →

cs.LGcs.AIcs.CRRecentMay 28, 2026

NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

Anany Kotawala

The paper introduces NumLeak, a framework demonstrating that top-tier LLMs often exhibit high fidelity recall of specific public numeric benchmarks, suggesting that their apparent skill may be due to…

View →

cs.CRcs.CLRecentApr 28, 2026

The Surprising Universality of LLM Outputs: A Real-Time Verification Primitive

Alex Bogdan, Adrian de Valois-Franklin

The paper identifies a universal, statistically predictable distribution (Mandelbrot) governing LLM outputs, enabling a highly efficient, model-agnostic scoring primitive for provenance and quality as…

View →

cs.LGcs.AIRecentMay 27, 2026

Learning Compositional Latent Structure with Vector Networks

Niclas Pokel, Benjamin F. Grewe

The paper introduces the Vector Network (VN), a novel recurrent architecture that replaces fixed weight matrices with reusable weight atoms, enabling superior compositional generalization by making st…

View →

cs.CLRecentJun 1, 2026

Encoded but Not Routed: Explaining the Table-Chart Gap in Scientific Claim Verification

Sunisth Kumar, Xanh Ho, Tim Schopf, Andre Greiner-Petter +2 more

The paper explains the 'table-chart gap' in scientific claim verification by showing that multimodal LLMs successfully encode information from charts but fail to route it to the final prediction layer…

View →

cs.LGcs.AIstat.MLRecentMay 28, 2026

CalArena: A Large-Scale Post-Hoc Calibration Benchmark

Eugène Berta, David Holzmüller, Francis Bach, Michael I. Jordan

The paper introduces CalArena, a large-scale, standardized benchmark covering nearly 2000 experiments to comprehensively evaluate post-hoc calibration methods, finding that smooth calibration function…

View →

cs.CVcs.AIcs.CLRecentJun 1, 2026

Cross-modal linkage risk in clinical vision-language models

Soroosh Tayebi Arasteh, Mahshad Lotfinia, Sven Nebelung, Daniel Truhn

The paper demonstrates that clinical vision-language models (VLMs) pose a significant privacy risk by allowing de-identified images to be re-linked to original reports, and proposes a targeted differe…

View →

cs.CVcs.AIcs.LGRecentMay 27, 2026

Do We Really Need Quantum Machine Learning?: A Multidimensional Empirical Study

Sudip Vhaduri, Ryan Gammon, Sayanton Dibbo

This study empirically benchmarks classical and quantum machine learning models for image recognition, finding that while quantum models offer superior accuracy and resource efficiency at high dimensi…

View →

cs.CLcs.AIcs.LGRecentJun 1, 2026

Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning

Atoosa Chegini, Soheil Feizi

The paper introduces Chunk-Level Guided Generation, a training-free method that uses an off-the-shelf large language model (LLM) as a process scorer to guide small model generation, achieving performa…

View →

cs.AIRecentMay 30, 2026

Ryze: Evidence-Enriched Data Synthesis from Biomedical Papers

Yeqi Huang, Yue Chen, Yanwei Ye, Guanhao Su +1 more

The paper introduces Ryze, an automated system that synthesizes evidence-enriched Question-Answering (QA) pairs from raw biomedical papers, resulting in a specialized VLM (BioVLM-8B) that significantl…

View →

cs.CVcs.AIcs.LGRecentJun 1, 2026

Ranking vs. Assignment: The Metric Mismatch in Multi-View Object Association

Matvei Shelukhan, Timur Mamedov, Aleksandr Chukhrov, Karina Kvanchiani

The paper identifies a fundamental mismatch between standard pairwise ranking metrics (like AP and FPR-95) and the true assignment objective in multi-view object association, proposing a Sinkhorn-base…

View →