~ similar to 2605.27931· 18 results
The paper introduces MUSE, a comprehensive benchmark that evaluates Text-to-CAD generation by assessing complex assemblies based on functionality, manufacturability, and assemblability, moving beyond…
Haozhe Zhao, Shuzheng Si, Zhenhailong Wang, Zheng Wang +5 more
The paper introduces Crafter, a multi-agent harness that significantly improves the generation of editable, publication-quality scientific figures from diverse inputs, addressing the limitations of ex…
The paper introduces I-WebGenBench, a framework and benchmark that converts static scientific papers into executable, interactive web systems, allowing users to dynamically explore the paper's mechani…
PlanarBench introduces a novel benchmark to test LLM spatial reasoning by requiring them to draw planar graphs as ASCII art from an edge list, finding that edge count is a stronger difficulty predicto…
Qian Kou, Xiaofeng Shi, Yulin Li, Xiaosong Qiu +3 more
The paper introduces MechVQA, a comprehensive dataset and benchmark for mechanical drawing understanding, and proposes the MechVL model, which significantly improves Multimodal LLMs' performance on th…
Zerui Chen, Qinggang Zhang, Zhishang Xiang, Zhimin Wei +4 more
LegalGraphRAG introduces a multi-agent, hierarchical graph retrieval-augmented generation framework to overcome the limitations of traditional RAG in legal domains, achieving state-of-the-art reliable…
Sunisth Kumar, Xanh Ho, Tim Schopf, Andre Greiner-Petter +2 more
The paper explains the 'table-chart gap' in scientific claim verification by showing that multimodal LLMs successfully encode information from charts but fail to route it to the final prediction layer…
PhyDrawGen is a neuro-symbolic pipeline that generates physically accurate diagrams from natural language by explicitly enforcing physical laws and geometric constraints, significantly outperforming c…
This paper presents an algorithmic framework for exhaustively generating and tabulating knot and link diagrams on the thickened torus.
Chuanjie Wu, Zhishang Xiang, Yunbo Tang, Zerui Chen +2 more
MemGraphRAG introduces a novel memory-based multi-agent system to construct globally consistent and structurally sound knowledge graphs, significantly improving retrieval-augmented generation for comp…
This paper introduces a new benchmark dataset and evaluation framework for 'data snapshot extraction,' focusing on identifying and localizing semantically meaningful analytical artifacts within operat…
Xinkai Ma, Zhiqi Bai, Dingling Zhang, Pei Liu +20 more
The paper introduces TVIR, a new benchmark and multi-agent framework for deep research, to evaluate and improve the generation of factually reliable, text-visual interleaved reports.
The paper introduces TechGraphRAG, an advanced, agentic RAG framework that enhances technical literature reasoning by integrating multi-step query refinement, external database searching, and knowledg…
Wanhao Liu, Jiaqing Xie, Qian Tan, Weida Wang +9 more
The paper introduces OmniMatBench, a comprehensive, human-calibrated multimodal reasoning benchmark covering 19 materials science subfields, revealing that current multimodal language models (MLLMs) h…
The paper introduces COMPOSE, a dual-graph framework that generates plausible future mathematical theorems by simultaneously conditioning a language model on both the scientific citation context and t…
Zhikai Pan, Chih-Ting Liao, Chunrui Liu, Xi Xiao +4 more
The paper introduces a multilingual benchmark (MentalMap) to test if LLMs build internal spatial world models from text, finding a universal 'L3 reasoning cliff' suggesting that text-only working memo…
Chuang Tang, Chenhao Lin, Yin Xu, Hao Wang +4 more
MACReD introduces a hierarchical multi-agent framework that achieves state-of-the-art performance in parsing complex chemical reaction diagrams by coordinating specialized agents for perception and gl…
VISReg introduces a novel regularization technique that combines variance control with a Sliced-Wasserstein-based sketching objective to stabilize self-supervised learning, achieving state-of-the-art…