~ similar to 2605.28219· 19 results
This paper predicts the aggregate crowd salience of a document from its text before its marks accumulate.
The paper introduces I-WebGenBench, a framework and benchmark that converts static scientific papers into executable, interactive web systems, allowing users to dynamically explore the paper's mechani…
Xinkai Ma, Zhiqi Bai, Dingling Zhang, Pei Liu +20 more
The paper introduces TVIR, a new benchmark and multi-agent framework for deep research, to evaluate and improve the generation of factually reliable, text-visual interleaved reports.
The paper introduces a simple, token-efficient vision-language model for generating comprehensive pathology synoptic reports from multiple whole-slide images (WSIs), achieving high performance while s…
MEMENTO proposes a novel framework that treats the open web as a continuous learning signal, enabling agents to acquire task-specific expertise and reusable research strategies in low-data domains wit…
Yu Xue, Haoxuan Qu, Zhuoling Li, Yihang Lou +3 more
The paper introduces ToolFG, a novel tool-integrated MLLM framework that enhances fine-grained image classification by enabling models to autonomously use external tools to gather verifiable visual cu…
The paper introduces a structured benchmark (TGAD) showing that current text-guided anomaly detection models often overstate their language conditioning, as performance significantly degrades when the…
This paper investigates whether a group of people highlighting the same document forms a single consensus or is internally structured into reader sub-groups.
Sunisth Kumar, Xanh Ho, Tim Schopf, Andre Greiner-Petter +2 more
The paper explains the 'table-chart gap' in scientific claim verification by showing that multimodal LLMs successfully encode information from charts but fail to route it to the final prediction layer…
The paper argues that the standard FID metric is unreliable because its performance depends significantly on the geometric structure and density of the reference dataset, not just the sample quality.
Selim Kuzucu, Alessio Tonioni, Vasile Lup, Bernt Schiele +2 more
PARCEL introduces a novel visual tokenization architecture that combines spatial pooling anchors with conditioned elastic queries, efficiently reducing the computational cost of large Vision-Language…
Yalun Dai, Yangyu Huang, Tongshen Yang, Yonghan Wang +7 more
This paper proposes four guidelines and two novel data ordering methods (STR and SAW) to systematically optimize data organization, significantly enhancing the stability and performance of LLM trainin…
This paper analyzes failure modes in collaborative visual reasoning systems, demonstrating that naive shared workspaces can amplify hallucinations and proposing diagnostics for improving communication…
Jonghyun Chung, Rishabh Chaddha, Sanket Badhe, Debanshu Das +2 more
This survey proposes a proactive, lifecycle-based framework, utilizing the C5 Interaction Model, to detect emerging adversarial synthetic narratives generated by GenAI, moving beyond traditional react…
Jonghyun Chung, Rishabh Chaddha, Sanket Badhe, Debanshu Das +2 more
This survey proposes a proactive, lifecycle-based framework, utilizing the C5 Interaction Model, to detect emerging adversarial synthetic narratives generated by Generative AI, moving beyond tradition…
Zhixin Cai, Jun Bai, Yang Liu, Jiaqi Li +6 more
Xetrieval introduces an embedding-level framework to mechanistically explain dense retrieval decisions by decomposing high-dimensional embeddings into sparse, human-interpretable features.
Yung-Yu Shih, Shang-Yu Su, Tzu-I Ho, Dongzhe Wang +1 more
The paper presents BEATS, a human-in-the-loop LLM framework for bootstrapping product attribute taxonomies from scratch.
This paper introduces a new benchmark dataset and evaluation framework for 'data snapshot extraction,' focusing on identifying and localizing semantically meaningful analytical artifacts within operat…
Ziyue Yang, Da Ma, Hanqi Li, Zijian Wang +7 more
DeepSurvey is an agentic system that significantly enhances automated survey generation by extracting deep, structured knowledge from full-text papers and rigorously validating citations, achieving su…