Papers similar to 2606.01255

20 results

cs.SE · cs.AI · cs.CL · Recent · May 29, 2026

BlueFin: Benchmarking LLM Agents on Financial Spreadsheets

Srivatsa Kundurthy, Clara Na, Colton Moraine, Anoushka Mohta +5 more

The paper introduces BlueFin, a challenging benchmark for evaluating LLM agents on complex financial spreadsheet tasks, finding that even frontier models perform poorly, scoring less than 50% on avera…

cs.AI · cs.LG · Recent · May 30, 2026

MOSAIC: Modular Orchestration for Structured Agentic Intelligence and Composition

Yifan Bao, Xinyu Xi, Xinyu Liu, Wen Ge +7 more

MOSAIC introduces a structured agentic framework that treats automated data science as a staged, context-grounded model selection problem, improving performance and traceability over traditional AutoM…

cs.CL · cs.AI · Recent · May 29, 2026

Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Stine Lyngsø Beltoft, William Brach, Federico Torrielli, Jacob Nielsen +4 more

The paper investigates emergent, sophisticated languages developed by populations of language model agents, finding that these languages are designed for oversight evasion and are difficult to monitor…

cs.AI · Empirical · Recent · Jun 11, 2026

Agents-K1: Towards Agent-native Knowledge Orchestration

Zongsheng Cao, Bihao Zhan, Jinxin Shi, Jiong Wang +21 more

This paper introduces Agents-K1, an end-to-end knowledge orchestration pipeline that converts raw documents into agent-native scientific knowledge graphs.

cs.IR · cs.AI · Recent · May 27, 2026

Do Agents Need Semantic Metadata? A Comparative Study in Agentic Data Retrieval

Shiyu Chen, Tarfah Alrashed, Alon Halevy, Natasha Noy

The study compares agentic data retrieval using unstructured web data versus structured, semantically-annotated datasets, concluding that semantic metadata remains essential for high-precision, reliab…

cs.AI · Recent · May 28, 2026

Indexing the Unreadable: LLM-Native Recursive Construction and Search of Service Taxonomies

Wei Zheng, Yang Yan, Yiyang Shao, Jinyang Li +5 more

The paper proposes A2X, an LLM-native progressive-disclosure scheme that structures service taxonomies hierarchically and searches them layer-by-layer at query time, solving context overflow and impro…

cs.CL · cs.AI · cs.IR · Recent · May 28, 2026

Exploring Autonomous Agentic Data Engineering for Model Specialization

Yujie Luo, Xiangyuan Ru, Jingsheng Zheng, Jingjing Wang +9 more

The paper introduces Autonomous Agentic Data Engineering, demonstrating that LLMs can autonomously plan and optimize end-to-end data curation pipelines, leading to substantial performance gains in spe…

cs.CL · Recent · May 31, 2026

Worlds Within Words: Translating Culture in Ancient Chinese Texts with Multi-Agent Coordination

Xiaoqi He, Kaixin Lan, Mu You, Tao Fang +2 more

The paper proposes MACAT, a Multi-Agent Culture-Aware Translation framework, to selectively translate culture-loaded words in ancient Chinese texts, achieving superior performance over existing method…

cs.AI · cs.CL · Recent · Jun 4, 2026

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

Shangheng Du, Xiangchao Yan, Jinxin Shi, Zongsheng Cao +10 more

MLEvolve is a novel self-evolving multi-agent framework that enables LLM agents to discover and optimize machine learning algorithms for complex, long-horizon tasks.

cs.CL · Recent · May 30, 2026

I-WebGenBench: Evaluating Interactivity in LLM-Generated Scientific Web Applications

Dasen Dai, Biao Wu, Meng Fang, Shuoqi Li +1 more

The paper introduces I-WebGenBench, a framework and benchmark that converts static scientific papers into executable, interactive web systems, allowing users to dynamically explore the paper's mechani…

cs.AI · Recent · May 29, 2026

Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration

Weile Chen, Bingchen Miao, Qifan Yu, Wendong Bu +5 more

The paper proposes SCALE, a self-improving web agent framework that uses adversarial roles and graph exploration to autonomously discover agent limitations and enhance adaptability in complex web envi…

cs.AI · cs.CL · Recent · May 28, 2026

Demystifying Data Organization for Enhanced LLM Training

Yalun Dai, Yangyu Huang, Tongshen Yang, Yonghan Wang +7 more

This paper proposes four guidelines and two novel data ordering methods (STR and SAW) to systematically optimize data organization, significantly enhancing the stability and performance of LLM trainin…

cs.AI · cs.IR · Recent · May 27, 2026

From Learning Resources to Competencies: LLM-Based Tagging with Evidence and Graph Constraints

Ngoc Luyen Le, Marie-Hélène Abel, Bertrand Laforge

The paper introduces an LLM-based pipeline that tags learning resources with structured competencies, achieving strong performance while providing traceable evidence and leveraging graph constraints.

cs.CL · cs.AI · cs.IR · Recent · May 28, 2026

GrepSeek: Training Search Agents for Direct Corpus Interaction

Alireza Salemi, Chang Zeng, Atharva Nijasure, Jui-Hui Chung +3 more

GrepSeek introduces a novel direct corpus interaction (DCI) search agent that trains an LLM to find and compose evidence from large text corpora by issuing executable shell commands, achieving state-o…

cs.AI · Recent · May 27, 2026

Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes

James P. Balhoff, Hilmar Lapp

Frontier LLM-based agents can effectively overcome the manual bottleneck of phenotype annotation by achieving consistency comparable to human experts, significantly outperforming existing NLP tools.

cs.AI · cs.CL · Recent · May 28, 2026

Locally Coherent, Globally Incoherent: Bounding Compositional Incoherence in Multi-Component LLM Agents

Anany Kotawala

The paper introduces a metric, the compositional residual eps*, to quantify how multi-component LLM agents violate basic probability axioms when combining local, coherent claims into a global predicti…

cs.AI · Recent · May 27, 2026

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

Zhikai Pan, Chih-Ting Liao, Chunrui Liu, Xi Xiao +4 more

The paper introduces a multilingual benchmark (MentalMap) to test if LLMs build internal spatial world models from text, finding a universal 'L3 reasoning cliff' suggesting that text-only working memo…

cs.CL · Recent · Jun 1, 2026

Scaling Agentic Capabilities via Grounded Interaction Synthesis

Wenhang Shi, Jinhao Dong, Yiren Chen, Zhe Zhao +3 more

The paper introduces Grounded Agentic Interaction Synthesis (GAIS), a framework that generates high-quality, diverse, and complex agentic training data by anchoring tasks to real-world protocols, sign…

cs.LG · cs.AI · Recent · May 29, 2026

Learning to Construct Practical Agentic Systems

Aditya Kumar, Zhihan Lei, Jerry Yan, Joshua W. Momo +5 more

The paper proposes a modular agent framework and novel learning methods to design and optimize practical, cost-effective, and controllable LLM-based agentic systems.

cs.AI · Recent · May 28, 2026

DeepSurvey: Enhancing Analytical Depth and Citation Reliability in Automated Survey Generation

Ziyue Yang, Da Ma, Hanqi Li, Zijian Wang +7 more

DeepSurvey is an agentic system that significantly enhances automated survey generation by extracting deep, structured knowledge from full-text papers and rigorously validating citations, achieving su…
