~ similar to 2605.27785· 20 results
The paper introduces a data-centric optimization pipeline to improve coding agents' ability to interact with a branching lakehouse, showing significant accuracy gains by treating agent evaluation as a…
The study compares agentic data retrieval using unstructured web data versus structured, semantically-annotated datasets, concluding that semantic metadata remains essential for high-precision, reliab…
The paper introduces Sophrosyne, a system that moderates LLM agent exploration in relational data systems, significantly reducing over-exploration and boosting SQL generation accuracy by guiding the a…
Yujie Luo, Xiangyuan Ru, Jingsheng Zheng, Jingjing Wang +9 more
The paper introduces Autonomous Agentic Data Engineering, demonstrating that LLMs can autonomously plan and optimize end-to-end data curation pipelines, leading to substantial performance gains in spe…
BADGER is a unified, production-grade evaluation framework that integrates text-to-SQL assessment with agentic behavior evaluation, significantly outperforming existing benchmarks on industry queries.
Julien Piet, Annabella Chow, Yiwei Hou, Muxi Lyu +4 more
The paper argues that web agents should abandon the reactive ReAct paradigm in favor of a plan-then-execute approach, which requires developing typed, task-level APIs to properly structure web interac…
This study benchmarks token-optimized formats (TOON and TRON) against JSON in end-to-end agentic AI systems, finding that TRON significantly reduces token overhead with minimal performance degradation…
MOSAIC introduces a structured agentic framework that treats automated data science as a staged, context-grounded model selection problem, improving performance and traceability over traditional AutoM…
Yunkai Lou, Longbin Lai, Shunyang Li, Zhengping Qian +1 more
SpecDB is a novel system that uses LLMs to synthesize highly customized, purpose-built relational databases, achieving performance comparable to commercial systems while significantly reducing code si…
This study benchmarks four local LLMs for natural-language-to-SQL querying in biopharma manufacturing, finding that general-purpose code-tuned models like Llama 3.1 8B and Qwen 2.5 Coder 7B outperform…
Ruiyi Zhang, Peijia Qin, Qi Cao, Li Zhang +1 more
The paper introduces AIBuildAI-2, a knowledge-enhanced agent that significantly improves the automatic building of AI models by integrating an external, evolving knowledge system, achieving state-of-t…
The paper introduces FORGE, a feedback-driven execution system that improves LLM-based binary analysis by interleaving reasoning and tool interaction, achieving high-quality vulnerability discovery on…
The paper proposes Multi-Agent Computer Use (MACU) systems, which significantly improve performance on complex, long-horizon tasks by enabling parallel execution and dynamic task decomposition compare…
The paper proposes the Intelligent Computing Architecture Model (ICAM), a six-layer framework that unifies disparate concepts in model-native computing by viewing the LLM stack through a dual-plane ar…
Wei Zheng, Yang Yan, Yiyang Shao, Jinyang Li +5 more
The paper proposes A2X, an LLM-native progressive-disclosure scheme that structures service taxonomies hierarchically and searches them layer-by-layer at query time, solving context overflow and impro…
Zhengyang Zhao, Shengjie Ye, Lu Ma, Hao Liang +2 more
The paper introduces Andes, a framework that treats data generation as a plug-and-play agent skill, enabling autonomous alignment of LLMs by providing an intelligent, closed-loop data synthesis interf…
Alireza Salemi, Chang Zeng, Atharva Nijasure, Jui-Hui Chung +3 more
GrepSeek introduces a novel direct corpus interaction (DCI) search agent that trains an LLM to find and compose evidence from large text corpora by issuing executable shell commands, achieving state-o…
Kewei Xu, Xiaoben Lu, Shuofei Qiao, Zihan Ding +3 more
The paper introduces LongDS, a new benchmark for long-horizon, multi-turn data analysis, demonstrating that current AI agents struggle significantly with maintaining and updating complex analytical st…
Agent libOS introduces a library-OS-inspired runtime substrate that treats LLM agents as schedulable processes, providing explicit capability control and robust auditing for long-running, stateful age…
Zhen Yang, Xiaogang Xu, Wen Wang, Cong Chen +2 more
The paper introduces StreamMA, a streaming multi-agent reasoning system that significantly reduces latency and improves effectiveness by passing reasoning steps to downstream agents as they are genera…