~ similar to 2606.01490· 20 results
The study found that while multi-agent LLM code generation architectures significantly affect code complexity, the added complexity does not translate into better functional correctness, suggesting ar…
Pengyu Zhu, Lijun Li, Yaxing Lyu, Qianxin Luo +7 more
The paper introduces a unified framework to fairly evaluate LLM agentic capabilities by standardizing diverse benchmarks and separating the effects of the LLM model from the surrounding framework and…
Ruiyin Li, Yiran Zhang, Xiyu Zhou, Yangxiao Cai +5 more
The paper introduces MAAD, a multi-agent framework that autonomously transforms software requirements into comprehensive, multi-view architectural blueprints, significantly improving completeness and…
Minfeng Qi, Tianqing Zhu, Zijie Xu, Congcong Zhu +2 more
The paper introduces CAESAR, a novel multi-agent framework that coordinates LLM agents across five specialized roles to improve success rates and stability in complex, multi-stage cyber intrusion task…
Zhezheng Hao, Tianfu Wang, Huanshuo Dong, Ziyan Liu +6 more
The paper proposes Meta-Team, an experience-driven framework that enables multi-agent systems (MAS) to collaboratively self-evolve by transforming complex execution experiences into reusable improveme…
Tomer Keren, Nitay Calderon, Asaf Yehudai, Yotam Perlitz +2 more
The paper introduces TASTE, an automatic task synthesis method that generates challenging agent benchmarks by evolving tool sequences, demonstrating that existing benchmarks are saturated and that TAS…
Bushra Sabir, Shigang Liu, Seung Ick Jang, Sharif Abuadbba +5 more
The paper evaluates multi-LLM strategies for secure code generation, finding that hybrid pipelines combining ensembling, static analysis, and patching achieve the strongest security performance, outpe…
This paper empirically demonstrates that the architectural design of multi-agent systems significantly impacts their security, finding that coordination mechanisms can introduce vulnerabilities greate…
Taein Kim, David Jiang, Yuepeng Hu, Yuqi Jia +1 more
The paper presents a large-scale study demonstrating that tool cloning is a pervasive and severe source of hidden duplication in agent-tool ecosystems, necessitating changes in how tool diversity is m…
This paper investigates the scaling behavior of homogeneous LLM-driven Multi-Agent Systems (MAS) and finds that performance exhibits diminishing returns due to coordination overhead, rather than scali…
The paper proposes a novel '3+1' heterogeneous multi-agent architecture using cloud LLMs and a local verifier to achieve high-accuracy, cost-effective code vulnerability detection, significantly outpe…
Jiahao Huang, Fei Cheng, Junfeng Jiang, Zefan Yu +1 more
The paper introduces BenchTrace, a novel benchmark designed to rigorously evaluate the self-evolution and reflection capabilities of LLM agents, revealing that current models struggle with accurate fa…
Aditya Kumar, Zhihan Lei, Jerry Yan, Joshua W. Momo +5 more
The paper proposes a modular agent framework and novel learning methods to design and optimize practical, cost-effective, and controllable LLM-based agentic systems.
The paper evaluates dynamic coordination strategy selection for enterprise multi-agent systems, finding that a calibrated default routing approach is effective, even if a deterministic winner-selectio…
Yipeng Gao, Lei Shu, Genzhi Ye, Xi Xiong +4 more
The paper introduces 3DCodeBench, a systematic benchmark and platform for evaluating Vision-Language Model (VLM) agents' ability to generate procedural 3D models from text and images using code.
The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…
Youness Bouchari, Matteo Boffa, Marco Mellia, Idilio Drago +2 more
The paper re-evaluates LLM agents on CTFs, finding that while general-purpose agents like claude-code are strong baselines, specialized, modular architectures significantly improve performance and con…
Melih Catal, Alex Wolf, Tiago Ferreiro Matos, Pooja Rani +1 more
The paper introduces Vertical Integration Bias (VIB) to quantify whether LLMs favor their own provider's ecosystem when generating code, finding that this bias is significant in both direct and agenti…
The paper proposes the Intelligent Computing Architecture Model (ICAM), a six-layer framework that unifies disparate concepts in model-native computing by viewing the LLM stack through a dual-plane ar…
The paper empirically evaluates various agentic architectures for offensive security tasks, finding that while broader coordination improves coverage, the optimal architecture is non-monotonic and dep…