20 results for “Agentic coding”
Fuwei Zhang, Yanzhao Zhang, Mingxin Li, Dingkun Long +4 more
This paper introduces CORE-Bench, a comprehensive benchmark for code retrieval in agentic coding.
Yue Liu, Yanjie Zhao, Yunbo Lyu, Ting Zhang +2 more
The paper analyzes how agentic AI coding assistants can be compromised via prompt injection attacks embedded in external artifacts, turning them into unauthorized execution shells for attackers.
Shiping Chen, Qin Wang, Guangsheng Yu, Xu Wang +1 more
This paper systematizes the security challenges of open agentic systems, concluding that while attack characterization is mature, the field lacks robust guidelines for operational governance, memory i…
Yipeng Gao, Lei Shu, Genzhi Ye, Xi Xiong +4 more
The paper introduces 3DCodeBench, a systematic benchmark and platform for evaluating Vision-Language Model (VLM) agents' ability to generate procedural 3D models from text and images using code.
Aditya Kumar, Zhihan Lei, Jerry Yan, Joshua W. Momo +5 more
The paper proposes a modular agent framework and novel learning methods to design and optimize practical, cost-effective, and controllable LLM-based agentic systems.
The paper introduces a data-centric optimization pipeline to improve coding agents' ability to interact with a branching lakehouse, showing significant accuracy gains by treating agent evaluation as a…
The paper introduces the concepts of Agentic Technical Debt and Stochastic Tax to categorize and manage the unique governance and operating liabilities inherent in complex, multi-step AI agent systems…
This paper analyzes the performance of agentic LLM systems in complex binary reverse engineering, identifying key limitations such as handling obfuscation and token constraints, and proposing future d…
Dongping Chen, Xuanao Huang, Zhihan Hu, Qingyuan Shi +2 more
The paper demonstrates that specialized coding agents, using only text and image access within a sandbox, can effectively solve complex omnimodal tasks, often outperforming state-of-the-art native omn…
AgenticVM is a multi-agent framework that uses LLMs and specialized tools to automate the reduction of large volumes of software vulnerabilities into actionable, prioritized queues.
This paper compares two agentic AI systems, Claude Code and Codex, on a gravitational wave data analysis pipeline, finding that while both achieve scientific convergence, they exhibit vastly different…
The study found that while multi-agent LLM code generation architectures significantly affect code complexity, the added complexity does not translate into better functional correctness, suggesting ar…
Qingshan Liu, Guoqing Wang, Wen Wu, Jingqi Huang +4 more
MemPro introduces a system-level evolution framework that treats the entire memory construction-retrieval pipeline as an evolvable program, significantly improving long-horizon agent performance over…
This study benchmarks token-optimized formats (TOON and TRON) against JSON in end-to-end agentic AI systems, finding that TRON significantly reduces token overhead with minimal performance degradation…
MOSAIC introduces a structured agentic framework that treats automated data science as a staged, context-grounded model selection problem, improving performance and traceability over traditional AutoM…
The paper introduces Language-Based Agent Control (LBAC), a new programming model that extends static typing and runtime enforcement guarantees to agentic applications, ensuring that agent-generated c…
This paper introduces ASE-26, a comprehensive undergraduate curriculum designed to formalize and teach agentic software engineering as a distinct academic discipline.
Ruihang Lai, Hao Kang, Haozhan Tang, Akaash R. Parthasarathy +5 more
The paper introduces PithTrain, a compact, agent-native Mixture-of-Experts (MoE) training framework that significantly improves agent-task efficiency compared to existing production stacks.
Ningzhi Tang, Chaoran Chen, Gelei Xu, Yiyu Shi +4 more
This study analyzes over 20,000 real-world coding sessions to show that AI coding agents frequently fail users through subtle misalignment, requiring constant manual correction even when major system…
Leyline introduces a novel serving-side primitive that allows agentic LLMs to perform targeted, efficient edits to the KV cache, avoiding costly full re-prefilling after content modification.