~ similar to 2606.01185· 20 results
The paper introduces Hyperparam, a set of lightweight JavaScript libraries designed to enable direct, model-aware querying of unstructured data (like agent traces) within client-side AI applications.
The paper proposes a trust schema and verification framework to ensure that agent skills, which augment LLMs, are rigorously verified before deployment, thereby making human-in-the-loop oversight scal…
Yilun Yao, Xinyu Tan, Chao-Hsuan Liu, Yaoming Li +8 more
The paper introduces Harness-Bench, a diagnostic benchmark that measures how different system 'harnesses' affect LLM agent performance in realistic workflows, showing that agent capability must be rep…
The paper argues that current 'on-the-fly' AI agent design lacks necessary software engineering rigor and proposes an 'AI Workflow Store' to provide hardened, reusable, and reliable agent workflows.
Wentao Hu, Zhendong Chu, Yiming Zhang, Junda Wu +5 more
The paper introduces SkillBrew, a multi-objective framework that treats skill bank curation as a constrained optimization problem to build efficient and well-curated skill repositories for LLM agents.
SkCC is a compiler that enables portable and secure development of LLM agent skills by decoupling skill semantics from framework-specific formatting, significantly improving reliability and security.
Yunhao Feng, Yifan Ding, Yingshui Tan, Boren Zheng +5 more
SkillTrojan introduces a novel backdoor attack targeting the composition of reusable skills in agent systems, demonstrating high attack success rates with minimal impact on normal system functionality…
Jiahao Huang, Fei Cheng, Junfeng Jiang, Zefan Yu +1 more
The paper introduces BenchTrace, a novel benchmark designed to rigorously evaluate the self-evolution and reflection capabilities of LLM agents, revealing that current models struggle with accurate fa…
Yiyong Liu, Chia-Yi Hsu, Chun-Ying Huang, Michael Backes +2 more
This paper introduces Dependency Steering, a novel attack paradigm demonstrating that malicious agent skills can actively bias LLM coding agents to use attacker-controlled packages, posing a significa…
Ningzhi Tang, Chaoran Chen, Gelei Xu, Yiyu Shi +4 more
This study analyzes over 20,000 real-world coding sessions to show that AI coding agents frequently fail users through subtle misalignment, requiring constant manual correction even when major system…
Zihan Wang, Rui Zhang, Yu Liu, Chi Liu +3 more
This paper presents the first systematic study of black-box skill stealing attacks against proprietary LLM agents, demonstrating that structured agent skills can be easily extracted, posing a signific…
Tianyi Zhou, Dongrui Liu, Leitao Yuan, Jing Shao +1 more
COLLEAGUE.SKILL introduces an automated system that distills heterogeneous traces of human expertise and role-specific knowledge into portable, inspectable, and usable AI skill packages.
Yuxuan Liu, Zhaochen Su, Lingyun Xie, Yuhao Zhang +10 more
SkillRevise is an execution-grounded framework that iteratively refines initial, imperfect LLM agent skills by diagnosing defects from execution evidence and applying empirically validated edits, sign…
Chenyu Zhou, Xinyun Lu, Jiangyue Zhao, Jianghao Lin +2 more
The paper introduces OR-Space, a novel full-lifecycle workspace benchmark designed to rigorously evaluate industrial optimization agents by simulating real-world, multi-stage OR workflows that go beyo…
Zhiyuan Li, Jingzheng Wu, Xiang Ling, Xing Cui +1 more
This paper provides the first comprehensive security analysis of the Agent Skills framework, identifying severe structural vulnerabilities that require fundamental architectural changes rather than si…
Kevin Eykholt, Dhilung Kirat, Xiaokui Shu, Jiyong Jang +2 more
The paper reports on penetration tests conducted on proprietary, large-scale AI agent systems, finding that security vulnerabilities persist despite stricter development standards.
Xinyu Che, Junqi Xiong, Yunfei Ge, Xinping Lei +9 more
The paper introduces MMG2Skill, a closed-loop framework that converts noisy, human-oriented web guides into editable, executable skills, significantly improving agent performance across diverse tasks.
The paper introduces Behavioral Integrity Verification (BIV), a framework that systematically audits AI agent skills by comparing their declared capabilities against their actual implementation, revea…
Yaoyu Zhao, Yichen Xu, Oliver Bračevac, Cao Nguyen Pham +2 more
The paper introduces LACUNA, a novel programming model that allows LLM agents to write code that shapes the runtime environment while maintaining strong type-checking safety guarantees.
Yangbo Wei, Zhen Huang, Shaoqiang Lu, Junhong Qian +3 more
SkillSmith is a synergy-aware framework that jointly co-evolves skills and tools, significantly improving self-improving agent systems by modeling skill-tool interactions and diagnosing failures.