~ similar to 2606.01311 · 19 results
Yuxuan Liu, Zhaochen Su, Lingyun Xie, Yuhao Zhang +10 more
SkillRevise is an execution-grounded framework that iteratively refines initial, imperfect LLM agent skills by diagnosing defects from execution evidence and applying empirically validated edits, sign…
Jianxiang Yu, Jiapeng Zhu, Bochen Lin, Qier Cui +2 more
The paper introduces MASA, a model-aware skill alignment framework that adaptively rewrites general and task-specific skills for LLM agents, achieving superior performance across diverse backbones and…
Yanchao Li, Wanhao Liu, Ben Gao, Jiaqing Xie +4 more
SkillsInjector proposes a two-stage adaptive method to dynamically optimize skill selection, quantity, and presentation for LLM agents, significantly improving task performance over static injection m…
Wentao Hu, Zhendong Chu, Yiming Zhang, Junda Wu +5 more
The paper introduces SkillBrew, a multi-objective framework that treats skill bank curation as a constrained optimization problem to build efficient and well-curated skill repositories for LLM agents.
Jiahao Huang, Fei Cheng, Junfeng Jiang, Zefan Yu +1 more
The paper introduces BenchTrace, a novel benchmark designed to rigorously evaluate the self-evolution and reflection capabilities of LLM agents, revealing that current models struggle with accurate fa…
Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang +9 more
The paper proposes Skill-RM, a unified framework that treats reward modeling as an agentic task to consistently integrate diverse evaluation criteria, achieving superior performance over traditional m…
Tianyi Zhou, Dongrui Liu, Leitao Yuan, Jing Shao +1 more
COLLEAGUE.SKILL introduces an automated system that distills heterogeneous traces of human expertise and role-specific knowledge into portable, inspectable, and usable AI skill packages.
Xinyu Che, Junqi Xiong, Yunfei Ge, Xinping Lei +9 more
The paper introduces MMG2Skill, a closed-loop framework that converts noisy, human-oriented web guides into editable, executable skills, significantly improving agent performance across diverse tasks.
Xujun Li, Kehan Zheng, Mingyuan Zhao, Yize Geng +6 more
The paper proposes HiSME, a lightweight hierarchical skill meta-evolving solution that jointly optimizes skills and the skill evolving strategy by learning meta-skills from task execution traces, lead…
The paper introduces a data-centric optimization pipeline to improve coding agents' ability to interact with a branching lakehouse, showing significant accuracy gains by treating agent evaluation as a…
The paper introduces Behavioral Integrity Verification (BIV), a framework that systematically audits AI agent skills by comparing their declared capabilities against their actual implementation, revea…
Jiangrong Wu, Yuhong Nan, Yixi Lin, Huaijin Wang +3 more
SkillScope introduces a graph-based framework to enforce fine-grained least-privilege in LLM Agent Skills, significantly reducing over-privileged actions while maintaining task functionality.
Su Wang, Pin Qian, Yihang Chen, Junxian You +5 more
The paper introduces SkillReact, a framework that measures compositional risk in agent skill ecosystems, finding that even if individual skills are safe, their combination can create significant, unad…
Su Wang, Pin Qian, Yihang Chen, Junxian You +5 more
The paper introduces SkillReact, a framework that measures compositional risk in agent skill ecosystems, finding that even if individual skills are safe, their combination can create significant, expl…
Yangbo Wei, Zhen Huang, Shaoqiang Lu, Junhong Qian +3 more
SkillSmith is a synergy-aware framework that jointly co-evolves skills and tools, significantly improving self-improving agent systems by modeling skill-tool interactions and diagnosing failures.
Zhongyu He, Yuanfan Li, Fei Huang, Tianyu Chen +8 more
SIRI introduces a self-internalizing reinforcement learning framework that allows LLM agents to autonomously discover and integrate reusable skills directly into their core policy, significantly impro…
Haomin Zhuang, Hanwen Xing, Yujun Zhou, Yuchen Ma +4 more
The paper introduces AgentTrap, a dynamic benchmark that measures LLM agent susceptibility to malicious side effects embedded within seemingly benign third-party skills, finding that agents often exec…
The paper introduces LearnWeak, an annotation-free framework that automatically specializes small computer-use agents by identifying and targeting their specific weaknesses using a stronger reference…
GRASP introduces a gated, regression-aware framework for improving LLM agents by ensuring that every proposed skill edit improves performance on a balanced probe without degrading previously learned c…