SIRI introduces a self-internalizing reinforcement learning framework that lets LLM agents autonomously discover reusable skills and integrate them directly into their core policy, improving over GiGPO on ALFWorld and WebShop without external skill generators or inference-time skill banks.
Long-horizon LLM agents can benefit from reusable skills, yet existing skill-based methods often rely on external skill generators during training or on persistent skill retrieval at inference, which increases engineering complexity, context length, and deployment latency. We propose Self-Internalizing Reinforcement learning with Intrinsic skills (SIRI), a three-phase framework that enables agents to discover, validate, and internalize skills without external skill generators or inference-time skill banks. SIRI first warms up the policy with GiGPO so that it acquires basic interaction ability and yields successful skill-free trajectories. It then performs self-skill mining: the current policy summarizes compact skills from its own successful plain rollouts and validates them through paired skill-augmented and skill-free rollouts. Finally, SIRI distills only the beneficial skill-guided action tokens into the plain policy, selecting them by trajectory-level utility and action-level advantage. At inference, the agent runs with the original prompt only. With Qwen2.5-7B-Instruct, SIRI improves on GiGPO from 0.908 to 0.930 on ALFWorld and from 0.728 to 0.813 on WebShop, outperforming prompt-based, RL-based, and memory-augmented baselines. Further analysis shows that our self-mining strategy achieves performance comparable to distillation from a closed-source large model. Our code is available at https://github.com/kirito618/SIRI.
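The validation and selective-distillation steps described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: utility is taken as the mean return gap between paired skill-augmented and skill-free rollouts, and a token is kept for distillation only when that utility and its own advantage both exceed a threshold. The names `PairedRollouts`, `trajectory_utility`, `distillation_mask`, and both threshold parameters are hypothetical and are not taken from the SIRI codebase.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class PairedRollouts:
    """Episode returns for one task, with and without a mined skill (hypothetical container)."""
    skill_returns: List[float]  # returns when the skill is added to the prompt
    plain_returns: List[float]  # returns with the original prompt only


def trajectory_utility(pair: PairedRollouts) -> float:
    """Assumed utility: mean return gap between skill-augmented and skill-free rollouts."""
    return mean(pair.skill_returns) - mean(pair.plain_returns)


def distillation_mask(
    utility: float,
    advantages: List[float],
    utility_threshold: float = 0.0,    # assumed hyperparameter
    advantage_threshold: float = 0.0,  # assumed hyperparameter
) -> List[bool]:
    """Mark which skill-guided action tokens would be distilled into the plain policy.

    A token is kept only if the skill helped at the trajectory level
    and the token itself has an above-threshold advantage.
    """
    if utility <= utility_threshold:
        # The skill did not help on this task: distill nothing.
        return [False] * len(advantages)
    return [a > advantage_threshold for a in advantages]


pair = PairedRollouts(skill_returns=[1.0, 1.0, 0.0, 1.0],
                      plain_returns=[0.0, 1.0, 0.0, 0.0])
utility = trajectory_utility(pair)
mask = distillation_mask(utility, advantages=[0.3, -0.1, 0.0, 0.2])
print(utility, mask)
```

In this sketch the mask would multiply a per-token distillation loss, so a skill that fails validation contributes no gradient and the plain policy is only pulled toward actions that the paired rollouts support.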