20 results for “Agentic Reinforcement Learning”
CS papers onlyHybrid search: Keyword + semantic, ranked by combined score.ⓘ
Want pure semantic search? Try claim verification →
Xucong Wang, Ziyu Ma, Yong Wang, Yuxiang Ji +4 more
This paper proposes a new method for agentic Reinforcement Learning called Agentic Procedural Policy Optimization (APPO) that improves tool-use capabilities by assigning credit to fine-grained decisio…
Liuji Chen, Dianxing Tang, Xing Shi, Dingshuo Chen +3 more
The paper proposes EAPO, a framework that enables agentic models to learn when to forgo using external tools, thereby mitigating tool abuse while maintaining high reasoning accuracy.
This paper demonstrates that transformer-based policies can provably learn complex tree search mechanisms, such as depth-first search, purely through reinforcement learning in a stochastic environment…
Ning Lu, Baijiong Lin, Shengcai Liu, Jiahao Wu +8 more
The paper proposes PaW, a co-training framework that uses standard RL rollouts to provide auxiliary world model supervision directly during policy training, significantly improving language agent perf…
Zhikun Xu, Yu Feng, Jacob Dineen, Taiwei Shi +2 more
The paper proposes ReuseRL, a method that improves agent generalization in Reinforcement Learning by enforcing structural compressibility of successful agent trajectories into reusable skills.
Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang +9 more
The paper proposes Skill-RM, a unified framework that treats reward modeling as an agentic task to consistently integrate diverse evaluation criteria, achieving superior performance over traditional m…
Zelin He, Haotian Lin, Boran Han, Wei Zhu +5 more
ReSkill is an RL-in-the-loop framework that reconciles skill creation and policy optimization by automatically creating, testing, and refining modular skills alongside the agent's policy learning, lea…
The paper introduces AgenticRL, a self-refining reinforcement learning framework that uses a multimodal GPT agent to automatically design, refine, and deploy reward functions for complex UAV navigatio…
Aditya Kumar, Zhihan Lei, Jerry Yan, Joshua W. Momo +5 more
The paper proposes a modular agent framework and novel learning methods to design and optimize practical, cost-effective, and controllable LLM-based agentic systems.
This paper develops a policy-learning framework to optimally assign prediction tasks to multiple agents, considering individual agent expertise and capacity constraints, achieving systematic performan…
Tong Liu, Cheng Qian, Matej Cief, Yuan He +3 more
This paper analyzes tool-calling in LLM agents, demonstrating that evaluation results are highly sensitive to implementation details and proposing new techniques to significantly improve the efficienc…
The paper introduces the Markov decision contest, a new framework for reinforcement learning using pairwise preferences, and proves that stationary Markov policies are optimal and solvable efficiently…
Chris Hicks, Elizabeth Bates, Shae McFadden, Isaac Symes Thompson +11 more
This paper synthesizes expert knowledge from a workshop to provide a comprehensive framework and best-practice guidelines for developing high-quality reinforcement learning environments for autonomous…
The paper proposes a scalable, distributed approach for constrained Multi-Agent Reinforcement Learning by using local consensus over dual variables to ensure global constraint satisfaction without cen…
Dongdong Hua, Yifei Sun, Renhong Huang, Feng Gao +2 more
The paper introduces PTCG-Bench, a new benchmark using the Pokémon TCG to evaluate LLM agents' strategic decision-making and ability to self-evolve, finding that sustained self-evolution remains chall…
The paper proposes DIBS, a decoupled behavioral cloning approach that stabilizes inductive generalization in RL by separating task-specific policy learning from the evolution function, leading to impr…
Yunbo Tang, Chengyi Yang, Shiyu Liu, Zhishang Xiang +3 more
The paper proposes SAAS, a novel RL framework that equips LLM agents with self-awareness to precisely regulate search behavior, significantly mitigating costly over-search without sacrificing accuracy…
The paper proposes a Network Distributed Multi-Agent Reinforcement Learning (ND-MARL) framework that enables stable, scalable consensus control for large swarms of quadcopters using only local neighbo…
The paper develops an optimistic maximum-likelihood algorithm that achieves $ ilde{O}(\sqrt{T})$ policy regret for sequential decision-making in partially observable Markov games against adaptive oppo…
The paper proposes extending world models for multi-agent reinforcement learning by factorizing the latent state to explicitly model and predict the unobservable intentions and behaviors of teammates.