~ similar to 2606.03963· 17 results
This paper investigates the robustness of world models in vision-based quadrotor navigation and identifies factors governing their quality.
The paper proposes an agentic pipeline for spatial reasoning by introducing a dynamic cognitive map and Spatial Assertion Codes (SAC), achieving state-of-the-art performance on complex spatial tasks.
The paper proposes a Network Distributed Multi-Agent Reinforcement Learning (ND-MARL) framework that enables stable, scalable consensus control for large swarms of quadcopters using only local neighbo…
The paper introduces a logic-driven framework using a neural certificate function to rigorously evaluate and benchmark the generalization capabilities of reinforcement learning algorithms on unseen ta…
The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without r…
Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang +9 more
The paper proposes Skill-RM, a unified framework that treats reward modeling as an agentic task to consistently integrate diverse evaluation criteria, achieving superior performance over traditional m…
Rui Yang, Qianhui Wu, Yuxi Chen, Hao Bai +6 more
The paper introduces OpenWebRL, an open framework that enables training visual web agents using online multi-turn Reinforcement Learning directly on live websites, achieving state-of-the-art performan…
Zhongyu He, Yuanfan Li, Fei Huang, Tianyu Chen +8 more
SIRI introduces a self-internalizing reinforcement learning framework that allows LLM agents to autonomously discover and integrate reusable skills directly into their core policy, significantly impro…
Yuchen Liu, Yingjie Feng, Lixiong Qin, Jiasi Chen +4 more
The paper introduces Graph-Distance Contribution Reward (GDCR) and Step Advantage Policy Optimization (SAPO) to provide fine-grained, step-level credit assignment for agentic search by modeling world…
Ning Lu, Baijiong Lin, Shengcai Liu, Jiahao Wu +8 more
The paper proposes PaW, a co-training framework that uses standard RL rollouts to provide auxiliary world model supervision directly during policy training, significantly improving language agent perf…
Bin Chen, Xinye Liao, Yiming Liu, Xin Liao +1 more
The paper proposes Credit-Attenuated Privileged Feedback (CAPF), a training-time mechanism that uses verifier-side information to guide LLM search agents, significantly improving their performance on…
This paper demonstrates that transformer-based policies can provably learn complex tree search mechanisms, such as depth-first search, purely through reinforcement learning in a stochastic environment…
Hongru Hou, Tiehua Mei, Denghui Geng, Jinhui Huang +4 more
The paper proposes ProRL, an effective Reinforcement Learning framework that rectifies gradient estimation deficiencies to optimize proactive recommendation paths, significantly outperforming existing…
Kangrui Wang, Linjie Li, Zhengyuan Yang, Shiqi Chen +6 more
The paper addresses the challenge of multi-turn view planning for VLMs by proposing an iterative framework that uses self-exploration and view graph distillation, significantly improving planning perf…
The paper introduces Prompted Policy Optimization (PromptPO), an LLM-based method that successfully optimizes policies for various sequential RL tasks, demonstrating that LLMs can replace classical RL…
SkillC introduces a Contrastive Skill Credit Assignment (CSCA) framework to enable LLM agents to autonomously internalize skills during training, significantly outperforming existing methods without r…
Shizuo Tian, Xiaohong Weng, Rui Kong, Yuxuan Chen +8 more
The JAMEL framework addresses the challenge of effective exploration in open-ended environments by jointly training agent memory and exploration policies using natural, novelty-driven signals.