~ similar to 2605.31444· 19 results
The paper proposes a flexible meta-programming framework to declaratively operationalize and explore varied temporal logics, such as TEL, MEL, and DEL, within standard Answer Set Programming systems.
Yi Wang, Haojie Lu, Zhaofan Zhang, Li Chen +1 more
This paper introduces MCTS-Guided Group Relative Policy Optimization (M-GRPO) to enhance LLM spatial reasoning by improving the decomposition of complex tasks into optimal sub-tasks.
Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun +2 more
This paper synthesizes over 150 scattered studies and reports to provide the first comprehensive primer on post-training reasoning data, organizing the field around data objects, utility, construction…
Rongzhi Zhang, Rui Feng, Zhihan Zhang, Jingfeng Yang +7 more
QUBRIC introduces a co-design framework that simultaneously optimizes queries and rubrics, overcoming the bottleneck of vague rubrics derived from open-ended questions, leading to significant gains in…
Tao Feng, Chongrui Ye, Tianyang Luo, Jingjun Xu +7 more
ExpGraph is a model-agnostic framework that uses a self-evolving experience graph to enable LLM agents to reuse past successful strategies and failure lessons, significantly improving performance acro…
The paper introduces a learned 'rerooter' mechanism to improve subgoal-based policy tree search, allowing scalable search in complex environments without the overhead of explicit subgoal generation.
Junyu Zhang, Feihong Yang, Jian Wang, Chao Wang +1 more
The paper introduces Global PSRO, a novel deep reinforcement learning framework that efficiently approximates Nash equilibria in large two-player zero-sum games by intelligently expanding the strategy…
The paper introduces a novel LLM-driven evolutionary framework to synthesize admissible, domain-specific pattern generators, enabling optimal classical planning with high performance and interpretabil…
The paper introduces a logic-driven framework using a neural certificate function to rigorously evaluate and benchmark the generalization capabilities of reinforcement learning algorithms on unseen ta…
Ning Lu, Baijiong Lin, Shengcai Liu, Jiahao Wu +8 more
The paper proposes PaW, a co-training framework that uses standard RL rollouts to provide auxiliary world model supervision directly during policy training, significantly improving language agent perf…
This paper introduces the first LLM-generated, domain-independent heuristics for symbolic AI planning, using evolutionary search to surpass the performance of hand-engineered state-of-the-art methods.
The paper proposes DIBS, a decoupled behavioral cloning approach that stabilizes inductive generalization in RL by separating task-specific policy learning from the evolution function, leading to impr…
The paper introduces a novel shielding framework for Robust MDPs (RMDPs) that guarantees safety under worst-case transition probabilities, enabling safe reinforcement learning even when transition dyn…
The paper proposes a feasible-reward-set framework to perform Inverse Reinforcement Learning (IRL) when data comes from multiple imperfect demonstrators, providing theoretical guarantees and practical…
Rui Yang, Qianhui Wu, Yuxi Chen, Hao Bai +6 more
The paper introduces OpenWebRL, an open framework that enables training visual web agents using online multi-turn Reinforcement Learning directly on live websites, achieving state-of-the-art performan…
Gaetan Narozniak, Gérard Biau, Rémi Munos, Ahmad Rammal +1 more
The paper introduces Feedback Distillation, a novel training method that uses a language model's privileged feedback to provide token-level supervision, significantly improving complex reasoning tasks…
Chenhao Bai, Liqin Lu, Kaijun Wang, Hui Chen +4 more
This paper studies how to scale robust robot policies by expanding physical domains in a recoverable way.
Xinyu Liu, Darryl Cherian Jacob, Yang Zhou, Jindong Wang +1 more
The OISD framework improves language model reasoning by distilling on-policy predictive signals from the final output layer to intermediate representations, leading to substantial improvements on math…
Chris Hicks, Elizabeth Bates, Shae McFadden, Isaac Symes Thompson +11 more
This paper synthesizes expert knowledge from a workshop to provide a comprehensive framework and best-practice guidelines for developing high-quality reinforcement learning environments for autonomous…