~ similar to 2605.30664· 20 results
This paper introduces the first LLM-generated, domain-independent heuristics for symbolic AI planning, using evolutionary search to surpass the performance of hand-engineered state-of-the-art methods.
The paper introduces a novel LLM-driven evolutionary framework to synthesize admissible, domain-specific pattern generators, enabling optimal classical planning with high performance and interpretabil…
Yiming Ren, Yiran Xu, Zicheng Lin, Chufan Shi +7 more
The paper proposes S2L-PO, a framework that uses smaller, naturally diverse models as structured explorers to enhance the policy-level diversity and performance of larger language models during traini…
This paper demonstrates that a complex deep reinforcement learning policy for power grid control can be successfully distilled into a lightweight, auditable decision tree and random forest surrogate t…
The paper introduces ReMax, a novel objective function that naturally encourages stochastic exploration in policy gradient reinforcement learning by evaluating expected maximum returns over multiple s…
The paper introduces LinTree, a method that explicitly structures the search history of LLM reasoning traces using parent pointers, significantly improving task performance and search efficiency compa…
The paper introduces GONDOR, a memory-efficient extension of Greedy Best-First Search (GBFS) that enables search continuation under strict memory constraints by periodically compressing the search tre…
Junyu Zhang, Feihong Yang, Jian Wang, Chao Wang +1 more
The paper introduces Global PSRO, a novel deep reinforcement learning framework that efficiently approximates Nash equilibria in large two-player zero-sum games by intelligently expanding the strategy…
Yike Zhao, Onno Eberhard, Malek Khammassi, Ali H. Sayed +1 more
This paper theoretically justifies the strong performance of linear recurrent neural networks as memory units in partially observable reinforcement learning by constructing specific linear filters tha…
The paper introduces an outer-loop AI agent that autonomously redesigns LLM policy-synthesis pipelines for multi-agent social dilemmas, demonstrating that the optimal pipeline structure depends critic…
Sixue Xing, Haoyu He, Kerui Wu, Zhuo Yang +3 more
The paper proposes BaSE, a multi-armed bandit approach, to optimally allocate a fixed budget of LLM calls across parallel evolutionary search trajectories, significantly improving mean fitness and rel…
Yi Wang, Haojie Lu, Zhaofan Zhang, Li Chen +1 more
This paper introduces MCTS-Guided Group Relative Policy Optimization (M-GRPO) to enhance LLM spatial reasoning by improving the decomposition of complex tasks into optimal sub-tasks.
This paper unifies the fragmented field of Tree-of-Thoughts (ToT) reasoning by mapping LLM-based search processes onto a formal taxonomy derived from classical heuristic search theory.
Yunbo Tang, Chengyi Yang, Shiyu Liu, Zhishang Xiang +3 more
The paper proposes SAAS, a novel RL framework that equips LLM agents with self-awareness to precisely regulate search behavior, significantly mitigating costly over-search without sacrificing accuracy…
Pengcheng Jiang, Zhiyi Shi, Kelly Hong, Xueqiang Xu +4 more
The paper introduces Harness-1, a search agent that separates semantic decision-making from state management by using a stateful search harness, achieving state-of-the-art performance across diverse r…
This paper demonstrates that transformer-based policies can provably learn complex tree search mechanisms, such as depth-first search, purely through reinforcement learning in a stochastic environment…
Xinyu Liu, Darryl Cherian Jacob, Yang Zhou, Jindong Wang +1 more
The OISD framework improves language model reasoning by distilling on-policy predictive signals from the final output layer to intermediate representations, leading to substantial improvements on math…
The paper introduces the Markov decision contest, a new framework for reinforcement learning using pairwise preferences, and proves that stationary Markov policies are optimal and solvable efficiently…
The paper proposes the Policy-Execution-Authorization (PEA) architecture, a separation-of-powers system designed to structurally enforce goal integrity in AI agents, moving safety from a probabilistic…
The paper proposes a feasible-reward-set framework to perform Inverse Reinforcement Learning (IRL) when data comes from multiple imperfect demonstrators, providing theoretical guarantees and practical…