~ similar to 2606.06480· 20 results
Junyu Zhang, Feihong Yang, Jian Wang, Chao Wang +1 more
The paper introduces Global PSRO, a novel deep reinforcement learning framework that efficiently approximates Nash equilibria in large two-player zero-sum games by intelligently expanding the strategy…
The paper introduces Safe Equilibrium Policy Optimization (σepo{}) to train language models for multi-agent strategic tasks, achieving improved safety and robustness across various game domains.
The paper proposes a scalable, distributed approach for constrained Multi-Agent Reinforcement Learning by using local consensus over dual variables to ensure global constraint satisfaction without cen…
The paper develops an optimistic maximum-likelihood algorithm that achieves $ ilde{O}(\sqrt{T})$ policy regret for sequential decision-making in partially observable Markov games against adaptive oppo…
The paper proposes a Network Distributed Multi-Agent Reinforcement Learning (ND-MARL) framework that enables stable, scalable consensus control for large swarms of quadcopters using only local neighbo…
This paper introduces Repeated Policy Regret (RP-Regret), a novel game-theoretic metric for analyzing regret in repeated games with adaptive opponents, and proposes algorithms to minimize it.
The paper introduces the Markov decision contest, a new framework for reinforcement learning using pairwise preferences, and proves that stationary Markov policies are optimal and solvable efficiently…
The paper proposes D-BOS, a novel differentiable method that shapes opponent behavior by directly manipulating the opponent's inferred belief state, outperforming existing techniques in multi-agent ga…
The paper introduces the Generative Response Model (GRM) to improve constrained auto-bidding by predicting future traffic and cost/value curves from a single bid multiplier, allowing for an exact, lig…
This paper develops a policy-learning framework to optimally assign prediction tasks to multiple agents, considering individual agent expertise and capacity constraints, achieving systematic performan…
DriftQL introduces a novel, efficient offline RL method that combines a drift-based behavioral regularizer with critic-driven policy improvement, achieving state-of-the-art performance while maintaini…
Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao +1 more
The paper proposes MERIT, a dual-level, multi-horizon memory retrieval framework that significantly improves the performance of interactive text-to-SQL agents by providing both global and local memory…
The paper introduces Coordination Graphs for Constrained Multi-Agent Reinforcement Learning (CG-CMARL), a scalable framework that decomposes complex joint action spaces into pairwise regions to handle…
The paper proposes a unified framework that maps the geometry of games to effective solver dynamics, suggesting that solvability is governed by continuous structural properties rather than discrete cl…
The study extends cooperative bias testing across diverse, next-generation LLMs, finding that provider identity is a stronger predictor of cooperative equilibrium than model generation, and that noise…
The paper proposes a novel framework combining behavior-invariant task representation learning and a Transformer-based world model to achieve robust generalization in offline meta-reinforcement learni…
This paper studies a dynamic assortment problem on a two-sided service platform with incomplete information and heterogeneous customers, and develops a data-driven algorithm to learn parameters and op…
This paper studies a dynamic assortment problem on a two-sided service platform with incomplete information and heterogeneous customers, and develops a data-driven algorithm to learn parameters and op…
Rui Yang, Qianhui Wu, Yuxi Chen, Hao Bai +6 more
The paper introduces OpenWebRL, an open framework that enables training visual web agents using online multi-turn Reinforcement Learning directly on live websites, achieving state-of-the-art performan…
Dongdong Hua, Yifei Sun, Renhong Huang, Feng Gao +2 more
The paper introduces PTCG-Bench, a new benchmark using the Pokémon TCG to evaluate LLM agents' strategic decision-making and ability to self-evolve, finding that sustained self-evolution remains chall…