ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

20 results for “Multi-agent reinforcement learning”

CS papers only

Hybrid search: Keyword + semantic, ranked by combined score.ⓘ

Want pure semantic search? Try claim verification →

cs.ROcs.AIcs.LGRecentJun 1, 2026

Network Distributed Multi-Agent Reinforcement Learning for Consensus Control of Quadcopters

Youssef Mahran, Zeyad Gamal, Aamir Ahmad, Ayman El-Badawy

The paper proposes a Network Distributed Multi-Agent Reinforcement Learning (ND-MARL) framework that enables stable, scalable consensus control for large swarms of quadcopters using only local neighbo…

View →
cs.LGcs.AIRecentMay 28, 2026

Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics

Santiago Amaya-Corredor, Miguel Calvo-Fullana, Anders Jonsson

The paper proposes a scalable, distributed approach for constrained Multi-Agent Reinforcement Learning by using local consensus over dual variables to ensure global constraint satisfaction without cen…

View →
cs.AIRecentJun 1, 2026

Coordination Graphs for Constrained Multi-Agent Reinforcement Learning

Santiago Amaya-Corredor, Miguel Calvo-Fullana, Anders Jonsson

The paper introduces Coordination Graphs for Constrained Multi-Agent Reinforcement Learning (CG-CMARL), a scalable framework that decomposes complex joint action spaces into pairwise regions to handle…

View →
cs.HCcs.AIRecentMay 27, 2026

Learning to Assign Prediction Tasks to Agents with Capacity Constraints

Shang Wu, Saatvik Kher, Padhraic Smyth

This paper develops a policy-learning framework to optimally assign prediction tasks to multiple agents, considering individual agent expertise and capacity constraints, achieving systematic performan…

View →
cs.ROcs.AIRecentJun 2, 2026

Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation

Roohan Ahmed Khan, Yasheerah Yaqoot, Muhammad Ahsan Mustafa, Dzmitry Tsetserukou

The paper introduces AgenticRL, a self-refining reinforcement learning framework that uses a multimodal GPT agent to automatically design, refine, and deploy reward functions for complex UAV navigatio…

View →
cs.LGcs.AIRecentMay 29, 2026

Reinforcement Learning with Pairwise Preferences in Long-Term Decision Problems

Jonathan Colaço Carr, Prakash Panangaden, Doina Precup, Benjamin Van Roy

The paper introduces the Markov decision contest, a new framework for reinforcement learning using pairwise preferences, and proves that stationary Markov policies are optimal and solvable efficiently…

View →
cs.NIcs.LGEmpiricalRecentJun 11, 2026

Temporally Consistent Graph Q-Networks for Intelligent Network Control

Zacharias Veiksaar, Maxime Bouton

A novel multi-agent reinforcement learning algorithm, TC-GQN, is proposed for high-level control and orchestration of mobile networks, enabling energy savings while maintaining QoS.

View →
cs.GTcs.LGRecentJun 4, 2026

DNQ: Deep Nash Q-Network for Partially Observable n-Player Games

Qintong Xie, Edward Koh, Xavier Cadet, Peter Chin

The paper proposes DNQ, a scalable solver-in-the-loop framework for training agents in multi-turn simultaneous bidding games by leveraging pairwise payoff estimation to approximate complex equilibrium…

View →
cs.MAcs.CLcs.LGRecentJun 1, 2026

Multi-Agent Computer Use

Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried

The paper proposes Multi-Agent Computer Use (MACU) systems, which significantly improve performance on complex, long-horizon tasks by enabling parallel execution and dynamic task decomposition compare…

View →
cs.LGcs.AIRecentJun 1, 2026

Policy and World Modeling Co-Training for Language Agents

Ning Lu, Baijiong Lin, Shengcai Liu, Jiahao Wu +8 more

The paper proposes PaW, a co-training framework that uses standard RL rollouts to provide auxiliary world model supervision directly during policy training, significantly improving language agent perf…

View →
cs.CRcs.AIcs.LGRecentMar 17, 2026

Learning Communication Between Heterogeneous Agents in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence

Alex Popa, Adrian Taylor, Ranwa Al Mallah

This paper demonstrates that using a communication algorithm (CommFormer) with heterogeneous agents significantly improves the speed and performance of multi-agent reinforcement learning for autonomou…

View →
cs.AIRecentMay 27, 2026

TCP-MCP: Landscape-Guided Co-Evolution of Prompts and Communication Topologies for Multi-Agent Systems

Yi Ding, Zijie Xuan, Haowei Zhou, Zhenyu Ju +5 more

The paper proposes TCP-MCP, a co-evolution framework that jointly optimizes agent prompts and communication topologies to design highly efficient and effective multi-agent systems.

View →
cs.MAcs.AIcs.LGRecentMay 29, 2026

Dreaming Of Others: Latent Teammate Modeling In World Models For Multi-Agent Reinforcement Learning

Tomas Leroy-Stone

The paper proposes extending world models for multi-agent reinforcement learning by factorizing the latent state to explicitly model and predict the unobservable intentions and behaviors of teammates.

View →
cs.CRcs.AIRecentApr 9, 2026

Building Better Environments for Autonomous Cyber Defence

Chris Hicks, Elizabeth Bates, Shae McFadden, Isaac Symes Thompson +11 more

This paper synthesizes expert knowledge from a workshop to provide a comprehensive framework and best-practice guidelines for developing high-quality reinforcement learning environments for autonomous…

View →
cs.LGcs.AIRecentMay 29, 2026

Learning to Construct Practical Agentic Systems

Aditya Kumar, Zhihan Lei, Jerry Yan, Joshua W. Momo +5 more

The paper proposes a modular agent framework and novel learning methods to design and optimize practical, cost-effective, and controllable LLM-based agentic systems.

View →
cs.MAcs.AIRecentMay 29, 2026

Safe Equilibrium Policy Optimization for Strategic Agent Policies

Karthika Arumugam, Kiran Kumar Manku, Amit Dhanda

The paper introduces Safe Equilibrium Policy Optimization (σepo{}) to train language models for multi-agent strategic tasks, achieving improved safety and robustness across various game domains.

View →
cs.LGstat.MLRecentJun 1, 2026

Minimax-Optimal Policy Regret in Partially Observable Markov Games

Raman Arora

The paper develops an optimistic maximum-likelihood algorithm that achieves $ ilde{O}(\sqrt{T})$ policy regret for sequential decision-making in partially observable Markov games against adaptive oppo…

View →
cs.AIcs.LGRecentMay 27, 2026

Differentiable Belief-based Opponent Shaping

Aarav G Sane, Karthik Sivachandran, Rohan Paleja

The paper proposes D-BOS, a novel differentiable method that shapes opponent behavior by directly manipulating the opponent's inferred belief state, outperforming existing techniques in multi-agent ga…

View →
cs.AIRecentMay 27, 2026

TRACER: Turn-level Regret Matching with Inner Reinforcement Credit for Cooperative Multi-LLM Reasoning

Chusen Li, Zhou Liu, Shuigeng Zhou, Wentao Zhang

TRACER introduces a novel turn-level reinforcement framework that enables cooperative multi-LLM reasoning by separating decision-making into a regret-matching controller and a generation-credit layer.

View →