ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

20 results for “Understanding of multi-agent reinforce learning”

CS papers only

Hybrid search: Keyword + semantic, ranked by combined score.ⓘ

Want pure semantic search? Try claim verification →

cs.HCcs.AIRecentMay 27, 2026

Learning to Assign Prediction Tasks to Agents with Capacity Constraints

Shang Wu, Saatvik Kher, Padhraic Smyth

This paper develops a policy-learning framework to optimally assign prediction tasks to multiple agents, considering individual agent expertise and capacity constraints, achieving systematic performan…

View →
cs.ROcs.AIcs.LGRecentJun 1, 2026

Network Distributed Multi-Agent Reinforcement Learning for Consensus Control of Quadcopters

Youssef Mahran, Zeyad Gamal, Aamir Ahmad, Ayman El-Badawy

The paper proposes a Network Distributed Multi-Agent Reinforcement Learning (ND-MARL) framework that enables stable, scalable consensus control for large swarms of quadcopters using only local neighbo…

View →
cs.LGcs.AIRecentMay 28, 2026

Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics

Santiago Amaya-Corredor, Miguel Calvo-Fullana, Anders Jonsson

The paper proposes a scalable, distributed approach for constrained Multi-Agent Reinforcement Learning by using local consensus over dual variables to ensure global constraint satisfaction without cen…

View →
cs.ROcs.AIRecentJun 2, 2026

Self-Refining Agentic Reinforcement Learning for Vision-Conditioned UAV Navigation

Roohan Ahmed Khan, Yasheerah Yaqoot, Muhammad Ahsan Mustafa, Dzmitry Tsetserukou

The paper introduces AgenticRL, a self-refining reinforcement learning framework that uses a multimodal GPT agent to automatically design, refine, and deploy reward functions for complex UAV navigatio…

View →
cs.AIcs.LGRecentMay 27, 2026

Differentiable Belief-based Opponent Shaping

Aarav G Sane, Karthik Sivachandran, Rohan Paleja

The paper proposes D-BOS, a novel differentiable method that shapes opponent behavior by directly manipulating the opponent's inferred belief state, outperforming existing techniques in multi-agent ga…

View →
cs.AIRecentJun 1, 2026

Coordination Graphs for Constrained Multi-Agent Reinforcement Learning

Santiago Amaya-Corredor, Miguel Calvo-Fullana, Anders Jonsson

The paper introduces Coordination Graphs for Constrained Multi-Agent Reinforcement Learning (CG-CMARL), a scalable framework that decomposes complex joint action spaces into pairwise regions to handle…

View →
cs.AIRecentMay 27, 2026

TRACER: Turn-level Regret Matching with Inner Reinforcement Credit for Cooperative Multi-LLM Reasoning

Chusen Li, Zhou Liu, Shuigeng Zhou, Wentao Zhang

TRACER introduces a novel turn-level reinforcement framework that enables cooperative multi-LLM reasoning by separating decision-making into a regret-matching controller and a generation-credit layer.

View →
cs.AIRecentMay 27, 2026

Examining Agents' Bias Amplification versus Suppression in Multi-Agent Systems

Zejian Eric Wu, Zhongyi Jiang, Yuan Zhuang, Paul Jen-Hwa Hu

This paper investigates how individual agent biases amplify system-wide unfairness in multi-agent systems, demonstrating that uniform exposure to bias can elevate overall bias beyond the sum of indivi…

View →
cs.CRcs.AIcs.LGRecentMar 17, 2026

Learning Communication Between Heterogeneous Agents in Multi-Agent Reinforcement Learning for Autonomous Cyber Defence

Alex Popa, Adrian Taylor, Ranwa Al Mallah

This paper demonstrates that using a communication algorithm (CommFormer) with heterogeneous agents significantly improves the speed and performance of multi-agent reinforcement learning for autonomou…

View →
cs.LGcs.CLRecentJun 2, 2026

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang +9 more

The paper proposes Skill-RM, a unified framework that treats reward modeling as an agentic task to consistently integrate diverse evaluation criteria, achieving superior performance over traditional m…

View →
cs.MAcs.AIcs.CYRecentMay 30, 2026

Scaling Behavior of Single LLM-Driven Multi-Agent Systems

Jialing Li, Zhouhong Gu, Yin Cai, Hongwei Feng

This paper investigates the scaling behavior of homogeneous LLM-driven Multi-Agent Systems (MAS) and finds that performance exhibits diminishing returns due to coordination overhead, rather than scali…

View →
cs.LGcs.AIRecentMay 29, 2026

Reinforcement Learning with Pairwise Preferences in Long-Term Decision Problems

Jonathan Colaço Carr, Prakash Panangaden, Doina Precup, Benjamin Van Roy

The paper introduces the Markov decision contest, a new framework for reinforcement learning using pairwise preferences, and proves that stationary Markov policies are optimal and solvable efficiently…

View →
cs.MAcs.AIRecentMay 28, 2026

Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization

Wenwu Li, Yuran Song, Mingze Zhao, Bo Jin +1 more

The paper proposes a novel temporal and structural credit assignment framework to efficiently optimize multi-agent LLM systems by decomposing the error signal and using targeted, discrete gradient upd…

View →
cs.LGcs.AIRecentJun 1, 2026

Policy and World Modeling Co-Training for Language Agents

Ning Lu, Baijiong Lin, Shengcai Liu, Jiahao Wu +8 more

The paper proposes PaW, a co-training framework that uses standard RL rollouts to provide auxiliary world model supervision directly during policy training, significantly improving language agent perf…

View →
cs.MAcs.AIcs.LGRecentMay 29, 2026

Dreaming Of Others: Latent Teammate Modeling In World Models For Multi-Agent Reinforcement Learning

Tomas Leroy-Stone

The paper proposes extending world models for multi-agent reinforcement learning by factorizing the latent state to explicitly model and predict the unobservable intentions and behaviors of teammates.

View →
cs.CRcs.LGcs.MARecentApr 6, 2026

Explainable Autonomous Cyber Defense using Adversarial Multi-Agent Reinforcement Learning

Yiyao Zhang, Diksha Goel, Hussain Ahmad

The paper introduces C-MADF, a causally constrained multi-agent framework that significantly reduces false positives in autonomous cyber defense by restricting response actions to structurally consist…

View →
cs.LGcs.AIRecentMay 28, 2026

On Effectiveness and Efficiency of Agentic Tool-calling and RL Training

Tong Liu, Cheng Qian, Matej Cief, Yuan He +3 more

This paper analyzes tool-calling in LLM agents, demonstrating that evaluation results are highly sensitive to implementation details and proposing new techniques to significantly improve the efficienc…

View →
stat.MLcs.AIcs.LGRecentMay 28, 2026

Reward Learning from Best-of-$N$ Preference Data: Targets, Tradeoffs, and Design Principles

Rattana Pukdee, Maria-Florina Balcan, Pradeep Ravikumar

This paper analyzes Best-of-$N$ preference data, deriving explicit reward targets for independent-reference variants and establishing design principles for choosing $N$ and the base distribution to op…

View →
cs.LGcs.MAEmpiricalRecentJun 12, 2026

Contract-Based Compositional Shielding for Safe Multi-Agent Reinforcement Learning

Omar Adalat, Edwin Hamel-De le Court, Francesco Belardinelli

This paper proposes a method for ensuring safety in multi-agent reinforce learning through decentralized execution, using a shared global specification and a non-stationary multi-armed bandit.

View →