20 results for “Understanding of multi-agent reinforce learning”
CS papers only. Hybrid search: keyword + semantic, ranked by combined score.
This paper develops a policy-learning framework to optimally assign prediction tasks to multiple agents, considering individual agent expertise and capacity constraints, achieving systematic performan…
The paper proposes a Network Distributed Multi-Agent Reinforcement Learning (ND-MARL) framework that enables stable, scalable consensus control for large swarms of quadcopters using only local neighbo…
The paper proposes a scalable, distributed approach for constrained Multi-Agent Reinforcement Learning by using local consensus over dual variables to ensure global constraint satisfaction without cen…
The paper introduces AgenticRL, a self-refining reinforcement learning framework that uses a multimodal GPT agent to automatically design, refine, and deploy reward functions for complex UAV navigatio…
The paper proposes D-BOS, a novel differentiable method that shapes opponent behavior by directly manipulating the opponent's inferred belief state, outperforming existing techniques in multi-agent ga…
The paper introduces Coordination Graphs for Constrained Multi-Agent Reinforcement Learning (CG-CMARL), a scalable framework that decomposes complex joint action spaces into pairwise regions to handle…
TRACER introduces a novel turn-level reinforcement framework that enables cooperative multi-LLM reasoning by separating decision-making into a regret-matching controller and a generation-credit layer.
This paper investigates how individual agent biases amplify system-wide unfairness in multi-agent systems, demonstrating that uniform exposure to bias can elevate overall bias beyond the sum of indivi…
This paper demonstrates that using a communication algorithm (CommFormer) with heterogeneous agents significantly improves the speed and performance of multi-agent reinforcement learning for autonomou…
Tao Chen, Gangwei Jiang, Pengyu Cheng, Siyuan Huang +9 more
The paper proposes Skill-RM, a unified framework that treats reward modeling as an agentic task to consistently integrate diverse evaluation criteria, achieving superior performance over traditional m…
This paper investigates the scaling behavior of homogeneous LLM-driven Multi-Agent Systems (MAS) and finds that performance exhibits diminishing returns due to coordination overhead, rather than scali…
The paper introduces the Markov decision contest, a new framework for reinforcement learning using pairwise preferences, and proves that stationary Markov policies are optimal and solvable efficiently…
Wenwu Li, Yuran Song, Mingze Zhao, Bo Jin +1 more
The paper proposes a novel temporal and structural credit assignment framework to efficiently optimize multi-agent LLM systems by decomposing the error signal and using targeted, discrete gradient upd…
Ning Lu, Baijiong Lin, Shengcai Liu, Jiahao Wu +8 more
The paper proposes PaW, a co-training framework that uses standard RL rollouts to provide auxiliary world model supervision directly during policy training, significantly improving language agent perf…
The paper proposes extending world models for multi-agent reinforcement learning by factorizing the latent state to explicitly model and predict the unobservable intentions and behaviors of teammates.
The paper introduces C-MADF, a causally constrained multi-agent framework that significantly reduces false positives in autonomous cyber defense by restricting response actions to structurally consist…
Tong Liu, Cheng Qian, Matej Cief, Yuan He +3 more
This paper analyzes tool-calling in LLM agents, demonstrating that evaluation results are highly sensitive to implementation details and proposing new techniques to significantly improve the efficienc…
This paper analyzes Best-of-$N$ preference data, deriving explicit reward targets for independent-reference variants and establishing design principles for choosing $N$ and the base distribution to op…
This paper proposes a method for ensuring safety in multi-agent reinforcement learning through decentralized execution, using a shared global specification and a non-stationary multi-armed bandit.