Papers similar to 2605.29042

Similar to 2605.29042 · 20 results

cs.LG · cs.AI · Recent · May 30, 2026

Beyond Independent Manipulation: Individual Fairness-aware Strategic Classification with Peer Imitation

Xinpeng Lv, Chunyuan Zheng, Yunxin Mao, Renzhe Xu +8 more

The paper introduces Individual Fairness-aware Strategic Classification (IFSC), a framework that models interdependent strategic manipulation where agents imitate nearby positively decided peers to ac…

cs.AI · Recent · May 27, 2026

Global Policy-Space Response Oracles for Two-Player Zero-Sum Games

Junyu Zhang, Feihong Yang, Jian Wang, Chao Wang +1 more

The paper introduces Global PSRO, a novel deep reinforcement learning framework that efficiently approximates Nash equilibria in large two-player zero-sum games by intelligently expanding the strategy…

cs.GT · cs.LG · Recent · Jun 4, 2026

DNQ: Deep Nash Q-Network for Partially Observable n-Player Games

Qintong Xie, Edward Koh, Xavier Cadet, Peter Chin

The paper proposes DNQ, a scalable solver-in-the-loop framework for training agents in multi-turn simultaneous bidding games by leveraging pairwise payoff estimation to approximate complex equilibrium…

cs.AI · Recent · May 30, 2026

Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs

Jiakang Li, Guanyu Zhu, Can Jin, Chenxi Huang +7 more

The paper introduces Latent Reward Steering (LRS), an adaptive inference-time framework that implicitly improves the reasoning ability of LLMs by guiding the model's internal latent states based on a…

cs.AI · cs.LG · Recent · May 30, 2026

Regularized Offline Policy Optimization with Posterior Hybrid Bayesian Belief

Hongqiang Lin, Pengfei Wang, Nenggan Zheng

The paper introduces Posterior Hybrid Bayesian Belief (PhyB), a novel framework that reformulates policy optimization in Bayesian Offline RL by approximating expectations as a convex combination over…

cs.AI · Recent · May 28, 2026

MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs

Kevin Wang, Anna Thöni, Benjamin Kempinski, Bobby Cheng +49 more

The paper introduces Mindgames, a comprehensive multi-game arena for evaluating LLM agents' sustained social and strategic reasoning, demonstrating that current evaluations are limited by structural s…

cs.AI · cs.CL · cs.LG · Recent · May 28, 2026

When Should Models Change Their Minds? Contextual Belief Management in Large Language Models

Haoming Xu, Weihong Xu, Zongrui Li, Mengru Wang +5 more

The paper introduces Contextual Belief Management (CBM) to address how LLMs should manage accumulating information over long interactions, showing that reinforcement learning significantly improves be…

cs.MA · cs.AI · Recent · May 29, 2026

Safe Equilibrium Policy Optimization for Strategic Agent Policies

Karthika Arumugam, Kiran Kumar Manku, Amit Dhanda

The paper introduces Safe Equilibrium Policy Optimization (σepo) to train language models for multi-agent strategic tasks, achieving improved safety and robustness across various game domains.

cs.LG · stat.ML · Recent · Jun 1, 2026

Minimax-Optimal Policy Regret in Partially Observable Markov Games

Raman Arora

The paper develops an optimistic maximum-likelihood algorithm that achieves $\tilde{O}(\sqrt{T})$ policy regret for sequential decision-making in partially observable Markov games against adaptive oppo…

cs.MA · cs.AI · cs.LG · Recent · May 29, 2026

Dreaming Of Others: Latent Teammate Modeling In World Models For Multi-Agent Reinforcement Learning

Tomas Leroy-Stone

The paper proposes extending world models for multi-agent reinforcement learning by factorizing the latent state to explicitly model and predict the unobservable intentions and behaviors of teammates.

cs.AI · Recent · Jun 1, 2026

RoleCDE: Benchmarking and Mitigating Role-Alignment Trade-offs in Role-Playing Agents

Huayi Lai, Shichao Song, Simin Niu, Hanyu Wang +4 more

The paper introduces RoleCDE, a novel benchmark that evaluates role-playing agents' ability to resolve conflicts between role-specific values and general alignment constraints, revealing a 'Role Value…

cs.LG · cs.AI · Recent · May 28, 2026

Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics

Santiago Amaya-Corredor, Miguel Calvo-Fullana, Anders Jonsson

The paper proposes a scalable, distributed approach for constrained Multi-Agent Reinforcement Learning by using local consensus over dual variables to ensure global constraint satisfaction without cen…

cs.LG · cs.AI · Recent · May 29, 2026

Reinforcement Learning with Pairwise Preferences in Long-Term Decision Problems

Jonathan Colaço Carr, Prakash Panangaden, Doina Precup, Benjamin Van Roy

The paper introduces the Markov decision contest, a new framework for reinforcement learning using pairwise preferences, and proves that stationary Markov policies are optimal and solvable efficiently…

cs.CR · cs.AI · cs.RO · Recent · Mar 24, 2026

TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches

Zhengxian Huang, Wenjun Zhu, Haoxuan Qiu, Xiaoyu Ji +1 more

This paper introduces TRAP, an adversarial attack that demonstrates how physical patches can hijack the Chain-of-Thought (CoT) reasoning process in Vision-Language-Action (VLA) models, forcing them to…

cs.AI · Recent · May 28, 2026

PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?

Dongdong Hua, Yifei Sun, Renhong Huang, Feng Gao +2 more

The paper introduces PTCG-Bench, a new benchmark using the Pokémon TCG to evaluate LLM agents' strategic decision-making and ability to self-evolve, finding that sustained self-evolution remains chall…

cs.AI · Recent · Jun 3, 2026

What Type of Inference is Active Inference?

Wouter W. L. Nuijten, Mykola Lukashchuk, Thijs van de Laar, Bert de Vries

This paper provides a detailed message-passing scheme for EFE-based planning and clarifies the corrections needed for cross-entropy planning and full EFE-based planning.

cs.CR · cs.LG · cs.MA · Recent · Apr 6, 2026

Explainable Autonomous Cyber Defense using Adversarial Multi-Agent Reinforcement Learning

Yiyao Zhang, Diksha Goel, Hussain Ahmad

The paper introduces C-MADF, a causally constrained multi-agent framework that significantly reduces false positives in autonomous cyber defense by restricting response actions to structurally consist…

cs.AI · Recent · May 29, 2026

Choosing the Lens: Strategic Perspective Activation in Context-Dependent Argumentation

Albert Sadowski, Jarosław A. Chudziak

The paper introduces Context-Dependent Argumentation Frameworks (CDAFs) to model how an agent strategically manipulates the success of arguments by choosing the external evaluation context.

cs.CL · cs.AI · Recent · May 29, 2026

PatchWorld: Gradient-Free Optimization of Executable World Models

Jiaxin Bai, Yue Guo, Yifei Dong, Jiaxuan Xiong +12 more

PatchWorld introduces a gradient-free framework to create executable Python world models from offline trajectories, achieving high planning scores by inducing symbolic belief-state programs.

cs.CL · cs.AI · Recent · Jun 1, 2026

A Primer in Post-Training Reasoning Data: What We Know About How It Works

Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun +2 more

This paper synthesizes over 150 scattered studies and reports to provide the first comprehensive primer on post-training reasoning data, organizing the field around data objects, utility, construction…
