ArXivCSExplorer
Built by Teycir Ben Soltane

Papers similar to 2605.29919 (20 results)

cs.GT · cs.LG · Jun 4, 2026

DNQ: Deep Nash Q-Network for Partially Observable n-Player Games

Qintong Xie, Edward Koh, Xavier Cadet, Peter Chin

The paper proposes DNQ, a scalable solver-in-the-loop framework for training agents in multi-turn simultaneous bidding games by leveraging pairwise payoff estimation to approximate complex equilibrium…

cs.AI · May 27, 2026

Global Policy-Space Response Oracles for Two-Player Zero-Sum Games

Junyu Zhang, Feihong Yang, Jian Wang, Chao Wang +1 more

The paper introduces Global PSRO, a novel deep reinforcement learning framework that efficiently approximates Nash equilibria in large two-player zero-sum games by intelligently expanding the strategy…

cs.AI · May 27, 2026

Deconstructing Spatial Complexity: Hierarchical Decomposition for LLM Spatial Reasoning

Yi Wang, Haojie Lu, Zhaofan Zhang, Li Chen +1 more

This paper introduces MCTS-Guided Group Relative Policy Optimization (M-GRPO) to enhance LLM spatial reasoning by improving the decomposition of complex tasks into optimal sub-tasks.

cs.MA · cs.AI · cs.GT · May 28, 2026

Evolutionary Dynamics of Cooperation in Next-Generation LLM Agent Systems: A Cross-Provider Empirical Extension

Francisco León Zúñiga Bolívar

The study extends cooperative bias testing across diverse, next-generation LLMs, finding that provider identity is a stronger predictor of cooperative equilibrium than model generation, and that noise…

cs.LG · stat.ML · Jun 1, 2026

Minimax-Optimal Policy Regret in Partially Observable Markov Games

Raman Arora

The paper develops an optimistic maximum-likelihood algorithm that achieves $\tilde{O}(\sqrt{T})$ policy regret for sequential decision-making in partially observable Markov games against adaptive oppo…

cs.MA · cs.AI · May 29, 2026

Safe Equilibrium Policy Optimization for Strategic Agent Policies

Karthika Arumugam, Kiran Kumar Manku, Amit Dhanda

The paper introduces Safe Equilibrium Policy Optimization (σepo) to train language models for multi-agent strategic tasks, achieving improved safety and robustness across various game domains.

cs.AI · May 28, 2026

Procedural Generation of First Person Shooter Maps using Map-Elites

Simone de Donato, Pier Luca Lanzi, Daniele Loiacono

This paper applies the MAP-Elites algorithm to procedurally generate diverse and high-quality First-Person Shooter maps using novel map representations.

cs.AI · May 28, 2026

PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?

Dongdong Hua, Yifei Sun, Renhong Huang, Feng Gao +2 more

The paper introduces PTCG-Bench, a new benchmark using the Pokémon TCG to evaluate LLM agents' strategic decision-making and ability to self-evolve, finding that sustained self-evolution remains chall…

cs.LG · cs.AI · cs.GT · Jun 4, 2026

Regret Minimization with Adaptive Opponents in Repeated Games

Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu, Kaiqing Zhang

This paper introduces Repeated Policy Regret (RP-Regret), a novel game-theoretic metric for analyzing regret in repeated games with adaptive opponents, and proposes algorithms to minimize it.

cs.AI · May 28, 2026

MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs

Kevin Wang, Anna Thöni, Benjamin Kempinski, Bobby Cheng +49 more

The paper introduces Mindgames, a comprehensive multi-game arena for evaluating LLM agents' sustained social and strategic reasoning, demonstrating that current evaluations are limited by structural s…

cs.AI · cs.GT · May 28, 2026

PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers

Boning Li, Baoxiang Wang, Longbo Huang

The paper introduces PokerSkill, a novel framework that successfully enables Large Language Models (LLMs) to play expert-level poker by grounding their choices using human-designed, rule-based poker s…

cs.CL · cs.AI · cs.LG · Jun 1, 2026

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

Xinyu Che, Junqi Xiong, Yunfei Ge, Xinping Lei +9 more

The paper introduces MMG2Skill, a closed-loop framework that converts noisy, human-oriented web guides into editable, executable skills, significantly improving agent performance across diverse tasks.

cs.AI · cs.LG · Jun 1, 2026

Iteris: Agentic Research Loops for Computational Mathematics

Leheng Chen, Zihao Liu, Wanyi He, Bin Dong

The paper introduces Iteris, an agentic research system, demonstrating its capability to generate numerical evidence, constructions, and proof drafts for open problems in computational mathematics, re…

cs.AI · May 28, 2026

LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning

Elliot Gestrin, Jendrik Seipp

This paper introduces the first LLM-generated, domain-independent heuristics for symbolic AI planning, using evolutionary search to surpass the performance of hand-engineered state-of-the-art methods.

cs.LG · cs.AI · May 29, 2026

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

Yiming Ren, Yiran Xu, Zicheng Lin, Chufan Shi +7 more

The paper proposes S2L-PO, a framework that uses smaller, naturally diverse models as structured explorers to enhance the policy-level diversity and performance of larger language models during traini…

cs.LG · cs.AI · May 29, 2026

The Terminal Representation in Reinforcement Learning

Amir Esterhuysen, Anders Jonsson

The paper introduces the Terminal Representation (TR), a novel, lower-dimensional, and structurally distinct formulation for encoding reward-weighted trajectories in RL that bypasses the need for eige…

cs.LG · cs.CL · Jun 1, 2026

A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL

Lei Yang, Siyu Ding, Deyi Xiong

The paper proposes a local perturbation theory showing that cross-domain interference in multi-domain RL occurs via a low-dimensional shared conflict subspace, which can be selectively mitigated by sh…

cs.AI · physics.app-ph · May 29, 2026

BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

Ben Wang, Xiaogang Li, Ruochen Gao, Peiyao Xiao +5 more

The paper introduces BilliardPhys-Bench, a new benchmark that demonstrates that current multimodal LLMs struggle with complex physical reasoning and predicting object dynamics in simulated environment…

cs.CL · May 29, 2026

Skill is Not One-Size-Fits-All: Model-Aware Skill Alignment for LLM Agents

Jianxiang Yu, Jiapeng Zhu, Bochen Lin, Qier Cui +2 more

The paper introduces MASA, a model-aware skill alignment framework that adaptively rewrites general and task-specific skills for LLM agents, achieving superior performance across diverse backbones and…

cs.LG · cs.AI · May 31, 2026

MViewRouter: Internalizing Geometric Equivariance via Multi-view Alternating Attention for Combinatorial Routing

Shiyan Liu, Bohan Tan, Yaoxin Wu, Yan Jin

MViewRouter proposes a multi-view framework that internalizes geometric equivariance using a Multi-view Alternating Attention mechanism to improve generalization and stabilize training for combinatori…
