Papers similar to 2606.03736

Similar to 2606.03736 · 19 results

cs.LG · stat.ML · Recent · Jun 1, 2026

Minimax-Optimal Policy Regret in Partially Observable Markov Games

The paper develops an optimistic maximum-likelihood algorithm that achieves $\tilde{O}(\sqrt{T})$ policy regret for sequential decision-making in partially observable Markov games against adaptive oppo…

cs.AI · Recent · May 27, 2026

Constrained Auto-Bidding via Generative Response Modeling

Eunseok Yang, Xingdong Zuo, Kyung-Min Kim

The paper introduces the Generative Response Model (GRM) to improve constrained auto-bidding by predicting future traffic and cost/value curves from a single bid multiplier, allowing for an exact, lig…

cs.GT · cs.LG · Recent · Jun 4, 2026

DNQ: Deep Nash Q-Network for Partially Observable n-Player Games

Qintong Xie, Edward Koh, Xavier Cadet, Peter Chin

The paper proposes DNQ, a scalable solver-in-the-loop framework for training agents in multi-turn simultaneous bidding games by leveraging pairwise payoff estimation to approximate complex equilibrium…

math.OC · cs.AI · cs.LG · Recent · Jun 1, 2026

MINTS: Minimalist Thompson Sampling

Kaizheng Wang

The paper introduces MINTS, a minimalist Bayesian framework that simplifies sequential decision-making by placing priors only on the optimum location, allowing for the incorporation of structural cons…

cs.AI · Recent · May 27, 2026

Global Policy-Space Response Oracles for Two-Player Zero-Sum Games

Junyu Zhang, Feihong Yang, Jian Wang, Chao Wang +1 more

The paper introduces Global PSRO, a novel deep reinforcement learning framework that efficiently approximates Nash equilibria in large two-player zero-sum games by intelligently expanding the strategy…

cs.LG · cs.AI · cs.CL · Recent · May 29, 2026

BAGEN: Are LLM Agents Budget-Aware?

Yuxiang Lin, Zihan Wang, Mengyang Liu, Yuxuan Shan +8 more

This paper introduces the concept of Budget-Aware Agents (BAGEN), showing that current LLM agents often fail to manage resources proactively, and proposes that incorporating early stop and interval es…

cs.GT · cs.CR · math.PR · Recent · May 19, 2026

The Privacy Subsidy in Glosten-Milgrom: Bid-Ask Spread and Welfare under Flip-Noise Direction Observation

Yuki Nakamura

This paper analyzes the bid-ask spread and welfare in the Glosten-Milgrom model when the market maker observes a noisy, privacy-protected trade direction signal, deriving a specific 'privacy subsidy'…

cs.AI · cs.LG · Recent · May 28, 2026

Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk

Tim Woydt, Paul-David Zuercher

The paper introduces Nested Contextual Causal Bandits (NCCBs) to model multi-timescale sequential decisions and proposes a certified policy optimization method, NCTS, that provides quantifiable risk b…

cs.LG · cs.AI · Recent · May 29, 2026

Annealed Softmax Greedy in Many-Armed Bayesian Bandits

William Overman, Mohsen Bayati

The paper analyzes the performance of an annealed softmax policy in a Bayesian bandit setting, proving that under specific prior conditions, it achieves near-optimal regret rates by effectively sampli…

cs.LG · math.OC · math.PR · Empirical · Recent · Jun 9, 2026

Data-Driven Dynamic Assortment in Online Platforms: Learning about Two Sides

Rahul Roy, Nur Sunar, Jayashankar M. Swaminathan

This paper studies a dynamic assortment problem on a two-sided service platform with incomplete information and heterogeneous customers, and develops a data-driven algorithm to learn parameters and op…

cs.LG · cs.AI · Recent · Jun 1, 2026

Two-Fidelity Best-Action Identification for Stochastic Minimax Tree

Peter Chen, Xi Chen

The paper proposes 2FFS, a two-fidelity tree-search algorithm that efficiently identifies the best action in stochastic minimax trees by adaptively combining cheap, biased heuristic evaluations with e…

cs.CR · Recent · May 28, 2026

Scarcity Is Not Enough: An Impossibility Result for Linear Sybil Cost Under Parallelizable Resources

Homayoun Maleki, Nekane Sainz, Jon Legarda, Igor Santos-Grueiro

The paper proves that for resources with structural parallelizability (like divisibility and transferability), it is impossible to enforce a linear cost for concentrating influence, demonstrating that…

cs.CL · cs.LG · Recent · Jun 1, 2026

Unveiling the Entropy Dynamics of Chain-of-Thought Reasoning

Ting Xu, Xu He, Yupu Lu, Jiankai Sun +3 more

The paper analyzes the entropy dynamics of Chain-of-Thought (CoT) reasoning, identifying a transition from an exploratory Uncertainty Region to a stable Confidence Region, which enables superior early…

cs.AI · cs.LG · econ.TH · Recent · May 31, 2026

Prospect-Theory Behavior from Bellman Optimality in MDPs with Catastrophic States

Yujiao Chen

This paper shows that standard optimal control in Markov Decision Processes (MDPs) with an absorbing catastrophic state naturally generates behavioral signatures mimicking prospect theory, even withou…

eess.SY · cs.CR · math.OC · Recent · Mar 19, 2026

Variational Encrypted Model Predictive Control

Jihoon Suh, Yeongjun Jang, Junsoo Kim, Takashi Tanaka

The paper introduces a Variational Encrypted Model Predictive Control (VEMPC) protocol that enables online MPC execution using only encrypted polynomial operations, eliminating the need for intermedia…

cs.LG · cs.AI · Recent · May 28, 2026

Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics

Santiago Amaya-Corredor, Miguel Calvo-Fullana, Anders Jonsson

The paper proposes a scalable, distributed approach for constrained Multi-Agent Reinforcement Learning by using local consensus over dual variables to ensure global constraint satisfaction without cen…

cs.AI · cs.LG · Recent · May 30, 2026

Regularized Offline Policy Optimization with Posterior Hybrid Bayesian Belief

Hongqiang Lin, Pengfei Wang, Nenggan Zheng

The paper introduces Posterior Hybrid Bayesian Belief (PhyB), a novel framework that reformulates policy optimization in Bayesian Offline RL by approximating expectations as a convex combination over…

cs.AI · Recent · Jun 3, 2026

What Type of Inference is Active Inference?

Wouter W. L. Nuijten, Mykola Lukashchuk, Thijs van de Laar, Bert de Vries

This paper provides a detailed message-passing scheme for EFE-based planning and clarifies the corrections needed for cross-entropy planning and full EFE-based planning.
