~ similar to 2606.03736· 19 results
The paper develops an optimistic maximum-likelihood algorithm that achieves $ ilde{O}(\sqrt{T})$ policy regret for sequential decision-making in partially observable Markov games against adaptive oppo…
The paper introduces the Generative Response Model (GRM) to improve constrained auto-bidding by predicting future traffic and cost/value curves from a single bid multiplier, allowing for an exact, lig…
The paper proposes DNQ, a scalable solver-in-the-loop framework for training agents in multi-turn simultaneous bidding games by leveraging pairwise payoff estimation to approximate complex equilibrium…
The paper introduces MINTS, a minimalist Bayesian framework that simplifies sequential decision-making by placing priors only on the optimum location, allowing for the incorporation of structural cons…
Junyu Zhang, Feihong Yang, Jian Wang, Chao Wang +1 more
The paper introduces Global PSRO, a novel deep reinforcement learning framework that efficiently approximates Nash equilibria in large two-player zero-sum games by intelligently expanding the strategy…
Yuxiang Lin, Zihan Wang, Mengyang Liu, Yuxuan Shan +8 more
This paper introduces the concept of Budget-Aware Agents (BAGEN), showing that current LLM agents often fail to manage resources proactively, and proposes that incorporating early stop and interval es…
This paper analyzes the bid-ask spread and welfare in the Glosten-Milgrom model when the market maker observes a noisy, privacy-protected trade direction signal, deriving a specific 'privacy subsidy'…
The paper introduces Nested Contextual Causal Bandits (NCCBs) to model multi-timescale sequential decisions and proposes a certified policy optimization method, NCTS, that provides quantifiable risk b…
The paper analyzes the performance of an annealed softmax policy in a Bayesian bandit setting, proving that under specific prior conditions, it achieves near-optimal regret rates by effectively sampli…
This paper studies a dynamic assortment problem on a two-sided service platform with incomplete information and heterogeneous customers, and develops a data-driven algorithm to learn parameters and op…
This paper studies a dynamic assortment problem on a two-sided service platform with incomplete information and heterogeneous customers, and develops a data-driven algorithm to learn parameters and op…
The paper proposes 2FFS, a two-fidelity tree-search algorithm that efficiently identifies the best action in stochastic minimax trees by adaptively combining cheap, biased heuristic evaluations with e…
The paper proves that for resources with structural parallelizability (like divisibility and transferability), it is impossible to enforce a linear cost for concentrating influence, demonstrating that…
Ting Xu, Xu He, Yupu Lu, Jiankai Sun +3 more
The paper analyzes the entropy dynamics of Chain-of-Thought (CoT) reasoning, identifying a transition from an exploratory Uncertainty Region to a stable Confidence Region, which enables superior early…
This paper shows that standard optimal control in Markov Decision Processes (MDPs) with an absorbing catastrophic state naturally generates behavioral signatures mimicking prospect theory, even withou…
The paper introduces a Variational Encrypted Model Predictive Control (VEMPC) protocol that enables online MPC execution using only encrypted polynomial operations, eliminating the need for intermedia…
The paper proposes a scalable, distributed approach for constrained Multi-Agent Reinforcement Learning by using local consensus over dual variables to ensure global constraint satisfaction without cen…
The paper introduces Posterior Hybrid Bayesian Belief (PhyB), a novel framework that reformulates policy optimization in Bayesian Offline RL by approximating expectations as a convex combination over…
This paper provides a detailed message-passing scheme for EFE-based planning and clarifies the corrections needed for cross-entropy planning and full EFE-based planning.