Papers similar to 2605.29645

~ similar to 2605.29645· 19 results

math.OCcs.AIcs.LGRecentJun 1, 2026

MINTS: Minimalist Thompson Sampling

The paper introduces MINTS, a minimalist Bayesian framework that simplifies sequential decision-making by placing priors only on the optimum location, allowing for the incorporation of structural cons…

View →

cs.CRcs.LGRecentMay 6, 2026

Differential Privacy in the Extensive-Form Bandit Problem

Stephen Pasteris, Rahul Savani, Theodore Turocy

The paper proposes an algorithm for the extensive-form bandit problem that achieves $ ilde{O}(rac{ ext{total actions} imes ext{strategies} imes ext{trials}}{ ext{epsilon}})$ regret while satisfyi…

View →

cs.LGstat.MLRecentJun 2, 2026

Online Learning with Gradient-Variation Interval Regret

Yan-Feng Xie, Shuche Wang, Peng Zhao, Zhi-Hua Zhou

The paper proposes a novel online learning algorithm that achieves an interval regret bound scaling with gradient variation, providing strong theoretical guarantees for non-stationary environments.

View →

cs.AIcs.LGRecentMay 28, 2026

Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk

Tim Woydt, Paul-David Zuercher

The paper introduces Nested Contextual Causal Bandits (NCCBs) to model multi-timescale sequential decisions and proposes a certified policy optimization method, NCTS, that provides quantifiable risk b…

View →

stat.MLcs.LGRecentJun 1, 2026

Doing well with less! On Sampling Techniques for Empirical Pairwise Loss Estimation/Minimization

Louise Davy, Stephan Clémençon, Charlotte Laclau

This paper introduces survey sampling techniques to estimate or minimize empirical pairwise loss functions, showing that targeting informative pairs significantly reduces computational cost while main…

View →

cs.LGcs.AIRecentMay 29, 2026

Reinforcement Learning with Pairwise Preferences in Long-Term Decision Problems

Jonathan Colaço Carr, Prakash Panangaden, Doina Precup, Benjamin Van Roy

The paper introduces the Markov decision contest, a new framework for reinforcement learning using pairwise preferences, and proves that stationary Markov policies are optimal and solvable efficiently…

View →

stat.MLcs.AIcs.LGRecentMay 28, 2026

Reward Learning from Best-of-$N$ Preference Data: Targets, Tradeoffs, and Design Principles

Rattana Pukdee, Maria-Florina Balcan, Pradeep Ravikumar

This paper analyzes Best-of-$N$ preference data, deriving explicit reward targets for independent-reference variants and establishing design principles for choosing $N$ and the base distribution to op…

View →

cs.LGcs.AIRecentMay 29, 2026

Annealed Softmax Greedy in Many-Armed Bayesian Bandits

William Overman, Mohsen Bayati

The paper analyzes the performance of an annealed softmax policy in a Bayesian bandit setting, proving that under specific prior conditions, it achieves near-optimal regret rates by effectively sampli…

View →

cs.LGstat.MLRecentJun 1, 2026

Minimax-Optimal Policy Regret in Partially Observable Markov Games

Raman Arora

The paper develops an optimistic maximum-likelihood algorithm that achieves $ ilde{O}(\sqrt{T})$ policy regret for sequential decision-making in partially observable Markov games against adaptive oppo…

View →

cs.LGcs.AIRecentMay 29, 2026

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

Yiming Ren, Yiran Xu, Zicheng Lin, Chufan Shi +7 more

The paper proposes S2L-PO, a framework that uses smaller, naturally diverse models as structured explorers to enhance the policy-level diversity and performance of larger language models during traini…

View →

cs.LGcs.AIRecentJun 1, 2026

Two-Fidelity Best-Action Identification for Stochastic Minimax Tree

Peter Chen, Xi Chen

The paper proposes 2FFS, a two-fidelity tree-search algorithm that efficiently identifies the best action in stochastic minimax trees by adaptively combining cheap, biased heuristic evaluations with e…

View →

cs.LGmath.OCmath.PREmpiricalRecentJun 9, 2026

Data-Driven Dynamic Assortment in Online Platforms: Learning about Two Sides

Rahul Roy, Nur Sunar, Jayashankar M. Swaminathan

This paper studies a dynamic assortment problem on a two-sided service platform with incomplete information and heterogeneous customers, and develops a data-driven algorithm to learn parameters and op…

View →

cs.LGmath.OCmath.PREmpiricalRecentJun 9, 2026

Data-Driven Dynamic Assortment in Online Platforms: Learning about Two Sides

Rahul Roy, Nur Sunar, Jayashankar M. Swaminathan

View →

cs.LGcs.AIcs.GTRecentJun 4, 2026

Regret Minimization with Adaptive Opponents in Repeated Games

Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu, Kaiqing Zhang

This paper introduces Repeated Policy Regret (RP-Regret), a novel game-theoretic metric for analyzing regret in repeated games with adaptive opponents, and proposes algorithms to minimize it.

View →

cs.LGcs.AIRecentMay 29, 2026

From Rashomon Theory to PRAXIS: Efficient Decision Tree Rashomon Sets

Zakk Heile, Hayden McTavish, Varun Babbar, Margo Seltzer +1 more

The paper introduces PRAXIS, a novel algorithm that efficiently approximates the computation of 'Rashomon sets' for decision trees, significantly reducing memory and runtime complexity.

View →

cs.CLcs.AIcs.LGRecentMay 28, 2026

Compute Allocation in Evolutionary Search: From Depth-Breadth to Multi-Armed Bandits

Sixue Xing, Haoyu He, Kerui Wu, Zhuo Yang +3 more

The paper proposes BaSE, a multi-armed bandit approach, to optimally allocate a fixed budget of LLM calls across parallel evolutionary search trajectories, significantly improving mean fitness and rel…

View →

cs.LGcs.CRRecentMay 7, 2026

Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds

Marten van Dijk, Murat Bilgehan Ertan

The paper provides a tight, transparent, and closed-form analysis of the trade-off function for Differentially Private SGD using random shuffling, significantly improving upon previous methods and est…

View →

stat.MLcs.LGmath.STRecentJun 3, 2026

Bayesian learning for the stochastic shortest path problem

Chon Wai Ho, Sumeetpal S. Singh, Jiaqi Guo

The paper proposes a novel Bayesian framework to learn the optimal decision strategy for the stochastic shortest path problem by directly constructing the posterior beliefs for the action-value functi…

View →

cs.LGcs.AIRecentJun 2, 2026

Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning

Anthony GX-Chen, Ankit Anand, Gheorghe Comanici, Zaheer Abbas +6 more

The paper proposes a novel RL framework that naturally induces diverse agent behavior by reformulating the objective to treat the reward as a distribution over functions, making diversity a rational r…

View →