Similar to 2606.01655 · 19 results
Liad Erez, Fan Chen, Alon Cohen, Tomer Koren +3 more
The paper analyzes the sample complexity of contextual bandits in the $s$-sparse setting, achieving optimal sample bounds for identifying an $\epsilon$-optimal policy.
The paper analyzes the performance of an annealed softmax policy in a Bayesian bandit setting, proving that under specific prior conditions, it achieves near-optimal regret rates by effectively sampli…
The paper proposes a novel online learning algorithm that achieves an interval regret bound scaling with gradient variation, providing strong theoretical guarantees for non-stationary environments.
The paper addresses the failure of fixed-price inference in resource-constrained pricing controllers by developing a target-aware controller that tracks local densities and provides certified, shrinki…
The paper introduces Nested Contextual Causal Bandits (NCCBs) to model multi-timescale sequential decisions and proposes a certified policy optimization method, NCTS, that provides quantifiable risk b…
The paper develops an optimistic maximum-likelihood algorithm that achieves $\tilde{O}(\sqrt{T})$ policy regret for sequential decision-making in partially observable Markov games against adaptive oppo…
The paper proposes 2FFS, a two-fidelity tree-search algorithm that efficiently identifies the best action in stochastic minimax trees by adaptively combining cheap, biased heuristic evaluations with e…
This paper introduces survey sampling techniques to estimate or minimize empirical pairwise loss functions, showing that targeting informative pairs significantly reduces computational cost while main…
The paper proposes a novel Bayesian framework to learn the optimal decision strategy for the stochastic shortest path problem by directly constructing the posterior beliefs for the action-value functi…
This paper analyzes Best-of-$N$ preference data, deriving explicit reward targets for independent-reference variants and establishing design principles for choosing $N$ and the base distribution to op…
The paper introduces the Markov decision contest, a new framework for reinforcement learning using pairwise preferences, and proves that stationary Markov policies are optimal and solvable efficiently…
This paper shows that standard optimal control in Markov Decision Processes (MDPs) with an absorbing catastrophic state naturally generates behavioral signatures mimicking prospect theory, even withou…
The paper introduces Entropy-Cut Metropolis-Hastings, an efficient sampling method that uses next-token entropy to identify and resample from critical decision points in a reasoning trace, significant…
Yuanjian Xu, Jianing Hao, Wanbo Zhang, Zhong Li +1 more
The paper proposes DiReCT, a novel framework that treats data selection during LLM annealing as a constrained optimization problem based on the spectral geometry of the loss landscape, achieving state…
Sixue Xing, Haoyu He, Kerui Wu, Zhuo Yang +3 more
The paper proposes BaSE, a multi-armed bandit approach, to optimally allocate a fixed budget of LLM calls across parallel evolutionary search trajectories, significantly improving mean fitness and rel…
The paper establishes information-theoretic lower bounds for stochastic optimization using low-bit gradients by reducing the problem to compressed Gaussian mean estimation, yielding sharp bounds on co…
Zakk Heile, Hayden McTavish, Varun Babbar, Margo Seltzer +1 more
The paper introduces PRAXIS, a novel algorithm that efficiently approximates the computation of 'Rashomon sets' for decision trees, significantly reducing memory and runtime complexity.
Johanna Menn, Miriam Kober, Paul Brunzema, David Stenger +1 more
The paper introduces local Preferential Bayesian Optimization (PBO) methods that adapt high-dimensional Bayesian Optimization techniques, such as trust-region and derivative-informed local search, to…
ShaplEIG introduces a Bayesian experimental design framework to efficiently and adaptively estimate Shapley values by minimizing the number of required costly function evaluations.