On Thompson Sampling and Asymptotic Optimality
2017, pp. 4889–4893
Abstract
We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges in mean to the optimal value, and (2) given a recoverability assumption, regret is sublinear. We conclude with a discussion of optimality in reinforcement learning.
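The paper analyzes Thompson sampling over general stochastic environments, which is beyond what a short snippet can capture. As a rough illustration of the core posterior-sampling idea only, here is a minimal sketch on a Bernoulli bandit (arm means, step count, and Beta prior are illustrative assumptions, not from the paper): sample a hypothesis from the posterior, act optimally for that hypothesis, then update the posterior on the observed reward.

```python
import random

def thompson_sampling(true_means, steps, seed=0):
    """Toy Thompson sampling on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    k = len(true_means)
    # Beta(1, 1) prior over each arm's success probability,
    # tracked via success/failure counts.
    successes = [1] * k
    failures = [1] * k
    pulls = [0] * k
    for _ in range(steps):
        # Sample a mean for each arm from its posterior...
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        # ...and act greedily with respect to the sampled hypothesis.
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8], steps=2000)
```

Over time the posterior concentrates and the best arm is pulled most often, which mirrors (in this much simpler setting) the asymptotic convergence to the optimal value discussed in the abstract.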
Related Papers
- Near-optimal Regret Bounds for Reinforcement Learning (2010)
- Regret Bounds for Learning State Representations in Reinforcement Learning (2019)
- Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning (2021)
- Distributed Thompson Sampling (2020)