SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets
2019, pp. 2592–2599
Eugene Ie, Vihan Jain, Jing Wang, Sanmit Narvekar, Ritesh Agarwal, Rui Wu, Heng-Tze Cheng, Tushar Chandra, Craig Boutilier
Abstract
Reinforcement learning methods for recommender systems optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items---which may have interacting effects on user choice---methods are required to deal with the combinatorics of the RL action space. We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. We demonstrate our methods in simulation, and validate the scalability and effectiveness of decomposed TD-learning on YouTube.
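The core decomposition can be sketched in code. Under a conditional choice model (e.g., multinomial logit with a no-click option), the long-term value of a slate is a choice-probability-weighted sum of item-wise LTVs. The function below is a minimal illustration of that idea, not the authors' implementation; the names `item_scores`, `item_ltvs`, and `null_score` are assumptions for the sketch.

```python
import numpy as np

def slate_q_value(item_scores, item_ltvs, null_score=1.0):
    """Illustrative SlateQ-style decomposition: slate LTV as a
    choice-probability-weighted sum of per-item LTVs.

    Assumes a multinomial-logit choice model: the user selects item i
    from the slate with probability proportional to exp(item_scores[i]),
    with a no-click alternative weighted by null_score (hypothetical
    parameterization for this sketch).
    """
    weights = np.exp(np.asarray(item_scores, dtype=float))
    denom = null_score + weights.sum()      # normalizer incl. no-click
    choice_probs = weights / denom          # P(user picks item i | slate)
    # Slate value = sum_i P(i | slate) * Q(s, i), the item-wise LTVs
    return float(np.dot(choice_probs, np.asarray(item_ltvs, dtype=float)))
```

Because the slate value is linear in the per-item LTVs given the choice probabilities, learning reduces to estimating item-level Q-values, which is what makes the combinatorial slate action space tractable.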
Related Papers
- → The Relationship between Surface Roughness, Capillarity and Mineral Composition in Roofing Slates (2020), cited by 2
- → V.—Notes on the Ash-Slates and Other Rocks of the Lake District (1892), cited by 5
- → VI.—On the Occurrence of a Trilobite in the Skiddaw Slates of the Isle of Man (1893), cited by 3
- → III.—On the Relations between the Skiddaw Slates and the Green Slates and Porphyries of the Lake-district (1869), cited by 1
- → Research on Agent Reinforcement Learning Policy Based on DFS (2010)