Relative Entropy Policy Search
Proceedings of the AAAI Conference on Artificial Intelligence, 2010, Vol. 24(1), pp. 1607–1612
Abstract
Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients, many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest the Relative Entropy Policy Search (REPS) method. The resulting method differs significantly from previous policy gradient approaches and yields an exact update step. It can be shown to work well on typical reinforcement learning benchmark problems.
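The core idea described in the abstract is to bound the information loss between successive policies, i.e., to maximize expected return subject to a relative-entropy (KL) constraint against the old sample distribution. In the episodic/bandit special case, the resulting exact update reweights samples by an exponential of their returns, with the temperature obtained from a dual problem. The sketch below illustrates this mechanism; the function name, the grid search for the temperature, and the choice of bound `epsilon` are illustrative assumptions, not the paper's exact formulation (which covers the full MDP case with feature constraints).

```python
import math

def reps_weights(returns, epsilon=0.5):
    """KL-constrained exponential reweighting in the spirit of REPS
    (episodic/bandit special case; a sketch, not the paper's full method).

    Picks a temperature eta > 0 by minimizing the dual
        g(eta) = eta * epsilon + eta * log( (1/N) * sum_i exp(R_i / eta) )
    via a coarse log-spaced grid search (illustrative assumption; the
    exact method optimizes the dual directly).
    """
    r_max = max(returns)
    shifted = [r - r_max for r in returns]  # shift for numerical stability

    def dual(eta):
        mean_exp = sum(math.exp(r / eta) for r in shifted) / len(shifted)
        # the constant shift r_max does not affect the argmin
        return eta * epsilon + eta * math.log(mean_exp)

    # coarse log-spaced candidates for eta in [1e-2, 1e2]
    candidates = [10 ** (i / 10) for i in range(-20, 21)]
    eta = min(candidates, key=dual)

    # exact update step: exponential reweighting at the chosen temperature
    w = [math.exp(r / eta) for r in shifted]
    z = sum(w)
    return [x / z for x in w]
```

A smaller `epsilon` keeps the new sample distribution closer to the old one (more uniform weights, less greedy updates), which is precisely the mechanism the abstract credits with avoiding premature convergence.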