Continuous control with deep reinforcement learning
Abstract
We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
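The abstract names the ingredients but not the update rules. The sketch below shows one way an actor-critic update based on the deterministic policy gradient can be combined with two ideas borrowed from Deep Q-Learning. It assumes those ideas are an experience replay buffer and slowly tracking target parameters. It is not the paper's implementation. The deep actor and critic networks are replaced by a hypothetical linear policy `actor` and a critic `q_value` that is linear in hypothetical quadratic features. The toy one-step task, the Gaussian exploration noise, and every step size are illustrative assumptions.

```python
import numpy as np


def features(s, a):
    # Hypothetical critic features phi(s, a) for scalar states and actions
    # (quadratic, so the toy task's true Q-function is exactly representable).
    return np.stack([np.ones_like(s), s, s**2, a, a * s, a**2], axis=-1)


def q_value(w, s, a):
    # Critic: Q(s, a) = w . phi(s, a), a stand-in for the critic network
    return features(s, a) @ w


def dq_da(w, s, a):
    # Analytic dQ/da for the features above: w3 + w4*s + 2*w5*a
    return w[3] + w[4] * s + 2.0 * w[5] * a


def actor(theta, s):
    # Deterministic linear policy mu(s) = theta * s, a stand-in for the actor network
    return theta * s


def critic_step(w, w_targ, theta_targ, batch, gamma, lr):
    """One SGD step on the squared TD error, bootstrapping from the target parameters."""
    s, a, r, s2, done = batch
    y = r + gamma * (1.0 - done) * q_value(w_targ, s2, actor(theta_targ, s2))
    td = q_value(w, s, a) - y
    grad = 2.0 * (td[:, None] * features(s, a)).mean(axis=0)  # y is held fixed
    return w - lr * grad


def actor_step(theta, w, s, lr):
    """Deterministic policy gradient ascent: batch mean of dQ/da at a = mu(s), times dmu/dtheta = s."""
    grad = (dq_da(w, s, actor(theta, s)) * s).mean()
    return theta + lr * grad


def soft_update(target, source, tau):
    # Target parameters slowly track the learned ones
    return tau * source + (1.0 - tau) * target


def train(steps=5000, c=0.7, gamma=0.99, batch_size=32, seed=0):
    """Toy one-step task: reward -(a - c*s)^2, so the best action is a = c*s."""
    rng = np.random.default_rng(seed)
    theta, w = 0.0, np.zeros(6)
    theta_targ, w_targ = theta, w.copy()
    buffer = []  # replay buffer of (s, a, r, s2, done) transitions
    for _ in range(steps):
        s = rng.uniform(-1.0, 1.0)
        a = actor(theta, s) + 0.3 * rng.normal()  # Gaussian exploration noise (assumption)
        r = -(a - c * s) ** 2
        buffer.append((s, a, r, s, 1.0))  # done = 1, so the next-state slot is unused
        idx = rng.integers(len(buffer), size=batch_size)
        batch = tuple(np.array(col) for col in zip(*(buffer[i] for i in idx)))
        w = critic_step(w, w_targ, theta_targ, batch, gamma, lr=0.05)
        theta = actor_step(theta, w, batch[0], lr=0.005)
        w_targ = soft_update(w_targ, w, tau=0.01)
        theta_targ = soft_update(theta_targ, theta, tau=0.01)
    return theta, w
```

Calling `train()` runs the loop. With these untuned settings one would expect `theta` to move toward `c = 0.7`, but this is not a claim about the paper's results. Because every toy episode ends after one step, the bootstrapped term in `critic_step` is switched off. With `done = 0`, the target parameters `w_targ` and `theta_targ` would supply it. The linear critic keeps `dQ/da` analytic, whereas a deep critic would obtain it by backpropagation.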