Q-Value Weighted Regression: Reinforcement Learning with Limited Data
Abstract
Sample efficiency has emerged as a significant challenge for deep reinforcement learning. We introduce Q-Value Weighted Regression (QWR), a simple RL algorithm that excels in this aspect. QWR builds upon Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm that performs very well on continuous control tasks but has low sample efficiency and struggles with high-dimensional observation spaces. We perform both theoretical and empirical analyses of AWR that explain its shortcomings, and we use these insights to motivate QWR. We show experimentally that QWR matches or outperforms state-of-the-art algorithms on tasks with both continuous and discrete actions. In particular, QWR yields results on par with SAC on the MuJoCo suite and, with the same set of hyperparameters, outperforms a highly tuned implementation of Rainbow on a set of Atari games. At the same time, QWR is a much simpler algorithm than either SAC or Rainbow.
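To make the abstract's starting point concrete, the sketch below illustrates the weighted-regression actor update at the heart of AWR, the algorithm QWR builds upon: replayed actions are regressed onto the policy with weights proportional to the exponentiated advantage. This is a minimal standalone illustration, not the paper's implementation; the function names, the temperature `beta`, and the weight clip are assumptions for the example.

```python
import numpy as np

def awr_weights(advantages, beta=1.0, max_weight=20.0):
    """Exponential advantage weights exp(A / beta), clipped for stability.

    Samples whose actions did better than the baseline (A > 0) get
    weight > 1; worse-than-baseline samples are down-weighted.
    `beta` and `max_weight` are illustrative choices, not the paper's.
    """
    return np.minimum(np.exp(advantages / beta), max_weight)

def weighted_regression_loss(log_probs, advantages, beta=1.0):
    """Negative advantage-weighted log-likelihood of replayed actions.

    Minimizing this pulls the policy toward actions in the replay
    buffer, more strongly the higher their advantage estimate.
    """
    w = awr_weights(advantages, beta)
    return -np.mean(w * log_probs)

# Toy usage: two replayed samples with equal log-probability; the one
# with the higher advantage dominates the actor loss.
log_probs = np.array([-1.0, -1.0])
advantages = np.array([0.0, 1.0])
loss = weighted_regression_loss(log_probs, advantages, beta=0.5)
```

QWR's change, per the abstract's framing, is to drive these weights with Q-value estimates rather than AWR's value baseline, which is what lets it handle discrete-action domains like Atari as well as continuous control.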