Convergence of the Q-ae learning under deterministic MDPs and its efficiency under the stochastic environment
2002, Vol. 1, pp. 177–182
Abstract
Reinforcement learning (RL) is an efficient method for solving Markov decision processes (MDPs) without any a priori knowledge of the environment. Q-learning is a representative RL method. Although it is guaranteed to derive the optimal policy, Q-learning needs numerous trials to learn it. By exploiting a feature of the Q-values, this paper presents an accelerated RL method, Q-ae learning. Further, using the dynamic programming principle, this paper proves that Q-ae learning converges to the optimal policy under deterministic MDPs. Analytical and simulation results illustrate the efficiency of Q-ae learning under deterministic and stochastic MDPs.
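For context, the update rule whose convergence properties the paper builds on is standard tabular Q-learning. The following is a minimal sketch on a hypothetical 5-state deterministic chain MDP (an illustration only, not the paper's Q-ae variant or its experimental setup):

```python
import numpy as np

# Toy deterministic MDP: a 5-state chain. Action 0 moves left, action 1 moves
# right; reaching the rightmost state yields reward 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2
GAMMA, ALPHA, EPSILON, EPISODES = 0.9, 0.5, 0.1, 200

def step(s, a):
    """Deterministic transition: returns (next_state, reward, done)."""
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    if s_next == N_STATES - 1:
        return s_next, 1.0, True
    return s_next, 0.0, False

def q_learning(seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(EPISODES):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = int(rng.integers(N_ACTIONS)) if rng.random() < EPSILON else int(np.argmax(Q[s]))
            s_next, r, done = step(s, a)
            # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = r + (0.0 if done else GAMMA * np.max(Q[s_next]))
            Q[s, a] += ALPHA * (target - Q[s, a])
            s = s_next
    return Q

Q = q_learning()
print(np.argmax(Q, axis=1))  # greedy policy: action 1 (right) in every non-terminal state
```

Under deterministic dynamics the targets are noise-free, which is the setting in which the paper proves convergence of its accelerated variant; the Q-ae modification itself is not reproduced here.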