Convergence of value iterations for total-cost MDPs and POMDPs with general state and action sets
Abstract
This paper describes conditions for convergence to optimal values of the dynamic programming algorithm applied to total-cost Markov Decision Processes (MDPs) with Borel state and action sets and with possibly unbounded one-step cost functions. It also studies applications of these results to Partially Observable MDPs (POMDPs). It is well known that POMDPs can be reduced to special MDPs, called Completely Observable MDPs (COMDPs), whose state spaces are sets of probability distributions over the original states. This paper describes conditions on POMDPs under which optimal policies for COMDPs can be found by value iteration. In other words, this paper provides sufficient conditions for solving total-cost POMDPs with infinite state, observation, and action sets by dynamic programming. Examples of applications to filtering, identification, and inventory control are provided.
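The value iteration scheme the abstract refers to computes the sequence V_{n+1} = TV_n, where (TV)(x) = min_a [ c(x, a) + Σ_y p(y | x, a) V(y) ] is the Bellman operator for the total-cost criterion, starting from V_0 = 0. The paper's results concern Borel state and action sets; the sketch below is only a finite toy instance to illustrate the iteration itself. All names and the toy MDP data are hypothetical, not taken from the paper.

```python
import numpy as np

def value_iteration(costs, transitions, n_iter=200):
    """Iterate the total-cost Bellman operator V_{n+1} = T V_n from V_0 = 0.

    costs[x, a]          -- one-step cost c(x, a)
    transitions[x, a, y] -- transition probability p(y | x, a)
    """
    n_states, _ = costs.shape
    v = np.zeros(n_states)                 # V_0 = 0
    for _ in range(n_iter):
        q = costs + transitions @ v        # Q(x, a) = c(x, a) + E[V(next)]
        v = q.min(axis=1)                  # minimize over actions
    return v

# Hypothetical transient MDP: state 1 is absorbing and cost-free, so
# total (undiscounted) costs stay finite and value iteration converges.
costs = np.array([[1.0, 2.0],
                  [0.0, 0.0]])
transitions = np.array([
    [[0.9, 0.1],    # state 0, action 0: cheap, but mostly stays put
     [0.0, 1.0]],   # state 0, action 1: costlier, jumps straight to 1
    [[0.0, 1.0],    # state 1 is absorbing under either action
     [0.0, 1.0]],
])

v = value_iteration(costs, transitions)
print(v)  # prints [2. 0.]: action 1 is optimal in state 0
```

Action 0 would satisfy V(0) = 1 + 0.9 V(0), i.e. a total cost of 10, so the iterates settle on action 1 with V(0) = 2. For the COMDP reduction discussed in the abstract, the same operator would act on functions of belief states (probability distributions over the original states) rather than on a finite vector.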