Abstract:
A partially observable Markov decision process (POMDP) converts a non-Markovian decision problem into a Markovian one over the belief state space. POMDPs have become an important branch of stochastic decision processes because of their power to describe real-world decision problems. The traditional algorithms for solving POMDPs are value iteration and policy iteration; however, exact solution algorithms are typically computationally intractable for all but the smallest problems. The authors first describe the principles and decision process of POMDPs, and then present a policy- and value-iteration algorithm (PVIA) for partially observable Markov decision processes. The algorithm combines the advantages of both methods: it uses policy iteration when improving the policy and value iteration when computing values. By combining linear programming and dynamic programming, it alleviates the curse of dimensionality when the belief space is large and obtains an approximately optimal value. A key contribution of this paper is showing how the basic operations of both algorithms can be performed efficiently together. The algorithm was implemented in the SZPT_Roc team, which took second place in the simulation league of the RoboCup 2006 Chinese Open Championship. Finally, experimental comparisons with several typical algorithms show that the algorithm is practical and feasible.
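To make the belief-state formulation and the policy/value interleaving concrete, the following is a minimal sketch in Python (NumPy) of value-iteration-style backups interleaved with policy-iteration-style improvement over a coarse, discretized belief grid. It illustrates the general idea only and is not the authors' PVIA; the toy model arrays T, O, R, the grid resolution, and the nearest-neighbor interpolation are all illustrative assumptions.

```python
# Hedged sketch: NOT the paper's PVIA. Interleaves value-iteration-style
# evaluation sweeps with policy-iteration-style improvement on a discretized
# belief grid for a toy two-state POMDP. T, O, R and the grid are assumptions.
import numpy as np

n_actions, n_obs = 2, 2
T = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.5, 0.5]]])   # T[s, a, s']: transition model
O = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.5, 0.5]]])   # O[a, s', o]: observation model
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])                 # R[s, a]: immediate reward
gamma = 0.95

grid = np.linspace(0.0, 1.0, 21)           # coarse grid over the belief simplex
beliefs = np.stack([grid, 1.0 - grid], axis=1)

def step(b, a):
    """Expected reward, successor beliefs (one per observation), obs. probs."""
    r = float(b @ R[:, a])
    pred = b @ T[:, a, :]                  # predicted next-state distribution
    succ, probs = [], []
    for o in range(n_obs):
        p = float(pred @ O[a, :, o])       # Pr(o | b, a)
        succ.append(O[a, :, o] * pred / p if p > 0 else pred)  # Bayes update
        probs.append(p)
    return r, succ, probs

def nearest(b):
    """Snap a belief to the closest grid point (crude interpolation)."""
    return int(np.argmin(np.abs(grid - b[0])))

V = np.zeros(len(beliefs))
policy = np.zeros(len(beliefs), dtype=int)
for _ in range(100):
    # Value-iteration-style evaluation sweep under the current policy ...
    for i, b in enumerate(beliefs):
        r, succ, probs = step(b, policy[i])
        V[i] = r + gamma * sum(p * V[nearest(bn)] for p, bn in zip(probs, succ))
    # ... followed by a policy-iteration-style greedy improvement step.
    for i, b in enumerate(beliefs):
        qs = []
        for a in range(n_actions):
            r, succ, probs = step(b, a)
            qs.append(r + gamma * sum(p * V[nearest(bn)]
                                      for p, bn in zip(probs, succ)))
        policy[i] = int(np.argmax(qs))

print("greedy action at uniform belief:", policy[nearest(np.array([0.5, 0.5]))])
```

The grid discretization here is a stand-in for the paper's linear- and dynamic-programming machinery; it shows why belief-space methods face the curse of dimensionality (the grid grows exponentially with the number of states) and how evaluation and improvement steps can be interleaved in one loop.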