Abstract:
Reinforcement learning obtains an optimal policy through trial and error and interaction with a dynamic environment. Its self-improving and online learning properties make it one of the most important machine learning methods. To address the “curse of dimensionality” that has long troubled reinforcement learning, a heuristic contour list method is proposed on the basis of relational reinforcement learning. The method represents states, actions, and Q-functions using first-order predicates with contour lists, so that the advantages of Prolog lists can be fully exploited. By combining logical predicate rules with reinforcement learning, a new logical reinforcement learning algorithm, CCLORRL, is formed, and its convergence is proved. The method uses contour shape predicates to build shape state tables, which drastically reduces the state space, and it uses heuristic rules to guide action selection, which reduces blind choices when a sample does not exist in the state space. The CCLORRL algorithm is applied to the game of Tetris. Experiments show that the method is more efficient.
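The idea of a contour-based state table with a heuristic fallback can be illustrated with a minimal sketch. It is not the CCLORRL algorithm itself: the clipping of height differences, the single-cell piece, the lowest-column heuristic, and all function names are assumptions made for illustration, and Python tuples stand in for Prolog lists.

```python
from typing import Dict, List, Tuple

Contour = Tuple[int, ...]
QTable = Dict[Contour, Dict[int, float]]  # contour -> {action: Q-value}


def contour(heights: List[int], clip: int = 2) -> Contour:
    """Describe the board surface by adjacent column-height differences.

    Differences are clipped to [-clip, clip] (an assumed simplification),
    so many boards with the same surface shape share one state.
    """
    return tuple(
        max(-clip, min(clip, right - left))
        for left, right in zip(heights, heights[1:])
    )


def heuristic_action(heights: List[int]) -> int:
    """Assumed heuristic: drop into the lowest column, leftmost on ties."""
    return min(range(len(heights)), key=lambda col: (heights[col], col))


def choose_action(q_table: QTable, heights: List[int]) -> int:
    """Use learned Q-values when the contour is known, else the heuristic."""
    known = q_table.get(contour(heights))
    if known:
        # Greedy choice over the actions recorded for this contour.
        return max(known, key=known.get)
    # Unseen contour: fall back to the heuristic instead of a random pick.
    return heuristic_action(heights)
```

Because the contour depends only on height differences, boards such as `[3, 3, 5, 1]` and `[10, 10, 12, 8]` map to the same table entry, which is the source of the state-space reduction in this sketch.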