Abstract:
Reinforcement learning in continuous action spaces remains one of the most challenging open problems in the field. Conventional reinforcement learning algorithms mainly target small-scale problems with discrete action spaces. For problems with continuous action spaces, most approaches use prior information to discretize the action space and then search for the optimal solution. In many practical applications, however, the action space is continuous and little prior information is available for discretizing it appropriately. To address this problem, we propose a least-squares actor-critic algorithm (LSAC) for continuous action spaces. LSAC uses function approximators to represent the value function and the policy separately, and uses an online least-squares method to estimate the parameters of both; the approximate value function serves as the critic that guides the estimation of the policy parameters. We applied LSAC to the cart-pole balancing problem and the mountain car problem, both of which have continuous action spaces, and compared the results with those of two classic algorithms: Cacla (continuous actor-critic learning automaton) and eNAC (episodic natural actor-critic). The experimental results show that LSAC solves continuous-action-space problems well and achieves better execution performance in these experiments.
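To illustrate what an online least-squares estimate of a critic can look like, the following minimal sketch implements a recursive least-squares TD(0) update for a linear value function. It is not the LSAC algorithm itself, and the actor update is omitted. The class name `RLSTDCritic`, the parameters `gamma` and `p_init`, and the two-state example with one-hot features are all assumptions made for this illustration.

```python
class RLSTDCritic:
    """Hypothetical recursive least-squares TD(0) critic, V(s) = theta . phi(s)."""

    def __init__(self, n_features, gamma=0.95, p_init=1000.0):
        self.gamma = gamma
        self.theta = [0.0] * n_features
        # P tracks the inverse of A = sum_t phi_t (phi_t - gamma * phi_{t+1})^T,
        # started from a large multiple of the identity (assumed regularization).
        self.P = [[p_init if i == j else 0.0 for j in range(n_features)]
                  for i in range(n_features)]

    def value(self, phi):
        return sum(t * f for t, f in zip(self.theta, phi))

    def update(self, phi, reward, phi_next):
        """Process one transition online and return its TD error."""
        n = len(phi)
        d = [phi[i] - self.gamma * phi_next[i] for i in range(n)]
        P_phi = [sum(self.P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        d_P = [sum(d[i] * self.P[i][j] for i in range(n)) for j in range(n)]
        denom = 1.0 + sum(d[i] * P_phi[i] for i in range(n))
        # TD error under the current parameters: r + gamma * V(s') - V(s).
        td_error = reward - sum(d[i] * self.theta[i] for i in range(n))
        for i in range(n):
            self.theta[i] += P_phi[i] * td_error / denom
            for j in range(n):
                # Sherman-Morrison rank-one update of the inverse.
                self.P[i][j] -= P_phi[i] * d_P[j] / denom
        return td_error


# Assumed toy problem: state 0 moves to state 1 with reward 0;
# state 1 loops on itself with reward 1.
critic = RLSTDCritic(n_features=2, gamma=0.5)
for _ in range(50):
    critic.update([1.0, 0.0], 0.0, [0.0, 1.0])
    critic.update([0.0, 1.0], 1.0, [0.0, 1.0])
```

With a discount factor of 0.5, the true values in this toy problem are 1 for state 0 and 2 for state 1, and the estimated parameters approach them. The returned TD error is the kind of signal a critic can pass to an actor; how LSAC uses its critic to estimate the policy parameters is not detailed in the abstract.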