    Citation: Ge Zhenxing, Xiang Shuai, Tian Pinzhuo, Gao Yang. Solving GuanDan Poker Games with Deep Reinforcement Learning[J]. Journal of Computer Research and Development, 2024, 61(1): 145-155. DOI: 10.7544/issn1000-1239.202220697

    Solving GuanDan Poker Games with Deep Reinforcement Learning

    • Abstract: Decisions in the real world often have to be made in complex environments under uncertain information, so the ability to make good decisions under such conditions is regarded as a key capability of artificial intelligence agents. Games are a highly abstracted form of the real world; because they are well defined and make it easy to compare algorithms, they have become a mainstream research testbed. Among them, poker games such as GuanDan are hard to solve efficiently: the other players' hands are hidden, and both the action space and the information sets (the possible hands the other players may hold) are very large. In this work, we propose a soft deep Monte Carlo (SDMC) method to address these difficulties. By taking into account how an expert strategy acts during training, SDMC makes better use of domain knowledge and speeds up strategy learning. During real-time play, SDMC applies a soft action-sampling strategy to adjust its decisions, which confuses opponents, prevents them from exploiting the agent, and improves its win rate against different agents. The policy model trained with SDMC won the championship of the 2nd Chinese Artificial Intelligence Game Algorithm competition. Experiments evaluating training time and final performance, against the champion strategy of the 1st competition and other policy models from the 2nd competition, demonstrate the effectiveness of SDMC for GuanDan.
