ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2014, Vol. 51 ›› Issue (10): 2160-2170.doi: 10.7544/issn1000-1239.2014.20130823

Previous Articles     Next Articles

Online Counterfactual Regret Minimization in Repeated Imperfect Information Extensive Games

Hu Yujing1, Gao Yang1, An Bo2   

  1. 1(State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023); 2(Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Science, Beijing 100190)
  • Online:2014-10-01

Abstract: In this paper, we consider the problem of exploiting suboptimal opponents in imperfect information extensive games. Most previous works use opponent modeling and find a best response to exploit the opponent. However, a potential drawback of such approach is that the best response may not be a real one, since the modeled strategy actually may not be the same as what the opponent plays. We try to solve this problem from the perspective of online regret minimization, which avoids opponent modeling. We make extensions to a state-of-the-art equilibrium-computing algorithm called counterfactual regret minimization (CFR). The core problem is how to compute the counterfactual values in online scenarios. We propose to learn approximations of these values from the results produced by the game and introduce two different estimators: static estimator which learns the values directly from the results’ distribution, and dynamic estimator which assigns larger weight to new sampled results than older ones for better adapting to dynamic opponents. Two algorithms for online regret minimization are proposed based on the two estimators. We also give the conditions under which the values estimated by our estimators are equal to the true values, showing the relationship between CFR and our algorithms. Experimental results in one-card poker show that our algorithms not only perform the best when exploiting some weak opponents, but also outperform some state-of-the-art algorithms by achieving the highest win rate in matches with a few hands.

Key words: extensive games, imperfect information, regret minimization, counterfactual regret minimization, static estimator, dynamic estimator

CLC Number: