Yang Junnan, Zhang Hongqi, Zhang Chuanfu. Network Defense Decision-Making Method Based on Stochastic Game and Improved WoLF-PHC[J]. Journal of Computer Research and Development, 2019, 56(5): 942-954.

## Network Defense Decision-Making Method Based on Stochastic Game and Improved WoLF-PHC

Abstract: Existing methods for analyzing network attack and defense with stochastic games assume complete rationality. In real attack-defense confrontations, however, neither side can readily meet this demanding requirement, which reduces the accuracy and practical guidance value of those methods. Starting from the realities of network attack-defense confrontation, this paper analyzes the influence of bounded rationality on attack-defense stochastic games and constructs an attack-defense stochastic game model under bounded-rationality constraints. To address network state explosion, a method for extracting network states and attack-defense actions based on the attack-defense graph is proposed, which effectively compresses the game state space. On this basis, the WoLF-PHC algorithm from reinforcement learning is introduced to analyze the bounded-rationality stochastic game, and a defense decision-making algorithm with online learning capability is designed. Through learning, the algorithm obtains the optimal defense strategy against the current attacker; under bounded rationality, this strategy outperforms the Nash equilibrium strategy of existing attack-defense stochastic game models. Introducing eligibility traces to improve WoLF-PHC further increases the defender's learning speed. Experiments verify the effectiveness and advancement of the proposed method.
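For readers unfamiliar with WoLF-PHC, the following is a minimal sketch of the standard tabular algorithm (Win or Learn Fast, Policy Hill-Climbing) as it is commonly described in the reinforcement learning literature. It is not the improved variant from this paper: it omits eligibility traces and the attack-defense graph, and the class name, parameter names, and hyperparameter values are assumptions chosen only for illustration.

```python
import random
from collections import defaultdict


class WoLFPHC:
    """Illustrative tabular WoLF-PHC learner (hypothetical names and defaults)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04, seed=0):
        assert n_actions >= 2
        self.n = n_actions
        self.alpha, self.gamma = alpha, gamma
        # Learn cautiously when winning, quickly when losing.
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.rng = random.Random(seed)
        self.Q = defaultdict(lambda: [0.0] * n_actions)
        self.pi = defaultdict(lambda: [1.0 / n_actions] * n_actions)
        self.pi_avg = defaultdict(lambda: [1.0 / n_actions] * n_actions)
        self.visits = defaultdict(int)

    def act(self, s):
        # Sample an action from the current mixed policy.
        return self.rng.choices(range(self.n), weights=self.pi[s])[0]

    def update(self, s, a, r, s_next):
        # 1. Q-learning update.
        q = self.Q[s]
        q[a] += self.alpha * (r + self.gamma * max(self.Q[s_next]) - q[a])

        # 2. Update the running average policy for state s.
        self.visits[s] += 1
        pi, avg = self.pi[s], self.pi_avg[s]
        for i in range(self.n):
            avg[i] += (pi[i] - avg[i]) / self.visits[s]

        # 3. "Winning" if the current policy's expected value exceeds
        #    that of the average policy; pick the step size accordingly.
        v_pi = sum(p * x for p, x in zip(pi, q))
        v_avg = sum(p * x for p, x in zip(avg, q))
        delta = self.delta_win if v_pi > v_avg else self.delta_lose

        # 4. Hill-climb: shift probability mass toward the greedy action,
        #    never taking more from an action than it currently has.
        best = max(range(self.n), key=q.__getitem__)
        for i in range(self.n):
            if i != best:
                step = min(pi[i], delta / (self.n - 1))
                pi[i] -= step
                pi[best] += step
```

In a defense setting, `s` would stand for a network state and `a` for a defense action, with `r` the defender's payoff; those mappings are left to the caller here.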
