Parameterized Hybrid Password Guessing Method

韩伟力; 张俊杰; 徐铭; 王传旺; 张浩东; 何震瀛; 陈虎

doi:10.7544/issn1000-1239.20210456

Abstract: Text-password-based authentication remains the mainstream method of user identity authentication. To better study password security, researchers have proposed a variety of data-driven password guessing methods, such as probabilistic context-free grammars (PCFG) and Markov-model-based methods. Each of these methods has its own guessing advantage: it can guess specific types of passwords with fewer guesses. To exploit these advantages fully and achieve better guessing efficiency, we propose a general parameterized hybrid guessing framework. The framework consists of a model pruning method and a guess-number allocation strategy that is theoretically proven to be optimal, and it combines the guessing advantages of different data-driven methods to generate more efficient guess sets. To verify the generality and optimality of the framework, we analyze and mix the different advantages of existing data-driven guessing methods and, based on the framework, design several parameterized hybrid guessing methods that combine multiple models (collectively referred to as hyPassGu) for guessing practice. We evaluate these hybrid methods on four large-scale password datasets (more than 150 million passwords in total) leaked from real websites. The experimental results show that the hyPassGu methods built from different combinations of methods all achieve higher guessing efficiency than the single methods, exceeding the best single-method efficiency by 1.52%~35.49% at 10^10 guesses. In addition, comparative experiments under different guess numbers show that the proposed optimal allocation strategy performs stably and outperforms both the average allocation strategy and the random allocation strategy, with a relative improvement of 16.87% on the password dataset whose distribution is most dispersed. Hybrid methods that mix more diverse models also show better overall guessing efficiency.
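The idea of mixing several models' guesses under a shared guess budget can be illustrated with a small sketch. This is not the hyPassGu pruning method or the provably optimal allocation strategy described above, whose details are not given in this abstract; it is a simple greedy stand-in. The function name `greedy_hybrid_guesses`, the model labels `pcfg` and `markov`, and the toy guess lists and validation passwords are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence


def greedy_hybrid_guesses(
    ranked: Dict[str, Sequence[str]],
    validation: Sequence[str],
    budget: int,
) -> List[str]:
    """Merge ranked guess lists from several models into one guess list.

    Assumption (not from the paper): at each step, emit the next unused
    guess from whichever model's next guess matches the most validation
    passwords that are still uncracked. Ties go to the earlier model.
    """
    remaining = Counter(validation)         # password -> uncracked count
    cursor = {name: 0 for name in ranked}   # next position in each list
    emitted = set()                         # guesses already in the output
    out: List[str] = []

    while len(out) < budget:
        best_name, best_gain = None, -1
        for name, guesses in ranked.items():
            # Skip guesses another model already produced (crude de-duplication).
            i = cursor[name]
            while i < len(guesses) and guesses[i] in emitted:
                i += 1
            cursor[name] = i
            if i == len(guesses):
                continue                    # this model is exhausted
            gain = remaining[guesses[i]]    # Counter returns 0 for unseen keys
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:               # every model is exhausted
            break
        guess = ranked[best_name][cursor[best_name]]
        cursor[best_name] += 1
        emitted.add(guess)
        out.append(guess)
        remaining[guess] = 0                # these accounts are now cracked
    return out


# Toy inputs: hypothetical ranked outputs of two models and a validation set.
toy_ranked = {
    "pcfg": ["password1", "abc123", "qwerty"],
    "markov": ["123456", "password1", "iloveyou"],
}
toy_validation = ["123456", "123456", "password1", "iloveyou"]
print(greedy_hybrid_guesses(toy_ranked, toy_validation, budget=4))
```

The sketch only shows why a hybrid guess list can beat either input list at the same budget: guesses shared between models are spent once, and the budget flows to whichever model is currently most productive on held-out data.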
