Abstract:
Textual password-based authentication remains the mainstream way for users to authenticate their identities. To better study password security, researchers have proposed many data-driven password guessing methods, such as those based on probabilistic context-free grammars (PCFG) and Markov models. Each of these methods has its own advantage in guessing passwords: it can guess specific types of passwords with fewer guesses. To make full use of these advantages and improve guessing efficiency, we propose a general and practical framework for parameterized hybrid guessing. The framework consists of a model pruning method and a guess allocation strategy that is theoretically proven optimal. It combines the guessing advantages of different data-driven methods to generate more efficient guessing sets. To verify the generality and efficiency of the framework, we analyze the different advantages of existing data-driven guessing methods and, based on the framework, design multiple parameterized hybrid guessing methods composed of multiple models (collectively referred to as hyPassGu). We evaluate these hybrid guessing methods on four large-scale password datasets (more than 150 million passwords in total) leaked from real websites. The experimental results show that the hyPassGu variants constructed from different methods all surpass the guessing efficiency of the individual methods, exceeding the best single-method efficiency by 1.52% to 35.49% under 10^10 guesses. Finally, comparative experiments under different numbers of guesses show that the optimal allocation strategy proposed in this paper consistently outperforms the average allocation strategy and the random allocation strategy, with a relative improvement of 16.87% on the password dataset with the largest dispersion. Moreover, more diversified hybrid methods show better overall guessing efficiency.