ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2021, Vol. 58, Issue (8): 1773-1788. doi: 10.7544/issn1000-1239.2021.20200498


Webpage Fingerprinting Identification on Tor: A Survey

Sun Xueliang (1,2), Huang Anxin (1,2), Luo Xiapu (3), Xie Yi (1,2)

  1. School of Informatics, Xiamen University, Xiamen, Fujian 361005
  2. Fujian Key Laboratory of Sensing and Computing for Smart City (Xiamen University), Xiamen, Fujian 361005
  3. Department of Computing, The Hong Kong Polytechnic University, Hong Kong 999077
  • Online: 2021-08-01
  • Supported by: This work was supported by the National Natural Science Foundation of China (61771017, 61671397, 61772438, 61972313) and Hong Kong Innovation and Technology Fund Project (GHP/052/19SZ).

Abstract: With the flourishing development of Web services, protecting Web-browsing privacy has become a significant concern for society. Various protection techniques (e.g., anonymous communication networks) have been proposed to help users hide their real access targets and browse the Internet anonymously. However, Webpage fingerprinting (WF) identification, which monitors and analyzes network traffic, can still determine whether a given Web page has been visited by exploiting traffic features, thus jeopardizing anonymity. On the other hand, law enforcement agencies can leverage WF identification to monitor anonymous networks and prevent them from being abused to carry out illegal activities or cover up crimes. WF identification is therefore a significant technique for both privacy protection and network supervision. In this survey, we first introduce the concept and development of WF identification, and then focus on two kinds of WF identification on Tor, a widely used anonymous network: single-tag oriented identification and multi-tag oriented identification. In particular, we analyze the characteristics of these WF identification methods and point out their limitations, such as simplistic assumptions and insufficient experiments for systematic evaluation. Finally, we outline future research directions for WF identification.

Key words: Webpage fingerprinting identification, Tor anonymous communication, privacy preserving, traffic analysis, machine learning, network supervision
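The abstract describes WF identification as inferring the visited page from features of the observed traffic. The sketch below illustrates that idea for the single-tag case (one page per trace) in a closed world. It is a toy under stated assumptions, not a method from this survey: traces are assumed to be sequences of Tor cell directions (+1 outgoing, -1 incoming), and the features (`extract_features`) and the 1-nearest-neighbour classifier (`NearestNeighborWF`) are hypothetical choices made for illustration. Published attacks use far richer features or learned representations.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: a trace is a sequence of Tor cell directions,
# +1 for outgoing (client to network) and -1 for incoming.
Trace = Sequence[int]


def extract_features(trace: Trace) -> List[float]:
    """Map a direction sequence to a small, hypothetical feature vector."""
    total = len(trace)
    outgoing = sum(1 for d in trace if d > 0)
    incoming = total - outgoing
    # A burst is a maximal run of consecutive cells in the same direction.
    bursts = 0
    prev = 0
    for d in trace:
        if d != prev:
            bursts += 1
            prev = d
    out_ratio = outgoing / total if total else 0.0
    return [float(total), float(outgoing), float(incoming), out_ratio, float(bursts)]


class NearestNeighborWF:
    """Toy closed-world, single-tag classifier: 1-nearest neighbour in feature space."""

    def __init__(self) -> None:
        self.samples: List[Tuple[List[float], str]] = []

    def fit(self, traces: Sequence[Trace], labels: Sequence[str]) -> "NearestNeighborWF":
        if len(traces) != len(labels):
            raise ValueError("traces and labels must have the same length")
        # Store one (feature vector, page label) pair per training trace.
        self.samples = [(extract_features(t), lab) for t, lab in zip(traces, labels)]
        return self

    def predict(self, trace: Trace) -> str:
        if not self.samples:
            raise ValueError("call fit() before predict()")
        feats = extract_features(trace)
        # Unscaled Euclidean distance; a realistic attack would normalise features.
        _, label = min(self.samples, key=lambda s: math.dist(s[0], feats))
        return label


if __name__ == "__main__":
    train = [
        [1, -1, -1, -1, 1, -1, -1],
        [1, 1, -1, 1, -1, -1, -1, -1, -1, -1, -1, -1],
    ]
    clf = NearestNeighborWF().fit(train, ["page-a", "page-b"])
    print(clf.predict([1, -1, -1, -1, -1, 1, -1]))  # prints page-a
```

Multi-tag settings, where a single trace may contain several consecutively visited pages, are not handled here; a trace would first have to be split or jointly labeled, which this sketch does not attempt.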
