ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development


Webpage Fingerprinting Identification on Tor: A Survey

Sun Xueliang1,2, Huang Anxin1,2, Luo Xiapu3, Xie Yi1,2

    1School of Informatics, Xiamen University, Xiamen, Fujian 361005

    2Fujian Key Laboratory of Sensing and Computing for Smart City (Xiamen University), Xiamen, Fujian 361005

    3Department of Computing, The Hong Kong Polytechnic University, Hong Kong

  • Online: 2021-02-05
  • Supported by: This work was supported by the National Natural Science Foundation of China (61771017, 61671397, 61772438, 61972313).

Abstract: With the rapid development of Web services, protecting the privacy of Web browsing has become a major concern for society. Various protection techniques (e.g., anonymous communication networks) have been proposed to help users hide their real access targets and browse the Internet anonymously. However, Webpage fingerprinting (WF) identification, which monitors and analyzes network traffic, can still determine whether a given Web page is being visited by exploiting features of that traffic, thus jeopardizing anonymity. On the other hand, law enforcement agencies can leverage WF identification to monitor anonymous networks and prevent them from being abused to carry out illegal activities or cover up crimes. WF identification is therefore a significant technique from the perspectives of both privacy protection and network supervision. This survey first introduces the concept and development of WF identification, and then focuses on two kinds of WF identification on Tor, a widely used anonymous network: single-tag oriented identification and multi-tag oriented identification. In particular, it analyzes the characteristics of these methods and points out their limitations, such as oversimplified research assumptions and experiments too limited for systematic evaluation. Finally, the survey concludes and suggests directions for future research on WF identification.
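To make the idea of "exploiting features of network traffic" concrete, the following is a minimal, hypothetical sketch of closed-world WF identification, not the method of any specific paper covered by this survey. It assumes each observed page load is recorded as a sequence of Tor cell directions (+1 for client-to-network, -1 for network-to-client), reduces that sequence to a handful of toy features, and labels an unknown trace by majority vote among its k nearest training traces. The function names, the feature set, and the choice of k-nearest-neighbours are illustrative assumptions; published attacks use much richer features and learned classifiers.

```python
import math
from collections import Counter

def extract_features(trace):
    """Toy feature vector for one page load.

    Assumption: `trace` is a list of cell directions, +1 outgoing, -1 incoming.
    """
    total = len(trace)
    outgoing = sum(1 for d in trace if d > 0)
    incoming = total - outgoing
    frac_incoming = incoming / total if total else 0.0
    # Outgoing cells among the first 30 cells (early request burst).
    early_outgoing = sum(1 for d in trace[:30] if d > 0)
    return [float(total), float(outgoing), float(incoming),
            frac_incoming, float(early_outgoing)]

def distance(a, b):
    """Euclidean distance between two feature vectors (no scaling applied)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, trace, k=3):
    """Closed-world guess: majority label among the k nearest training traces.

    `train` is a list of (feature_vector, page_label) pairs.
    """
    query = extract_features(trace)
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

In use, the adversary first builds `train` by loading each monitored page several times and calling `extract_features` on each recorded trace, then calls `knn_predict` on traffic captured from the victim. The closed-world assumption (the victim only ever visits monitored pages) is one of the simplifying assumptions the survey criticizes.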

Key words: Webpage fingerprinting identification; Tor anonymous communication; privacy preservation; traffic analysis; machine learning; network supervision