Abstract:
Illegal Web resources, especially pornography sites, pose a major challenge for Web-related applications. Because pornography sites differ significantly from ordinary Web sites in page content, site structure, and visitors, user behavior patterns on the two kinds of sites can be distinguished from each other. With the help of a popular commercial search engine in China, large-scale user behavior data is collected, and users are found to behave significantly differently when browsing porn sites than when visiting ordinary Web sites. These differences in user behavior patterns can help us separate porn sites from other sites. A number of behavior features are proposed and combined with machine learning algorithms to develop a porn site identification method. Experimental results show the effectiveness of the proposed method.