ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2015, Vol. 52, Issue (4): 779-788. doi: 10.7544/issn1000-1239.2015.20148336

Special Issue: 2015 Big-Data-Driven Network Science


Study of The Long-Range Evolution of Online Human-Interest Based on Small Data

Li Yong1,2, Meng Xiaofeng1, Liu Ji3, Wang Changqing4

  1(School of Information, Renmin University of China, Beijing 100872); 2(College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070); 3(School of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi 830012); 4(DNSLAB, China Internet Network Information Center, Beijing 100190)
  • Online: 2015-04-01

Abstract: The availability of network big data, such as online browsing logs, e-commerce records, and communication logs, makes it possible to probe and quantify the dynamics of human interest. Such online behavioral data are called “small data” in the era of big data, and they can help explain many complex socio-economic phenomena. A fundamental assumption of Web user behavioral modeling is that user behavior follows a Markov process: the user’s next action depends only on the current action, regardless of earlier history. However, Web user behavior is a complex process that is often driven by human interests, and little is known about the regular patterns of those interests. In this paper, using a behavioral log dataset of more than 30,000 online users from CNNIC, we explore the use of block entropy as a dynamics classifier for human-interest behaviors. We synthesize several entropy-based approaches that apply information-theoretic measures of randomness and memory to the stochastic and deterministic processes of human interest, using discrete derivatives and integrals of the entropy growth curve. Our results, although preliminary, indicate that Web user behavior is not a Markov process but an aperiodic, infinitary, long-range-memory power-law process. Further analysis finds that the predictability gain can exceed 95.3 percent when users click 7 consecutive points online, which can provide theoretical guidance for accurately predicting online users’ interests in the era of big data.
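As a rough illustration of the quantities named in the abstract, the following Python sketch estimates the block entropy H(L) of a symbol sequence, its first discrete derivative (the entropy gain), its second discrete derivative (whose negative is commonly called the predictability gain), and a finite-length estimate of the excess entropy. It assumes the standard definitions from the entropy-convergence literature and a plain frequency-count estimator; the function names, the toy click sequence, and the cutoff `max_L` are hypothetical and are not taken from the paper, whose estimator and normalization may differ.

```python
from collections import Counter
from math import log2


def block_entropy(seq, L):
    """Shannon entropy (in bits) of the empirical distribution of length-L blocks."""
    if L == 0:
        return 0.0  # H(0) is defined as zero
    n = len(seq) - L + 1  # number of overlapping blocks
    if n <= 0:
        raise ValueError("sequence is shorter than the block length")
    counts = Counter(tuple(seq[i:i + L]) for i in range(n))
    return -sum((c / n) * log2(c / n) for c in counts.values())


def entropy_curve_derivatives(seq, max_L):
    """Return H(0..max_L), the gains h(1..max_L), and second differences for L=2..max_L."""
    H = [block_entropy(seq, L) for L in range(max_L + 1)]
    # First discrete derivative: h(L) = H(L) - H(L-1)
    gain = [H[L] - H[L - 1] for L in range(1, max_L + 1)]
    # Second discrete derivative: h(L) - h(L-1)
    second = [gain[i] - gain[i - 1] for i in range(1, len(gain))]
    return H, gain, second


def excess_entropy_estimate(gain):
    """Finite-L estimate: sum of h(L) - h_est, taking the last gain as the entropy rate."""
    h_est = gain[-1]  # assumption: h(max_L) approximates the entropy rate
    return sum(h - h_est for h in gain)


# Hypothetical toy data: a user alternating between two page categories.
clicks = "AB" * 50
H, gain, second = entropy_curve_derivatives(clicks, 4)
E = excess_entropy_estimate(gain)
print("H(L):", [round(x, 4) for x in H])
print("h(L):", [round(x, 4) for x in gain])
print("excess entropy estimate:", round(E, 4))
```

For this period-2 toy sequence the gain drops to roughly zero after the first symbol, so the process is fully predictable from a short history. A long-range-memory process of the kind the abstract describes would instead show a gain that decays slowly with L. Frequency-count estimates become unreliable once the number of possible blocks approaches the sequence length, so `max_L` should stay small relative to the amount of data.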

Key words: small data, block entropy, excess entropy, evolution of interest, predictability gain
