ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (4): 779-788. doi: 10.7544/issn1000-1239.2015.20148336

Special topic: 2015 Big Data-Driven Network Science

• Network Technology •

Study of the Long-Range Evolution of Online Human-Interest Based on Small Data

Li Yong1,2, Meng Xiaofeng1, Liu Ji3, Wang Changqing4

  1. 1(School of Information, Renmin University of China, Beijing 100872); 2(College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070); 3(School of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi 830012); 4(DNSLAB, China Internet Network Information Center, Beijing 100190) (facingworld@126.com)
  • Published online: 2015-04-01
  • Supported by: National Natural Science Foundation of China (61379050, 91224008, 71261025); National High Technology Research and Development Program of China (863 Program) (2013AA013204); Specialized Research Fund for the Doctoral Program of Higher Education (20130004130001); Research Funds of Renmin University of China (11XNL010)

Abstract: Network big data related to Web user behavior, such as online click logs, e-commerce records, and communication logs, make it possible to probe and quantify the dynamics of human interest. These online behavioral data are called the “small data” of the big data era, and they can help explain many complex social and economic phenomena. A common assumption in Web user behavior modeling is that behavior follows a Markov process: a user’s next action depends only on the current action and is independent of earlier history. However, online user behavior is a complex process that is often driven by human interests, and little is known about the underlying regularities of human-interest dynamics. Using behavioral logs of more than 30000 online users provided by the China Internet Network Information Center (CNNIC), we explore block entropy as a dynamics classifier for online user behavior. We combine several information-theoretic measures, built on discrete derivatives and integrals of the entropy growth curve, to analyze the randomness and memory of users’ online click behavior. Our results, although preliminary, indicate that Web user behavior is not a simple Markov process but an aperiodic, infinitary, long-range memory process that follows a power law. Further analysis finds that the average predictability gain can reach more than 95.3% once a user has clicked 7 consecutive interest points online, which can provide theoretical guidance for accurate prediction of online users’ interests in the era of big data.

Key words: small data, block entropy, excess entropy, evolution of interest, predictability gain
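
The quantities named in the abstract can be illustrated with a minimal sketch. It assumes the standard information-theoretic definitions: block entropy H(L) is the Shannon entropy of length-L blocks, the entropy gain is the first discrete derivative H(L) - H(L-1), the second discrete derivative is the change in that gain, and excess entropy is the sum of the gains in excess of the entropy rate. The function names, the toy click stream, and the finite-L estimate of the entropy rate are assumptions made for illustration; the abstract does not state how the paper normalizes its predictability gain or estimates these quantities on the CNNIC data.

```python
from collections import Counter
from math import log2


def block_entropy(seq, L):
    """Shannon entropy (bits) of the empirical distribution of length-L blocks."""
    if L == 0:
        return 0.0
    n_blocks = len(seq) - L + 1
    counts = Counter(tuple(seq[i:i + L]) for i in range(n_blocks))
    return -sum((c / n_blocks) * log2(c / n_blocks) for c in counts.values())


def entropy_profile(seq, max_L):
    """Return H(L) for L = 0..max_L plus its first and second differences."""
    H = [block_entropy(seq, L) for L in range(max_L + 1)]
    dH = [H[L] - H[L - 1] for L in range(1, max_L + 1)]  # dH[0] is the gain at L = 1
    d2H = [dH[i] - dH[i - 1] for i in range(1, len(dH))]  # d2H[0] is for L = 2
    return H, dH, d2H


def excess_entropy_estimate(dH):
    """Finite-L estimate: sum of (gain - h), with h approximated by the last gain."""
    h_est = dH[-1]
    return sum(g - h_est for g in dH)


# Hypothetical click stream: one user cycling through three interest categories.
clicks = list("ABC" * 200)
H, dH, d2H = entropy_profile(clicks, max_L=5)
E = excess_entropy_estimate(dH)
```

For an order-1 Markov process the gain dH(L) stops changing at L = 2, so a gain that keeps decaying slowly as L grows is the kind of signature that points to long-range memory. On real logs the block length must stay small relative to the sequence length, because the empirical block counts become too sparse to estimate H(L) reliably.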

CLC number: