ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2016, Vol. 53 ›› Issue (5): 949-967. doi: 10.7544/issn1000-1239.2016.20148387

• Information Processing •

Unbiased Sampling Technologies on Online Social Network (在线社会网络无偏采样技术)

Wang Dong (王栋)1,2, Li Zhenyu (李振宇)1, Xie Gaogang (谢高岗)1

  1(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190); 2(University of Chinese Academy of Sciences, Beijing 100190) (wangdong01@ict.ac.cn)
  • Published: 2016-05-01
  • Online: 2016-05-01
  • Supported by:
    National Natural Science Foundation of China (61272473, 61572475); Jiangsu Province Future Network Prospective Research Project (BY2013095-1-02)

Abstract: As popular platforms for content sharing and information diffusion, online social networks (OSNs) such as Facebook and Twitter have attracted researchers from many fields. Although complete datasets provided by the OSN companies would yield the best results, it is hard, if not impossible, for researchers to obtain them, since most OSN companies are reluctant to share their data in order to protect users' privacy. Moreover, given the huge amount of data, analyzing a complete dataset may take an unreasonable amount of time. The alternative is to estimate the properties of the complete network from a representative sample. How to obtain unbiased samples, or to make unbiased estimates of network properties, therefore becomes a key premise of OSN research. This paper surveys the state of research on unbiased sampling technologies for OSNs. First, a necessary and sufficient condition for unbiased sampling of large-scale networks is given theoretically. Then the widely used sampling technologies are compared from three perspectives: sampling principle, sampling bias, and sampling efficiency. Finally, development trends in OSN sampling technologies are discussed. The survey is intended as a reference for researchers who use or study OSN sampling technologies.

Key words: online social network (OSN), sampling technology, sampling bias, sampling principle, sampling efficiency

CLC number: