    Citation: Wang Dong, Li Zhenyu, Xie Gaogang. Unbiased Sampling Technologies on Online Social Network[J]. Journal of Computer Research and Development, 2016, 53(5): 949-967. DOI: 10.7544/issn1000-1239.2016.20148387

    Unbiased Sampling Technologies on Online Social Network

    • Abstract: As popular platforms for content sharing and information diffusion, online social networks (OSNs) such as Facebook and Twitter have attracted researchers from many fields. Complete datasets provided by OSN companies would yield the best results, but they are hard, if not impossible, for researchers to obtain, because most OSN companies are reluctant to share their data in order to protect users' privacy. Moreover, given the huge amount of data, analyzing a complete dataset may take an unreasonable amount of time. The alternative is to estimate the features of the complete network from a representative sample. How to obtain unbiased samples, or to make unbiased estimations, on OSNs is therefore a key prerequisite of OSN research. This paper surveys the state of unbiased sampling technologies for OSNs. It first gives, in theory, the necessary and sufficient condition for unbiased sampling of large-scale networks, then compares the widely used sampling technologies in terms of sampling principle, sampling bias, and sampling efficiency, and finally discusses the development trends of OSN sampling technologies. The survey offers OSN researchers a valuable reference for using and studying sampling technologies.
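    The distinction the abstract draws between unbiased samples and unbiased estimations can be illustrated with a minimal sketch. It is not taken from the paper: the toy graph, the function names, and the choice of a simple random walk with 1/degree re-weighting are assumptions made purely for illustration. A simple random walk visits nodes in proportion to their degree, so a naive average over the walk overestimates the mean degree, while re-weighting each visit by the inverse of its degree corrects the estimate.

```python
import random
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency list from a list of (u, v) edges."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return dict(adj)


def random_walk(adj, start, steps, rng):
    """Simple random walk: move to a uniformly chosen neighbor at each step."""
    node = start
    visited = []
    for _ in range(steps):
        node = rng.choice(adj[node])
        visited.append(node)
    return visited


def naive_mean_degree(adj, sample):
    """Plain average of sampled degrees; biased toward high-degree nodes."""
    return sum(len(adj[v]) for v in sample) / len(sample)


def reweighted_mean_degree(adj, sample):
    """Weight each visit by 1/degree to cancel the degree-proportional bias."""
    return len(sample) / sum(1.0 / len(adj[v]) for v in sample)


# Hypothetical toy graph: hub node 0 linked to nodes 1..6, plus two extra edges.
edges = [(0, i) for i in range(1, 7)] + [(1, 2), (3, 4)]
adj = build_graph(edges)

rng = random.Random(42)  # fixed seed so the run is repeatable
sample = random_walk(adj, start=0, steps=100_000, rng=rng)

true_mean = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
naive = naive_mean_degree(adj, sample)
corrected = reweighted_mean_degree(adj, sample)

print(f"true mean degree:       {true_mean:.3f}")
print(f"naive walk estimate:    {naive:.3f}")
print(f"re-weighted estimate:   {corrected:.3f}")
```

    On this toy graph the naive estimate should settle near the degree-weighted average (sum of squared degrees over sum of degrees), well above the true mean, while the re-weighted estimate should approach the true mean as the walk gets longer.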

       
