Abstract:
As popular platforms for content sharing and information diffusion, online social networks (OSNs) such as Facebook and Twitter have attracted a large number of researchers. Complete datasets provided by OSN companies would yield the most accurate results. However, such datasets are hard, if not impossible, for researchers to obtain, because most OSN companies are reluctant to share their data in order to protect user privacy. Moreover, because the volume of data is so large, analyzing a complete network may take an unreasonable amount of time. The alternative is to estimate the features of the complete network from representative samples. Obtaining unbiased samples, or making unbiased estimates, on OSNs is therefore a key prerequisite for OSN research. This paper provides a general survey of unbiased sampling techniques for OSNs. First, the general necessary and sufficient condition for unbiased sampling of large-scale networks is studied mathematically. Then, the widely used sampling techniques are compared in terms of sampling principle, sampling bias, and sampling efficiency. Finally, development trends in OSN sampling techniques are discussed. This survey can serve OSN researchers as a valuable reference when using and analyzing sampling techniques.