Survey of Data Mining for Microblogs

Ding Zhaoyun; Jia Yan; Zhou Bin


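Abstract: With the rapid development and spread of microblogs in recent years, microblogs have quickly penetrated the online population thanks to their open platforms, extensibility across terminals, concise content, and low barrier to entry, and have grown into an important social medium. Microblogs have become a key channel through which Internet users obtain news and current affairs, interact with others, express themselves, share socially, and take part in society. They are also an important platform for public opinion, with far-reaching effects on national security and social development. A microblog is an abstraction and extension of human life in the virtual network world. Unlike ordinary information networks, microblogs are characterized by large scale, diverse noisy data, rapid propagation and evolution, nonlinearity, a social-media nature, and multiple relations. Consequently, both their analysis methods and their mining goals differ substantially from those of traditional information systems, and research on the related techniques faces greater challenges. Addressing these new characteristics of microblogs, this paper reviews recent research on microblogs, analyzes the characteristics of a Twitter dataset, and summarizes the challenges facing future research.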

Abstract: The past few years have witnessed the rapid development and popularization of microblogs. Owing to their openness, terminal extensibility, content simplicity, and low barrier to entry, microblogs deeply affect daily life by providing an important platform on which people publish comments, spread information, and acquire knowledge, among other activities. Despite these advantages, microblogs may seriously harm national security and social development if they get out of control. Research on microblogs is therefore valuable from both theoretical and practical perspectives, especially in the Internet age. Microblogs can be treated as a generalization and extension of human life in the virtual network world. Unlike traditional information networks, however, they have unique characteristics, including diverse noisy data, a social-media nature, multiple relations, rapid spread and evolution, nonlinearity, and large scale. These differences pose great challenges for analyzing and mining microblogs. In this paper, we survey data mining for microblogs, analyze the characteristics of a Twitter dataset, and summarize the open challenges of data mining for microblogs.
