Abstract:
The past few years have witnessed the rapid development and popularization of microblogs. Owing to their openness, accessibility from many kinds of terminals, simple content, and low barrier to entry, microblogs deeply affect daily life by providing an important platform for people to publish comments, transmit information, and acquire knowledge, to name just a few uses. Despite these advantages, microblogs may seriously affect national security and social development if they are left uncontrolled. Research on microblogs is therefore valuable from both theoretical and practical perspectives, especially in the Internet age. Microblogs can be treated as a generalization and extension of human life in the virtual network world. However, unlike traditional information networks, they have unique characteristics, including noisy and diverse data, a social media nature, multiple relation types, rapid spread and evolution, nonlinearity, and large scale. These differences pose great challenges for analyzing and mining microblogs. In this paper, we survey data mining for microblogs and analyze a Twitter dataset. Moreover, we summarize the challenges of data mining for microblogs.