ISSN 1000-1239 CN 11-1777/TP

• Information Processing •

### Burst Topic Detection for Large-Scale Microblog Message Streams

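1. (Research Center of Information Security, Harbin Engineering University, Harbin 150001) (shenguowei@hrbeu.edu.cn)
• Published: 2015-02-01
• Funding:
Supported by the National High Technology Research and Development Program of China (863 Program) (2012AA012802) and the National Natural Science Foundation of China (61170242)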

### Burst Topic Detection Oriented Large-Scale Microblogs Streams

Shen Guowei, Yang Wu, Wang Wei, Yu Miao

1. (Research Center of Information Security, Harbin Engineering University, Harbin 150001)
• Online: 2015-02-01

Abstract: In microblogs, emergent events spread quickly and have tremendous influence, and bursts of public opinion are of wide concern to governments and enterprises. Existing burst topic detection methods consider only one type of entity, such as words or tags. However, Chinese microblogs contain not only new or colloquial words but also pictures and links, whose burst patterns are difficult to detect. To tackle this problem, we propose a real-time burst topic detection framework for multi-type entities. Unlike existing methods, our method does not require Chinese word segmentation; instead, new words are generated in the final step. In this framework, the window size is adjusted dynamically according to the microblog stream. To measure the burst weight of an entity, its spread influence is calculated. Moreover, a high-order co-clustering algorithm based on non-negative matrix decomposition is used to simultaneously cluster two types of entities, messages and users. Along with the detected burst topics, we also obtain the related messages and participating users, which can be used to analyze the cause of a burst topic. Experiments on a large Sina Weibo dataset show that our algorithm achieves higher accuracy and detects burst topics earlier than existing algorithms.
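The idea of scoring the burst weight of multi-type entities over stream windows can be illustrated with a minimal sketch. This is not the authors' formula, which the abstract does not give. It assumes that an entity is a `(type, value)` pair (tag, link, picture, and so on), that spread influence can be approximated by the number of distinct users posting the entity in a window, and that burst weight is the current spread divided by the smoothed mean spread in earlier windows. The function name `burst_weights` and the input layout are hypothetical.

```python
from collections import defaultdict


def spread(window):
    """Count distinct users per entity in one window.

    Assumption: distinct posting users stand in for 'spread influence'.
    """
    users = defaultdict(set)
    for user, entities in window:
        for entity in entities:
            users[entity].add(user)
    return {entity: len(u) for entity, u in users.items()}


def burst_weights(windows):
    """Score entities in the last window against all earlier windows.

    windows: non-empty list of windows; each window is a list of
    (user, entities) messages, where entities are (type, value) pairs.
    """
    *history, current = windows
    past = [spread(w) for w in history]
    weights = {}
    for entity, now in spread(current).items():
        # Mean spread over earlier windows; 0.0 when there is no history.
        mean_past = sum(p.get(entity, 0) for p in past) / len(past) if past else 0.0
        # Add-one smoothing keeps unseen entities from dividing by zero.
        weights[entity] = now / (1.0 + mean_past)
    return weights


windows = [
    [("u1", [("tag", "#a")]), ("u2", [("url", "http://t.cn/x")])],
    [("u1", [("tag", "#a")])],
    [("u1", [("tag", "#a")]), ("u2", [("url", "http://t.cn/x")]),
     ("u3", [("url", "http://t.cn/x")]), ("u4", [("url", "http://t.cn/x")])],
]
print(burst_weights(windows))
```

In this toy stream the link is posted by three distinct users in the last window after appearing only once before, so it scores higher than the tag, which stays flat. Because entities are treated as opaque pairs, no word segmentation is needed at this stage.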