ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2018, Vol. 55 ›› Issue (8): 1641-1652. doi: 10.7544/issn1000-1239.2018.20180363

Special topic: 2018 Special Issue on Frontier Advances in Data Mining

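  1. 1(Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia (Beijing University of Posts and Telecommunications), Beijing 100876);2(Department of Electronics Engineering, Pusan National University, Busan, Korea 46241)
  • Published: 2018-08-01
  • Funding: 
    This work was supported by the National Natural Science Foundation of China (61532006, 61320106006, 61772083).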

The Social and Conceptual Semantic Extended Search Method for Microblog Short Text

Cui Wanqiu1, Du Junping1, Kou Feifei1, Li Zhijian1, Lee JangMyung2

  1. 1(Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia (Beijing University of Posts and Telecommunications), Beijing 100876);2(Department of Electronics Engineering, Pusan National University, Busan, Korea 46241)
  • Online: 2018-08-01

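Abstract (translated from the Chinese): Fully mining the semantics of microblog short texts to enable accurate search is an important task. Because microblog content is sparse and semantically limited, traditional search methods that rely only on literal semantic analysis for short-text understanding and similarity matching are constrained. We therefore propose an extended search method that combines social and conceptual semantics. By mining attributes unique to social networks, such as the #hashtag#, the "@" mention, and URL link information, the method further extends microblog short texts with social semantics. It combines the conceptual words obtained from literal text analysis with the associated hashtag information latent in social relationships, represents the semantic features of each short text from these two perspectives, and achieves accurate search based on a full understanding of short-text semantics. Comparative experiments on microblog datasets show that, compared with existing extended search methods, the proposed method captures more semantic features and significantly improves microblog search performance.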

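Key words (translated from the Chinese): microblog short text, social and conceptual semantics, extended search, conceptual words, associated hashtags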

Abstract: Mining the semantics of microblog texts to enable accurate search is an essential task in microblog search. Because short texts in microblogs are sparse and semantically limited, traditional search methods that analyze only the literal semantics of the text for short-text understanding and similarity matching are restricted in what they can achieve. We therefore propose an extended search algorithm based on social and conceptual semantics. By exploiting attributes unique to social networks, such as the #hashtag#, the “@” mention, and URL link information, we further extend microblog short texts with social semantics. The method combines the conceptual words obtained from literal analysis of short texts with the latent associated hashtag information in a graph structure formed by social relationships. It represents the features of short texts under these two semantic extensions and achieves precise search based on full mining of short-text meaning. Finally, we compare the method experimentally with traditional extended search algorithms on microblog datasets. The results show that the proposed algorithm captures more semantics, enhances the semantic representation of microblog short texts, and significantly improves search performance.

Key words: short text in microblog, social and conceptual semantics, extended search, conceptual words, associated hashtags
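
The social extension described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the regular expressions, the edge-weighting rule, and the function names `social_attributes`, `build_hashtag_graph`, and `expand` are all assumptions made for illustration, and the conceptual words are passed in by the caller because the abstract does not specify how they are extracted.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Assumed patterns: Weibo-style #hashtag# without inner spaces, @mention, http(s) URL.
HASHTAG = re.compile(r"#([^#\s]+)#")
MENTION = re.compile(r"@([\w\-]+)")
URL = re.compile(r"https?://\S+")


def social_attributes(text):
    """Return (hashtags, mentions, urls) found in one microblog post."""
    urls = URL.findall(text)
    stripped = URL.sub(" ", text)  # keep '#' or '@' inside URLs from matching
    return HASHTAG.findall(stripped), MENTION.findall(stripped), urls


def build_hashtag_graph(posts):
    """Build a weighted hashtag graph (illustrative weighting, not the paper's).

    An edge gains weight 1 for each post in which two hashtags co-occur,
    and weight 1 for each mention or URL that both hashtags appear with.
    """
    graph = defaultdict(Counter)
    by_anchor = defaultdict(set)  # mention or URL -> hashtags seen with it
    for text in posts:
        tags, mentions, urls = social_attributes(text)
        for a, b in combinations(sorted(set(tags)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
        anchors = [("@", m) for m in mentions] + [("url", u) for u in urls]
        for anchor in anchors:
            by_anchor[anchor].update(tags)
    for tags in by_anchor.values():
        for a, b in combinations(sorted(tags), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def expand(text, graph, concept_words, top_k=2):
    """Combine caller-supplied conceptual words with associated hashtags."""
    tags, _, _ = social_attributes(text)
    scores = Counter()
    for tag in tags:
        for other, weight in graph.get(tag, {}).items():
            if other not in tags:
                scores[other] += weight
    associated = [tag for tag, _ in scores.most_common(top_k)]
    return {
        "concepts": list(concept_words),
        "hashtags": tags,
        "associated_hashtags": associated,
    }


posts = [
    "#AI# great talk by @alice http://t.cn/abc",
    "#MachineLearning# notes from @alice",
    "#AI# and #DeepLearning# tutorial",
]
graph = build_hashtag_graph(posts)
result = expand("#MachineLearning# beginner question", graph, ["machine learning"])
print(result["associated_hashtags"])
```

In this toy data, `#MachineLearning#` and `#AI#` never appear in the same post, yet they become associated because both appear with the mention `@alice`. This is the kind of latent link that a purely literal analysis of the text would miss.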