ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2015, Vol. 52, Issue (1): 191-199. doi: 10.7544/issn1000-1239.2015.20131113

• Artificial Intelligence •

Hidden Topic Variable Graphical Model Based on Deep Learning Framework

Wu Lei1, Zhang Wensheng2, Wang Jue2

  1(Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081); 2(Institute of Automation, Chinese Academy of Sciences, Beijing 100190) (girlrable@126.com)
  • Published: 2015-01-01
  • Online: 2015-01-01
  • Funding: Key Program of the National Natural Science Foundation of China (U1135005) | Major Research Plan of the National Natural Science Foundation of China (90924026) | Young Scientists Fund of the National Natural Science Foundation of China (61305018) | National Science and Technology Major Project (GFZX0101050302) | Weapons and Equipment Pre-Research Fund (51301010206)

Abstract: A hidden topic variable graphical model is a probabilistic graphical model whose nodes represent latent topics or changes in latent topics. Existing hidden topic variable graphical models can extract only a single layer of topic nodes. To address this limitation, this paper proposes a probabilistic graphical model, based on a deep learning framework, that extracts multiple layers of topic nodes. The model adds a preprocessing layer, a self-organizing map (SOM), beneath the hidden topic variable graphical model, which allows it to extract topic states at different levels. The hidden topic variable graphical model itself combines a hidden Markov model (HMM) with a conditional random field (CRF). For the CRF, feature functions defined by first-order logic clauses are introduced to compensate for the long-distance dependencies that the short-range Markov property cannot capture. On this basis, a new deep learning algorithm that extracts topic states layer by layer is proposed. Experiments on the widely used Amazon and Tripadvisor sentiment analysis datasets show that the new algorithm improves the accuracy of sentiment analysis. The results also show that extracting multiple layers of topic states better captures both the macroscopic topic distribution and the local topic information of reviews.

Key words: probabilistic graphical model, deep learning, hidden Markov model (HMM), self-organizing map (SOM), first-order logic
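
The abstract's point about clause-defined feature functions may be easier to see with a small example. The sketch below is not the paper's method: the lexicons, the label names (`POS`, `NEG`, `O`), the clause itself, and the weights are all hypothetical and chosen only for illustration. It contrasts a local feature that inspects only the current token with a feature written as a first-order clause whose quantifier ranges over the whole preceding context, which is one way a linear-chain CRF score could pick up a long-distance dependency such as negation.

```python
from typing import Callable, Sequence

# Hypothetical lexicons, used only for this illustration.
NEGATORS = {"not", "never", "no"}
POSITIVE = {"good", "great", "clean"}

# A CRF feature function sees the previous label, the current label,
# the full token sequence, and the current position.
FeatureFn = Callable[[str, str, Sequence[str], int], float]


def f_local(prev_label: str, label: str, tokens: Sequence[str], i: int) -> float:
    """Local feature: fires when the current token is positive and labeled POS."""
    return 1.0 if tokens[i] in POSITIVE and label == "POS" else 0.0


def f_negated(prev_label: str, label: str, tokens: Sequence[str], i: int) -> float:
    """Clause-style feature (hypothetical clause):
        exists j < i: negator(tokens[j]) AND positive(tokens[i]) => label_i = NEG
    The quantifier covers the whole prefix, so the negator may be
    arbitrarily far from position i.
    """
    has_negator = any(t in NEGATORS for t in tokens[:i])
    return 1.0 if has_negator and tokens[i] in POSITIVE and label == "NEG" else 0.0


def score(
    labels: Sequence[str],
    tokens: Sequence[str],
    features: Sequence[FeatureFn],
    weights: Sequence[float],
) -> float:
    """Unnormalized linear-chain CRF score: weighted feature sum over positions."""
    total = 0.0
    for i, label in enumerate(labels):
        prev_label = labels[i - 1] if i > 0 else "START"
        for f, w in zip(features, weights):
            total += w * f(prev_label, label, tokens, i)
    return total


tokens = "the room was not at all clean".split()
features = [f_local, f_negated]
weights = [1.0, 2.0]  # hypothetical weights; a real CRF would learn these

as_positive = ["O"] * 6 + ["POS"]
as_negative = ["O"] * 6 + ["NEG"]
print(score(as_positive, tokens, features, weights))
print(score(as_negative, tokens, features, weights))
```

With these assumed weights, the labeling that marks "clean" as `NEG` scores higher than the one that marks it `POS`, even though the negator sits three tokens away and no first-order Markov transition links the two positions.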

CLC number: