Hidden Topic Variable Graphical Model Based on Deep Learning Framework

Wu Lei; Zhang Wensheng; Wang Jue

doi:10.7544/issn1000-1239.2015.20131113

Abstract

The hidden topic variable graphical model is a probabilistic graphical model whose nodes represent latent topics or latent topic changes. Existing hidden topic variable graphical models can extract only a single level of topic nodes. To address this limitation, this paper proposes a probabilistic graphical model based on a deep learning framework that extracts multi-level topic nodes. The model adds a preprocessing layer, a self-organizing map (SOM) layer, beneath the hidden topic variable graphical model. With the SOM layer, the model can effectively extract topic states at a different level from those extracted by the hidden topic variable graphical model. The hidden topic variable graphical model used in this paper combines a hidden Markov model (HMM) with a conditional random field (CRF). For the CRF, we define feature functions with first-order logic clauses, which compensates for the lack of long-distance dependencies under the short-range Markov property. On this basis, we propose a new deep learning algorithm that extracts topic states hierarchically. Experiments on the widely used Amazon and Tripadvisor sentiment analysis datasets show that the new algorithm improves sentiment analysis accuracy. The results also show that extracting multi-level topic states better captures both the macroscopic topic distribution and the local topic information of individual reviews.
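
The CRF feature functions are the most concrete mechanism named in the abstract. The Python below is a minimal sketch that turns a first-order clause into a binary feature. It assumes a linear-chain CRF over tokens, with features of the usual form f_k(y_{t-1}, y_t, x, t). The clause (∃ j < t: Negation(x_j) ∧ Opinion(x_t)), the word lists, the labels `NEG`/`POS`/`O`, and the helper names `negated_opinion`, `clause_feature`, and `sequence_score` are hypothetical illustrations, not the authors' definitions. The existential quantifier ranges over the whole sentence prefix, so the feature can tie the label at position t to a word many tokens earlier. An HMM emission that sees only x_t cannot do this.

```python
from typing import Callable, Sequence

# A linear-chain CRF feature: f(y_prev, y_cur, x, t) -> 0 or 1.
Feature = Callable[[str, str, Sequence[str], int], int]
# A first-order clause evaluated on the observation sequence x at position t.
Clause = Callable[[Sequence[str], int], bool]

# Hypothetical lexicons, for illustration only (not from the paper).
NEGATIONS = {"not", "no", "never"}
OPINION_WORDS = {"good", "great", "clean", "comfortable", "bad"}


def negated_opinion(x: Sequence[str], t: int) -> bool:
    """Clause: exists j < t such that Negation(x_j) and Opinion(x_t).

    The quantifier ranges over the whole prefix x[0:t], so the negation
    may be arbitrarily far from position t.
    """
    return x[t] in OPINION_WORDS and any(w in NEGATIONS for w in x[:t])


def clause_feature(clause: Clause, label: str) -> Feature:
    """Lift a clause into a CRF feature that fires when the clause holds and y_t == label."""
    def f(y_prev: str, y_cur: str, x: Sequence[str], t: int) -> int:
        return int(y_cur == label and clause(x, t))
    return f


def sequence_score(features: Sequence[Feature], weights: Sequence[float],
                   x: Sequence[str], y: Sequence[str]) -> float:
    """Unnormalized score sum_t sum_k w_k * f_k(y_{t-1}, y_t, x, t) of labeling y."""
    score = 0.0
    for t in range(len(x)):
        y_prev = y[t - 1] if t > 0 else "START"
        for w, f in zip(weights, features):
            score += w * f(y_prev, y[t], x, t)
    return score


if __name__ == "__main__":
    x = "i would never call this hotel room clean".split()
    feats = [clause_feature(negated_opinion, "NEG")]
    y_neg = ["O"] * 7 + ["NEG"]
    y_pos = ["O"] * 7 + ["POS"]
    print(sequence_score(feats, [2.0], x, y_neg))  # 2.0
    print(sequence_score(feats, [2.0], x, y_pos))  # 0.0
```

In the usage example, the negation "never" sits five tokens before "clean", so the labeling that tags "clean" as `NEG` scores higher. A full CRF would also divide by the partition function Z(x) to obtain p(y | x). The sketch omits that step because it only illustrates how a clause becomes a feature.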
