    Citation: Wei Zhenkai, Cheng Meng, Zhou Xiabing, Li Zhifeng, Zou Bowei, Hong Yu, Yao Jianmin. Convolutional Interactive Attention Mechanism for Aspect Extraction[J]. Journal of Computer Research and Development, 2020, 57(11): 2456-2466. DOI: 10.7544/issn1000-1239.2020.20190748


    Convolutional Interactive Attention Mechanism for Aspect Extraction

    • Abstract: The attention mechanism is one of the most widely used models in deep-learning-based aspect extraction research. Current attention mechanisms for aspect extraction have two limitations. First, most are static attention or self-attention mechanisms; self-attention is a global mechanism that brings irrelevant noise (words that are far from the target word and unrelated to it) into the computation of the attention vector. Second, most existing attention mechanisms are single-layer, and a single pass of attention modeling lacks interactivity. To address these two limitations, this paper proposes a convolutional interactive attention (CIA) mechanism for aspect extraction. The target sentence is first fed into a bidirectional long short-term memory network (Bi-LSTM) to obtain a hidden representation of each word, and representation learning is then performed by the convolutional interactive attention mechanism, which consists of two attention layers. In the first layer, a sliding window, moving in order from the beginning to the end of the sentence, limits the context width of each word, and an attention distribution vector is computed for each word from its windowed context. In the second layer, interactive attention is computed between the first layer's attention distribution vectors and all words in the sentence; the resulting attention vectors are concatenated with those of the first layer and finally fed into a conditional random field (CRF) to label aspects. The effectiveness of the model is verified on the official SemEval 2014-2016 (Semantic Evaluation) datasets: compared with the baseline models, the F1 score of aspect extraction improves by 2.21, 1.35, 2.22, and 2.21 percentage points, respectively, on the four datasets.
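    The pipeline described above (Bi-LSTM encoding, windowed first-layer attention, sentence-wide interactive second-layer attention, concatenation, CRF tagging) can be made concrete with a short sketch. The following is a minimal PyTorch implementation of the two attention layers only; the class name, the additive scoring functions, the window half-width, and all tensor shapes are illustrative assumptions rather than the paper's exact formulation, and the Bi-LSTM encoder and CRF tagger that would sit around this module are omitted.

```python
# Minimal sketch of the two-layer convolutional interactive attention (CIA)
# described in the abstract. Names, scoring functions, and the window size
# are assumptions for illustration, not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvolutionalInteractiveAttention(nn.Module):
    def __init__(self, hidden_dim, window=3):
        super().__init__()
        self.window = window                        # half-width of the sliding context window
        self.score1 = nn.Linear(hidden_dim * 2, 1)  # layer-1 (local) attention scorer
        self.score2 = nn.Linear(hidden_dim * 2, 1)  # layer-2 (interactive) attention scorer

    def forward(self, h):
        # h: (batch, seq_len, hidden_dim) hidden states from a Bi-LSTM encoder
        B, T, D = h.shape

        # ---- layer 1: windowed ("convolution-like") local attention ----
        # For each position t, attend only over a sliding window around t,
        # processed in order from the start to the end of the sentence.
        local = []
        for t in range(T):
            lo, hi = max(0, t - self.window), min(T, t + self.window + 1)
            ctx = h[:, lo:hi, :]                                      # (B, w, D) local context
            q = h[:, t:t + 1, :].expand(-1, ctx.size(1), -1)          # repeat target word
            e = self.score1(torch.cat([q, ctx], dim=-1)).squeeze(-1)  # (B, w) scores
            a = F.softmax(e, dim=-1).unsqueeze(-1)                    # local attention weights
            local.append((a * ctx).sum(dim=1))                        # (B, D) weighted sum
        c1 = torch.stack(local, dim=1)                                # (B, T, D) layer-1 vectors

        # ---- layer 2: interactive attention over the whole sentence ----
        # Each layer-1 attention vector attends to every word in the sentence.
        q = c1.unsqueeze(2).expand(-1, -1, T, -1)                     # (B, T, T, D)
        k = h.unsqueeze(1).expand(-1, T, -1, -1)                      # (B, T, T, D)
        e = self.score2(torch.cat([q, k], dim=-1)).squeeze(-1)        # (B, T, T) scores
        a = F.softmax(e, dim=-1)
        c2 = torch.bmm(a, h)                                          # (B, T, D) layer-2 vectors

        # Concatenate both layers' attention vectors; a CRF tagging layer
        # (omitted here) would consume this output to label aspects.
        return torch.cat([c1, c2], dim=-1)                            # (B, T, 2D)
```

    A quick smoke test such as `ConvolutionalInteractiveAttention(hidden_dim=256)(torch.randn(2, 10, 256))` returns a (2, 10, 512) tensor: concatenating the two layers doubles the feature dimension handed to the CRF, matching the abstract's description of fusing both attention layers before tagging.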

       
