Abstract:
Implicit discourse relation recognition aims to automatically identify semantic relations (such as Comparison) between two arguments (sentences or clauses) in the absence of explicit connectives. Existing methods have shown that incorporating phrase information can effectively boost performance. However, two shortcomings remain: 1) these models typically rely on syntactic parsers and do not fully capture the interactions among words, phrases, and arguments; 2) incorporating phrase information often leads to data sparsity during training. To address these issues, we propose an implicit discourse relation recognition model based on multi-granularity information interaction (MGII) and develop a chain-decoding-inspired data augmentation method (DAM). Specifically, the proposed model automatically acquires semantic representations of n-grams with a stacked convolutional neural network, explicitly models the interactions among words, phrases, and arguments with Transformer layers, and finally predicts multi-level discourse relations via chain decoding. Our data augmentation method pretrains the encoding and decoding modules simultaneously, enabling effective use of massive explicit discourse data, which are naturally annotated by connectives, to mitigate data sparsity. The proposed method significantly outperforms recent benchmark models on the PDTB datasets. Moreover, it does not rely on syntactic parsers and is therefore broadly applicable.
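As a rough illustration of the pipeline summarized above, the following PyTorch sketch shows one way the described components could fit together: stacked 1-D convolutions produce n-gram (phrase) representations, Transformer layers let word-, phrase-, and argument-level units interact, and the second-level classifier conditions on the top-level prediction in a chain-decoding fashion. This is not the authors' implementation; the class name, dimensions, pooling scheme, and label counts (4 top-level and 11 second-level PDTB senses) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MGIISketch(nn.Module):
    """Hypothetical skeleton of a multi-granularity interaction model."""

    def __init__(self, vocab_size, d_model=256, n_top=4, n_second=11):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Stacked convolutions: layer k widens the receptive field,
        # so its outputs roughly correspond to (k+1)-gram phrases.
        self.convs = nn.ModuleList(
            [nn.Conv1d(d_model, d_model, kernel_size=2, padding=1) for _ in range(2)]
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.interact = nn.TransformerEncoder(layer, num_layers=2)
        self.top_clf = nn.Linear(d_model, n_top)
        # Chain decoding: the second-level classifier also sees the top-level logits.
        self.second_clf = nn.Linear(d_model + n_top, n_second)

    def forward(self, arg1_ids, arg2_ids):
        a1, a2 = self.embed(arg1_ids), self.embed(arg2_ids)   # (B, L1, d), (B, L2, d)
        tokens = torch.cat([a1, a2], dim=1)                    # word-level units
        # Phrase-level units from stacked convolutions over the token sequence.
        phrase, phrase_units = tokens.transpose(1, 2), []
        for conv in self.convs:
            phrase = torch.relu(conv(phrase))
            phrase_units.append(phrase.transpose(1, 2))
        # Argument-level units via mean pooling of each argument.
        arg_units = torch.stack([a1.mean(dim=1), a2.mean(dim=1)], dim=1)
        # Let all granularities interact jointly through the Transformer layers.
        units = torch.cat([tokens] + phrase_units + [arg_units], dim=1)
        hidden = self.interact(units).mean(dim=1)              # pooled summary
        top_logits = self.top_clf(hidden)
        second_logits = self.second_clf(torch.cat([hidden, top_logits], dim=-1))
        return top_logits, second_logits
```

In this sketch, the same forward pass could be pretrained on explicit discourse data (with connectives removed and senses derived from the connectives) and then fine-tuned on implicit instances, mirroring the data augmentation idea described in the abstract; the exact pretraining objective is not specified here.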