
    Cross-Domain Text Generation Method Based on Semantic Conduction of Intermediate Domains

    Abstract: Deep neural networks are widely used in natural language processing. In text generation tasks with multi-domain data, the data in different domains often differ in distribution, and introducing a new domain also brings a shortage of data. Conventional supervised methods need a large amount of labeled data in the target domain to train a deep neural network text generation model, and the trained model does not generalize well to a new domain. To address both the distribution differences and the data shortage in multi-domain tasks, a comprehensive transfer-based text generation method, inspired by transfer learning, is designed. It reduces the differences between text data in different domains and leverages the semantic correlation between text data in the existing (source) domains and the new (target) domain to help the deep neural network text generation model generalize to new domains. Experiments on a publicly available dataset verify the effectiveness of the proposed method for domain transfer in multi-domain settings: the model performs well when generating text in new domains, and it improves on existing transfer-based text generation methods on all text generation evaluation metrics.

       
