ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2020, Vol. 57 ›› Issue (6): 1252-1268. doi: 10.7544/issn1000-1239.2020.20190641

• Artificial Intelligence •

The Construction and Analysis of Classical Chinese Poetry Knowledge Graph

Liu Yutong, Wu Bin, Bai Ting

  1. (Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia (Beijing University of Posts and Telecommunications), Beijing 100876) (School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876) (liuyutong@bupt.edu.cn)
  • Online: 2020-06-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2018YFC0831500) and the National Natural Science Foundation of China (U1936220, 61972047).

Abstract: Classical Chinese poetry is a precious part of China's cultural heritage. Computer-assisted study of poetry is of great value for research on language and literature and for passing on and popularizing Chinese culture. However, knowledge about classical Chinese poetry is highly fragmented: on the Internet it resides not only in the poems themselves but also in the materials that interpret them, such as annotations, translations, and appreciations. A knowledge graph that captures the latent semantic relationships between words in poems and links them as knowledge can integrate this fragmented knowledge systematically, and so support better reasoning about and analysis of classical Chinese poetry. We therefore propose a method for constructing a classical Chinese poetry knowledge graph (CCP-KG). To build the nodes, we first use an improved Apriori algorithm to generate candidate words from the poems, and then check whether each candidate appears in the poetry annotations and a Chinese dictionary to decide whether it becomes a node. To build the edges, we first use the annotations to establish semantic relationships between words, and then use a manually constructed classification hierarchy of classical Chinese poetry to establish relationships between abstract semantics. The result is a CCP-KG with comprehensive coverage and multi-layer semantic links between words. CCP-KG can be used to analyze classical Chinese poems along different dimensions; compared with character-based data analysis, it supports literary research in greater depth and detail from a semantic perspective. Taking Tang poetry as an example, we illustrate why CCP-KG is necessary for poetry analysis. CCP-KG is also applicable to a variety of reasoning and analysis tasks on poetry; experiments on two tasks, determining the theme of a poem and analyzing its emotion, demonstrate the effectiveness and application value of the constructed CCP-KG.

Key words: digital humanities, classical Chinese poetry, knowledge graph, data analysis, reasoning analysis
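
The node-construction step described in the abstract (generate candidate words with an Apriori-style pass, then keep only candidates attested in reference material) can be illustrated roughly as follows. This is a minimal sketch, not the authors' improved Apriori algorithm, whose details the abstract does not give. It assumes that candidates are contiguous character n-grams, that support is the number of poem lines containing the n-gram, and that the annotations and dictionary are available as a single set of known terms. The names `frequent_ngrams`, `select_nodes`, `min_support`, `max_len`, and `known_terms` are hypothetical.

```python
from collections import Counter


def frequent_ngrams(lines, min_support=2, max_len=3):
    """Apriori-style mining of contiguous character n-grams (illustrative only).

    Support is the number of lines that contain the n-gram. An n-gram is
    counted only if both of its (n-1)-gram substrings were frequent at the
    previous level, which is the standard Apriori pruning property.
    """
    frequent = {}
    prev = None  # frequent (n-1)-grams from the previous level
    for n in range(1, max_len + 1):
        counts = Counter()
        for line in lines:
            seen = set()  # count each n-gram at most once per line
            for i in range(len(line) - n + 1):
                gram = line[i:i + n]
                if prev is not None and (
                    gram[:-1] not in prev or gram[1:] not in prev
                ):
                    continue  # pruned: a sub-gram was not frequent
                seen.add(gram)
            counts.update(seen)
        level = {g: c for g, c in counts.items() if c >= min_support}
        if not level:
            break
        frequent.update(level)
        prev = level
    return frequent


def select_nodes(candidates, known_terms, min_len=2):
    """Keep multi-character candidates attested in the reference vocabulary.

    `known_terms` stands in for the annotation and dictionary lookups
    (an assumption of this sketch).
    """
    return sorted(g for g in candidates if len(g) >= min_len and g in known_terms)


# Toy input: four verse lines with punctuation already removed.
lines = ["明月松间照", "明月出天山", "举头望明月", "清泉石上流"]
freq = frequent_ngrams(lines, min_support=2)
nodes = select_nodes(freq, known_terms={"明月", "天山"})
print(nodes)
```

In this toy input, a term that appears in the reference vocabulary but is not frequent in the corpus is never proposed as a candidate, so the vocabulary check only filters the mined candidates and does not add nodes of its own.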
