Automatic Knowledge Extraction from Chinese Natural Language Documents
Abstract: Automatic knowledge extraction methods can automatically recognize and extract, from Web documents, the factual knowledge that matches an ontology. This factual knowledge can be used both to build knowledge-based services and to provide the semantic data needed to realize the Semantic Web. However, automatic knowledge extraction from natural language, and from Chinese natural language in particular, is very difficult. This paper proposes AKE, a new automatic knowledge extraction method based on Semantic Web theory and Chinese natural language processing (NLP) technologies. AKE uses the concept of aggregated knowledge to describe N-ary relation knowledge in the ontology. Without relying on large-scale linguistic knowledge bases or synonym tables, it can automatically identify both explicit and implicit simple factual knowledge and N-ary complex factual knowledge in the content of Chinese natural language documents. Experimental results show that the method outperforms other currently known methods.
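To make the idea of describing an N-ary relation through an aggregated knowledge concept more concrete, the following minimal sketch decomposes one N-ary fact into binary triples that share a single aggregate node. It illustrates only the general pattern used in RDF-style ontologies, not AKE's actual data model: the `Aggregate` class, the `ex:` property names, and the sample fact are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, value)

_ids = count(1)  # generates identifiers for aggregate nodes


@dataclass
class Aggregate:
    """Hypothetical aggregate node grouping the arguments of one N-ary fact."""
    concept: str            # ontology concept of the aggregate (assumed name)
    roles: Dict[str, str]   # property -> value, one entry per argument
    node_id: str = field(default_factory=lambda: f"_:agg{next(_ids)}")

    def to_triples(self) -> List[Triple]:
        # One typing triple plus one binary triple per argument,
        # all sharing the aggregate node as subject.
        triples = [(self.node_id, "rdf:type", self.concept)]
        triples += [(self.node_id, prop, value)
                    for prop, value in self.roles.items()]
        return triples


# A simple fact is a single binary triple.
simple_fact: Triple = ("ex:ZhangSan", "ex:birthPlace", "ex:Beijing")

# An N-ary fact (who, where, since when) is held together by an aggregate.
employment = Aggregate(
    concept="ex:Employment",
    roles={
        "ex:employee": "ex:ZhangSan",
        "ex:employer": "ex:SomeUniversity",
        "ex:startYear": "2005",
    },
)
triples = employment.to_triples()

for t in [simple_fact] + triples:
    print(t)
```

In this sketch a three-argument fact becomes four binary triples: one that types the aggregate node and one per argument. This lets N-ary knowledge be stored alongside simple facts in the same triple form.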