ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2017, Vol. 54 ›› Issue (1): 50-62.doi: 10.7544/issn1000-1239.2017.20150843

Previous Articles     Next Articles

A Composite Structure Based Method for Knowledge Base Taxonomy Matching

Lin Hailun1, Jia Yantao2, Wang Yuanzhuo2, Jin Xiaolong2, Cheng Xueqi2, Wang Weiping1   

  1. 1(Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093); 2(Key Laboratory of Network Data Science and Technology (Institute of Computing Technology, Chinese Academy of Sciences), Chinese Academy of Sciences, Beijing 100190)
  • Online:2017-01-01

Abstract: Taxonomy matching, i.e., an operation of taxonomy merging across different knowledge bases, which aims to align common elements between taxonomies, has been extensively studied in recent years due to its wide applications in knowledge base population and proliferation. However, with the continuous development of network big data, taxonomies are becoming larger and more complex, and covering different domains. Therefore, to pose an effective and general matching strategy covering cross-domain or large-scale taxonomies is still a considerable challenge. In this paper, we presents a composite structure based matching method, named BiMWM, which exploits the composite structure information of class in taxonomy, including not only the micro-structure but also the macro-structure. BiMWM models the taxonomy matching problem as an optimization problem on a bipartite graph. It works in two stages: it firstly creates a weighted bipartite graph to model the candidate matched classes pairs between two taxonomies, then performs a maximum weight matching algorithm to generate an optimal matching for two taxonomies in a global manner. BiMWM runs in polynomial time to generate an optimal matching for two taxonomies. Experimental results show that our method outperforms the state-of-the-art baseline methods, and performs good adaptability in different domains and scales of taxonomies.

Key words: knowledge base, taxonomy matching, composite structure, bipartite graph, maximum weight matching

CLC Number: