ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2017, Vol. 54 ›› Issue (12): 2674-2686.doi: 10.7544/issn1000-1239.2017.20170638

Special Issue: 2017人工智能应用专题

Previous Articles     Next Articles

EAE: Enzyme Knowledge Graph Adaptive Embedding

Du Zhijuan, Zhang Yi, Meng Xiaofeng, Wang Qiuyue   

  1. (School of Information, Renmin University of China, Beijing 100872)
  • Online:2017-12-01

Abstract: In recent years a drastic rise in constructing Web-scale knowledge graph (KG) has appeared and the deal with practical problems falls back on KG. Embedding learning of entities and relations has become a popular method to perform machine learning on relational data such as KG. Based on embedding representation, knowledge analysis, inference, fusion, completion and even decision-making could be promoted. Constructing and embedding open-domain knowledge graph (OKG) has mushroomed,which greatly promots the intelligentization of big data in open domain. Meanwhile, specific-domain knowledge graph (SKG) has become an important resource for smart applications in specific domain. However, SKG is developing and its embedding is still in the embryonic stage. This is mainly because there is a germination in SKG due to the difference for data distributions between OKG and SKG. More specifically: 1) In OKG, such as WordNet and Freebase, sparsity of head and tail entities are nearly equal, but in SKG, such as Enzyme KG and NCI-PID, inhomogeneous is more popular. For example, the tail entities are about 1000 times more than head ones in the enzyme KG of microbiology area. 2) Head and tail entities can be commuted in OKG,but they are noncommuting in SKG because most of relations are attributes. For example, entity “Obama” can be a head entity or a tail entity, but the head entity “enzyme” is always in the head position in the enzyme KG. 3) Breadth of relation has a small skew in OKG while imbalance in SKG. For example, a enzyme entity can link 31809 x-gene entities in the enzyme KG. Based on observation, we propose a novel approach EAE to deal with the 3 issues. We evaluate our approach on link prediction and triples classification tasks. Experimental results show that our approach outperforms Trans(E, H, R, D and TransSparse) significantly, and achieves state-of the-art performance.

Key words: specific-domain knowledge graph (SKG), enzyme, embedding, inhomogeneous, nonco-mmuting, imbalance

CLC Number: