ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2017, Vol. 54 ›› Issue (12): 2674-2686. doi: 10.7544/issn1000-1239.2017.20170638

Special topic: 2017 Special Topic on Artificial Intelligence Applications

• Artificial Intelligence •

EAE: Enzyme Knowledge Graph Adaptive Embedding

Du Zhijuan, Zhang Yi, Meng Xiaofeng, Wang Qiuyue

  1. (School of Information, Renmin University of China, Beijing 100872) (2014000654@ruc.edu.cn)
  • Published online: 2017-12-01
  • Supported by:
    National Natural Science Foundation of China (61379050, 61532010, 91646203, 61532016, 61762082); National Key Research and Development Program of China (2016YFB1000603, 2016YFB1000602); 2017 Henan Province Science and Technology Open Cooperation Project (172106000077); Open Project of the State Key Laboratory of Digital Publishing Technology, Peking University Founder Group Co., Ltd.

Abstract: In recent years, constructing large-scale knowledge graphs (KGs) and using them to solve practical problems has become a major trend. Embedding entities and relations makes it easier to apply machine learning to relational data such as KGs, and it can support knowledge analysis, inference, fusion, completion, and even decision-making. The construction and embedding of open-domain knowledge graphs (OKGs) have flourished, greatly promoting the intelligent use of big data in the open domain. Meanwhile, specific-domain knowledge graphs (SKGs) have become an important resource for smart applications in specific domains. However, SKGs are still underdeveloped and their embedding is in its infancy, mainly because the data distribution of an SKG differs markedly from that of an OKG. More specifically: 1) In OKGs such as WordNet and Freebase, the sparsity of head and tail entities is nearly equal, whereas in SKGs such as the Enzyme KG and NCI-PID, inhomogeneity is common. For example, in the enzyme KG from the microbiology domain, tail entities are about 1000 times as numerous as head entities. 2) Head and tail entities can exchange positions in OKGs, but they are noncommuting in SKGs because most relations are attributes. For example, the entity “Obama” can be either a head entity or a tail entity, whereas the head entity “enzyme” always occupies the head position in the enzyme KG. 3) The breadth of relations has a small skew in OKGs but is highly imbalanced in SKGs. For example, a single enzyme entity can link to as many as 31809 “x-gene” entities in the enzyme KG. Based on these observations, we propose a novel approach, EAE, to deal with these three issues. We evaluate EAE on link prediction and triple classification tasks. Experimental results show that EAE significantly outperforms TransE, TransH, TransR, TransD, and TranSparse, and achieves state-of-the-art performance.

Key words: specific-domain knowledge graph (SKG), enzyme, embedding, inhomogeneous, noncommuting, imbalance
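
The three distribution properties named in the abstract (inhomogeneity, noncommuting entities, imbalanced relation breadth) can be measured directly on a set of (head, relation, tail) triples. The following is a minimal sketch of such measurements, not the EAE method itself. The function name `kg_distribution_stats` and the toy identifiers such as `enzyme:1` and `gene:a` are hypothetical and chosen only for illustration; only the relation name `x-gene` comes from the abstract.

```python
from collections import defaultdict


def kg_distribution_stats(triples):
    """Summarize head/tail distribution of (head, relation, tail) triples.

    Hypothetical helper for illustration; not part of the EAE paper.
    """
    heads = {h for h, _, _ in triples}
    tails = {t for _, _, t in triples}

    # Entities that occur in both head and tail positions ("commuting").
    commuting = heads & tails

    # Breadth: number of distinct tails linked to one (head, relation) pair.
    fanout = defaultdict(set)
    for h, r, t in triples:
        fanout[(h, r)].add(t)
    max_breadth = max((len(ts) for ts in fanout.values()), default=0)

    return {
        # Inhomogeneity: distinct tails per distinct head.
        "tail_head_ratio": len(tails) / len(heads) if heads else 0.0,
        # Noncommuting KGs have few or no entities in both positions.
        "commuting_entities": len(commuting),
        # Imbalance: largest fan-out of any (head, relation) pair.
        "max_breadth": max_breadth,
    }


# Toy triples (assumed data, not taken from the enzyme KG).
toy = [
    ("enzyme:1", "x-gene", "gene:a"),
    ("enzyme:1", "x-gene", "gene:b"),
    ("enzyme:1", "x-gene", "gene:c"),
    ("enzyme:2", "x-gene", "gene:d"),
]
stats = kg_distribution_stats(toy)
print(stats)
```

On an SKG with the properties the abstract describes, one would expect a large `tail_head_ratio`, a `commuting_entities` count near zero, and a `max_breadth` far above the typical fan-out.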

CLC number: