ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2019, Vol. 56 ›› Issue (9): 1889-1896. doi: 10.7544/issn1000-1239.2019.20180834

• Information Processing •



  1. (School of Computer Science and Technology, Xidian University, Xi’an 710071)
  • Published: 2019-09-10

Prediction of Disease Associated Long Non-Coding RNA Based on HeteSim

Ma Yi, Guo Xingli, Sun Yutong, Yuan Qianqian, Ren Yang, Duan Ran, Gao Lin   

  1. (School of Computer Science and Technology, Xidian University, Xi’an 710071)
  • Online: 2019-09-10
  • Supported by: 
    This work was supported by the General Program of the National Natural Science Foundation of China (61672407, 61672406) and the Key Program of the National Natural Science Foundation of China (61432010, 61532014).

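Abstract (from the Chinese text): A growing body of research shows that long non-coding RNAs (lncRNAs) perform important functions in many biological processes, and that mutation or dysfunction of these lncRNAs can lead to complex diseases. Predicting potential lncRNA-disease associations with bioinformatics methods is therefore of great significance for exploring pathogenic mechanisms and for disease diagnosis, treatment, prognosis, and prevention. Based on a heterogeneous information network of disease-gene associations, this work uses a relevance measure, HeteSim, to compute the relevance between diseases and genes and thereby predict disease-causing genes. The method is path-constrained, scalable, and computationally efficient, and leave-one-out cross-validation shows that its predictions outperform those of other methods. The method is applied to the prediction analysis of ovarian cancer and gastric cancer; the related literature shows that its predictions have been verified by biological experiments, which further demonstrates its effectiveness.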

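Key words (from the Chinese text): disease-gene prediction, relevance computation, heterogeneous information network, HeteSim method, meta-path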

Abstract: A growing body of evidence indicates that long non-coding RNAs (lncRNAs) play important roles in many biological processes, and that mutations or dysfunction in these lncRNAs can cause serious human diseases, such as various cancers. Bioinformatics methods have been used to predict potential associations between diseases and lncRNAs, which is of great significance for exploring the pathogenesis, diagnosis, treatment, prognosis, and prevention of complex diseases. In this paper, a heterogeneous information network is constructed from known disease-gene associations. The association strength between an lncRNA and a disease is measured by an association score in this network, computed with a simple method called HeteSim. The method is based on all paths that exist between a given disease and a given lncRNA. Experiments show that our method outperforms state-of-the-art methods. Our predictions for ovarian cancer and gastric cancer have been verified by biological experiments, indicating the effectiveness of the method, and the case studies indicate that it can give informative clues for further investigation. In conclusion, only paths based on known disease-gene associations are exploited here; it can be expected that other disease-associated information can also be integrated into the method to achieve better performance.

Key words: disease-gene prediction, correlation calculation, heterogeneous information networks, HeteSim, meta-path
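
As an illustration of how a path-constrained association score of this kind can be computed, the following minimal Python sketch follows the general HeteSim formulation: the score is the cosine similarity between the reachable-probability distributions with which the source and the target arrive at the middle of a relevance path. It is not the authors' implementation. The lncRNA-gene-disease meta-path, the toy matrices, and the function names (`row_normalize`, `reach_prob`, `hetesim`) are assumptions made for the example, and only even-length paths are handled (odd-length paths need an extra edge-decomposition step that is omitted here).

```python
import numpy as np

def row_normalize(a):
    """Turn an adjacency matrix into row-wise transition probabilities."""
    a = np.asarray(a, dtype=float)
    sums = a.sum(axis=1, keepdims=True)
    # Rows with no outgoing links stay all-zero instead of dividing by zero.
    return np.divide(a, sums, out=np.zeros_like(a), where=sums > 0)

def reach_prob(mats):
    """Reachable-probability matrix along a chain of adjacency matrices."""
    prob = row_normalize(mats[0])
    for m in mats[1:]:
        prob = prob @ row_normalize(m)
    return prob

def hetesim(left_mats, right_mats):
    """Normalized HeteSim for an even-length path (assumed setup).

    left_mats:  adjacency matrices from the source type to the middle type.
    right_mats: adjacency matrices from the target type to the middle type.
    Returns a (num_sources x num_targets) matrix of scores in [0, 1].
    """
    left = reach_prob(left_mats)    # sources x middle objects
    right = reach_prob(right_mats)  # targets x middle objects
    dots = left @ right.T
    norms = np.outer(np.linalg.norm(left, axis=1),
                     np.linalg.norm(right, axis=1))
    return np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)

# Hypothetical toy data: 3 lncRNAs, 2 diseases, 4 genes.
lnc_gene = [[1, 1, 0, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 1]]
dis_gene = [[1, 1, 0, 0],
            [0, 0, 1, 1]]

# Scores along the assumed meta-path lncRNA-gene-disease.
scores = hetesim([lnc_gene], [dis_gene])
print(np.round(scores, 3))
```

In this toy example the first lncRNA shares exactly the same genes as the first disease and so receives the maximum score for it, while an lncRNA with no gene in common with a disease scores zero. Candidate lncRNAs for a disease could then be ranked by their column of `scores`.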