ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue (8): 1652-1660. doi: 10.7544/issn1000-1239.2019.20190128

Special topic: Frontier Advances in Artificial Intelligence, 2019

• Artificial Intelligence •

Prediction of miRNA-lncRNA Interaction by Combining CNN and Bi-LSTM

Shi Wenhao, Meng Jun, Zhang Peng, Liu Chanjuan

  1. (School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116023) (swh31809184@mail.dlut.edu.cn)
  • Published: 2019-08-01
  • Online: 2019-08-01
  • Supported by:
    National Natural Science Foundation of China (61872055, 61702075)

Abstract: Non-coding RNA (ncRNA) plays an important regulatory role in many life activities of animals and plants, and the interaction between microRNA (miRNA) and long non-coding RNA (lncRNA) is particularly important. Studying this interaction not only supports deeper analysis of the biological functions of genes, but also offers new ideas for disease diagnosis and treatment and for plant genetic breeding. At present, miRNA-lncRNA interactions are mostly predicted with biological experiments and traditional machine learning methods. Because biological identification is costly and time-consuming, and machine learning involves heavy manual intervention and a complex feature extraction process, this paper proposes a deep learning model that combines a convolutional neural network (CNN) with a bidirectional long short-term memory network (Bi-LSTM). The model combines the advantages of both networks: it considers the correlation of information between sequences and incorporates context information, while fully extracting features from the sequence data. Model performance is evaluated by cross-validation. On the Zea mays (maize) dataset, the model achieves better classification results than traditional machine learning methods and single models. In addition, the model is tested on Solanum tuberosum (potato) and Triticum aestivum (wheat) datasets, where accuracy exceeds 95% and 93%, respectively, indicating good generalization ability.

Key words: convolutional neural network (CNN), bidirectional long short-term memory network (Bi-LSTM), miRNA-lncRNA, prediction, deep learning
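
The cross-validation evaluation mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the authors' code: the fold count (k = 5), the random seed, and the names `k_fold_indices`, `cross_validate` and `fit_majority` are assumptions, since the abstract does not state them. A placeholder majority-class classifier stands in for the CNN and Bi-LSTM model.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train_indices, test_indices) pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    # Deal samples out round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


def cross_validate(samples: Sequence, labels: Sequence, fit: Callable, k: int = 5, seed: int = 0) -> float:
    """Return the mean test accuracy over k folds.

    fit(train_samples, train_labels) must return a predict(sample) callable.
    """
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(samples), k, seed):
        predict = fit([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


def fit_majority(train_samples: list, train_labels: list) -> Callable:
    """Hypothetical stand-in model: always predicts the most common training label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda sample: majority
```

In practice, `fit` would train the actual sequence model on each training fold, and the per-fold accuracies would be averaged exactly as above.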

CLC Number: