ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2018, Vol. 55, Issue (1): 93-101. doi: 10.7544/issn1000-1239.2018.20160508


A Deep Learning Model for Predicting RNA-Binding Proteins Only from Primary Sequences

Li Hongshun, Yu Hua, Gong Xiujun   

  1. (School of Computer Science and Technology, Tianjin University, Tianjin 300072) (Tianjin Key Laboratory of Cognitive Computing and Application (Tianjin University), Tianjin 300072)
  • Online:2018-01-01

Abstract: RNA-binding proteins (RNA-BPs) play pivotal roles in alternative splicing, RNA editing, methylation and many other biological functions. Predicting the functions of these proteins from primary amino acid sequences is becoming one of the major challenges in the functional annotation of genomes. Traditional prediction methods often focus on extracting physicochemical features from sequences while ignoring motif information and the positional relationships between motifs. Meanwhile, small data volumes and heavy noise in the training data lower the accuracy and reliability of predictions. In this paper, we propose a new deep learning based model to predict RNA-binding proteins from primary sequences. The model uses two stages of convolutional neural network (CNN) to detect the functional domains of protein sequences, and a long short-term memory neural network (LSTM) to obtain a fixed-length feature representation of sequences and to learn long- and short-term dependencies between the functional domains. Because all features are learned automatically, the model requires less human intervention in feature selection than traditional machine learning methods. The experimental results show its superiority in processing large-scale sequence data.
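To make the data flow described in the abstract concrete, the following is a minimal, untrained sketch in plain Python: one-hot encoding of residues, two convolution stages, and an LSTM whose final hidden state serves as the fixed-length representation. Everything beyond that outline is an assumption made for illustration and is not taken from the paper: the one-hot input encoding, the kernel width of 3, the pooling size of 2, the filter and hidden sizes, the random weights, the sigmoid output unit, and all function names (`one_hot`, `conv1d_relu`, `max_pool`, `lstm_last_state`, `predict`, `init_params`).

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Encode a sequence as a list of 20-dim one-hot vectors (assumed encoding)."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def rand_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def conv1d_relu(x, kernels, width):
    """Valid 1D convolution along the sequence, followed by ReLU.

    x is a list of L vectors of dim d; each kernel is a flat weight vector of
    length width * d. Returns L - width + 1 vectors, one value per kernel.
    """
    out = []
    for i in range(len(x) - width + 1):
        window = [v for pos in x[i:i + width] for v in pos]
        out.append([max(0.0, sum(w * v for w, v in zip(k, window)))
                    for k in kernels])
    return out


def max_pool(x, size):
    """Non-overlapping max pooling along the sequence axis."""
    return [[max(col) for col in zip(*x[i:i + size])]
            for i in range(0, len(x) - size + 1, size)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_last_state(x, W, hidden):
    """Run a single-layer LSTM (biases omitted) and return the final hidden state.

    W has 4 * hidden rows (input, forget, output, candidate gates), each applied
    to the concatenation [x_t, h]. The result has length `hidden` regardless of
    how many time steps x contains.
    """
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x_t in x:
        z_in = x_t + h
        z = [sum(w * v for w, v in zip(row, z_in)) for row in W]
        i_g = [sigmoid(v) for v in z[0:hidden]]
        f_g = [sigmoid(v) for v in z[hidden:2 * hidden]]
        o_g = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]
        g = [math.tanh(v) for v in z[3 * hidden:]]
        c = [f * cj + i * gj for f, cj, i, gj in zip(f_g, c, i_g, g)]
        h = [o * math.tanh(cj) for o, cj in zip(o_g, c)]
    return h


def init_params(seed=0, f1=8, f2=8, hidden=6):
    """Illustrative layer sizes only; not the paper's hyperparameters."""
    rng = random.Random(seed)
    return {
        "conv1": rand_matrix(f1, 3 * 20, rng),
        "conv2": rand_matrix(f2, 3 * f1, rng),
        "lstm": rand_matrix(4 * hidden, f2 + hidden, rng),
        "out": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
        "hidden": hidden,
    }


def predict(seq, params):
    """Two conv stages -> LSTM -> sigmoid score in (0, 1)."""
    x = one_hot(seq)
    x = max_pool(conv1d_relu(x, params["conv1"], 3), 2)  # stage 1
    x = max_pool(conv1d_relu(x, params["conv2"], 3), 2)  # stage 2
    h = lstm_last_state(x, params["lstm"], params["hidden"])
    return sigmoid(sum(w * v for w, v in zip(params["out"], h)))
```

Calling `predict("MSEQNNTEMTFQIQRIYTKD", init_params())` returns a score between 0 and 1. Because the weights are random, the value carries no biological meaning; the point is only that sequences of different lengths are reduced to a hidden state of the same size before classification.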

Key words: RNA-binding proteins, convolutional neural network (CNN), long short-term memory neural network (LSTM), feature learning, deep learning
