Abstract:
RNA-binding proteins (RNA-BPs) play pivotal roles in alternative splicing, RNA editing, methylation, and many other biological functions. Predicting the functions of these proteins from their primary amino acid sequences is becoming one of the major challenges in the functional annotation of genomes. Traditional prediction methods typically focus on extracting physicochemical features from sequences but ignore motif information and the positional relationships between motifs. Moreover, small data volumes and substantial noise in the training data lower the accuracy and reliability of predictions. In this paper, we propose a new deep-learning-based model to predict RNA-binding proteins from primary sequences. The model uses a two-stage convolutional neural network (CNN) to detect the functional domains of protein sequences, and a long short-term memory (LSTM) neural network to obtain a fixed-length feature representation of each sequence and to learn long- and short-term dependencies between functional domains. Because all features are learned automatically, the model requires less human intervention in feature selection than traditional machine learning methods. The experimental results show its advantages in processing large-scale sequence data.
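To make the data flow described above concrete, the following is a minimal sketch of a forward pass through two convolution stages followed by an LSTM. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives no hyperparameters, so the one-hot encoding, filter counts, filter width, pooling size, hidden size, sigmoid output unit, and the random untrained weights are all hypothetical choices, and the function names (`encode`, `predict`, `init_params`) are invented for this example. It uses only the Python standard library so that it runs without a deep learning framework.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def one_hot(seq):
    """Encode a sequence as rows of 20 values; unknown residues become all-zero rows."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]

def conv1d_relu(x, filters, width):
    """Valid 1D convolution along the sequence, then ReLU.
    x: L rows of C channels; filters: F rows of width * C weights."""
    out = []
    for i in range(len(x) - width + 1):
        window = [v for row in x[i:i + width] for v in row]  # flatten the window
        out.append([max(0.0, sum(w * v for w, v in zip(f, window))) for f in filters])
    return out

def max_pool(x, size):
    """Non-overlapping max pooling along the sequence axis."""
    return [[max(col) for col in zip(*x[i:i + size])]
            for i in range(0, len(x) - size + 1, size)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_last_state(x, W, hidden):
    """Run a single-layer LSTM (no bias terms) and return the final hidden state.
    W has 4 * hidden rows ordered as input, forget, candidate, output gates."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for row in x:
        z = row + h  # concatenate current input with previous hidden state
        pre = [sum(w * v for w, v in zip(wr, z)) for wr in W]
        i_g = [sigmoid(p) for p in pre[:hidden]]
        f_g = [sigmoid(p) for p in pre[hidden:2 * hidden]]
        g = [math.tanh(p) for p in pre[2 * hidden:3 * hidden]]
        o_g = [sigmoid(p) for p in pre[3 * hidden:]]
        c = [f * cc + i * gg for f, cc, i, gg in zip(f_g, c, i_g, g)]
        h = [o * math.tanh(cc) for o, cc in zip(o_g, c)]
    return h

def init_params(seed=0, f1=8, f2=8, width=3, hidden=6):
    """Random, untrained weights; all sizes are assumed values."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "width": width, "hidden": hidden,
        "conv1": mat(f1, width * len(AMINO_ACIDS)),
        "conv2": mat(f2, width * f1),
        "lstm": mat(4 * hidden, f2 + hidden),
        "out": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
    }

def encode(seq, params):
    """Two conv + pool stages, then an LSTM: returns a fixed-length vector."""
    x = one_hot(seq)
    x = max_pool(conv1d_relu(x, params["conv1"], params["width"]), 2)
    x = max_pool(conv1d_relu(x, params["conv2"], params["width"]), 2)
    return lstm_last_state(x, params["lstm"], params["hidden"])

def predict(seq, params):
    """Map the fixed-length representation to a score in (0, 1)."""
    h = encode(seq, params)
    return sigmoid(sum(w * v for w, v in zip(params["out"], h)))

if __name__ == "__main__":
    params = init_params()
    seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # arbitrary example sequence
    print(len(encode(seq, params)), len(encode(seq * 3, params)))
    print(predict(seq, params))
```

Sequences of different lengths yield vectors of the same size because only the final LSTM hidden state is kept, which is the sense in which the representation is fixed-length. With untrained weights the score carries no biological meaning; a real model would learn the weights from labeled sequences.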