Abstract:
Based on the principle of symbolic dynamics, a novel graphical representation of RNA secondary structures is proposed. The free bases and the paired bases in an RNA secondary structure are mapped into two kinds of discrete time sequences, which encode the biological information of the free bases and the free energy of the paired bases, respectively. The transformation from an RNA secondary structure to its mathematical representation loses no information, and the proposed graphical representation also clearly identifies the paired regions of the RNA in a 2D graph. Based on this graphical representation, characteristic matrices are constructed, and a vector consisting of the leading eigenvalues of these matrices is then designed for the comparison of RNA secondary structures. Quantitative and qualitative analyses are performed in both the time and frequency domains to distinguish a set of RNA secondary structures at the 3′-terminus of different viruses, and the two domains yield similar results. The examination of similarities/dissimilarities illustrates the utility of the proposed graphical representation. Compared with other methods for similarity analysis, the proposed method produces a larger numerical difference between dissimilar species and similar ones, which makes it easier to discriminate between different species.
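The abstract does not spell out the base encodings, the free-energy values, or how the characteristic matrices are built. The following Python sketch therefore only illustrates the general pipeline (two sequences, one characteristic matrix each, a vector of leading eigenvalues, a distance between vectors) under explicit assumptions. The codes in `FREE_BASE_CODE`, the placeholder weights in `PAIR_WEIGHT` (standing in for free energies; they are not real thermodynamic parameters), the M/M-style ratio matrix in `characteristic_matrix`, and the Euclidean distance in `dissimilarity` are all hypothetical choices and should not be read as the authors' exact method.

```python
import math

# HYPOTHETICAL encodings: the abstract does not specify the real mapping.
FREE_BASE_CODE = {"A": 1.0, "C": 2.0, "G": 3.0, "U": 4.0}
# Placeholder pair weights standing in for free energies (not real thermodynamic values).
PAIR_WEIGHT = {"GC": 3.0, "CG": 3.0, "AU": 2.0, "UA": 2.0, "GU": 1.0, "UG": 1.0}


def to_sequences(seq, dot_bracket):
    """Split a dot-bracket structure into a free-base series and a paired-base series."""
    stack, partner = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()  # index of the matching "("
            partner[i], partner[j] = j, i
    free = [FREE_BASE_CODE[b] for i, b in enumerate(seq) if i not in partner]
    paired = [PAIR_WEIGHT[b + seq[partner[i]]] for i, b in enumerate(seq) if i in partner]
    return free, paired


def characteristic_matrix(xs):
    """Assumed M/M-style matrix: Euclidean distance / along-curve distance between points."""
    pts = [(float(k), float(x)) for k, x in enumerate(xs)]
    cum = [0.0]  # cumulative arc length of the polyline through the points
    for a, b in zip(pts, pts[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    n = len(pts)
    m = [[0.0] * n for _ in range(n)]  # zero diagonal
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = math.dist(pts[i], pts[j]) / (cum[j] - cum[i])
    return m


def leading_eigenvalue(m, iters=500):
    """Largest eigenvalue of a nonnegative symmetric matrix via power iteration.

    Iterating on M + I (same eigenvectors, eigenvalues shifted by 1) avoids the
    oscillation plain power iteration can show when -rho is also an eigenvalue.
    """
    n = len(m)
    if n == 0:
        return 0.0
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [v[i] + sum(m[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m[i][k] * v[k] for k in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient; v has unit norm


def descriptor(seq, dot_bracket):
    """Vector of leading eigenvalues (free-base matrix, paired-base matrix)."""
    free, paired = to_sequences(seq, dot_bracket)
    return (leading_eigenvalue(characteristic_matrix(free)),
            leading_eigenvalue(characteristic_matrix(paired)))


def dissimilarity(a, b):
    """Euclidean distance between descriptors; smaller means more similar."""
    return math.dist(descriptor(*a), descriptor(*b))


if __name__ == "__main__":
    s1 = ("GGGAAACCC", "(((...)))")
    s2 = ("GGGAAUCCC", "(((...)))")
    s3 = ("GAUCAGAUC", "((.....))")
    for name, other in (("s2", s2), ("s3", s3)):
        print(f"d(s1, {name}) = {dissimilarity(s1, other):.4f}")
```

In this sketch only `FREE_BASE_CODE`, `PAIR_WEIGHT`, and `characteristic_matrix` encode assumptions about the representation; the eigenvalue descriptor and the distance comparison would stay the same if those were replaced. The frequency-domain analysis mentioned above is not covered here.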