Abstract:
The genome sorting problem considers how species are related to each other by computing rearrangement distances between their genomes. The rearrangement distance between two genomes is the minimum number of genetic mutations needed to transform one to another. Multi-chromosomal genomes evolve frequently by a rearrangement event called translocation, which have been well studied. Many research works have been done for the case where only translocations occur. However, the more common conditions in computational biology is that there are genes that appear only in the source or target genome. A method solving this involves the evolution events called insertion and deletion. There has been an approximation algorithm for sorting genomes using translation, insertion and deletion (SG-TID). In this paper, we analyze the relationship between the three operations and then define the invertibility among the operations, through which SG-TID is transformed to a simpler one called sorting genomes by translocation and deletions (SG-TD). Additionally in the algorithm solving the SG-TD problem, we use O(n) memory space to record some important parameters, which help reduce the time complexity to a considerable level. We first give formulas for computing the rearrangement distance for SG-TID, then present a simpler algorithm to compute the rearrangement distance and find the corresponding transformation sequence in O(n) space and O(n\+2) time, which is faster and more practical than the previous TID algorithm.