Abstract:
Sequence alignment is a classical problem in biological sequence analysis. It aims to identify similarities between sequences, which is of great significance for discovering functional, structural, and evolutionary information in biological sequences. The problem falls into two categories: pairwise sequence alignment and multiple sequence alignment. Existing work focuses on specific algorithms, and no general solution has been designed. In addition, there has been little research on the trustworthiness of these algorithms. By analyzing the properties of the sequence alignment problem in depth and characterizing the essential features of its solution, a unified framework for constructing dynamic programming algorithms for sequence alignment, seqAlign, is designed on the basis of a formal specification of the problem and the formal method PAR. The use of the framework is then demonstrated by constructing a multiple sequence alignment algorithm for three sequences, and the constructed result is formally verified with the Isabelle theorem prover. Finally, an executable C++ program for the algorithm is generated by the code generation system of the PAR platform. The mechanized construction of other sequence alignment algorithms with the seqAlign framework is also analyzed. Strict specification refinement and formal verification effectively guarantee the reliability of the generated algorithms. The seqAlign framework provides a general solution for the class of sequence alignment problems and significantly improves the efficiency of generating families of sequence alignment algorithms. Its successful application to sequence alignment in biological sequence analysis can serve as a reference, in both methodology and practice, for constructing highly reliable algorithms in the complex field of bioinformatics.
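
To make the three-sequence case concrete, the following is a minimal hand-written sketch of a dynamic programming recurrence for aligning three sequences. It is not the program generated by the PAR platform, and it does not reproduce the seqAlign framework. The sum-of-pairs scoring scheme, the scoring constants, and the names `pairScore`, `columnScore`, and `alignThreeScore` are all assumptions made for illustration. The sketch computes only the optimal score, not the alignment itself, and assumes the input sequences do not contain the gap character `-`.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical scoring parameters, chosen for illustration only.
constexpr int kMatch = 1;
constexpr int kMismatch = -1;
constexpr int kGap = -2;
constexpr char kGapChar = '-';

// Score of one aligned pair of symbols; a gap against a gap contributes 0.
int pairScore(char a, char b) {
    if (a == kGapChar && b == kGapChar) return 0;
    if (a == kGapChar || b == kGapChar) return kGap;
    return a == b ? kMatch : kMismatch;
}

// Sum-of-pairs score of a single alignment column.
int columnScore(char a, char b, char c) {
    return pairScore(a, b) + pairScore(a, c) + pairScore(b, c);
}

// Optimal global alignment score of three sequences.
// dp[i][j][k] is the best score for aligning the prefixes x[0..i), y[0..j), z[0..k).
int alignThreeScore(const std::string& x, const std::string& y, const std::string& z) {
    const std::size_t n = x.size(), m = y.size(), p = z.size();
    const int kNegInf = std::numeric_limits<int>::min() / 2;
    std::vector<std::vector<std::vector<int>>> dp(
        n + 1, std::vector<std::vector<int>>(m + 1, std::vector<int>(p + 1, kNegInf)));
    dp[0][0][0] = 0;

    for (std::size_t i = 0; i <= n; ++i) {
        for (std::size_t j = 0; j <= m; ++j) {
            for (std::size_t k = 0; k <= p; ++k) {
                if (i == 0 && j == 0 && k == 0) continue;
                int best = kNegInf;
                // Each nonzero mask selects which sequences contribute a symbol
                // to the last column; the others contribute a gap (7 cases).
                for (int mask = 1; mask < 8; ++mask) {
                    const std::size_t di = mask & 1;
                    const std::size_t dj = (mask >> 1) & 1;
                    const std::size_t dk = (mask >> 2) & 1;
                    if (di > i || dj > j || dk > k) continue;
                    const char a = di ? x[i - 1] : kGapChar;
                    const char b = dj ? y[j - 1] : kGapChar;
                    const char c = dk ? z[k - 1] : kGapChar;
                    best = std::max(best,
                                    dp[i - di][j - dj][k - dk] + columnScore(a, b, c));
                }
                dp[i][j][k] = best;
            }
        }
    }
    return dp[n][m][p];
}
```

The table has (n+1)(m+1)(p+1) cells with seven candidate predecessors each, so time and memory grow with the product of the three sequence lengths. This growth is one reason exact dynamic programming is usually limited to a small number of sequences.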