Citation: | Shi Haihe, Lan Sunwen, Liu Riming, Shi Haipeng, Wang Lan, Zhong Linhui. Unified Formal Construction and Isabelle Verification of the Dynamic Programming Algorithms for Biological Sequence Alignment[J]. Journal of Computer Research and Development, 2025, 62(1): 119-131. DOI: 10.7544/issn1000-1239.202330698 |
Sequence alignment is a classical problem in biological sequence analysis, aiming to find out the similarity between sequences, which is of great significance for discovering functional, structural and evolutionary information in biological sequences. The problem can be divided into two categories: pairwise sequence alignment and multiple sequence alignment. The existing work is focused on specific algorithms, and no general solution is designed. In addition, there are few researches on the trustworthy of algorithms. By deeply analyzing the properties of sequence alignment problem and describing the essential characteristics of problem solving, a unified construction framework of sequence alignment dynamic programming algorithm seqAlign is designed based on the problem formal specification and formal method PAR. The process of constructing a multiple sequence alignment algorithm with three sequences by using the framework is further demonstrated, and the constructed results are formally verified by Isabelle theorem prover. Finally, the C++ executable program of the algorithm is generated by the code generation system of PAR platform. The process of mechanized construction of other sequence alignment algorithms using seqAlign framework is analyzed. Through strict specification refinement and formal verification, the reliability of the generated algorithm is effectively guaranteed. The developed seqAlign framework provides a general solution for the class of sequence alignment problems, which significantly improves the efficiency of generating sequence alignment algorithm families. The successful application of the designed seqAlign framework to sequence alignment problem in biological sequence analysis can provide a reference for the construction of highly reliable algorithms in complex bioinformatics field from the perspective of methodology and practice.
[1] |
詹乃军,王戟. 复杂系统规约、分析与验证发展现状与展望[J]. 前瞻科技,2023,2(1):7−22 doi: 10.3981/j.issn.2097-0781.2023.01.001
Zhan Naijun, Wang Ji. Challenges and trends for specification, analysis, and verification of complex systems[J]. Science and Technology Foresight, 2023, 2(1): 7−22 (in Chinese) doi: 10.3981/j.issn.2097-0781.2023.01.001
|
[2] |
Hoare T. The verifying compiler: A grand challenge for computing research[J]. Journal of the ACM, 2003, 50(1): 63−69 doi: 10.1145/602382.602403
|
[3] |
王勇献. 生物信息学导论:面向高性能计算的算法与应用[M]. 北京:清华大学出版社,2011
Wang Yongxian. Introduction to Bioinformatics: Algorithms and Applications for High Performance Computing [M]. Beijing: Tsinghua University Press, 2011 (in Chinese)
|
[4] |
Needleman S B, Wunsch C D. A general method applicable to the search for similarities in the amino acid sequence of two proteins[J]. Journal of Molecular Biology, 1970, 48(3): 443−453 doi: 10.1016/0022-2836(70)90057-4
|
[5] |
Smith T F, Waterman M S. Identification of common molecular subsequences.[J]. Journal of Molecular Biology, 1981, 147(1): 195−197 doi: 10.1016/0022-2836(81)90087-5
|
[6] |
Garai G, Chowdhury B. A cascaded pairwise biomolecular sequence alignment technique using evolutionary algorithm[J]. Information Sciences, 2015, 297(3): 118−139
|
[7] |
Alawneh L, Shehab M A, Al-Ayyoub M, et al. A scalable multiple pairwise protein sequence alignment acceleration using hybrid CPU–GPU approach[J]. Cluster Computing, 2020, 23(7): 2677−2688
|
[8] |
Rashed A E E D, Obaya M, El H, et al. Accelerating DNA pairwise sequence alignment using FPGA and a customized convolutional neural network[J]. Computers & Electrical Engineering, 2021, 92(6): 1−21
|
[9] |
Song Y J, Ji D J, Seo H, et al. Pairwise heuristic sequence alignment algorithm based on deep reinforcement learning[J]. IEEE Open Journal of Engineering in Medicine and Biology, 2021, 2: 36−43 doi: 10.1109/OJEMB.2021.3055424
|
[10] |
Clustal W, Thompson J D, Higgins D G, et al. Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice[J]. Nucleic Acids Research, 1994, 22(22): 4673−4680 doi: 10.1093/nar/22.22.4673
|
[11] |
Katoh K, Misawa K, Kuma K, et al. MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform[J]. Nucleic Acids Research, 2002, 30(14): 3059−3066 doi: 10.1093/nar/gkf436
|
[12] |
Edgar R C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput[J]. Nucleic Acids Research, 2004, 32(5): 1792−1797 doi: 10.1093/nar/gkh340
|
[13] |
Abuín J M, Pena T F, Pichel J C. PASTASpark: Multiple sequence alignment meets big data[J]. Bioinformatics, 2017, 33(18): 2948−2950 doi: 10.1093/bioinformatics/btx354
|
[14] |
Wan Shixiang, Zou Quan. HAlign-II: Efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing[J]. Algorithms for Molecular Biology, 2017, 12(1): 1−10 doi: 10.1186/s13015-017-0092-1
|
[15] |
Smirnov V, Warnow T. MAGUS: Multiple sequence alignment using graph clustering[J]. Bioinformatics, 2021, 37(12): 1666−1672 doi: 10.1093/bioinformatics/btaa992
|
[16] |
Mirarab S, Nguyen N, Guo Sheng, et al. PASTA: Ultra-large multiple sequence alignment for nucleotide and amino-acid sequences.[J]. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology, 2015, 22(5): 377−386
|
[17] |
王戟,詹乃军,冯新宇,等. 形式化方法概貌[J]. 软件学报,2019,30(1):33−61
Wang Ji, Zhan Naijun, Feng Xinyu, et al. Overview formalization method[J]. Journal of Software, 2019, 30(1): 33−61 (in Chinese)
|
[18] |
Xue Jinyun. A unified approach for developing efficient algorithmic programs[J]. Journal of computer Science and Technology, 1997, 12(4): 314−329 doi: 10.1007/BF02943151
|
[19] |
Durán J E. Transformational derivation of greedy network algorithms from descriptive specifications[C]// Proc of the 6th Int Conf on Mathematics of Program Construction. Beilin: Springer, 2002: 40−67
|
[20] |
Abrial J R, Cansell D, Méry D. Formal derivation of spanning trees algorithms[C]// Proc of the 3rd Int Conf of B and Z Users. Berlin: Springer, 2003: 457−476
|
[21] |
Almeida J B, Pinto J S. Deriving sorting algorithms[J]. arXiv preprint, arXiv: 0802.3881, 2008
|
[22] |
Nedunuri S, Cook W R. Synthesis of fast programs for maximum segment sum problems[J]. ACM SIGPLAN Notices, 2009, 45(2): 117−126
|
[23] |
Mu S C. Calculating a backtracking algorithm: An exercise in monadic program derivation[J]. arXiv preprint, arXiv: 2101.09409, 2021
|
[24] |
Lammich P. Generating verified LLVM from Isabelle/HOL[C]// Proc of the 10th Int Conf on Interactive Theorem Proving (ITP 2019). Berlin: Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2019, 22: 1−22: 19
|
[25] |
张健,李弋,彭鑫,等. 正反例归纳合成SQL查询程序[J]. 软件学报,2023,34(9):4132−4152
Zhang Jian, Li Yi, Peng Xin, et al. Inductive SQL synthesis with positive and negative tuples[J]. Journal of Software, 2023, 34(9): 4132−4152 (in Chinese)
|
[26] |
石海鹤,薛锦云. 基于PAR的算法形式化开发[J]. 计算机学报,2009,32(5):982−991 doi: 10.3724/SP.J.1016.2009.00982
Shi Haihe, Xue Jinyun. PAR-based formal development of algorithms[J]. Chinese Journal of Computers, 2009, 32(5): 982−991 (in Chinese) doi: 10.3724/SP.J.1016.2009.00982
|
[27] |
孙凌宇,薛锦云. 最小生成树算法的PAR方法形式化推导[J]. 计算机工程,2006,32(21):85−87 doi: 10.3969/j.issn.1000-3428.2006.21.030
Sun Lingyu, Xue Jinyun. Formal derivation of the minimum spanning tree algorithm with PAR method[J]. Computer Engineering, 2006, 32(21): 85−87 (in Chinese) doi: 10.3969/j.issn.1000-3428.2006.21.030
|
[28] |
游珍,薛锦云. 基于Isabelle定理证明器算法程序的形式化验证[J]. 计算机工程与科学,2009,31(10):85−89 doi: 10.3969/j.issn.1007-130X.2009.10.024
You Zhen, Xue Jinyun. Formal verification of algorithmic programs based on the Isabelle theorem prover[J]. Computer Engineering and Science, 2009, 31(10): 85−89 (in Chinese) doi: 10.3969/j.issn.1007-130X.2009.10.024
|
[29] |
石海鹤,周卫星. 基于动态规划的双序列比对算法构件设计与实现[J]. 计算机研究与发展,2019,56(9):1907−1917 doi: 10.7544/issn1000-1239.2019.20180835
Shi Haihe, Zhou Weixing. Design and implementation of pairwise sequence alignment algorithm components based on dynamic programming[J]. Journal of Computer Research and Development, 2019, 56(9): 1907−1917 (in Chinese) doi: 10.7544/issn1000-1239.2019.20180835
|
[30] |
Shi Haipeng, Chen Huan, Yang Qinghong, et al. A method for bio-sequence analysis algorithm development based on the PAR platform[J]. Big Data Mining and Analytics, 2023, 6(1): 11−20
|
[31] |
肖存威,石海鹤,王岚,等. 基于混合策略的de novo序列拼接算法构造[J]. 江西师范大学学报:自然科学版,2022,46(3):300−307
Xiao Cunwei, Shi Haihe, Wang Lan, et al. The construction of de novo sequence assembly algorithm based on hybrid strategy[J]. Journal of Jiangxi Normal University: Natural Science, 2022, 46(3): 300−307 (in Chinese)
|
[32] |
Dijkstra E W. A Discipline of Programming[M]. Princeton, NJ: Prentice-hall, 1976
|
[33] |
Floyd R W. Assigning meaning to programs[J]. Communications of the ACM, 1967, 10(8): 365−371
|
[34] |
Hoare C A R. An axiomatic basis for computer programming[J]. Communications of the ACM, 1969, 12(10): 576−580 doi: 10.1145/363235.363259
|
[35] |
Dijkstra E W, Scholten C S. Predicate Calculus and Program Semantics[M]. Berlin: Springer, 2012
|
[36] |
石海鹤. 支持泛型程序设计的Apla-Java自动程序转换系统[D]. 南昌:江西师范大学,2004
Shi Haihe. Apla-Java automatic program transformation system supporting generic programming [D]. Nanchang: Jiangxi Normal University, 2004 (in Chinese)
|
[37] |
赖勇. APLA到C++自动程序转换系统的研制[D]. 南昌:江西师范大学,2002
Lai Yong. Development of APLA to C++ automatic program conversion system [D]. Nanchang: Jiangxi Normal University, 2002 (in Chinese)
|
[38] |
Xue Jinyun. Two new strategies for developing loop invariants and their applications[J]. Journal of Computer Science and Technology, 1993, 8(2): 147−154 doi: 10.1007/BF02939477
|