高级检索

    生物序列比对DP算法的统一形式化构造与Isabelle验证

    Unified Formal Construction and Isabelle Verification of the Dynamic Programming Algorithms for Biological Sequence Alignment

    • 摘要: 序列比对是生物序列分析中的一个经典问题,旨在找出序列之间的相似性,它对于发现生物序列中的功能、结构和进化信息都具有重要的意义. 该问题可分为双序列比对和多序列比对2类,现有工作多针对特定算法展开,没有设计通用的求解方法;此外,甚少涉及算法可信性的研究. 从生物序列比对问题的形式化规约出发,通过深入分析问题的性质,刻画问题求解的本质特征,借助形式化方法PAR(partition and recursion)设计了序列比对动态规划算法的统一构造框架seqAlign,展示了应用该框架构造序列数为3的多序列比对算法的过程,并使用Isabelle定理证明器对构造结果进行形式化验证. 最后,利用PAR平台生成了该算法的C++可执行程序,进一步分析了由seqAlign框架机械化构造其他类型序列比对算法的过程. 通过严密的规约精化和形式验证,有效地保证了生成算法的可信性;开发的seqAlign框架提供了序列比对问题类的通用求解方案,显著提高了序列比对算法族生成的效率. 研究结果在生物序列分析中序列比对问题上的成功应用,从方法学和实践上可为复杂生物信息学领域高可靠算法的构造提供参考.

       

      Abstract: Sequence alignment is a classical problem in biological sequence analysis, aiming to find out the similarity between sequences, which is of great significance for discovering functional, structural and evolutionary information in biological sequences. The problem can be divided into two categories: pairwise sequence alignment and multiple sequence alignment. The existing work is focused on specific algorithms, and no general solution is designed. In addition, there are few researches on the trustworthy of algorithms. By deeply analyzing the properties of sequence alignment problem and describing the essential characteristics of problem solving, a unified construction framework seqAlign of sequence alignment dynamic programming algorithm is designed based on the problem formal specification and formal method PAR. The process of constructing a multiple sequence alignment algorithm with 3 sequences by using the framework is further demonstrated, and the constructed results are formally verified by Isabelle theorem prover. Finally, the C++ executable program of the algorithm is generated by the code generation system of PAR platform. The process of mechanized construction of other sequence alignment algorithms using seqAlign framework is analyzed. Through strict specification refinement and formal verification, the reliability of the generated algorithm is effectively guaranteed. The developed seqAlign framework provides a general solution for the class of sequence alignment problems, which significantly improves the efficiency of generating sequence alignment algorithm families. The successful application of the designed seqAlign framework to sequence alignment problem in biological sequence analysis can provide a reference for the construction of highly reliable algorithms in complex bioinformatics field from the perspective of methodology and practice.

       

    /

    返回文章
    返回