ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2014, Vol. 51, Issue (9): 1980-1992. doi: 10.7544/issn1000-1239.2014.20130987

• System Architecture •

A Customized Coprocessor Acceleration of Genome Re-Sequencing

Tang Wen1,2, Zhang Chunming1, Tan Guangming1, Zhang Peiheng1, Sun Ninghui1

  1. 1(High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190);2(University of Chinese Academy of Sciences, Beijing 100049) (tangwen@ncic.ac.cn)
  • Published: 2014-09-01
  • Online: 2014-09-01
  • Supported by:
    National Basic Research Program of China (973 Program) (2012CB316502)

Abstract: Since next-generation (high-throughput) DNA sequencing platforms came into use in January 2008, sequencing throughput has risen significantly while cost has continued to fall. However, the volume of sequencing data is growing faster than Moore's Law, and the ability to process it has become the bottleneck that limits wider adoption of sequencing applications. As an emerging data-intensive workload, high-throughput re-sequencing tools such as hash-index-based programs show different characteristics from traditional computational applications: low arithmetic intensity and irregular memory access patterns are the major sources of inefficiency on commodity multi-core platforms. In this paper, we analyze the computation and memory access behavior of a hash-index-based re-sequencing algorithm and propose a coprocessor architecture, built on field programmable gate arrays (FPGAs), for accelerating short-read mapping. Each processing element (PE) is fully pipelined, uses FIFOs to decouple its computation modules from its memory access modules, and executes the complete core mapping flow. Each PE is bound one-to-one to an exclusive memory port to improve parallel performance. The architecture is implemented on a Convey HC-1ex reconfigurable computer, with 64 parallel PEs on 4 Xilinx Virtex-6 LX760 FPGAs operating at 150 MHz. The average total memory read bandwidth reaches 22.59 GBps, and the design achieves a 28.5-fold speedup over an 8-core Intel Xeon processor. The proposed design can therefore potentially address the large-data challenge in high-throughput re-sequencing.
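To make the hash-index approach named in the abstract more concrete, the following is a minimal software sketch of seed-and-verify short-read mapping. It is an illustration only, not the authors' algorithm or hardware design: the function names, the seed length, the non-overlapping seed layout, and the Hamming-distance verification step are all assumptions chosen for brevity.

```python
from collections import defaultdict


def build_hash_index(reference, seed_len):
    """Map every seed_len-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, index, seed_len, max_mismatches):
    """Look up non-overlapping seeds of the read, then verify each candidate."""
    hits = set()
    for offset in range(0, len(read) - seed_len + 1, seed_len):
        seed = read[offset:offset + seed_len]
        # Hash lookup: an irregular memory access into the index.
        for pos in index.get(seed, ()):
            start = pos - offset  # candidate alignment start in the reference
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            if hamming(read, window) <= max_mismatches:
                hits.add(start)
    return sorted(hits)


# Hypothetical toy data: the read is reference[11:23] with one substitution.
reference = "ACGTACGTTAGCCGATAGGCTTAACGGATCCA"
index = build_hash_index(reference, seed_len=4)
read = "CCGATAGGCTAA"
hits = map_read(read, reference, index, seed_len=4, max_mismatches=1)
print(hits)  # [11]
```

Each seed lookup touches an essentially random location in the index and does very little arithmetic per byte fetched, which is one way to picture the low arithmetic intensity and irregular memory access pattern that the abstract describes.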

Key words: high-throughput sequencing, short reads mapping, Hash-index, field programmable gate array (FPGA), heterogeneous architecture
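As a rough back-of-envelope reading of the figures reported in the abstract (an illustration, not a result stated by the authors), the per-PE numbers can be estimated as below. This assumes the 64 PEs are spread evenly over the 4 FPGAs, that the read bandwidth is shared equally among PEs, and that 1 GB means $10^9$ bytes.

```latex
\begin{align*}
\text{PEs per FPGA} &= \frac{64}{4} = 16 \\
\text{Read bandwidth per PE} &\approx \frac{22.59\ \text{GBps}}{64} \approx 0.353\ \text{GBps} \\
\text{Bytes read per PE per cycle} &\approx \frac{0.353 \times 10^{9}\ \text{B/s}}{150 \times 10^{6}\ \text{cycles/s}} \approx 2.35\ \text{B/cycle}
\end{align*}
```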

CLC number: