Fine-Grained Parallel Zuker Algorithm Accelerator with Storage Optimization on FPGA

Xia Fei (夏飞); Dou Yong (窦勇); Xu Jiaqing (徐佳庆); Zhang Yang (张阳)

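Abstract

RNA secondary structure prediction is an important research direction in bioinformatics, and the Zuker algorithm, based on the minimum free energy model, is one of the most typical and most widely used algorithms in the field. We implement a fine-grained parallel Zuker algorithm on an FPGA platform. A task allocation strategy that partitions the matrix cyclically by column balances the load across processing elements (PEs); data prefetching, a sliding window, and a data-passing pipeline enable data reuse among PEs; and strategies such as curve fitting, discrete point assignment, and address-space compression coding cut the storage required for the free energy parameters by about 85%. A master-slave multi-PE linear array of 20 PEs is integrated on a single FPGA. Experimental results show a speedup of more than 18x over the ViennaRNA-1.6.5 program running on an AMD quad-core 9650 processor, and the FPGA accelerator consumes only 1/5 of the average power of the general-purpose microprocessor.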

Abstract: In RNA secondary structure prediction, the Zuker algorithm is one of the most popular methods based on free energy minimization. However, general-purpose computers, including parallel and multi-core machines, reach an efficiency of no more than 50% on this algorithm. FPGA chips offer a new way to accelerate the Zuker algorithm through fine-grained custom design. The algorithm has complicated data dependences: the dependence distance is variable, and the dependences run along both dimensions of the matrix. We propose a systolic-like array consisting of one master PE and multiple slave PEs for a fine-grained hardware implementation on FPGA. We partition the computation by columns and assign the columns to PEs so that the load is balanced. To reduce loads of the matrix from external memory, we reuse data through a sliding triangular window cache and by passing local elements to adjacent PEs. We also propose several methods that together shrink the energy parameter tables by more than 85%: fitting curves with linear functions, replacing scattered points with register constants, compressing the address space, and shortening the data length. Experimental results show an 18x speedup over the ViennaRNA-1.6.5 software for a 2981-residue RNA sequence running on a PC with an AMD Phenom 9650 quad-core CPU, while the power consumption of our FPGA accelerator is only about 20% of that of the PC platform.
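The load-balancing effect of assigning matrix columns to PEs cyclically can be illustrated with a small model. The sketch below is not the authors' implementation. It assumes an n x n upper-triangular matrix in which column j holds j + 1 cells, and it counts only the number of cells per PE, ignoring that the work per cell varies in the real algorithm. The function names and the contiguous-block baseline are hypothetical, introduced only for comparison.

```python
def column_cells(n):
    """Cells per column of an n x n upper-triangular matrix (assumed model):
    column j holds rows 0..j, so it has j + 1 cells."""
    return [j + 1 for j in range(n)]


def cyclic_loads(n, num_pes):
    """Assign column j to PE (j mod num_pes) and return the cell count per PE."""
    loads = [0] * num_pes
    for j, cells in enumerate(column_cells(n)):
        loads[j % num_pes] += cells
    return loads


def block_loads(n, num_pes):
    """Baseline for comparison: give each PE one contiguous block of columns."""
    loads = [0] * num_pes
    width = -(-n // num_pes)  # ceiling division: columns per block
    for j, cells in enumerate(column_cells(n)):
        loads[j // width] += cells
    return loads


def imbalance(loads):
    """Ratio of the busiest PE's load to the mean load (1.0 is perfect balance)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    n, num_pes = 2981, 20  # sequence length and PE count quoted in the abstract
    print("cyclic imbalance:", imbalance(cyclic_loads(n, num_pes)))
    print("block imbalance: ", imbalance(block_loads(n, num_pes)))
```

Under this simplified model, cyclic assignment gives every PE a mix of short and long columns, whereas contiguous blocks concentrate the long columns on the last few PEs.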
