    Citation: Li Zeyu, Wang Quan, Yang Pengfei, Xu Zhiwei, Liang Jinpeng, Gao Ge. FPGA Fault Tolerance Based on Dynamic Self-Adaptive Redundancy[J]. Journal of Computer Research and Development, 2022, 59(7): 1428-1438. DOI: 10.7544/issn1000-1239.20210181

    FPGA Fault Tolerance Based on Dynamic Self-Adaptive Redundancy

    Abstract: Field programmable gate arrays (FPGAs) are highly susceptible to faults induced by high-energy particle radiation in space, which disrupt the normal execution of on-chip tasks. Triple modular redundancy (TMR) is the usual fault-tolerant design: it achieves good fault tolerance, but at a high resource cost. When the radiation level is low, running every task under TMR makes this overhead problem even worse. To address this, a fault tolerance method based on dynamic self-adaptive redundancy (FTDSR) is proposed. First, exploiting the high sensitivity of on-chip block RAM (BRAM) to space particle radiation, an improved BRAM-based radiation level monitor is designed to periodically measure the radiation level of the space environment. Second, the reliability level of each task is evaluated from the slack time of its execution cycle and the current radiation level, and a redundancy strategy is then matched dynamically and adaptively at the granularity of a single task under different radiation levels, ensuring that on-chip tasks execute successfully while avoiding high resource overhead. Simulation results show that an FPGA using FTDSR is highly reliable under different radiation levels. Compared with current mainstream redundancy-based FPGA fault tolerance methods, the number of completed on-chip tasks increases by 57.2% on average under the same radiation level.
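
The two steps in the abstract (monitor the radiation level, then match a redundancy strategy per task from its slack time and that level) can be illustrated with a minimal Python sketch. This is not the FTDSR algorithm from the paper: the upset-rate metric, the thresholds `low` and `high`, the three redundancy modes, and all names (`Task`, `radiation_level`, `choose_redundancy`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import IntEnum


class Redundancy(IntEnum):
    """Number of module copies used to run a task (assumed set of modes)."""
    SIMPLEX = 1  # single copy, no redundancy
    DUPLEX = 2   # two copies: a mismatch is detected, then the task is re-executed
    TMR = 3      # three copies: a single fault is masked by majority vote


@dataclass
class Task:
    name: str
    period: float     # length of one execution cycle
    exec_time: float  # duration of a single execution


def radiation_level(bit_flips: int, monitored_bits: int, window_s: float) -> float:
    """Hypothetical metric: observed BRAM upsets per monitored bit per second."""
    if monitored_bits <= 0 or window_s <= 0:
        raise ValueError("monitored_bits and window_s must be positive")
    return bit_flips / (monitored_bits * window_s)


def slack(task: Task) -> float:
    """Time left in the execution cycle after one execution."""
    return task.period - task.exec_time


def choose_redundancy(task: Task, level: float,
                      low: float = 1e-9, high: float = 1e-7) -> Redundancy:
    """Assumed policy, not the paper's: pick the cheapest mode that still
    lets the task finish within its cycle at the current radiation level."""
    if level < low:
        return Redundancy.SIMPLEX
    if level >= high:
        return Redundancy.TMR
    # Intermediate level: duplex is only safe if the slack covers one re-execution.
    can_retry = slack(task) >= task.exec_time
    return Redundancy.DUPLEX if can_retry else Redundancy.TMR
```

Under these assumptions, a task with enough slack to re-execute is given two copies at an intermediate radiation level, while a task with a tight cycle falls back to TMR because it cannot afford a retry. Calling `choose_redundancy` again after each monitoring period is what would make the assignment adaptive.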

       
