Abstract:
With continuous technology scaling, microprocessors are becoming increasingly susceptible to soft errors. The architectural vulnerability factor (AVF), introduced to quantify the vulnerability of on-chip structures to soft errors, has been shown to exhibit significant runtime variation. Traditional fault-tolerant techniques ignore this dynamic behavior and provide protection throughout the entire execution of a program, which can lead to over-protection and significant overhead. In contrast, dynamic fault-tolerant techniques based on AVF prediction provide error protection only at execution points with high AVF rather than over the whole execution, thereby meeting the reliability goal at minimum cost. In this paper, we aim to develop an efficient online AVF predictor that can be used in dynamic fault-tolerance management schemes for the L2 cache. We first improve the method of cache AVF computation and characterize the dynamic vulnerability behavior of the L2 cache. Then, based on these observations of L2 cache AVF behavior, we propose using Bayesian additive regression trees (BART) to accurately model the variation of L2 cache AVF, and we employ a bump hunting technique to extract simple selection rules on several key performance metrics, enabling fast and efficient prediction of L2 cache AVF.
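For reference, the sketch below computes cache AVF over a time window using the conventional ACE-bit definition: AVF is the number of ACE (architecturally correct execution) bit-cycles divided by the total number of bits times the number of cycles. It is not the improved computation method proposed in this paper. The AceInterval record, the cache_avf function, and the example numbers are hypothetical, and classifying which residency intervals are ACE (for example, from a line's fill, write, read, and eviction events) is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass
class AceInterval:
    # Hypothetical record: `bits` ACE bits were resident from cycle `start`
    # (inclusive) to cycle `end` (exclusive).
    start: int
    end: int
    bits: int


def cache_avf(intervals, total_bits, window_start, window_end):
    """Conventional AVF over [window_start, window_end):
    (sum of ACE bit-cycles) / (total_bits * cycles in the window)."""
    cycles = window_end - window_start
    if cycles <= 0 or total_bits <= 0:
        raise ValueError("window length and structure size must be positive")
    ace_bit_cycles = 0
    for iv in intervals:
        # Clip each ACE interval to the measurement window.
        overlap = min(iv.end, window_end) - max(iv.start, window_start)
        if overlap > 0:
            ace_bit_cycles += iv.bits * overlap
    return ace_bit_cycles / (total_bits * cycles)


# Hypothetical example: four 512-bit lines (2048 bits), two ACE intervals.
EXAMPLE_INTERVALS = [AceInterval(0, 500, 512), AceInterval(200, 1000, 512)]

if __name__ == "__main__":
    # Per-window AVF: 0.4 for [0, 500) and 0.25 for [500, 1000).
    for start, end in [(0, 500), (500, 1000)]:
        print(start, end, cache_avf(EXAMPLE_INTERVALS, 2048, start, end))
    # Whole-run AVF: 0.325.
    print(cache_avf(EXAMPLE_INTERVALS, 2048, 0, 1000))
```

Evaluating AVF per window in this way yields the kind of time series whose runtime variation motivates predicting AVF online, instead of relying on a single whole-program average.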