    Citation: Zheng Ninghan, Gu Zhimin, Sun Xianhe. Performance Improvement for Irregular Data Intensive Hot-Slice with Low Computing Workload[J]. Journal of Computer Research and Development, 2013, 50(11): 2436-2443.

    Performance Improvement for Irregular Data Intensive Hot-Slice with Low Computing Workload

    Abstract: With the rise of cloud computing, irregular data-intensive applications running on chip multiprocessors (CMPs) are increasingly common, and their performance suffers badly from data cache misses. Traditional helper-thread methods use an idle core to load the irregular data that the main thread will need into the shared last-level cache (LLC) ahead of time. If the helper thread runs at a suitable speed relative to the main thread, so that the missing data reaches the LLC before the main thread accesses it, the performance of the hot slice (hot function) improves. If the hot slice contains little computation (a hot slice with low computing workload), however, the traditional method cannot build a helper thread that prefetches effectively relative to the main thread, and the performance gain shrinks sharply. This paper addresses source-level performance optimization of irregular data-intensive hot slices with low computing workload. It first gives a formal description of the helper thread's prefetch quality of service (QoS). On that basis, it introduces a model with parameters such as the prefetch lead and proposes a new optimization method for such hot slices. The method runs on real commercial processors without additional hardware modifications. On an Intel Core 2 Duo Processor 6550, experiments with the scientific computing benchmarks em3d and mst and with mcf from SPEC CPU2006 show performance gains of 1.97%, 31.63%, and 1.10%, respectively, over the traditional method.
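    The helper-thread idea described in the abstract can be illustrated with a minimal C sketch. It is not the paper's implementation: the abstract does not spell out its parameter model, so the names `job_t`, `helper`, `sum_with_helper`, `lead`, and `batch` are hypothetical, and the index-array access pattern (`data[idx[i]]`) is an assumption chosen so that the helper can jump ahead of the main thread instead of chasing pointers. The hot loop does almost no computation per element, which is the low-workload case the paper targets.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job descriptor shared by the main and helper threads. */
typedef struct {
    const long   *data;     /* payload, accessed in irregular order        */
    const size_t *idx;      /* access order: data[idx[0]], data[idx[1]]... */
    size_t        n;        /* number of accesses                          */
    size_t        lead;     /* assumed look-ahead distance (elements)      */
    size_t        batch;    /* assumed elements prefetched per round       */
    atomic_size_t progress; /* index last published by the main thread     */
    atomic_bool   done;     /* set when the main thread has finished       */
} job_t;

/* Helper thread: prefetch a window that starts `lead` elements ahead of
 * the main thread. If the main thread overtakes the window, the helper
 * skips forward rather than prefetching data that is already in use. */
static void *helper(void *arg) {
    job_t *j = arg;
    size_t next = 0; /* first index not yet prefetched */
    while (!atomic_load_explicit(&j->done, memory_order_acquire)) {
        size_t p = atomic_load_explicit(&j->progress, memory_order_relaxed);
        size_t start = p + j->lead;
        size_t end = start + j->batch;
        if (start < next) start = next;   /* do not prefetch twice */
        if (end > j->n) end = j->n;       /* stay inside the array */
        for (size_t i = start; i < end; i++)
            __builtin_prefetch(&j->data[j->idx[i]], 0, 1); /* GCC/Clang builtin */
        if (end > next) next = end;
    }
    return NULL;
}

/* Main thread: a hot loop with very little work per element. */
long sum_with_helper(const long *data, const size_t *idx, size_t n,
                     size_t lead, size_t batch) {
    assert(data != NULL && idx != NULL);
    job_t j = { .data = data, .idx = idx, .n = n,
                .lead = lead, .batch = batch };
    atomic_init(&j.progress, 0);
    atomic_init(&j.done, false);

    pthread_t t;
    bool have_helper = (pthread_create(&t, NULL, helper, &j) == 0);

    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if ((i & 63) == 0) /* publish progress only every 64 elements */
            atomic_store_explicit(&j.progress, i, memory_order_relaxed);
        sum += data[idx[i]];
    }

    atomic_store_explicit(&j.done, true, memory_order_release);
    if (have_helper) pthread_join(t, NULL);
    return sum;
}
```

    A caller would fill `data` and `idx`, then call `sum_with_helper(data, idx, n, lead, batch)`; compiling with `-pthread` is assumed. The result does not depend on `lead` or `batch`, since prefetching only changes timing. Whether any speedup appears depends on the cache hierarchy, the core placement of the two threads, and the chosen parameters, and the busy-waiting helper is kept only for brevity.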

       
