    Citation: Zhong Qi, Wang Jing, Guan Xuetao, Huang Tao, Wang Keyi. Data Object Scale Aware Rank-Level Memory Allocation[J]. Journal of Computer Research and Development, 2014, 51(3): 672-680.

    Data Object Scale Aware Rank-Level Memory Allocation

    • Abstract: Main memory is organized into banks, ranks, and channels, and exploiting the parallelism and locality of this structure is an important way to improve system performance. Previous work has used sub-ranking techniques to increase the memory resources that can operate in parallel, or has partitioned banks among concurrently running programs to isolate memory interference. However, these methods do not consider conflicts among the data objects within a single program when the memory system provides multiple banks and ranks. By analyzing how data is laid out in main memory, we find that a program's data tends to cluster in a single rank because of its limited working set, which leaves memory resources underutilized and degrades system performance. We propose DSRA (data object scale aware rank-level memory allocation), a software-only approach that places data objects with a high conflict cost in different ranks, so that the additional bank and rank resources improve memory access performance. DSRA works at the operating system level and analyzes the conflict cost among data objects using information provided by the compiler and the operating system, so it requires neither changes to application source code nor special underlying hardware. We implement DSRA in the Linux 2.6.32 kernel and evaluate it on two real processors with memory-intensive benchmarks from the NAS benchmark suite and SPEC CPU2000. The results show that DSRA shortens program execution time by reducing the number of main-memory access cycles, with little effect on the cache miss rate. Compared with existing optimization techniques, it improves performance by 6.8% on average and by up to 16%.
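The placement idea in the abstract, spreading the data objects with the highest conflict cost over different ranks, can be illustrated with a small sketch. This is not the DSRA algorithm itself, which the abstract does not spell out. It swaps in a generic greedy largest-first balancing heuristic, and the function name, object names, and cost values are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def spread_objects_across_ranks(
    costs: Dict[str, float], num_ranks: int
) -> Dict[int, List[str]]:
    """Greedily place each object on the currently least-loaded rank.

    `costs` maps a (hypothetical) data object name to an assumed
    conflict cost; how DSRA actually estimates that cost is not
    described in the abstract.
    """
    if num_ranks <= 0:
        raise ValueError("num_ranks must be positive")
    # Min-heap of (accumulated cost, rank id): the root is the
    # least-loaded rank, with ties broken by the lower rank id.
    heap: List[Tuple[float, int]] = [(0.0, r) for r in range(num_ranks)]
    heapq.heapify(heap)
    placement: Dict[int, List[str]] = {r: [] for r in range(num_ranks)}
    # Visit objects from most to least costly (ties broken by name),
    # so the costliest objects are separated first.
    for name, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, rank = heapq.heappop(heap)
        placement[rank].append(name)
        heapq.heappush(heap, (load + cost, rank))
    return placement


# Hypothetical per-object conflict costs in arbitrary units.
example_costs = {"grid_a": 8.0, "grid_b": 7.0, "index": 2.0, "scratch": 1.0}
example_placement = spread_objects_across_ranks(example_costs, num_ranks=2)
print(example_placement)
```

With two ranks, the two costliest objects end up in different ranks and the smaller ones fill in behind them. A real allocator would also have to map each rank to physical page frames, which depends on the memory controller's address mapping and is outside this sketch.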

       
