    Huang Anwen, Shi Wenqiang, Gao Jun, and Zhang Minxuan. An Adaptive Migration-Replication Mechanism for Virtual Shared Regions Partition[J]. Journal of Computer Research and Development, 2013, 50(8): 1583-1591.

    An Adaptive Migration-Replication Mechanism for Virtual Shared Regions Partition

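    • Abstract (translated from Chinese): Traditional data management mechanisms cannot perceive the non-uniform access latency of a distributed cache layout, so the conflict between the miss rate and the hit latency of large-capacity caches in multi-core processors keeps intensifying. Moreover, relying on data migration alone or on blind replication can hardly resolve contended accesses to shared data blocks and long-latency hits. Building on a virtual shared region partitioning mechanism for the distributed cache of tiled multi-core processors, this paper proposes and implements an adaptive inter-region data migration and replication mechanism. It jointly considers the state of the candidate victim block in the local target bank and the local activity level of the remotely hit block, makes dynamic migration and replication decisions across multiple virtual shared regions for shared data contended by multiple cores, balances on-chip long-latency hits against effective use of cache capacity, and reduces the average memory access latency. Finally, virtual shared region partitioning and the adaptive inter-region migration-replication mechanism for shared data are implemented in a full-system simulator, and the typical benchmark suite SPLASH-2 is used to evaluate the performance gains. Experiments show that, compared with the traditional fixed shared-region partitioning mechanism and comparable optimization mechanisms, the adaptive migration and replication mechanism achieves corresponding performance improvements under different sharing degrees, with negligible area overhead.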

       

      Abstract: The speed gap between processor and memory keeps widening, which makes program performance on chip multiprocessors (CMPs) increasingly dependent on the design of the on-chip memory hierarchy. However, traditional data management mechanisms do not exploit the non-uniform access latency of the large distributed caches in CMPs, so the conflict between miss rate and hit latency grows increasingly serious. Furthermore, relying solely on dynamic migration or blind replication makes it difficult to resolve conflicting accesses and long-latency hits to shared blocks. To address these challenges, this paper proposes an adaptive migration-replication (AMR) mechanism based on virtual shared region (VSR) partitioning in tiled CMPs. AMR jointly considers the state of the victim candidate in the local VSR and the activity level of the remote source line, so that shared blocks accessed by different processor cores can be migrated and replicated between VSRs adaptively, which reduces the average memory access time. Finally, VSR partitioning and the AMR mechanism are implemented in a full-system simulator, and the SPLASH-2 benchmark suite is used to evaluate the performance improvement. Simulation results show that AMR performs well under different sharing degrees compared with the traditional fixed partition mechanism, while the additional hardware overhead is negligible.
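The abstract names two inputs to the migrate-or-replicate decision (the state of the local victim candidate and the activity of the remote line) but does not give the decision rule itself. The following is only a minimal sketch of how such a policy could be structured, not the paper's algorithm: the `Victim` and `RemoteBlock` fields, the `decide` function, and the `active_threshold` value are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    STAY = "stay"            # keep serving the block from the remote region
    MIGRATE = "migrate"      # move the single copy into the local region
    REPLICATE = "replicate"  # create an extra copy in the local region


@dataclass
class Victim:
    """Hypothetical summary of the victim candidate in the local bank."""
    valid: bool          # False means the local bank has a free way
    recently_used: bool  # True means evicting it likely costs a future miss


@dataclass
class RemoteBlock:
    """Hypothetical activity counters for the remotely hit block."""
    owner_region_hits: int  # recent hits from cores in the block's current region
    requester_hits: int     # recent hits from the requesting region
    written_recently: bool  # replicas of written blocks need extra invalidations


def decide(victim: Victim, block: RemoteBlock, active_threshold: int = 4) -> Action:
    """Assumed policy: trade long-latency remote hits against local capacity."""
    # A block the requester rarely touches is not worth a local slot.
    if block.requester_hits < active_threshold:
        return Action.STAY
    # Only displace the victim if it is empty or looks dead.
    victim_is_cheap = (not victim.valid) or (not victim.recently_used)
    if not victim_is_cheap:
        return Action.STAY
    # If the current region no longer uses the block, move the only copy.
    if block.owner_region_hits < active_threshold:
        return Action.MIGRATE
    # Both regions are active: copy only blocks that are not being written.
    if block.written_recently:
        return Action.STAY
    return Action.REPLICATE
```

For example, under these assumptions `decide(Victim(valid=True, recently_used=False), RemoteBlock(9, 8, False))` selects replication, because both regions use the block heavily and the local victim looks dead. A real design would also have to account for coherence traffic and how the counters are stored in hardware, which this sketch leaves out.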

       
