ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2015, Vol. 52, Issue (6): 1254-1265. doi: 10.7544/issn1000-1239.2015.20150154

Special topic: 2015 Architectures for Application-Domain Requirements

• System Architecture •



  • Email: liwenming@ict.ac.cn
  • Publication date: 2015-06-01
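  • Funding: National Key Basic Research Program of China (973 Program) (2011CB302501); National High Technology Research and Development Program of China (863 Program) (2012AA010901, 2015AA011204); National Science and Technology Major Project on Core Electronic Devices, High-End Generic Chips and Basic Software (2013ZX0102-8001-001-001); National Natural Science Foundation of China (61173007, 61332009, 61204047)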

MACT: Discrete Memory Access Requests Batch Processing Mechanism for High-Throughput Many-Core Processor

Li Wenming1,2, Ye Xiaochun1, Wang Da1, Zheng Fang4, Li Hongliang4, Lin Han3, Fan Dongrui1, Sun Ninghui1   

  1 State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; 2 School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049; 3 School of Computer Science and Technology, University of Science and Technology of China, Hefei 230022; 4 State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, Jiangsu 214125
  • Online: 2015-06-01

Abstract: The rapid rise of new high-throughput applications, such as Web services, poses major challenges to traditional processors designed for high-performance applications. High-throughput many-core processors, a new processor architecture targeting such applications, have become a research focus. However, the dramatic increase in the number of on-chip cores, combined with the data-intensive nature of high-throughput applications, further aggravates the "memory wall" problem. An analysis of the memory access behavior of high-throughput applications shows that a large proportion of their memory accesses are fine-grained, which lowers the effective utilization of memory bandwidth and causes unnecessary energy consumption. Based on this observation, a memory access collection table (MACT) hardware mechanism is added to the design of high-throughput many-core processors; combined with a message-based memory mechanism, it collects discrete memory access requests and processes them in batches under deadline constraints. MACT improves both the effective utilization of memory bandwidth and execution efficiency. A time-window mechanism ensures that every request is sent before its deadline, which guarantees QoS, that is, the timeliness of tasks. The typical high-throughput applications WordCount, TeraSort and Search are used as benchmarks. With MACT, the number of memory access requests is reduced by about 49%, bandwidth efficiency is improved by about 24%, and average execution speed is improved by about 89%.
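The collect-then-batch behavior under a deadline can be illustrated with a small behavioral model. This is a software sketch under stated assumptions, not the MACT hardware from the paper: the block size (`BLOCK_SIZE`), time window (`TIME_WINDOW`), table size (`NUM_ENTRIES`), the names `MACT`, `Entry` and `Batch`, and the policy of issuing the entry nearest its deadline when the table is full are all hypothetical. Requests are also assumed not to cross an aligned block boundary.

```python
from dataclasses import dataclass, field

# Hypothetical parameters; not values taken from the paper.
BLOCK_SIZE = 64     # bytes covered by one batched request
TIME_WINDOW = 100   # max cycles a request may wait in the table
NUM_ENTRIES = 4     # number of table entries


@dataclass
class Entry:
    block: int      # aligned block address collected by this entry
    deadline: int   # cycle by which the batch must be issued
    offsets: list = field(default_factory=list)  # (offset, size) pairs


@dataclass
class Batch:
    block: int
    requests: list  # sorted (offset, size) pairs sent as one request
    issued_at: int


class MACT:
    """Behavioral toy model of a memory access collection table."""

    def __init__(self, window=TIME_WINDOW, entries=NUM_ENTRIES):
        self.window = window
        self.capacity = entries
        self.table = {}    # block address -> Entry
        self.issued = []   # batched requests sent toward memory

    def _flush(self, block, now):
        entry = self.table.pop(block)
        self.issued.append(Batch(block, sorted(entry.offsets), now))

    def tick(self, now):
        # Time-window check (meant to run every cycle): issue every
        # entry whose deadline has been reached.
        expired = [b for b, e in self.table.items() if e.deadline <= now]
        for block in expired:
            self._flush(block, now)

    def access(self, addr, size, now):
        # Assumes [addr, addr + size) stays inside one aligned block.
        self.tick(now)
        block = addr - addr % BLOCK_SIZE
        if block not in self.table:
            if len(self.table) == self.capacity:
                # Table full: issue the entry nearest its deadline early.
                victim = min(self.table, key=lambda b: self.table[b].deadline)
                self._flush(victim, now)
            self.table[block] = Entry(block, now + self.window)
        self.table[block].offsets.append((addr - block, size))

    def drain(self, now):
        # Issue everything still pending, e.g. at the end of a task.
        for block in list(self.table):
            self._flush(block, now)


# Six 8-byte requests touching three 64-byte blocks.
mact = MACT(window=10, entries=4)
for t, addr in enumerate([0, 8, 16, 200, 24, 72]):
    mact.access(addr, 8, now=t)
mact.drain(now=6)
print(len(mact.issued))  # 3
```

Grouping requests by aligned block is what turns several fine-grained accesses into one batched request, while the per-entry deadline bounds how long batching can delay any single request.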

Key words: high-throughput processor, memory access collection table (MACT), time window mechanism, cache, scratchpad memory (SPM)
