MACT: Discrete Memory Access Requests Batch Processing Mechanism for High-Throughput Many-Core Processor

李文明; 叶笑春; 王达; 郑方; 李宏亮; 林晗; 范东睿; 孙凝晖

doi:10.7544/issn1000-1239.2015.20150154

MACT: Discrete Memory Access Requests Batch Processing Mechanism for High-Throughput Many-Core Processor

Abstract

The rapid rise of new high-throughput applications such as Web services poses great challenges to traditional processor designs, which target high-performance applications. High-throughput many-core processors, a new processor architecture aimed at such applications, have become a research hotspot. However, because the number of on-chip cores is growing sharply and high-throughput applications are data-intensive, the “memory wall” problem is further aggravated. An analysis of the memory access behavior of high-throughput applications shows that they issue a large proportion of fine-grained memory accesses, which lowers the effective utilization of memory bandwidth and causes unnecessary energy consumption. Based on this observation, a hardware mechanism called the memory access collection table (MACT) is added to the high-throughput many-core processor design. Combined with a message-based memory mechanism, it collects discrete memory access requests and handles them in batches under a deadline constraint. The MACT hardware mechanism improves both the effective utilization of memory bandwidth and execution efficiency. A time-window mechanism ensures that every request is sent before its deadline, which guarantees QoS and the real-time behavior of tasks. The experiments use the typical high-throughput applications WordCount, TeraSort, and Search as benchmarks. With MACT, the number of memory access requests is reduced by about 49%, memory bandwidth efficiency is improved by about 24%, and the average execution speed is improved by about 89%.
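The idea of collecting discrete requests and sending them in batches before a deadline can be illustrated with a small software model. The abstract gives no implementation details, so everything below is a hypothetical sketch rather than the authors' hardware design: the class `CollectionTable`, the 64-byte batch granularity, the 16-cycle window, and the policy of grouping requests by aligned line address are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64      # assumed batch granularity; not stated in the abstract
WINDOW_CYCLES = 16   # assumed time-window length; not stated in the abstract


@dataclass
class Entry:
    first_cycle: int                           # cycle at which the entry was opened
    offsets: set = field(default_factory=set)  # byte offsets requested within the line


class CollectionTable:
    """Toy model: merge small requests to the same line into one batch."""

    def __init__(self, line_bytes=LINE_BYTES, window=WINDOW_CYCLES):
        self.line_bytes = line_bytes
        self.window = window
        self.entries = {}   # line base address -> Entry
        self.sent = []      # batches issued to memory: (base, sorted offsets)

    def request(self, addr, cycle):
        """Record one fine-grained request arriving at `cycle`."""
        self.tick(cycle)  # first flush any entry whose window has expired
        base = addr - addr % self.line_bytes
        entry = self.entries.setdefault(base, Entry(first_cycle=cycle))
        entry.offsets.add(addr - base)

    def tick(self, cycle):
        """Send every entry that has waited for a full time window."""
        expired = [b for b, e in self.entries.items()
                   if cycle - e.first_cycle >= self.window]
        for base in expired:
            entry = self.entries.pop(base)
            self.sent.append((base, sorted(entry.offsets)))


table = CollectionTable()
for cycle, addr in enumerate([0x100, 0x108, 0x204, 0x110, 0x20C]):
    table.request(addr, cycle)
table.tick(cycle=100)  # let all remaining windows expire
print(len(table.sent), "batched accesses for 5 requests")
```

In this toy run the five small requests fall into two aligned lines, so only two batched accesses are issued. The window bounds how long any request can wait in the table, which is the role the abstract attributes to the time-window mechanism.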
