• 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
高级检索

阵列众核处理器上的高效归并排序算法

石嵩, 李宏亮, 朱巍

石嵩, 李宏亮, 朱巍. 阵列众核处理器上的高效归并排序算法[J]. 计算机研究与发展, 2016, 53(2): 362-373. DOI: 10.7544/issn1000-1239.2016.20148246
引用本文: 石嵩, 李宏亮, 朱巍. 阵列众核处理器上的高效归并排序算法[J]. 计算机研究与发展, 2016, 53(2): 362-373. DOI: 10.7544/issn1000-1239.2016.20148246
Shi Song, Li Hongliang, Zhu Wei. Efficient Merge Sort Algorithms on Array-Based Manycore Architectures[J]. Journal of Computer Research and Development, 2016, 53(2): 362-373. DOI: 10.7544/issn1000-1239.2016.20148246
Citation: Shi Song, Li Hongliang, Zhu Wei. Efficient Merge Sort Algorithms on Array-Based Manycore Architectures[J]. Journal of Computer Research and Development, 2016, 53(2): 362-373. DOI: 10.7544/issn1000-1239.2016.20148246
石嵩, 李宏亮, 朱巍. 阵列众核处理器上的高效归并排序算法[J]. 计算机研究与发展, 2016, 53(2): 362-373. CSTR: 32373.14.issn1000-1239.2016.20148246
引用本文: 石嵩, 李宏亮, 朱巍. 阵列众核处理器上的高效归并排序算法[J]. 计算机研究与发展, 2016, 53(2): 362-373. CSTR: 32373.14.issn1000-1239.2016.20148246
Shi Song, Li Hongliang, Zhu Wei. Efficient Merge Sort Algorithms on Array-Based Manycore Architectures[J]. Journal of Computer Research and Development, 2016, 53(2): 362-373. CSTR: 32373.14.issn1000-1239.2016.20148246
Citation: Shi Song, Li Hongliang, Zhu Wei. Efficient Merge Sort Algorithms on Array-Based Manycore Architectures[J]. Journal of Computer Research and Development, 2016, 53(2): 362-373. CSTR: 32373.14.issn1000-1239.2016.20148246

阵列众核处理器上的高效归并排序算法

基金项目: 国家“八六三”高技术研究发展计划基金项目(2014AA01A301);“核高基”国家科技重大专项基金项目(2013zx0102-8001-001-001)
详细信息
  • 中图分类号: TP302

Efficient Merge Sort Algorithms on Array-Based Manycore Architectures

  • 摘要: 排序是计算机科学中最基本的问题之一,随着众核处理器结构的不断发展,设计众核结构上的高效排序算法具有重要意义.众核处理器的一个重要方向是阵列众核处理器,根据阵列众核处理器的结构特点,提出了2种面向阵列众核结构的高效归并排序算法,通过利用DMA(direct memory access)多缓冲机制提高访存效率、深度平衡归并策略保持众多核心之间的负载均衡、SIMD(single instruction multiple data)归并方法提高归并计算效率以及片上交换归并策略提高片上数据重用率,大幅度提高了阵列众核处理器的排序性能.在异构融合阵列众核处理器DFMC(deeply-fused many-core)原型系统的实验结果表明,算法排序速度达647 MKeys/s(million keys per second),其排序效率(排序速度/峰值性能)是NVIDIA GPU上最快的归并排序算法(GTX580平台)的3.3倍,是Intel Xeon Phi上最快的归并排序算法的2.7倍.最后,建立了阵列众核处理器上归并排序算法的性能分析模型,利用该模型分析了主要结构参数与算法性能的关系,对阵列众核处理器的研究有一定的指导意义.
    Abstract: Sorting is one of the most fundamental problems in the field of computer science. With the rapid development of manycore processors, it shows great importance to design efficient sort algorithms on manycore architecture. An important trend of manycore processors is array-based manycore architecture. This paper presents two efficient merge sort algorithms on array-based manycore architectures. Our algorithms achieve high performance by using DMA(direct memory access) multi-buffering strategy to improve the memory accessing efficiency, deeply balanced merging strategy to keep load-balancing between cores, SIMD(single instruction multiple data) merging method to exploit fine-grained parallelism and on-chip swap merging strategy to reuse on-chip data. Experimental results on DFMC(deeply-fused many-core) prototype system achieve a sorting rate of 647 MKeys/s(million keys per second), and the sorting efficiency(sorting rate/peak performance) is 3.3x higher than state-of-the-art GPU merge sort on GTX580, and 2.7x higher than the fastest merge sort on Intel Xeon Phi. Additionally, this paper establishes an analytical model to characterize the performance of our algorithms. Based on the analytical model, we analyze the influence of the main structural parameters to the performance of the algorithms, which will contribute to the study of many-core architecture.
计量
  • 文章访问数: 
  • HTML全文浏览量:  0
  • PDF下载量: 
  • 被引次数: 0
出版历程
  • 发布日期:  2016-01-31

目录

    /

    返回文章
    返回