ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2015, Vol. 52 ›› Issue (6): 1341-1350. doi: 10.7544/issn1000-1239.2015.20150201

Special topic: 2015 Architectures Oriented to Application-Domain Requirements

• System Architecture •



  1. (State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, Jiangsu 214125)
  • Publication date: 2015-06-01

Ant Cluster: A Novel High-Efficiency Multipurpose Computing Platform

Xie Xianghui, Qian Lei, Wu Dong, Yuan Hao, Li Xiang   

  1. (State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, Jiangsu 214125)
  • Online: 2015-06-01

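Abstract (Chinese version): Driven by the demands of scientific computing and big data processing, the performance of high-performance computers keeps rising and system scale keeps growing, and system power consumption is increasingly an important bottleneck limiting further gains in capability. Building on an in-depth analysis of four existing types of high-performance computers, this paper explores two key technologies: 1) Reconfigurable micro server (RMS) technology, which addresses the problem of balancing a single computing node's domain-application acceleration capability, power consumption, and size. 2) Cluster construction technology that combines autonomy with divide-and-conquer, which addresses the construction and scalability of large-scale computing platforms built from miniaturized computing nodes. On this basis, the paper proposes a new high-efficiency multipurpose computing platform architecture, "Ant Cluster", builds an Ant Cluster prototype system containing 2,048 low-power, miniaturized RMS computing nodes, and implements two typical applications: large-scale real-time fingerprint matching and cooperative sorting across multiple RMS nodes. Tests show that the fingerprint matching performance of a single RMS node is 34 times that of a single Xeon core at a power consumption of only 5 W, and that the whole prototype system can serve hundreds of concurrent real-time queries against a fingerprint database on the order of ten million entries. The performance per watt of the Ant Cluster platform for data sorting is more than 10 times that of a GPU platform, which effectively improves sorting efficiency.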

Keywords (Chinese version): high performance computing, cluster, reconfigurable computing, micro server, computing platform

Abstract: Driven by the demands of scientific computing and big data processing, high performance computers have become more powerful and their system scales larger than ever before. However, the power consumption of the whole system is becoming a severe bottleneck to further performance improvement. In this paper, after an in-depth analysis of four types of HPC systems, we propose and study two key technologies: reconfigurable micro server (RMS) technology, and a cluster construction technology that combines node autonomy with node cooperation. RMS technology provides a new way to balance the performance, power consumption, and size of computing nodes. By combining node autonomy and node cooperation, a large number of small-sized computing nodes can be aggregated into a scalable RMS cluster. Based on these technologies, we propose a new high-efficiency multipurpose computing platform architecture called Ant Cluster and construct a prototype system consisting of 2,048 low-power, ant-like, small-sized computing nodes. On this cluster, we implement two real applications. The test results show that, for real-time large-scale fingerprint matching, a single RMS node achieves a 34-fold speed-up over a single Intel Xeon core while consuming only 5 W. The whole prototype system supports processing hundreds of concurrent queries in real time on a database of 10 million fingerprints. For data sorting, our prototype system achieves more than 10 times the performance per watt of a GPU platform, effectively improving sorting efficiency.

Key words: high performance computing (HPC), cluster, reconfigurable computing, micro server, computing platform