    Citation: Wang Kai, Chen Fei, Li Qiang, Li Xiaomin, An Xuejun, Sun Ninghui. Research on Hyper-Node Controller for High Performance Computer[J]. Journal of Computer Research and Development, 2011, 48(1): 1-8.

    Research on Hyper-Node Controller for High Performance Computer

    • Abstract: A traditional high performance computer (HPC) consists of nodes and an interconnection network, and each node comprises a processing unit and a node controller. The processing unit usually adopts a cache-coherent symmetric multiprocessor (SMP) or non-uniform memory access (NUMA) structure. To maintain cache coherence efficiently, the number of processors in a processing unit is very limited. A petaflops-scale HPC would therefore have tens of thousands of nodes, which places very demanding latency and bandwidth requirements on the interconnection network. The hyper-node controller introduced in this paper connects several processing units simultaneously, and together they form a hyper-node. Using hyper-nodes greatly reduces the scale of the interconnection network, which lowers its design complexity and guarantees its performance. The key techniques in the hyper-node controller, namely support for a global address space, direct memory access, remote load/store, a global hardware lock, and a multi-rail interconnection network, effectively lower communication latency, provide sufficient bandwidth, and improve synchronization performance. The hyper-node controller is implemented on an FPGA, and a prototype system has been built. Test results show that a cluster built from hyper-nodes has very low communication latency and that its communication bandwidth scales well.

       
