Abstract:
A traditional high-performance computer (HPC) consists of two parts: nodes and an interconnection network. Each node can be further divided into a processing unit and a node controller. The processing unit usually adopts a symmetric multiprocessor (SMP) or a cache-coherent non-uniform memory access (NUMA) structure. To maintain cache coherence efficiently, the number of processors in a processing unit must be kept small. Consequently, a petaflops-scale HPC would comprise tens of thousands of nodes, which places very high demands on both the latency and the bandwidth of the interconnection network. The hyper-node controller introduced in this paper can connect several processing units simultaneously, and together they form a hyper-node. Building the system from hyper-nodes greatly reduces the scale of the interconnection network, which lowers the network's design complexity and helps guarantee its performance. The key techniques in the hyper-node controller include support for a global address space, direct memory access (DMA), remote load/store, a global hardware lock, and a multi-rail interconnection network. Together they effectively lower communication latency, provide sufficient bandwidth, and improve synchronization performance. The hyper-node controller is implemented on an FPGA, and a prototype system has been built. Test results show that the cluster built from hyper-nodes has very low latency and scales well in bandwidth.