Abstract:
This paper reports NAS Parallel Benchmarks (NPB) results on three domestic tera-scale cluster systems, with emphasis on the performance characteristics and trends of tera-scale parallel computing on systems with thousands of processors. We analyze how differences in system configuration (processor, interconnection network, etc.) affect NPB performance and find that different programs in the NPB suite achieve their best performance on different clusters. Further analysis indicates that the NPB programs scale to hundreds of processors but not to thousands. Most of the NPB programs exploit only around 10% of the system peak performance, so both the scalability of cluster systems and the performance of real applications on tera-scale clusters need further improvement. For the construction of tera-scale cluster systems with thousands of processors, the performance of collective communication and fine-grained message passing in particular needs to be improved.