Traditional MPI (Message Passing Interface) collectives are implemented with point-to-point messages and perform poorly. Hardware-supported collectives have attracted increasing attention because of their high performance and low CPU utilization. The aggregate tree has a crucial impact on the performance of hardware-supported collectives. We study the factors that affect the performance of hardware-supported collectives and propose a cost model for them. We then propose an efficient method for creating aggregate trees, which consists of three parts. First, we choose an appropriate aggregate tree type and breadth according to the operation type and the size of the aggregate messages, in order to trade off network transmission time against data processing time. Second, we propose a method for creating a hierarchical minimum-height aggregate tree of type Ⅰ, which reduces the number of inter-group communications. Third, we propose a method for creating a minimum-cost aggregate tree of type Ⅱ, which minimizes the number of switches used. We evaluate these proposals on the Sunway interconnection network. In the presence of network noise, the message latency of the hierarchical minimum-height aggregate tree of type Ⅰ is 24%–89% lower than that of the traditional method. For typical communication patterns, the number of aggregate entries used by the minimum-cost aggregate tree of type Ⅱ is 90% lower than with the traditional method.
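The breadth tradeoff described above can be illustrated with a toy model. The sketch below is not the paper's cost model. It assumes a hypothetical per-level cost of `alpha + m*beta` for transmitting one aggregate message of `m` bytes, plus `k*m*gamma` for a parent combining the messages of its `k` children. A larger breadth `k` lowers the tree height but raises the per-level processing time. The names `min_height`, `toy_cost`, and `best_breadth` and the parameters `alpha`, `beta`, and `gamma` are all illustrative assumptions.

```python
def min_height(n: int, k: int) -> int:
    """Smallest height h such that a tree with fan-out k and height h can hold n nodes."""
    if n <= 1:
        return 0
    height, capacity, level_size = 0, 1, 1
    while capacity < n:
        level_size *= k          # number of nodes on the next level
        capacity += level_size   # total nodes a tree of this height can hold
        height += 1
    return height


def toy_cost(n: int, k: int, m: float, alpha: float, beta: float, gamma: float) -> float:
    """Hypothetical latency of one reduction over a minimum-height k-ary aggregate tree.

    Each level pays a network term (alpha + m*beta) and a processing term
    (k*m*gamma) for combining k child messages of m bytes each.
    """
    return min_height(n, k) * (alpha + m * beta + k * m * gamma)


def best_breadth(n: int, m: float, alpha: float, beta: float, gamma: float,
                 max_k: int = 64) -> int:
    """Breadth in [2, max_k] that minimizes the toy cost; ties go to the smaller breadth."""
    return min(range(2, max_k + 1),
               key=lambda k: (toy_cost(n, k, m, alpha, beta, gamma), k))


if __name__ == "__main__":
    # Processing-dominated (large messages): a narrow tree wins.
    print(best_breadth(1024, m=1, alpha=0, beta=0, gamma=1))
    # Latency-dominated (small messages, free processing): the shallowest tree wins.
    print(best_breadth(1024, m=1, alpha=1, beta=0, gamma=0))
```

Under these assumptions, a processing-dominated workload favors a small breadth (3 for 1024 nodes). A latency-dominated workload favors the smallest breadth that reaches the minimum height (32 for 1024 nodes with breadth capped at 64). This matches the qualitative choice of breadth by message size described in the abstract, but real switch constraints are not modeled.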