The communication system is an important part of a cluster, and its performance is one of the most critical factors determining the performance of the whole cluster system. As the computing capability of individual nodes increases, the communication capability of the network needs to improve correspondingly. An important way to enhance communication capability is to use multiple network cards to process messages simultaneously. In this paper, an implementation of parallel communication based on smart NICs is presented and evaluated with both communication benchmarks and applications. The experimental results show that both communication performance and application performance are better than those of parallel communication based on the RMA mechanism.