Abstract:
Early parallelizing compilers use the owner-computes rule to partition computation. Partial replication is then introduced to reduce near-neighbor communication at the cost of some repeated computation. It is an important optimization that improves the performance and scalability of parallel programs. Former exploration of partial replicate computation partitioning is limited within a single loop nest, and no explicit cost model is used. In this paper, a formal description of more general partial replicate computation partitioning problems is presented, which is called global partial replicate computation partitioning. As redundant message elimination exerts great influence on the effect of such optimizations, a linear cost model is introduced, which considers its effect. A framework is also developed, which employs the integer linear programming method. Experimental results show that the solution is superior to local approaches. Compared with the heuristic method, the new approach can deal with more general cases and is easier to adapt to different data distribution.