Optimization Methods for Distributed Iterative Computing Performance over Multi-Dimensional Large Graph

Du Yujie; Wang Zhigang; Wang Ning; Liu Xinyi; Yi Juncheng; Nie Jie; Wei Zhiqiang; Gu Yu; Yu Ge

doi:10.7544/issn1000-1239.202110839

Du Yujie, Wang Zhigang, Wang Ning, Liu Xinyi, Yi Juncheng, Nie Jie, Wei Zhiqiang, Gu Yu, Yu Ge. Optimization Methods for Distributed Iterative Computing Performance over Multi-Dimensional Large GraphJ. Journal of Computer Research and Development, 2023, 60(3): 654-675. DOI: 10.7544/issn1000-1239.202110839

Citation:

Optimization Methods for Distributed Iterative Computing Performance over Multi-Dimensional Large Graph

Graphical Abstract

Graphical Abstract

Abstract

Abstract

Complex mining algorithms over large-scale graphs usually require highly frequent iterative analysis, while distributed computing with scalable computing power and storage capacity is a preferred solution for efficiency. However, the edges freely connecting vertices generate a lot of messages across distributed computing tasks and hence incurs communication costs during iterations, which heavily limits the performance improvement of distributed computing. To alleviate the negative impact of communication bottleneck, existing research basically employs combining and replicating techniques on top of the traditional pushing framework design. However, they mainly focus on easy-to-be-optimized single-dimensional-message algorithms with simple message data structure, and are not suitable for other important multi-dimensional-message algorithms with complex structure. Also, they cannot be seamlessly integrated into the-state-of-the-art pulling framework where messages are generated on demand. We thereby propose a lightweight vertex replication mechanism to synchronize replicated vertices and generate messages based on such replications on demand. The mechanism can work well under the pulling framework with inherent advantages in terms of fault-tolerance and memory consumption, and also greatly optimize the communication costs. Moreover, by considering the communication benefits and the costs incurred by possible workload imbalance, it can select an optimal replication threshold for best performance. Finally, extensive experiments over various real-world graphs validate the effectiveness of the lightweight vertex replication framework and the threshold analysis model.

FullText(HTML)

References (32)

Cited By

Turn off MathJax

Article Contents

Optimization Methods for Distributed Iterative Computing Performance over Multi-Dimensional Large Graph

Graphical Abstract

Abstract

Catalog

Export File

Citation

Format

Content