    Zhao Changhai, Wang Shihu, Luo Guoan, Wen Jiamin, Zhang Jianlei. A Highly Scalable Parallel Algorithm for 3D Prestack Kirchhoff Time Migration[J]. Journal of Computer Research and Development, 2015, 52(4): 869-878. DOI: 10.7544/issn1000-1239.2015.20131915

    A Highly Scalable Parallel Algorithm for 3D Prestack Kirchhoff Time Migration

    • To support growing survey sizes and processing complexity, we propose a practical approach to large-scale parallel 3D prestack Kirchhoff time migration (PKTM) on clusters of multi-core nodes. The parallel algorithm is based on a three-level decomposition of the imaging space. First, the imaging space is partitioned by offset. Each node runs exactly one process, and all processes are divided into several disjoint groups. The imaging of each common-offset space is assigned to one group, and the common-offset input traces are distributed dynamically to the processes of that group. Once all input traces have been migrated, the local imaging sections of all processes in the group are summed to form the final common-offset image. Within a node, the common-offset imaging section is further partitioned evenly by common midpoint (CMP) into as many blocks as there are CPU cores; the computing threads share the same input traces, and each thread spreads the samples to its own set of imaging points. If a common-offset imaging section exceeds the physical memory of a compute node, the whole imaging space is first partitioned along the in-line direction so that each common-offset imaging section fits in memory. The algorithm greatly reduces the memory requirement, introduces no overlapping input traces between processes, and makes fault tolerance easy to implement. An implementation of the algorithm demonstrates high scalability and excellent performance in our experiments with real data, scaling efficiently to as many as 497 nodes and 7552 threads.
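The three-level decomposition described above can be sketched as a small planning step. This is a minimal sketch under assumptions that do not come from the paper: image samples are 4 bytes, the whole node memory is treated as available for one common-offset section (a real code must also reserve room for input traces and buffers), offsets are assigned to groups round-robin, and CMPs inside a slab are counted as in-lines × cross-lines. The names `plan_decomposition`, `split_even` and `thread_cmp_blocks` are hypothetical.

```python
def split_even(n, parts):
    """Split range(n) into `parts` contiguous [start, stop) blocks whose
    sizes differ by at most one."""
    base, extra = divmod(n, parts)
    blocks, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks


def plan_decomposition(n_offsets, n_inline, n_crossline, n_time,
                       n_groups, n_threads, node_mem_bytes, bytes_per_sample=4):
    """Hypothetical three-level work plan for PKTM imaging.

    Level 1 (only if needed): cut the in-lines into slabs so that one
             common-offset section of a slab fits in node memory.
    Level 2: give each common offset to a process group (round-robin here).
    Level 3: split the slab's CMPs evenly across the node's threads.
    """
    bytes_per_inline = n_crossline * n_time * bytes_per_sample
    max_inlines = node_mem_bytes // bytes_per_inline
    if max_inlines == 0:
        raise ValueError("one in-line of a common-offset section exceeds node memory")
    slabs = [(start, min(start + max_inlines, n_inline))
             for start in range(0, n_inline, max_inlines)]

    plan = []
    for slab_id, (il_start, il_stop) in enumerate(slabs):
        n_cmp = (il_stop - il_start) * n_crossline    # CMPs in this slab
        thread_blocks = split_even(n_cmp, n_threads)  # slab-local CMP indices
        for offset in range(n_offsets):
            plan.append({
                "slab": slab_id,
                "inlines": (il_start, il_stop),
                "offset": offset,
                "group": offset % n_groups,
                "thread_cmp_blocks": thread_blocks,
            })
    return plan
```

For example, `plan_decomposition(4, 10, 5, 100, n_groups=2, n_threads=3, node_mem_bytes=8000)` allows 4 in-lines per slab, so it produces the slabs `(0, 4)`, `(4, 8)` and `(8, 10)` and 12 work items; the 20 CMPs of the first slab are split across threads as `[(0, 7), (7, 14), (14, 20)]`. When a whole common-offset section already fits in memory, a single slab results and the in-line level disappears.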
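The data flow within one process group (dynamic distribution of common-offset traces, threads that share each trace but write disjoint CMP blocks, and a final summation of local sections) can be illustrated with a small simulation. This sketch is an assumption-laden stand-in, not the authors' implementation: Python threads play the role of the cluster processes (a real code would presumably use message passing between nodes), a shared `queue.Queue` replaces the trace dispatcher, and `spread` is a toy kernel that adds a trace's amplitude to CMPs within a hypothetical `APERTURE` of its midpoint instead of performing a Kirchhoff traveltime summation. Because of the GIL, it shows the decomposition, not a speed-up.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

APERTURE = 2  # hypothetical: each trace touches CMPs within +/- APERTURE of its midpoint


def spread(trace, image, cmp_start, cmp_stop):
    """Toy stand-in for the Kirchhoff kernel: add the trace amplitude to the
    image points of one CMP block that fall inside the aperture."""
    midpoint, amplitude = trace
    lo = max(cmp_start, midpoint - APERTURE)
    hi = min(cmp_stop, midpoint + APERTURE + 1)
    for cmp in range(lo, hi):
        image[cmp] += amplitude


def process_worker(rank, trace_queue, n_cmp, cmp_blocks, local_images):
    """One simulated process of a group: pull traces until none are left."""
    image = [0] * n_cmp  # this process's local common-offset section
    with ThreadPoolExecutor(max_workers=len(cmp_blocks)) as pool:
        while True:
            try:
                trace = trace_queue.get_nowait()  # dynamic distribution
            except queue.Empty:
                break
            # All threads share the same trace; each writes only its own
            # CMP block, so the image needs no locking.
            list(pool.map(lambda blk: spread(trace, image, *blk), cmp_blocks))
    local_images[rank] = image


def migrate_offset_group(traces, n_cmp, n_processes, cmp_blocks):
    """Migrate one common-offset gather with a group of simulated processes,
    then sum the local sections into the final common-offset image."""
    trace_queue = queue.Queue()
    for trace in traces:
        trace_queue.put(trace)
    local_images = [None] * n_processes
    workers = [
        threading.Thread(target=process_worker,
                         args=(rank, trace_queue, n_cmp, cmp_blocks, local_images))
        for rank in range(n_processes)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Reduction step: add the local imaging sections element by element.
    return [sum(column) for column in zip(*local_images)]
```

For instance, `migrate_offset_group([(5, 1)], 10, 2, [(0, 5), (5, 10)])` returns `[0, 0, 0, 1, 1, 1, 1, 1, 0, 0]`: only one process receives the trace, the other contributes an all-zero section. Since every trace is taken from the queue exactly once, no trace is migrated by two processes, and the summed image does not depend on which process happened to take which trace (exactly so for the integer amplitudes used here; with floating-point data the summation order can change the last bits).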