Citation: Cao Kun, Long Saiqin, Li Zhetao. Lifetime-Driven OpenCL Application Scheduling on CPU-GPU MPSoC[J]. Journal of Computer Research and Development, 2023, 60(5): 976-991. DOI: 10.7544/issn1000-1239.202220700
In recent years, multiprocessor systems-on-chip (MPSoCs) that integrate CPUs and GPUs have been widely deployed in industrial control, automotive electronics, smart medical devices, and similar fields. The open computing language (OpenCL) is a popular application programming standard for CPU-GPU MPSoCs because it can fully exploit both the parallel computing power of GPU cores and the general-purpose computing power of CPU cores. However, most existing work on deploying OpenCL applications to CPU-GPU MPSoCs has neglected the management of chip temperature and lifetime, which leads to elevated peak temperatures and the early occurrence of permanent failures. In this paper, we explore lifetime-driven OpenCL application scheduling that minimizes latency on CPU-GPU MPSoCs under timing, temperature, energy consumption, and lifetime constraints. We propose a method that combines static and dynamic application scheduling techniques. The static technique builds on an improved cross-entropy strategy that takes the characteristics of OpenCL applications into account when searching for optimal application design points. The dynamic technique builds on a feedback control strategy that handles newly arriving applications at runtime to optimize latency. Experimental results show that the proposed method reduces the average latency of OpenCL applications by 34.58% while satisfying all design constraints.
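As a rough illustration of the generic cross-entropy method that the static technique builds on, the sketch below searches for a task-to-core mapping that minimizes latency. It is not the improved variant proposed in the paper: the cost matrix, the function name `cross_entropy_mapping`, and all parameter values are hypothetical, latency is modeled simply as the largest per-core load, and the temperature, energy, and lifetime constraints are omitted.

```python
import random


def cross_entropy_mapping(cost, n_samples=200, elite_frac=0.1,
                          smoothing=0.7, iters=30, seed=0):
    """Generic cross-entropy search over task-to-core mappings.

    cost[t][c] is a hypothetical execution time of task t on core c.
    Returns (best_mapping, best_latency).
    """
    rng = random.Random(seed)
    n_tasks, n_cores = len(cost), len(cost[0])
    # One categorical distribution per task, initially uniform over cores.
    probs = [[1.0 / n_cores] * n_cores for _ in range(n_tasks)]
    n_elite = max(1, int(n_samples * elite_frac))
    best_map, best_lat = None, float("inf")

    def latency(mapping):
        # Simplified model: latency is the load of the busiest core.
        load = [0.0] * n_cores
        for t, c in enumerate(mapping):
            load[c] += cost[t][c]
        return max(load)

    for _ in range(iters):
        # 1) Sample candidate mappings from the current distributions.
        samples = [
            [rng.choices(range(n_cores), weights=probs[t])[0]
             for t in range(n_tasks)]
            for _ in range(n_samples)
        ]
        # 2) Keep the elite (lowest-latency) fraction.
        samples.sort(key=latency)
        elite = samples[:n_elite]
        if latency(elite[0]) < best_lat:
            best_map, best_lat = list(elite[0]), latency(elite[0])
        # 3) Move each distribution toward the elite frequencies.
        for t in range(n_tasks):
            for c in range(n_cores):
                freq = sum(1 for m in elite if m[t] == c) / n_elite
                probs[t][c] = smoothing * freq + (1 - smoothing) * probs[t][c]
    return best_map, best_lat


# Toy instance: tasks 0 and 2 are fast on core 0, tasks 1 and 3 on core 1.
toy_cost = [[1, 10], [10, 1], [1, 10], [10, 1]]
mapping, lat = cross_entropy_mapping(toy_cost)
print(mapping, lat)
```

A constraint-aware variant would typically reject or penalize sampled mappings that violate a constraint before the elite set is selected; how the paper handles this is not stated in the abstract.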