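• China's Outstanding Science and Technology Journal
  • CCF-recommended Class A Chinese journal
  • T1 high-quality science and technology journal in the computing field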
Niu Gen, Zhang Fuxin. Code Cache Optimization Schemes Based on Fine-Grained State Label[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202330856

Code Cache Optimization Schemes Based on Fine-Grained State Label

Funds: This work was supported by the National Key Research and Development Program of China (2022YFB3105103).
  • Author Bio:

    Niu Gen: born in 1996. PhD candidate. Student member of CCF. His research interests include virtualization and binary translation.

    Zhang Fuxin: born in 1976. PhD, professor. Senior member of CCF. His main research interests include computer architecture, binary translation, and operating systems. (fxzhang@ict.ac.cn)

  • Received Date: October 30, 2023
  • Revised Date: January 06, 2025
  • Accepted Date: January 25, 2025
  • Available Online: January 25, 2025
  • Software code caches are widely used in dynamic binary translators to manage dynamically generated code blocks. The translation, refresh, and memory occupancy of code blocks are the key metrics of a software code cache. Little research has addressed software code caches for system-level dynamic binary translators. Existing system-level dynamic binary translators use a state label scheme to simulate instruction semantics correctly and efficiently, but this scheme introduces additional problems for software code cache management. An in-depth analysis of the state label scheme identifies two types of problems: conflicts and redundancies. To address them, two code cache optimization schemes based on fine-grained state labels are proposed: a multi-state code cache scheme and a weak state label scheme. Both schemes are implemented in LATX-SYS and evaluated by booting Ubuntu 16.04/x86 and Windows XP/x86 on the LoongArch platform. The results show that code block refreshes and translations are reduced by 43% and 18%, respectively, and that the code block similarity ratio decreases from 59.63% to 5.06%. Translation overhead and memory occupancy are both reduced, and overall system boot time is reduced by 20%. Finally, tests of the weak state label scheme on SPEC CPU2000 show that the number of code blocks is reduced by 13% on average, at a performance overhead of only 2%-3%.
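
The abstract names the two problem types and the two schemes but does not describe their mechanisms. The toy model below is one plausible reading, not the LATX-SYS implementation. It assumes that a state label is a small integer bitmask attached to each translated block, that a "conflict" means the same guest PC is requested under a different label and the cached block must be refreshed, and that a "weak" label ignores bits that do not affect the generated code. The class names, the `mask` parameter, and the trace format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SingleStateCache:
    """Toy cache: one translated block per guest PC, tagged with one state label."""
    blocks: dict = field(default_factory=dict)  # pc -> state label of cached block
    translations: int = 0
    refreshes: int = 0

    def lookup(self, pc, state):
        cached = self.blocks.get(pc)
        if cached == state:
            return  # hit: the cached block matches the current state
        if cached is not None:
            self.refreshes += 1  # assumed "conflict": same PC, different label
        self.blocks[pc] = state
        self.translations += 1


@dataclass
class MultiStateCache:
    """Toy cache: one block per (PC, masked state label) pair."""
    mask: int = ~0  # hypothetical: label bits that matter; ~0 keeps every bit
    blocks: set = field(default_factory=set)
    translations: int = 0

    def lookup(self, pc, state):
        key = (pc, state & self.mask)
        if key not in self.blocks:
            self.blocks.add(key)
            self.translations += 1


def run(cache, trace):
    """Feed a list of (pc, state) requests to a cache and return the cache."""
    for pc, state in trace:
        cache.lookup(pc, state)
    return cache
```

With a trace that alternates one PC between labels `0b01` and `0b11`, the single-state cache in this model retranslates on every switch, the multi-state cache translates once per label, and a cache built with `mask=0b01` treats both labels as equivalent and translates once. The percentages reported in the abstract come from the authors' measurements and cannot be reproduced with this sketch.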

