Zhang Chuanqi, Wang Sa, Sun Ninghui, Bao Yungang. GroupUCP: An On-Demand Fine-Grained Cache Dynamic Partition Strategy[J]. Journal of Computer Research and Development, 2025, 62(4): 989-1002. DOI: 10.7544/issn1000-1239.202440061

GroupUCP: An On-Demand Fine-Grained Cache Dynamic Partition Strategy

Funds: This work was supported by the Strategic Priority Research Program of Chinese Academy of Sciences (XDA0320000, XDA0320300), the National Natural Science Foundation of China (62090022, 62172388), and the Science and Technology Project of State Grid Corporation of China (SGJSXT00TYJS2100414).
  • Author Bio:

    Zhang Chuanqi: born in 1996. PhD candidate. Student member of CCF. His main research interest is data center computer architecture

    Wang Sa: born in 1986. PhD, associate professor. Member of CCF. His main research interests include cloud computing, operating systems, and system modeling and performance analysis

    Sun Ninghui: born in 1968. PhD. Academician of Chinese Academy of Engineering. Fellow of CCF. His main research interests include computer architecture and high performance computing

    Bao Yungang: born in 1980. PhD, professor. Member of CCF. His main research interests include datacenter architecture, agile design methodology of processor chips, and ecosystem of open-source processor chips

  • Received Date: January 28, 2024
  • Revised Date: July 18, 2024
  • Accepted Date: September 02, 2024
  • Available Online: September 08, 2024
  • As modern computer technology advances, the memory wall problem grows increasingly severe. Against this background, the last-level cache in a multi-level memory hierarchy has become a key resource affecting system performance. In recent years, many studies have optimized the last-level cache through capacity expansion and dynamic resource management. Way-partitioning is the main technique for cache resource management: it optimizes system performance by dividing the cache into ways and allocating them among applications. However, way-partitioning is coarse-grained and requires every cache set to follow the same partitioning. In practice, an application may have different space demands on different sets, so way-partitioning restricts the space utilization of the cache and wastes cache resources. In this paper, we propose GroupUCP, an on-demand fine-grained cache resource management technique. Its design idea is to aggregate individual cache sets into groups according to each application's space demand on each set, using dynamic grouping and real-time evaluation. Each group can then be allocated space on demand independently, which improves cache utilization and overall system performance. Experiments demonstrate that GroupUCP achieves finer-grained on-demand resource allocation with fewer hardware resources than the traditional UCP approach, and achieves a higher system performance improvement for cache-sensitive application combinations that exhibit imbalanced cache space demand.
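
    The motivation in the abstract can be illustrated with a small sketch. It is not the GroupUCP mechanism, whose details the abstract does not give, and it is not UCP's lookahead algorithm either: a simple greedy marginal-gain allocator is swapped in as a stand-in. The hit curves, the two set groups, and the names `greedy_partition`, `hits`, and `groups` are all hypothetical, chosen only to show why one way allocation shared by every set can waste space when applications have opposite demands on different sets.

```python
from typing import Dict, List

Curve = List[int]  # curve[w] = hits obtained with w ways, for w = 0..num_ways


def greedy_partition(curves: Dict[str, Curve], num_ways: int) -> Dict[str, int]:
    """Give every app one way, then hand out each remaining way to the app
    with the largest marginal hit gain (simplified stand-in for UCP)."""
    alloc = {app: 1 for app in curves}
    for _ in range(num_ways - len(curves)):
        best = max(curves, key=lambda a: curves[a][alloc[a] + 1] - curves[a][alloc[a]])
        alloc[best] += 1
    return alloc


def hits(curves: Dict[str, Curve], alloc: Dict[str, int]) -> int:
    """Total hits when each app receives alloc[app] ways."""
    return sum(curves[app][alloc[app]] for app in curves)


NUM_WAYS = 4

# Hypothetical hit curves: app A is space-hungry in group 0, app B in group 1.
groups = [
    {"A": [0, 10, 20, 30, 30], "B": [0, 10, 10, 10, 10]},
    {"A": [0, 10, 10, 10, 10], "B": [0, 10, 20, 30, 30]},
]

# Way-partitioning: one allocation, derived from cache-wide curves,
# is applied to every set group.
summed = {
    app: [sum(g[app][w] for g in groups) for w in range(NUM_WAYS + 1)]
    for app in groups[0]
}
uniform_alloc = greedy_partition(summed, NUM_WAYS)
uniform_hits = sum(hits(g, uniform_alloc) for g in groups)

# Per-group allocation: each set group is partitioned on its own demand.
group_allocs = [greedy_partition(g, NUM_WAYS) for g in groups]
grouped_hits = sum(hits(g, a) for g, a in zip(groups, group_allocs))

print("uniform:", uniform_alloc, uniform_hits)
print("grouped:", group_allocs, grouped_hits)
```

    With these made-up curves, the cache-wide view sees two identical applications, so any single split leaves one group short of space, whereas the per-group split gives each application three ways exactly where it benefits from them.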
