Citation: Zhang Chuanqi, Wang Sa, Sun Ninghui, Bao Yungang. GroupUCP: An On-Demand Fine-Grained Cache Dynamic Partition Strategy[J]. Journal of Computer Research and Development, 2025, 62(4): 989-1002. DOI: 10.7544/issn1000-1239.202440061
As modern computer technology advances, the memory wall problem is becoming increasingly severe. Against this background, the last-level cache in a multi-level memory hierarchy has become a key resource affecting system performance. In recent years, many studies have optimized the last-level cache through capacity expansion and dynamic resource management. Way-partitioning is the main technique for cache resource management: it divides the cache by ways and allocates those ways to applications to optimize system performance. However, way-partitioning is coarse-grained and requires every cache set to follow the same partitioning strategy. In practice, an application may have different space demands on different sets, so way-partitioning restricts the space utilization of the cache and wastes cache resources. In this paper, we propose GroupUCP, an on-demand fine-grained cache resource management technique. Its design idea is to aggregate individual cache sets into groups according to each application's space demand on each set, using dynamic grouping and real-time evaluation. Each group can be allocated space on demand independently, which improves cache utilization and overall system performance. Experiments demonstrate that GroupUCP achieves finer-grained on-demand resource allocation with fewer hardware resources than the traditional UCP approach, and that it achieves a higher system performance improvement for cache-sensitive application combinations that show imbalanced cache space demand.
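The contrast between one cache-wide way partition and independent per-group partitions can be illustrated with a small sketch. It is not the paper's mechanism: the two fixed set groups, the hit curves, and the helper names (`greedy_partition`, `sum_curves`, `total_hits`) are all hypothetical, and a simple greedy marginal-gain allocator stands in for UCP's lookahead algorithm. GroupUCP's dynamic grouping and real-time evaluation are not modeled here.

```python
from typing import Dict, List

# Hypothetical utility curve: curve[w] = hits an application would get
# in a set group if it were given w ways (w = 0 .. total ways).
Curve = List[int]
Curves = Dict[str, Curve]


def greedy_partition(curves: Curves, total_ways: int) -> Dict[str, int]:
    """Give out ways one at a time to the app with the largest marginal gain.

    This is a plain greedy stand-in, not UCP's lookahead algorithm.
    """
    alloc = {app: 0 for app in curves}
    for _ in range(total_ways):
        best = max(
            curves,
            key=lambda a: curves[a][alloc[a] + 1] - curves[a][alloc[a]],
        )
        alloc[best] += 1
    return alloc


def sum_curves(groups: List[Curves]) -> Curves:
    """Cache-wide curves: add each app's per-group curves point by point."""
    apps = groups[0].keys()
    n_points = len(next(iter(groups[0].values())))
    return {a: [sum(g[a][w] for g in groups) for w in range(n_points)] for a in apps}


def total_hits(curves: Curves, alloc: Dict[str, int]) -> int:
    """Hits obtained in one group under a given way allocation."""
    return sum(curves[a][alloc[a]] for a in curves)


WAYS = 4
# Made-up demand: app A wants space in group 0, app B wants space in group 1.
group0: Curves = {"A": [0, 10, 20, 30, 40], "B": [0, 12, 13, 13, 13]}
group1: Curves = {"A": [0, 12, 13, 13, 13], "B": [0, 10, 20, 30, 40]}
groups = [group0, group1]

# Way-partitioning: a single allocation that every set must follow.
global_alloc = greedy_partition(sum_curves(groups), WAYS)
global_hits = sum(total_hits(g, global_alloc) for g in groups)

# Per-group partitioning: each group is allocated independently.
group_allocs = [greedy_partition(g, WAYS) for g in groups]
group_hits = sum(total_hits(g, a) for g, a in zip(groups, group_allocs))

print(global_alloc, global_hits)   # {'A': 2, 'B': 2} 66
print(group_allocs, group_hits)    # [{'A': 3, 'B': 1}, {'A': 1, 'B': 3}] 84
```

With these made-up curves, the cache-wide partition settles on an even 2/2 split that fits neither group well, while the per-group allocation gives each application more ways where its demand is higher. The numbers say nothing about real workloads; they only show why imbalanced per-set demand leaves room for a finer-grained policy.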