Abstract:
Multithreaded programming models are widely employed to execute applications on multiple cores. However, interference caused by competition for the last-level cache (LLC) among concurrently executing threads can degrade performance. Intel Cache Allocation Technology (CAT) provides a mechanism to assign cache ways to cores, enabling cache isolation. Unfortunately, prior studies on CAT-based partitioning are not applicable to multithreaded programs for two reasons. First, they are tailored for multi-application scenarios rather than single-application scenarios involving multiple related threads. Second, they are designed to improve instructions per cycle (IPC), which is not a suitable performance indicator in multithreaded scenarios.
To bridge this gap, we present LPart, a learning-based cache partitioning technique for multithreaded applications that significantly improves throughput on multicores with way-partitioned caches. LPart leverages deep reinforcement learning to assign an appropriate number of LLC ways to the different threads within an application, without requiring any prior knowledge of application characteristics. We evaluate LPart on a microbenchmark, Redis, a commercial production-ready distributed storage system, and multiple application scenarios. The experimental results show that, compared with the default configuration, LPart achieves speedups of up to 26.9%, 8.1%, 9.8%, and 24.1%, respectively, without requiring any changes to the operating system.