Abstract:
Multithreaded programming models are widely employed to execute applications on multiple cores. However, interference caused by competition for the last-level cache (LLC) among concurrently executing threads can degrade performance. Intel Cache Allocation Technology (CAT) provides a mechanism to assign cache ways to cores, enabling cache isolation. Unfortunately, prior studies on CAT-based partitioning are not applicable to multithreaded programs for two reasons. First, they are tailored for multi-application scenarios rather than single-application scenarios involving multiple related threads. Second, they are designed to improve instructions per cycle (IPC), which is not a suitable performance indicator in multithreaded scenarios.
To bridge this gap, we present LPart, a learning-based cache partitioning technique for multithreaded applications that significantly improves throughput on multicores with way-partitioned caches. LPart leverages deep reinforcement learning to assign an appropriate number of LLC ways to the different threads within an application, without requiring any prior knowledge of application characteristics. We evaluate LPart on a microbenchmark, Redis, a commercial production-ready distributed storage system, and multiple application scenarios. The experimental results show that, compared with the default configuration, LPart achieves speedups of up to 26.9%, 8.1%, 9.8%, and 24.1%, respectively, without requiring any changes to the operating system.