KFCMetric: a Runtime Performance Metric for Cloud Applications Based on Kernel Function Call Frequencies
-
Graphical Abstract
-
Abstract
Improving resource utilization is crucial for cloud service providers, who must reduce operational costs while meeting growing demand for scalable and efficient services. However, existing performance metrics often fail to monitor applications accurately because they rely on static benchmarks or small-scale load testing, which are insufficient for dynamic, rapidly evolving environments, particularly during the sudden workload surges common in multi-tenant systems. These metrics also frequently depend on specific software or hardware configurations, which limits their portability across diverse cloud infrastructures. To address these limitations, this paper introduces KFCMetric, a runtime performance metric based on kernel function call frequencies that supports resource scheduling and performance monitoring. Unlike traditional methods, KFCMetric examines runtime behavior at the kernel level: it extracts kernel function call graphs to reduce the search space and identifies the critical branches that directly reflect application performance. This targeted approach improves monitoring precision while keeping overhead low. KFCMetric then employs a probabilistic model to derive a normalized performance metric, expressed as a deviation value, which enables unified evaluation across applications without manual configuration or predefined thresholds. Because the metric adapts to workload fluctuations, it detects performance degradation in real time; by analyzing critical branch ratios, it pinpoints resource bottlenecks and triggers resource reallocation to maintain service quality. Experimental results show that KFCMetric reduces the mean tail latency of applications by 5.5% to 20% compared with state-of-the-art methods such as PARTIES. KFCMetric is particularly effective where hardware performance counters are unavailable: it relies only on kernel-level information to extract critical kernel features, identify resource bottlenecks, and guide resource scheduling, which makes it a practical choice for complex and dynamic cloud environments.
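To make the idea of a normalized deviation value concrete, the following is a minimal sketch and not the model used in the paper. It assumes that the call frequency of each monitored kernel function is roughly Gaussian under healthy operation and scores the current window by its mean absolute z-score. The function names `fit_baseline` and `deviation_score`, the choice of a z-score, and the sample kernel functions and frequencies are illustrative assumptions; the abstract does not specify the probabilistic model.

```python
from statistics import mean, pstdev


def fit_baseline(samples):
    """Estimate a per-function (mean, std) of call frequency.

    `samples` is a list of dicts mapping a kernel function name to its
    observed calls per second during healthy operation (assumed input format).
    """
    baseline = {}
    for name in samples[0]:
        values = [s[name] for s in samples]
        baseline[name] = (mean(values), pstdev(values))
    return baseline


def deviation_score(baseline, current, eps=1e-9):
    """Return the mean absolute z-score of `current` against the baseline.

    Assumption: a Gaussian model per function. A score near 0 means the
    window looks like the baseline; larger scores mean larger deviation.
    """
    scores = []
    for name, (mu, sigma) in baseline.items():
        observed = current.get(name, 0.0)
        scores.append(abs(observed - mu) / (sigma + eps))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Illustrative frequencies only; not measurements from the paper.
    healthy = [
        {"tcp_sendmsg": 100.0, "vfs_read": 50.0},
        {"tcp_sendmsg": 110.0, "vfs_read": 60.0},
        {"tcp_sendmsg": 90.0, "vfs_read": 40.0},
    ]
    model = fit_baseline(healthy)
    print(deviation_score(model, {"tcp_sendmsg": 100.0, "vfs_read": 50.0}))
    print(deviation_score(model, {"tcp_sendmsg": 60.0, "vfs_read": 20.0}))
```

Because each score is expressed in units of that function's own baseline spread, the same scale can be compared across applications with different absolute call rates, which is the property the abstract attributes to the normalized metric.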
-
-