Abstract:
Improving resource utilization is important for cloud service providers seeking to reduce costs and improve efficiency, because data centers are extremely expensive to build and operate. However, existing metrics such as benchmarks are often scenario-specific and lack adaptability, while simulation-based or pre-execution solutions struggle to cope with sudden workload fluctuations. Cloud service providers therefore urgently require a real-time performance metric for monitoring performance and managing resources. Existing real-time metrics often rely on software or hardware support, and most fail to accurately identify resource bottlenecks. To address this challenge, this paper proposes KFCMetric, a performance metric based on kernel function call frequencies. KFCMetric employs static kernel analysis to generate kernel function call graphs and uses a random forest algorithm to select kernel functions that correlate strongly with application performance and resource status. The behavior of these kernel functions dynamically reflects the runtime resource state of applications and accurately pinpoints the root causes of performance bottlenecks. Building on KFCMetric, the paper further introduces a resource management system, KFCSys. Using dynamic kernel instrumentation, KFCSys captures the calls and returns of these kernel functions and uses a variance-based scaling algorithm to measure how far data points deviate from normal behavior. Based on the real-time status of latency-critical applications, KFCSys dynamically adapts resource allocation to meet their resource requirements and preserve their quality of service. Experimental results indicate that KFCMetric achieves an accuracy of 96.7% in detecting resource bottlenecks, and its interference detection accuracy improves on CPI by up to 31.4%.
In terms of resource management, KFCSys reduces the average tail latency by up to 30.4% compared to PARTIES and by 40.7% compared to the standard Linux resource management system. Moreover, KFCSys decreases the proportion of performance violation time by 10% relative to PARTIES. These results suggest that, in scenarios where hardware or software support is unavailable, KFCMetric, constructed from kernel function features, is a novel and robust metric for performance monitoring and resource management.
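
To make the idea of measuring deviation from normal more concrete, the following is a minimal sketch and not the KFCSys implementation. It assumes the variance-based scaling can be approximated by a per-function z-score computed against an interference-free baseline. The function names (`fit_baseline`, `deviation_scores`, `most_deviant`), the placeholder kernel function labels, and the threshold of 3.0 are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def fit_baseline(samples):
    """Compute per-function mean and standard deviation of call counts.

    samples maps a (hypothetical) kernel function label to a list of
    call counts per sampling interval, collected without interference.
    """
    return {name: (mean(counts), pstdev(counts)) for name, counts in samples.items()}


def deviation_scores(baseline, observed):
    """Return the z-score of each observed call count against the baseline."""
    scores = {}
    for name, value in observed.items():
        mu, sigma = baseline[name]
        # A constant baseline has zero variance; this sketch treats it as
        # carrying no signal instead of dividing by zero.
        scores[name] = 0.0 if sigma == 0 else (value - mu) / sigma
    return scores


def most_deviant(scores, threshold=3.0):
    """Return the label with the largest |z| if it exceeds threshold, else None."""
    name = max(scores, key=lambda k: abs(scores[k]))
    return name if abs(scores[name]) > threshold else None


if __name__ == "__main__":
    # Placeholder labels; the functions actually selected by KFCMetric are not listed here.
    baseline = fit_baseline({"fn_cpu": [100, 102, 98, 100], "fn_io": [50, 50, 50, 50]})
    scores = deviation_scores(baseline, {"fn_cpu": 110, "fn_io": 50})
    print(scores, most_deviant(scores))
```

In a sketch like this, a resource manager could treat the most deviant function as a hint about which resource to adjust for a latency-critical application; how KFCSys maps deviations to allocation decisions is not specified in the abstract.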