    A Performance Metric for Cloud Applications Based on Kernel Function Call Frequencies

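    • Abstract (translated from Chinese): The construction and operating costs of cloud data centers remain high, so improving resource utilization to "reduce costs and increase efficiency" has long been a major concern for cloud service providers. However, existing performance metrics such as benchmarks are usually tied to specific scenarios and lack adaptability, while approaches based on simulated runs or pre-execution struggle to cope with sudden load changes. Cloud service providers therefore urgently need a real-time performance metric for performance monitoring and resource management. Yet existing real-time metrics usually depend on software and hardware support, and most cannot accurately identify resource bottlenecks. To this end, this paper proposes KFCMetric (metric based on kernel function call frequencies), a performance metric based on kernel function call frequencies. KFCMetric uses static kernel analysis to generate a kernel function call graph and uses a random forest to select kernel functions that are highly correlated with application performance and resource state. The behavior of these kernel functions reflects the running state of an application in real time and precisely locates the root cause of performance bottlenecks. Building on KFCMetric, the resource management system KFCSys is constructed. KFCSys uses dynamic kernel instrumentation to capture the call and return information of these kernel functions in real time, and uses a variance-multiple method to quantify how far data points deviate from normal values in order to warn of performance degradation. Based on the real-time state of latency-sensitive applications, the scheduler then dynamically adjusts resource allocation to meet their resource demands. Experimental results show that KFCMetric achieves 96.7% accuracy in resource bottleneck detection and improves interference detection accuracy by up to 31.4% compared with CPI. For resource management, KFCSys reduces average latency by up to 30.4% compared with PARTIES and by 40.7% compared with the Linux resource management system. In addition, KFCSys reduces the proportion of time spent in performance violation by 10% compared with PARTIES. Thus, when hardware or software support is unavailable, KFCMetric, built on kernel function call frequency features, offers a novel and robust solution for performance monitoring and resource management.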

       

      Abstract: Data centers are expensive to build and operate, so improving resource utilization is important for cloud service providers seeking to reduce costs and raise efficiency. However, existing performance metrics such as benchmarks are often scenario-specific and lack adaptability, and approaches based on simulated runs or pre-execution struggle to cope with sudden workload fluctuations. Cloud service providers therefore urgently require a real-time performance metric for monitoring performance and managing resources. Existing real-time metrics often rely on software or hardware support, and most fail to identify resource bottlenecks accurately. To address this challenge, this paper proposes KFCMetric (metric based on kernel function call frequencies), a performance metric based on the frequency of kernel function calls. KFCMetric employs static kernel analysis techniques to generate kernel function call graphs and uses a random forest algorithm to select kernel functions that correlate strongly with application performance and resource status. The behavior of these kernel functions reflects the runtime state of applications in real time and accurately pinpoints the root causes of performance bottlenecks. Building on KFCMetric, the paper further introduces a resource management system, KFCSys. Using dynamic kernel instrumentation techniques, KFCSys captures the calls and returns of these kernel functions in real time and uses a variance-based scaling algorithm to quantify how far data points deviate from normal values, which serves as an early warning of performance degradation. Based on the real-time status of latency-critical applications, KFCSys dynamically adapts resource allocation to meet their resource requirements and ensure their service quality. Experimental results indicate that KFCMetric achieves an accuracy of 96.7% in detecting resource bottlenecks, and that its interference detection accuracy is up to 31.4% higher than that of CPI. In terms of resource management, KFCSys reduces the average tail latency by up to 30.4% compared to PARTIES, and by 40.7% compared to the standard Linux resource management system. Moreover, KFCSys decreases the proportion of performance violation time by 10% relative to PARTIES. Therefore, in scenarios where hardware or software support is not available, KFCMetric, constructed from kernel function call frequency features, provides a novel and robust metric for performance monitoring and resource management.
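
The abstract does not give the formula behind the variance-based deviation measure, so the following is only a minimal sketch of one plausible reading: score each kernel function's current call frequency by its squared distance from a baseline mean, expressed in multiples of the baseline variance, and flag functions whose score exceeds a threshold. The function names `deviation_multiple` and `flag_functions`, the threshold of 9.0, and the sample frequencies are all assumptions made for illustration; `tcp_sendmsg` and `schedule` are real Linux kernel functions but are not taken from the paper.

```python
from statistics import fmean, pvariance


def deviation_multiple(baseline, sample):
    """Squared deviation of `sample` from the baseline mean,
    expressed in multiples of the baseline (population) variance."""
    mean = fmean(baseline)
    var = pvariance(baseline, mu=mean)
    if var == 0:
        # A constant baseline: any change at all is an unbounded deviation.
        return 0.0 if sample == mean else float("inf")
    return (sample - mean) ** 2 / var


def flag_functions(baselines, current, threshold=9.0):
    """Return the sorted names of kernel functions whose current call
    frequency deviates from its baseline by more than `threshold`.
    The default of 9.0 (three standard deviations) is an assumed value."""
    return sorted(
        name
        for name, freq in current.items()
        if name in baselines
        and deviation_multiple(baselines[name], freq) > threshold
    )


# Hypothetical calls-per-interval samples collected under normal load.
baselines = {
    "tcp_sendmsg": [100, 102, 98, 101, 99],
    "schedule": [500, 505, 495, 510, 490],
}
# Hypothetical frequencies observed in the latest interval.
current = {"tcp_sendmsg": 100, "schedule": 560}

print(flag_functions(baselines, current))
```

In a system like the one described, the baselines would come from instrumented kernel functions rather than hard-coded lists, and a flagged function would be a signal for the scheduler to revisit resource allocation; both of those steps are outside this sketch.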

       
