Predicting Parallel File System Performance via Machine Learning
Abstract: Parallel file systems can effectively address the massive data storage and I/O bottleneck problems of high-performance computing systems. Because the factors that affect system performance are highly complex, effectively evaluating and predicting the performance of a parallel file system has become both a challenge and an active research topic. This work targets the performance evaluation and prediction of parallel file systems. After studying the architecture and performance factors of such file systems, we design a machine-learning-based predictive model for parallel file systems. We use feature selection algorithms to reduce the number of performance factors, and we mine the relationship between system performance and its impact factors in order to predict performance. We evaluate and predict the performance of a specific Lustre file system through a large number of experimental cases. The evaluation and experimental results indicate that threads/OST, the number of object storage servers (OSSs), the number of disks, and the RAID configuration (number and type) are the four most important factors for tuning system performance. The average relative error of the predictions stays between 25.1% and 32.1%, indicating relatively good prediction accuracy.
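The accuracy figure above is an average relative error. The abstract does not spell out the formula, so the sketch below assumes the common definition, the mean of |predicted - measured| / |measured| over all test cases. The function name and the sample throughput values are hypothetical and serve only to illustrate the arithmetic.

```python
def mean_relative_error(measured, predicted):
    """Average of |predicted - measured| / |measured| over all test cases.

    Assumed definition; the paper's exact formula is not given in the abstract.
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    if not measured:
        raise ValueError("at least one test case is required")
    total = 0.0
    for m, p in zip(measured, predicted):
        if m == 0:
            raise ValueError("relative error is undefined for a zero measurement")
        total += abs(p - m) / abs(m)
    return total / len(measured)


# Hypothetical throughput values in MB/s, for illustration only (not from the paper).
measured = [100.0, 200.0, 400.0]
predicted = [125.0, 150.0, 500.0]
error = mean_relative_error(measured, predicted)
print(f"average relative error: {error:.1%}")
```

Under this definition each case contributes equally regardless of its absolute throughput, so a reported range of 25.1% to 32.1% would mean that predictions deviate from measurements by roughly a quarter to a third on average.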