• Excellent Sci-Tech Journal of China
  • CCF-recommended Class A Chinese journal
  • High-quality sci-tech journal in computing, Tier T1

Empowering System Software with Machine Learning Methods: Challenges, Practice, and Prospects

Tang Chuzhe, Wang Zhaoguo, Chen Haibo

Citation: Tang Chuzhe, Wang Zhaoguo, Chen Haibo. Empowering System Software with Machine Learning Methods: Challenges, Practice, and Prospects[J]. Journal of Computer Research and Development, 2023, 60(5): 964-973. DOI: 10.7544/issn1000-1239.202330127. CSTR: 32373.14.issn1000-1239.202330127

    Corresponding author: Chen Haibo (haibochen@sjtu.edu.cn)

  • CLC number: TP31

Empowering System Software with Machine Learning Methods: Challenges, Practice, and Prospects

Funds: This work was supported by the National Natural Science Foundation of China (61925206, 62272304, 62132014).
    Author Bio:

    Tang Chuzhe: born in 1996. PhD candidate. Student member of CCF. His main research interests include parallel and distributed database systems

    Wang Zhaoguo: born in 1986. PhD. Tenure-track associate professor at the School of Software, Shanghai Jiao Tong University. Member of CCF. His main research interests include the fundamental theory and system building of parallel and distributed database systems

    Chen Haibo: born in 1982. PhD. Distinguished professor at the School of Software, Shanghai Jiao Tong University. IEEE fellow, distinguished member of CCF. His main research interests include operating systems, distributed systems, and systems security

    Abstract:

    Machine learning methods bring new opportunities for building system software. To fully utilize hardware resources and support emerging applications, the design and implementation of system software must continuously improve and evolve to meet the demands of diverse application scenarios, and machine learning methods have the potential to extract patterns from data and optimize system performance automatically. However, applying machine learning methods to system software faces several challenges: designing models customized for system software, obtaining training data of sufficient quantity and quality, reducing the impact of model execution costs on system performance, and preventing model errors from compromising system correctness. We present the practical experience of the Institute of Parallel and Distributed Systems (IPADS) at Shanghai Jiao Tong University in applying machine learning methods to optimize index structures, key-value storage systems, and concurrency control protocols, and we summarize the lessons learned in model design, system integration, and practitioners' own knowledge. We also briefly review related research in China and abroad and offer prospects and suggestions for this line of research, including collaboration between systems and machine learning experts, building modular and reusable system prototypes, and exploring model optimization techniques tailored to the systems context. We hope this work provides a reference for future research.
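    The abstract's requirement that model errors must not compromise system correctness can be illustrated with a learned index, the idea shown in Figure 1. The following is a minimal sketch, not the design of XIndex or any other system named on this page: it assumes a single linear model over a sorted in-memory array (the class and method names are hypothetical), records the model's worst-case position error at build time, and then searches only within that error window, so every lookup stays exact even when the prediction is off.

```python
import bisect
from dataclasses import dataclass


@dataclass
class LearnedIndex:
    """Toy learned index: one linear model maps a key to its slot in a sorted array."""
    keys: list        # sorted keys, stored contiguously (cf. Figure 2)
    slope: float
    intercept: float
    max_err: int      # worst |predicted slot - true slot| observed at build time

    @classmethod
    def build(cls, keys):
        if not keys:
            raise ValueError("need at least one key")
        keys = sorted(keys)
        n = len(keys)
        # Ordinary least-squares fit of slot ~ slope * key + intercept.
        mean_k = sum(keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(keys))
        slope = cov / var_k if var_k else 0.0
        index = cls(keys, slope, mean_p - slope * mean_k, 0)
        # Record the model's largest error so lookups can correct for it.
        index.max_err = max(abs(index.predict(k) - p) for p, k in enumerate(keys))
        return index

    def predict(self, key):
        """Model prediction, clamped to a valid slot."""
        slot = round(self.slope * key + self.intercept)
        return min(max(slot, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the slot holding key, or None if the key is absent."""
        guess = self.predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # The model may be wrong, but for any stored key the true slot lies in
        # [lo, hi), so a bounded binary search keeps the answer exact.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

    For example, after building over the keys [40, 3, 90, 7, 21, 15, 8, 100, 22, 41], `lookup(21)` returns 4 (its slot in the sorted array) and `lookup(5)` returns None. Sorting the keys and storing them contiguously is what makes the key-to-slot mapping monotonic and simple enough for such a model to approximate; the recorded error bound trades a slightly wider search for guaranteed correctness.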

  • Figure 1. Illustration of replacing index structures with machine learning models

    Figure 2. Sorting data and storing it contiguously simplifies the function to be approximated

    Figure 3. Updating indexes with two-phase compaction avoids data consistency issues

    Figure 4. XIndex achieves good scalability and outperforms state-of-the-art baselines in a TPC-C-style workload

    Figure 5. Architectures of RDMA-based key-value stores

    Figure 6. The architecture of XStore

    Figure 7. XStore outperforms state-of-the-art baselines in both read-only and read-write workloads

    Figure 8. Concurrency control policy table representation used in Polyjuice

    Figure 9. Polyjuice uses an evolutionary algorithm to train concurrency control policies for specific workloads

    Figure 10. Polyjuice outperforms existing concurrency control protocols under different workloads

    Figure 11. Polyjuice can interleave transactions in the TPC-C workload more efficiently
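    Figures 8 and 9 describe Polyjuice representing a concurrency control policy as a table and training it for a specific workload with an evolutionary algorithm. The sketch below shows only the general shape of such a search loop (random mutation plus truncation selection with elitism). The table dimensions, the action names, and the `fitness` function are hypothetical placeholders; Polyjuice's actual policy encoding and training procedure are richer than, and not reproduced by, this toy.

```python
import random

# Hypothetical encoding of a policy table: one row per transaction type,
# one cell per data access in that transaction, each cell holding an action.
TXN_TYPES = 2
ACCESSES_PER_TXN = 3
ACTIONS = ("no_wait", "wait_for_commit", "read_dirty")  # hypothetical action set


def random_policy(rng):
    return [[rng.choice(ACTIONS) for _ in range(ACCESSES_PER_TXN)]
            for _ in range(TXN_TYPES)]


def mutate(policy, rng, rate=0.2):
    """Copy the table and resample each cell with probability `rate`."""
    child = [row[:] for row in policy]
    for row in child:
        for j in range(len(row)):
            if rng.random() < rate:
                row[j] = rng.choice(ACTIONS)
    return child


def fitness(policy):
    """Hypothetical stand-in for measuring throughput on the target workload.

    Toy rule: reward reading dirty data early and waiting for commit at the
    last access; a real system would run the workload and time it.
    """
    score = 0
    for row in policy:
        for j, action in enumerate(row):
            last = j == len(row) - 1
            if (not last and action == "read_dirty") or (last and action == "wait_for_commit"):
                score += 1
    return score


def evolve(generations=50, population=20, survivors=5, seed=0):
    rng = random.Random(seed)
    pop = [random_policy(rng) for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)   # selection: rank by fitness
        parents = pop[:survivors]             # elitism: best policies survive unchanged
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population - survivors)]
        pop = parents + children
    return max(pop, key=fitness)
```

    Keeping the top `survivors` policies unchanged each generation means the best fitness found never decreases. In a real system the expensive step is evaluating `fitness`, since each evaluation means running the workload under a candidate policy.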

Publication history
  • Received: 2023-02-27
  • Revised: 2023-03-29
  • Published online: 2023-04-09
  • Issue date: 2023-05-11
