Survey of Cardinality Estimation Techniques Based on Machine Learning

岳文静; 屈稳稳; 林宽; 王晓玲

doi:10.7544/issn1000-1239.202220649


Abstract

Cardinality estimation is a foundational and core component of the query optimizer in a relational database management system (DBMS). As artificial intelligence (AI) technology has developed, it has shown strong performance in processing data and in extracting relationships among data. In recent years, cardinality estimation based on machine learning has made significant progress and received wide attention from the academic community. This survey first summarizes the development status of cardinality estimation techniques based on machine learning, then presents the concepts related to cardinality estimation and their feature encoding techniques. It next establishes a classification of cardinality estimation techniques covering traditional methods and methods based on machine learning, and further subdivides the latter into three types: query-driven, data-driven, and hybrid models. For each type, it analyzes the modeling flow, typical techniques, and model characteristics, and it analyzes and summarizes their application in SQL and NoSQL systems. Finally, it discusses the challenges and future research directions for cardinality estimation based on machine learning.
