Abstract:
Cardinality estimation is the foundation and core of the query optimizer in a database management system (DBMS). Artificial intelligence (AI) techniques have shown superior performance in processing data and extracting relationships from it, and in recent years research on machine-learning-based cardinality estimation has made significant progress and received wide attention from the academic community. We first introduce the technical background and development status of cardinality estimation methods based on machine learning. Second, we define the concepts related to cardinality estimation and describe the associated feature encoding techniques. Third, we present a classification of cardinality estimation techniques into traditional methods and machine-learning-based methods, and further subdivide the latter into three types: query-driven, data-driven, and hybrid models. For each type, we analyze the modeling flow, typical methodologies, and characteristics. In addition, we analyze and summarize the application of cardinality estimation in SQL and NoSQL databases. Finally, we discuss the challenges and future research directions for cardinality estimation methods based on machine learning.