Advances in SQL Execution Techniques Based on Query Compilation

潘青峰; 徐辰

doi:10.7544/issn1000-1239.202330507

Abstract

Information systems typically rely on data management systems to manage their data. SQL has long been the mainstream query language for data management thanks to its ease of use and flexibility: users submit SQL statements to the data management system and receive the query results. The efficiency of the execution model determines how quickly the system can respond to user queries. Existing execution models mainly adopt one of two approaches: interpreted execution or compiled execution. Interpreted execution is used by most systems because of its good extensibility and maintainability. In contrast, compiled execution generates efficient, customized code for queries that would otherwise be interpreted, and the significant performance gains have led a number of data management systems to implement the technique. However, generating customized code for a query is a complex process that involves many implementation considerations, and in some cases a compiled query may even perform worse than the traditional Volcano model. This paper systematically reviews research progress on compiled execution from conceptual and technical perspectives. First, we outline the basic concepts of compiled execution and introduce the relevant terminology and background. Second, we present the relevant techniques from three perspectives: intermediate code generation, intermediate representation, and machine code generation and execution. Finally, we discuss future directions for compiled execution in light of current research trends in data management systems and recent work.
