ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2021, Vol. 58, Issue (9): 1925-1950. doi: 10.7544/issn1000-1239.2021.20200209

• Artificial Intelligence •

Survey on Deep Learning Based Natural Language Interface to Database

Pan Xuan1,3, Xu Sihan1,3, Cai Xiangrui2,3, Wen Yanlong1,3, Yuan Xiaojie2,3

  1(College of Computer Science, Nankai University, Tianjin 300350); 2(College of Cyber Science, Nankai University, Tianjin 300350); 3(Tianjin Key Laboratory of Network and Data Security Technology (Nankai University), Tianjin 300350) (panxuan@dbis.nankai.edu.cn)
  • Online: 2021-09-01
  • Supported by: 
    This work was supported by the Key Program of the National Natural Science Foundation of China (U1936206), the National Natural Science Foundation of China (U1836109, U1903128), the General Program of the National Natural Science Foundation of China (61772289, 62077031), the Young Scientists Fund of the National Natural Science Foundation of China (62002178), and the Natural Science Foundation of Tianjin (20JCQNJC01730).

Abstract: A natural language interface to database (NLIDB) lets users query a database with plain natural language questions instead of SQL (structured query language), removing the burden of learning SQL and enabling barrier-free interaction with databases. Because of its high application value, NLIDB has attracted considerable attention in both academia and industry in recent years. Most mature NLIDB systems today are built on classical natural language processing techniques and rely on hand-specified rules to translate natural language questions into structured queries. Such rule-based approaches, however, are hard to extend and generalize poorly. Deep learning models learn distributed and high-level abstract representations, which let them mine the latent semantic features of natural language. As a result, applying deep learning to NLIDB has become an active research topic. This paper systematically reviews recent deep learning-based NLIDB research. Its main contributions are as follows: first, it classifies existing deep learning-based NLIDB models into 4 categories according to their decoding method and analyzes the strengths and weaknesses of each; second, it summarizes 7 auxiliary techniques commonly used in implementing these models; third, it identifies the problems that remain open and proposes directions for future research.

Key words: natural language interface, database, SQL, deep learning, semantic parsing
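The abstract contrasts rule-based NLIDB systems with deep learning models. As a toy illustration only, not a reproduction of any surveyed system, the Python sketch below maps questions to SQL with two hand-written regex rules. The rule set, the `rule_based_nl_to_sql` function, and the table names are hypothetical and were chosen for this example.

```python
import re
from typing import Optional

# Hypothetical hand-written rules: each pairs a regex over the question with
# an SQL template. Real rule-based NLIDB systems use far richer grammars,
# and a real system would validate the captured table name against a schema.
RULES = [
    (re.compile(r"^how many (\w+) are there\??$", re.IGNORECASE),
     "SELECT COUNT(*) FROM {0};"),
    (re.compile(r"^list all (\w+)\??$", re.IGNORECASE),
     "SELECT * FROM {0};"),
]


def rule_based_nl_to_sql(question: str) -> Optional[str]:
    """Return SQL from the first rule that matches, or None if none does."""
    q = question.strip()
    for pattern, template in RULES:
        match = pattern.match(q)
        if match:
            # Fill the SQL template with the captured word(s).
            return template.format(*match.groups())
    # No rule covers this phrasing, so the system cannot answer it.
    return None


if __name__ == "__main__":
    print(rule_based_nl_to_sql("How many students are there?"))
    print(rule_based_nl_to_sql("List all courses"))
    # A paraphrase of the first question that no rule anticipates.
    print(rule_based_nl_to_sql("What is the number of students?"))
```

The third question asks the same thing as the first but matches no rule, so the function returns `None`. Covering it requires writing yet another rule by hand, which is the limited extensibility that motivates learning-based approaches.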
