Research Progress of Continual Learning

Han Yanan (韩亚楠); Liu Jianwei (刘建伟); Luo Xionglin (罗雄麟)

doi:10.7544/issn1000-1239.20201058

(College of Information Science and Engineering, China University of Petroleum (Beijing), Beijing 102249) (857182813@qq.com)

Funds: This work was supported by the Research Fund of China University of Petroleum (Beijing) (2462020YXZZ023).

CLC number: TP391
Publication date: 2022-05-31

Abstract

Abstract: In recent years, with the continuous development of information technology, data of all kinds have grown explosively. Traditional machine learning algorithms perform well only when the test data and the training data follow similar distributions; in other words, they cannot learn continually and adaptively in a dynamic environment, yet this ability to learn adaptively is a trait of any intelligent system. Deep neural networks have shown the best learning ability in many applications. However, when they are updated incrementally on new data, they suffer from catastrophic interference, or forgetting, which causes the model to forget how to solve old tasks after learning a new one. Research on continual learning (CL) alleviates this problem. Continual learning mimics the way the brain learns: it learns, in a given order, from a continuous stream of data that are not independent and identically distributed (non-IID), and it incrementally updates the model according to the results of each task. Its significance lies in efficiently transferring and reusing previously learned knowledge to learn new tasks while greatly reducing the problems caused by forgetting. The study of continual learning is therefore of great significance for intelligent computing systems that must adapt to changes in the environment. In view of the application value, theoretical significance, and future development potential of continual learning, this article systematically reviews its research progress. First, it outlines the definition of continual learning and introduces three typical continual learning models: learning without forgetting, elastic weight consolidation, and gradient episodic memory. It then presents the key problems of continual learning and their solutions. Next, it classifies and describes three types of continual learning models: those based on regularization, on dynamic frameworks, and on memory replay with complementary learning systems. Finally, it points out the open problems and possible future directions of continual learning research.
- continual learning (CL) /
- catastrophic forgetting /
- incremental learning /
- regularization /
- dynamic framework /
- memory replay

HTML Full Text

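The industry has taken different technical routes to building distributed databases. The first requires splitting the application system: through database and table sharding, data originally managed by a single database is spread across multiple centralized databases. Sharding requires the application to be restructured, cross-database access is inefficient, and important relational database features such as foreign keys, global uniqueness constraints, and global indexes become unavailable. The second route retrofits a traditional centralized relational database for distribution, adding features such as distributed transaction processing and automatic failure recovery for small-scale cluster deployments. Because the storage system, transaction processing, and SQL optimizer of such databases derive from a centralized architecture, they face many functional and performance limitations in distributed scenarios. The third route designs and implements a native distributed relational database from scratch, building distribution as a basic property into key components such as the storage system, transaction processing, and the SQL optimizer. Compared with the first two routes, native distributed databases have greater advantages in high availability, data consistency, transaction performance, elastic scaling, and fast, lossless failure recovery.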

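OceanBase is a distributed relational database system designed and implemented from scratch. Born at Taobao and grown at Alipay, OceanBase is now widely used in finance, government, telecommunications, the Internet, and other fields. The distributed database R&D team led by OceanBase chief scientist Yang Zhenkun has achieved a number of technical innovations and breakthroughs. The team's paper, "OceanBase分布式关系数据库架构与技术" (on the architecture and technology of the OceanBase distributed relational database), introduces OceanBase's distributed architecture and its key technologies, including distributed transaction processing, the storage engine, SQL optimization, and multi-tenancy. Its contributions are summarized as follows:

1) It designs a strongly consistent, highly available, and scalable distributed transaction processing mechanism that achieves automatic, lossless, and fast recovery from single-machine and single-data-center failures;

2) It proposes an integrated standalone/distributed relational database architecture that allows capacity and processing power to switch and scale seamlessly between a standalone database and a distributed database;

3) It achieves high-ratio data compression for relational databases without performance loss; the paper's experiments show compression ratios 3 times or more those of mainstream relational databases;

4) It enables a single database system to support both high-performance transaction processing and real-time analytical processing; in typical scenarios, both transaction processing and analytical processing outperform MySQL.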

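OceanBase is so far the only database to have topped both the TPC-C and TPC-H performance rankings. Although half a century has passed since the relational database was first proposed, the era of truly distributed relational databases has only just begun. The paper not only presents the key distributed database technologies adopted by OceanBase but also offers an outlook on the future direction of distributed databases. I believe this paper will prompt much thinking about where databases are headed, and that it will be a valuable reference for engineers engaged in related research and development and for professionals working in database applications.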

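Reviewer

Zhou Aoying (周傲英), professor and doctoral supervisor. Main research interests: Web data management, data-intensive computing, in-memory cluster computing, distributed transaction processing, big-data benchmarking, and performance optimization.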

Highlighted Paper

Yang Zhenkun, Yang Chuanhui, Han Fusheng, Wang Guoping, Yang Zhifeng, Cheng Xiaojun. OceanBase分布式关系数据库架构与技术 [J]. 计算机研究与发展, 2024, 61(3): 540-554. DOI:10.7544/issn1000-1239.202330835
