Multi-Instance Learning with Incremental Classes

Wei Xiushen (魏秀参); Xu Shulin (徐书林); An Peng (安鹏); Yang Jian (杨健)

doi:10.7544/issn1000-1239.20220071

Abstract: In recent years, multi-instance learning (MIL) has been widely applied to complex data problems. However, existing MIL methods typically work well only in closed, static environments, where the number of categories is fixed. In real applications, novel categories are constantly added to the system; for example, new topics keep emerging in scientific research and on social media. Because of storage limits or confidentiality agreements, old data may become unavailable over time, so a model that directly learns the new categories forgets previously learned knowledge. Incremental learning is used to address this problem. Incremental data mining under the MIL setting is therefore highly meaningful, yet work on MIL with incremental classes remains scarce. We propose a multi-instance incremental data mining method based on an attention mechanism and prototype classifier mapping. The attention mechanism selectively aggregates the instances of each MIL bag into a unified feature representation, from which a prototype is generated and stored for each category. A prototype classifier mapping module maps each category prototype to an unbiased and robust classifier. The predictions of the classifier generated in the previous incremental stage are then used to perform knowledge distillation on the predictions of the classifier generated in the new incremental stage, so that under MIL the model retains its old knowledge at very low storage cost. Experimental results on benchmarks of three different tasks show that the proposed method performs multi-instance learning with incremental classes effectively.
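The abstract says that an attention mechanism merges the instances of each bag into one feature vector and that one prototype per category is stored. The sketch below illustrates that idea under stated assumptions: it uses a widely used attention-pooling form, s_k = wᵀ tanh(V h_k), a = softmax(s), z = Σ_k a_k h_k, and it takes each prototype to be the mean bag embedding of its category. The names `attention_pool`, `class_prototypes`, `V`, and `w` are hypothetical, and the paper's exact formulation may differ. Plain Python is used so the example runs without dependencies.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(bag: List[Vector], V: Matrix, w: Vector) -> Vector:
    """Merge the instance features of one bag into a single bag embedding.

    Assumed form: s_k = w . tanh(V h_k), a = softmax(s), z = sum_k a_k h_k.
    """
    scores = [sum(wi * math.tanh(u) for wi, u in zip(w, matvec(V, h))) for h in bag]
    weights = softmax(scores)
    dim = len(bag[0])
    return [sum(a * h[d] for a, h in zip(weights, bag)) for d in range(dim)]


def class_prototypes(embeddings: List[Vector], labels: List[int]) -> Dict[int, Vector]:
    """Assumed prototype: the mean bag embedding of each class."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for z, y in zip(embeddings, labels):
        if y not in sums:
            sums[y] = [0.0] * len(z)
            counts[y] = 0
        sums[y] = [s + x for s, x in zip(sums[y], z)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


# Example: two bags of class 0 and one bag of class 1 (toy 2-D instance features).
V = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for learned attention parameters
w = [1.0, -1.0]
bags = [[[3.0, 0.0], [0.0, 3.0]], [[1.0, 2.0], [1.0, 2.0]], [[0.0, 4.0]]]
labels = [0, 0, 1]
embeddings = [attention_pool(b, V, w) for b in bags]
prototypes = class_prototypes(embeddings, labels)
print(prototypes)
```

Only the per-class prototype vectors need to persist between incremental stages; the raw bags can be discarded, which is in line with the low storage cost the abstract describes.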
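The abstract gives no formulas for the prototype classifier mapping or the distillation step, so the following is only a minimal sketch under explicit assumptions. A hypothetical shared linear map `W` (learned in practice, fixed here) turns each stored prototype into a classifier weight vector, which is then L2-normalized. Bags are scored by cosine similarity, and distillation uses the standard temperature-softened KL divergence between the previous-stage and current-stage predictions on the previously seen classes. All names (`map_prototypes`, `cosine_logits`, `distillation_loss`, `T`) are illustrative, not the authors' API.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def l2_normalize(v: Vector) -> Vector:
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def map_prototypes(prototypes: Dict[int, Vector], W: Matrix) -> Dict[int, Vector]:
    """Hypothetical mapping: apply a shared linear map W, then L2-normalize.

    Giving every class weight unit norm keeps old and new classes on one scale.
    """
    return {
        c: l2_normalize([sum(a * b for a, b in zip(row, p)) for row in W])
        for c, p in prototypes.items()
    }


def cosine_logits(z: Vector, classifiers: Dict[int, Vector]) -> Dict[int, float]:
    """Score a bag embedding z against every class classifier by cosine similarity."""
    zn = l2_normalize(z)
    return {c: sum(a * b for a, b in zip(zn, wc)) for c, wc in classifiers.items()}


def softmax_t(scores: List[float], T: float) -> List[float]:
    """Temperature-softened, numerically stable softmax."""
    top = max(scores)
    exps = [math.exp((s - top) / T) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(old: Dict[int, float], new: Dict[int, float], T: float = 2.0) -> float:
    """T^2 * KL(p_old || p_new), restricted to the classes the old stage knows."""
    seen = sorted(old)
    p = softmax_t([old[c] for c in seen], T)
    q = softmax_t([new[c] for c in seen], T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Example: stage 1 knew classes {0, 1}; stage 2 adds class 2 and updates the map.
W_old = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for learned mapping parameters
W_new = [[1.0, 0.5], [0.0, 1.0]]
old_clf = map_prototypes({0: [3.0, 4.0], 1: [0.0, 2.0]}, W_old)
new_clf = map_prototypes({0: [3.0, 4.0], 1: [0.0, 2.0], 2: [4.0, -3.0]}, W_new)
bag_embedding = [1.0, 1.0]
old_logits = cosine_logits(bag_embedding, old_clf)
new_logits = cosine_logits(bag_embedding, new_clf)
print(distillation_loss(old_logits, new_logits))  # positive: the map drifted
```

Minimizing this loss alongside the classification loss on new classes pushes the updated mapping to reproduce the old stage's predictions on old classes. Whether the paper's mapping module normalizes classifiers in this way is an assumption of the sketch.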
