Label Noise Robust Learning Algorithm in Environments with Evolving Features

Zhang Zhenyu (张震宇); Jiang Yuan (姜远)

doi:10.7544/issn1000-1239.202330238


Abstract: In real-world applications, data are often collected as a stream, and their features can evolve over time. For instance, in environmental monitoring, features may vanish or be added dynamically as old sensors reach the end of their service life and new sensors are deployed. Beyond the evolving feature space, the labels may also be noisy. When the feature space evolves and the labels are inaccurate at the same time, designing learning algorithms with theoretical guarantees, in particular with a theoretical understanding of their generalization ability, is highly challenging. To address this difficulty, we propose a new discrepancy measure for noisily labeled data in an evolving feature space, named the label noise robust evolving discrepancy. Using this measure, we derive a generalization error analysis, and the theory motivates the design of a learning algorithm that we implement with deep neural networks. Empirical studies on synthetic data support the rationale of the proposed discrepancy measure, and extensive experiments on real-world tasks validate the effectiveness of the algorithm.
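To make the problem setting concrete, the following Python sketch simulates a toy stream in which a set of "old sensors" expires and is replaced by "new sensors" while every label is subject to random flips. The function name `make_evolving_noisy_stream`, the latent-variable construction, and the symmetric flip model are illustrative assumptions only; this is not the authors' synthetic data, and it does not implement their discrepancy measure or learning algorithm.

```python
import random


def make_evolving_noisy_stream(n_per_stage=200, k=2, d_old=3, d_new=4,
                               noise_rate=0.2, seed=0):
    """Hypothetical toy generator (not the paper's synthetic data).

    A latent signal z in R^k drives the label. Stage 0 observes z through
    d_old "old sensors"; stage 1 observes it through d_new "new sensors"
    after the old ones have expired. Each clean label is flipped with
    probability noise_rate (assumed symmetric label noise).
    """
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(k)]  # latent labeling direction

    def random_matrix(rows):
        # Each sensor reads a random linear combination of the latent signal.
        return [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(rows)]

    sensors = [random_matrix(d_old), random_matrix(d_new)]
    stream = []
    for stage, A in enumerate(sensors):
        for _ in range(n_per_stage):
            z = [rng.gauss(0, 1) for _ in range(k)]
            # Observed features: sensor readings plus small measurement noise.
            x = [sum(a * zi for a, zi in zip(row, z)) + rng.gauss(0, 0.1)
                 for row in A]
            clean = 1 if sum(vi * zi for vi, zi in zip(v, z)) >= 0 else -1
            noisy = -clean if rng.random() < noise_rate else clean
            stream.append((stage, x, clean, noisy))
    return stream


stream = make_evolving_noisy_stream()
flipped = sum(noisy != clean for _, _, clean, noisy in stream) / len(stream)
print(len(stream), round(flipped, 3))
```

A learner on such a stream would see only the `noisy` labels and would have to cope with the switch from 3 to 4 features; the `clean` labels are kept here solely to measure the realized noise rate.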
