Citation: Zhang Zhenyu, Jiang Yuan. Label Noise Robust Learning Algorithm in Environments with Evolving Features[J]. Journal of Computer Research and Development, 2023, 60(8): 1740-1753. DOI: 10.7544/issn1000-1239.202330238
In real-world applications, data are often collected as a stream whose features can evolve over time. For instance, in an environmental monitoring task, features may vanish or emerge dynamically as old sensors expire and new sensors are deployed. Moreover, beyond the evolvable feature space, the labels may contain noise. When the feature space evolves and the data carry inaccurate labels at the same time, it is quite challenging to design algorithms with guarantees, particularly a theoretical understanding of generalization ability. To address this difficulty, we propose a new discrepancy measure for noisily labeled data with an evolving feature space, named the label noise robust evolving discrepancy. Using this measure, we present a generalization error analysis, and the theory motivates the design of a learning algorithm, which is further implemented with deep neural networks. Empirical studies on synthetic data confirm the rationale of our discrepancy measure, and extensive experiments on real-world tasks validate the effectiveness of our algorithm.
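The label noise robust evolving discrepancy itself is defined in the full text; as a rough, simplified illustration of how a discrepancy between two feature distributions can be estimated from samples (in the spirit of the kernel two-sample test of reference [31], not the paper's actual measure), a plain maximum mean discrepancy (MMD) estimate can be sketched as follows. The function names and the RBF bandwidth `gamma` are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2).
    sq_dists = ((X**2).sum(1)[:, None]
                + (Y**2).sum(1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq_dists)

def mmd2(X, Y, gamma=1.0):
    # Biased (V-statistic) estimate of the squared MMD between the
    # sample sets X and Y; larger values indicate a larger discrepancy
    # between the two underlying distributions.
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())
```

Under this sketch, samples drawn from the same distribution yield an estimate near zero, while a distribution shift (e.g., after the feature space changes) yields a clearly larger value.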
[1] Zhou Zhihua. Open-environment machine learning[J]. National Science Review, 2022, 9(8): nwac123 doi: 10.1093/nsr/nwac123
[2] Zhou Zhihua. A brief introduction to weakly supervised learning[J]. National Science Review, 2018, 5(1): 44−53 doi: 10.1093/nsr/nwx106
[3] Hou Bojian, Zhang Lijun, Zhou Zhihua. Learning with feature evolvable streams[C] //Advances in Neural Information Processing Systems 30. Cambridge, MA: MIT, 2017: 1416−1426
[4] Zhang Zhenyu, Zhao Peng, Jiang Yuan, et al. Learning with feature and distribution evolvable streams[C] //Proc of the 37th Int Conf on Machine Learning. New York: ACM, 2020: 11317−11327
[5] Cesa-Bianchi N, Dichterman E, Fischer P, et al. Sample-efficient strategies for learning in the presence of noise[J]. Journal of the ACM, 1999, 46(5): 684−719 doi: 10.1145/324133.324221
[6] Natarajan N, Dhillon I S, Ravikumar P K, et al. Learning with noisy labels[C] //Advances in Neural Information Processing Systems 26. Cambridge, MA: MIT, 2013: 1196−1204
[7] Song H, Kim M, Lee J G. SELFIE: Refurbishing unclean samples for robust deep learning[C] //Proc of the 36th Int Conf on Machine Learning. New York: ACM, 2019: 5907−5915
[8] Ben-David S, Blitzer J, Crammer K, et al. Analysis of representations for domain adaptation[C] //Advances in Neural Information Processing Systems 19. Cambridge, MA: MIT, 2006: 137−144
[9] Mansour Y, Mohri M, Rostamizadeh A. Domain adaptation: Learning bounds and algorithms[C] //Proc of the 22nd Conf on Learning Theory. New York: ACM, 2009: 18−29
[10] Cortes C, Mohri M, Medina A M. Adaptation based on generalized discrepancy[J]. Journal of Machine Learning Research, 2019, 20(1): 1−30
[11] Dietterich T G. Steps toward robust artificial intelligence[J]. AI Magazine, 2017, 38(3): 3−24
[12] Zhou Zhihua. Learnware: On the future of machine learning[J]. Frontiers of Computer Science, 2016, 10(4): 589−590
[13] Guan S U, Li Shanchun. Incremental learning with respect to new incoming input attributes[J]. Neural Processing Letters, 2001, 14: 241−260 doi: 10.1023/A:1012799113953
[14] Zhang Qin, Zhang Peng, Long Guodong, et al. Online learning from trapezoidal data streams[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(10): 2709−2723 doi: 10.1109/TKDE.2016.2563424
[15] 刘艳芳,李文斌,高阳. 基于被动-主动的特征演化流学习[J]. 计算机研究与发展,2021,58(8):1575−1585
Liu Yanfang, Li Wenbin, Gao Yang. Passive-aggressive learning with feature evolvable streams[J]. Journal of Computer Research and Development, 2021, 58(8): 1575−1585 (in Chinese)
[16] Hou Chenping, Zhou Zhihua. One-pass learning with incremental and decremental features[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(11): 2776−2792 doi: 10.1109/TPAMI.2017.2769047
[17] 刘兆清,古仕林,侯臣平. 面向特征继承性增减的在线分类算法[J]. 计算机研究与发展,2022,59(8):1668−1682
Liu Zhaoqing, Gu Shilin, Hou Chenping. Online classification algorithm with feature inheritably increasing and decreasing[J]. Journal of Computer Research and Development, 2022, 59(8): 1668−1682 (in Chinese)
[18] He Yi, Wu Baijun, Wu Di, et al. Online learning from capricious data streams: A generative approach[C] //Proc of the 28th Int Joint Conf on Artificial Intelligence. Macao, SAR China: Morgan Kaufmann, 2019: 2491−2497
[19] Beyazit E, Alagurajah J, Wu Xindong. Online learning from data streams with varying feature spaces[C] //Proc of the 33rd AAAI Conf on Artificial Intelligence. Menlo Park, CA: AAAI, 2019: 3232−3239
[20] Dong Jiahua, Cong Yang, Sun Gan, et al. Evolving metric learning for incremental and decremental features[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 32(4): 2290−2302
[21] Hou Bojian, Zhang Lijun, Zhou Zhihua. Prediction with unpredictable feature evolution[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 33(10): 5706−5715
[22] Angluin D, Laird P. Learning from noisy examples[J]. Machine Learning, 1988, 2: 343−370
[23] Aslam J A, Decatur S E. On the sample complexity of noise-tolerant learning[J]. Information Processing Letters, 1996, 57(4): 189−195 doi: 10.1016/0020-0190(96)00006-3
[24] Gao Wei, Wang Lu, Zhou Zhihua. Risk minimization in the presence of label noise[C] //Proc of the 30th AAAI Conf on Artificial Intelligence. Menlo Park, CA: AAAI, 2016: 1575−1581
[25] Arora S, Ge Rong, Moitra A. Learning topic models − going beyond SVD[C] //Proc of the 53rd IEEE Annual Symp on Foundations of Computer Science. Piscataway, NJ: IEEE, 2012: 1−10
[26] Liu Tongliang, Tao Dacheng. Classification with noisy labels by importance reweighting[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(3): 447−461 doi: 10.1109/TPAMI.2015.2456899
[27] Zhang Zhenyu, Zhao Peng, Jiang Yuan, et al. Learning from incomplete and inaccurate supervision[J]. IEEE Transactions on Knowledge and Data Engineering, 2022, 34(12): 5854−5868 doi: 10.1109/TKDE.2021.3061215
[28] Scott C, Blanchard G, Handy G. Classification with asymmetric label noise: Consistency and maximal denoising[C] //Proc of the 26th Conf on Learning Theory. Berlin: Springer, 2013: 489−511
[29] Ramaswamy H, Scott C, Tewari A. Mixture proportion estimation via kernel embeddings of distributions[C] //Proc of the 33rd Int Conf on Machine Learning. New York: ACM, 2016: 2052−2060
[30] Sugiyama M, Nakajima S, Kashima H, et al. Direct importance estimation with model selection and its application to covariate shift adaptation[C] //Advances in Neural Information Processing Systems. Cambridge, MA: MIT, 2007: 1433−1440
[31] Gretton A, Borgwardt K M, Rasch M J, et al. A kernel two-sample test[J]. Journal of Machine Learning Research, 2012, 13(1): 723−773
[32] Mohri M, Muñoz-Medina A. New analysis and algorithm for learning with drifting distributions[C] //Proc of the 23rd Int Conf on Algorithmic Learning Theory. Berlin: Springer, 2012: 124−138
[33] Menon A K, Rawat A S, Reddi S J, et al. Can gradient clipping mitigate label noise?[C/OL] //Proc of the 8th Int Conf on Learning Representations. 2020. https://openreview.net/forum?id=rklB76EKPr
[34] Han Bo, Yao Quanming, Yu Xingrui, et al. Co-teaching: Robust training of deep neural networks with extremely noisy labels[C] //Advances in Neural Information Processing Systems 31. Cambridge, MA: MIT, 2018: 8536−8546
[35] Kanamori T, Suzuki T, Sugiyama M. Theoretical analysis of density ratio estimation[J]. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2010, 93(4): 787−798
[36] Huang Jiayuan, Gretton A, Borgwardt K, et al. Correcting sample selection bias by unlabeled data[C] //Advances in Neural Information Processing Systems. Cambridge, MA: MIT, 2006: 601−608
[37] Kanamori T, Hido S, Sugiyama M. A least-squares approach to direct importance estimation[J]. Journal of Machine Learning Research, 2009, 10: 1391−1445
[38] Ganin Y, Ustinova E, Ajakan H, et al. Domain-adversarial training of neural networks[J]. Journal of Machine Learning Research, 2016, 17(1): 2096−2030
[39] Zhang Yuchen, Liu Tianle, Long Mingsheng, et al. Bridging theory and algorithm for domain adaptation[C] //Proc of the 36th Int Conf on Machine Learning. New York: ACM, 2019: 7404−7413
[40] McAuley J, Targett C, Shi Qinfeng, et al. Image-based recommendations on styles and substitutes[C] //Proc of the 38th Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 2015: 43−52
[41] Amini M R, Usunier N, Goutte C. Learning from multiple partially observed views − an application to multilingual text categorization[C] //Advances in Neural Information Processing Systems. Cambridge, MA: MIT, 2009: 28−36