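• China Quality Sci-Tech Journal (中国精品科技期刊)
  • CCF-recommended Class A Chinese-language journal (CCF推荐A类中文期刊)
  • High-quality sci-tech journal in computing, tier T1 (计算领域高质量科技期刊T1类)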
Jiang Gaoxia, Wang Wenjian. A Numerical Label Noise Filtering Algorithm for Regression Task[J]. Journal of Computer Research and Development, 2022, 59(8): 1639-1652. DOI: 10.7544/issn1000-1239.20220053

A Numerical Label Noise Filtering Algorithm for Regression Task

Funds: This work was supported by the National Natural Science Foundation of China (U21A20513, 62076154, 61906113, U1805263), the Key Research and Development Program of Shanxi Province International Cooperation (201903D421050), and the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (2020L0007).
Published Date: July 31, 2022

Abstract: Numerical label noise in regression can mislead model training and weaken generalization. Noise filtering, a popular remedy, lowers the noise level by removing mislabeled samples, but it rarely guarantees better generalization performance. Some filters focus so heavily on the noise level that many noise-free samples are removed as well. Although an existing sample selection framework can balance the number of removed samples against the noise level, it is too complicated to understand intuitively or to apply in practice. Building on learning theory for noise-free regression, this paper derives a generalization error bound for data with numerical label noise. The bound identifies the key data factors that affect generalization ability, including data size and noise level. On this basis, an interpretable noise filtering framework is proposed whose goal is to minimize the noise level at a low cost in removed samples. In addition, the relationship between label noise and two key indicators of the covering interval, its center and its radius, is analyzed theoretically for noise estimation, which leads to a relative noise estimator. Integrating the proposed framework with this estimator yields the relative noise filtering (RNF) algorithm. The effectiveness of RNF is verified on benchmark datasets and on an age estimation dataset. Experimental results show that RNF adapts to various types of noise and significantly improves the generalization ability of regression models. On the age estimation dataset, RNF detects a number of mislabeled samples, effectively improving both data quality and model prediction performance.
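The abstract describes RNF only at a high level: a covering interval, characterized by its center and radius, is used to estimate how noisy each label is, and samples judged noisy are removed. The Python sketch below illustrates that general idea under assumptions made purely for illustration; it is not the authors' RNF algorithm. In particular, the covering interval is assumed to be the range of labels of a sample's k nearest neighbours, the relative noise score is assumed to be |y - center| / radius, and the function names `relative_noise_scores` and `filter_noisy` and the threshold `tau` are hypothetical. The paper chooses how many samples to remove by trading noise level against sample loss through its generalization bound, which this sketch does not attempt to reproduce.

```python
import math


def relative_noise_scores(X, y, k=5):
    """Return a hypothetical relative-noise score for every sample.

    Assumption (not the paper's definition): the covering interval of
    sample i spans the labels of its k nearest neighbours (i excluded);
    its center is the midpoint and its radius is half its width.
    """
    n = len(X)
    scores = []
    for i in range(n):
        # Sort the other samples by Euclidean distance to X[i]; ties break by index.
        others = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        neigh = [y[j] for _, j in others[:k]]
        lo, hi = min(neigh), max(neigh)
        center = (lo + hi) / 2.0
        radius = (hi - lo) / 2.0
        # Distance of the label from the center, in units of the radius;
        # the tiny epsilon avoids division by zero when all neighbour labels agree.
        scores.append(abs(y[i] - center) / (radius + 1e-12))
    return scores


def filter_noisy(X, y, k=5, tau=2.0):
    """Drop samples whose score exceeds the hypothetical threshold tau."""
    scores = relative_noise_scores(X, y, k)
    keep = [i for i, s in enumerate(scores) if s <= tau]
    return [X[i] for i in keep], [y[i] for i in keep], keep


if __name__ == "__main__":
    # Toy data: y = 2x on 20 points, with the label of sample 10 corrupted.
    X = [[float(i)] for i in range(20)]
    y = [2.0 * i for i in range(20)]
    y[10] = 100.0
    X_clean, y_clean, keep = filter_noisy(X, y, k=4, tau=2.0)
    print(10 in keep)  # → False
    print(len(keep))   # → 19
```

The threshold `tau=2.0` is likewise a hypothetical choice. With `tau=1.0`, the samples at both ends of the monotone toy trend would also be flagged, because their labels fall just outside the label range of their one-sided neighbourhoods; a filter that ignores the cost of removing such clean samples is exactly the failure mode the abstract attributes to noise-level-only filters.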