
Core Vector Regression for Attribute Effect Control on Large Scale Dataset

Liu Jiefang, Wang Shitong, Wang Jun, Deng Zhaohong

Citation: Liu Jiefang, Wang Shitong, Wang Jun, Deng Zhaohong. Core Vector Regression for Attribute Effect Control on Large Scale Dataset[J]. Journal of Computer Research and Development, 2017, 54(9): 1979-1991. DOI: 10.7544/issn1000-1239.2017.20160519
CSTR: 32373.14.issn1000-1239.2017.20160519

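Funding: National Natural Science Foundation of China (61300151, 61572236); Jiangsu Province Outstanding Youth Fund (BK20140001); Natural Science Foundation of Jiangsu Province (BK20130155, BK20151299)
  • CLC number: TP391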

  • Abstract: Attribute effect is a form of data bias caused by sensitive attributes, and it is widespread in real-world data. Left uncontrolled, it can seriously degrade the performance of a regression model. To control the attribute effect in nonlinear regression on large-scale biased datasets, this paper proposes fast equal mean-core vector regression (FEM-CVR). First, equal mean-support vector regression (EM-SVR) is derived from the margin maximization criterion by imposing an equal-mean constraint. The optimization problem of EM-SVR is then shown to be equivalent to a center constrained-minimum enclosing ball (CC-MEB) problem. Applying approximate minimum enclosing ball theory reduces the original input dataset to a compressed subset, the core set, which yields FEM-CVR, a fast minimum enclosing ball based nonlinear regression algorithm for attribute effect control on large-scale biased datasets. Related theoretical properties are also analyzed. Experiments on synthetic and real datasets show that FEM-CVR controls the attribute effect in nonlinear regression on large-scale biased datasets with good generalization ability, and that the upper bound of its time complexity is independent of the dataset size and depends only on the approximation parameter ε of the minimum enclosing ball.
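
The role of the approximation parameter ε can be illustrated with a generic (1 + ε)-approximate minimum enclosing ball routine. The sketch below is not the paper's FEM-CVR: it assumes plain Euclidean points instead of a kernel feature space, omits the center constraint, and uses the simple Bădoiu-Clarkson farthest-point update. The function name `approx_meb` and its return values are hypothetical. It only shows why the number of selected points is bounded by a function of ε and not by the dataset size.

```python
import math


def approx_meb(points, eps):
    """Sketch of a (1 + eps)-approximate minimum enclosing ball.

    Assumption: points are Euclidean coordinate tuples; this is a generic
    farthest-point iteration, not the FEM-CVR algorithm from the paper.
    Returns (center, radius, selected_indices).
    """
    center = list(points[0])          # start from an arbitrary input point
    selected = {0}                    # indices of points that drove an update
    iterations = math.ceil(1.0 / eps ** 2)  # depends on eps only, not on len(points)

    for i in range(1, iterations + 1):
        # Find the point farthest from the current center.
        far = max(range(len(points)), key=lambda j: math.dist(center, points[j]))
        selected.add(far)
        # Move the center toward that point with a shrinking step size.
        step = 1.0 / (i + 1)
        center = [c + step * (p - c) for c, p in zip(center, points[far])]

    # The radius is chosen so that every input point is enclosed.
    radius = max(math.dist(center, p) for p in points)
    return center, radius, sorted(selected)
```

Each iteration still scans all points once, so the running time of this sketch grows with the dataset size; only the iteration count and the number of selected points are bounded in terms of ε alone.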

Publication history
  • Published: 2017-08-31