ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2017, Vol. 54, Issue (9): 1979-1991. doi: 10.7544/issn1000-1239.2017.20160519

Core Vector Regression for Attribute Effect Control on Large Scale Dataset

Liu Jiefang1,2, Wang Shitong1, Wang Jun1, Deng Zhaohong1   

  1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122
  2. School of Transportation and Information, Hubei Communications Technical College, Wuhan 430079
  • Online: 2017-09-01

Abstract: The attribute effect is a form of data bias caused by sensitive attributes, and it is widespread in real-world data. Left uncontrolled, it can seriously degrade the learning performance of a regression model. To control the attribute effect in nonlinear regression on large-scale biased datasets, a fast equal mean-core vector regression (FEM-CVR) method is proposed. First, an equal mean-support vector regression (EM-SVR) model based on the margin maximization criterion is formulated by imposing an equal mean constraint. It is then shown that the EM-SVR optimization problem is equivalent to a center constrained-minimum enclosing ball (CC-MEB) problem. Building on this equivalence, FEM-CVR, a fast minimum enclosing ball based nonlinear regression learning algorithm for attribute effect control on large-scale biased datasets, is obtained by applying approximate minimum enclosing ball theory and reducing the original input dataset to a core set. Some fundamental theoretical properties of the method are also analyzed. Finally, extensive experiments on synthetic and real datasets show that FEM-CVR effectively controls the attribute effect in nonlinear regression on large-scale biased datasets while retaining good generalization ability. The upper bound of its time complexity is independent of the dataset size and depends only on the approximation parameter ε of the minimum enclosing ball.
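
To illustrate the approximate minimum enclosing ball idea mentioned in the abstract, the following is a minimal sketch of the classic Badoiu-Clarkson core-set iteration for plain Euclidean points. It is not the FEM-CVR algorithm itself: the kernel feature space, the center constraint of CC-MEB, and the equal mean constraint are all omitted, and the function name `approx_meb` is a hypothetical one chosen for this example. What the sketch does show is that the number of iterations, and hence the core-set size, is fixed by ε alone and does not grow with the number of points.

```python
import math


def approx_meb(points, eps):
    """Approximate minimum enclosing ball via the Badoiu-Clarkson iteration.

    Illustrative sketch only (hypothetical helper, not the paper's method).
    Returns (center, radius, core_indices). The iteration count is
    ceil(1 / eps**2), so it depends on eps and not on len(points).
    """
    center = list(points[0])          # start from an arbitrary point
    core_idx = {0}                    # indices of points that shaped the center
    n_iter = math.ceil(1.0 / eps ** 2)
    for i in range(1, n_iter + 1):
        # Find the point farthest from the current center.
        far = max(range(len(points)),
                  key=lambda j: math.dist(center, points[j]))
        core_idx.add(far)
        # Move the center a shrinking step toward that farthest point.
        step = 1.0 / (i + 1)
        center = [c + step * (p - c) for c, p in zip(center, points[far])]
    # Radius needed to enclose every point from the final center.
    radius = max(math.dist(center, p) for p in points)
    return center, radius, sorted(core_idx)


# Four points whose exact minimum enclosing ball has radius 1.
square = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
center, radius, core = approx_meb(square, eps=0.1)
print(radius, core)
```

For this iteration the returned radius is at most (1 + ε) times the optimal radius, and at most ceil(1/ε²) + 1 points ever enter the core set. Each iteration here still scans all points to find the farthest one; this sketch makes no attempt to reproduce how the paper bounds the overall running time.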

Key words: regression learning, attribute effect control, center constrained-minimum enclosing ball (CC-MEB), equal mean constraint, large scale data

CLC Number: