Abstract:
The weak learning theorem in machine learning states that if a weak learning algorithm only slightly better than random guessing can be found, then a strong learning algorithm of arbitrary precision can be constructed from it. AdaBoost and Bagging are the most widely used methods based on this theorem. However, several problems concerning AdaBoost and Bagging have not been well solved: the error analyses of AdaBoost and Bagging are not unified; the training errors used in AdaBoost are not the true training errors but errors weighted by the sample distribution, and whether they can represent the true training errors needs to be explained; and the conditions that guarantee the effectiveness of the final strong learning algorithm also need to be clarified. After adjusting the error rate of Bagging and adopting a weighted voting method, the algorithm flows and error analyses of AdaBoost and Bagging are unified. Through direct graphical analysis, it is explained how a weak learning algorithm is promoted to a strong learning algorithm. Based on this explanation and a proof of the weak learning theorem via the law of large numbers, the effectiveness of AdaBoost is analyzed. The sample-weight adjustment strategy of AdaBoost is shown to keep the correctly classified samples uniformly distributed, so the probabilities of its weighted training errors are equal in probability to those of the true training errors. Rules for training the weak learning algorithm are proposed to guarantee the effectiveness of AdaBoost. The effectiveness of AdaBoost is thus explained, and methods for constructing new ensemble learning algorithms are given. Finally, by analogy with AdaBoost, suggestions are made for the training-set selection strategy of Bagging.
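For reference, the following is a minimal sketch of the standard binary AdaBoost formulation with decision stumps as weak learners; it illustrates the weighted training error, the sample-weight adjustment, and the weighted voting mentioned above. It is an assumed textbook version, not the unified algorithm developed in this paper, and all names and parameters (e.g. `T`, `adaboost_train`) are illustrative.

```python
# Minimal sketch (assumed standard AdaBoost, not the paper's unified algorithm).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, T=50):
    """y must take values in {-1, +1}; returns a list of (weight, weak_learner) pairs."""
    n = len(y)
    w = np.full(n, 1.0 / n)               # sample weights, initially uniform
    ensemble = []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)   # weak learner trained on the weighted sample
        pred = stump.predict(X)
        err = np.sum(w * (pred != y))      # weighted training error (error based on sample weights)
        if err >= 0.5:                     # weak learner must beat random guessing
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        ensemble.append((alpha, stump))
        # Sample-weight adjustment: misclassified samples gain weight, correct ones lose weight
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
    return ensemble

def adaboost_predict(ensemble, X):
    # Weighted voting over all weak learners
    score = sum(alpha * h.predict(X) for alpha, h in ensemble)
    return np.sign(score)
```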