
    A Universal Ensemble Learning Algorithm

    • Abstract: The construction of ensemble learning algorithms is an important research topic in machine learning. Although the weak learning theorem shows that weak and strong learning algorithms are equivalent, how to construct a good ensemble learning algorithm remains an open problem. The AdaBoost algorithm of Freund and Schapire and the real AdaBoost algorithm of Schapire and Singer partially solved this problem. This paper defines a notion of learning error and, taking its minimization as the objective, proposes a universal ensemble learning algorithm whose learning error decreases as simple predictions are added. The algorithm can handle most practical classification settings, including multi-class, cost-sensitive, imbalanced, multi-label, and fuzzy classification, and it unifies and generalizes the AdaBoost family of algorithms. To guarantee the generalization ability of the combined prediction function, it is shown that the simple prediction functions in all of the above algorithms can be constructed uniformly from a single feature of the samples. Both theoretical analysis and experimental results show that the learning error of the proposed algorithms can be made arbitrarily small, without risk of overfitting.
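The abstract does not specify the universal algorithm itself, but it names the AdaBoost family it unifies and the use of single-feature simple predictions. As a reference point only, the following is a minimal sketch of Freund and Schapire's discrete AdaBoost with single-feature decision stumps playing the role of the "simple predictions"; all function names are illustrative, and this is not the paper's universal algorithm.

```python
# A minimal sketch of discrete AdaBoost (Freund & Schapire) using
# single-feature decision stumps as the "simple predictions".
# This illustrates the AdaBoost family the paper generalizes, NOT the
# paper's universal algorithm; all names here are illustrative.
import numpy as np

def stump_predict(X, feat, thresh, sign):
    """Threshold a single feature and output +1/-1 labels."""
    return sign * np.where(X[:, feat] <= thresh, 1.0, -1.0)

def train_adaboost(X, y, n_rounds=10):
    """y must be in {-1, +1}. Returns a list of (alpha, feat, thresh, sign)."""
    n = len(y)
    w = np.full(n, 1.0 / n)              # sample weights, uniform at start
    ensemble = []
    for _ in range(n_rounds):
        best, best_err = None, np.inf
        # exhaustive search for the stump with the lowest weighted error
        for feat in range(X.shape[1]):
            for thresh in np.unique(X[:, feat]):
                for sign in (1.0, -1.0):
                    err = w[stump_predict(X, feat, thresh, sign) != y].sum()
                    if err < best_err:
                        best_err, best = err, (feat, thresh, sign)
        eps = max(best_err, 1e-12)       # guard against log(0)
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        pred = stump_predict(X, *best)
        w *= np.exp(-alpha * y * pred)   # up-weight misclassified samples
        w /= w.sum()
        ensemble.append((alpha,) + best)
    return ensemble

def ensemble_predict(ensemble, X):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * stump_predict(X, f, t, s) for a, f, t, s in ensemble)
    return np.sign(score)
```

Even this simple instance shows the effect the abstract describes: the weighted combination of single-feature predictors fits concepts (for example, an interval on one feature) that no individual stump can represent, and its training error shrinks as rounds are added.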

       
