Fu Zhongliang. Effective Property and Best Combination of Classifier Linear Combination[J]. Journal of Computer Research and Development, 2009, 46(7): 1206-1216.

Effective Property and Best Combination of Classifier Linear Combination

Abstract: Combining multiple classifiers to improve classification accuracy is a major research topic in machine learning, and the weak learning theorem guarantees the feasibility of this approach. The linear combination of classifiers, i.e. weighted voting, is the most common combination method; the widely used AdaBoost and Bagging algorithms both adopt weighted voting. Both the effectiveness of classifier combination and the problem of the best combination need to be solved. Under the conditions that the individual classifiers are mutually independent and the number of classifiers is large, the coefficient selection condition for an effective combination and the formula for the best combination coefficients are obtained, and an error analysis of the combined classifier is given. It is concluded that when the classification error rates of the individual classifiers share a uniform bound, the error rate of the combined classifier decreases exponentially with the number of classifiers even if simple voting is adopted. On this basis, following AdaBoost, some new ensemble learning algorithms are proposed; in particular, an ensemble learning algorithm that aims directly at rapidly improving the classification accuracy of the combined classifier is proposed, and its reasonableness and scientific soundness are analyzed. It is an extension of the traditional classifier training and selection method that takes the lowest error rate as its objective. From another perspective, it is proved that the combination adopted in AdaBoost is not only effective but, under certain conditions, equivalent to the best combination. For multi-class problems, a classifier combination theory and conclusions similar to those for the two-class problem are obtained, including the effective combination condition, the best combination, and error estimation. Moreover, AdaBoost is extended to some extent.
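The exponential decrease claimed for simple voting can be made concrete with a standard Hoeffding-style argument (a sketch of the usual reasoning under an independence assumption, not necessarily the exact estimate derived in the paper). Suppose each of $T$ mutually independent classifiers errs with probability at most $1/2 - \gamma$ for some margin $\gamma > 0$, and let $X_t \in \{0, 1\}$ indicate whether the $t$-th classifier errs. Simple majority voting errs only when at least half of the classifiers err, so

$$ P(\text{majority vote errs}) \;\le\; P\!\left(\frac{1}{T}\sum_{t=1}^{T} X_t \ge \frac{1}{2}\right) \;\le\; e^{-2\gamma^2 T}, $$

which decreases exponentially in the number of classifiers $T$, matching the stated conclusion.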

       

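For comparison with the best-combination result mentioned in the abstract, recall that AdaBoost's weighted vote assigns the $t$-th weak classifier the coefficient

$$ \alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}, $$

where $\epsilon_t$ is its weighted training error. The paper's claim is that this combination is not only effective but, under certain conditions, equivalent to the best combination coefficients.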

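As a minimal, self-contained illustration of the simple-voting case (hypothetical code, not from the paper; it assumes T mutually independent classifiers sharing one error rate p < 1/2), the following computes the exact majority-vote error rate and compares it with the exponential bound above:

import math

def majority_vote_error(p: float, T: int) -> float:
    # P(at least ceil(T/2) of T independent classifiers err);
    # ties (possible for even T) are counted as errors, which is conservative.
    threshold = math.ceil(T / 2)
    return sum(math.comb(T, k) * p**k * (1 - p)**(T - k)
               for k in range(threshold, T + 1))

if __name__ == "__main__":
    p = 0.4  # uniform bound on each individual classifier's error rate
    gamma = 0.5 - p
    for T in (1, 5, 15, 35, 75):
        err = majority_vote_error(p, T)
        bound = math.exp(-2 * gamma**2 * T)  # Hoeffding-style bound
        print(f"T={T:3d}  error={err:.6f}  bound={bound:.6f}")

Running this shows the combined error rate falling from 0.4 at T=1 toward zero as T grows, always below the exponential bound, which is the behavior the abstract describes.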