ISSN 1000-1239 CN 11-1777/TP

• 人工智能 •

### 一种基于决策森林的单调分类方法

1. 1(山西大学计算机与信息技术学院 太原 030006);2(计算智能与中文信息处理教育部重点实验室(山西大学) 太原 030006);3(山西财经大学应用数学学院 太原 030006) (xuh102@126.com)
• 出版日期: 2017-07-01
• 基金资助:
国家自然科学基金项目(61673249,61503229)；山西省回国留学人员科研基金项目(2016-004)；山西省研究生教育创新项目(2016BY003)

### A Method for Monotonic Classification Based on Decision Forest

Xu Hang1, Wang Wenjian1,2, Ren Lifang1,3

1. 1(School of Computer and Information Technology, Shanxi University, Taiyuan 030006);2(Key Laboratory of Computational Intelligence and Chinese Information Processing(Shanxi University), Ministry of Education, Taiyuan 030006);3(School of Applied Mathematics, Shanxi University of Finance and Economics, Taiyuan 030006)
• Online: 2017-07-01

Abstract: Monotonic classification is an ordinal classification problem in which the monotonic constraint exists between features and class. There have been some methods which can deal with the monotonic classification problem on the nominal datasets well. But for the monotonic classification problems on the numeric datasets, the classification accuracies and running efficiencies of the existing methods are limited. In this paper, a monotonic classification method based on decision forest (MCDF) is proposed. A sampling strategy is designed to generate decision trees, which can make the sampled training data subsets having a consistent distribution with the original training dataset, and the influence of non-monotonic noise data is avoided by the sample weights. It can effectively improve the running efficiency while maintaining the high classification performance. In addition, this strategy can also determine the number of trees in decision forest automatically. A solution for the classification conflicts of different trees is also provided when the decision forest determines the class of a sample. The proposed method can deal with not only the nominal data, but also the numeric data. The experimental results on artificial, UCI and real datasets demonstrate that the proposed method can improve the monotonic classification performance and running efficiency, and reduce the length of classification rules and solve the monotonic classification problem on large datasets.