ISSN 1000-1239 CN 11-1777/TP

• Artificial Intelligence •

### A Pattern Class Mining Model Based on Active Learning

Guo Husheng1, Wang Wenjian1,2

1. 1(School of Computer and Information Technology, Shanxi University, Taiyuan 030006); 2(Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006) (guohusheng@sxu.edu.cn)
• Published: 2014-10-01
• Online: 2014-10-01
• Funding:
National Natural Science Foundation of China (61175051, 61070131, 61175033)

Abstract: In practical applications, the diversity, fuzziness and complexity of the objective world give rise to many data mining problems in which the class information is unknown. Traditional methods, however, generally assume that the categories of the data are known before mining, and no effective methods exist for problems of this kind. To address them, this paper presents a pattern class mining model based on active learning, namely PM_AL. First, using difference measurements between the unlabeled and labeled samples, active learning selects the most valuable samples. These samples are then labeled by experts, so that the model can quickly mine the pattern classes implicit in the unlabeled samples. In this way, an unlabeled multi-class problem is transformed into a labeled multi-class problem at very low labeling cost. By applying active learning during the initial class mining, the proposed PM_AL model achieves high learning efficiency, low labeling cost and good generalization performance. The experimental results demonstrate that PM_AL can effectively find as many categories as possible and solve large-scale multi-class classification problems with unknown categories.
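
The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the PM_AL algorithm itself: the abstract does not specify the difference measurement, so this sketch assumes Euclidean distance to the nearest labeled sample and always queries the unlabeled sample that differs most from everything labeled so far. The names `mine_classes`, `oracle`, `budget` and `seed_index` are hypothetical, and `oracle` merely stands in for the human expert.

```python
import math


def min_distance_to_labeled(x, labeled_points):
    """Difference measure (assumed): distance from x to its nearest labeled sample."""
    return min(math.dist(x, p) for p in labeled_points)


def mine_classes(samples, oracle, budget, seed_index=0):
    """Query at most `budget` labels and return {sample index: label}.

    Each round asks the oracle (a stand-in for the expert) to label the
    unlabeled sample that is farthest from all labeled samples, on the
    assumption that such a sample is the most likely to reveal a new class.
    """
    labeled = {seed_index: oracle(seed_index)}
    while len(labeled) < min(budget, len(samples)):
        labeled_points = [samples[i] for i in labeled]
        candidates = [i for i in range(len(samples)) if i not in labeled]
        query = max(
            candidates,
            key=lambda i: min_distance_to_labeled(samples[i], labeled_points),
        )
        labeled[query] = oracle(query)
    return labeled


# Toy data: three well-separated clusters whose labels are hidden from the model.
samples = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),
]
true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

labeled = mine_classes(samples, oracle=lambda i: true_labels[i], budget=3)
print(sorted(set(labeled.values())))  # → [0, 1, 2]
```

On this toy data, three label queries are enough to expose all three classes, which mirrors the low labeling cost the abstract claims. Real data with overlapping classes would likely need a richer difference measure and a stopping criterion.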