Abstract:
In practical applications, many data mining problems involve unknown class information because of the diversity, fuzziness, and complexity of the objective world. Traditional methods, however, generally assume that the categories of the data are known before mining, and no effective methods exist for problems in which they are not. To address such problems, this paper presents a pattern class mining model based on active learning, named PM_AL. First, by measuring the difference between unlabeled and labeled samples, the model uses an active learning technique to select the most valuable samples. These samples are then labeled by experts, which allows the model to quickly mine the pattern classes implicit in the unlabeled samples. In this way, an unlabeled multi-class problem is transformed into a labeled multi-class problem at very low labeling cost. By applying active learning during initial class mining, the proposed PM_AL model achieves high learning efficiency, low labeling cost, and good generalization performance. The experimental results demonstrate that the PM_AL model can effectively find as many categories as possible and solve large-scale multi-class classification problems with unknown categories.
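
The selection step described above can be illustrated with a minimal sketch. It is not the PM_AL algorithm itself: the abstract does not specify the difference measurement, so Euclidean distance to the nearest labeled sample is assumed here, and the names `discover_classes`, `min_distance_to_labeled`, and `expert` are hypothetical. The `expert` function stands in for the human labeler.

```python
import math

def min_distance_to_labeled(x, labeled):
    """Smallest Euclidean distance from x to any labeled sample (assumed measure)."""
    return min(math.dist(x, y) for y, _ in labeled)

def discover_classes(unlabeled, seed, oracle, budget):
    """Repeatedly query the sample least like anything labeled so far.

    unlabeled: list of feature tuples
    seed:      one (features, label) pair that starts the labeled set
    oracle:    callable returning a label, standing in for the expert
    budget:    maximum number of expert queries
    """
    labeled = [seed]
    pool = list(unlabeled)
    for _ in range(min(budget, len(pool))):
        # Assumption: the "most valuable" sample is the most dissimilar one.
        idx = max(range(len(pool)),
                  key=lambda i: min_distance_to_labeled(pool[i], labeled))
        x = pool.pop(idx)
        labeled.append((x, oracle(x)))
    classes = {label for _, label in labeled}
    return labeled, classes

# Toy data: three well-separated groups; only class "a" is known at the start.
centers = {"a": (0.0, 0.0), "b": (10.0, 10.0), "c": (20.0, 0.0)}

def expert(x):
    """Hypothetical expert: labels a point by its nearest group center."""
    return min(centers, key=lambda k: math.dist(x, centers[k]))

pool = [(0.1, 0.2), (0.2, 0.1), (10.0, 10.1),
        (10.2, 9.9), (20.0, 0.1), (19.9, 0.2)]
labeled, classes = discover_classes(pool, ((0.0, 0.0), "a"), expert, budget=2)
print(sorted(classes))
```

On this toy data, two expert queries are enough to surface all three classes, which is the sense in which dissimilarity-driven selection keeps labeling cost low.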