ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2020, Vol. 57, Issue (8): 1605-1616. doi: 10.7544/issn1000-1239.2020.20200196

Special Issue: 2020 Special Topic on Data Mining and Knowledge Discovery


A Bayesian Classification Algorithm Based on Selective Patterns

Ju Zhuoya1,2, Wang Zhihai1   

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
  2. Unit 32178, Beijing 100012
  • Online:2020-08-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61672086) and the Beijing Natural Science Foundation (4182052).

Abstract: Data mining is concerned with theories and methods for discovering knowledge from data in very large databases, and classification is one of its central topics. In classification research, the Naive Bayesian classifier is a simple but effective learning technique that has been widely used. It assumes that, given the class value, each attribute is conditionally independent of all other attributes. In many settings, however, the dependencies between attributes are more complex. Many researchers construct classifiers from specific patterns built on “attribute-value” pairs, and the dependencies between the attributes contained in a pattern and the remaining attributes can significantly affect classification results; these dependencies are therefore exploited here. A Bayesian classification algorithm based on selective patterns is proposed, which not only makes use of the strong classification ability of Bayesian network classifiers but also further weakens the conditional independence assumption by analyzing the dependencies between attributes in the patterns. Classification accuracy benefits from fully considering the characteristics of the datasets, mining and employing highly discriminative patterns, and modeling the dependencies between attributes appropriately. Empirical results show that the average accuracy of the proposed algorithm on 10 datasets is 1.65% and 4.29% higher than that of the benchmark algorithms NB and AODE, respectively.
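
For readers unfamiliar with the NB baseline named in the abstract, the conditional independence assumption can be illustrated with a minimal sketch. This is not the selective-pattern algorithm proposed in the paper; it is only a plain Naive Bayesian classifier over categorical attribute-value pairs. The function names `train_nb` and `predict_nb`, the toy data, and the use of Laplace smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Collect the counts a Naive Bayesian classifier needs (hypothetical helper)."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # domains[attribute index] -> set of distinct values seen in training
    domains = defaultdict(set)
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, c)][v] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains


def predict_nb(model, row):
    """Return the class maximizing log P(c) + sum_i log P(x_i | c)."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        # Laplace-smoothed class prior (smoothing is an assumption of this sketch).
        score = math.log((nc + 1) / (n + len(class_counts)))
        for i, v in enumerate(row):
            # Each attribute contributes independently given the class:
            # this product form is the conditional independence assumption.
            count = value_counts[(i, c)][v]
            score += math.log((count + 1) / (nc + len(domains[i])))
        if score > best_score:
            best, best_score = c, score
    return best


# Toy data, invented purely for illustration.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
```

Because each attribute is scored separately given the class, this baseline cannot represent interactions between attributes; the abstract describes weakening exactly this restriction by analyzing dependencies among the attributes in mined patterns.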

Key words: classification, pattern discovery, selective patterns, dependency, Bayesian classifier

CLC Number: