A Two-Layer Bayes Model: Random Forest Naive Bayes

Zhang Wenjun; Jiang Liangxiao; Zhang Huan; Chen Long

doi:10.7544/issn1000-1239.2021.20200521

Citation: Zhang Wenjun, Jiang Liangxiao, Zhang Huan, Chen Long. A Two-Layer Bayes Model: Random Forest Naive Bayes[J]. Journal of Computer Research and Development, 2021, 58(9): 2040-2051. DOI: 10.7544/issn1000-1239.2021.20200521

Abstract

Text classification is an essential task in natural language processing, and the high dimensionality and sparsity of text data make it challenging. Naive Bayes (NB) is widely used for text classification because of its simplicity, efficiency, and comprehensibility, but its attribute conditional independence assumption rarely holds in real-world text data, which degrades its classification performance. To weaken this assumption, scholars have proposed a variety of improved approaches, mainly structure extension, instance selection, instance weighting, feature selection, and feature weighting. However, all of these approaches build NB classification models on independent term features, which restricts their classification performance to a certain extent. In this paper, we improve the naive Bayes text classification model through feature learning and propose a two-layer Bayes model called random forest naive Bayes (RFNB). In the first layer, a random forest (RF) learns high-level features of term combinations from the original term features. The learned features are then input into the second layer, where they are one-hot encoded and used to construct a Bernoulli naive Bayes model. Experimental results on a large number of widely used text datasets show that the proposed RFNB significantly outperforms existing state-of-the-art naive Bayes text classification models and other classical text classification models.
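The two-layer structure described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. It assumes that the "high-level features" are the leaf indices each tree assigns to a document, which the abstract does not state. The class name RFNBSketch, the parameter n_trees, and the toy data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import OneHotEncoder


class RFNBSketch:
    """Hypothetical two-layer model: forest leaves -> one-hot -> Bernoulli NB."""

    def __init__(self, n_trees=50, random_state=0):
        # Layer 1: a random forest trained on the original term features.
        self.forest = RandomForestClassifier(
            n_estimators=n_trees, random_state=random_state
        )
        # Leaf indices are categorical, so they are one-hot encoded.
        # Leaves unseen during fitting are encoded as all zeros.
        self.encoder = OneHotEncoder(handle_unknown="ignore")
        # Layer 2: Bernoulli naive Bayes on the binary leaf indicators.
        self.nb = BernoulliNB()

    def _leaves(self, X):
        # apply() returns, per sample, the leaf index reached in each tree.
        # ASSUMPTION: each leaf stands in for a learned term combination.
        return self.forest.apply(X)

    def fit(self, X, y):
        self.forest.fit(X, y)
        encoded = self.encoder.fit_transform(self._leaves(X))
        self.nb.fit(encoded, y)
        return self

    def predict(self, X):
        encoded = self.encoder.transform(self._leaves(X))
        return self.nb.predict(encoded)


# Toy binary term-occurrence matrix (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 30))
y = (X[:, 0] & X[:, 1]) | X[:, 2]  # label depends on a term combination

model = RFNBSketch().fit(X, y)
leaves = model._leaves(X)
preds = model.predict(X)
```

Each one-hot column in this sketch indicates whether a document reached a particular leaf, that is, whether it satisfied the conjunction of term tests along one root-to-leaf path. That gives the second layer binary features built from term combinations rather than single terms.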
