ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2020, Vol. 57 ›› Issue (8): 1594-1604. doi: 10.7544/issn1000-1239.2020.20200490

Special Issue: 2020 Data Mining and Knowledge Discovery


Mondrian Deep Forest

He Yixiao, Pang Ming, Jiang Yuan   

  (National Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023) (Collaborative Innovation Center of Novel Software Technology and Industrialization (Nanjing University), Nanjing 210023)
  • Online: 2020-08-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61673201).

Abstract: Most studies of deep learning are built on neural networks, i.e., multiple layers of parameterized differentiable nonlinear modules trained by backpropagation. Recently, deep forest was proposed as a non-NN style deep model with far fewer parameters than deep neural networks. It shows robust performance under different hyperparameter settings and across different tasks, and its model complexity can be determined in a data-dependent way. Represented by gcForest, the study of deep forest offers a promising way of building deep models from non-differentiable modules. However, deep forest is currently trained offline, which prevents its application in many real tasks, e.g., learning from data streams. In this work, we explore the possibility of building deep forest under the incremental setting and propose Mondrian deep forest. It uses a cascade forest structure for layer-by-layer processing, which we further enhance with an adaptive mechanism that adjusts the attention paid to the original features versus the transformed features of the previous layer, thereby notably mitigating the deficiency of Mondrian forest in handling irrelevant features. Empirical results show that, while inheriting the incremental learning ability of Mondrian forest, Mondrian deep forest significantly improves its predictive performance. Using the same default hyperparameter setting, Mondrian deep forest achieves satisfying performance across different datasets. In the incremental training setting, Mondrian deep forest attains predictive performance highly competitive with a periodically retrained gcForest while being an order of magnitude faster.
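The cascade structure described in the abstract can be sketched in a few lines of Python. The sketch below is illustrative only and is not the authors' implementation: it assumes scikit-garden's MondrianForestClassifier (which supports incremental updates via partial_fit), and it replaces the paper's adaptive attention mechanism with a hypothetical fixed weight alpha that trades off the original features against the previous layer's class-probability outputs.

    # Minimal sketch of a Mondrian-deep-forest-style cascade (illustrative;
    # not the authors' code). Assumes scikit-garden's MondrianForestClassifier.
    import numpy as np
    from skgarden import MondrianForestClassifier

    class CascadeLayer:
        def __init__(self, n_forests=2, n_trees=50):
            self.forests = [MondrianForestClassifier(n_estimators=n_trees)
                            for _ in range(n_forests)]

        def partial_fit(self, X, y, classes):
            # Incrementally update every Mondrian forest in this layer.
            for f in self.forests:
                f.partial_fit(X, y, classes=classes)

        def transform(self, X):
            # Concatenate each forest's class-probability vector.
            return np.hstack([f.predict_proba(X) for f in self.forests])

    class MondrianCascade:
        def __init__(self, n_layers=3, alpha=0.5):
            self.layers = [CascadeLayer() for _ in range(n_layers)]
            # Hypothetical fixed weight standing in for the paper's adaptive
            # original-vs-transformed feature mechanism.
            self.alpha = alpha

        def _augment(self, X, probas):
            # Re-weight original features against the previous layer's output.
            return np.hstack([self.alpha * X, (1 - self.alpha) * probas])

        def partial_fit(self, X, y, classes):
            Xin = X
            for layer in self.layers:
                layer.partial_fit(Xin, y, classes)
                Xin = self._augment(X, layer.transform(Xin))

        def predict(self, X):
            Xin = X
            for layer in self.layers:
                probas = layer.transform(Xin)
                Xin = self._augment(X, probas)
            # Average the last layer's forests; assumes classes are 0..k-1.
            k = probas.shape[1] // len(self.layers[-1].forests)
            avg = probas.reshape(X.shape[0], -1, k).mean(axis=1)
            return avg.argmax(axis=1)

For stream learning, partial_fit would simply be called on successive data chunks, e.g., model.partial_fit(X_chunk, y_chunk, classes=np.arange(n_classes)), which is what makes the incremental setting in the abstract possible without periodic retraining.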

Key words: machine learning, deep forest, Mondrian forest, ensemble methods, incremental learning
