ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2021, Vol. 58, Issue (1): 60-69. doi: 10.7544/issn1000-1239.2021.20190838


Safe Tri-training Algorithm Based on Cross Entropy

Zhang Yong, Chen Rongrong, Zhang Jing   

  1. (School of Computer & Information Technology, Liaoning Normal University, Dalian, Liaoning 116081)
  • Online:2021-01-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61772252, 61902165), the Program for Liaoning Innovative Talents in Universities (LR2017044), and the Natural Science Foundation of Liaoning Province (2019-MS-216).

Abstract: Semi-supervised learning methods improve learning performance by using a small amount of labeled data together with a large amount of unlabeled data. The Tri-training algorithm is a classic divergence-based (disagreement-based) semi-supervised learning method. It requires neither redundant views of the dataset nor any particular type of base classifier, which has made it the most commonly used technique among divergence-based semi-supervised learning methods. However, Tri-training can introduce label noise during learning, and this noise harms the final model. To reduce the prediction bias that such noise causes on the unlabeled data and to learn a better semi-supervised classification model, we replace the error rate with cross entropy, which better reflects the gap between the model's predictions and the true distribution, and combine it with convex optimization to reduce label noise and safeguard model performance. On this basis, we propose three algorithms: a Tri-training algorithm based on cross entropy, a safe Tri-training algorithm, and a safe Tri-training algorithm based on cross entropy. The effectiveness of the proposed methods is verified on benchmark datasets, including those from the UCI (University of California Irvine) machine learning repository, and their performance is further confirmed statistically with a significance test. The experimental results show that the proposed semi-supervised learning methods outperform the traditional Tri-training algorithm in classification performance, and that the safe Tri-training algorithm based on cross entropy achieves higher classification performance and generalization ability.
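The abstract's central substitution, measuring prediction quality with cross entropy instead of the error rate, can be illustrated with a minimal Python/NumPy sketch. This is not the authors' implementation: the function names (`error_rate`, `cross_entropy`, `agreement_labels`) and the toy probability matrices are hypothetical, chosen only to show the difference between the two measures. `agreement_labels` shows the standard Tri-training labeling rule, in which two classifiers pseudo-label an unlabeled sample for the third only when they agree; how the paper plugs cross entropy and convex optimization into that loop is not reproduced here.

```python
import numpy as np


def error_rate(probs: np.ndarray, y: np.ndarray) -> float:
    """Fraction of samples whose arg-max prediction differs from the true label."""
    return float(np.mean(np.argmax(probs, axis=1) != y))


def cross_entropy(probs: np.ndarray, y: np.ndarray, eps: float = 1e-12) -> float:
    """Mean negative log-probability assigned to the true label (clipped to avoid log(0))."""
    p_true = probs[np.arange(len(y)), y]
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))


def agreement_labels(pred_j: np.ndarray, pred_k: np.ndarray):
    """Standard Tri-training labeling rule: an unlabeled sample is pseudo-labeled
    for the third classifier only where the other two classifiers agree.
    Returns the indices of those samples and the agreed labels."""
    mask = pred_j == pred_k
    return np.flatnonzero(mask), pred_j[mask]


# Hypothetical toy data: true labels of four labeled samples (binary task).
y = np.array([0, 1, 1, 0])

# Two hypothetical models that make the same single mistake (sample 2) ...
confident = np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.1]])  # wrong with p=0.9
hesitant = np.array([[0.9, 0.1], [0.1, 0.9], [0.55, 0.45], [0.9, 0.1]])  # wrong with p=0.55

# ... so the error rate cannot tell them apart, but cross entropy can.
print("error rate:", error_rate(confident, y), error_rate(hesitant, y))
print("cross entropy:", round(cross_entropy(confident, y), 3), round(cross_entropy(hesitant, y), 3))

# Agreement-based pseudo-labeling on four hypothetical unlabeled samples.
idx, labels = agreement_labels(np.array([0, 1, 2, 1]), np.array([0, 2, 2, 1]))
print("pseudo-labeled indices:", idx, "labels:", labels)
```

Both toy models make the same single mistake, so their error rates are identical (0.25), yet the confidently wrong model receives a much larger cross entropy. An error-rate estimate cannot make this distinction, which matters when deciding whether pseudo-labels produced by agreement are likely to be noisy.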

Key words: semi-supervised, Tri-training algorithm, cross entropy, convex optimization, sample label
