ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue (9): 1832-1842. doi: 10.7544/issn1000-1239.2019.20180353


A Generative Model for Synthesizing Structured Datasets Based on GAN

Song Kehui¹, Zhang Ying¹, Zhang Jiangwei², Yuan Xiaojie¹

  1. College of Computer Science, Nankai University, Tianjin 300350
  2. School of Computing, National University of Singapore, Singapore 117417
  • Online: 2019-09-10
  • Supported by: This work was supported by the National Natural Science Foundation of China (61772289, U1836109).

Abstract: Synthesizing high-quality datasets has been a long-standing challenge in both the machine learning and database communities. One application of high-quality dataset synthesis is improving model training, especially for deep learning models, because a robust training process requires a large annotated dataset. One way to acquire a large annotated training set is manual annotation by domain experts, which is expensive and error-prone. Automatically synthesizing high-quality data that is similar to the real dataset is therefore a more practical alternative. Owing to the rapid development of computer vision, some effort has been devoted to synthesizing image datasets. However, those models cannot be applied directly to structured data (numeric & categorical tables), and little attention has been paid to such tables. We therefore propose TableGAN, the first generative model from the GAN family for numeric & categorical tables, which improves the performance of the generative model through an adversarial learning mechanism. TableGAN modifies the internal structure of the traditional GAN, including its optimization function, to target numeric & categorical tables and synthesize more high-quality training samples, thereby improving the effectiveness of the trained models. Extensive experiments on real datasets show significant performance improvements for models trained on the enlarged training datasets, which verifies the effectiveness of TableGAN.
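The abstract notes that image-oriented GANs cannot be applied directly to numeric & categorical tables, but it does not describe how TableGAN represents a table row. As an illustration only, the sketch below shows one common preprocessing scheme (an assumption, not necessarily the authors' method): numeric columns are min-max scaled into [-1, 1] to match a tanh generator output layer, categorical columns are one-hot encoded, and generator output vectors are decoded back into rows by rescaling numerics and taking the argmax of each one-hot block. The names `Column`, `fit_schema`, `encode` and `decode` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Union

Value = Union[float, str]


@dataclass
class Column:
    """Hypothetical per-column metadata learned from the real table."""
    kind: str                      # "numeric" or "categorical"
    low: float = 0.0               # numeric: observed minimum
    high: float = 0.0              # numeric: observed maximum
    categories: List[str] = field(default_factory=list)  # categorical: fixed value order


def fit_schema(rows: Sequence[Sequence[Value]], kinds: Sequence[str]) -> List[Column]:
    """Record min/max for numeric columns and the sorted value set for categorical ones."""
    schema: List[Column] = []
    for j, kind in enumerate(kinds):
        col = [row[j] for row in rows]
        if kind == "numeric":
            schema.append(Column(kind, low=min(col), high=max(col)))
        else:
            schema.append(Column(kind, categories=sorted(set(col))))
    return schema


def encode(row: Sequence[Value], schema: List[Column]) -> List[float]:
    """Map one mixed-type row to a flat vector in [-1, 1] (the range of a tanh output)."""
    vec: List[float] = []
    for value, col in zip(row, schema):
        if col.kind == "numeric":
            span = (col.high - col.low) or 1.0  # guard against constant columns
            vec.append(2.0 * (float(value) - col.low) / span - 1.0)
        else:
            # One-hot block scaled to {-1, 1}: +1 for the observed category, -1 elsewhere
            vec.extend(1.0 if value == c else -1.0 for c in col.categories)
    return vec


def decode(vec: Sequence[float], schema: List[Column]) -> List[Value]:
    """Map a generator output vector back to a row: rescale numerics, argmax categoricals."""
    row: List[Value] = []
    i = 0
    for col in schema:
        if col.kind == "numeric":
            x = max(-1.0, min(1.0, vec[i]))  # clip into the valid range first
            row.append(col.low + (x + 1.0) / 2.0 * (col.high - col.low))
            i += 1
        else:
            block = vec[i:i + len(col.categories)]
            best = max(range(len(block)), key=lambda k: block[k])
            row.append(col.categories[best])
            i += len(col.categories)
    return row
```

For example, with `real = [[25.0, "yes"], [40.0, "no"], [55.0, "yes"]]` and `schema = fit_schema(real, ["numeric", "categorical"])`, `encode(real[1], schema)` yields `[0.0, 1.0, -1.0]`, and `decode` maps it back to `[40.0, "no"]`. Decoding by argmax guarantees that every synthesized categorical value is one seen in the real table, while clipping keeps numeric values inside the observed range.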

Key words: deep learning, generative models, neural network, generative adversarial network (GAN), classification
