• China Excellent Sci-Tech Journal
  • CCF-recommended Class A Chinese journal
  • T1-class high-quality sci-tech journal in the computing field
Yang Bin, Wang Zhengyang, Cheng Zihang, Zhao Huiying, Wang Xin, Guan Yu, Cheng Xinzhou. Customer Churn Prediction Based on Generation Data Reconstruction Using Diffusion Model[J]. Journal of Computer Research and Development, 2024, 61(2): 324-337. DOI: 10.7544/issn1000-1239.202330742

Customer Churn Prediction Based on Generation Data Reconstruction Using Diffusion Model

Funds: This work was supported by the Open Foundation of Yunnan Key Laboratory of Software Engineering (2023SE202).
  • Author Bio:

    Yang Bin: born in 1986. PhD. Member of CCF. His main research interests include data mining and natural language processing.

    Wang Zhengyang: born in 2003. Undergraduate. His main research interests include data mining and artificial intelligence.

    Cheng Zihang: born in 2003. Undergraduate. His main research interests include data mining and artificial intelligence.

    Zhao Huiying: born in 1991. PhD, postdoctoral fellow at China United Network Communications Group Co., Ltd. Her main research interests include graph neural networks and autonomous networks.

    Wang Xin: born in 1988. Master. Her main research interests include artificial intelligence and graph neural networks.

    Guan Yu: born in 1998. Master. His main research interests include software testing, data mining, and artificial intelligence.

    Cheng Xinzhou: born in 1978. Professor-level senior engineer. His main research interests include intelligent network operations and artificial intelligence.

  • Received Date: September 10, 2023
  • Revised Date: December 13, 2023
  • Available Online: December 20, 2023
  • Abstract: In data mining, data imbalance is a widespread problem that degrades model prediction accuracy, while user privacy protection is often neglected. Synthetic dataset generation has emerged as an important remedy for both problems. However, generating high-quality data is difficult in scenarios dominated by structured data, because such data tend to be high-dimensional and to contain irrelevant features. Given the success of diffusion models in image generation, we apply the diffusion model to customer churn prediction, a typical data mining scenario. We use a Gaussian diffusion model to generate the numerical features and a multinomial diffusion model to generate the categorical features of customer churn data, and we analyze both the predictive performance and the privacy protection capability of our model. Extensive experiments on customer churn data from multiple domains explore the potential of fusing synthetic and real datasets for data reconstruction. The results show that the diffusion model can generate high-quality data that improves the performance of various prediction methods, which helps alleviate data imbalance. In addition, the generated data follow a distribution very close to that of the original dataset, which may be useful for protecting user privacy.
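The abstract pairs Gaussian diffusion for numerical features with multinomial diffusion for categorical features. As a minimal sketch, assuming the standard closed-form DDPM forward process with a linear β schedule (1e-4 to 0.02 over 1000 steps) and the multinomial forward process of Hoogeboom et al., the noising of a single customer record could look like the code below. The schedule values, function names, and example features are illustrative assumptions, not the authors' implementation, and the learned reverse (denoising) network that actually generates data is omitted.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def gaussian_q_sample(x0, alpha_bar, rng):
    """Numerical features: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1)."""
    return [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
            for x in x0]


def multinomial_q_probs(category, num_classes, alpha_bar):
    """Categorical feature: q(x_t | x_0) = Cat(alpha_bar * onehot(x_0) + (1 - alpha_bar) / K)."""
    return [alpha_bar * (1.0 if k == category else 0.0) + (1.0 - alpha_bar) / num_classes
            for k in range(num_classes)]


def multinomial_q_sample(category, num_classes, alpha_bar, rng):
    """Draw a noised category index from q(x_t | x_0)."""
    probs = multinomial_q_probs(category, num_classes, alpha_bar)
    return rng.choices(range(num_classes), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = linear_alpha_bars(1000)
    numeric = [0.5, -1.2]  # hypothetical standardized numerical features (e.g. tenure, monthly charge)
    contract = 2           # hypothetical categorical feature with 3 classes (e.g. contract type)
    for t in (0, 499, 999):
        ab = alpha_bars[t]
        print(t, round(ab, 5),
              gaussian_q_sample(numeric, ab, rng),
              multinomial_q_sample(contract, 3, ab, rng))
```

As t grows, alpha_bar shrinks toward zero, so numerical features approach pure Gaussian noise and the categorical feature approaches a uniform draw over its K classes; a trained model learns to reverse both processes jointly to produce synthetic records.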
