VOTCL and the Study of Its Application on Cross-Selling Problems

Zhou Guangtong, Yin Yilong, Guo Xinjian, and Dong Cailing

Abstract: Cross-selling is regarded as one of the most promising strategies for generating profit. The authors first describe a typical cross-selling model and then show that class imbalance and cost sensitivity usually co-exist in the data sets collected from this domain. The central task in real-world cross-selling applications is identifying potential cross-selling customers, and prediction performance suffers when class imbalance and cost sensitivity arise simultaneously. To address this problem, an effective method called VOTCL is proposed. In the first stage, VOTCL generates a number of class-balanced training data sets by combining under-sampling and over-sampling techniques; in the second stage, a base learner is trained on each of these data sets; finally, VOTCL combines the base learners into the final decision-making model with an optimal-threshold-based voting scheme. VOTCL is validated on the cross-selling data set provided by the PAKDD 2007 data mining competition, where it achieves an AUC of 0.6037. The ensemble model also outperforms a single base learner, which to some extent demonstrates the efficacy of the proposed optimal-threshold-based voting scheme.
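The three stages described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the base learner, the resampling ratios, or how the "optimal threshold" is chosen, so everything below is an assumption made for illustration, not the authors' implementation: each balanced set keeps every minority example and adds random duplicates (random over-sampling) while drawing a random subset of the majority class (random under-sampling), both to a per-class size midway between the two class counts; a nearest-centroid scorer stands in for the base learner; each learner's threshold is the one maximizing Youden's J (TPR minus FPR) on its own balanced training set; and the ensemble score is the fraction of learners voting positive, which can be ranked to compute AUC. All function names (`balanced_subsets`, `fit_votcl_sketch`, `vote_score`, and so on) are hypothetical.

```python
import random


def balanced_subsets(X, y, k, seed=0):
    """Build k class-balanced training sets (assumed resampling scheme)."""
    rng = random.Random(seed)
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    size = (len(pos) + len(neg)) // 2  # assumed common per-class size

    def resample(group):
        if len(group) >= size:
            return rng.sample(group, size)  # under-sample the larger class
        # over-sample: keep all originals, add random duplicates
        return group + rng.choices(group, k=size - len(group))

    subsets = []
    for _ in range(k):
        p, n = resample(pos), resample(neg)
        subsets.append((p + n, [1] * len(p) + [0] * len(n)))
    return subsets


def centroid(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def train_centroid_scorer(X, y):
    """Hypothetical stand-in base learner: larger score means 'more positive'."""
    mu_pos = centroid([x for x, t in zip(X, y) if t == 1])
    mu_neg = centroid([x for x, t in zip(X, y) if t == 0])
    return lambda x: sq_dist(x, mu_neg) - sq_dist(x, mu_pos)


def optimal_threshold(scores, y):
    """Threshold maximizing Youden's J = TPR - FPR (assumed criterion)."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    best_t, best_j = float("inf"), -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, lab in zip(scores, y) if s >= t and lab == 1)
        fp = sum(1 for s, lab in zip(scores, y) if s >= t and lab == 0)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def fit_votcl_sketch(X, y, k=5, seed=0):
    """Train one thresholded base learner per balanced subset."""
    models = []
    for Xs, ys in balanced_subsets(X, y, k, seed):
        scorer = train_centroid_scorer(Xs, ys)
        threshold = optimal_threshold([scorer(x) for x in Xs], ys)
        models.append((scorer, threshold))
    return models


def vote_score(models, x):
    """Fraction of base learners whose score clears their own threshold."""
    return sum(1 for f, t in models if f(x) >= t) / len(models)


def auc(scores, y):
    """AUC as the Mann-Whitney statistic; ties count as one half."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Synthetic imbalanced data (200 negatives, 20 positives), for illustration only.
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    X += [[rng.gauss(1.5, 1), rng.gauss(1.5, 1)] for _ in range(20)]
    y = [0] * 200 + [1] * 20
    models = fit_votcl_sketch(X, y, k=7)
    print(round(auc([vote_score(models, x) for x in X], y), 3))
```

The usage block scores the same data it was trained on, so it is only a smoke test; a faithful reproduction would tune thresholds on held-out data and evaluate on the separate PAKDD 2007 test set. Turning each learner's continuous score into a thresholded vote is one plausible reading of "optimal-threshold-based voting"; the vote fraction is coarse, so AUC computed on it contains many ties.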
