Abstract:
Cross-selling is regarded as one of the most promising strategies to make profits. The authors first describe a typical cross-selling model, followed by analysis showing that class-imbalance and cost-sensitivity usually co-exist in the data sets collected from this domain. In fact, the central issue in real-world cross-selling applications focuses on the identification of potential cross-selling customers. However, the performance of customer prediction suffers from the problem that class-imbalance and cost-sensitivity are arising simultaneously. To address this problem, an effective method called VOTCL is proposed. In the first stage, VOTCL generates a number of balanced training data sets by combining under-sampling and over-sampling techniques; then a base learner is trained on each of the data set in the second stage; finally, VOTCL obtains the final decision-making model by using an optimal threshold based voting scheme. The effectiveness of VOTCL is validated on the cross-selling data set provided by PAKDD 2007 competition where an AUC value of 0.6037 is achieved by using the proposed method. The ensemble model also outperforms a single base learner, which to some extent shows the efficacy of the proposed optimal threshold based voting scheme.