A support vector machine (SVM) approach to imbalanced datasets of customer responses: comparison with other customer response models

Service Business

Abstract

Customer response is a crucial aspect of service business. The ability to accurately predict which customer profiles are productive has proven invaluable in customer relationship management. One area that has received little attention in the direct marketing literature is the class imbalance problem (the very low response rate). We propose a customer response prediction model that combines recency, frequency, and monetary (RFM) variables with support vector machine (SVM) analysis. We identified three sets of direct marketing data with different degrees of class imbalance (low, moderate, and high) and used random undersampling to reduce the degree of imbalance. We report the empirical results in terms of gain values and prediction accuracy, along with the impact of random undersampling on the performance of the customer response model. We also discuss these results in light of the findings of previous studies and consider the implications for industry practice and future research.
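As a rough illustration of two ideas named in the abstract, the sketch below shows random undersampling of the majority class (non-responders) and a cumulative gain calculation over ranked scores. It is not the authors' procedure: the abstract does not give the sampling ratio, the SVM settings, or the exact gain definition, so the function names, the `ratio` and `seed` parameters, and the toy data are all assumptions made for this example. The scores could come from any classifier, such as an SVM trained on RFM variables.

```python
import random

def random_undersample(rows, labels, ratio=1.0, seed=0):
    """Keep every responder (label 1) and a random subset of non-responders
    (label 0) so that n_majority is about ratio * n_minority.
    `ratio` and `seed` are illustrative parameters, not taken from the paper."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(majority), int(round(ratio * len(minority))))
    kept = sorted(minority + rng.sample(majority, n_keep))
    return [rows[i] for i in kept], [labels[i] for i in kept]

def cumulative_gain(scores, labels, fraction):
    """Share of all responders found in the top `fraction` of customers
    when customers are ranked by descending model score."""
    total = sum(labels)
    if total == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_top = int(round(fraction * len(scores)))
    return sum(labels[i] for i in order[:n_top]) / total

# Toy data (hypothetical): 100 customers, 5 of whom responded.
customers = list(range(100))
responded = [1] * 5 + [0] * 95

balanced_rows, balanced_labels = random_undersample(customers, responded)
print(len(balanced_rows), sum(balanced_labels))

# Hypothetical classifier scores for 10 held-out customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(cumulative_gain(scores, actual, 0.3))
```

Undersampling would normally be applied to the training data only, so that gain and accuracy are measured on a test set that keeps the original response rate.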

Author information

Corresponding author

Correspondence to Bongsug Kevin Chae.

About this article

Cite this article

Kim, G., Chae, B.K. & Olson, D.L. A support vector machine (SVM) approach to imbalanced datasets of customer responses: comparison with other customer response models. Serv Bus 7, 167–182 (2013). https://doi.org/10.1007/s11628-012-0147-9

  • DOI: https://doi.org/10.1007/s11628-012-0147-9
