A clustering-based flexible weighting method in AdaBoost and its application to transaction fraud detection

  • Research Paper
  • Science China Information Sciences

Abstract

AdaBoost is a well-known ensemble learning method that has been applied successfully in many fields. Existing studies show that AdaBoost is sensitive to noisy points, which degrade its classification performance. The main reason is that it increases the weights of all misclassified samples (especially noisy points) in the same way, so the influence of noisy points can hardly be weakened. In this paper, a clustering algorithm is used to identify noisy points dynamically during the iterations. More precisely, in every iteration we compute a misclassification degree for every cluster and use it to decide whether a misclassified sample is a noisy point in that iteration. Furthermore, we propose a flexible method for updating the weights of the misclassified samples. Experimental results on 22 public datasets show that our method achieves better results than state-of-the-art methods including AdaBoost, AdaCost, LogitBoost, and SPLBoost. We also apply our method to transaction fraud detection, and experiments on our large real-world transaction dataset also demonstrate its good performance.
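
The abstract does not state the formulas behind the misclassification degree or the flexible weight update, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes, hypothetically, that the misclassification degree of a cluster is the fraction of its samples that the current weak learner gets wrong, that a misclassified sample in a mostly correct cluster is treated as a suspected noisy point, and that such a point receives a damped weight increase. The names `noise_threshold` and `damping` and their default values are inventions for this example. Cluster labels are taken as given (for instance from k-means).

```python
import math
from collections import defaultdict


def cluster_misclassification_degree(cluster_ids, misclassified):
    """Fraction of misclassified samples per cluster (an assumed definition)."""
    total = defaultdict(int)
    wrong = defaultdict(int)
    for c, m in zip(cluster_ids, misclassified):
        total[c] += 1
        wrong[c] += int(m)
    return {c: wrong[c] / total[c] for c in total}


def update_weights(weights, y_true, y_pred, cluster_ids,
                   noise_threshold=0.5, damping=0.5):
    """One boosting round of weight updates with a cluster-based noise check.

    noise_threshold and damping are hypothetical parameters for this sketch.
    Returns the normalized weights and the weak learner's coefficient alpha.
    """
    misclassified = [t != p for t, p in zip(y_true, y_pred)]

    # Weighted error and coefficient, as in standard discrete AdaBoost.
    err = sum(w for w, m in zip(weights, misclassified) if m) / sum(weights)
    err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
    alpha = 0.5 * math.log((1 - err) / err)

    degree = cluster_misclassification_degree(cluster_ids, misclassified)

    new_weights = []
    for w, m, c in zip(weights, misclassified, cluster_ids):
        if not m:
            factor = math.exp(-alpha)            # correct: weight decreases
        elif degree[c] < noise_threshold:
            factor = math.exp(damping * alpha)   # suspected noise: damped increase
        else:
            factor = math.exp(alpha)             # hard sample: full increase
        new_weights.append(w * factor)

    z = sum(new_weights)
    return [w / z for w in new_weights], alpha


# Toy round: cluster 0 is mostly correct (one error), cluster 1 is mostly wrong.
y_true = [1, 1, 1, 1, 1, -1, -1, -1]
y_pred = [1, 1, 1, 1, -1, 1, 1, -1]
clusters = [0, 0, 0, 0, 0, 1, 1, 1]
weights, alpha = update_weights([1 / 8] * 8, y_true, y_pred, clusters)
print(alpha, weights)
```

In this toy round the lone error in cluster 0 ends up with a smaller weight than the errors in cluster 1, whereas standard AdaBoost would raise all three by the same factor. A different direction for the noise rule or a different damping scheme would fit the abstract equally well.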

Acknowledgements

This work was supported in part by the National Key Research and Development Program of China (Grant No. 2018YFB2100801) and the Fundamental Research Funds for Central Universities of China (Grant No. 22120190198). The authors would like to thank the anonymous reviewers for their constructive comments.

Author information

Corresponding authors

Correspondence to Guanjun Liu or Changjun Jiang.

About this article

Cite this article

Yang, C., Liu, G., Yan, C. et al. A clustering-based flexible weighting method in AdaBoost and its application to transaction fraud detection. Sci. China Inf. Sci. 64, 222101 (2021). https://doi.org/10.1007/s11432-019-2739-2

  • DOI: https://doi.org/10.1007/s11432-019-2739-2
