A clustering-based flexible weighting method in AdaBoost and its application to transaction fraud detection

Yang, Chaofan; Liu, Guanjun; Yan, Chungang; Jiang, Changjun

doi:10.1007/s11432-019-2739-2


Research Paper
Published: 25 November 2021

Volume 64, article number 222101, (2021)
Science China Information Sciences

Chaofan Yang^1,2, Guanjun Liu^1,2, Chungang Yan^1,2 & Changjun Jiang^1,2


Abstract

AdaBoost is a well-known ensemble learning method that has been applied successfully in many fields. Existing studies show that AdaBoost is sensitive to noisy points, which degrades its classification performance. The main reason is that it increases the weights of all misclassified samples (especially noisy points) in the same way, so the influence of noisy points can hardly be weakened. In this paper, a clustering algorithm is used to dynamically identify noisy points during the iterations. More precisely, in every iteration we compute a misclassification degree for every cluster, which is used to decide whether a misclassified sample is a noisy point in that iteration. Furthermore, we propose a flexible method to update the weights of misclassified samples. Experimental results on 22 public datasets show that our method outperforms state-of-the-art methods including AdaBoost, AdaCost, LogitBoost, and SPLBoost. We also apply our method to transaction fraud detection, and experiments on our large real-world transaction dataset also demonstrate its good performance.
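The abstract does not give the paper's formulas, so the following is only a minimal Python sketch of the idea under stated assumptions: binary labels in {+1, -1}, cluster labels precomputed by some clustering algorithm, and a hypothetical rule in which a misclassified sample whose cluster has a low weighted misclassification degree (below a hypothetical `threshold`) is treated as a likely noisy point and has its weight raised by `exp(alpha * degree)` instead of the standard AdaBoost factor `exp(alpha)`. The function names, the degree definition, and the threshold rule are illustrative assumptions, not the authors' method.

```python
import math
from collections import defaultdict


def alpha_from_error(eps):
    # Classic discrete AdaBoost learner weight; eps is clipped to avoid log(0).
    eps = min(max(eps, 1e-10), 1 - 1e-10)
    return 0.5 * math.log((1 - eps) / eps)


def misclassification_degree(weights, wrong, clusters):
    # Hypothetical definition: weighted fraction of misclassified samples per cluster.
    total = defaultdict(float)
    bad = defaultdict(float)
    for w, miss, c in zip(weights, wrong, clusters):
        total[c] += w
        if miss:
            bad[c] += w
    return {c: bad[c] / total[c] for c in total}


def update_weights(weights, y_true, y_pred, clusters, threshold=0.5):
    """One boosting round's weight update (illustrative sketch only).

    weights  -- current sample weights
    y_true   -- true labels in {+1, -1}
    y_pred   -- weak learner predictions in {+1, -1}
    clusters -- precomputed cluster id for each sample (assumption)
    threshold -- hypothetical cut-off on the cluster's misclassification degree
    Returns (normalized new weights, alpha).
    """
    wrong = [yt != yp for yt, yp in zip(y_true, y_pred)]
    eps = sum(w for w, m in zip(weights, wrong) if m) / sum(weights)
    alpha = alpha_from_error(eps)
    degree = misclassification_degree(weights, wrong, clusters)

    new = []
    for w, miss, c in zip(weights, wrong, clusters):
        if not miss:
            # Correctly classified: shrink as in standard AdaBoost.
            new.append(w * math.exp(-alpha))
        elif degree[c] < threshold:
            # Misclassified, but most of its cluster is correct: treat as a
            # likely noisy point and boost it less (scaled by the degree).
            new.append(w * math.exp(alpha * degree[c]))
        else:
            # Misclassified in a genuinely hard region: full AdaBoost boost.
            new.append(w * math.exp(alpha))

    z = sum(new)  # normalization constant
    return [w / z for w in new], alpha


# Usage: cluster 0 has one isolated error, cluster 1 is mostly misclassified.
weights = [1 / 8] * 8
y_true = [1, 1, 1, 1, 1, -1, -1, -1]
y_pred = [-1, 1, 1, 1, 1, 1, 1, -1]
clusters = [0, 0, 0, 0, 0, 1, 1, 1]
new_w, alpha = update_weights(weights, y_true, y_pred, clusters)
```

Setting `threshold=0.0` disables the noise test, so every misclassified sample receives the classic AdaBoost factor `exp(alpha)`; raising it makes the update more tolerant of isolated errors inside otherwise well-classified clusters.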


Acknowledgements

This work was supported in part by the National Key Research and Development Program of China (Grant No. 2018YFB2100801) and the Fundamental Research Funds for the Central Universities of China (Grant No. 22120190198). The authors would like to thank the anonymous reviewers for their constructive comments.

Author information

Authors and Affiliations

Department of Computer Science, Tongji University, Shanghai, 201804, China
Chaofan Yang, Guanjun Liu, Chungang Yan & Changjun Jiang
Shanghai Electronic Transactions and Information Service Collaborative Innovation Center, Tongji University, Shanghai, 201804, China
Chaofan Yang, Guanjun Liu, Chungang Yan & Changjun Jiang


Corresponding authors

Correspondence to Guanjun Liu or Changjun Jiang.


About this article

Cite this article

Yang, C., Liu, G., Yan, C. et al. A clustering-based flexible weighting method in AdaBoost and its application to transaction fraud detection. Sci. China Inf. Sci. 64, 222101 (2021). https://doi.org/10.1007/s11432-019-2739-2


Received: 26 April 2019
Revised: 04 August 2019
Accepted: 26 December 2019
Published: 25 November 2021
DOI: https://doi.org/10.1007/s11432-019-2739-2
