Abstract
Given the rise in loan defaults, especially since the onset of the COVID-19 pandemic, risk management requires predicting whether a customer is likely to default on a loan. This paper proposes an early warning system architecture that uses anomaly detection, motivated by the unbalanced nature of real-world loan default data: most customers do not default on their loans, and only a tiny percentage do. We evaluate how suitable anomaly detection methods are for handling such unbalanced datasets through a comparative study of classification and anomaly detection approaches on a balanced and an unbalanced dataset. The classification algorithms compared are logistic regression and stochastic gradient descent (SGD) classification. The anomaly detection methods are isolation forest and angle-based outlier detection (ABOD). We compare them using standard evaluation metrics: accuracy, precision, recall, F1 score, area under the receiver operating characteristic (ROC) curve, and training and prediction time. The results show that these anomaly detection methods, particularly isolation forest, perform significantly better on unbalanced loan default data and are more suitable for real-world applications.
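To illustrate why the evaluation uses precision, recall, and F1 score alongside accuracy on unbalanced data, the following is a minimal sketch in plain Python. The portfolio size, default rate, and detector behaviour are hypothetical values chosen for illustration and are not results from this paper; `binary_metrics` is an illustrative helper, not part of the proposed architecture.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaults correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # repayments correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed defaults

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical unbalanced portfolio: 1,000 loans, 20 of which default (2%).
y_true = [1] * 20 + [0] * 980

# Trivial baseline that predicts "no default" for every customer.
always_repay = [0] * 1000

# Hypothetical detector: flags 15 of the 20 defaulters and raises
# 30 false alarms among the 980 non-defaulting customers.
detector = [1] * 15 + [0] * 5 + [1] * 30 + [0] * 950

baseline = binary_metrics(y_true, always_repay)
flagged = binary_metrics(y_true, detector)

for name, scores in (("always_repay", baseline), ("detector", flagged)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

In this made-up example the trivial baseline has the higher accuracy even though it never identifies a defaulter, which is why accuracy alone is a poor guide when defaults are rare and recall and F1 score on the default class are reported as well.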
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pirani, R., Kobti, Z. (2023). A Novel System Architecture for Anomaly Detection for Loan Defaults. In: Ossowski, S., Sitek, P., Analide, C., Marreiros, G., Chamoso, P., Rodríguez, S. (eds) Distributed Computing and Artificial Intelligence, 20th International Conference. DCAI 2023. Lecture Notes in Networks and Systems, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-031-38333-5_14
DOI: https://doi.org/10.1007/978-3-031-38333-5_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38332-8
Online ISBN: 978-3-031-38333-5
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)