Fuzzy clustering analysis for the loan audit short texts

Han, Lu; Liu, Zhidong; Qiang, Jipeng; Zhang, Zhuangyi

doi:10.1007/s10115-023-01943-1

Regular paper
Published: 25 July 2023

Knowledge and Information Systems, Volume 65, pages 5331–5351 (2023)

Lu Han¹ (ORCID: orcid.org/0000-0002-1120-3220), Zhidong Liu¹, Jipeng Qiang² & Zhuangyi Zhang¹

Abstract

In China, post-loan management usually takes the form of a visit survey conducted by a credit manager. These quarterly visit surveys produce a large number of loan audit short texts, which contain valuable information for evaluating the credit status of small and micro-enterprises. However, methods for analysing this type of short text remain lacking. This study proposes fuzzy clustering analysis (FCA), a method for processing loan audit short texts. The method first transforms short texts into a fuzzy matrix through lexical analysis; it then calculates the similarity between records based on each fuzzy matrix and constructs an association graph with this similarity. Finally, it uses a Prim minimum spanning tree to extract clusters based on different \(\alpha\)-cuts. Experiments using actual data from a commercial bank in China showed that FCA yields suitable clustering results when handling loan audit short texts and that it outperforms BIRCH, k-means, and fuzzy c-means.
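
The abstract names the stages of the pipeline but not their details, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each record has already been reduced to a vector of term membership degrees in [0, 1], uses a fuzzy Jaccard ratio as a stand-in similarity measure, builds the spanning tree with Prim's algorithm over similarities (a maximum spanning tree, equivalent to a minimum spanning tree over distances 1 - similarity), and reads clusters off the tree by dropping edges below a chosen \(\alpha\). All function names and the sample data are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]


def fuzzy_similarity(a: List[float], b: List[float]) -> float:
    """Fuzzy Jaccard ratio: sum of minima over sum of maxima (assumed measure)."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den > 0 else 0.0


def prim_max_spanning_tree(sim: List[List[float]]) -> List[Edge]:
    """Prim's algorithm on a dense similarity matrix, keeping the strongest links."""
    n = len(sim)
    in_tree = [False] * n
    best = [-1.0] * n      # strongest known link from the tree to each node
    parent = [-1] * n
    if n:
        best[0] = 2.0      # above any similarity, so node 0 is picked first
    edges: List[Edge] = []
    for _ in range(n):
        u = max((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, sim[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and sim[u][v] > best[v]:
                best[v] = sim[u][v]
                parent[v] = u
    return edges


def alpha_cut_clusters(n: int, edges: List[Edge], alpha: float) -> List[List[int]]:
    """Keep tree edges with similarity >= alpha; connected components are clusters."""
    root = list(range(n))

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]   # path halving
            x = root[x]
        return x

    for u, v, w in edges:
        if w >= alpha:
            root[find(u)] = find(v)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical membership vectors for four records over four terms.
records = [
    [1.0, 0.8, 0.0, 0.0],
    [0.9, 0.7, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.1, 0.9, 0.5],
]
sim = [[fuzzy_similarity(a, b) for b in records] for a in records]
tree = prim_max_spanning_tree(sim)

for alpha in (0.0, 0.5, 0.85):
    print(alpha, alpha_cut_clusters(len(records), tree, alpha))
```

Raising \(\alpha\) can only split clusters, never merge them, so sweeping it from 0 to 1 yields a nested hierarchy from a single tree. That is why the spanning tree needs to be built only once.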

Acknowledgements

The work was supported by the National Natural Science Foundation of China (Grant No. 72101279).

Author information

Authors and Affiliations

School of Management Science and Engineering, Central University of Finance and Economics, Beijing, 100081, China
Lu Han, Zhidong Liu & Zhuangyi Zhang
School of Information Engineering, Yangzhou University, Yangzhou, 225127, China
Jipeng Qiang

Contributions

LH and JQ wrote the main manuscript text, ZL proposed the research ideas, and ZZ revised the manuscript and summarized the literature. All authors reviewed the manuscript.

Corresponding author

Correspondence to Jipeng Qiang.

Ethics declarations

Conflict of interest

There are no conflicts of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, L., Liu, Z., Qiang, J. et al. Fuzzy clustering analysis for the loan audit short texts. Knowl Inf Syst 65, 5331–5351 (2023). https://doi.org/10.1007/s10115-023-01943-1

Received: 25 March 2023
Revised: 12 June 2023
Accepted: 10 July 2023
Published: 25 July 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s10115-023-01943-1
