Credit Card Fraud Detection Based on Machine Learning Classification Algorithm

Abstract


A. Introduction
Credit card fraud poses major risks and costs for financial institutions globally, with losses estimated at over $30 billion annually [1]. Traditional rule-based fraud detection systems rely on cumbersome manual rule engineering, which struggles to keep pace with the evolving tactics of sophisticated fraudsters [2]. Moreover, such systems often suffer from unacceptably high false positive rates, which degrade the customer experience [3] [4]. Machine learning has emerged as a promising approach for developing more accurate predictive models that can adapt to changing fraud patterns without extensive manual work. Recent studies have shown that machine learning algorithms such as random forests, neural networks, and ensemble methods achieve high fraud detection performance on credit card transaction data [5] [6]. However, open questions remain about several factors that matter for real-world implementation, including model performance across different environments, explainability of predictions, and suitability for operational use in high-risk financial applications that demand transparency and accountability [7] [8].
This paper presents a rigorous comparative evaluation of popular machine learning algorithms for credit card fraud risk analysis and prediction. Models are trained and tested on a large real-world transaction dataset and objectively assessed on predictive power as well as on issues such as class imbalance handling and interpretability [9] [10]. The most effective and transparent models are identified according to their ability to balance predictive performance with the characteristics required by financial risk analysis systems that demand trusted decision-making [11] [12]. The main contributions of this work are: a performance benchmark of classification algorithms for credit card fraud detection; the identification of suitable machine learning approaches by considering multiple factors beyond predictive accuracy alone; and guidance for stakeholders on responsibly applying advanced analytics to credit risk assessment. The results provide insights for advancing fraud detection capabilities in a manner aligned with the expectations of the financial sector. Figure 1 illustrates the machine learning integration approach for supporting credit card fraud detection.

B. Machine Learning Algorithms
Logistic Regression:
Logistic regression is a statistical model that estimates the probability of a binary outcome (0 or 1) from one or more predictor variables. It applies the logistic sigmoid function to a linear combination of the predictors, mapping it to a value between 0 and 1 that represents the probability of belonging to the positive class. Logistic regression is widely used in many fields, including credit card fraud detection, because of its interpretability and its ability to handle both continuous and categorical predictors [13]. Figure 2 shows an S-shaped curve labeled "Fraud" and a straight line labeled "Non Fraud" [37].

Figure 2. Logistic regression algorithm

Naive Bayes:
Naive Bayes classifiers are a family of simple yet powerful probabilistic classifiers based on Bayes' theorem with the "naive" assumption that the features are conditionally independent given the class. They compute the probability of each class given the feature values and then select the class with the highest probability. Despite this strong independence assumption, Naive Bayes classifiers often perform surprisingly well in practice and are particularly popular for text classification and spam filtering [14] [15]. Figure 3 shows a flowchart of an iterative process that examines each attribute in turn, computes class probabilities from its values, and updates the class assignments until no attributes remain.
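To make the two probabilistic classifiers above concrete, the following is a minimal from-scratch sketch in Python (standard library only). It shows the logistic sigmoid applied to a linear score, and a Gaussian Naive Bayes classifier, which additionally assumes that each feature is normally distributed within a class. The weights, the toy two-feature data, and all function and class names are hypothetical illustrations, not the models used in the cited studies.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid: maps any real z into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_predict_proba(x, weights, bias):
    """P(fraud | x) for a logistic regression model with already-fitted (hypothetical) weights."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


class GaussianNaiveBayes:
    """Assumes features are independent and normally distributed within each class."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.priors, self.means, self.vars = {}, {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            self.priors[c] = len(rows) / len(X)
            self.means[c] = [sum(col) / len(rows) for col in cols]
            # A small epsilon keeps every variance strictly positive.
            self.vars[c] = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                            for col, m in zip(cols, self.means[c])]
        return self

    def log_joint(self, x, c):
        # log P(c) + sum_j log N(x_j; mean_cj, var_cj): the "naive" factorization
        total = math.log(self.priors[c])
        for xj, m, v in zip(x, self.means[c], self.vars[c]):
            total -= 0.5 * math.log(2 * math.pi * v) + (xj - m) ** 2 / (2 * v)
        return total

    def predict(self, x):
        return max(self.classes, key=lambda c: self.log_joint(x, c))


# Toy data: two hypothetical features per transaction; label 1 = fraud.
X = [[1.0, 20.0], [1.2, 22.0], [0.9, 19.0], [8.0, 300.0], [7.5, 280.0]]
y = [0, 0, 0, 1, 1]
nb = GaussianNaiveBayes().fit(X, y)
print(nb.predict([7.8, 290.0]))                    # 1
print(logistic_predict_proba([1.0], [2.0], -2.0))  # 0.5
```

Working in log space avoids underflow when many small per-feature likelihoods are multiplied. In practice a library implementation such as scikit-learn's `LogisticRegression` or `GaussianNB` would normally be used; the sketch only exposes the arithmetic.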

Support Vector Machine (SVM):
SVMs are a class of supervised learning algorithms that can be used for both classification and regression tasks. The key idea behind SVMs is to find the optimal hyperplane that maximizes the margin between the classes in a high-dimensional feature space. This is achieved by transforming the input data using a kernel function and then finding the maximum-margin hyperplane in the transformed space [16] [17]. SVMs are known for their ability to handle high-dimensional data and their effectiveness in dealing with non-linear decision boundaries [18].
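The margin-maximization idea can be sketched with a linear soft-margin SVM trained by Pegasos-style stochastic subgradient descent on the hinge loss. This is an assumed, simplified illustration rather than the method of any cited study: it uses no kernel (a kernel SVM would replace the dot products with a kernel function such as the RBF kernel), it regularizes the bias term for brevity, and the toy data and function names are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    Labels must be -1 or +1. A constant 1.0 feature is appended so the last
    weight acts as a bias (it is regularized too, a simplification)."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]   # shrink: regularizer gradient
            if margin < 1.0:                           # hinge loss is active
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy, linearly separable data (hypothetical): +1 = fraud, -1 = legitimate.
X = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.5], [-2.0, -2.0], [-3.0, -1.0], [-2.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([svm_predict(w, x) for x in X])
```

Only points inside the margin (hinge loss active) move the weights, which is why the final hyperplane depends mainly on the support vectors.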

Random Forest:
Random forests are an ensemble learning method that combines multiple decision trees to improve accuracy and reduce overfitting. Each tree is trained on a bootstrap sample of the instances, a technique called bootstrap aggregating (bagging), and only a random subset of the features is considered when choosing each split [21] [22]. The final prediction is made by aggregating the predictions of all trees, typically by majority vote for classification or by averaging for regression tasks. Random forests are highly effective ...
... as demonstrated by logistic regression and KNN, meeting a critical requirement for payment security. All things considered, this study proved that supervised algorithms are practical for use in credit card fraud detection applications.
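The two sources of randomness in a random forest, bootstrap samples of the instances and random feature subsets, can be illustrated with a deliberately simplified forest of depth-one trees (decision stumps) split by Gini impurity. This is a hypothetical teaching sketch, not a full random forest implementation: because each stump has a single split, choosing the feature subset per tree is equivalent to choosing it per split. Names and toy data are assumptions.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Best single split (feature, threshold) by weighted Gini impurity,
    searching only the randomly chosen feature_ids."""
    best = None
    for f in feature_ids:
        values = sorted(set(x[f] for x in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2                         # midpoint between observed values
            left = [lab for x, lab in zip(X, y) if x[f] <= thr]
            right = [lab for x, lab in zip(X, y) if x[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:                                    # no usable split: constant prediction
        majority = Counter(y).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample (bagging)
        feats = rng.sample(range(d), max_features)      # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, x):
    votes = [left if x[f] <= thr else right for f, thr, left, right in forest]
    return Counter(votes).most_common(1)[0][0]          # majority vote


X = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 8], [9, 8], [8, 9], [9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [1.5, 1.5]), predict_forest(forest, [8.5, 8.5]))
```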
Chen et al. (2020) [32] presented a study on using machine learning techniques to identify credit card fraud, motivated by the considerable financial losses caused by such fraud. Two supervised classification algorithms, logistic regression and K-Nearest Neighbours (KNN), were developed and compared. Earlier related work had studied behavioural analysis, hidden Markov models, and genetic algorithms. The proposed system first preprocessed the transaction data to resolve inconsistencies, then used it to train logistic regression and KNN classifiers to distinguish legitimate from fraudulent transactions. The models' accuracy was assessed for comparison.
Kumar S et al. (2020) [33] assessed naive Bayes, random forest, logistic regression, decision tree, and artificial neural network (ANN) models. An overview of similar past work on supervised and unsupervised fraud detection methods is also included. The research uses a European transaction dataset with over 284,000 records, 492 of which are fraud incidents, and applies oversampling to address the imbalanced data. The models are trained and evaluated on the dataset to determine whether a transaction is legitimate or fraudulent, and their performance is compared using evaluation criteria including accuracy, precision, and recall. The ANN model had the highest accuracy, at 98.69%. The results are presented as confusion matrices.
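The imbalance in the European dataset explains why accuracy alone is a weak criterion and why [33] oversampled the data. The sketch below assumes the commonly cited class counts of that dataset (284,807 transactions, 492 frauds) and shows (a) the accuracy of a trivial model that never predicts fraud and (b) plain random oversampling of the minority class. The function name and toy data are hypothetical, and the exact oversampling method used in [33] may differ.

```python
import random

# Assumed class counts of the widely used European card dataset.
N_TOTAL, N_FRAUD = 284_807, 492

# A model that labels every transaction "legitimate" never detects fraud,
# yet its accuracy equals the share of legitimate transactions.
trivial_accuracy = (N_TOTAL - N_FRAUD) / N_TOTAL
print(f"{trivial_accuracy:.2%}")  # 99.83%


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes have equal size."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_majority = sum(1 for label in y if label != minority)
    extra = [rng.choice(minority_rows) for _ in range(n_majority - len(minority_rows))]
    return X + extra, y + [minority] * len(extra)


X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [5.0], [6.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
print(y_bal.count(0), y_bal.count(1))  # 6 6
```

Oversampling is normally applied only to the training partition, after the train/test split; otherwise duplicated fraud rows can appear in both partitions and inflate the test scores.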
Saragih et al. (2020) [34] studied the use of machine learning techniques to identify credit card fraud. The number of credit card fraud cases has been increasing steadily, causing enormous annual losses. Using a transaction dataset, the study applies algorithms such as artificial neural networks, decision trees, and isolation forests. The isolation forest method achieved 99.87% accuracy in detecting unusual transactions. It is an outlier detection method that isolates observations by repeatedly choosing a random attribute and a random split value, and it works by building multiple isolation trees from randomly chosen data subsets. Anomalies are the transactions that require fewer partitions to isolate, because they lie far from the bulk of the data.
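The isolation forest idea can be sketched in plain Python: each tree recursively picks a random attribute and a random split value, and points that are isolated after fewer splits (a shorter average path length) receive higher anomaly scores. The normalization term and the score 2^(-E[h(x)]/c(n)) follow the usual formulation of the method; the depth limit, sample size, toy data, and names here are assumptions for illustration, not the configuration used in [34].

```python
import math
import random

EULER_GAMMA = 0.5772156649


def path_normalizer(n):
    """c(n): average path length of an unsuccessful binary-search-tree search
    on n points, used to normalize isolation path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(X, depth, max_depth, rng):
    if depth >= max_depth or len(X) <= 1:
        return ("leaf", len(X))
    f = rng.randrange(len(X[0]))                        # random attribute
    lo, hi = min(x[f] for x in X), max(x[f] for x in X)
    if lo == hi:
        return ("leaf", len(X))
    split = rng.uniform(lo, hi)                         # random split value
    left = [x for x in X if x[f] < split]
    right = [x for x in X if x[f] >= split]
    return ("node", f, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    if tree[0] == "leaf":
        return depth + path_normalizer(tree[1])         # account for unsplit leaf size
    _, f, split, left, right = tree
    return path_length(x, left if x[f] < split else right, depth + 1)


def isolation_forest(X, n_trees=100, sample_size=64, seed=0):
    rng = random.Random(seed)
    size = min(sample_size, len(X))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(X, size), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, size


def anomaly_score(x, trees, size):
    """Close to 1 for anomalies; well below 0.5 for normal points."""
    mean_path = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / path_normalizer(size))


data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(255)] + [[8.0, 8.0]]
trees, size = isolation_forest(X)
print(anomaly_score([8.0, 8.0], trees, size), anomaly_score([0.0, 0.0], trees, size))
```

No labels are needed to fit the trees, which is why isolation forests suit settings where confirmed fraud labels are scarce.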
Luo et al. (2020) [35] proposed an intelligent online banking system based on the HERCULES architecture. The paper introduces the new features and challenges of online banking that accompany the development of financial technology and artificial intelligence. It analyses the HERCULES architecture, including its multi-channel access component model and soft load balancing algorithm, and also discusses AOP dynamic module process templates and the business transaction security framework. Machine learning algorithms are applied to key processes such as smart deposits and white-collar loans. An intelligent online banking business model is designed and implemented on the HERCULES architecture.
Visalakshi et al. (2021) [36] focus on identifying credit card fraud in savings accounts based on transactions. In the real world, fraud occurs in both online and offline account transactions, and the rate of fraud incidents has increased exponentially over time. An extensive survey was conducted of the different techniques used to detect fraud in online transactions. Based on the survey, various machine learning algorithms, namely random forest, decision tree, SVM, Gaussian NB and logistic regression, were proposed to detect fraudulent transactions and identify accurate information. The paper applies these algorithms to a credit card transaction dataset to classify fraudulent and valid activities. The modules included data analysis, cleaning, pre-processing, partitioning the dataset for training and testing, and evaluating the results. The highest accuracy achieved was above 90%, and the performance of the algorithms is compared in a graph. Each algorithm was found to work well, with some variation in accuracy. The proposed models can help detect credit card fraud and reduce financial losses. Future work may involve developing an application of this solution and exploring new technologies such as machine learning, AI and deep learning.
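When a heavily imbalanced dataset is partitioned for training and testing, a stratified split keeps the rare fraud class present in both partitions in roughly its original proportion. The sketch below is a hypothetical helper illustrating that idea; the text above does not state how [36] made its split.

```python
import random
from collections import defaultdict


def stratified_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle each class separately so both partitions keep roughly the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx += indices[:n_test]
        train_idx += indices[n_test:]

    def take(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return take(train_idx), take(test_idx)


# 1% fraud rate (hypothetical): 990 legitimate and 10 fraudulent rows.
X = [[float(i)] for i in range(1000)]
y = [0] * 990 + [1] * 10
(X_train, y_train), (X_test, y_test) = stratified_split(X, y)
print(len(y_test), y_test.count(1), y_train.count(1))  # 200 2 8
```

With a purely random split of this data, the test set could easily contain no fraud cases at all, making recall undefined.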
Vuppula et al. (2021) [37] explored the application of machine learning algorithms to financial transaction fraud detection. Credit card fraud is a sizeable problem, causing substantial financial losses for banks and customers. The researchers aimed to develop an advanced model for effectively classifying fraudulent and valid transactions. A large database containing over 1,000,000 real-world credit card transactions was obtained from Kaggle. Data pre-processing methods such as cleaning, normalization and feature selection were applied to prepare the dataset for modelling. Several classification algorithms were then implemented, including Logistic Regression, Decision Tree and Random Forest. Notably, a gradient boosting algorithm called LightGBM achieved the best performance, with over 90% accuracy and a strong AUC score, as shown by the confusion matrix and ROC curves. This confirms LightGBM's strong predictive power for this application.
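Because several of these studies report ROC-AUC alongside accuracy, it may help to recall that AUC equals the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one, with ties counting one half. The pairwise sketch below computes this directly; it is quadratic in the number of examples and meant only for illustration, and the function name and data are hypothetical.

```python
def roc_auc(y_true, scores):
    """Probability that a random fraud case (label 1) outscores a random
    legitimate case (label 0); ties count one half. O(P * N) pairwise version."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Unlike accuracy, AUC does not depend on a single decision threshold, which is one reason it is preferred when fraud is rare.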
Hamal et al. (2021) [38] presented a study that aims to evaluate the effectiveness of machine learning classifiers in detecting financial accounting fraud in small and medium-sized enterprises (SMEs) in Turkey. The dataset consists of financial statements from 341 Turkish SMEs over a five-year period, comprising 1,384 non-fraudulent cases and 321 fraudulent cases identified from 122 companies. The analysis is carried out in two stages. In the first stage, 32 financial ratios are calculated from the financial statements. Feature selection techniques (t-test and genetic search) are used to identify the most important ratios for detecting fraud. Sampling strategies (oversampling and undersampling) are also applied to cope with the class imbalance between fraudulent and non-fraudulent cases. The performance of seven machine learning classifiers (support vector machine, Naive Bayes, artificial neural network, k-nearest neighbour, random forest, logistic regression, and bagging) is then evaluated and compared using several metrics. The random forest classifier without feature selection and with oversampling performs best overall in detecting financial accounting fraud for Turkish SMEs. The study contributes to the literature by applying sampling techniques to address class imbalance and by comparing their effect on fraud detection accuracy. The findings can help banks and other stakeholders improve fraud risk assessment for SMEs.
Izotova et al. (2021) [39] compared Poisson processes and machine learning algorithms for credit card fraud detection. Fraud detection in imbalanced data is difficult because of the rarity of the fraudulent class. First, homogeneous and non-homogeneous Poisson processes are used to model the intensity of fraudulent events over time, and the likelihood function is derived to estimate the intensity parameter. Three Poisson models are examined, with constant, linear and quadratic intensity functions. Second, ensemble machine learning methods, namely LightGBM, XGBoost and CatBoost, are applied; these gradient boosting algorithms build trees sequentially to reduce error. Several strategies address the data imbalance, including setting the intensity of clean customers to 0. The dataset of 94,850 credit card transactions is pre-processed, and key elements such as customer ID, time and label are extracted. The data is split by customer into 80% training and 20% test sets. The Poisson models and ensembles are evaluated on the test data using ROC-AUC. Gradient boosting achieves near-perfect accuracy, while the Poisson models perform moderately better than random. However, the Poisson models require fewer attributes and less computation. Overall, the paper demonstrates two approaches to fraud detection: stochastic processes that model event intensity, and supervised machine learning. Poisson processes offer a simplified detection approach, and when combined with ensembles these methods could lead to more effective fraud detection on financial datasets. The study offers insights into credit card fraud analysis.
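The Poisson modelling step in [39] can be summarized by the standard point-process log-likelihood. The block below is the textbook form, assumed rather than copied from the paper: for N fraud events observed at times t_1, ..., t_N in a window [0, T], the constant-intensity case has a closed-form maximum-likelihood estimate, while the linear and quadratic intensities must be fitted numerically subject to λ(t) ≥ 0.

```latex
% N fraud events observed at times t_1, \dots, t_N in the window [0, T].
% Homogeneous process, constant intensity \lambda:
\log L(\lambda) = N \log \lambda - \lambda T,
\qquad
\frac{\partial \log L}{\partial \lambda} = \frac{N}{\lambda} - T = 0
\;\Longrightarrow\;
\hat{\lambda} = \frac{N}{T}.

% Non-homogeneous process with intensity \lambda(t) \ge 0:
\log L = \sum_{i=1}^{N} \log \lambda(t_i) - \int_{0}^{T} \lambda(s)\,\mathrm{d}s,
\qquad
\lambda(t) = a, \quad \lambda(t) = a + b t, \quad \lambda(t) = a + b t + c t^{2}.
```

Substituting λ(t) = a into the second expression recovers the first, so the constant model is the special case b = c = 0.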
Kute et al. (2021) [40] performed a comprehensive review of the literature on applying machine learning, deep learning, and explainable AI techniques to detect suspicious money laundering transactions. A number of machine learning algorithms and techniques have been studied for anti-money laundering, including decision trees, random forests, neural networks, graph-based techniques, and anomaly detection models. However, many studies lacked focus on data quality and real-world evaluation. Deep learning approaches such as graph convolutional networks and autoencoders have shown promise for analysing financial transaction networks and identifying anomalous patterns, and natural language processing combined with deep learning can incorporate additional context. However, interpretability remains a challenge for many models, such as neural networks, that are treated as "black boxes", and explainable AI has not been widely incorporated to address this for regulatory compliance. Most studies trained and evaluated models on older transaction databases, lacking real-time access to recent, sensitive money laundering cases and to labelled data for supervised learning. Key limitations included the scarcity of labelled data, data quality issues, and the inability to update models of complex financial fraud patterns dynamically over time in an unsupervised manner.
The paper identifies opportunities to use cutting-edge techniques such as reinforcement learning, graph networks, and explanations provided through XAI as areas for future work.
Stojanović et al. (2021) [41] described how machine learning and anomaly detection techniques are increasingly used to detect fraud in fintech domains such as credit cards, financial transactions, and blockchain, because fraud is adaptive and manual detection is inaccurate and inefficient. Techniques applied to credit card fraud include deep learning, clustering, and neural networks; financial transaction fraud is addressed using clustering, graphs and visual analytics; and blockchain fraud detection uses clustering, random forests and isolation forests. The paper evaluates outlier detection techniques such as random forest, isolation forest, and elliptic envelope on real and synthetic financial fraud datasets, and analyses the effectiveness of each approach using metrics such as AUC. Feature engineering and selection play an essential role across domains in addressing challenges such as class imbalance and concept drift over time.
Zhou et al. (2021) [42] proposed an intelligent and distributed Big Data approach for detecting financial fraud on the Internet, aiming to improve the efficiency of fraud detection on large-scale datasets. The approach includes four main modules: data pre-processing, feature extraction from ordinary data, graph embedding using Node2Vec, and a prediction module using a deep neural network classifier. It constructs a network graph from the financial transactions and uses the Node2Vec algorithm to learn topological representations of the nodes as low-dimensional vectors, capturing structural and homophily features of the transaction network. Node2Vec extends DeepWalk by using biased random walks to sample node neighbourhoods.
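The biased random walks that Node2Vec feeds into a skip-gram model can be sketched as follows for an unweighted, undirected graph. The return parameter p and in-out parameter q follow the usual Node2Vec definition; the toy card/merchant graph, the function name, and the parameter values are hypothetical, and the embedding-training step is not shown.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """One biased second-order random walk on an unweighted, undirected graph.

    graph maps each node to the set of its neighbours. p controls the chance of
    stepping straight back; q trades off inward (BFS-like) vs outward (DFS-like) moves."""
    if rng is None:
        rng = random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(graph[cur])
        if not neighbours:
            break                                       # dead end
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))         # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)                 # return to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)                     # stay near the previous node
            else:
                weights.append(1.0 / q)                 # move further away
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Hypothetical transaction graph linking card holders and merchants.
graph = {
    "card_A": {"shop_1", "shop_2"},
    "card_B": {"shop_1"},
    "shop_1": {"card_A", "card_B"},
    "shop_2": {"card_A"},
}
print(node2vec_walk(graph, "card_A", length=8, p=4.0, q=0.5))
```

Many such walks are treated as "sentences" of node IDs; a word2vec-style skip-gram model trained on them yields the low-dimensional node vectors.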
Severino et al. (2021) [43] evaluated fraud prediction in insurance claims using various machine learning models based on real-world data from a major Brazilian insurance company. Nine machine learning algorithms were tested: logistic regression, penalized logistic regression, naive Bayes, k-nearest neighbours, support vector machines with polynomial and Gaussian kernels, a deep neural network, random forest, and a gradient boosting machine. Their average predictive performance was compared over a thousand rounds of training and testing on random subsets of the data, while controlling for type I and type II errors. Ensemble methods such as random forest and gradient boosting yielded the best results. Additionally, interpretable machine learning techniques were used to analyse feature importance and incorrectly predicted observations. The findings provide insights for risk analysts and practitioners in assessing the strengths and weaknesses of various models, in order to build effective decision rules for evaluating future insurance policies.
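Permutation importance is one model-agnostic way to analyse feature importance: shuffle a single feature column and measure how much a performance metric drops. The sketch below uses accuracy and a hypothetical rule-based model; it illustrates the general idea and is not claimed to be the specific interpretability technique used in [43].

```python
import random


def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when a single feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def rule_model(x):
    """Hypothetical fitted model that only looks at feature 0."""
    return 1 if x[0] > 0.5 else 0


X = [[i / 20, (7 * i % 20) / 20] for i in range(20)]
y = [rule_model(x) for x in X]
print(permutation_importance(rule_model, X, y))
```

Feature 1 receives an importance of exactly zero because the model never reads it, while shuffling feature 0 destroys accuracy.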
F. Ferreira et al. (2021) [44] performed feature engineering on a supply chain dataset from DataGo to obtain pre-processed data for modelling. An SVM classification model was built and achieved 98.61% accuracy for fraud prediction, outperforming logistic regression and naive Bayes models. This demonstrated the SVM's ability to classify fraudulent transactions effectively by learning from historical supply chain data. The study highlighted the importance of feature engineering before building supervised learning models for applications such as fraud detection on imbalanced transaction datasets.
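Supply chain records of this kind contain categorical fields that must be converted to numbers before an SVM can be trained. One common option is label encoding, which Dong et al. [48] report using. A minimal sketch follows; the field values and function names are hypothetical.

```python
def fit_label_encoder(values):
    """Assign each distinct category an integer code (sorted, so codes are reproducible)."""
    return {v: code for code, v in enumerate(sorted(set(values)))}


def encode(values, mapping):
    return [mapping[v] for v in values]


shipping_mode = ["Standard Class", "First Class", "Same Day", "Standard Class"]
mapping = fit_label_encoder(shipping_mode)
print(mapping)                          # {'First Class': 0, 'Same Day': 1, 'Standard Class': 2}
print(encode(shipping_mode, mapping))   # [2, 0, 1, 2]
```

Integer codes impose an artificial ordering that linear models and SVMs may treat as meaningful; one-hot encoding avoids this at the cost of more columns.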
El-Bannany et al. (2021) [45] explored how companies can be assessed using machine learning techniques such as support vector machine (SVM), logistic regression (LR), decision tree (DT), and neural network (NN). The data was collected from the UAE Securities and Commodities Authority and covers the period from 2010 to 2018. The results show that SVM performed best, with 89.54% accuracy and a 77.18% F1 score, outperforming the other classifiers. The study highlights the importance of applying machine learning algorithms such as SVM, LR, DT, and NN to mitigate financial risks for businesses.
Tanouz et al. (2021) [46] presented a study that aimed to classify fraudulent and non-fraudulent transactions using algorithms such as logistic regression, random forest and Naive Bayes on an imbalanced credit card transaction dataset. Pre-processing strategies including undersampling, outlier detection and feature removal were applied. The results show that the random forest classifier performed best, with 96.77% accuracy, 100% precision, 91.11% recall and a 95.35% F1 score, outperforming the other models. While all algorithms performed well, the study suggests that better results could be achieved by combining different techniques or by training models on more real-world data.
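The reported metrics are all functions of the confusion matrix. In the sketch below the counts are hypothetical, chosen only because they reproduce the ratios reported for the random forest in [46]; many other count combinations give the same ratios, and the study's actual counts are not stated here. The computation confirms that 100% precision and 91.11% recall imply an F1 score of about 95.35%, since F1 is the harmonic mean of the two.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts (fraud = positive class)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
m = classification_metrics(tp=41, fp=0, fn=4, tn=79)
print({k: round(v, 4) for k, v in m.items()})
# {'accuracy': 0.9677, 'precision': 1.0, 'recall': 0.9111, 'f1': 0.9535}
```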
Tran T et al. (2021) [47] address the issue of imbalanced data in credit card fraud detection using machine learning algorithms. Two resampling strategies, SMOTE and ADASYN, are used to balance the skewed dataset of fraudulent and non-fraudulent transactions. Four machine learning models, namely random forest, k-nearest neighbours, decision tree and logistic regression, are then applied to the resampled datasets and evaluated using several classification performance metrics, including accuracy, precision, recall, F1 score and AUC. The experimental results show that the algorithms detect fraudulent transactions better once the dataset imbalance is handled with resampling, with random forest showing the best overall performance on both the SMOTE- and ADASYN-resampled data. The study demonstrates the potential of resampling techniques and supervised learning for credit card fraud detection on imbalanced real-world transaction data.
Dong et al. (2021) [48] presented a machine learning model based on the support vector machine (SVM) for product fraud detection. They performed feature engineering on supply chain data provided by DataGo, transforming discrete data into continuous numerical variables by label encoding. They compared the SVM classification model with logistic regression and naive Bayes models and found that the SVM model achieved the highest accuracy, 98.61%, in classifying fraudulent product transactions. The authors concluded that their SVM-based model effectively detects fraud in supply chain product transactions, outperforming the other algorithms.
Hao Wang et al. (2021) [49] proposed a product fraud detection model based on the decision tree algorithm to forecast the supply of certain products. They performed feature engineering on a supply chain dataset from DataGo Global, selecting relevant features using information gain. The decision tree model was developed, and its process, including tree generation, pruning, and classification, was explained. Experiments evaluated the model's performance using accuracy as the metric. The decision tree model achieved higher accuracy than logistic regression and support vector machine models on the same dataset, demonstrating its effectiveness for product supply forecasting tasks.
Moreira et al. (2022) [50] conducted an exploratory analysis and implemented machine learning techniques for the predictive assessment of fraud in banking systems. They analysed a database containing over six million financial transaction records from a bank. An exploratory data analysis revealed the main variables influencing fraud evaluation, including binary variables and financial percentages related to fraud losses. To address the imbalance between regular and fraudulent transactions, they employed random undersampling, SMOTE, and ADASYN to balance the dataset. They then trained and tested Logistic Regression, Naive Bayes, KNN, and Perceptron models on the balanced data. The study presented the feasibility of each machine learning model in different fraud detection scenarios and offered final considerations and proposals for future work.
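SMOTE, used in [47] and [50], creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows that core step, drawing a random base point for each synthetic sample, which is a simplification of the original algorithm; ADASYN differs mainly in generating more samples near minority points that are harder to learn. Names and toy data are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples on segments joining a minority point
    to one of its k nearest minority neighbours (the core SMOTE step)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()                              # position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


fraud_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_rows = smote(fraud_rows, n_synthetic=6, k=2)
print(len(new_rows))  # 6
```

Unlike random oversampling, the synthetic rows are new points rather than exact duplicates, which reduces the risk of a model memorizing the few real fraud cases.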
Esenogho et al. (2022) [51] proposed an efficient credit card fraud detection approach using a neural network ensemble classifier and a hybrid data resampling method. They employed an LSTM neural network as the base learner in the AdaBoost ensemble technique. The imbalanced dataset was resampled using the SMOTE-ENN method to create a balanced dataset. The proposed LSTM ensemble outperformed benchmark algorithms such as SVM, MLP, decision tree, and traditional AdaBoost, achieving a sensitivity of 0.996 and a specificity of 0.998 on a real-world credit card transaction dataset. Their experiments demonstrated the effectiveness of the hybrid resampling technique and the LSTM ensemble in improving fraud detection performance on imbalanced data.
Wang et al. (2022) [52] proposed a fraud detection framework integrating quantum machine learning (QML) with quantum annealing solvers to address challenges in online fraud detection, such as real-time detection and highly imbalanced datasets. They implemented a QML system using a Support Vector Machine (SVM) enhanced with quantum capabilities and compared its performance against twelve traditional machine learning algorithms on two datasets: a non-time-series, moderately imbalanced dataset of Israeli credit card transactions, and a time-series, highly imbalanced bank loan dataset. The results showed that the quantum-enhanced SVM outperformed all other algorithms in both speed and accuracy on the highly imbalanced bank loan dataset. However, its detection accuracy was similar to that of traditional algorithms on the moderately imbalanced credit card dataset. Feature selection significantly improved detection speed for most algorithms but only marginally affected accuracy. The findings demonstrate ...
Wu and Du (2022) [53] analysed financial statement fraud detection for Chinese listed companies using deep learning techniques. They proposed a novel multi-dimensional fraud factor index system derived from financial information and managerial comments in annual reports. A Chinese text mining framework was presented for detecting fraud from the Management Discussion and Analysis (MD&A) section using state-of-the-art deep learning models such as LSTM and GRU. About 5,130 annual reports of Chinese listed companies were analysed, combining numerical features from financial statements with textual data. The empirical results suggested the feasibility and effectiveness of the proposed approach, with LSTM and GRU achieving correct classification rates of 94.98% and 94.62%, respectively, on the testing samples, demonstrating the promise of extracted textual features for reinforcing financial fraud detection.
Luo et al. (2023) [54] explore the application of differential privacy algorithms to credit card data across various machine learning algorithms. The study addresses the lack of research on the utility impact of differential privacy on complex credit card datasets. The findings suggest that differential privacy mechanisms such as Laplace, Duchi, and Piecewise can effectively balance data utility and privacy protection. The research emphasizes the importance of selecting the appropriate differential privacy method based on dataset characteristics and the specifics of the machine learning task. Overall, the study highlights the potential of differential privacy for safeguarding user privacy during credit card data analysis, contributing to the fields of financial technology and privacy protection. Its insights are expected to guide future efforts to enhance the security and privacy of data analysis involving sensitive credit card information.
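Of the mechanisms named in [54], the Laplace mechanism is the simplest to state: add noise drawn from a Laplace distribution with scale Δf/ε, where Δf is the query's sensitivity and ε the privacy budget. The sketch below is a hypothetical illustration for a counting query with sensitivity 1; the Duchi and Piecewise mechanisms, which perturb individual numeric values locally, are not shown.

```python
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return true_value plus Laplace(0, sensitivity / epsilon) noise."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and scale `scale`.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


# Hypothetical counting query (e.g. flagged transactions in a batch): sensitivity 1.
rng = random.Random(42)
releases = [laplace_mechanism(100, sensitivity=1.0, epsilon=0.5, rng=rng) for _ in range(20000)]
print(sum(releases) / len(releases))  # close to 100 on average
```

A smaller ε gives a larger noise scale and therefore stronger privacy but lower utility, which is the trade-off the study examines.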
Madhurya et al. (2022) [55] conducted an exploratory analysis of credit card fraud detection using machine learning techniques. They compared the performance of various classifiers, including logistic regression, decision trees, random forests, Naïve Bayes, K-nearest neighbours, and artificial neural networks, in detecting fraudulent credit card transactions. The study found that while logistic regression had higher accuracy, the learning curves indicated that most algorithms underfitted the data, except for K-nearest neighbours (KNN), which exhibited better classification ability for credit card fraud detection.
Hsin et al. (2022) [56] focus on feature engineering and resampling strategies for fund transfer fraud detection. The research emphasizes the importance of handcrafted features and transparent cause-effect relationships for effective prediction outcomes. It addresses the challenges posed by time-inhomogeneous features and the impact of data imbalance on detection performance. By using the Kolmogorov-Smirnov test for feature selection and comparing various resampling methods, such as oversampling and GANs, the research provides insights into improving the robustness and accuracy of fraud detection models.
Wang et al. (2022) [57] proposed a fraud detection framework integrating quantum machine learning for online transactions. The study uses statistical tests to determine data stationarity and applies detrending methods to non-stationary data. The Least Absolute Shrinkage and Selection Operator (LASSO) is used for feature selection to improve the prediction models. Support Vector Machine (SVM) kernel functions are transformed into Quadratic Unconstrained Binary Optimization (QUBO) problems for fraud detection. Two datasets, ICCT and LOAN, are analysed for fraud prediction using SVM-QUBO and traditional machine learning algorithms.
Shahbazi et al. (2022) [58] developed a machine learning-based system for analysing financial risks in the cryptocurrency market. They focused on risk management strategies that use advanced analytics to address the complexities and challenges of the cryptocurrency environment. The study highlighted the importance of machine learning techniques for effective risk mitigation in the volatile cryptocurrency market.
Nguyen et al. (2022) [59] proposed a card fraud detection model based on CatBoost. They used the IEEE-CIS Fraud Detection dataset provided by Vesta Corporation. The key idea was user separation: users were divided into old and new before CatBoost and a DNN were applied to each category, respectively. Various techniques were employed to improve detection accuracy, such as handling imbalanced datasets, feature transformation, and feature engineering. The experimental results showed that the model performed well, obtaining AUC scores of 0.97 for CatBoost and 0.84 for the DNN.
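The Kolmogorov-Smirnov feature screening described for Hsin et al. [56] can be sketched as computing, for each feature, the largest gap between the empirical distributions of that feature in fraudulent and legitimate transactions, and then ranking the features by that statistic. Function names and toy data are hypothetical; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used, and it also returns a p-value.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:   # advance past ties in a
            i += 1
        while j < len(b) and b[j] <= x:   # advance past ties in b
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def rank_features_by_ks(X, y):
    """Order feature indices by how differently they are distributed in fraud (1) vs legitimate (0) rows."""
    scores = {}
    for f in range(len(X[0])):
        fraud = [x[f] for x, t in zip(X, y) if t == 1]
        legit = [x[f] for x, t in zip(X, y) if t == 0]
        scores[f] = ks_statistic(fraud, legit)
    return sorted(scores, key=scores.get, reverse=True)


X = [[0.0, 5.0], [1.0, 5.0], [9.0, 5.0], [10.0, 5.0]]
y = [0, 0, 1, 1]
print(rank_features_by_ks(X, y))  # [0, 1]
```

Because the statistic compares whole distributions rather than means, it can also be recomputed over time windows to flag features whose behaviour drifts, which relates to the time-inhomogeneity concern raised in [56].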
Arora et al. (2021) [60] conducted a study to predict credit card defaults through data analysis and machine learning techniques. They analysed over 10 million records from the Bank of Taiwan. Using logistic regression, they explored the relationship between the class variable and the independent variables. They performed exploratory data analysis and applied various machine learning algorithms, including Random Forest, Support Vector Machine (SVM), Logistic Regression, Naive Bayes, and K-Nearest Neighbours.

D. Related Work Summary Table

E. Discussion
The table summarizes several studies applying machine learning techniques to fraud detection across different domains and datasets, including credit card transactions, financial transactions, product sales, cryptocurrency trading, and network traffic data. A variety of algorithms were evaluated, spanning supervised, unsupervised, and ensemble approaches, and the techniques employed include regression, classification, clustering, ensemble methods, and reinforcement learning. Most studies focus on achieving the highest possible fraud detection accuracy using a variety of traditional and modern machine learning algorithms. Studies [31], [32], [36], [42], [50] achieved approximately 99% accuracy using support vector machines, decision trees, and XGBoost on various data. However, other studies, such as [45] and [52], focused on improving performance measures beyond accuracy, such as sensitivity.
Several studies have addressed class imbalance in fraud data, where fraud cases are far fewer than legitimate ones. Studies [33], [34], and [54] tackled this challenge using techniques such as SMOTE, resampling, and synthetic data. Studies [46] and [47] took a different approach, combining unsupervised anomaly detection with supervised learning. Some studies, such as [39], [45], and [57], applied steps such as feature selection and hyperparameter tuning to obtain the best performance, while others, such as [43] and [58], compared several algorithms on the same dataset. Studies [40], [41], and [51] explored advanced techniques such as reinforcement learning, quantum learning, and synthetic data generation on different types of data, such as banking and cryptocurrency data; however, some of these approaches did not outperform traditional methods. While most studies used publicly available or restricted datasets, some, such as [48], tested their methods on private data to obtain more realistic results. Notably, most studies evaluated performance only on static test sets, whereas studies such as [56] discussed the need for adaptive learning and faster response mechanisms to cope with evolving fraud patterns. In general, hybrid methods that combine multiple techniques, such as [59], have shown promising results.
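Since several of the surveyed studies rely on SMOTE, a minimal pure-Python version of its core step may help: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The Euclidean distance, the default k = 5, and the function names are assumptions that follow the usual description of SMOTE; production work would normally use a maintained implementation such as the one in imbalanced-learn.

```python
import math
import random
from typing import List, Optional

Point = List[float]


def k_nearest(points: List[Point], idx: int, k: int) -> List[int]:
    """Indices of the k nearest minority points to points[idx] (Euclidean), excluding itself."""
    dists = [(math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx]
    dists.sort()
    return [j for _, j in dists[:k]]


def smote(minority: List[Point], n_synthetic: int, k: int = 5, seed: Optional[int] = None) -> List[Point]:
    """Generate n_synthetic samples by interpolating between minority points and their neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # pick a random minority sample
        j = rng.choice(neighbours[i])     # and one of its nearest minority neighbours
        gap = rng.random()                # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


# Hypothetical fraud-class samples with two features each.
fraud = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(fraud, n_synthetic=6, k=2, seed=0)
print(len(new_points))  # 6
```

Because new points are interpolated rather than copied, the minority class is enlarged without simply duplicating existing fraud records, which reduces (but does not remove) the risk of overfitting to them.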

F. Conclusion and future directions
In conclusion, effective risk analysis and prediction models are crucial for credit card issuers and financial institutions to mitigate losses from delinquencies and defaults. This comprehensive review has examined the statistical, machine learning, and hybrid techniques employed for credit risk modelling. Traditional statistical methods such as logistic regression and discriminant analysis have been widely used, but their linear assumptions and inability to capture complex non-linear patterns limit their predictive power. Machine learning algorithms such as decision trees, random forests, and neural networks have demonstrated superior performance by automatically learning intricate relationships from large datasets, and ensemble and hybrid models that combine multiple techniques have further improved predictive accuracy. Key factors influencing credit risk include applicant characteristics (e.g., income, debt, employment), credit history, macroeconomic conditions, and behavioural data from credit card usage patterns. Incorporating diverse and relevant features is essential for building robust predictive models, and advanced feature engineering and selection methods help identify the most informative predictors. However, challenges remain, including class imbalance, missing data, concept drift over time, and ethical concerns around bias and discrimination. Explainable AI techniques that provide insight into model decisions are increasingly important for transparency and fairness in credit scoring. Looking ahead, integrating alternative data sources (social media, digital footprints) and more sophisticated deep learning architectures holds promise for further enhancing risk prediction. Continuous model monitoring and recalibration will be necessary to adapt to evolving consumer behaviour and market dynamics. Finally, interdisciplinary collaboration between data scientists, risk analysts, and domain experts is vital for developing practical and trustworthy credit risk solutions.

Figure 1. ML integration for credit card fraud detection

Figure 4. The concept of support vectors in data classification: data points belonging to two different classes are shown, and a separating line with a margin is drawn between them to determine the classification.
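For a linear SVM, the separating line is the hyperplane w·x + b = 0, the margin width is 2/‖w‖, and the support vectors are the training points lying on the margin boundaries, where y(w·x + b) = 1. The sketch below checks these quantities for a hand-picked hyperplane and a few toy points; the values of w and b are assumptions chosen for illustration, not the output of an SVM solver.

```python
import math
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label in {-1, +1})


def functional_margin(w: List[float], b: float, x: List[float], y: int) -> float:
    """y * (w . x + b); a value >= 1 means the point is correctly classified and on or outside the margin."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def margin_width(w: List[float]) -> float:
    """Distance between the two margin lines w . x + b = +1 and w . x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def support_vectors(w: List[float], b: float, data: List[Sample], tol: float = 1e-9) -> List[List[float]]:
    """Points lying exactly on a margin boundary (functional margin equal to 1)."""
    return [x for x, y in data if abs(functional_margin(w, b, x, y) - 1.0) <= tol]


# Hypothetical hyperplane x1 + x2 - 3 = 0 and a small labelled set.
w, b = [1.0, 1.0], -3.0
data = [([1.0, 1.0], -1), ([0.0, 0.5], -1), ([2.0, 2.0], 1), ([3.0, 2.5], 1)]

print(margin_width(w))              # about 1.414, i.e. 2 / sqrt(2)
print(support_vectors(w, b, data))  # [[1.0, 1.0], [2.0, 2.0]]
```

Only the two points on the margin lines determine the boundary here; moving the other points further away would not change it, which is the property that gives support vectors their name.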

Figure 4. Support vector algorithm [10].

Decision Tree: Decision trees are tree-like models in which each internal node tests a feature, each branch represents a decision rule, and each leaf node holds a class label or a numerical value. They work by recursively partitioning the input space based on feature values, creating a hierarchical structure of decisions. Decision trees are easy to interpret, can handle both numerical and categorical data, and are relatively robust to outliers and noise [19][20].

Figure 5. Diagram of a decision tree consisting of a root node, sub-decision nodes, and leaf nodes representing the final outcomes or class labels.
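The recursive partitioning described for decision trees can be sketched as a small CART-style tree that, at each node, picks the feature and threshold with the lowest weighted Gini impurity, and stops at a maximum depth or when no split improves purity. The Gini criterion, the depth limit, the toy features, and all function names are assumptions for illustration; the text does not fix a splitting criterion, and this sketch handles numerical features only.

```python
from collections import Counter
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X: List[List[float]], y: List[int]) -> Optional[Tuple[int, float]]:
    """Return (feature index, threshold) with the lowest weighted Gini impurity, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Leaves are class labels; internal nodes are (feature, threshold, left, right)."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # majority-class leaf
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))


def predict(node, x):
    """Follow decision rules from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


# Hypothetical features: [amount in $1000s, hour of day]; label 1 = fraud.
X = [[0.1, 14], [0.3, 10], [2.5, 3], [0.2, 2], [3.0, 4], [0.4, 16]]
y = [0, 0, 1, 0, 1, 0]
tree = build_tree(X, y)
print(tree)                     # (0, 0.4, 0, 1)
print(predict(tree, [2.8, 1]))  # 1
```

The learned tree reads directly as a rule ("if amount is at most 0.4, predict legitimate; otherwise predict fraud"), which illustrates why decision trees are considered easy to interpret.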

Table 1. Summary of related work on credit card fraud detection using ML