Facilitating User Authorization from Imbalanced Data Logs of Credit Cards Using Artificial Intelligence

An effective machine learning implementation means that artificial intelligence has tremendous potential to help and automate financial threat assessment for commercial firms and credit agencies. The scope of this study is to build a predictive framework that helps the credit bureau by modelling and assessing credit card delinquency risk. Machine learning enables risk assessment by predicting deception in large imbalanced data, classifying each transaction as normal or fraudulent. In the case of a fraudulent transaction, an alert can be sent to the related financial organization, which can then suspend the release of payment for that particular transaction. Of all the machine learning models evaluated, such as RUSBoost, decision tree, logistic regression, multilayer perceptron, K-nearest neighbor, random forest, and support vector machine, the overall predictive performance of the customized RUSBoost is the most impressive. The evaluation metrics used in the experimentation are sensitivity, specificity, precision, F scores, and the areas under the receiver operating characteristic and precision recall curves. The datasets used for training and testing of the models have been taken from kaggle.com.


Introduction
For this study, the term "credit" refers to a method of e-commerce without having funds. A credit card is a thin, rectangular metal or plastic card provided by a banking institution, allowing card users to borrow cash to pay for products and services. Credit cards require cardholders to repay the financial leverage, interest payments, and any other fees decided from time to time. The credit card issuer often offers its customers a line of credit (LOC), allowing them to make cash withdrawals on credit. Issuers usually preset lending thresholds depending on specific creditworthiness [1,2]. The use of credit cards is vital these days, and it plays a significant role in e-commerce and online funds transfer [3,4]. The ever-increasing use of credit cards has posed many threats to the users and the companies issuing such cards. Fraudsters keep on finding new ways to commit cheating, which can cause considerable losses to card users and these companies as well [5,6].

Credit Card Payment Processing
Steps. Figure 1 illustrates how payments are transferred to the vendor's bank account whenever a client makes a purchase through a credit card [7]:
(a) A client initiates a credit card purchase via Internet of Things- (IoT-) enabled swipe devices/POS/online sites.
(b) The payment gateway collects and transfers the transaction details safely to the merchant's bank computer-based controller system.
(c) The bank processor forwards the verification (i.e., processing, clearing, and settlement) process to the Credit Card Interchange (CCI).
(d) The CCI transfers the transaction to the client's credit card provider.
(e) The card provider accepts or rejects the purchase based on the current funds in the client's account and passes the transaction information back to the CCI.
(f) The CCI transmits the transaction information to the vendor's bank computer-based controller system.
(g) The controller system of the vendor's bank transmits the transaction details further to the payment gateway.
(h) The payment gateway keeps and delivers the transaction details to the vendor and/or client.
(i) The CCI transfers the required funds to the vendor's bank, which further transfers the funds into the merchant's account [7].

Fraud in Credit Card Transaction
Fraud and illegal behavior have various perspectives. The Association of Certified Fraud Examiners (ACFE) is a professional fraud examiner organization. Its activities include producing information, forming tools, and imparting training to avoid fraud. The ACFE has defined "fraud" as the use of one's profession for self-benefit via deliberate misapplication or misuse of the organization's assets [3]. A fraud is committed with the chief intention of acquiring access by illegal means. It adversely affects economic growth, governance, and even fundamental social values. Any technical infrastructure involving money and resources can be breached by unethical practices, e.g., auction site systems, medical insurance, vehicle insurance, credit cards, and banking.
Cheating in these applications is perceived as cyber crime, potentially causing significant economic losses [3,8].
Fraud can lower trust in the industry, disturb the economic system, and significantly impact the overall cost of living [9,10]. IoT-enabled systems maintain a trace of their operational activities, which can be beneficial for analyzing specific patterns. The previous methods based on manual processing, such as auditing, were cumbersome and ineffective due to the large size of the data or its attributes. Data mining techniques are considered effective in assessing small outliers in large datasets [9,11,12]. Frauds lead to heavy business losses. Credit card frauds contribute hundreds of millions of dollars per year in lost revenue, and some estimates have indicated that the cumulative annual cost in the US could surpass $400 billion [9].

Types of Credit Card-Related Frauds.
The advancements in technology such as the Internet and mobile devices have contributed to increased fraudulent activities in recent times [13]. Fraudsters keep on finding new techniques, and therefore, monitoring systems are required to evolve correspondingly. Frauds related to credit cards can be broadly categorized into offline and online frauds [14]: (i) Offline credit card fraud occurs whenever fraudsters steal a credit card and use it as the rightful owner would at outlets. This is unusual, as financial firms will promptly block a missing card whenever the cardholder suspects theft [3]. (ii) Online credit card frauds are more common and serious compared with offline frauds; here, credit card details are compromised by fraudsters through phishing, website cloning, and skimming and are then used in digital transactions [3,15].
Global connectivity through new and advanced technologies has exponentially increased credit card fraud.
Thus, the issue has acquired an alarming dimension in the present scenario, and a suitable system needs to be developed for detecting and avoiding such frauds.

Fraud Prevention System (FPS)
FPS is the first form of defense for technological systems against forgery. The aim of this phase is to suppress fraud in the first place. The techniques in this phase prohibit, destroy, and respond to cyber attacks on computer servers (software and hardware), networks, or data; examples include encryption algorithms to protect data and firewalls to isolate inner private networks from the outside world [3,16].

Fraud Detection System (FDS)
FDS is the next safety measure, spotting and recognizing fraudulent practices when they reach the networks and notifying a network administrator about them [17]. Earlier, manual auditing methods such as discovery sampling were used to detect any such fraud [18]. This method had to cope with different environmental, political, legal, and business practices. To improve detection efficiency, computerized and automatic FDSs were developed. FDS capabilities have been constrained, however, as identification is primarily based on predefined rules set by experts. Different data mining approaches are being developed to detect frauds effectively. Oddity or outlier identification in FDS depends on behavioral profiling methods that model the pattern of behavior for every entity and assess any divergence from the normal [19]. Many authors have adopted anomaly-based FDSs in different areas of fraud detection [20][21][22][23].

Distributed Deployment of Security-Related Aspects.
Financial firms have acknowledged that the deployment of isolated control systems on solo delivery channels no longer provides the requisite degree of vigilance against illegal account operation. An additional layer of security, i.e., "Fraud Management," enhances robustness by combining with the security protocols at the level of the standard channel [24]. The implemented fraud detection strategy can be classified as reactive or proactive, depending on the point at which data analysis is applied in the transaction flow. In reactive fraud management, fraud identification approaches derived from data processing, neural networks, and/or various deep learning algorithms conduct sophisticated model processing over collected datasets to identify suspect transfers. In proactive fraud management, newly arrived operations are evaluated "on the fly" before proper authorization and finalization, allowing the detection of unusual occurrences prior to any movement of financial value. Proactive fraud detection is accomplished by relocating the inherent security so that real-time scanning occurs prior to completion of the transaction. In reactive fraud management, statistical analysis and data mining-related approaches have been applied to labeled posttransactional data to derive common traits correlated with suspicious occurrences.

Data Imbalance Is a Major Concern.
Skewed distribution is regarded as one of the chief sensitive problems of FDS [3]. Usually, the skewed data problem is the scenario where there are far fewer instances of fraudulent cases than of normal ones [25], making it difficult for learners to uncover trends in the minority class data [26]. Moreover, class imbalance has a significant influence on the efficiency of classification models: performance tends to be dominated by the majority class labels, thereby ignoring the minority class. As shown in Figure 2, data-balancing methods can be divided into two subcategories, viz., data level methods and algorithmic level methods [27].
Data Level Methods. Such methods serve as preprocessing to rebalance the collected data before applying the classification algorithms. Many investigators have used balancing methods, viz., undersampling or oversampling, in FDS-related studies [3]. In undersampling, a portion of the dominant class's data is eliminated [28]. A broad range of FDSs have used the undersampling technique to equalize training samples. The oversampling method duplicates minority class data samples. The oversampling technique is not frequently used because it induces overfitting of a model, especially for noisy data [29]. The synthetic minority oversampling technique (SMOTE) [30] is being used for fraud detection and is considered a superior complement to its current peers. SMOTE synthesizes new minority instances by interpolating between existing minority samples. Investigators [31] have conducted many simulations using various data level methods (SMOTE and EasyEnsemble) to identify the most suitable credit card FDS [3].
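The undersampling step described above can be sketched in a few lines. This is an illustrative toy implementation, not the paper's code; the function name, the 1.0 ratio default, and the fixed seed are assumptions made here for demonstration:

```python
import random

def random_undersample(X, y, majority_label=0, ratio=1.0, seed=42):
    """Drop majority-class rows at random until the majority:minority
    ratio reaches `ratio` (1.0 -> a 50:50 class distribution)."""
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]
    keep = rng.sample(majority, min(len(majority), int(len(minority) * ratio)))
    idx = sorted(keep + minority)
    return [X[i] for i in idx], [y[i] for i in idx]

# 97 legitimate vs 3 fraudulent rows -> balanced 3 vs 3 after sampling
X = [[i] for i in range(100)]
y = [0] * 97 + [1] * 3
Xb, yb = random_undersample(X, y)
```

Oversampling would instead duplicate (or, as in SMOTE, interpolate) minority rows; the undersampling variant above simply discards majority rows at random, as the text describes.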

Algorithmic Level Methods.
In this category, classifiers are used to detect the suspicious class in a sample dataset. The algorithmic level approach uses cost-sensitive learning (CSL) to counter unequal class distribution. CSL assigns a cost to misclassifying the various classes by presuming that a cost matrix is available for the various errors. The cost matrix structure is built around these observations: false negatives/positives and true negatives/positives [32]. Another algorithmic approach followed in the FDS literature is to use learners that can themselves manage an imbalanced distribution. Such learners are either immune to class inequality through their intrinsic characteristics, as with Repeated Incremental Pruning to Produce Error Reduction (RIPPER) [33], or are reinforced against the issue by intrinsic alterations [3].
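The cost-sensitive idea can be illustrated with a minimal decision rule that picks the label with the lower expected cost rather than thresholding the model's probability at 0.5. The cost values below (a false negative 50× costlier than a false positive) are hypothetical, not taken from the paper:

```python
def expected_costs(prob_fraud, cost_fn=50.0, cost_fp=1.0):
    """Expected cost of each decision given the model's estimated fraud
    probability; the cost matrix assigns 0 to correct predictions."""
    cost_if_label_legit = prob_fraud * cost_fn          # risk: false negative
    cost_if_label_fraud = (1.0 - prob_fraud) * cost_fp  # risk: false positive
    return cost_if_label_legit, cost_if_label_fraud

def cost_sensitive_label(prob_fraud, cost_fn=50.0, cost_fp=1.0):
    """Return 1 (fraud) when flagging has the lower expected cost."""
    c_legit, c_fraud = expected_costs(prob_fraud, cost_fn, cost_fp)
    return 1 if c_fraud < c_legit else 0
```

With these costs, even a 5% fraud probability is enough to trigger the fraud label, which is exactly how CSL biases a classifier toward the expensive-to-miss minority class.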
Falsified transactions form a narrow percentage of the overall dataset, which may hinder the efficiency of an FDS. In credit card systems, misclassifying legitimate transactions causes dissatisfied customers, which is itself regarded as more detrimental than the fraud. As mentioned above, two approaches, viz., algorithmic level and data level, have been used to fix class imbalances. Researchers [34][35][36][37][38] have used undersampling techniques while dealing with the concern of class skewness in credit card FDS. However, Stolfo et al. [26] have used the oversampling method in the preprocessing stage of a credit card FDS.
On the contrary, an algorithmic level approach has been followed using cost-sensitive learning techniques or by using the learner itself to manage uneven distribution. Sahin et al. [39] have used cost-sensitive classifiers to address the class imbalance. Dorronsoro et al. [21] have used nonlinear discriminant analysis (NLDA) neural models to tackle class imbalance. Ju and Lu [40] have used an enhanced imbalance class weighted support vector machine (ICW-SVM) to handle the skewness of the dataset. Bentley et al. [41] have given a fraud density map to enhance detection accuracy. In a study by Pozzolo et al. [42], the authors have suggested a race model to choose the right approach for an imbalanced dataset. Chen [28] has used the binary support vector system (BSVS) and a genetic algorithm (GA) to achieve higher prediction accuracy from imbalanced inputs. Minegishi and Niimi [43] have suggested the creation of a very fast decision tree (VFDT) learner, which could be tailored for extremely unbalanced datasets. Seeja and Zareapoor [44] have proposed FraudMiner for managing class imbalance via explicitly entering unbalanced data to the classification model. G.C. de Sá et al. have customized the Bayesian network classifier (BNC) algorithm for credit card fraud detection [45]. Husejinovic has introduced a methodology to detect credit card fraud using naive Bayes and C4.5 decision tree classifiers [46]. Arya et al. have proposed deep ensemble learning to identify fraud cases in real-time data streams. The proposed model is capable of adapting to data imbalance and is robust to innate transaction patterns such as purchasing behavior [4].

Scope of the Study
This manuscript explores the concern of classifying imbalanced data by merging data level and algorithm level techniques to detect fraudsters from the log files generated for credit cards used at IoT-enabled terminals. Furthermore, an appropriate alert message can be sent to either the credit card holder or the issuer for reverting/blocking the transaction. Here, the random undersampling (RUS) approach has been deployed at the data level and boosting at the algorithmic level. The merger of these two components is RUSBoost [47]. RUS is a data sampling technique that aims to mitigate class inequality by modifying the training dataset's class distribution: it eliminates instances from the majority class completely at random until a reasonable class distribution is reached [48,49]. The boosting method improves the classification precision of weak classifiers by combining weak hypotheses. Initially, all training dataset examples are given equal weights. The base learner forms a weak hypothesis during each iteration of adaptive boosting (AdaBoost). Boosting is said to be adaptive since weak learners are subsequently tweaked in favor of cases that were not classified correctly by former classifiers. The error connected with the hypothesis is determined, and the weight of each instance is modified in such a manner that incorrectly classified cases gain weight, whereas correctly classified samples lose weight. Thus, successive boosting steps produce hypotheses that are able to correctly classify the previously mislabeled instances. After all iterations, a weighted vote is used to assign a class to the samples in the dataset [48]. RUSBoost is less costly than oversampling-and-bagging approaches (such as SMOTEBagging) when used for classification.
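As a rough illustration of how random undersampling combines with AdaBoost-style weight updates, here is a simplified, self-contained sketch using a decision stump as the weak learner. It omits the paper's SVM bagging add-on, and all names and the stump learner are illustrative choices under stated assumptions, not the authors' implementation:

```python
import math
import random

def stump_train(X, y, w):
    """Weak learner: the single-feature threshold rule with the lowest
    weighted error on the given (weighted) sample."""
    best = (float("inf"), 0, 0.0, 1)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (1 if pol * (xi[f] - t) >= 0 else 0) != yi)
                if err < best[0]:
                    best = (err, f, t, pol)
    _, f, t, pol = best
    return lambda x: 1 if pol * (x[f] - t) >= 0 else 0

def rusboost(X, y, rounds=5, seed=0):
    """RUS inside each boosting round; error and weight updates are
    computed on the full, original training set."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n                      # step 1: uniform weights 1/x
    ensemble = []
    for _ in range(rounds):
        # step 2a: balance classes in the temporary training set via RUS
        maj = [i for i in range(n) if y[i] == 0]
        mino = [i for i in range(n) if y[i] == 1]
        idx = rng.sample(maj, len(mino)) + mino
        h = stump_train([X[i] for i in idx], [y[i] for i in idx],
                        [w[i] for i in idx])
        # steps 2f-2g: loss on the FULL set, then the weight-update parameter
        eps = sum(w[i] for i in range(n) if h(X[i]) != y[i])
        eps = min(max(eps, 1e-10), 1 - 1e-10)
        alpha = eps / (1 - eps)
        # step 2h: shrink weights of correctly classified instances
        for i in range(n):
            if h(X[i]) == y[i]:
                w[i] *= alpha
        s = sum(w)                         # step 2i: normalize
        w = [wi / s for wi in w]
        ensemble.append((math.log(1 / alpha), h))
    def predict(x):
        score = sum(a for a, h in ensemble if h(x) == 1)
        total = sum(a for a, h in ensemble)
        return 1 if score >= total / 2 else 0
    return predict

# Imbalanced, separable toy data: 15 legitimate rows, 5 fraudulent rows
X = [[i] for i in range(20)]
y = [1 if i >= 15 else 0 for i in range(20)]
model = rusboost(X, y)
```

The key design point mirrored from the text: undersampling only shapes each round's training subset, while the boosting bookkeeping (pseudo loss, weight updates, normalization) always runs over the full imbalanced dataset.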
Figure 3 highlights the various phases, taking credit card transactional logs (an imbalanced dataset) as input and giving an alert to the bank or the credit card holder regarding the status of the transactions performed at some IoT-based terminals. Figure 3 shows that the customized RUSBoost (CtRUSBoost) is applied to the credit card transactional logs and outputs the status of the transaction. Here, the approach constitutes random undersampling and boosting using a decision tree as per the normal RUSBoost algorithm, with a further add-on/customization of a bagging process using SVM. CtRUSBoost can be deployed at the stage of either the Credit Card Interchange or the Credit Card Provider Computer Controller System (as shown in Figure 1), and from these controlling systems, an alert message can be escalated for suspending or stopping the financial transaction. The various symbolic notations used in the proposed algorithm CtRUSBoost are defined in Table 1. The RUSBoost given by Seiffert et al. [48,49] has been modified by the authors in this research work. The rounded rectangles at steps 2d, 2e, 3a, 3b, and 4 show the customization proposed by the authors, which has resulted in comparatively better outcomes. In step 1, the weight of each sample is initialized to 1/x, where x is the total number of instances in the training dataset. The weak hypotheses, viz., DT and SVM, are iteratively trained in steps 2a-2i. In step 2a, random undersampling is applied to remove majority class instances until the required minority class proportion is reached in the current (temporary) training dataset SEG′z. For example, if the required class proportion is 50:50, then majority class instances are randomly excluded until the majority and minority class instances are comparable. Therefore, SEG′z will have a new weight distribution DIS′z.

Methodology
Step 2b passes SEG′z and DIS′z to the decision tree, generating the weak hypothesis hz (step 2c). In step 2d, a support vector machine is employed to compute the weak hypothesis hsvm_z in step 2e. The pseudo loss εz (based on SEG and DISz) is determined in step 2f.

Figure 2: Various techniques of handling the concern related to data imbalance (data level methods: undersampling and oversampling; algorithmic level methods).
In step 2f, the hypothesis values are considered only for those tuples where there is a misclassification. Here, in the subexpression qk ≠ q, qk denotes the original label/class of the kth row/tuple in the dataset, and q is the label/class obtained after deploying the weak learner decision tree. The subexpression hz(pk, qk) is the numeric confidence value in the zth iteration for the instance pk with label qk, and hz(pk, q) is the numeric confidence value in the same zth iteration for the same instance pk, where the label is mismatched and obtained as q instead of qk. In step 2g, the parameter αz is computed as εz/(1 − εz), which symbolizes the weight update. In step 2h, the weight distribution is updated to DISz+1.
Step 2i normalizes the value computed in the previous step. After the completion of Z iterations, in step 3a, the maximum value of hz is computed among the ones given by the decision tree under boosting. Here, the knowledge learned from the previous dataset segment is used to obtain the hypothesis value for the next dataset segment; however, in the last step, not all the results are merged to obtain the final one. Instead, the final hypothesis value is obtained from the last dataset segment. In step 3b, the hypothesis values obtained by employing SVM for each dataset segment in the Z iterations are finalized by performing voting or averaging among all the values of hsvm_z. In step 4, the final hypothesis H(p) is computed as the maximum of the values obtained for hz and hsvm_z.

Results and Experiment
The results obtained using the three different datasets, viz., (i) the Abstract Dataset for Credit Card Fraud Detection [50], (ii) the Default of Credit Card Client Dataset [51], and (iii) the Credit Card Fraud Dataset [52], are shown in this section. The customized RUSBoost results were compared against RUSBoost, decision tree (DT), logistic regression (LR), multilayer perceptron (MLP), K-nearest neighbors (KNN), random forest (RF), AdaBoost, and support vector machine (SVM).
Three separate datasets, categorized by the number of tuples, were taken for the current work. Datasets of fewer than five thousand tuples were considered small; those with between five thousand and ten thousand tuples were considered medium; and those with over ten thousand entries were considered large. Each dataset has been divided into two partitions of 80% and 20% of the full dataset, where the bigger portion has been taken for training and the smaller one for testing of the machine learning models.
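The 80/20 partitioning can be sketched as follows; the function name and fixed seed are illustrative, and the shuffle-then-cut strategy is one common way to realize such a split:

```python
import random

def train_test_split(X, y, test_frac=0.2, seed=7):
    """Shuffle indices, then hold out the last `test_frac` of rows for
    testing and keep the rest for training (an 80/20 split by default)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

# 100 toy rows -> 80 for training, 20 for testing
X = [[i] for i in range(100)]
y = [i % 2 for i in range(100)]
Xtr, ytr, Xte, yte = train_test_split(X, y)
```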
Table 1 (excerpt) defines the notation: hsvm_z(pk), the hypothesis value obtained through the support vector machine in the zth iteration for the instance pk (a numeric confidence rating); hz(pk), the hypothesis value obtained through the decision tree in the zth iteration for the instance pk (a numeric confidence rating); εz, the cumulative pseudo loss; αz, the parameter to update the weight factor; Cz, the factor for normalizing the (z + 1)th distribution of weights over the full training dataset; DISz(k), the distribution of weights at the zth iteration over the full training dataset for the kth sample; and DISz+1, the distribution of weights at the (z + 1)th iteration.

Small Dataset.
The dataset called Abstract Dataset for Credit Card Fraud Detection (Dataset A) [50] has been taken from the kaggle.com database. The authors classified this as a small dataset with fewer than 5,000 tuples. The dataset covers 3,075 clients and 11 attributes. Of the 3,075 samples, 2,627 represent nonfraudulent transactions and 448 are fraudulent transactions (about 6:1). The eleven variables taken in this dataset are described in Table 2.

Medium Dataset.
The dataset called Default of Credit Card Client Dataset (Dataset B) [51] has also been taken from the kaggle.com database.
This includes details on default payments, demographic factors, credit data, payment history, and credit card company bills in Taiwan from April 2005 to September 2005. Among the 30,000 observations, 23,364 are cardholders with default payment status "no" and 6,636 with status "yes" (about 4:1). In the finance domain, default payment refers to the nonrepayment of debt, such as interest or principal, toward credit or an estate. A default can result when a purchaser cannot render payments on time, slows payouts, or declines or drops a payment [53].
This dataset uses a binary variable, default payment, as the response variable. Table 3 explains the twenty-four variables taken up in Dataset B.

Large Dataset.
The dataset called Credit Card Fraud Detection (Dataset C) [52] was again taken from the kaggle.com database.
This dataset includes purchases by European cardholders in September 2013. The sample dataset outlines two days of activity, with 492 frauds out of 284,807 total transactions. The dataset is highly imbalanced: the positive class (fraud) constitutes 0.172% of all transactions. The details of the dataset's features are given in Table 4; all values are numeric.
The dataset includes only numerical variables resulting from a PCA transformation. Kaggle did not provide the original features or additional details due to privacy concerns. Features V1, V2, ..., V28 are the principal PCA components; the only untransformed attributes are "time" and "amount."

Evaluation Metrics.
Assessment measures are employed to calculate the efficiency of a statistical or machine learning model. A confusion matrix gives an output matrix that characterizes the model's complete efficiency. In the proposed model, the security context is said to be robust if the model is capable of finding/classifying fraudulent transactions accurately. The metrics used for comparing the ML models are sensitivity and specificity from the confusion matrix, precision, F1 score, the receiver operating characteristic (ROC), and the area under the precision recall curve (AUPR).

Confusion Matrix.
The confusion matrix is a representation of an algorithm's performance in the field of machine learning. The term "confusion" stems from the fact that the matrix makes it easy to see whether the model is confusing two classes. Figure 4 depicts a confusion matrix providing sensitivity, specificity, recall, and fall-out information. Each column in this matrix represents the instances of an actual class, while each row represents the instances of a predicted class.
Sensitivity is an estimate of the proportion of truly positive instances predicted to be positive. A larger sensitivity value implies a high true positive count and a low false negative count. Models with high sensitivity are required for health and financial purposes. Specificity is defined as the share of actual negatives predicted to be negative; it is also called the true negative rate (its complement is the false positive rate). A higher specificity value means a higher true negative rate and a lower false positive rate.
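These definitions can be checked with a small, self-contained example; the function names and toy labels below are illustrative:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn

def sensitivity(tp, fn):
    return tp / (tp + fn)   # true positive rate (recall)

def specificity(tn, fp):
    return tn / (tn + fp)   # true negative rate

# 4 fraudulent (1) and 6 legitimate (0) transactions, with some errors
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
```

Here 3 of 4 frauds are caught (sensitivity 0.75) while 4 of 6 legitimate transactions are correctly cleared (specificity ≈ 0.67), matching the column/row reading of Figure 4.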

Precision and F1
Score. Precision and F-measures are considered more suitable for estimating the performance of a classification algorithm when the dataset is imbalanced, where precision is characterized as the positive predictive value. The F-measure is the weighted harmonic mean of sensitivity (recall) and precision [54]:

F1 = (2 × precision × sensitivity)/(precision + sensitivity). (1)

Precision is the ratio of true positives to all predicted positives. For our problem statement, precision is the fraction of transactions flagged as fraud that are actually fraud. Recall is the fraction of actually fraudulent transactions that the algorithm correctly identifies. The F1 value gives a single score that balances both recall and precision.
Here, the decision tree, logistic regression, multilayer perceptron (MLP), K-nearest neighbor (KNN), random forest (RF), AdaBoost, and support vector machine (SVM) models have been compared w.r.t. sensitivity, specificity, precision, and F1 score. A decision tree is a nonparametric, supervised learning method for classification and regression tasks. The decision tree is built using an algorithmic method that recognizes ways of splitting the data based on different conditions. Logistic regression is a machine learning algorithm based on the probability principle; it is a classification algorithm used to attribute observations to a specific set of classes. Using the logistic sigmoid function, logistic regression transforms its output to return a probability value. A multilayer perceptron is a neural network that links different layers in a directed graph, meaning the signal path through the nodes goes in one direction only. In an MLP, every node has a nonlinear activation function, except the input nodes. K-nearest neighbor is a simple algorithm that stores all available cases and classifies new cases using a similarity measure (i.e., a distance function). The random forest algorithm generates decision trees on data samples, obtains a prediction from each, and finally picks the best option by voting. In AdaBoost, a sequence of weak learners is linked so that each weak classifier attempts to improve the classification of observations incorrectly labeled by the preceding weak classifier. A support vector machine uses a kernel trick to transform the data and then determines an optimal boundary between the possible outputs. The results comparing the customized RUSBoost, decision tree, logistic regression, MLP, KNN, RF, AdaBoost, and SVM models are presented in Tables 5-7.
In Table 7, the value observed for the precision and F1 score under SVM is NaN because zero divided by zero is undefined as a real number, and in computing systems it is represented as NaN.
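A minimal illustration of how this NaN arises, assuming a classifier that flags no transactions as fraud (so TP = FP = 0); the function name is illustrative:

```python
import math

def precision(tp, fp):
    """Positive predictive value; NaN when the classifier predicts no
    positives at all (the 0/0 case behind the NaN entries in Table 7)."""
    return tp / (tp + fp) if (tp + fp) > 0 else math.nan

# A degenerate classifier that labels everything legitimate: TP = FP = 0
undefined = precision(0, 0)
ok = precision(3, 1)
```

The same 0/0 then propagates into F1, since F1 is computed from precision.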

Receiver Operating Characteristic (ROC).
In machine learning, measuring efficiency is an integral activity. ROC is considered one of the most significant measurements to test the efficiency of a classification model. It tells how well the model can differentiate between classes. The higher the AUC, the better the model is at predicting 0 s as 0 s and 1 s as 1 s. The ROC curve is plotted as TP rate vs. FP rate, with the TP and FP rates on the y-axis and x-axis, respectively [55].

The CtRUSBoost algorithm listing (excerpt):
(i) Input: x, SEG, P × Q (with q_r ∈ Q, |Q| = 2)
(ii) Output: maximum of [(maximum of hz values), (maximum of hsvm_z values)]
Begin
(1) Initialize DIS1(k) = 1/x for all k
(2) Do for z = 1, 2, 3, ..., Z
(a) Create a temporary training dataset SEG′z with weight distribution DIS′z by using random undersampling
(b) Call the decision tree, considering the sample set SEG′z and weight distribution DIS′z
(c) Compute a hypothesis hz: P × Q → [0, 1]
(d) Call the support vector machine, considering the sample set SEG′z and weight distribution DIS′z
(e) Compute a hypothesis hsvm_z: P × Q → [0, 1]
(f) Compute the pseudo loss for SEG and DISz: εz = Σ_{(k, q): qk ≠ q} DISz(k)(1 − hz(pk, qk) + hz(pk, q))
(g) Compute the parameter to update the weighting factor: αz = εz/(1 − εz)

Besides ROC, the precision recall (PR) curves are also considered better for evaluating algorithmic efficiency when the sample set is highly biased.
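The AUC can also be computed without plotting the curve, using its rank interpretation: it equals the probability that a randomly chosen positive is scored above a randomly chosen negative (ties counting half). This sketch is an illustration, not the paper's evaluation code:

```python
def roc_auc(scores, labels):
    """AUC via the rank interpretation: the fraction of (positive,
    negative) pairs where the positive receives the higher score,
    with ties counted as half a win."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A model that perfectly separates the classes scores 1.0; a model whose scores carry no information about the class scores 0.5.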
The results of the current work are also presented through AUPR curves obtained for the various machine learning models.

Area under Precision Recall (AUPR).
The ROC curve has some drawbacks, including its decoupling from class skew. That is why the precision recall (PR) curve, which plots precision against recall and is closely related to the false discovery rate curve, has gained attention in recent years.
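A minimal sketch of building PR points by sweeping score thresholds and integrating them with a step rule follows; the function names are illustrative, and ties between scores are not handled specially:

```python
def precision_recall_points(scores, labels):
    """Sweep a threshold at each scored item, highest score first, and
    record (recall, precision) for the positive class at each cut."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points

def aupr(points):
    """Step-wise area under the PR curve (rectangle rule over recall)."""
    area, prev_recall = 0.0, 0.0
    for recall, prec in points:
        area += (recall - prev_recall) * prec
        prev_recall = recall
    return area
```

Because recall and precision both condition on the positive class, the resulting area stays informative even when positives are a tiny fraction of the data, which is why AUPR is favored for highly skewed samples.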

Conclusion
In this research work, the existing RUSBoost algorithm has been customized by using a combination of bagging and boosting. The results obtained after customizing RUSBoost in the proposed methodology are more reliable and authentic when compared with simple/normal RUSBoost, DT, RF, AdaBoost, SVM, LR, KNN, and MLP. The scores obtained for the CtRUSBoost algorithm on the three benchmark datasets A, B, and C taken from kaggle.com are 96.30, 99.60, and 100, respectively, for sensitivity; 85.60, 98.70, and 99.80, respectively, for specificity; 94.20, 95.70, and 99.30, respectively, for precision; and 88.60, 97.60, and 99.60, respectively, for F1 score. The results obtained from CtRUSBoost have outperformed all the peer approaches used in this study by a large margin, which means it can detect fraudulent transactions more robustly. In the future, the work proposed here can be customized further by adding weak classifiers to the process, such as K-nearest neighbors, linear regression, and multilayer perceptron.

Conflicts of Interest
The authors declare that they have no conflicts of interest.