Credit Card Fraud Detection Using Fuzzy Rough Nearest Neighbor and Sequential Minimal Optimization with Logistic Regression

The global online communication channel made possible by the internet has increased credit card fraud, leading to losses amounting to billions of dollars annually for consumers and financial institutions. Fraudsters constantly devise new strategies to perpetrate illegal transactions. As such, innovative fraud detection systems are imperative to curb these losses. This paper presents a combination of multiple classifiers through the stacking ensemble technique for credit card fraud detection. Fuzzy-rough nearest neighbor and sequential minimal optimization are employed as base classifiers. Their combined predictions become the input data for the meta-classifier, logistic regression, which produces the final predictive outcome for improved detection. Simulation results compared with seven other algorithms affirm that the ensemble model can adequately detect credit card fraud, with detection rates of 84.90% and 76.30% on the Australian credit approval and German credit datasets, respectively.
Keywords—Fraud detection, Credit card, Ensemble technique, Stacking, Machine learning


Introduction
Fraud is driven by criminal motives: it is committed to siphon money illegally, causing financial loss to victims for the sake of the fraudster's financial or personal gain [1]. By definition, credit card fraud is the use of the information assigned to a credit card to make purchases without the cardholder's knowledge [2]. Credit card transactions are carried out either physically or virtually. In a physical transaction, the cardholder presents the card in person at the point of purchase, whereas virtual transactions encompass online operations via the World Wide Web [2], [3]. While credit cards paved the way for easy, convenient, and efficient online transactions through e-commerce, they also created a loophole for criminal activities, thereby increasing the rate of fraud [4]. Online transactions for goods and services have skyrocketed over the years. It is reported that orders worth an estimated US$15 billion were executed in 2009, with online payments accounting for 84% [2]. In Malaysia, credit card transactions totalled 320 million in 2011 and rose to 360 million in 2015 [5]. Fraud rose from US$23 billion in 2013 to US$32 billion in 2014 [6]. Another source [7] stated that global credit card fraud amounted to US$21.84 billion in 2015, a decrease from the figure reported in [6]. A number of techniques have been proposed and developed for tackling fraud detection, including Bayesian networks, Markov models, decision trees, support vector machines, and a host of nature-inspired algorithms [8]-[12]. In this paper, an alternative method for credit card fraud detection is proposed based on the stacking ensemble technique. It adopts the machine learning algorithms fuzzy-rough nearest neighbor (FRNN), sequential minimal optimization (SMO), and logistic regression (LR), and combines their predictions into a single classification outcome for effective detection.
Datasets from a well-known repository were retrieved for experimentation and evaluated with standard metrics for a fair comparison. The paper is organized as follows: Section 2 summarizes relevant literature on credit card fraud detection. Section 3 discusses the fuzzy-rough nearest neighbor, sequential minimal optimization, and logistic regression algorithms. The proposed ensemble model is formulated in Section 4. The experiments are analyzed in Section 5, and Section 6 presents the conclusion and future work.

Related Works
Much research on credit card fraud detection has been reported in the literature. This section reviews various works addressing the fraud detection problem. Ref. [13] proposed five new modifications of the artificial neural network (ANN) for classifying credit card fraud as well as identifying customers. On real-life data, the experimental outcomes show that the developed models were competitive with, and in some cases outperformed, other algorithms. A bagging ensemble based on decision trees was constructed to predict credit card fraud [14]. The authors used real-world data to investigate the performance of the devised model, and in the experiments the bagging ensemble outperformed support vector machine, naïve Bayes, and k-nearest neighbor. The combination of random forest (RF) and rough set theory (RST) put forward by Ref. [15] proved efficient for fraud detection. RF selects the relevant attributes, which are then passed to RST for classification. Decision tree and neural network models were also included for comparison, and the final results show that RST gave better classification performance. Using AdaBoost and majority voting, twelve stand-alone algorithms were combined to detect credit card fraud [5]. On data collected over a period of three months and on a benchmark dataset, the empirical results confirm that majority voting performs best when noise is added. According to [11], Fisher discriminant analysis was adjusted by introducing a weighted average into the linear discriminant to suit the profit-oriented projections conceptualized by the authors. Classification and regression tree was used to select the important attributes.
The selected attributes were then used by the proposed Fisher discriminant analysis, which outperformed decision tree, naïve Bayes, ANN, and the original Fisher discriminant analysis in detecting fraud. A comparative analysis of algorithms commonly used for credit card fraud detection, namely logistic regression, decision tree, and random forest, was conducted by Ref. [16] on the publicly available German credit data. The results reveal that random forest was superior. Also focusing on random forest, Ref. [17] studied two variants: random-tree-based and classification and regression tree (CART)-based random forest. On a dataset collected from China, the CART-based variant achieved higher performance than the random-tree-based variant. The inbuilt advantages of hyper-heuristic evolutionary algorithms paved the way for an intelligent Bayesian network classifier for credit card fraud detection [18]. In an empirical comparison with traditional Bayesian network algorithms and other learning algorithms, the proposed method performed better in terms of economic efficiency. Data mining methods have long been at the forefront of tackling credit card fraud, as echoed in Ref. [19], which used support vector machine, random forest, and logistic regression for data analysis. As in the studies above, random forest once again delivered high performance. To handle large amounts of data, a convolutional neural network (CNN) was used to detect fraudulent behaviour patterns in credit card data [20]. State-of-the-art algorithms such as SVM, NN, and RF were compared with the proposed model; RF proved its mettle but could not surpass CNN in overall performance.
Ref. [4] investigated three machine learning models, namely logistic regression, k-nearest neighbor, and naïve Bayes, for finding suspicious behavioural patterns in fraud data. Principal component analysis served as a feature reduction technique before the processed data were fed into the classifiers, and k-nearest neighbor achieved higher accuracy than the other models. Ref. [21] proposed a feature engineering strategy for credit card fraud detection. A sequence classification approach based on Long Short-Term Memory (LSTM) was used to address fraud detection [22]. By deploying generative adversarial networks, a deep learning technique, a boost in classification effectiveness was achieved for credit card fraud detection [23].

Conceptual Characteristics of Fuzzy-Rough Nearest Neighbor, Sequential Minimal Optimization, and Logistic Regression

Fuzzy rough set
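Given a crisp set B ⊆ S and an equivalence relation E_r on S, the Pawlak lower and upper approximations [24], [25] are defined in (1) and (2):
$$E_r{\downarrow}B = \{\, s \in S \mid [s]_{E_r} \subseteq B \,\} \tag{1}$$
$$E_r{\uparrow}B = \{\, s \in S \mid [s]_{E_r} \cap B \neq \emptyset \,\} \tag{2}$$
Equivalently, in terms of the relation itself, they read as in (3) and (4):
$$s \in E_r{\downarrow}B \iff \forall z \in S:\ (s,z) \in E_r \Rightarrow z \in B \tag{3}$$
$$s \in E_r{\uparrow}B \iff \exists z \in S:\ (s,z) \in E_r \wedge z \in B \tag{4}$$
When B and E_r denote a fuzzy set and a fuzzy relation in S, (3) and (4) can be generalized with a fuzzy implicator I and a t-norm T, as in (5) and (6):
$$(E_r{\downarrow}B)(s) = \inf_{z \in S} I\big(E_r(s,z),\, B(z)\big) \tag{5}$$
$$(E_r{\uparrow}B)(s) = \sup_{z \in S} T\big(E_r(s,z),\, B(z)\big) \tag{6}$$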
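As a small illustration of (5) and (6), the sketch below evaluates both approximations on a finite universe, where inf and sup reduce to min and max. The Łukasiewicz implicator I(a, b) = min(1, 1 − a + b) and the minimum t-norm are assumptions chosen for the example; the text does not fix particular operators. With a crisp equivalence relation and a crisp set, the result coincides with Pawlak's approximations.

```python
from typing import Callable, Sequence

Matrix = Sequence[Sequence[float]]


def lukasiewicz_implicator(a: float, b: float) -> float:
    """Lukasiewicz implicator I(a, b) = min(1, 1 - a + b) (an illustrative choice)."""
    return min(1.0, 1.0 - a + b)


def min_tnorm(a: float, b: float) -> float:
    """Minimum t-norm T(a, b) = min(a, b) (an illustrative choice)."""
    return min(a, b)


def fuzzy_lower(s: int, relation: Matrix, membership: Sequence[float],
                implicator: Callable[[float, float], float] = lukasiewicz_implicator) -> float:
    """(E_r lower B)(s) = inf over z of I(E_r(s, z), B(z)); inf is min on a finite universe."""
    return min(implicator(relation[s][z], membership[z]) for z in range(len(membership)))


def fuzzy_upper(s: int, relation: Matrix, membership: Sequence[float],
                tnorm: Callable[[float, float], float] = min_tnorm) -> float:
    """(E_r upper B)(s) = sup over z of T(E_r(s, z), B(z)); sup is max on a finite universe."""
    return max(tnorm(relation[s][z], membership[z]) for z in range(len(membership)))


# Usage: a crisp equivalence relation with classes {0, 1} and {2}, and crisp set B = {0, 2}.
R = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
B = [1, 0, 1]
print([fuzzy_lower(s, R, B) for s in range(3)], [fuzzy_upper(s, R, B) for s in range(3)])
# lower approximation = {2}, upper approximation = {0, 1, 2}, as Pawlak's definitions give
```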

Vaguely quantified rough set
The inf and sup operators in (5) and (6), inherited from fuzzy rough sets, are closely related to the ∀ and ∃ quantifiers in (3) and (4). As a result, a change in a single object can strongly influence the approximations, which makes fuzzy rough sets susceptible to noisy and corrupted data. Hence, replacing ∀ and ∃ with vague quantifiers such as "most" and "some" was proposed to address this limitation [26], [27]. Vague quantifiers are modelled mathematically via continuous, increasing fuzzy quantifiers [28]: an increasing mapping Q: [0,1] → [0,1] satisfying the boundary conditions Q(0) = 0 and Q(1) = 1. In (7), fuzzy quantifiers are instantiated with the following parameterized formula, for 0 ≤ α < β ≤ 1 and s in [0,1]:
$$Q_{(\alpha,\beta)}(s) = \begin{cases} 0, & s \le \alpha \\ \dfrac{2(s-\alpha)^2}{(\beta-\alpha)^2}, & \alpha \le s \le \dfrac{\alpha+\beta}{2} \\ 1 - \dfrac{2(s-\beta)^2}{(\beta-\alpha)^2}, & \dfrac{\alpha+\beta}{2} \le s \le \beta \\ 1, & \beta \le s \end{cases} \tag{7}$$
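Choosing a pair (Q_l, Q_u) of fuzzy quantifiers defines the lower (8) and upper (9) approximations:
$$(E_r{\downarrow}_{Q_l}B)(s) = Q_l\!\left(\frac{\sum_{z \in S} \min\big(E_r(s,z),\, B(z)\big)}{\sum_{z \in S} E_r(s,z)}\right) \tag{8}$$
$$(E_r{\uparrow}_{Q_u}B)(s) = Q_u\!\left(\frac{\sum_{z \in S} \min\big(E_r(s,z),\, B(z)\big)}{\sum_{z \in S} E_r(s,z)}\right) \tag{9}$$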
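The quantifier (7) and the vaguely quantified approximations (8) and (9) can be sketched as follows. Both approximations apply a quantifier to the same ratio (the fuzzy proportion of s's related objects that fall in B) and differ only in which quantifier is used. The parameter choices Q_l = Q_(0.2,1) for "most" and Q_u = Q_(0.1,0.6) for "some", and the helper names, are illustrative assumptions rather than settings reported in this paper.

```python
from typing import Callable, Sequence


def fuzzy_quantifier(alpha: float, beta: float) -> Callable[[float], float]:
    """Return the smooth quantifier Q_(alpha, beta) of (7), for 0 <= alpha < beta <= 1."""
    if not 0.0 <= alpha < beta <= 1.0:
        raise ValueError("require 0 <= alpha < beta <= 1")
    mid = (alpha + beta) / 2.0
    width = beta - alpha

    def q(s: float) -> float:
        if s <= alpha:
            return 0.0
        if s <= mid:
            return 2.0 * ((s - alpha) / width) ** 2
        if s <= beta:
            return 1.0 - 2.0 * ((s - beta) / width) ** 2
        return 1.0

    return q


def vqrs_approximation(s: int, relation: Sequence[Sequence[float]],
                       membership: Sequence[float], quantifier: Callable[[float], float]) -> float:
    """Q(sum_z min(E_r(s, z), B(z)) / sum_z E_r(s, z)): (8) with Q_l, (9) with Q_u."""
    overlap = sum(min(relation[s][z], membership[z]) for z in range(len(membership)))
    support = sum(relation[s][z] for z in range(len(membership)))
    return quantifier(overlap / support)


# Illustrative "most" (lower) and "some" (upper) quantifiers; parameter values are assumptions.
Q_l = fuzzy_quantifier(0.2, 1.0)
Q_u = fuzzy_quantifier(0.1, 0.6)
R = [[1.0, 0.5], [0.5, 1.0]]
B = [1.0, 0.0]
lower, upper = vqrs_approximation(0, R, B, Q_l), vqrs_approximation(0, R, B, Q_u)
print(lower, upper)
```

Because the quantifiers are smooth, one extra related object changes the ratio only slightly, which is the robustness to noise that motivates replacing inf and sup.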

Fuzzy nearest neighbor
The fuzzy K-nearest neighbor (FNN) algorithm [29], [30] classifies a test object according to its similarity to its K nearest neighbors and their respective class membership degrees. The FNN pseudocode is shown in Algorithm 1. The membership degree of an object z to class C is formulated as
$$C(z) = \sum_{s \in N} E_r(s,z)\, C(s)$$
where N denotes the set of z's K nearest neighbors and E_r(s,z) ∈ [0,1] is the similarity of s and z. Traditionally, it is defined as
$$E_r(s,z) = \frac{\lVert z - s \rVert^{-2/(m-1)}}{\sum_{j \in N} \lVert z - j \rVert^{-2/(m-1)}}$$
where ‖·‖ denotes the Euclidean norm and m controls the weighting of the similarity.

Algorithm 1: The fuzzy nearest neighbor (FNN) algorithm
Require: S: the training data; Ϛ: the set of decision classes; z: the object to be classified; K: the number of nearest neighbors
1: N ← getNearestNeighbors(z, K)
2: for all C ∈ Ϛ do
3:   C(z) = Σ_{s∈N} E_r(s, z) · C(s)
4: end for
5: output argmax_{C∈Ϛ} C(z)
6: end
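A minimal Python sketch of the FNN procedure is given below, assuming crisp class labels (so that C(s) is 1 for the labelled class and 0 otherwise) and the distance-based similarity described above with m = 2 by default. The helper names `crisp_memberships` and `fnn_classify` are hypothetical, and giving all weight to an exact match (zero distance) is an assumption added to avoid division by zero.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Membership = Dict[Hashable, float]


def crisp_memberships(labels: Sequence[Hashable]) -> List[Membership]:
    """Membership C(s) = 1 for the labelled class of s and 0 for every other class."""
    classes = sorted(set(labels), key=str)
    return [{c: 1.0 if c == y else 0.0 for c in classes} for y in labels]


def fnn_classify(train_X: Sequence[Sequence[float]], memberships: Sequence[Membership],
                 z: Sequence[float], k: int, m: float = 2.0) -> Tuple[Hashable, Membership]:
    """Fuzzy K-nearest neighbour classification of z (requires m > 1)."""
    # N: the K training objects closest to z in Euclidean distance.
    neighbours = sorted((math.dist(x, z), i) for i, x in enumerate(train_X))[:k]
    # Weights ||z - s||^(-2/(m-1)), normalized over N; an exact match takes all the weight.
    if any(d == 0.0 for d, _ in neighbours):
        raw = {i: 1.0 if d == 0.0 else 0.0 for d, i in neighbours}
    else:
        raw = {i: d ** (-2.0 / (m - 1.0)) for d, i in neighbours}
    total = sum(raw.values())
    sim = {i: w / total for i, w in raw.items()}
    # C(z) = sum over s in N of E_r(s, z) * C(s), for every class C.
    scores = {c: sum(sim[i] * memberships[i][c] for i in sim) for c in memberships[0]}
    # Output the class with the highest membership degree.
    return max(scores, key=scores.get), scores


X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
y = ["legit", "legit", "fraud", "fraud"]
label, scores = fnn_classify(X, crisp_memberships(y), (0.2, 0.5), k=3)
print(label, scores)
```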

Fuzzy rough nearest neighbors
Combining the approximations of fuzzy rough sets with the traditional FNN scheme gave rise to the fuzzy-rough nearest neighbours (FRNN) algorithm [31]. The algorithm, shown in Algorithm 2, relies solely on constructing the fuzzy lower and upper approximations of each decision class from the nearest neighbours, and classifies an instance according to its membership to these approximations:
$$(E_r{\downarrow}C)(z) = \min_{s \in N} I\big(E_r(s,z),\, C(s)\big), \qquad (E_r{\uparrow}C)(z) = \max_{s \in N} T\big(E_r(s,z),\, C(s)\big)$$
The similarity is built from per-attribute similarities of the form
$$E_q(s,z) = 1 - \frac{|q(s) - q(z)|}{\max_q - \min_q}$$
where max_q and min_q denote the maximum and minimum values of attribute q, respectively. A high value of (E_r↓C)(z) signifies that all of z's neighbours belong to class C, whereas a high value of (E_r↑C)(z) indicates that at least one neighbour belongs to C.
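The FRNN decision rule can be sketched in Python as below. Several details are assumptions not fixed by the text: the per-attribute similarities are clipped to [0, 1] and combined with the minimum t-norm, the approximations use the Łukasiewicz implicator and the minimum t-norm, and each class is scored by the average of its lower and upper approximations, with the highest-scoring class returned. All function names are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

Vector = Sequence[float]


def attribute_ranges(X: Sequence[Vector]) -> List[Tuple[float, float]]:
    """(min_q, max_q) for every attribute q, computed on the training data."""
    return [(min(col), max(col)) for col in zip(*X)]


def similarity(s: Vector, z: Vector, ranges: Sequence[Tuple[float, float]]) -> float:
    """E_r(s, z): per-attribute 1 - |q(s) - q(z)| / (max_q - min_q), combined with min (assumed)."""
    per_attribute = []
    for a, b, (lo, hi) in zip(s, z, ranges):
        span = hi - lo
        value = 1.0 if span == 0 else 1.0 - abs(a - b) / span
        per_attribute.append(max(0.0, value))  # clip: test values may fall outside the range
    return min(per_attribute)


def frnn_classify(train_X: Sequence[Vector], train_y: Sequence[Hashable],
                  z: Vector, k: int) -> Tuple[Hashable, float]:
    ranges = attribute_ranges(train_X)
    # N: the k training objects most similar to z, kept as (similarity, label) pairs.
    neighbours = sorted(((similarity(x, z, ranges), y) for x, y in zip(train_X, train_y)),
                        key=lambda pair: pair[0], reverse=True)[:k]
    best_class, best_score = None, -1.0
    for c in sorted(set(train_y), key=str):
        # Lower: min over N of I(E_r(s, z), C(s)) with the Lukasiewicz implicator.
        lower = min(min(1.0, 1.0 - e + (1.0 if y == c else 0.0)) for e, y in neighbours)
        # Upper: max over N of T(E_r(s, z), C(s)) with the minimum t-norm.
        upper = max(min(e, 1.0 if y == c else 0.0) for e, y in neighbours)
        score = (lower + upper) / 2.0
        if score > best_score:
            best_class, best_score = c, score
    return best_class, best_score


X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
y = ["legit", "legit", "fraud", "fraud"]
print(frnn_classify(X, y, (0.2, 0.5), k=3))
```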

Sequential minimal optimization
Sequential minimal optimization (SMO) is an algorithm for training support vector machines (SVMs), designed to overcome the difficulties of SVM training on large problems [32]. The SVM concept is as follows. With reference to [33], consider a collection of training data points {(H_w, p_w)}, where H_w is an input vector and p_w ∈ {−1, +1} is its class label. Training an SVM for classification amounts to solving the following quadratic programming (QP) problem, where K(·,·) is a kernel function and C is the regularization constant:
$$\max_{\alpha} \sum_{w} \alpha_w - \frac{1}{2} \sum_{w} \sum_{v} \alpha_w \alpha_v\, p_w p_v\, K(H_w, H_v) \quad \text{subject to} \quad 0 \le \alpha_w \le C,\ \ \sum_{w} \alpha_w p_w = 0$$
with b acquired from Equation (14). Standard SVM training struggles with QP problems of large size. To resolve this, SMO breaks the large QP task into a series of small sub-problems. At each step it optimizes a subset of the training data, called the working set; SMO uses a working set of just two Lagrange multipliers, so that each QP sub-problem can be solved analytically with a simple technique [34]. A set of heuristic rules is used to select the two multipliers α. The computational cost of SMO scales approximately quadratically with the size of the training data.
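To make the two-multiplier step concrete, the sketch below implements a simplified variant of SMO for a linear kernel. Its partner multiplier is chosen at random rather than with the selection heuristics mentioned above, so it illustrates the analytic pairwise update and clipping, not the complete algorithm. The function name and the defaults for C, tolerance, and iteration limits are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def linear_kernel(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def simplified_smo(X: Sequence[Vector], y: Sequence[int], C: float = 1.0, tol: float = 1e-3,
                   max_passes: int = 10, max_iter: int = 1000,
                   seed: int = 0) -> Tuple[List[float], float]:
    """Train a linear SVM (labels in {-1, +1}) by optimizing two multipliers at a time."""
    rng = random.Random(seed)
    n = len(X)
    alpha = [0.0] * n
    b = 0.0
    K = [[linear_kernel(X[i], X[j]) for j in range(n)] for i in range(n)]

    def output(i: int) -> float:  # current decision value f(x_i)
        return sum(alpha[j] * y[j] * K[j][i] for j in range(n)) + b

    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        iters += 1
        changed = 0
        for i in range(n):
            E_i = output(i) - y[i]
            # Only optimize multipliers that violate the KKT conditions by more than tol.
            if not ((y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            j = rng.randrange(n - 1)
            if j >= i:  # random partner j != i (simplification of SMO's heuristics)
                j += 1
            E_j = output(j) - y[j]
            a_i, a_j = alpha[i], alpha[j]
            # Bounds that keep both multipliers in [0, C] and sum(alpha * y) unchanged.
            if y[i] != y[j]:
                L, H = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
            else:
                L, H = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]
            if L == H or eta >= 0:
                continue
            alpha[j] = min(H, max(L, a_j - y[j] * (E_i - E_j) / eta))
            if abs(alpha[j] - a_j) < 1e-5:
                alpha[j] = a_j  # negligible change: undo and move on
                continue
            alpha[i] = a_i + y[i] * y[j] * (a_j - alpha[j])
            b1 = b - E_i - y[i] * (alpha[i] - a_i) * K[i][i] - y[j] * (alpha[j] - a_j) * K[i][j]
            b2 = b - E_j - y[i] * (alpha[i] - a_i) * K[i][j] - y[j] * (alpha[j] - a_j) * K[j][j]
            b = b1 if 0 < alpha[i] < C else b2 if 0 < alpha[j] < C else (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = [sum(alpha[i] * y[i] * X[i][d] for i in range(n)) for d in range(len(X[0]))]
    return w, b


X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
y = [1, 1, 1, -1, -1, -1]
w, b = simplified_smo(X, y)
print(w, b)
```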

Logistic regression
Logistic regression (LR) is a statistical technique for estimating the probability of a binary outcome from a number of explanatory factors. It explains the effect of the considered variables on the dependent variable under examination. In contrast, if the dependent variable has three or more unordered categories, multinomial logistic regression (MLR) is deployed. MLR is built on the same fundamental framework as binomial logistic regression and can therefore be regarded as an extension of logistic regression [35]-[37].
In the work of Le Cessie and Van Houwelingen [38], a ridge value of 1 × 10^{-8} was recommended for the log-likelihood computation. Some modifications have been made for classification purposes [39]. If n cases with m features belong to k classes, an m × (k − 1) parameter matrix B is computed. The probability of class j, with the exception of the last class, is given in (17):
$$P_j(X_i) = \frac{\exp(X_i B_j)}{\sum_{l=1}^{k-1} \exp(X_i B_l) + 1} \tag{17}$$
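The last class has the probability shown in (18):
$$P_k(X_i) = 1 - \sum_{j=1}^{k-1} P_j(X_i) = \frac{1}{\sum_{j=1}^{k-1} \exp(X_i B_j) + 1} \tag{18}$$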
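The class probabilities (17) and (18), together with the ridge-penalized negative log-likelihood that the optimizer minimizes, can be sketched as follows. Each row X_i is assumed to already contain any intercept term, Y_i is a one-hot row with the last class as the reference class, and the penalty is taken as the sum of squared entries of B; these conventions and the function names are assumptions, and the quasi-Newton optimizer itself is omitted.

```python
import math
from typing import List, Sequence


def class_probabilities(x: Sequence[float], B: Sequence[Sequence[float]]) -> List[float]:
    """[P_1, ..., P_{k-1}, P_k] for one case; B holds the k-1 coefficient vectors B_j."""
    scores = [math.exp(sum(xi * bi for xi, bi in zip(x, Bj))) for Bj in B]
    denom = sum(scores) + 1.0
    return [s / denom for s in scores] + [1.0 / denom]  # last entry is the reference class


def neg_log_likelihood(X: Sequence[Sequence[float]], Y: Sequence[Sequence[int]],
                       B: Sequence[Sequence[float]], ridge: float = 1e-8) -> float:
    """Negative multinomial log-likelihood with a ridge penalty on the squared coefficients."""
    k_minus_1 = len(B)
    total = 0.0
    for x, y_row in zip(X, Y):
        p = class_probabilities(x, B)
        total += sum(y_row[j] * math.log(p[j]) for j in range(k_minus_1))
        # 1 - sum of P_j over the first k-1 classes equals P_k, the last entry of p.
        total += (1 - sum(y_row[:k_minus_1])) * math.log(p[-1])
    penalty = ridge * sum(b * b for Bj in B for b in Bj)
    return -total + penalty


# Three classes (k = 3), two features, so B has k - 1 = 2 coefficient vectors.
B = [[0.3, -0.1], [0.2, 0.4]]
print(class_probabilities([1.0, 2.0], B))
print(neg_log_likelihood([[1.0, 2.0]], [[0, 1, 0]], B))
```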
Therefore, the negative multinomial log-likelihood is:
$$L = -\sum_{i=1}^{n} \left[ \sum_{j=1}^{k-1} Y_{ij} \ln P_j(X_i) + \left(1 - \sum_{j=1}^{k-1} Y_{ij}\right) \ln\!\left(1 - \sum_{j=1}^{k-1} P_j(X_i)\right) \right] + \text{ridge} \cdot \lVert B \rVert^2 \tag{19}$$
where Y_ij is 1 if case i belongs to class j and 0 otherwise. A quasi-Newton method is employed to search for the optimized values of the m × (k − 1) elements of the matrix B that minimize L. Before optimization, the matrix B is flattened into a vector of m × (k − 1) elements [38], [39].

Proposed Methodology
The step-by-step procedure of the proposed ensemble algorithm, consisting of fuzzy rough nearest neighbor (FRNN), sequential minimal optimization (SMO), and logistic regression (LR), is described in this section.
To begin execution of the ensemble algorithm, the original training data are fed into the base classifiers, FRNN and SMO. Once trained, their predictions are combined, and this combined predictive outcome serves as the input to the meta-classifier (LR), which produces the final prediction. The procedure is illustrated in Figure 1.
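The stacking procedure can be sketched in Python as below. To keep the example self-contained, the base learners are simple stand-ins (a 1-nearest-neighbour rule in place of FRNN and a nearest-centroid rule in place of SMO), and the logistic meta-classifier is fitted by plain gradient descent rather than the quasi-Newton method described for LR. Building the meta-level training data from out-of-fold base predictions, so that the meta-classifier never sees predictions made on a base learner's own training data, is a common way to implement stacking; it is an assumption here, not a detail stated in the text. All names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], float]
Learner = Callable[[List[Vector], List[int]], Predictor]


def nearest_neighbour(X: List[Vector], y: List[int]) -> Predictor:
    """Stand-in for FRNN: label of the closest training object."""
    def predict(x: Vector) -> float:
        j = min(range(len(X)), key=lambda i: math.dist(X[i], x))
        return float(y[j])
    return predict


def nearest_centroid(X: List[Vector], y: List[int]) -> Predictor:
    """Stand-in for SMO: assign to the closer class centroid (assumes both classes present)."""
    centroids = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> float:
        return 1.0 if math.dist(x, centroids[1]) < math.dist(x, centroids[0]) else 0.0
    return predict


def fit_logistic(Z: List[List[float]], y: List[int], lr: float = 0.5,
                 epochs: int = 500) -> Predictor:
    """Meta-classifier: binary logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)  # last weight is the intercept

    def prob(z: Vector) -> float:
        return 1.0 / (1.0 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + w[-1])))

    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            err = prob(z) - t
            for d, zi in enumerate(z):
                grad[d] += err * zi
            grad[-1] += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return prob


def stack(X: List[Vector], y: List[int], base_learners: Sequence[Learner],
          folds: int = 5) -> Predictor:
    """Out-of-fold base predictions become meta-features for the logistic meta-classifier."""
    n = len(X)
    Z = [[0.0] * len(base_learners) for _ in range(n)]
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        for b, learner in enumerate(base_learners):
            predict = learner([X[i] for i in train], [y[i] for i in train])
            for i in test:
                Z[i][b] = predict(X[i])
    meta = fit_logistic(Z, y)
    final_bases = [learner(X, y) for learner in base_learners]  # refit on all training data
    return lambda x: meta([predict(x) for predict in final_bases])


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0),
     (5.0, 6.0), (6.0, 5.0), (6.0, 6.0), (0.5, 0.5), (5.5, 5.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
model = stack(X, y, [nearest_neighbour, nearest_centroid])
print(model((0.2, 0.3)), model((5.8, 5.2)))
```

In the setting described above, FRNN and SMO learners with the same fit-then-predict shape would replace the two stand-ins.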

Experimental Setup and Results
The credit card datasets used for experimentation were retrieved from the UCI Machine Learning Repository at http://archive.ics.uci.edu/ml [40]: the Australian credit approval data and the German credit data. The Australian Credit Approval dataset consists of credit application data with 14 attributes, one class label (+ or −), and 690 instances, of which 307 are positive (credit approved) and 383 are negative (credit denied). The dataset contains a good mixture of attributes, including nominal and numerical values; the all-numerical version is used here. For the German credit data, the numeric version is likewise adopted. It consists of 700 instances of creditworthy applicants and 300 instances of non-creditworthy applicants, and describes each applicant's credit details with 24 input variables. Both datasets are used to train the ensemble model of FRNN, SMO, and LR. Popular algorithms in credit card fraud detection are selected for comparison, namely multi-layer perceptron (MLP), IBk (the K-nearest neighbour algorithm), naïve Bayes, and random forest (RF). All experiments are run in the Waikato Environment for Knowledge Analysis (WEKA). Training and assessment use 10-fold cross-validation: the dataset is divided into ten subsets of equal size, nine subsets are used for training and the remaining one for testing, and this is repeated so that each subset serves once as the test set. The results are averaged over the ten folds.
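The 10-fold procedure can be sketched as follows. The folds here are stratified by class, mirroring the stratification WEKA applies by default for nominal class attributes; the helper names and the seed are hypothetical. For the Australian data (307 positive and 383 negative instances), each test fold then holds 69 instances with 30 or 31 positives.

```python
import random
from typing import Callable, Hashable, Iterator, List, Sequence, Tuple


def stratified_k_fold(labels: Sequence[Hashable], k: int = 10,
                      seed: int = 1) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k folds with near-equal class proportions."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0
    for c in sorted(set(labels), key=str):
        members = [i for i, t in enumerate(labels) if t == c]
        rng.shuffle(members)
        for i in members:  # deal each class round-robin across the folds
            folds[position % k].append(i)
            position += 1
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def cross_validated_mean(labels: Sequence[Hashable],
                         evaluate: Callable[[List[int], List[int]], float],
                         k: int = 10, seed: int = 1) -> float:
    """Average of evaluate(train, test) over the k folds."""
    scores = [evaluate(train, test) for train, test in stratified_k_fold(labels, k, seed)]
    return sum(scores) / len(scores)


labels = [1] * 307 + [0] * 383  # class distribution of the Australian credit approval data
splits = list(stratified_k_fold(labels))
print([len(test) for _, test in splits])
```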

Assessment measures
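The performance metrics used to evaluate the algorithms' effectiveness are the detection rate (DR, true positive rate), false alarm rate (FAR, false positive rate), specificity (SP), positive predictive value (PPV), and F-measure. They are defined in (20) to (24), where TP and FP are the true positives and false positives, and FN and TN are the false negatives and true negatives:
$$DR = \frac{TP}{TP + FN} \tag{20}$$
$$FAR = \frac{FP}{FP + TN} \tag{21}$$
$$SP = \frac{TN}{TN + FP} \tag{22}$$
$$PPV = \frac{TP}{TP + FP} \tag{23}$$
$$F\text{-measure} = \frac{2 \cdot PPV \cdot DR}{PPV + DR} \tag{24}$$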
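All five measures follow directly from the four confusion-matrix counts, as the minimal helper below shows. The function name and the example counts are hypothetical.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Detection rate, false alarm rate, specificity, PPV and F-measure from raw counts."""
    dr = tp / (tp + fn)                    # detection rate (true positive rate)
    far = fp / (fp + tn)                   # false alarm rate (false positive rate)
    sp = tn / (tn + fp)                    # specificity, equal to 1 - FAR
    ppv = tp / (tp + fp)                   # positive predictive value (precision)
    f_measure = 2 * ppv * dr / (ppv + dr)  # harmonic mean of PPV and DR
    return {"DR": dr, "FAR": far, "SP": sp, "PPV": ppv, "F": f_measure}


# Hypothetical counts: 80 frauds caught, 20 missed, 10 false alarms, 90 correct rejections.
print(detection_metrics(tp=80, fp=10, fn=20, tn=90))
```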

Simulation results
The simulations were run in WEKA on a machine with a 3.40 GHz Intel® Core i7 processor and 4 GB of RAM. The findings of the experiments on each dataset are tabulated and visualized. Table 1 reports the performance results for the Australian credit approval dataset. In terms of detection rate, FRNN, SMO, and LR achieved 81.00%, 84.60%, and 85.40%, respectively.
The MLP, IBk, naïve Bayes, and random forest achieved detection rates of 83.80%, 82.00%, 77.50%, and 84.90%, respectively. The proposed ensemble model ranks second at 84.90%, tied with random forest. Naïve Bayes has the lowest detection rate, and LR the highest at 85.40%. For the false alarm rate, lower is better. The proposed ensemble model of FRNN, SMO, and LR gave the lowest (best) rate at 13.80%, while naïve Bayes had the poorest at 26.10%. In terms of specificity, the proposed model surpasses all other algorithms with a rate of 86.20%. The proposed ensemble model is also superior to the compared algorithms in positive predictive value, at 85.90%. For the F-measure, the proposed model is second at 85.00%, with LR best at 85.40%. Overall, on the Australian credit approval dataset, naïve Bayes performed worst, while the proposed model performed best in the overall comparison. Table 2 reports the performance results for the German credit dataset. In terms of detection rate, FRNN, SMO, and LR achieved 68.50%, 76.40%, and 76.30%, respectively. The MLP, IBk, naïve Bayes, and random forest achieved 70.20%, 66.00%, 75.40%, and 73.80%, respectively. The proposed ensemble model ranks second at 76.30%, tied with logistic regression. IBk has the lowest detection rate, and SMO the highest at 76.40%. The proposed ensemble model gave a false alarm rate of 40.40%, ranking fourth. Random forest has the poorest false alarm rate at 49.70%, while naïve Bayes has the best at 38.70%.
In terms of specificity, the proposed model outperformed some of the other algorithms with a rate of 56.60%. The proposed ensemble model also surpasses the compared algorithms in positive predictive value, at 75.10%. For the F-measure, the proposed model is in second place at 75.10%, tied with LR, while SMO is best at 75.20%. On the German credit dataset, the proposed model therefore performed on par with the rest of the algorithms. Figures 4 to 7 show the receiver operating characteristic (ROC) curves for all the algorithms, and Table 3 lists the corresponding area under the curve (AUC) values derived from them. For the Australian Credit Approval dataset, the proposed model achieves a better AUC than the other algorithms at 0.8555, with LR close behind at 0.8550. Naïve Bayes recorded the lowest AUC value of 0.7570. For the German credit data, the proposed system outperforms four of the algorithms; only naïve Bayes, SMO, and LR, with AUC values of 0.6835, 0.6810, and 0.6810, were superior to the proposed model's AUC of 0.6795.
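The AUC values summarize each ROC curve in a single number. One standard way to compute it, sketched below, is the Mann-Whitney formulation: the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as one half. The function name and the example scores are hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve as P(random positive scores above random negative)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical scores: one positive ranks below one negative, so 3 of 4 pairs are ordered correctly.
print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported values of roughly 0.68 to 0.86 in context.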

Statistical analysis of logistic regression using pseudo-R²
The quality of the regression model is assessed statistically using the pseudo-R². For the Australian credit approval dataset, the pseudo-R² value is 0.594897, with a p-value of 3.5E-122, which is less than 0.05; the model is therefore statistically significant. For the German credit dataset, the pseudo-R² is 0.236271, with a p-value of 1.83E-47, which is likewise statistically significant.
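The text does not state which pseudo-R² was used. Assuming McFadden's definition, R² = 1 − ln L_model / ln L_null, where the null (intercept-only) model predicts the overall positive rate for every case, the statistic and the corresponding likelihood-ratio statistic 2(ln L_model − ln L_null) can be sketched as below. A p-value would then come from a χ² distribution with as many degrees of freedom as predictors. The function name and the example probabilities are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mcfadden_pseudo_r2(probs: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (McFadden pseudo-R^2, likelihood-ratio statistic) for a binary model.

    probs[i] is the fitted probability that case i is positive (label 1).
    """
    ll_model = sum(math.log(p) if t == 1 else math.log(1.0 - p) for p, t in zip(probs, labels))
    n, n_pos = len(labels), sum(labels)
    base_rate = n_pos / n  # the intercept-only model predicts this rate for every case
    ll_null = n_pos * math.log(base_rate) + (n - n_pos) * math.log(1.0 - base_rate)
    return 1.0 - ll_model / ll_null, 2.0 * (ll_model - ll_null)


# Hypothetical fitted probabilities for four cases.
print(mcfadden_pseudo_r2([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
```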

Conclusion
This paper presents a stacking ensemble classification model based on the fuzzy-rough nearest neighbor algorithm, sequential minimal optimization, and logistic regression for credit card fraud detection. The ensemble method takes advantage of the base classifiers by combining their predictions. The meta-classifier then uses the results from the base classifiers to generate the final classification result. It also improves the efficiency of the classification model. The experimental results on the Australian credit approval and German credit datasets indicate that the proposed classification model produces significant and promising results in terms of detection rate, false alarm rate, specificity, positive predictive value, F-measure, ROC curves, and AUC. Using 10-fold cross-validation, a detection rate of 84.90% and an AUC of 0.8555 are obtained for the Australian credit approval dataset, and a detection rate of 76.30% with an AUC of 0.6795 for the German credit dataset. The difference in results between the datasets could be attributed to their features: the Australian credit approval dataset has 14 features, whereas the German credit dataset has 24. A larger number of features may result in lower performance. Overall, the experiments and analysis confirm that the proposed model is suitable and proficient for the detection of credit card fraud. Future work could expand the set of algorithms in the ensemble to obtain better classification results. Other ensemble techniques besides stacking should also be considered for credit card fraud detection.