Decision Support System (DSS) for Fraud Detection in Health Insurance Claims Using Genetic Support Vector Machines (GSVMs)


Introduction
Low-income countries have developed significant policy frameworks for sustainable growth. These frameworks include healthcare delivery. Ghana is one of the countries that aspire to provide effective and efficient health care. To achieve this goal, the National Health Insurance Scheme (NHIS) was established by an Act of Parliament, Act 650, in 2003 [1].
The NHIS, as a social protection initiative, aims at providing financial risk protection against the cost of primary health care for residents of Ghana, and it has replaced the hitherto obnoxious cash-and-carry system of paying for health care at the point of receiving service. Since its introduction, the scheme has grown to become a significant instrument for financing healthcare delivery in Ghana. For effective and efficient implementation, NHIS introduced a tariff as a standardized primary fee for services rendered to its beneficiaries at their affiliated health institutions. This standardized tool was reviewed in January 2007 by the National Health Insurance Authority (NHIA), the governing body of NHIS, to develop a new tariff for the NHIS due to its expansion of service coverage. The new tariff was developed based on a GDRG (Ghana Diagnostic Related Group) system to include various clinical conditions and surgical procedures grouped under eleven Major Diagnostic Categories (MDC) or clinical specialties, namely, Adult Medicine, Pediatrics, Adult Surgery, Pediatric Surgery, Ear, Nose and Throat (ENT), Obstetrics and Gynecology, Dental, Ophthalmology, Orthopedics, Reconstructive Surgery, and Out-Patients' Department (OPD) [2].
These specialties guide the claim adjudication process and the operational mechanism for reporting claims, determine the reimbursement process, and create standards of operation between NHIS and service providers [2].
The GDRG code structure uses seven alphanumeric characters. The first four characters represent the MDC or clinical specialty. The next two characters are digits that represent the number of the GDRG within the MDC.
The last character (A or C) represents the age category: "A" stands for patients aged 12 years or older, and "C" stands for those younger than 12 years.
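The code layout just described (four characters for the MDC, two digits for the GDRG number, one age-category letter) can be illustrated with a minimal Python sketch. The class name, function names, and the sample code "XXXX01A" are hypothetical and serve only to show the structure; they are not taken from the NHIS tariff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GdrgCode:
    mdc: str           # first four characters: MDC / clinical specialty
    number: int        # next two characters: GDRG number within the MDC
    age_category: str  # last character: "A" (>= 12 years) or "C" (< 12 years)


def parse_gdrg(code: str) -> GdrgCode:
    """Split a seven-character GDRG code into its three parts."""
    if len(code) != 7 or not code.isalnum():
        raise ValueError("a GDRG code has seven alphanumeric characters")
    mdc, number, age = code[:4], code[4:6], code[6]
    if not number.isdigit():
        raise ValueError("characters 5-6 must be digits")
    if age not in ("A", "C"):
        raise ValueError("the last character must be 'A' or 'C'")
    return GdrgCode(mdc=mdc, number=int(number), age_category=age)


def age_matches(code: GdrgCode, age_years: int) -> bool:
    """Check that the age category agrees with the patient's age."""
    return (code.age_category == "A") == (age_years >= 12)


# "XXXX01A" is a made-up code used only to exercise the parser.
example = parse_gdrg("XXXX01A")
```

A check such as `age_matches` is one plausible way a claims validator could flag a mismatch between the billed code and the patient's age; whether the deployed system performs this check is not stated in the paper.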
The World Health Organization (WHO) provided the International Classification of Diseases (ICD-10) to meet the requirements for claim submission [3,4], but NHIS utilizes the GDRG codes since it has full control over them. Hence, the GDRG codes are used to develop the fraud detection model.
A claim is a detailed invoice that service providers send to the health insurer, which shows exactly what services a patient or patients received at the point of healthcare service delivery. Claim processing is the major challenge of providers under the Health Insurance Scheme (HIS) globally due to the excessive fraud in submitted claims and gaming of the system through well-coordinated schemes to siphon money from its coffers [5-9].
It was estimated conservatively that at least 3%, or more than $60 billion, of the US's annual healthcare expenditure was lost due to fraud. Other estimates by government and law enforcement agencies placed this loss as high as 10% or $170 billion [9,12]. In addition to financial loss, fraud also severely hinders the US healthcare system from providing quality care to legitimate beneficiaries [9]. Hence, effective fraud detection is essential for improving the quality and reducing the cost of healthcare services.
The National Health Care Antifraud Association report in [12,20] indicated that healthcare fraud strips nearly $70 billion from the healthcare industry each year. In response to these realities, the Health Insurance Portability and Accountability Act of 1996 (HIPAA) specifically established healthcare fraud as a federal criminal offense, with the primary crime carrying a federal prison term of up to 10 years in addition to significant financial penalties [8,21].
This paper presents a hybridized approach that combines genetic algorithms and support vector machines (GSVMs) to solve the health insurance claim classification problem and eliminate fraudulent claims while minimizing conversion and labour costs through automated claim processing. The significant contributions of this paper are as follows: (1) analysis of existing data mining and machine learning techniques (decision tree, Bayesian networks, Naïve-Bayes classifier, and support vector machines) for fraud detection; (2) development of a novel fraud detection model for insurance claims processing based on genetic support vector machines; (3) design, development, and deployment of a decision support system (DSS) which incorporates the fraudulent claim detection model, business intelligence, and knowledge representation for claims processing at NHIS Ghana; (4) development of a user-friendly graphical user interface (GUI) for the intelligent fraud detection system; and (5) evaluation of the health insurance claims fraud detection system using Ghana National Health Insurance subscribers' data from different hospitals.
The outline of the paper is as follows: Section 1 presents the introduction and problem statement with research objectives. Section 2 outlines the systematic literature review on various machine learning and data mining techniques for health insurance claims fraud detection. Section 3 gives the theoretical and mathematical foundations of genetic algorithms (GA), support vector machines (SVMs), and the hybrid genetic support vector machines (GSVMs) in combating this global phenomenon. Section 4 provides the proposed methodology for the GSVM fraud detection system, its design processes, and development. Section 5 comprises the design and implementation of the genetic support vector machines, while Section 6 presents the key findings of the research with conclusions and recommendations for future work.

Literature Review
Research into health insurance claims fraud requires a clear view of what fraud is, because it is sometimes lumped together with abuse and waste. Fraud and abuse refer to a situation where a healthcare service is paid for but not provided or reimbursement of funds is made to third-party insurance companies. Fraud and abuse are further explained as healthcare providers receiving kickbacks, patients seeking treatments that are potentially harmful to them (such as seeking drugs to satisfy addictions), and the prescription of services known to be unnecessary [12, 17-19]. Health insurance fraud is an intentional act of deceiving, concealing, or misrepresenting information that results in healthcare benefits being paid to an individual or group.
Journal of Engineering

Health insurance fraud detection involves account auditing and detective investigation. Careful account auditing can reveal suspicious providers and policyholders. Ideally, it is best to audit all claims one by one. However, auditing all claims is not feasible by any practical means. Furthermore, it is challenging to audit providers without concrete clues. A practical approach is to develop shortlists for scrutiny and audit the providers and patients on those shortlists. Various analytical techniques can be employed in developing audit shortlists. The most common fraud detection techniques reported in the literature include the use of machine learning, data mining, AI, and statistical methods.
The most cost-saving model using the Naïve-Bayes algorithm was used to create a subsample of 20 claims consisting of 400 objects, where 50% of the objects were classified as fraud and the other 50% as legal; this does not give a clear picture of the decision when compared with other classifiers [22].
The integration of multiple traditional methods has emerged as a new research area in combating fraud. This approach could be supervised, unsupervised, or both, with one method depending on the other for classification. One method may be used as a preprocessing step to modify the data in preparation for classification [9,23,24], or at a lower level, the individual steps of the algorithms can be intertwined to create something fundamentally original. Hybrid methods can be used to tailor solutions to a particular problem domain. Different aspects of performance can be specifically targeted, including classification ability, ease of use, and computational efficiency [14].
Fuzzy logic was combined with neural networks to assess and automatically classify medical claims [14]. The concept of data warehousing for data mining purposes in health care was applied to develop an electronic fraud detection application that reviews service providers on behavioral heuristics and compares them with similar service providers. Australia's Health Insurance Commission has explored the online discounting learning algorithm to identify rare cases in pathology insurance data [10, 25-27].
Researchers in Taiwan developed a detection model based on process mining that systematically identified practices derived from clinical pathways to detect fraudulent claims [8].
Results published in [28,29] used Benford's Law distributions to detect anomalies in claims reimbursements in Canada. Despite the detection of some anomalies and irregularities, the ability of this approach to identify suspected claims in health insurance fraud detection is very limited, since it applies to service providers with payer-fixed prices.
Neural networks were used to develop an application for detecting medical abuse and fraud for a private health insurance scheme in Chile [30]. The ability to process claims in real time accounts for the innovative nature of this method. Association rule mining was applied in [9,22,30] to examine billing patterns within a particular specialist group and detect suspicious claims and potentially fraudulent individuals.

Mathematical Foundations for Genetic Support Vector Machines
In the 1960s, John Holland invented genetic algorithms, which simulate Darwinian survival of the fittest as well as the processes of crossover, mutation, and inversion that occur in genetics. Holland's inversion demonstrated that, under certain assumptions, GA indeed achieves an optimal balance [31-34]. In contrast with evolution strategies and evolutionary programming, Holland's original goal was not to design algorithms to solve specific problems but rather to formally study the phenomenon of adaptation as it occurs in nature and to develop ways in which the mechanisms of natural adaptation might be imported into computer systems. Moreover, Holland was the first to attempt to put computational evolution on a firm theoretical footing [35]. Genetic algorithms operate through three main operators, namely, (1) reproduction, (2) crossover, and (3) mutation. A typical genetic algorithm requires (1) a genetic representation of the solution domain and (2) a fitness function to evaluate the solution domain [31-34].
Reproduction is controlled by crossover and mutation operators. Crossover is the process whereby genes are selected from the parent chromosomes and new offspring are produced. Mutation is designed to add diversity to the population and ensure the possibility of exploring the entire search space. It replaces the values of some randomly selected genes of a chromosome with arbitrary new values [33,35].
During the reproduction stage, an individual is assigned a fitness value derived from its raw performance measure given by the objective function.
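To make the role of the fitness value concrete, the following minimal Python sketch shows fitness-proportionate ("roulette-wheel") selection, one common way of turning fitness values into reproduction probabilities. The paper does not state which selection scheme was used, so this choice, the function name, and the fitness values are assumptions for illustration only.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitness values must sum to a positive number")
    threshold = rng.random() * total
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if running > threshold:
            return individual
    return population[-1]  # guard against floating-point round-off


rng = random.Random(0)
population = ["a", "b", "c"]
fitnesses = [0.0, 3.0, 1.0]  # hypothetical fitness values
picks = [roulette_select(population, fitnesses, rng) for _ in range(1000)]
```

With these values, individual "a" has zero fitness and is never chosen, while "b" is chosen about three times as often as "c".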
The support vector machine (SVM), a statistical machine learning technique, was introduced in 1995 by Vapnik and Cortes as an alternative to polynomial, radial basis function, and multilayer perceptron classifiers, in which the weights of the network are found by solving a quadratic programming (QP) problem with linear inequality and equality constraints rather than by solving a nonconvex, unconstrained minimization problem [36-39]. Although SVMs are used for binary classification, regression analysis, face detection, text categorization in bioinformatics and data mining, and outlier detection, they face challenges when the dataset is very large, due to the dense nature and memory requirement of the quadratic form of the dataset. However, SVM is an excellent example of supervised learning that tries to maximize generalization by maximizing the margin and supports nonlinear separation using kernelization [40]. SVM tries to avoid overfitting and underfitting. The margin in SVM denotes the distance from the boundary to the closest data points in the feature space.
Given the claims training dataset $\{(x_i, y_i)\}_{i=1}^{n}$ with $x_i \in \mathbb{R}^d$ in the feature space $F$ and class labels $y_i \in \{-1, +1\}$ for the two classes (fraud and legal), the linear hyperplane dividing the two labelled classes can be written as

$$\omega^T x + b = 0,$$

assuming the training dataset is correctly classified, as shown in Figure 1.
This means that the SVC computes the hyperplane that maximizes the margin separating the classes (legal claims and fraud claims).

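In the simplest linear form, an SVC is a hyperplane that separates the legal claims from the fraudulent claims with a maximum margin. Finding this hyperplane involves obtaining two hyperplanes parallel to it, as shown in Figure 1, at an equal distance from it. All the training data must satisfy the constraints

$$\omega^T x_i + b \ge +1 \ \text{for } y_i = +1, \qquad \omega^T x_i + b \le -1 \ \text{for } y_i = -1,$$

where $\omega$ is the normal to the hyperplane, $|b|/\|\omega\|$ is the perpendicular distance from the hyperplane to the origin, and $\|\omega\|$ is the Euclidean norm of $\omega$. The separating hyperplane is defined by the plane $\omega^T x + b = 0$, and the two constraints above are combined to form

$$y_i(\omega^T x_i + b) - 1 \ge 0, \quad i = 1, \ldots, n.$$

The pair of hyperplanes that gives the maximum margin, $2/\|\omega\|$, can be found by minimizing $\|\omega\|^2$ subject to this combined constraint. This leads to a quadratic optimization problem formulated as

$$\min_{\omega, b} \ f(\omega) = \frac{1}{2}\|\omega\|^2 \quad \text{subject to} \quad y_i(\omega^T x_i + b) - 1 \ge 0, \quad i = 1, \ldots, n.$$

This problem is reformulated by introducing Lagrange multipliers $\alpha_i \ge 0$ ($i = 1, \ldots, n$), one for each constraint, and subtracting the weighted constraints from the function $f$. This results in the primal Lagrangian function:

$$L_P(\omega, b, \alpha) = \frac{1}{2}\|\omega\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i(\omega^T x_i + b) - 1 \right].$$

Taking the partial derivatives of $L_P(\omega, b, \alpha)$ with respect to $\omega$, $b$, and $\alpha_i$, respectively, and applying duality theory yields

$$\frac{\partial L_P}{\partial \omega} = \omega - \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad \frac{\partial L_P}{\partial b} = -\sum_{i=1}^{n} \alpha_i y_i, \qquad \frac{\partial L_P}{\partial \alpha_i} = -\left[ y_i(\omega^T x_i + b) - 1 \right].$$

The problem defined above is a quadratic optimization (QP) problem. Maximizing $L_P$ with respect to $\alpha_i$, subject to the constraints that the gradient of $L_P$ with respect to $\omega$ and $b$ vanishes and that $\alpha_i \ge 0$, gives the following two conditions:

$$\omega = \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0.$$

Substituting these conditions into $L_P$ gives the dual formulation of the Lagrangian:

$$L_D(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i^T x_j.$$

The values of $\alpha_i$, $\omega$, and $b$ are obtained as follows: the $\alpha_i$ maximize $L_D$ subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$; $\omega = \sum_i \alpha_i y_i x_i$; and $b = y_k - \omega^T x_k$ for any $k$ with $\alpha_k > 0$. The Lagrange multipliers also satisfy the complementarity condition

$$\alpha_i \left[ y_i(\omega^T x_i + b) - 1 \right] = 0, \quad i = 1, \ldots, n.$$

Hence, the dual Lagrangian $L_D$ is maximized with respect to its nonnegative $\alpha_i$ to give a standard quadratic optimization problem. The training vectors $x_i$ with a nonzero Lagrange multiplier $\alpha_i$ are called support vectors (SVs); by the complementarity condition they satisfy

$$y_i(\omega^T x_i + b) = 1.$$

Although the SVM classifier described so far can only have a linear hyperplane as its decision surface, its formulation can be extended to build a nonlinear SVM. SVMs can classify data that are not linearly separable by introducing a soft margin hyperplane, as shown in Figure 2. Introducing the slack variables $\xi_i$ into the constraints yields

$$\min_{\omega, b, \xi} \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i(\omega^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0.$$

The parameter $C$ is a regularization parameter that trades off a wide margin against a small number of margin failures.
The parameter $C$ is finite. The larger the value of $C$, the more heavily margin errors are penalized.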
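The Karush-Kuhn-Tucker (KKT) conditions are necessary to ensure optimality of the solution to a nonlinear programming problem. The KKT conditions for the primal problem are used in the nonseparable case, for which the primal Lagrangian becomes

$$L_P = \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left[ y_i(\omega^T x_i + b) - 1 + \xi_i \right] - \sum_{i=1}^{n} \beta_i \xi_i,$$

with $\beta_i$ as the Lagrange multipliers that enforce positivity of the slack variables ($\xi_i$). Applying the KKT conditions to the primal problem yields

$$\frac{\partial L_P}{\partial \omega_\nu} = \omega_\nu - \sum_{i=1}^{n} \alpha_i y_i x_{i\nu} = 0, \quad \nu = 1, \ldots, d,$$

$$\frac{\partial L_P}{\partial b} = -\sum_{i=1}^{n} \alpha_i y_i = 0, \qquad \frac{\partial L_P}{\partial \xi_i} = C - \alpha_i - \beta_i = 0,$$

$$y_i(\omega^T x_i + b) - 1 + \xi_i \ge 0, \qquad \xi_i \ge 0, \quad \alpha_i \ge 0, \quad \beta_i \ge 0,$$

$$\alpha_i \left[ y_i(\omega^T x_i + b) - 1 + \xi_i \right] = 0, \qquad \beta_i \xi_i = 0,$$

where the parameter $d$ represents the dimension of the dataset.
Observing the expressions obtained above after applying the KKT conditions yields

$$\xi_i = 0 \quad \text{whenever} \quad \alpha_i < C,$$

since $\beta_i = C - \alpha_i > 0$ and $\beta_i \xi_i = 0$. This implies that any training point for which $0 < \alpha_i < C$ does not cross the margin boundary and can be used to compute $b$:

$$b = y_i - \omega^T x_i.$$

Points with $\alpha_i = 0$ do not participate in the derivation of the separating function, whereas points with $\alpha_i = C$ may have $\xi_i > 0$. Nonlinear SVM maps the training samples from the input space into a higher-dimensional feature space via a mapping function $\Phi$. In the dual Lagrangian function, the inner products are replaced by the kernel function:

$$k(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j).$$

Effective kernels allow the separating hyperplane to be found without high computational resources. The nonlinear SVM dual Lagrangian is

$$L_D(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)$$

subject to

$$0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0.$$

This is like that of the generalized linear case. The nonlinear SVM separating hyperplane is illustrated in Figure 3 with the support vectors, class labels, and margin.
This model can be solved by the optimization method used in the separable case. Therefore, the optimal hyperplane has the following form:

$$f(x) = \sum_{i=1}^{n} \alpha_i y_i \, k(x_i, x) + b,$$

where $b$ is the offset of the decision boundary from the origin. Hence, a newly arrived data point $x$ is classified as

$$g(x) = \operatorname{sign}(f(x)).$$

However, feasible kernels must be symmetric; i.e., the matrix $K$ with components $k(x_i, x_j)$ is positive semidefinite and satisfies Mercer's condition given in [39,40].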
The kernel functions considered in this work are summarized in Table 1.
These kernels satisfy Mercer's condition; the RBF or Gaussian kernel is the most widely used kernel function in the literature. The RBF kernel has the advantage of adding only a single free parameter $\gamma > 0$, which controls the width of the RBF kernel as $\gamma = 1/(2\sigma^2)$, where $\sigma^2$ is the variance of the resulting Gaussian hypersphere. The linear kernel is given as $k(x_i, x_j) = x_i \cdot x_j$. Consequently, the training of SVMs used the solution of the QP optimization problem. The above mathematical formulations form the foundation for the development and deployment of genetic support vector machines as the decision support tool for detecting and classifying fraudulent health insurance claims. In recent times, the decision-making activities of knowledge-intensive enterprises depend holistically on the successful classification of data patterns, despite the time and computational resources required to achieve the results due to the complexity and size of the dataset.
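The kernels discussed above, and the way a trained kernel SVM scores a new claim, can be sketched in plain Python as follows. This is an illustrative sketch rather than the paper's MATLAB implementation: the polynomial form with an offset `coef0`, the default parameter values, and the toy support vectors are assumptions, since the contents of Table 1 are not reproduced in the text.

```python
import math


def linear_kernel(x, z):
    """k(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """k(x, z) = (x . z + coef0) ** degree; coef0 is an assumed offset."""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, sigma2=1.0):
    """k(x, z) = exp(-gamma * ||x - z||^2) with gamma = 1 / (2 * sigma2)."""
    gamma = 1.0 / (2.0 * sigma2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision(x, support_vectors, labels, alphas, b, kernel):
    """Sign of f(x) = sum_i alpha_i * y_i * k(x_i, x) + b."""
    f = sum(a * y * kernel(sv, x)
            for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if f >= 0 else -1


# Toy model with made-up support vectors and multipliers.
svs, ys, alphas = [[0.0, 0.0], [2.0, 2.0]], [-1, 1], [1.0, 1.0]
near_positive = decision([2.0, 2.0], svs, ys, alphas, 0.0, rbf_kernel)
near_negative = decision([0.0, 0.0], svs, ys, alphas, 0.0, rbf_kernel)
```

Only the support vectors enter the sum, which is why the size of the trained model depends on the number of nonzero multipliers rather than on the size of the training set.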

Methodology for GSVM Fraud Detection
The systematic approach adopted for the design and development of genetic support vector machines for health insurance claims fraud detection is presented in the conceptual framework in Figure 4 and the flow chart implementation in Figure 5.
The conceptual framework incorporates the design and development of key algorithms that enable submitted claims data to be analysed and a model to be developed for testing and validation.
The flow chart presents the algorithm implemented based on the theoretical foundations, incorporating genetic algorithms and support vector machines, two useful machine learning algorithms necessary for fraud detection.
Their combined use in the detection process generates accurate results. The methodology for the design and development of genetic support vector machines as presented above consists of three (3) significant steps, namely, (1) data preprocessing, (2) classification engine development, and (3) data postprocessing.

Data Preprocessing.
Data preprocessing is the first significant stage in the development of the fraud detection system. This stage involves the use of data mining techniques to transform the data from its raw form into the format required by the SVC for the detection and identification of health insurance claims fraud.
The data preprocessing stage involves the removal of unwanted customers and missing records, and data smoothing. This ensures that only useful and relevant information is extracted for the next process.
Before the preprocessing, the data were imported from MS Excel CSV format into a MySQL database called NHIS.
The imported data include the electronic Health Insurance Claims (e-HIC) data and the HIC tariff datasets as tables imported into the NHIS database. The e-HIC data preprocessing involves the following steps: (1) claims data filtering and selection, (2) feature selection and extraction, and (3) feature adjustment.
The WEKA machine learning and knowledge analysis environment was used for feature selection and extraction, while the data processing code was written in the MATLAB technical computing environment.
The developed MATLAB-based decision support engine was connected to MySQL using the script shown in Figure 6.
Preprocessing of the raw data involves claims cost validity checks. The tariff dataset consists of the approved tariffs for each diagnostic-related group, which were strictly enforced to clean the data before further processing. Claims are partitioned into two groups, namely, (1) claims with a valid and approved cost within each DRG and (2) claims with invalid costs (those above the approved tariffs within each DRG).
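The cost validity check described above amounts to comparing each claim's bill with the approved tariff for its DRG. A minimal Python sketch is given below; the field names (`gdrg_code`, `bill`), the sample codes and amounts, and the decision to treat claims with an unknown code as invalid are all assumptions made for illustration.

```python
def partition_by_tariff(claims, tariffs):
    """Split claims into (valid, invalid) by the approved tariff per DRG."""
    valid, invalid = [], []
    for claim in claims:
        approved = tariffs.get(claim["gdrg_code"])
        # Assumption: a claim whose code has no approved tariff is invalid.
        if approved is not None and claim["bill"] <= approved:
            valid.append(claim)
        else:
            invalid.append(claim)
    return valid, invalid


# Made-up tariff table and claims, for illustration only.
tariffs = {"XXXX01A": 50.0, "YYYY02C": 30.0}
claims = [
    {"id": 1, "gdrg_code": "XXXX01A", "bill": 45.0},  # within tariff
    {"id": 2, "gdrg_code": "XXXX01A", "bill": 80.0},  # above tariff
    {"id": 3, "gdrg_code": "ZZZZ09A", "bill": 10.0},  # unknown code
]
valid_claims, invalid_claims = partition_by_tariff(claims, tariffs)
```

Only the claims in the first group would then proceed to feature selection and SVM training.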
With the recent increase in the volume and dimensionality of real claims data, there is an urgent need for faster, more reliable, and more cost-effective data mining techniques for classification models. These techniques require the extraction of a smaller, optimized set of features, obtained by removing features that are largely redundant, irrelevant, or unnecessary for the class prediction [41].
Feature selection algorithms are utilized to extract a minimal subset of attributes such that the resulting probability distribution of the data classes is close to the original distribution obtained using all attributes. Based on the idea of survival of the fittest, a new population is constructed from the fittest rules in the current population, as well as the offspring of these rules. Offspring are generated by applying genetic operators such as crossover and mutation.
The process of offspring generation continues until it evolves a population N where every rule in N satisfies the fitness threshold. With an initial population of 20 instances, generation continued until the 20th generation with a crossover probability of 0.6 and a mutation probability of 0.033.
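The evolutionary loop described above can be sketched in Python using the stated settings (population of 20, 20 generations, crossover probability 0.6, mutation probability 0.033). Everything else is an assumption for illustration: the paper used WEKA for feature selection, and the binary feature mask, tournament selection, single-point crossover, and toy fitness function below stand in for details the text does not give.

```python
import random

POP_SIZE, GENERATIONS = 20, 20
P_CROSSOVER, P_MUTATION = 0.6, 0.033


def crossover(a, b, rng):
    """Single-point crossover applied with probability P_CROSSOVER."""
    if len(a) > 1 and rng.random() < P_CROSSOVER:
        cut = rng.randrange(1, len(a))
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]


def mutate(mask, rng):
    """Flip each bit independently with probability P_MUTATION."""
    return [1 - bit if rng.random() < P_MUTATION else bit for bit in mask]


def evolve(fitness, n_features, rng):
    """Return the fittest feature mask seen over all generations."""
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(POP_SIZE)]
    best = max(population, key=fitness)
    for _ in range(GENERATIONS):
        # Binary tournament selection (an assumed selection scheme).
        parents = [max(rng.sample(population, 2), key=fitness)
                   for _ in range(POP_SIZE)]
        population = []
        for i in range(0, POP_SIZE, 2):
            for child in crossover(parents[i], parents[i + 1], rng):
                population.append(mutate(child, rng))
        best = max(population + [best], key=fitness)
    return best


# Toy fitness (an assumption): reward masks close to a known target mask.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(mask):
    return sum(m == t for m, t in zip(mask, TARGET))


best_mask = evolve(toy_fitness, len(TARGET), random.Random(42))
```

In a real feature-selection run, the fitness of a mask would be something like the cross-validated accuracy of a classifier trained on the selected columns, which makes each evaluation far more expensive than in this toy example.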
The features selected by the genetic algorithm are "Attendance date," "Hospital code," "GDRG code," "Service bill," and "Drug bill." These are the features selected, extracted, and used as the basis for the optimization problem. The GA-processed e-HIC dataset is subjected to SVM training, using 70% of the dataset for training and 30% for testing, as depicted in Figure 7.
The e-HIC dataset that passes the preprocessing stage, that is, the valid claims, was used for SVM training and testing. The best data, those that meet the genetic algorithm's criteria, are classified first. Each record of this dataset is classified as either "Fraudulent Bills" or "Legal Bills." The same SVM training and testing dataset is applied to the SVM algorithm for its performance analysis.
The inbuilt MATLAB code for SVM classifiers was integrated as one function for the linear, polynomial, and RBF kernels. The claim datasets were partitioned for classifier training, testing, and validation: 70% of the dataset was used for training and 30% for testing.
The linear, polynomial, and radial basis function SVM classification kernels were used with ten-fold cross validation for each kernel, and the results were averaged. For the polynomial classification kernel, a cubic polynomial was used. The RBF classification kernel used the SMO method [40].
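The partitioning and averaging steps above can be sketched as follows. This is a generic Python illustration of a 70/30 split and ten-fold cross validation, not the MATLAB routine used in the paper; the function names, the fixed seed, and the interleaved fold assignment are assumptions.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and testing parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        yield [j for j in range(n) if j not in held], held_out


def cross_validated_score(n, k, fit_and_score):
    """Average the score returned by fit_and_score over the k folds."""
    scores = [fit_and_score(train, test)
              for train, test in k_fold_indices(n, k)]
    return sum(scores) / len(scores)


train, test = train_test_split(list(range(100)))
```

Here `fit_and_score` is a placeholder for a routine that trains a classifier on the training indices and returns its accuracy on the held-out indices; running it once per kernel gives the averaged result for that kernel.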
This method ensures the handling of large data sizes, as it performs data transformation through kernelization. After running many instances and varying the parameters for the RBF kernel, a variance of 0.9 gave better results, as it corresponded well with the datasets for the classification. After each classification, the correct rate is calculated and the confusion matrix extracted. The confusion matrix gives a count of the true legal, true fraudulent, false legal, false fraudulent, and inconclusive bills.
(i) True legal bills: this consists of the number of "Legal Bills," which were correctly classified as "Legal Bills" by the classifier.
(ii) True fraudulent bills: this consists of the number of "Fraudulent Bills," which were correctly classified as "Fraudulent Bills" by the classifier.
(iii) False legal bills: this consists of the bills classified as "Legal Bills" even though they are not. That is, these are wrongly classified as "Legal Bills" by the kernel used.
(iv) False fraudulent bills: the classifier also wrongly classified bills as fraudulent. The confusion matrix gives a count of these wrongly or incorrectly classified bills.
The correct rate: this is calculated as the total number of correctly classified bills, namely, the true legal bills and true fraudulent bills, divided by the total number of bills used for the classification:

$$\text{correct rate} = \frac{\text{number of TLB} + \text{number of TFB}}{\text{total number of bills (TB)}}, \quad (25)$$

where TLB: True Legal Bills; TFB: True Fraudulent Bills.
Accuracy: the probability of a correct classification.
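The four counts and the correct rate in (25) can be computed from paired actual and predicted labels, as in the minimal Python sketch below. The function names and the sample labels are made up for illustration, and the "inconclusive" category mentioned earlier is not modelled.

```python
from collections import Counter

LEGAL, FRAUD = "Legal Bills", "Fraudulent Bills"


def confusion_counts(actual, predicted):
    """Count true/false legal and true/false fraudulent bills."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == LEGAL:
            counts["true_legal" if a == LEGAL else "false_legal"] += 1
        else:
            counts["true_fraudulent" if a == FRAUD else "false_fraudulent"] += 1
    return counts


def correct_rate(counts):
    """(TLB + TFB) / TB, as in equation (25)."""
    total = sum(counts.values())
    return (counts["true_legal"] + counts["true_fraudulent"]) / total


# Made-up labels for five bills.
actual = [LEGAL, LEGAL, FRAUD, FRAUD, LEGAL]
predicted = [LEGAL, FRAUD, FRAUD, LEGAL, LEGAL]
counts = confusion_counts(actual, predicted)
rate = correct_rate(counts)
```

For these five bills, three are classified correctly (two true legal, one true fraudulent), so the correct rate is 3/5.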

GSVM Fraud Detection System Implementation and Testing.
The decision support system comprises four main modules integrated together, namely, (1) algorithm implementation using the MATLAB technical computing platform, (2) development of a graphical user interface (GUI) for the HIC fraud detection system, which handles the uploading and processing of claims, (3) system administrator management, and (4) postprocessing of detection and classification results. The front end of the detection system was developed using XAMPP, a free and open-source cross-platform web server solution stack package developed by Apache Friends [42], consisting mainly of the Apache HTTP Server, the MariaDB database, and interpreters for scripts written in the PHP and Perl programming languages [42]. XAMPP stands for Cross-Platform (X), Apache (A), MariaDB (M), PHP (P), and Perl (P). The Health Insurance Claims Fraud Detection System (HICFDS) was developed using the MATLAB technical computing environment with the capability to connect to a database. The results are shown in Figure 9. The developed GUI portal for the analysis of the results obtained from the classification of the submitted health insurance claims is also displayed in Figure 9. Clicking the fraudulent button in the GUI opens a pop-up that generates Figure 10 for the claims dataset. It shows the grouping of detected fraudulent claim types in the datasets.
For each classifier, a 10-fold cross validation (CV) of the hyperparameters $(C, \gamma)$ on the Patients Payment Data (PPD) was performed. The performance measured on GA optimization tested several hyperparameters for the optimal SVM. The SVC training aims for the best SVC parameters $(C, \gamma)$ in building the HICFD classifier model. The developed classifier is evaluated using testing and validation data. The accuracy of the classifier is evaluated using cross validation (CV) to avoid overfitting of the SVC to the training data. The random search method was used for SVC parameter training: exponentially growing sequences of the hyperparameters $(C, \gamma)$ were tried, as a practical way to identify suitable SVC parameters and obtain the best CV accuracy for the classifier on the claims data samples. Random search differs slightly from grid search. Instead of searching over the entire grid, random search evaluates only a random sample of points on the grid. This makes random search computationally cheaper than grid search. Experimentally, 10-fold CV was used as the measure of the training accuracy, where 70% of each sample was used for training and the remaining 30% for testing and validation. In evaluating the classifiers obtained with the analyzed methods, the most widely employed performance measures are used: accuracy, sensitivity, and specificity, based on the concepts of True Legal (TP), False Fraudulent (FN), False Legal (FP), and True Fraudulent (TN). This classification is shown in Table 3.
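Random search over exponentially growing sequences of $(C, \gamma)$ can be sketched in Python as below. The exponent ranges, the number of trials, and the stand-in scoring function are assumptions for illustration; in practice `cv_accuracy` would train an SVC with the given parameters and return its 10-fold CV accuracy.

```python
import math
import random

# Assumed exponent grids: C in 2^-5 .. 2^15, gamma in 2^-15 .. 2^3.
C_EXPONENTS = list(range(-5, 16, 2))
GAMMA_EXPONENTS = list(range(-15, 4, 2))


def sample_hyperparameters(rng):
    """Draw one (C, gamma) point at random from the exponential grid."""
    return 2.0 ** rng.choice(C_EXPONENTS), 2.0 ** rng.choice(GAMMA_EXPONENTS)


def random_search(cv_accuracy, n_trials=20, seed=0):
    """Evaluate n_trials random grid points and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample_hyperparameters(rng)
        score = cv_accuracy(*params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_cv_accuracy(c, gamma):
    """Stand-in score that peaks at C = 2^3 and gamma = 2^-5."""
    return -abs(math.log2(c) - 3) - abs(math.log2(gamma) + 5)


best_params, best_score = random_search(toy_cv_accuracy)
```

The full grid here has 11 x 10 = 110 points, so 20 random trials evaluate fewer than a fifth of them, which is the source of the cost saving over grid search.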
Figures 11-13 show the SVC plots of the various classifiers (linear, polynomial, and RBF) on the claims datasets.
From the performance metrics and overall statistics presented in Table 4, it is observed that the support vector machine classifies best with the RBF kernel function, with an accuracy of 87.91%. The confusion matrices in Table 5 are utilized in the computation of the performance metrics of the SVM classifiers. In statistical and machine learning classification tasks, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of a supervised learning algorithm.
Besides classification, the amount of time required to process the sample dataset is also an important consideration in this research. The comparison of computational times shows that an increase in the size of the sample dataset also increases the computational time needed to execute the process regardless of the machine used, which is widely expected. This difference in time cost is due mainly to training on the dataset. Thus, as global data warehouses grow, more computational resources will be needed in machine learning and data mining research pertaining to the detection of insurance fraud, as depicted in Figure 14, which relates the average computational time to the sample data size.
Figure 15 summarizes the fraudulent claims detected during the testing of the HICFD with the sample dataset used. As the sample data size increases, the number of suspected claims increases rapidly based on the various fraudulent types detected.
Benchmarking the HICFD analysis ensures understanding of HIC outcomes. From the chart above, an increase in the claims dataset has a corresponding increase in the number of suspected claims. The graph in Figure 16 shows a sudden rise in the level of suspected claims on the tested 100-claim dataset, representing 52% of the sample dataset, after which the number of suspected claims continues to increase slightly, by 2%, to make up 58% on the tested data size of 300 claims.
Among these fraud types, the most frequent fraudulent act is billing for uncovered services rendered to insurance subscribers by service providers. It accounts for 22% of the fraudulent claims, the largest proportion of the total health insurance fraud in the tested dataset. Overbilling of submitted claims is the second most frequent fraudulent claim type, representing 20% of the total sample dataset used for this research.
This is caused by service providers billing for a service at more than the expected tariff for the required diagnoses. Providers list and bill for a more complex or higher level of service to boost their income unfairly within otherwise legitimate claims. Moreover, some illicit service providers claim to have rendered costly services to insurance subscribers instead of providing more affordable ones. Claims prepared for expensive services rendered to insurance subscribers represent 8% of the fraudulent claims detected in the total sample dataset. Furthermore, unbundled claims, in which service procedures that should be considered an integral part of a single procedure are billed separately, contributed 3.1% of the fraudulent claims in the test dataset. Due to the insecure process for delivering healthcare services, insurance subscribers also contribute to fraudulent claims by lending their ID cards to family members or third parties who pretend to be the owners and request HIS benefits in the healthcare sector. Duplicated claims recorded the minimum rate, contributing 0.5% of the fraudulent claims in the whole sample dataset.
As observed in Table 6, the cost of the claims bill increases proportionally with an increase in the sample size of the claims bill.
This is consistent with the increase in fraudulent claims as the sample size increases. Table 6 shows the raw claims bill (R) for each sample claim dataset, the valid claims bill (V) after processing the dataset, the variation in the claims bill (R − V), and its percentage representation. There is a 27% financial loss of the total submitted claim bills to insurance carriers. This is the highest rate of loss and occurs within the 750-claim dataset of submitted claims.
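The variation and its percentage are simple arithmetic on the raw bill R and the valid bill V. The short Python sketch below uses made-up amounts, not values from Table 6, and the function name is hypothetical.

```python
def claim_cost_variation(raw_total, valid_total):
    """Return (R - V, 100 * (R - V) / R) for a batch of claims."""
    if raw_total <= 0:
        raise ValueError("the raw claims bill must be positive")
    variation = raw_total - valid_total
    return variation, 100.0 * variation / raw_total


# Hypothetical batch: 2000.0 billed, 1500.0 valid after processing.
variation, percent = claim_cost_variation(2000.0, 1500.0)
```

For this hypothetical batch, a quarter of the billed amount would be withheld as invalid.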
A summary of the results and a comparison with other machine learning algorithms such as decision trees and Naïve-Bayes is presented in Table 7.
The MATLAB Classification Learner App [43] was chosen to validate the results obtained above. It enables ease of comparison between the different classification algorithms implemented. The data used for the GSVM were subsequently used in the Classification Learner App, as shown below.
Figures 17 and 18 show the Classification Learner App with the various implemented algorithms and corresponding accuracies in the MATLAB technical computing environment and the results obtained using the 500-claim dataset, respectively. Figures 19 and 20 depict the subsequent results when the 750- and 1000-claim datasets were utilized for the algorithmic runs and reproducible comparison, respectively. The summarized results and accuracies are illustrated in Table 7. The summarized results in Table 7 portray the effectiveness of our proposed approach of using genetic support vector machines (GSVMs) for fraud detection of insurance claims. From the results, it is evident that GSVM achieves a higher level of accuracy than decision trees and Naïve-Bayes.

Conclusions and Recommendations
This work aimed at developing a novel fraud detection model for insurance claims processing based on genetic support vector machines, which hybridizes and draws on the strengths of both genetic algorithms and support vector machines.
The GSVM has been investigated and applied in the development of HICFDS. This paper used GSVM for detection of anomalies and classification of health insurance claims into legitimate and fraudulent claims. SVMs have been considered preferable to other classification techniques due to several advantages. They enable separation (classification) of claims into legitimate and fraudulent using the soft margin, thus accommodating updates in the generalization performance of HICFDS. With other notable advantages, it has a nonlinear dividing

Figure 2: Linear separating hyperplanes for the nonseparable case of SVC by introducing the slack variable (ξ).

Figure 3: Nonlinear separating hyperplane for the nonseparable case of SVM.

Figure 4: Conceptual model design and development of the genetic support vector machines.

Figure 5: Flow chart for design and development of the genetic support vector machines.

Figure 6: MATLAB-based decision support engine connection to the database.

Figure 7: Data preprocessing for SVM training and testing.

Figure 10: Fraud type distribution on the sample data sizes.

Figure 17: Classification Learner App showing the various algorithms and percentage accuracies in MATLAB.

Table 1: Summarized kernel functions used.

Table 2: Sample data size and the corresponding fraud types.

Table 3: Summary performance metrics of SVM classifiers on sample sizes.

Table 4: Average performance analysis of SVM classifiers.

Table 5: Confusion matrix for SVM classifiers.

Table 6: Cost analysis of tested claims dataset.

Table 7: Comparison of results of GSVM with decision trees and Naïve-Bayes.