Cost-sensitive thresholding over a two-dimensional decision region for fraud detection

Fraud in consumer credit is a growing menace that is difficult to detect, and its consequences depend on the credit amount, so a cost-sensitive perspective must be taken. Classical approaches reduce to estimating a fraud probability and selecting a decision threshold, either ignoring the amount or considering it neither explicitly nor through the aggregated losses over the sample, which leads to sub-optimal strategies. A new thresholding approach is proposed to overcome these drawbacks and minimize aggregated losses, based on the construction of a two-dimensional decision space using any estimated fraud probability and the credit amount. This expansion allows more freedom in the search for the optimal decision rule, which is performed with a new algorithm. The proposed method generalizes previous approaches, so an improvement is consistently achieved. This is shown in a study of two real data sets, comparing the results obtained by a wide range of classifiers.


Introduction
By definition, a fraud is an operation without any payment intention, leading to the total loss of the financed credit. Thus, it is one of the most dangerous (and increasing) risks in a financial company [13,27]. Therefore, every operation undergoes an initial screening for fraud detection before it follows the usual flow of risk analysis and concession. Furthermore, there are several challenges regarding its detection in the context of consumer finance, where the modeling variables available are very limited, as only the information that the client presents at request time is available. There is no behavioral history or a large database as in other scenarios such as, for example, credit cards. Another obstacle is the lack of stable patterns needed to train any kind of supervised model, as fraudsters learn from and adapt to risk policies, hindering a stable categorization. This is aggravated by the scarcity of fraudulent cases, which leads to an extraordinarily imbalanced problem [1]. Besides, there is a lack of reliable data, as fraudsters usually modify or falsify their information, leading to class overlap [12]. In addition, in the consumer finance context, complex models such as neural networks cannot be used due to regulation and interpretability restraints [2,13,15], since all the decisions made must be explainable to the regulator and/or the client. Lastly, when an operation is flagged as a possible fraud, an analyst must study its legitimacy, which incurs operational costs and limitations that cannot be overlooked [1,21,6]. Moreover, when legitimate clients are contacted during fraud screening, their reaction is not always favorable and some decide to seek another institution, a potential business loss that can damage both the portfolio and the image of the entity. Consequently, in our practical application a percentage of operations to analyze (POA) restriction of 10%, and ideally smaller than 5%, is imposed.
Classification models are commonly trained and evaluated in terms of statistical performance measures that do not take into account the actual business objective, which is to minimize financial losses due to fraud. Several authors have pointed out that decision making based only on an estimated probability performs worse in problems where not all error types have the same weight [13,16,22,23]. Note that it would be preferable to detect one 10,000€ fraud than five 1,000€ frauds. Costs due to misclassification vary between instances, so fraud must be acknowledged as an instance-dependent cost-sensitive problem [11]. This creates an additional degree of complexity, as costs depend not only on the class but also on characteristics of the instance (the loan amount).
The fraud detection and cost-sensitive classification literature, although extensive, lacks state-of-the-art references due to the absence of publicly available datasets and, consequently, of comparisons. To overcome this, a wide range of models and methods are introduced and tested in this paper. Among cost-insensitive approaches, undersampling and oversampling techniques [1,6,8,13], although appropriate given the problem imbalance, imply a loss of information and bias or an increase in variance, respectively [10,6,5]. Classification techniques such as support vector machines [27], data envelopment analysis methods [15] or fuzzy models [14,18] do not consider the loan amount in the decision making, which is an important drawback. Regarding cost-sensitive approaches, two philosophies can be distinguished. The first one, known as predict-and-optimize, consists of constructing a classifier with a cost-sensitive objective function, tuning the estimated probabilities [3,12,25,21] or using weighted versions of logistic regression [3,11] or boosting algorithms [16,11], among others. The second one, known as predict-then-optimize or thresholding, focuses on the decision making, building a predictive model with the aim of maximizing accuracy and then using a model to optimize the decision making by minimizing losses [1,4,8,11,22,21,19]. The drawback of these classification rules is that they do not take into account aggregated losses. In an imbalanced setting with considerable class overlap, such as fraud detection, an individual decision rule can imply analysis costs exceeding the savings from detected frauds over the whole sample [11], so a global strategy is more likely to produce better practical results.
Given the difficulties presented, more complex models, besides being banned in consumer credit, are unlikely to achieve better classification than state-of-the-art models [2,7,16,6,24]. The method introduced in this paper belongs to the latter philosophy of predict-then-optimize, which [22,8] found to be more effective than training with a task-specific loss or a combination of both. Hence, a novel decision space is constructed using the variables on which losses depend: an estimated fraud probability and the credit amount. In this expanded space there is more freedom for the optimal decision-making search, which is accomplished with a newly proposed algorithm that includes and expands all previous thresholding approaches, so an improvement is obtained. Furthermore, the algorithm permits the restricted optimal decision-making search, something that no previous approach solves [21,23], situating the proposed algorithm as the best solution in cost-sensitive settings.
The rest of the paper is organized as follows. The next section presents the cost-sensitive classification problem to be addressed. Section 3 introduces state-of-the-art approaches for fraud probability estimation. In Section 4, the available thresholding strategies are listed, emphasizing their drawbacks and motivating the proposed methodology, which is explained in Section 5. Finally, Section 6 summarizes the different combinations of classifiers and thresholding strategies and studies their performance over two real data sets, one provided by a collaborating financial company and a widely used open fraud data set. Conclusions and future extensions are included in Section 7.

Cost-sensitive classification
Cost-sensitive classification addresses the prediction of a binary dependent variable Y ∈ {0, 1} (0 indicating legitimate and 1 fraud) from a set of independent variables X = (X_1, ..., X_p), taking into account the costs of prediction errors and potentially other costs. The objective is loss reduction, so model performance must be evaluated considering classification error costs, which depend on the estimate, p(x), of the conditional probability P(Y = 1 | X = x), the credit amount, ξ, and the thresholding strategy. When only the estimated probability is considered for decision making, the prediction is defined by a cut-off point t as Ŷ = I(p(x) > t).
The most widespread approach considers a cost matrix constructed from the true class Y and the predicted class Ŷ, which assumes that every error of the same type has the same cost [8,16]. This overlooks relevant information [3,4,8,28], so an amount-dependent loss function is defined in order to obtain a more realistic way to measure error cost. Table 1, generalized from [1,4,8,23], contains an instance-dependent cost matrix, from which the loss function is constructed. Costs are assumed independent of the covariate vector X, in line with [28], and are motivated by the data set presented in Section 6.
In Table 1, C_FN_i encloses the cost of an undetected fraud, i.e. the total credit amount ξ_i. The lost benefit when classifying a legitimate client as a fraudster is summarized in C_FP_i. It incorporates the proportion of clients who forgo financing after being doubted, a_1, and the mean gain per operation, a_2 ξ_i, with a = a_1 a_2. The fixed cost of investigating the operation, b, is included both in C_FP_i and C_TP_i, i.e. whenever Ŷ_i = 1. Gains could be introduced, but they do not appear, as there is only the possibility of loss when dealing with fraud. From Table 1, the loss function ℓ(Ŷ_i, ξ_i, Y_i) is defined in equation (1). In this paper the performance metric considered is savings, widely used in the literature [1,3,4]. For a sample (Ŷ_i, ξ_i, Y_i), i = 1, ..., n, it is expressed as in equation (2), where the denominator is the total loss faced if no preventive action is taken, in order to have a base reference [23]. The objective is the minimization of Σ_{i=1}^{n} ℓ(Ŷ_i, ξ_i, Y_i) or, equivalently, the maximization of (2), over possible classifiers. In order to show the importance of considering a cost-sensitive metric, a logistic model is fitted to the second data set from Section 6. Its accuracy, sensitivity, POA = Σ_{i=1}^{n} Ŷ_i / n and savings (2) are shown in Figure 1 for different decision thresholds over the estimated probability dimension, where the nonlinear relationship between them can be seen. "Score" refers to the estimated fraud probability rescaled between 0 and 10, for the sake of confidentiality, and the metrics are represented as percentages. This notation is followed throughout the paper. It can be seen that detecting more frauds does not necessarily lead to an increase in savings (2), due to analysis costs. Also, the extreme class imbalance biases accuracy, as the greatest accuracy is achieved by labeling all operations as legitimate. Consequently, the fraud detection problem should be addressed from a cost-sensitive perspective, both conceptually and in order to obtain better practical results.
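A plausible form of the loss function (1) and the savings metric (2), assuming the Table 1 entries C_FN_i = ξ_i, C_FP_i = aξ_i + b, C_TP_i = b and C_TN_i = 0, is the following:

```latex
% Plausible reconstruction (not verbatim from the paper) of the loss (1)
% and the savings metric (2), under the Table 1 cost entries.
\ell(\hat{Y}_i, \xi_i, Y_i) =
  Y_i \left[ (1-\hat{Y}_i)\, \xi_i + \hat{Y}_i\, b \right]
  + (1 - Y_i)\, \hat{Y}_i \left( a\,\xi_i + b \right),
\qquad
S = 1 - \frac{\sum_{i=1}^{n} \ell(\hat{Y}_i, \xi_i, Y_i)}
             {\sum_{i=1}^{n} \ell(0, \xi_i, Y_i)}
  = 1 - \frac{\sum_{i=1}^{n} \ell(\hat{Y}_i, \xi_i, Y_i)}
             {\sum_{i:\, Y_i = 1} \xi_i}.
```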

Classification methods
Fraud probability estimation methods are presented in this section, so that the thresholding strategies introduced in the next section can be applied afterwards. Both cost-insensitive and cost-sensitive approaches are introduced, so that in the practical application it can be tested whether the latter help in the posterior thresholding or not, in line with [22].

Logistic regression
In practice, fraud detection is often addressed as a mere classification problem. Logistic regression is the de facto model in credit risk, modeling the probability as in equation (3). The problem becomes estimating the parameter θ = (β_0, β) that maximizes the log-likelihood function in equation (4) below with w_i = 1/n for i = 1, ..., n. It assigns the same weight to both classification error types, which is not the case in many real applications such as fraud detection.
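For reference, a standard logistic specification of the fraud probability, presumably the form of equation (3):

```latex
% Standard logistic regression model for the fraud probability.
p(x) = P(Y = 1 \mid X = x)
     = \frac{1}{1 + e^{-(\beta_0 + \beta^{\top} x)}}.
```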

Weighted logistic regression
In order to enhance and adapt logistic regression to the cost-sensitive setting, weights are introduced in the log-likelihood function [3,12,17], as in equation (4), where w_i is the weight of the i-th observation. The modeling of the fraud probability is the same as in equation (3), but another objective function is considered, affecting the estimation of the model parameter, θ, and therefore the classification. Weights are introduced either to balance the data set [17] or to put more emphasis on an operation depending on its amount [3], which is expected to improve classification error costs.
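A minimal sketch of such a weighted fit; the balancing and amount-proportional weighting schemes shown here are illustrative assumptions, not necessarily the exact weights of [3] or [17]:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_weighted_logistic(X, y, amount, scheme="amount"):
    """Fit a logistic regression whose log-likelihood terms are weighted.

    scheme="balance" re-weights classes to compensate the imbalance;
    scheme="amount" gives more weight to larger loans (illustrative choice).
    """
    n = len(y)
    if scheme == "balance":
        # weight inversely proportional to class frequency
        w = np.where(y == 1, n / (2 * y.sum()), n / (2 * (n - y.sum())))
    else:
        # weight proportional to the credit amount, normalized to mean 1
        w = amount / amount.mean()
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y, sample_weight=w)  # weighted maximum likelihood
    return model
```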

AdaBoost
AdaBoost [9,16] is selected as a complex cost-insensitive benchmark method in order to obtain a measure of the predictive capacity that can be reached by a more complicated model. Other complex cost-insensitive methods are not considered since, based on simplicity, flexibility and performance, AdaBoost has shown better behavior in practical applications [27,16]. In addition, they cannot be used in practice due to the interpretability restraints mentioned above, so AdaBoost alone seems to be enough as a benchmark.
AdaBoost is an ensemble learning algorithm which outputs the predicted class of a point as a weighted majority vote of T weak classifiers, such as shallow decision trees [16]. Each weak classifier is trained over a new weighted data set which gives more weight to observations misclassified by previous models. Thus, on each iteration, focus is put on refining the classification of all points as the algorithm progresses. Given a sample (y_i, x_i), i = 1, ..., n, the observation weights are initialized equally, w_i^1 = 1/n, and a weak classifier, m_1, is fitted maximizing accuracy, acc_1. On each successive step t, a weak learner is fitted with updated weights w_i^t ∝ w_i^{t-1} e^{−y_i α_t m_t(x_i)}, where α_t = log(acc_{t−1} / (1 − acc_{t−1})) and m_t is the t-th classifier. This process is repeated T times and the final labeling is defined as ŷ_i = I(Σ_{t=1}^{T} α_t m_t(x_i) / T > 1/2). Despite having good results in classification tasks, this proposal is more intended to solve the imbalance problem than the cost-sensitive problem, and it can also fall into overfitting, as can be seen in the first application in Section 6.2.
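A minimal sketch of fitting this benchmark with scikit-learn; the weak-learner depth and the number of rounds T are illustrative choices, not the values used in the experiments:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

def fit_adaboost(X_train, y_train, T=100):
    """AdaBoost with shallow decision trees as weak classifiers."""
    weak_learner = DecisionTreeClassifier(max_depth=2)   # shallow tree
    # note: scikit-learn versions before 1.2 call this parameter base_estimator
    model = AdaBoostClassifier(estimator=weak_learner, n_estimators=T)
    model.fit(X_train, y_train)
    return model

# fraud probabilities for later thresholding:
# p_hat = fit_adaboost(X_train, y_train).predict_proba(X_test)[:, 1]
```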

Instance-dependent cost-sensitive logistic regression
In this approach, introduced in [11] as cslogit, the novelty is that the estimation of the parameters is carried out in a cost-sensitive manner. Starting from a logistic model like (3), the parameters are estimated by minimizing the average expected cost (AEC) of the loss function (1), defined in equation (5); θ = (β_0, β) is thus estimated as the minimizer of (5) in θ, which can be found using a gradient descent algorithm.
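A minimal sketch of the cslogit fit; the exact form of equation (5) is assumed here to be the standard instance-dependent expected cost built from the Table 1 entries, the cost parameters a and b are illustrative values, and a quasi-Newton optimizer stands in for plain gradient descent:

```python
import numpy as np
from scipy.optimize import minimize

def aec(theta, X, y, xi, a, b):
    """Average expected cost of a logistic model (assumed form of eq. (5)):
    the model probability weights the cost of flagging vs. not flagging."""
    z = theta[0] + X @ theta[1:]
    p = 1.0 / (1.0 + np.exp(-z))                  # estimated fraud probability
    cost_flag = y * b + (1 - y) * (a * xi + b)    # cost if flagged as fraud
    cost_pass = y * xi                            # cost if left untouched
    return np.mean(p * cost_flag + (1 - p) * cost_pass)

def fit_cslogit(X, y, xi, a=0.004, b=20.0):
    """Estimate theta = (beta_0, beta) by minimizing the AEC."""
    theta0 = np.zeros(X.shape[1] + 1)
    res = minimize(aec, theta0, args=(X, y, xi, a, b), method="BFGS")
    return res.x
```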

Instance-dependent cost-sensitive boosting
Introduced in [11], this method is constructed like the AdaBoost model introduced in Section 3.3 but optimizing, instead of the accuracy, a regularized version of the loss function (1), with regularization term Ω(m) = γL + (1/2)λ∥ω∥², where L is the number of leaves in each tree, ω the vector of leaf weights and λ, γ ≥ 0 penalization constants. Thus, the regularized objective function tends to select a model employing simple functions, as the complexity of each tree is penalized, avoiding over-fitting. Since a loss function is considered, the fitted model is more likely to rely on high-amount frauds and therefore obtain greater savings.

Previous thresholding approaches
The essence of cost-sensitive decision making is that, even when some class is more probable, it can be more profitable to act as if another class were true [11]. Almost all classifiers produce probability estimates. Thresholding encompasses all the techniques to select a proper threshold from training instances according to some criterion, such as accuracy or misclassification cost. It can be regarded as a decision-making post-process, converting cost-insensitive learning algorithms into cost-sensitive ones without modifying them [19,22]. Some approaches rely on the estimated probabilities, for which calibrated ones are needed [5]. These are not always easy to obtain, even less so in a variable setting such as fraud detection. The results of all the introduced approaches are represented in Figure 2 over the two-dimensional map generated by the calibrated probabilities introduced in Figure 1 and the amount, represented as its logarithm scaled between 0 and 10 for the sake of confidentiality. This notation is followed throughout the paper. In this figure the differences between all approaches and their shortfalls can be seen at a glance.

Youden's J statistic
Youden's J statistic [26] is often used in conjunction with a receiver operating characteristic (ROC) curve, and its maximum is used for selecting the optimal classification threshold. For a given cut-off point t, it is defined as J(t) = sensitivity(t) + specificity(t) − 1 (equation (6)). The optimal decision rule is defined by the cut-off point t maximizing J in (6), balancing sensitivity and specificity rather than losses, which can be sub-optimal when the objective is loss reduction, as shown in Section 2.

Brute force threshold
In order to scrutinize the best strategy considering only the estimated fraud probability, an empirical threshold search is considered; it gives the greatest savings on a training set [19]. A grid is constructed dividing the one-dimensional decision space into 1000 equally spaced intervals. The savings obtained when considering each cut-off point t in the grid are computed, and the one that produces the maximum is taken as the classification threshold. Since the resulting savings are computed for each cut-off, the restricted search can be implemented as well, considering only the thresholds satisfying the POA restriction.
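A minimal sketch of this search, assuming the Table 1 cost entries (C_FN_i = ξ_i, C_FP_i = aξ_i + b, C_TP_i = b, C_TN_i = 0) and a grid over the observed probability range; the exact grid construction in the paper may differ:

```python
import numpy as np

def savings(y_hat, xi, y, a, b):
    """Savings metric (2): 1 - aggregated loss / loss with no action."""
    loss = np.where(y_hat == 1, y * b + (1 - y) * (a * xi + b), y * xi)
    return 1.0 - loss.sum() / (xi * y).sum()

def brute_force_threshold(p, xi, y, a, b, n_grid=1000, max_poa=None):
    """Grid search over cut-off points maximizing training savings,
    optionally restricted to thresholds meeting a POA limit."""
    best_t, best_s = None, -np.inf
    for t in np.linspace(p.min(), p.max(), n_grid):
        y_hat = (p > t).astype(int)
        if max_poa is not None and y_hat.mean() > max_poa:
            continue                      # violates the POA restriction
        s = savings(y_hat, xi, y, a, b)
        if s > best_s:
            best_t, best_s = t, s
    return best_t, best_s
```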

Cost matrix
A common cost-sensitive approach for threshold selection is to consider the different costs as in Table 1, but with a fixed cost in each of the four entries for all i. This approach is introduced to evaluate the impact of considering an instance-dependent threshold instead of a fixed one. Using the average of the instance-dependent cost matrix is proposed in [22]. In [8], the different classification costs are used to define the optimal threshold, leading to the least expected cost.
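The least-expected-cost threshold for a fixed cost matrix presumably takes the classical form; with constant entries C_FP, C_FN, C_TP, C_TN it reads:

```latex
% Classical least-expected-cost threshold for a fixed cost matrix
% (presumed form of the rule cited from [8]).
t^{*} = \frac{C_{FP} - C_{TN}}{C_{FP} - C_{TN} + C_{FN} - C_{TP}},
\qquad \hat{Y} = I\!\left( p(x) > t^{*} \right).
```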

Bayes minimum risk
The Bayes minimum risk approach [4,8] is introduced since it is the theoretically optimal cost-sensitive decision rule. It is also related to the new method to be presented in Section 5. It considers the exogenous variable in the decision making combined with an estimated probability, p(y | x). Taking into account the risk of a data point, R(y, ξ | x), where y ∈ {0, 1} and ℓ is a loss function as in (1), a data point is labeled as fraud if R(1, ξ | x) ≤ R(0, ξ | x), i.e. if the risk of classifying it as a fraud is lower than that of classifying it as legitimate. Considering the loss function defined in (1), for a new data point i this leads to the decision rule (7). Note that although this approach is theoretically optimal pointwise, in an extremely imbalanced setting this may no longer hold when considering the aggregated sample results, leading to improvable results in terms of savings. For example, for b = 20€, a_1 = 0.05 and a_2 = 0.08, a data point with p_i = 0.1 will be analyzed for fraud if ξ_i ≥ 217.39€. Suppose that there is a fraudulent operation with p = 0.1 and an amount of 300€. There can be more operations, say 20, with p = 0.1 and amount greater than 217.39€, given the imbalance of the sample. The aggregated costs would be 20 · 20€ versus a 300€ fraud, so although one fraud could be detected, the aggregated results involve more losses than the fraud itself due to the analysis cost. This can be seen in Figure 2, where there are low- and medium-amount frauds surrounded by many legitimate operations inside the decision region, leading to worse aggregated results in terms of savings, as can be seen in Section 6.2. Besides, the frontier defined by (7) does not allow any flexibility to adapt the decision region, so the method cannot be used in practice, as it does not fulfill the imposed POA restrictions.
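The conditional risk of a decision y, combining the loss (1) with the estimated class probabilities, presumably takes the standard Bayes-risk form:

```latex
% Presumed form of the conditional risk used by Bayes minimum risk:
% the expected loss of labeling the point as y, given the estimated
% class probabilities.
R(y, \xi \mid x) = \ell(y, \xi, 0)\, p(0 \mid x) + \ell(y, \xi, 1)\, p(1 \mid x),
\qquad y \in \{0, 1\}.
```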

Two-dimensional thresholding
In order to overcome the limitations of previous thresholding approaches, we propose expanding the decision space to a two-dimensional map generated by the estimated probability, p, and the loan amount, ξ. Thus, a more flexible and effective decision region can be explored, with a trade-off between fraud probability and amount. A new algorithm is introduced for searching the optimal decision region with an arbitrary frontier shape.
For numerical optimization, given a sample {(p_i, ξ_i)}, i = 1, ..., n, a grid, G(k), is defined in equation (8), depending on a parameter k which drives the search smoothness, with step sizes δ_1 and δ_2 given in (9). When considering a two-dimensional decision space, a simple idea is to take a threshold in each dimension, as a generalization of the brute force approach introduced in Section 4.2; the decision region then consists of a single upper right quadrant. Instead, a nonparametric approach is proposed, based on adding quadrants recursively to the decision region until no improvement is found in terms of savings. A decision region defined by a set of points, R = {r_j}, j = 1, ..., m, is constructed in (10) as the union of their associated upper right quadrants Q_{r_j}. Given an instance (p_i, ξ_i) and a decision region D(R) as in (10), its labeling is defined as Ŷ_i^R = I((p_i, ξ_i) ∈ D(R)), and, given the sample {(p_i, ξ_i)}, i = 1, ..., n, savings is defined as in (11). Algorithm 2-DDR(k) (two-dimensional decision region algorithm depending on the parameter k) is proposed for the optimal decision-making estimation. It starts with a decision region defined by the most north-east point of the grid G(k), the one with the highest estimated fraud probability and amount. Recursively, each of the points of G(k) surrounding the current decision region is added to the current region as in (10) and savings are computed as in (11). The point whose inclusion produces the greatest savings increase is added. If there is no savings improvement with respect to the previous decision region, the process is repeated with the next surrounding points of G(k). The algorithm stops when the minimum of the data support is reached. An example of the first iterations is shown in Figure 3. Starting with a preliminary decision region, savings are calculated considering the surrounding points of the grid. As no improvement is obtained, the next surrounding vertices are explored. This time, an improvement is obtained, so the region is updated. In the unrestricted scenario, the proposal consists of just running Algorithm 2-DDR(k). For the constrained cases, it is iterated until the POA restriction (e.g. 10% or 5%) is met. The resulting regions for the second data set in Section 6 are plotted in Figure 4. Note how the algorithm, as would be expected, focuses on points with high fraud probability and amount and autonomously avoids areas with a high density of legitimate points, where the analysis costs do not compensate the fraudulent amount detected when considering aggregated costs. The strength of the algorithm is that this intuitive logic is developed automatically, without any need for additional estimation or tuning parameters. In addition, the search is performed over the whole space, so if the optimal decision region has the shape of one of the previous proposals, it will be found up to some roughness depending on the parameter k. Thus, Algorithm 2-DDR(k) is expected to improve (or at least reproduce) previous approaches in terms of savings thanks to the totally free search, permitting restricted decision rules as well and solving the cost-sensitive classification problem.
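A plausible reconstruction of the grid (8)-(9) and of the decision region, labeling and savings in (10)-(11), assuming an equally spaced grid over the observed data ranges:

```latex
% Plausible reconstruction (equally spaced grid assumed) of (8)-(11).
\delta_1 = \frac{\max_i p_i - \min_i p_i}{k}, \qquad
\delta_2 = \frac{\max_i \xi_i - \min_i \xi_i}{k}, \qquad
G(k) = \left\{ \left( \min_i p_i + u\,\delta_1,\; \min_i \xi_i + v\,\delta_2 \right)
        : u, v = 0, \ldots, k \right\},
\\[4pt]
Q_r = \{ (p, \xi) : p \ge p_r,\ \xi \ge \xi_r \}, \qquad
D(R) = \bigcup_{j=1}^{m} Q_{r_j}, \qquad
\hat{Y}_i^{R} = I\!\left( (p_i, \xi_i) \in D(R) \right), \qquad
S(R) = 1 - \frac{\sum_{i=1}^{n} \ell(\hat{Y}_i^{R}, \xi_i, Y_i)}
               {\sum_{i=1}^{n} \ell(0, \xi_i, Y_i)}.
```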

Algorithm 2-DDR(k) Two-dimensional decision region algorithm
Input: steps δ_1 and δ_2 as in (9) and the grid G(k) as defined in (8). Output: a decision region defined as in (10).
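A minimal sketch of the greedy search described above, assuming the Table 1 cost entries; treating every uncovered grid point as a candidate corner and stopping at the first non-improving step are simplifications of the procedure in the algorithm box:

```python
import numpy as np
from itertools import product

def savings(flag, xi, y, a, b):
    """Savings (2) under the Table 1 costs with C_TN = 0 (same helper as in
    the brute-force sketch): flagged points cost b (+ a*xi if legitimate),
    missed frauds cost xi."""
    loss = np.where(flag, y * b + (1 - y) * (a * xi + b), y * xi)
    return 1.0 - loss.sum() / (xi * y).sum()

def two_ddr(p, xi, y, k=50, a=0.004, b=20.0, max_poa=None):
    """Greedy sketch of Algorithm 2-DDR(k): grow a union of upper-right
    quadrants in the (probability, amount) plane while savings improve."""
    grid = list(product(np.linspace(p.min(), p.max(), k + 1),
                        np.linspace(xi.min(), xi.max(), k + 1)))

    def label(corners):
        flag = np.zeros(len(p), dtype=bool)
        for pr, xr in corners:
            flag |= (p >= pr) & (xi >= xr)          # upper-right quadrant
        return flag

    def covered(pt, corners):
        # a grid point already inside D(R) cannot enlarge the region
        return any(pt[0] >= cp and pt[1] >= cx for cp, cx in corners)

    region = [max(grid)]                             # most north-east grid point
    best = savings(label(region), xi, y, a, b)
    while True:
        cands = [g for g in grid if not covered(g, region)]
        if not cands:
            break
        scores = [(savings(label(region + [g]), xi, y, a, b), g) for g in cands]
        s_new, g_new = max(scores)
        flag_new = label(region + [g_new])
        if s_new <= best:
            break                                    # simplified stopping rule
        if max_poa is not None and flag_new.mean() > max_poa:
            break                                    # POA restriction reached
        region.append(g_new)
        best = s_new
    return region, best
```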

Experiments
Two real data sets are presented to evaluate the performance of all introduced approaches. Both consist of real fraud data, so they exhibit the difficulties presented in Section 1, such as extreme imbalance and class overlap. To assess the importance of costs during training independently from the thresholding strategy, we rely on threshold-independent metrics, namely AUC, Gini index, the Kolmogorov–Smirnov statistic (KS) and the H-measure (H). The latter may be more informative given the high degree of class imbalance [22]. Expected savings (5) are summarized as well, as this is the objective in the cost-sensitive setting. To evaluate the impact of the decision-making threshold, accuracy (Acc), recall (Rec), specificity (Spec) and F-score (F) are reported along with the objective function, savings as defined in (2). We train over the two data sets the classifiers introduced in Section 3: logistic regression (LR), weighted logistic regression (WLR), AdaBoost (AB), cslogit (CSL) and csboost (CSB). The thresholding methods introduced in Section 4 and the two-dimensional method proposed in Section 5 with k = 20, 50, 100 are applied over the estimated probabilities, previously calibrated following [8,5], so that the thresholding approaches that rely on probabilities are not distorted. For each data set, a 5-fold cross validation is performed and the mean results are summarized in Section 6.2.

Data sets
The first data set, available at kaggle.com/mlg-ulb/creditcardfraud [6,11], consists of 284,807 credit card transactions made in two days, among which there are 492 (0.17%) frauds. It contains 28 variables resulting from a PCA, along with a "Time" variable (seconds elapsed from the first transaction) and the "Amount" of each transaction. The "Class" variable indicates whether an operation is legitimate (0) or fraudulent (1).
The second one is a real data set of 210,180 loan requests lent by a collaborating financial entity, collected between January 2018 and December 2021, with a 0.67% fraud percentage. In order to preserve confidentiality, the number of records is truncated, and so is the fraud proportion, and only selected variables are summarized in Table 2 in terms of their information value (a measure of the relation between a variable and the odds ratio [20]). These are limited to the information provided at request time, implying another handicap, as the information at this point is limited and a fraudster is not necessarily someone with a bad credit profile. The considered variables were selected by fitting a logistic model over the training set, selecting variables with a stepwise algorithm and taking only the significant ones in terms of the t-test. Only formalized requests are considered, because nothing can be assured about a non-formalized operation. Note that these are the operations of interest (and the most difficult to detect), as they are the ones that passed all the filters and controls.

Results
The classifier results obtained on the first data set over the test samples are summarized in Table 3. AdaBoost outperforms all classifiers in terms of classification and, surprisingly, even csboost in terms of ES, due to overfitting in the latter. Cslogit obtains the highest ES among all classifiers, due to a reasonable trade-off between adjusting to the costs and keeping a generalizable model. In this case the restricted search is not considered, as it is a public data set without any imposed restriction. Table 4 summarizes the performance of the thresholding approaches for the different classifiers. The first highlight is that detecting more frauds does not imply an increase in savings, as can be seen with all the classifiers and in particular with CM and JS. As costs are not considered in the decision making, the highest-amount frauds are not detected and there is an increase in POA, which leads to high analysis costs and consequently smaller savings. This is a clear example of the importance of the correct selection of operations to inspect in an amount-dependent problem. AdaBoost obtains the greatest metrics in the training set, but a much lower performance in the test set due to a clear overfit that could become dangerous in practice. We highlight the cslogit approach, outperforming most other existing methods in terms of classification and savings. In the test set, the best results in terms of savings are achieved, with significantly smaller POA, by the newly proposed approach: the 2-DDR algorithm. It focuses on detecting high-amount frauds, so it obtains the smallest POA with every classifier as well as the highest savings, for which it is considered the outperforming approach. The smoothing effect of the parameter k is clear, with a direct relation with the savings obtained in the training set and a certain overfit for some classifiers. Finally, for this data set, cslogit with Algorithm 2-DDR(k) is selected as the outstanding approach, as it obtains the greatest results in the test samples with an understandable model and low POA. Regarding the second data set, the classifier results are summarized in Table 5.
Cost-sensitive approaches clearly outperform classical ones. This setting is more difficult, which is reflected in the smaller savings compared to the previous data set, so good classification is even less guaranteed to lead to an increase in savings. Table 6 summarizes the results of the different thresholding strategies. Csboost gives a very good performance, probably due to the scarcity of variables, which prevents overfitting and makes a more complex modeling necessary. Algorithm 2-DDR(k) again yields the best results under each classifier in terms of savings (2). Csboost seems to be falling into overfitting again, leading to improvable results in the test samples. It is also worth noting that Algorithm 2-DDR(k) outperforms the other thresholding approaches in most classification metrics as well, making clear how the algorithm focuses on the most profitable operations to analyze. As a consequence, POA is always significantly smaller, which is another benefit. The highest savings are obtained with AdaBoost and Algorithm 2-DDR(k) in the test samples, but almost the same savings can be obtained with logistic regression, a simpler model, so it is debatable which would be the better approach in practice. The restricted search is performed as required by our collaborating financial entity. POAs of 10% and 5% are considered in Table 7. The results are parallel to the ones obtained in the unrestricted case. It is worth mentioning that 62% of the savings achieved with the best unrestricted approach are retained when adjusting to the 10% POA restriction. Taking into account its applicability in the financial context and the good results obtained, the finally selected model is the logistic regression along with Algorithm 2-DDR(k). Thus the bank can have a model that satisfies the interpretability restraints and its own workload limits, obtaining a 27% reduction in losses due to fraud.

Summary and conclusions
This work introduces a new cost-sensitive methodology for fraud detection that reduces aggregated losses, the main concern in any business. It can be generalized to any cost-sensitive problem, with potential in other settings such as credit risk or customer churn prediction. Algorithm 2-DDR(k) improves the results of previous thresholding approaches with an understandable decision rule and without any need for further estimation or parameter tuning. Calibrated probabilities are not needed, as the proposed approach relies only on the ordering of the points, reducing the degree of complexity of the problem. Lastly, it has the added advantage that any POA restriction can be considered. Regarding the computational time, empirical results show that it depends mainly on k, as it controls the search grid size, and not on the sample size, which makes this approach suitable for scaling to larger data sets.
Tables 4 and 6 summarize the results of all the approaches considered in the paper over two real fraud data sets. Although some thresholding approaches outperform the proposed methodology in terms of classification, they are always beaten by the new proposal in terms of the objective function, savings as in (2). This illustrates the contrast between minimizing costs and minimizing classification error during training, indicating that these are two fundamentally different objectives. Previous thresholding approaches are contained in the Algorithm 2-DDR(k) search, for which a consistent improvement was expected. This has been verified with the results on both data sets. For each data set a different model was selected. However, given the no-free-lunch theorem, there is never a clear, overall winner and some experimentation will always be required to optimize scorecard performance.
Further extensions can be considered. A different grid defined by the percentiles in each dimension is a possible way of reducing computational times. Also, more dimensions could be introduced. For example, an extra default probability dimension could be included to optimize the credit admission strategy globally. These are just some examples of possible extensions enabled by the flexibility offered by the proposed method, which has shown a satisfactory performance in amount-dependent problems.

Disclosure statement
Abanca Servicios Financieros consents to the publication of this work after verifying that no type of confidential or sensitive information related to the company or its clients is provided.

Figure 1:
Figure 1: Accuracy (dashed black), sensitivity (red), POA (blue) and savings (solid black) considering different cut-off decision points over the score.

Figure 3:
Figure 3: Algorithm 2-DDR representation along the grid G(k) (black dots), evaluated decision region (dashed lines) and the updated decision region (solid lines).

Table 2:
Summary of the second data set variables.

Table 3:
Classification metrics for the different classifiers over the credit card data set

Table 4:
Mean results summary in the credit card data set for the combination of all the approaches introduced throughout the paper.

Table 5:
Classification metrics for the different classifiers over the second fraud data set

Table 7:
Mean results summary in the second fraud data set considering the 5% and 10% POA restriction for the combination of all the approaches introduced throughout the paper.