Causality-based Counterfactual Explanation for Classification Models

Counterfactual explanation is one branch of interpretable machine learning that produces a perturbation sample to change the model's original decision. The generated samples can act as a recommendation for end-users to achieve their desired outputs. Most current counterfactual explanation approaches are gradient-based methods, which can only optimize differentiable loss functions with continuous variables. Accordingly, gradient-free methods have been proposed to handle categorical variables, which, however, have several major limitations: 1) causal relationships among features are typically ignored when generating the counterfactuals, possibly resulting in impractical guidelines for decision-makers; 2) the counterfactual explanation algorithm requires a great deal of effort in parameter tuning to determine the optimal weight for each loss function, which must be conducted repeatedly for different datasets and settings. In this work, to address the above limitations, we propose a prototype-based counterfactual explanation framework (ProCE). ProCE is capable of preserving the causal relationships underlying the features of the counterfactual data. In addition, we design a novel gradient-free optimization based on a multi-objective genetic algorithm that generates counterfactual explanations for mixed types of continuous and categorical features. Numerical experiments demonstrate that our method compares favorably with state-of-the-art methods and is therefore applicable to existing prediction models. All the source codes and data are available at \url{https://github.com/tridungduong16/multiobj-scm-cf}.


Introduction
Machine learning (ML) is increasingly recognized as an effective approach for large-scale automated decisions in several domains. However, when an ML model is deployed in critical decision-making scenarios such as criminal justice [1,2] or credit assessment [3], many people are skeptical about its accountability and reliability. Hence, interpretability is vital to make machine learning models transparent and understandable by humans. Recent years have witnessed an increasing number of studies that explore ML mechanisms from a causal perspective [4,5,6]. Among these studies, counterfactual explanation (CE) is a prominent example-based method that focuses on generating counterfactual samples for interpreting model decisions. For example, consider a customer A whose loan application has been rejected by the ML model of a bank. Counterfactual explanations can generate a "what-if" scenario for this person, e.g., "your loan would have been approved if your income was $51,000 more". In other words, the goal of counterfactual explanation is to generate perturbations of an input that lead to a different outcome from the ML model. By allowing users to explore such "what-if" scenarios, counterfactual examples are interpretable and easily understandable by humans.
Despite recent interest in counterfactual explanations, existing methods suffer from two limitations. First, most counterfactual methods neglect the causal relationships among features, leading to counterfactual samples that are infeasible for decision makers [7,8]. In fact, a counterfactual sample is considered feasible if its changes satisfy the conditions imposed by the causal relations. For example, since education causes the choice of occupation, changing the occupation without changing the education is infeasible for a loan applicant in the real world. In other words, the generated counterfactuals need to preserve the causal relations between features in order to be realistic and actionable. Second, on the algorithmic level, many counterfactual methods use gradient-free optimization algorithms to deal with various data and model types [9,8,10,11,12]. These gradient-free optimizations rely on heuristic search, which suffers from inefficiency due to the large search space. In addition, optimizing the trade-off among the different loss terms in the objective function is difficult, which often leads to sub-optimal counterfactual samples [13,14,11].
To address the above limitations, we propose a prototype-based counterfactual explanation framework (ProCE) in this paper. ProCE is a model-agnostic method and is capable of explaining classification models over a mixed feature space.
It should be emphasized that the proposed method focuses on maintaining the causal relationships among the features in the dataset instead of the causal relationship between the features and the target variable [15]. Overall, our contributions are summarized as follows: • By integrating a causal discovery framework and a causal loss function, our proposed method can produce counterfactual samples that satisfy the causal constraints among features.
• We utilize an auto-encoder model and class prototypes to guide the search progress and speed up the search for counterfactual samples.
• We design a novel multi-objective optimization that can find the optimal trade-off between the objectives while maintaining diversity in counterfactual explanations' feature space.

Preliminary
Throughout the paper, lower-case letters (e.g., x) and bold lower-case letters (e.g., x) denote deterministic scalars and vectors, respectively. We consider a dataset D = {(x_i, c_i)}_{i=1}^n consisting of n instances, where x_i ∈ X is a sample, c_i ∈ C = {0, 1} is the class of individual x_i, and x_i^j is the j-th feature of x_i. We also consider a classifier H : X → Y with input feature space X and output Y = {0, 1}.
We denote Q_φ(.) as an encoder model parameterized by φ. Finally, proto*(x) and K(x) are the prototype and the set of K-nearest instances of an instance x, respectively. Definition 2.1 (Counterfactual Explanation). Given an original sample x_org ∈ X with original prediction y_org ∈ Y, counterfactual explanation aims to find the nearest counterfactual sample x_cf such that the outcome of the classifier for x_cf changes to the desired output class y_cf. In general, the counterfactual explanation x_cf for the individual x_org is the solution of the following optimization problem:

x*_cf = arg min_{x_cf} f(x_org, x_cf)  subject to  H(x_cf) = y_cf,   (1)

where f(x_org, x_cf) is a function measuring the distance between x_org and x_cf. Eq (1) expresses the optimization objective that minimizes the distance between the counterfactual and original samples while ensuring that the classifier changes its decision output. For such explanations to be plausible, they should only suggest small changes in a few features.
To make this clear, consider a simple scenario in which a person with the features {income: $50k, CreditScore: "good", education: "bachelor", age: 52} applies for a loan at a financial organization and receives a reject decision from a predictive model. In this case, the company can utilize counterfactual explanation (CF) as an advisor that provides constructive advice for this customer. To allow the customer to successfully get the loan, CF can advise how to change the customer's profile, such as increasing his/her income to $51k, or enhancing the education degree to "Master". This toy example illustrates that CF is capable of providing interpretable advice on how to make the least changes to a sample to achieve the desired outcome.
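The toy example can be sketched as a brute-force search: given a simple approval rule standing in for the trained classifier, we look for the lowest-cost single-feature edit that flips the decision. The decision rule, the candidate edits and their costs below are all illustrative assumptions, not the paper's model.

```python
# Toy counterfactual search over a hypothetical loan classifier.

def classifier(x):
    # Hypothetical decision rule (an assumption for illustration):
    # approve (1) if income >= 51 or education level >= 2, else reject (0).
    return 1 if x["income"] >= 51 or x["education"] >= 2 else 0

def nearest_counterfactual(x_org, candidate_edits):
    """Return the lowest-cost single-feature edit that flips the decision."""
    best, best_cost = None, float("inf")
    for feature, value, cost in candidate_edits:
        x_cf = dict(x_org)
        x_cf[feature] = value
        if classifier(x_cf) != classifier(x_org) and cost < best_cost:
            best, best_cost = x_cf, cost
    return best

x_org = {"income": 50, "education": 1, "age": 52}   # rejected applicant
edits = [("income", 51, 1.0), ("education", 2, 5.0), ("age", 53, 1.0)]
x_cf = nearest_counterfactual(x_org, edits)
```

Here the cheapest flipping edit is raising income to 51, mirroring the "increase income to $51k" advice in the example.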

Related Work
Recently, there has been an increasing number of studies in this field. The existing counterfactual explanation methods can be categorized into gradient-based methods [16,17,14], auto-encoder models [18,13], heuristic search based methods [8,9] and integer linear optimization [19,20].
Gradient-based methods: Counterfactual explanation was first proposed by the study [17] as an example-based method to interpret machine learning models' decisions. In this study, the authors construct a cross-entropy loss between the desired class and the counterfactual samples' predictions with the purpose of changing the model output. Thereafter, gradient-descent optimization algorithms are used to minimize the constructed loss. This approach has drawn much attention, with a plethora of studies [11,18,14] that aim to customize the loss function to enhance the properties of counterfactual generation. For example, the study [21] extends the distance function in Eq (1) by using a weight vector (Θ) to emphasize the importance of each feature. Algorithms such as k-nearest neighbors or global feature evaluation can be deployed to find this vector (Θ). Another framework, called DiCE [14], proposes using a diversity score to produce a number of generated samples that allow users to have more options. The authors thereafter use a weighted sum to combine the different loss functions and adopt a gradient-descent algorithm to approximately find the optimal solution. The research [22] utilizes the class prototype to guide the search progress to fall into the distribution of the expected class.
This method, however, does not consider the causal relationships among features.
Differentiable methods are a prominent approach in counterfactual explanation that allows the loss functions to be optimized and controlled easily, but they are restricted to differentiable models and struggle to deal with the non-continuous values in tabular data.
Auto-encoder models: Other recent studies based on the variational auto-encoder (VAE) model utilize the properties of generative models to generate new counterfactual samples. In the study [23], the authors first construct an encoder-decoder architecture. Thereafter, they generate the latent representation from the encoder, apply perturbations to the latent representation, and pass it through the decoder until the prediction model achieves the desired class. Meanwhile, another line of recent work [13] proposes a conditional auto-encoder model combining different loss functions, including a prediction loss and a proximity loss. They thereafter generate multiple counterfactual samples for all input data points by conditioning on the target class. These studies rely heavily on gradient-descent optimization, which can face difficulties when handling categorical features. In addition, VAE models, which maximize a lower bound of the log-likelihood rather than measuring the exact log-likelihood, can give unstable and inconsistent results.
Heuristic search methods: There is an increasing number of counterfactual explanation methods for non-differentiable models, to which the previous gradient-based approaches are not applicable. They utilize heuristic search for the optimization problem, such as Nelder-Mead [11], growing spheres [24], FISTA [10,22], or genetic algorithms [25,12,9]. The main idea of these approaches is to adopt evolutionary algorithms to effectively find the optimal counterfactual samples based on defined cost functions. For example, CERTIFAI [9] customizes the genetic algorithm for the counterfactual search progress. CERTIFAI adopts indicator functions (1 for different values, else 0) and the mean squared error for categorical and continuous features, respectively. Apart from that, the study [8] introduces a method called FACE that adopts Dijkstra's algorithm to generate counterfactual samples by finding the shortest path between the original input and the existing data points. The main advantage of FACE is that the path produced by Dijkstra's algorithm provides insight into step-by-step, feasible actions that users can take to achieve their goals.
However, the samples generated by this method are limited to the existing input space; no new data points are created.

Integer linear optimization
The studies [7,19] propose to adopt integer linear optimization (ILO) solvers for linear models, utilizing linear costs to generate actionable changes. Specifically, they formulate the problem of finding counterfactual samples according to the cost function as a mixed-integer linear optimization problem and then utilize existing solvers [26] to obtain the optimal solution. To speed up the counterfactual sample search process, the study [27] introduces convex constraints to bound the solutions locally in a region of the data space. Although these approaches seem promising when dealing with non-continuous features and non-differentiable functions, they can be applied to linear models only.
Our method extends the line of studies [22,13] by integrating both a structural causal model and a class prototype. We also formulate the problem as a multi-objective optimization problem and propose an algorithm to find the counterfactual samples effectively.

Methodology
In this section, we first present the objective functions corresponding to the desired properties of counterfactual samples. The structural causal model and a causal distance are also investigated to exploit the underlying causal relationships among features. Then, we formulate counterfactual sample generation as a multi-objective optimization problem and propose an algorithm based on the non-dominated sorting genetic algorithm (NSGA-II) to obtain the optimal solutions. Figure 1 describes the overall architecture of our proposed framework, which contains four main loss functions: 1) a prediction loss that ensures valid counterfactual samples, 2) a proximity loss that encourages only small changes in the counterfactual samples relative to the original one, 3) a prototype-based loss that guides the search progress, and finally 4) a causality-preserving loss that maintains the causal relationships.
Moreover, there are three models in the framework: the provided prediction model (H), the auto-encoder model (Q_φ), and the structural causal model (M).

Prototype-based Causal Model
Counterfactuals provide explanations of the form "if you assign these features different values, your credit application would have been accepted". This indicates that counterfactual samples should be constrained by several particular conditions. We first provide definitions of each constraint condition and then tie them together as a multi-objective optimization problem to find an optimal counterfactual explanation. For clarity, we introduce each constraint condition as a loss function as follows.

Prediction Loss
We first consider the prediction loss, which is the prominent loss function for counterfactual explanation. In order to achieve the desired outcome, the prediction loss calculates the distance between the counterfactual and the expected/desired predictions. This loss function encourages the predictive model to change its prediction for the counterfactual samples towards the desired outcome. In particular, for the classification scenario, we use the cross-entropy loss between the counterfactual and expected outcomes. The prediction loss is defined as:

f_pred(x_cf) = CE(H(x_cf), y_cf),   (2)

where CE is the cross-entropy loss [28], which normally measures the performance of a classification model whose output is a probability value between 0 and 1. The cross-entropy loss here increases as the predicted probability of the counterfactual sample H(x_cf) diverges from the desired outcome y_cf.
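A minimal sketch of this prediction loss for the binary case, assuming the classifier H returns the probability of the positive class (the clipping constant eps is an implementation detail added here, not from the paper):

```python
import math

def prediction_loss(p_cf, y_cf):
    """Binary cross-entropy between the classifier's probability for the
    counterfactual sample, p_cf = H(x_cf), and the desired class y_cf."""
    eps = 1e-12  # guard against log(0)
    p = min(max(p_cf, eps), 1 - eps)
    return -(y_cf * math.log(p) + (1 - y_cf) * math.log(1 - p))
```

The loss shrinks as the predicted probability moves towards the desired class, so minimizing it pushes the counterfactual across the decision boundary.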

Prototype-based Loss
In practice, the search space of counterfactuals can be incredibly large, which results in slow optimization. Inspired by the work [22], we utilize the class prototype to guide the search progress with the aim of improving the efficiency of finding counterfactual solutions. The class prototype was first defined as the mean encoding of the instances belonging to the same class [29].
Therefore, in our work, we construct an auto-encoder model to obtain the latent space which allows us to learn a better representation of these instances.
We resort to an encoder function Q_φ(.) to map instances into the latent space. For an original instance x_org, we define K(x_org) as the set of its K nearest instances in the latent space whose classes {c_k}_{k=1}^K differ from the original prediction y_org, i.e., c_k ≠ y_org:

K(x_org) = {x_k : x_k is among the K nearest neighbors of x_org in the latent space, c_k ≠ y_org}.   (4)

Therefore, a prototype of an original instance x_org is computed as the mean of its nearest neighbors in the latent space:

proto*(x_org) = (1/K) Σ_{k=1}^K Q_φ(x_k).   (5)

The definition of proto* in Eq. (5) indicates that the prototype is in fact a representative of the samples belonging to the counterfactual class. We thus define the prototype loss as the squared L2-norm distance between the representation of the counterfactual sample x_cf in the latent space and the obtained prototype:

f_proto(x_cf) = ||Q_φ(x_cf) - proto*(x_org)||_2^2.   (6)
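The prototype construction and loss can be sketched as follows; `encode` is a stand-in for the encoder Q_φ (the identity encoder in the test is only for illustration):

```python
import numpy as np

def class_prototype(encode, x_org, X, y, y_org, K=5):
    """Mean latent encoding of the K nearest neighbours of x_org (in latent
    space) whose class differs from the original prediction y_org."""
    Z = np.array([encode(x) for x in X])        # latent codes of the dataset
    z_org = encode(x_org)
    cand = Z[y != y_org]                        # keep the other class only
    d = np.linalg.norm(cand - z_org, axis=1)    # latent-space distances
    nearest = cand[np.argsort(d)[:K]]
    return nearest.mean(axis=0)

def prototype_loss(encode, x_cf, proto):
    # Squared L2 distance between the encoded counterfactual and the prototype.
    return float(np.sum((encode(x_cf) - proto) ** 2))
```

Minimizing `prototype_loss` pulls the counterfactual's latent code towards the dense region of the desired class, which is what speeds up the search.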

Features cost
One of the main obstacles in generating counterfactual samples is to compute the feature cost, which captures the effort required to change from the original instance x_org to the counterfactual one x_cf. From the fundamental principles of counterfactual explanation, the generated samples should be as close as possible to the original one. The smallest changes mean the least effort for decision-makers to achieve their desired goals. However, even experts would find it hard to assign a precise cost demonstrating how unactionable a feature is. Moreover, when it comes to mixed-type tabular data containing both categorical and continuous features, defining a distance loss function is challenging [30,31,32,33]. Previous studies [9,14,25] normally apply an indicator function that returns 0 when two categorical values match and 1 otherwise, and adopt the L2-norm distance for comparing continuous features. However, the indicator function, which only returns 0 and 1, fails to measure the degree of similarity between two categories. In this study, we use the encoder model Q_φ to map the categorical features into the latent space before estimating their distance. The main advantage of this approach is that the encoder model can capture the underlying relationships and patterns between the categorical values. This means that manual feature engineering, such as assigning a weight to each category, is unnecessary, saving a great deal of time and effort. Thus, the distance between two samples is defined as:

dist(x_org, x_cf) = Σ_j d(x_org^j, x_cf^j),  with  d(x_org^j, x_cf^j) = ||Q_φ(x_org^j) - Q_φ(x_cf^j)||_2 if x^j is the j-th categorical feature, and |x_org^j - x_cf^j| otherwise.   (7)
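A sketch of this mixed-type distance; `encode_cat` is a hypothetical stand-in for the per-feature categorical embedding produced by Q_φ, and the absolute difference on continuous features is an assumption about the exact form used in Eq. (7):

```python
import numpy as np

def feature_distance(x_org, x_cf, cat_idx, encode_cat):
    """Mixed-type distance: latent-space L2 for categorical features,
    absolute difference for continuous ones.  encode_cat(j, v) maps
    category v of feature j to a latent vector (hypothetical encoder)."""
    total = 0.0
    for j, (u, v) in enumerate(zip(x_org, x_cf)):
        if j in cat_idx:
            total += float(np.linalg.norm(encode_cat(j, u) - encode_cat(j, v)))
        else:
            total += abs(u - v)
    return total
```

With a learned embedding, "good" vs. "fair" credit scores can be closer than "good" vs. "bad", which a 0/1 indicator cannot express.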

Causality-preserving Loss
Although the distance function in Eq. (7) demonstrates the similarity of two samples, it fails to capture the causal relationships between features.
To deal with this problem, we integrate a structural causal model and construct a causal loss function to ensure the features' causal relationships in the generated samples. We first provide some fundamental definitions about causality and thereafter define the corresponding causal loss. In general, a structural causal model M = {U, V, F} [34] consists of three main components, defined as below: • U is the set of exogenous nodes, which have no parents in the causal graph.
• V is the set of endogenous random variables whose causal mechanisms we are modeling. These variables have parents in the causal graph.
• F is the set of structural causal functions describing the causal relationships among the unobserved and observed variables. Specifically, each endogenous variable X ∈ V is determined by a function of its parents, X = f_X(Pa(X)), where Pa(X) denotes the parent nodes of X.
A causal graph is a probabilistic graphical model that represents assumptions about the data-generating mechanism. It consists of a set of nodes and edges, where each node represents a random variable and each edge illustrates a causal relationship. Causal effects in a causal model are expressed via the do-operator, or intervention [35], which assigns a value x to a random variable X, denoted by do(x). The symbol do(x) is a model manipulation on a causal graph M, defined as the substitution of the corresponding causal equation. For each endogenous node v ∈ V with parent nodes (v_p1, v_p2, ..., v_pk), we estimate v as v = g(v_p1, v_p2, ..., v_pk) to represent their causal relationship, where g(·) is the structural causal equation, constructed here by a linear regression model. Since having the full causal graph is often impractical in real-world settings, estimating the structural causal equation g(·) is quite challenging. In this work, we utilize LiNGAM [36], an estimation technique based on the non-Gaussianity of the data, to determine g(·). During the counterfactual generation process, we first produce the predicted value of each endogenous feature x^v based on its parents before estimating the distance, measured as:

f_causal(x_cf) = Σ_{v ∈ V} d(x_cf^v, g(Pa(x_cf^v))).   (8)

With a set of observed variables containing the endogenous and exogenous ones, X = {U, V}, the general distance between the original and counterfactual samples can be re-written as the sum of the two distances:

f_final_dist(x_org, x_cf) = Σ_{j ∈ U} d(x_org^j, x_cf^j) + f_causal(x_cf).   (9)

For the exogenous nodes U (nodes without any parents in the causal network), we still utilize Eq. (7) to compute the distance between the two instances, while the causal distance in Eq. (8) is employed for the endogenous variables V (the remaining features).
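The causal loss can be sketched with ordinary least squares as a simplified stand-in for the LiNGAM estimation of g(·) (the paper uses LiNGAM; plain linear regression is assumed here for brevity, and samples are NumPy arrays indexed by feature position):

```python
import numpy as np

def fit_structural_equations(X, parents):
    """For each endogenous feature v, fit a linear structural equation
    v = g(Pa(v)) by least squares on dataset X (rows = samples)."""
    eqs = {}
    for v, pa in parents.items():
        A = np.column_stack([X[:, pa], np.ones(len(X))])  # parents + intercept
        w, *_ = np.linalg.lstsq(A, X[:, v], rcond=None)
        eqs[v] = (pa, w)
    return eqs

def causal_loss(x_cf, eqs):
    """Penalty for each endogenous feature deviating from the value its
    fitted structural equation predicts from the counterfactual's parents."""
    loss = 0.0
    for v, (pa, w) in eqs.items():
        pred = float(np.dot(np.append(x_cf[pa], 1.0), w))
        loss += (x_cf[v] - pred) ** 2
    return loss
```

A counterfactual consistent with the fitted equations incurs (near-)zero penalty, while one that, e.g., raises an effect without moving its causes is penalized.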

Multi-objective Optimization
In this section, we describe the proposed algorithm used for the optimization process. With the loss functions presented in Section 3.1, including f_pred, f_proto and f_final_dist, we arrive at the general objective function in Eq (10).
These loss functions illustrate different properties that counterfactual samples should adhere to. The general objective containing the three losses is:

f(x_cf) = (f_pred(x_cf), f_proto(x_cf), f_final_dist(x_org, x_cf)).   (10)

Therefore, the optimal solution can be re-written as:

x*_cf = arg min_{x_cf} f(x_cf).   (11)

In order to obtain the optimal solution, the majority of existing studies [13,14,11] use a weighted sum, assigning each loss function a weight and combining them together. This approach seems reasonable; however, it is very challenging to balance the weights for each loss, resulting in a great deal of effort and time spent on hyperparameter tuning. To address this issue, we propose to formulate the counterfactual explanation search as a multi-objective problem (MOP). In this study, we modify the elitist non-dominated sorting genetic algorithm (NSGA-II) [37] to deal with this optimization problem.
Its main advantage is that it optimizes each loss function simultaneously and provides solutions presenting the trade-offs among the objective functions. To make this clear, we first present some related definitions. Given a set of n candidate solutions P = {x_i}_{i=1}^n, we have the following: Definition 3.1 (Dominance in the objective space). In a multi-objective optimization problem, the goodness of a solution is evaluated by dominance [38].
Given two solutions x and x' along with p objective functions f_i, we have: 1. x weakly dominates x' (x ⪯ x') iff f_i(x) ≤ f_i(x') for all i ∈ {1, ..., p}; 2. x dominates x' (x ≺ x') iff x ⪯ x' and x ≠ x'. Definition 3.2 (Pareto front). The Pareto front is a set of m solutions denoted by F* = {x_j}_{j=1}^m ⊂ P such that no remaining solution x_r ∈ {P \ F*} dominates any x_j; that is, since all objective functions are minimized, there is no x_r with f_i(x_r) ≤ f_i(x_j) for all i ∈ {1, ..., p} and a strict inequality for some i.
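Dominance and the decomposition into successive Pareto fronts used by non-dominated sorting can be sketched as follows (minimization convention assumed, solutions represented by their objective vectors):

```python
def dominates(fx, fy):
    """fx dominates fy (minimisation): no worse in every objective
    and strictly better in at least one."""
    return (all(a <= b for a, b in zip(fx, fy))
            and any(a < b for a, b in zip(fx, fy)))

def non_dominated_sort(F):
    """Split objective vectors F into successive Pareto fronts (as indices):
    front 0 holds the non-dominated solutions, front 1 those dominated
    only by front 0, and so on."""
    fronts, remaining = [], set(range(len(F)))
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(F[j], F[i]) for j in remaining if j != i))
        fronts.append(front)
        remaining -= set(front)
    return fronts
```

This quadratic-per-front version is the simplest correct formulation; NSGA-II uses a bookkeeping variant with the same output.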
Non-dominated solutions provide a reasonable compromise among all the objective functions, improving one function's performance without degrading the others. To measure this characteristic, the crowding distance [39,40] is used to rank each candidate solution. Specifically, the crowding distance of an instance x is calculated as:

CD(x) = Σ_{i=1}^p (f_i(x_a) - f_i(x_b)) / (f_i^max - f_i^min),   (12)

where p is the number of objective functions, x_a and x_b are the two nearest instances of x by Euclidean distance, f_i is the i-th objective function, and f_i^min and f_i^max are its minimum and maximum values, respectively. The fundamental concept behind the crowding distance is to compute the Euclidean distance between the candidate solutions {x_j}_{j=1}^m in a front F* using the p objective functions as a p-dimensional space.
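A sketch of the crowding-distance computation in the standard NSGA-II form, which sorts per objective and gives boundary solutions infinite distance (a slight simplification of the neighbour definition in the text):

```python
import math

def crowding_distance(F):
    """Crowding distance for a list of objective vectors F: boundary
    solutions get infinity; interior ones accumulate the normalised gap
    between their two neighbours in each objective's sorted order."""
    n, p = len(F), len(F[0])
    dist = [0.0] * n
    for i in range(p):
        order = sorted(range(n), key=lambda k: F[k][i])
        f_min, f_max = F[order[0]][i], F[order[-1]][i]
        dist[order[0]] = dist[order[-1]] = math.inf
        if f_max == f_min:
            continue  # objective is constant; contributes nothing
        for r in range(1, n - 1):
            dist[order[r]] += (F[order[r + 1]][i] - F[order[r - 1]][i]) / (f_max - f_min)
    return dist
```

Solutions with a large crowding distance sit in sparse regions of the front, so keeping them preserves diversity among counterfactuals.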
The optimization process for objective function (10) is given by Algorithm 1.
The main idea behind our approach is that, in each generation, the algorithm selects the Pareto front for each objective function and evolves towards better solutions. We first find the nearest class prototype of the original sample x_org, which is later used to measure the prototype loss function. To find the optimal counterfactual x*_cf, each candidate solution is represented by its D-dimensional features as genes. A random candidate population is initialized from a Gaussian distribution. Thereafter, the objective functions f_pred, f_proto and f_final_dist are calculated for each candidate. The non-dominated sorting procedure based on Definitions 3.1 and 3.2 is then performed to obtain a set of Pareto fronts F = {F_h}_{h=1}^H. The crowding distance in Eq. (12) is then adopted as the score assigned to each individual in the current population.
The algorithm only keeps the candidate solutions with the greatest ranking scores, indicating that these solutions lie in low-density regions. The cross-over and mutation procedures [41] are finally performed to generate the next population.
In particular, the cross-over of two parents generates new candidate solutions by randomly swapping parts of their genes. Meanwhile, the mutation procedure randomly alters some genes in the candidate solutions to encourage diversity and avoid local minima. We repeat this process through many generations to find the optimal counterfactual solution.
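The variation operators can be sketched as single-point cross-over and Gaussian mutation; the exact operators, mutation rate and scale are assumptions, since the text does not specify them:

```python
import random

def crossover(parent_a, parent_b):
    """Single-point cross-over: the two children swap gene tails at a
    random cut position."""
    cut = random.randint(1, len(parent_a) - 1)
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

def mutate(genes, rate=0.1, scale=0.1):
    """Gaussian mutation: perturb each gene with probability `rate` to
    keep the population diverse and escape local minima."""
    return [g + random.gauss(0.0, scale) if random.random() < rate else g
            for g in genes]
```

In the counterfactual setting, a candidate's genes are its D feature values, so cross-over mixes partial edits from two candidates while mutation explores nearby feature values.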
Algorithm 1: Multi-objective counterfactual search
1: Compute the class prototype proto*(x_org) of the original sample x_org.
2: Initialize a random candidate population P from a Gaussian distribution.
3: repeat
4:    for each candidate solution ∆_i in P do
5:        Compute f_pred(∆_i) based on Eq. (2).
6:        Compute f_proto(∆_i) based on Eq. (6).
7:        Compute f_final_dist(∆_i) based on Eq. (9).
8:    end for
9:    Obtain F = {F_h}_{h=1}^H by using the non-dominated sorting procedure.
10:   Compute the crowding distance as the ranking score for each solution in P based on Eq. (12).
11:   Keep n individuals in P based on the ranking score.
12:   Perform cross-over and mutation to generate the next population.
13: until the maximum number of generations is reached
14: return the optimal counterfactual x*_cf

Experiments
We conduct experiments on four datasets to demonstrate the superior performance of our method compared with state-of-the-art methods. All implementations are conducted in Python 3.7.7 on 64-bit Red Hat with an Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz. For our method, we construct the multi-objective optimization algorithm with the support of the library Pymoo [42]. More details of the implementation settings can be found in our code repository.

Datasets
This section provides information about the datasets on which we perform the comparison experiments. Our method is capable of generating counterfactual samples while maintaining the causal relationships. To validate this claim, we consider feature conditions that restrict the generated counterfactual samples for each dataset. For simplicity, we denote a ∝ b for the condition that (a increases ⇒ b increases) AND (a decreases ⇒ b decreases). We use four datasets: Simple-BN, Sangiovese, Adult and Law.
Simple-BN [13] is a synthetic dataset containing 10,000 records with three features (a_1, a_2, a_3) and a binary output (y). The data is generated by the causal mechanism in Eq (13): two random variables a_1 and a_2 follow the normal distributions N(μ_1, σ_1) and N(μ_2, σ_2), while a_3 follows a normal distribution whose mean is determined by a function of a_1 and a_2. Additionally, the target variable y follows a Bernoulli distribution parameterized by a function of a_1, a_2 and a_3. Based on this generating mechanism, we consider the causal relationship between a_1, a_2 and a_3: the condition in Eq (13) means that a_3 must monotonically increase and decrease with the function of the two random variables a_1 and a_2.
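A sketch of a Simple-BN-style generator; the distribution parameters, the sum a_1 + a_2 as the parent function, and the logit for y are illustrative assumptions, since the exact values from [13] are not reproduced here:

```python
import numpy as np

def generate_simple_bn(n, seed=0):
    """Illustrative generator in the spirit of Simple-BN: a1, a2 are
    exogenous Gaussians, a3 is Gaussian around a function of (a1, a2),
    and y is Bernoulli with a logit depending on all three features."""
    rng = np.random.default_rng(seed)
    a1 = rng.normal(0.0, 1.0, n)
    a2 = rng.normal(0.0, 1.0, n)
    a3 = rng.normal(a1 + a2, 0.5)                 # mean depends on a1, a2
    logit = a1 + a2 + a3
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    return np.column_stack([a1, a2, a3]), y
```

A counterfactual that raises a_3 while leaving a_1 and a_2 fixed violates this mechanism, which is exactly what the causal-constraint validity metric below penalizes.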
Sangiovese [43] evaluates the impact of several agronomic settings on the quality of Tuscan grapes. This dataset provides 14 continuous features along with a binary output. We consider the task of determining whether the grapes' quality is good or not. Based on the conditional linear Bayesian network provided with the dataset, we consider a causal relationship between two features, the mean number of sprouts (SproutN) and the mean number of bunches (BunchN): SproutN ∝ BunchN.
Adult [44] is a real-world dataset providing information about loan applicants in a financial organization. It is a mixed-type dataset whose instances have both continuous and categorical features. For this dataset, we consider the task of determining whether the annual income of a person exceeds $50k. Similar to the study [13], we consider the causal relationship x^education ∝ x^age, which demonstrates the education-age causal relationship: obtaining a higher degree of education, such as moving from "Bachelor" to "PhD", requires years to complete, thus causing age to increase. As a result, any counterfactual sample that increases the education level without increasing age is infeasible.
Law [45] provides information about students with the features sex, race, entrance exam score (LSAT), grade-point average (GPA) and first-year average grade (FYA). The main task is to determine which applicants will be accepted into the law program. We consider a causal relationship among the score features. In order to evaluate the models' effectiveness, we randomly split each dataset into an 80% training and a 20% test set. We conduct 100 repeated experiments, evaluate performance on the test set, and report the average statistics.

Evaluation Metrics
In this section, we briefly describe six quantitative metrics used to evaluate the performance of our proposed method and the baselines. We sample n factual samples and generate counterfactual samples for them. Meanwhile, n_cat and n_con are the corresponding numbers of categorical and continuous features. 1(.) is the indicator function that returns 1 when the condition is satisfied, and 0 otherwise.
Target-class validity (%Tcv) [13,8] evaluates how well the algorithm can produce valid samples. Particularly, %Tcv is calculated as the ratio of the number of generated samples belonging to the desired class to the number of factual samples.
Higher target-class validity is favorable, demonstrating that the algorithm generates more counterfactual samples achieving the desired target class.
Causal-constraint validity (%Ccv) measures the percentage of counterfactual samples satisfying the pre-defined causal conditions. The main aim of this metric is to evaluate how well our algorithm can generate feasible counterfactual samples that do not violate the causal relationships among features [13]. With the causal conditions defined in Section 4.1 and n_s denoting the number of samples satisfying the causal conditions, the causal-constraint validity is defined in Eq (20) as the ratio of n_s to the number of factual samples. Higher causal-constraint validity is preferable, indicating a greater number of feasible counterfactual samples.
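Both validity metrics reduce to simple ratios, sketched here as percentages:

```python
def target_class_validity(preds_cf, y_cf):
    """%Tcv: share of counterfactuals classified as the desired class."""
    return 100.0 * sum(p == y_cf for p in preds_cf) / len(preds_cf)

def causal_constraint_validity(n_satisfied, n_total):
    """%Ccv: share of counterfactuals meeting the causal conditions,
    with n_satisfied playing the role of n_s."""
    return 100.0 * n_satisfied / n_total
```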
Categorical proximity measures the proximity of the categorical features, representing the total number of matches between the categorical values of x_cf and x_org. Higher categorical proximity is better, implying that the counterfactual sample makes minimal changes to the original [14].
Continuous proximity illustrates the proximity of the continuous features, calculated as the negative of the L2-norm distance between the continuous features of x_cf and x_org. Higher continuous proximity is preferable, implying that the distance between the continuous features of x_org and x_cf should be as small as possible [14].
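The two proximity metrics can be sketched directly, splitting each sample into its categorical and continuous parts:

```python
import numpy as np

def categorical_proximity(x_org_cat, x_cf_cat):
    """Number of categorical features left unchanged (higher is better)."""
    return sum(1 for u, v in zip(x_org_cat, x_cf_cat) if u == v)

def continuous_proximity(x_org_con, x_cf_con):
    """Negative L2 distance on continuous features (higher is better)."""
    return -float(np.linalg.norm(np.asarray(x_org_con) - np.asarray(x_cf_con)))
```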
IM1 and IM2 are two interpretability metrics (IM) proposed in [22]. Let Q_φ^org, Q_φ^cf and Q_φ^full be auto-encoder models trained specifically on samples of class y_org, samples of class y_cf, and the full dataset, respectively. We first provide the general idea behind these two metrics. On the one hand, IM1 measures the ratio of the reconstruction errors of the counterfactual sample x_cf using Q_φ^cf and Q_φ^org. A smaller value of IM1 indicates that x_cf can be reconstructed more accurately by the auto-encoder trained only on instances of the counterfactual class y_cf than by the auto-encoder trained on the original class y_org. This demonstrates that the counterfactual sample x_cf lies closer to the data manifold of the counterfactual class y_cf, which is considered more interpretable. On the other hand, IM2 evaluates the similarity of the reconstructions of x_cf produced by Q_φ^cf and Q_φ^full. A low value of IM2 means that the reconstructed instances of x_cf are very similar when using either Q_φ^cf or Q_φ^full, so the data distribution of the counterfactual class y_cf describes x_cf as well as the distribution over all classes. Particularly, IM1 and IM2 are defined as:

IM1(x_cf) = ||x_cf - AE_cf(x_cf)||_2^2 / (||x_cf - AE_org(x_cf)||_2^2 + ε),
IM2(x_cf) = ||AE_cf(x_cf) - AE_full(x_cf)||_2^2 / (||x_cf||_1 + ε),

where AE_org, AE_cf and AE_full denote the reconstructions by Q_φ^org, Q_φ^cf and Q_φ^full, and ε is a small constant for numerical stability.
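These two interpretability metrics, following their definitions in [22], can be sketched with plain reconstruction functions standing in for the trained auto-encoders:

```python
import numpy as np

def im1(x_cf, ae_cf, ae_org, eps=1e-12):
    """IM1: ratio of reconstruction errors of x_cf under the counterfactual-
    class auto-encoder vs. the original-class one (lower is better).
    ae_* are stand-in reconstruction functions, not trained models."""
    num = np.sum((x_cf - ae_cf(x_cf)) ** 2)
    den = np.sum((x_cf - ae_org(x_cf)) ** 2) + eps
    return float(num / den)

def im2(x_cf, ae_cf, ae_full, eps=1e-12):
    """IM2: squared distance between reconstructions from the counterfactual-
    class and full-data auto-encoders, scaled by ||x_cf||_1 (lower is better)."""
    num = np.sum((ae_cf(x_cf) - ae_full(x_cf)) ** 2)
    return float(num / (np.sum(np.abs(x_cf)) + eps))
```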

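Assuming the three autoencoders are available as plain callables that map an input to its reconstruction, IM1 and IM2 can be sketched as follows (function and argument names are our own):

```python
import numpy as np

EPS = 1e-8  # small constant to avoid division by zero

def im1(x_cf, ae_cf, ae_org):
    """Ratio of squared reconstruction errors of x_cf under the
    counterfactual-class and original-class autoencoders (lower is better)."""
    err_cf = np.sum((x_cf - ae_cf(x_cf)) ** 2)
    err_org = np.sum((x_cf - ae_org(x_cf)) ** 2)
    return err_cf / (err_org + EPS)

def im2(x_cf, ae_cf, ae_full):
    """Squared distance between the reconstructions of x_cf from the
    counterfactual-class and full-data autoencoders, scaled by the L1
    norm of x_cf (lower is better)."""
    diff = np.sum((ae_cf(x_cf) - ae_full(x_cf)) ** 2)
    return diff / (np.sum(np.abs(x_cf)) + EPS)
```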
Baseline Methods
We compare our proposed method (ProCE) with several baselines: Wachter (Wach), Growing Sphere (GS), CERTIFAI, DiCE and FACE. All of them are recent counterfactual explanation approaches with available source code or frameworks. Brief descriptions of these baselines are as follows:
1. Wachter (Wach) [17] is a fundamental approach that generates counterfactual explanations by minimizing the $L_1$-norm via gradient descent, finding counterfactuals $x_{cf}$ as close as possible to the original instance $x_{org}$.
2. Growing Sphere (GS) [46] is a random search algorithm that generates samples around the factual input point until a point with the desired counterfactual class label is found. Growing hyperspheres are used to create random samples around the original instance. This approach handles immutable features by excluding them from the search procedure.
3. CERTIFAI [47] utilizes a genetic algorithm to find counterfactual samples more effectively. The source code for this method is not available; therefore, we implement CERTIFAI with the support of the Python library PyGAD.
4. DiCE [14] is one of the most prominent counterfactual explanation frameworks. It constructs a weighted sum of different loss functions, including proximity, diversity and sparsity, and optimizes the combined loss via gradient descent. For implementation, we use the released source code with default settings.
5. FACE [8] produces a feasible and actionable set of counterfactual actions based on shortest path lengths determined by density-weighted metrics. The counterfactuals generated by this method are plausible and coherent with the underlying data distribution.
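To make the behaviour of the gradient-free baselines concrete, the following is a minimal sketch of the Growing Sphere idea only (not the authors' implementation, and omitting its sparsity-projection step): candidates are sampled uniformly in expanding spherical shells around $x_{org}$ until the classifier assigns the target class.

```python
import numpy as np

def growing_sphere_search(x_org, predict, target_class,
                          step=0.5, n_per_shell=200, max_radius=20.0, seed=0):
    """Sample candidates in expanding shells around x_org until one is
    classified as `target_class`. `predict` maps a 2-D array to labels."""
    rng = np.random.default_rng(seed)
    d = len(x_org)
    low = 0.0
    while low < max_radius:
        high = low + step
        # Uniform directions scaled to radii in (low, high]: a spherical shell.
        directions = rng.normal(size=(n_per_shell, d))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        radii = rng.uniform(low, high, size=(n_per_shell, 1))
        candidates = x_org + radii * directions
        labels = predict(candidates)
        hits = candidates[labels == target_class]
        if len(hits) > 0:
            # Return the hit closest to the original instance.
            dists = np.linalg.norm(hits - x_org, axis=1)
            return hits[np.argmin(dists)]
        low = high  # grow the sphere and try again
    return None
```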
For all the experiments, we build two prediction models, namely the 1st classifier and the 2nd classifier. The first classifier is a neural network with three hidden layers, while the second has five hidden layers. Since the continuous features in the datasets lie in different value ranges, following common practice in feature engineering [48,49,50], we normalize the continuous features to the range (0, 1). Regarding the categorical features, we transform them into numeric form using a label encoder.
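The two preprocessing steps can be sketched as follows (the feature values below are hypothetical, purely for illustration):

```python
import numpy as np

# Illustrative preprocessing: min-max normalise a continuous feature to (0, 1)
# and label-encode a categorical feature.
income = np.array([30000.0, 52000.0, 41000.0])       # continuous feature
education = ["Bachelors", "Masters", "Bachelors"]    # categorical feature

# Min-max normalisation.
income_scaled = (income - income.min()) / (income.max() - income.min())

# Label encoding: map each distinct category to an integer code.
codes = {v: i for i, v in enumerate(sorted(set(education)))}
education_encoded = [codes[v] for v in education]
```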

Results and Discussions
The performance of all methods on the 1st and 2nd classifiers is reported in Tables 1 and 2, respectively. Regarding the 1st classifier (Table 1), all methods except Wach achieve competitive target-class validity on all datasets, with around 90% of samples belonging to the target class. Regarding the percentage of samples satisfying the causal constraints, by far the best performance is achieved by ProCE, with 85.91%, 91.84%, 95.64% and 90.43% on the Simple-BN, Sangiovese, Adult and Law datasets, respectively. FACE also performs competitively on this metric across the four datasets (81.49%, 88.65%, 92.49% and 86.71%), while the majority of samples generated by Wach violate the causal constraints (63.61%, 58.1%, 70.40% and 76.71%). No method reaches 100% in %Ccv, which demonstrates that maintaining the causal constraints in counterfactual samples is quite challenging. Moreover, these results indicate that, by integrating the structural causal model, our proposed method can effectively produce counterfactual samples that preserve the features' causal relationships.

Regarding the interpretability scores, our proposed method achieves the best IM1 and IM2 on all four datasets. DiCE ranks second, with competitive results on the Adult dataset (0.0809 for IM1 and 0.2679 for IM2) and the Law dataset (0.0423 for IM1 and 0.0427 for IM2). The performance on the 2nd classifier in Table 2 likewise demonstrates the competitive performance of our proposed method across all metrics. We also notice that, although the 2nd classifier has a more complicated architecture than the 1st, there is only a small variation in the performance of the counterfactual explanation algorithms. Finally, as expected, by using the prototype as a guideline for the counterfactual search process, ProCE produces more interpretable counterfactual instances, as reflected in its good IM1 and IM2 scores. By contrast, other approaches find it challenging to reconstruct the counterfactual samples, leading to high interpretability scores (IM1 and IM2).
On the other hand, to better assess the effectiveness of our proposed method in producing counterfactual samples compared with other approaches, we also perform a statistical significance test (paired t-test) between our approach (ProCE) and the other methods on each dataset and each metric, using the results of 100 randomly repeated experiments, and report the p-values in Tables 1 and 2. We find that the improvements of our model are statistically significant with p < 0.05, demonstrating the effectiveness of ProCE in the counterfactual sample generation task. Our proposed method also produces the least fluctuation in continuous proximity on Sangiovese, Simple-BN and Adult, while the biggest variation is witnessed on Law.
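A paired t-test for this setting can be sketched as follows; the metric scores below are synthetic placeholders, not the paper's results. In practice, `scipy.stats.ttest_rel` computes the exact p-value; here we compare the t statistic against the approximate two-sided critical value for df = 99 at α = 0.05.

```python
import numpy as np

# Hypothetical metric scores from 100 repeated runs of ProCE and one baseline.
rng = np.random.default_rng(0)
proce_scores = rng.normal(loc=0.92, scale=0.02, size=100)
baseline_scores = rng.normal(loc=0.85, scale=0.03, size=100)

# Paired t statistic: the runs are paired because each repetition uses the
# same random data split for both methods.
diff = proce_scores - baseline_scores
t_stat = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))

# For df = 99 the two-sided critical value at alpha = 0.05 is about 1.984,
# so |t| above this threshold indicates a statistically significant difference.
significant = abs(t_stat) > 1.984
```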
We also report the running time of the different methods in Table 3. Overall, the shortest time is recorded by the Wach method on the Simple-BN, Sangiovese and Law datasets. A possible reason is that Wach is a naive approach that optimizes basic proximity loss functions using gradient descent; this allows it to produce counterfactual samples quickly, but leads to poor performance on several metrics. Our approach (ProCE) also demonstrates competitive time performance on these three datasets.

Regarding the hyperparameter analysis, we observe from Figure 3a that the continuous proximity on the Simple-BN, Sangiovese and Adult datasets is nearly stable under different embedding sizes, while Law exhibits a more significant variation, increasing from around -0.336 to -0.224 as the embedding size grows from 32 to 256, followed by a slight decrease to -0.33 at embedding size 512. A similar pattern is recorded for the remaining metrics, including categorical proximity, IM1 and IM2, with good and stable performance at an embedding size of 256. These small fluctuations suggest that the impact of the embedding size on model performance is not very significant. Moreover, 256 is the preferable embedding size, while sizes of 32 and 512 seem respectively too small and too large to appropriately capture the latent information in the embedding vectors. Regarding categorical proximity, the performance declines slightly by 0.1 from size 32 to 64, and thereafter varies slightly around 4.0-4.09 for embedding sizes of 128, 256 and 512.

On the other hand, as can be seen from Figure 3b, IM1 and IM2 exhibit a similar pattern: the worst performance occurs with 15 nearest-neighbor instances, followed by a stagnant performance from 25 to 45 instances. The similar trend in IM1 and IM2 is reasonable given their related definitions in Section 4.2. Meanwhile, there is no significant variation in the performance of continuous and categorical proximity across the four datasets.

These results suggest that the performance of our proposed method varies only slightly across all evaluation metrics with respect to the two hyperparameters (embedding size and number of nearest neighbors), implying our model's stability and robustness.

Conclusion
This paper introduces a novel counterfactual explanation algorithm that integrates the structural causal model and the class prototype. We also propose formulating counterfactual generation as a multi-objective problem and construct an optimization algorithm to find the optimal counterfactual explanation effectively. Our experiments validate that our method outperforms state-of-the-art methods on many evaluation metrics. For future work, we plan to extend our framework to imperfect structural causal models, which are commonplace in real-world scenarios. Meanwhile, other optimization approaches such as reinforcement learning and multi-task learning are also worth investigating.

Figure 1: The overall framework for the proposed ProCE. The counterfactual samples are first initialized randomly.

Definition 3.3 (Non-dominated sorting procedure). The non-dominated sorting step sorts the solutions in the population according to the Pareto dominance principle, which plays a central role in the selection procedure. The set of candidate solutions P can be divided into a set of H disjoint Pareto fronts F = {F_1, F_2, ..., F_H}, where H is the maximum number of fronts; non-dominated sorting is the procedure for finding them. Specifically, all the non-dominated solutions from Definition 3.2 are first selected from the population and form the first Pareto front F_1. After that, the non-dominated solutions are chosen from the remaining population, and the process is repeated until all solutions are assigned to a front, up to F_H.

Definition 3.4 (Crowding distance). One vital characteristic of a population of solutions is diversity. To encourage the diversity of candidate solutions, the simplest approach is to prefer individuals lying in regions of low density.
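Definitions 3.3 and 3.4 correspond to the standard NSGA-II building blocks; a minimal sketch of both (our own implementation, assuming all objectives are to be minimised) is:

```python
import numpy as np

def non_dominated_sort(objs):
    """Sort solutions into Pareto fronts. `objs` is an (n, m) array of
    objective values to be minimised; returns a list of index lists F1, F2, ..."""
    n = len(objs)
    dominated_by = [set() for _ in range(n)]  # solutions that i dominates
    dom_count = np.zeros(n, dtype=int)        # how many solutions dominate i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if np.all(objs[i] <= objs[j]) and np.any(objs[i] < objs[j]):
                dominated_by[i].add(j)
            elif np.all(objs[j] <= objs[i]) and np.any(objs[j] < objs[i]):
                dom_count[i] += 1
    fronts = [[i for i in range(n) if dom_count[i] == 0]]
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominated_by[i]:
                dom_count[j] -= 1
                if dom_count[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]

def crowding_distance(objs):
    """Crowding distance of each solution within one front (larger = less dense)."""
    objs = np.asarray(objs, dtype=float)
    n, m = objs.shape
    dist = np.zeros(n)
    for k in range(m):
        order = np.argsort(objs[:, k])
        dist[order[0]] = dist[order[-1]] = np.inf  # boundary points always kept
        span = objs[order[-1], k] - objs[order[0], k]
        if span == 0:
            continue
        for idx in range(1, n - 1):
            dist[order[idx]] += (objs[order[idx + 1], k] - objs[order[idx - 1], k]) / span
    return dist
```

Solutions with no dominators form F_1; removing them and repeating yields F_2, F_3, and so on, exactly as the definition describes.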

Algorithm 1: Multi-objective Optimization for Prototype-based Counterfactual Explanation (ProCE).
Input: an original sample x_org with its prediction y_org, a desired class y_cf, a machine learning classifier H and an encoder model Q_φ.
1: Compute the prototype proto* by Eq. (5).
2: Initialize a batch of n candidate solutions P = {Δ_i}_{i=1}^{n} with Δ_i ~ N(μ, ν).
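The initialization step can be sketched in a few lines (the values of n, the feature dimension, μ and ν below are hypothetical, chosen only for illustration):

```python
import numpy as np

# Step 2 of Algorithm 1: initialise n candidate perturbations Δi ~ N(μ, ν).
rng = np.random.default_rng(42)
n, d = 50, 4          # population size and feature dimension (illustrative)
mu, nu = 0.0, 0.1     # mean and standard deviation of the initial noise
population = rng.normal(loc=mu, scale=nu, size=(n, d))  # P = {Δi}, i = 1..n
```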

Figure 2: Categorical proximity on the Adult dataset and continuous proximity on the four datasets. For categorical proximity on both the 1st and 2nd classifiers, ProCE consistently achieves an average of 5 matches out of the 6 categorical features in the dataset, meaning that the counterfactual samples preserve most of the original categorical values.

Figure 3: (a) Performance under different sizes E of the embedding dimension for the encoder Q_φ. (b) Performance under different numbers K of nearest neighbors for the class prototype.

Table 1: Performance of all methods on the 1st classifier. We compute p-values by conducting a paired t-test between our approach (ProCE) and baselines with 100 repeated experiments for each metric on the Simple-BN, Sangiovese, Adult and Law datasets.

Table 2: Performance of all methods on the 2nd classifier. We compute p-values by conducting a paired t-test between our approach (ProCE) and baselines with 100 repeated experiments for each metric.

Table 3: Running time of different methods on the four datasets, together with baseline results in terms of continuous and categorical proximity. Higher continuous and categorical proximity are better.