1 Introduction

Counterfactual explanations are powerful tools to explain the decision process of machine learning models (Wachter et al., 2017; Karimi et al., 2020). They convey what could have happened if the state of the world had been different (e.g., if you had taken the umbrella, you would not have gotten soaked). Researchers have developed many methods that generate counterfactual explanations for a trained model (Wachter et al., 2017; Dandl et al., 2020; Mothilal et al., 2020; Karimi et al., 2020; Guidotti et al., 2018; Stepin et al., 2021). However, these methods do not provide actionable information about which steps are required to reach the given counterfactual, and thus most of them do not enable algorithmic recourse. Algorithmic recourse describes the ability to provide “explanations and recommendations to individuals who are unfavourably treated by automated decision-making systems” (Karimi et al., 2021). For instance, algorithmic recourse can answer questions such as: what actions does a user have to perform to be granted a loan? Recently, providing feasible algorithmic recourse has also become a legal necessity (Voigt & Bussche, 2017). Some works address this problem by generating counterfactual interventions (Karimi et al., 2021), i.e., sequences of actions that, if followed, overturn a decision made by a machine learning model, thus guaranteeing recourse. While quite successful, these methods have several limitations. First, they are purely optimization methods that must be rerun from scratch for each new user, which prevents their use for real-time intervention generation. Second, they are expensive in terms of queries to the black-box classifier and computing time. Last but not least, they fail to explain their recommendations (e.g., why does the model suggest getting a better degree rather than changing jobs?), even though explainability has been pointed out as a major requirement for methods generating counterfactual interventions (Barocas et al., 2020).

In this paper, we cast the problem of providing explainable counterfactual interventions as a program synthesis task (De Toni et al., 2021; Pierrot et al., 2019; Bunel et al., 2018; Balog et al., 2017): we want to generate a “program” that provides all the steps needed to overturn a bad decision made by a machine learning model. We propose a novel reinforcement learning (RL) method coupled with a discrete search procedure, Monte Carlo Tree Search (Coulom, 2006), to generate counterfactual interventions in an efficient, data-driven manner. We call it FARE (eFficient counterfActual REcourse). As done by Naumann and Ntoutsi (2021), we assume a causal model encoding the relationships between user features and the consequences of potential interventions. We also provide a solution to distil an explainable deterministic program from the learned policy in the form of an automaton (E-FARE, Explainable and eFficient counterfActual REcourse). Figure 1 provides an overview of the architecture and the learning strategy, together with an example of an explainable intervention generated by the extracted automaton. Our approach addresses the three main limitations characterizing existing solutions:

  • It learns a general policy that can be used to generate interventions for multiple users, rather than running separate user-specific optimizations.

  • By coupling reinforcement learning with Monte Carlo Tree Search, it efficiently explores the search space, requiring substantially fewer queries to the black-box classifier than the best available evolutionary algorithm (EA) model, especially in settings with many features and (relatively) long interventions.

  • By extracting a program from the learned policy, it can complement the intervention with explanations motivating each action from contextual information. Furthermore, the program can be executed in real-time without accessing the black-box classifier.

Our experimental results on synthetic and real-world datasets confirm the advantages of the proposed solution over existing alternatives in terms of generality, scalability and interpretability.

Fig. 1

1. Model architecture. Given the state \(s_t\) representing the features of the user, the agent generates candidate intervention policies \(\pi _{f}\) and \(\pi _{x}\) for functions and arguments, respectively (an action is a function-argument pair). MCTS uses these policies as a prior, and it extracts the best next action \((f, x)_{t+1}^*\). Once found, the reward received upon making the action is used to improve the MCTS estimates, and correct traces (i.e., those leading to the desired outcome change) are saved in a replay buffer. 2. Training step. The buffer is used to sample a subset of correct traces to be used to train the RL agent to mimic the behaviour of MCTS. 3. Explainable intervention. Example of an explainable intervention generated by the automaton extracted from the learned agent. Actions are in black, while explanations for each action are in red

2 Related work

Counterfactual explanations are versatile techniques to provide post-hoc interpretability of black-box machine learning models (Wachter et al., 2017; Dandl et al., 2020; Mothilal et al., 2020; Karimi et al., 2020; Guidotti et al., 2018; Stepin et al., 2021). They are model-agnostic, meaning that they can be applied to already trained models without any performance loss. In contrast to global methods (Greenwell et al., 2018; Apley & Zhu, 2020), they provide local explanations: they highlight only the relevant factors impacting the decision for a given initial target instance. They are also human-friendly and present many characteristics of what is considered to be a good explanation (Miller, 2019). They are therefore suitable candidates to provide explanations to end-users, since they are both highly informative and localized. Recent research has shown how to generate counterfactual interventions for algorithmic recourse via various techniques (Karimi et al., 2020), such as probabilistic models (Karimi et al., 2020), integer programming (Ustun et al., 2019; Kanamori et al., 2020), reinforcement learning (Yonadav & Moses, 2019), program synthesis (Ramakrishnan et al., 2020), and genetic algorithms (Naumann & Ntoutsi, 2021). Researchers have also developed solutions tied to a specific class of machine learning models, such as linear models (Tolomei et al., 2017) or Additive Tree Models (Cui et al., 2015). Methods with (approximate) convergence guarantees on the optimal counterfactual policies have also been proposed (Tsirtsis & Rodriguez, 2020). However, most of these methods ignore the causal relationships between user features (Tsirtsis & Rodriguez, 2020; Ustun et al., 2019; Yonadav & Moses, 2019; Ramakrishnan et al., 2020). Without an underlying causal graph, the proposed interventions become permutation invariant: for example, given an intervention consisting of three actions [A, B, C], any permutation of these actions has the same total cost. More importantly, it has recently been shown that optimal algorithmic recourse is impossible to achieve without a causal model of the interactions between the features (Karimi et al., 2020). The work by Karimi et al. (2020) provides algorithmic recourse following a probabilistic causal model, but optimizes for subpopulation-based interventions instead of personalizing for a single user. CSCF (Naumann & Ntoutsi, 2021) is the only model-agnostic method capable of producing consequence-aware sequential interventions by exploiting the causal relationships between features represented by a causal graph. However, CSCF is still a purely (evolutionary-based) optimization method, so it has to be run from scratch for each new user. Furthermore, the approach is opaque with respect to the reasons behind a suggested intervention. In this work, we show how our approach improves over CSCF in terms of generality, efficiency and interpretability.

3 Methods

3.1 Problem setting

The state of a user is represented as a vector of attributes \(s\in \mathcal {S}\) (e.g., age, sex, monthly income, job). A black-box classifier \(h : \mathcal {S} \rightarrow \{True, False\}\) predicts an outcome given a user state, with True being favourable to the user and False being unfavourable. The setting can be easily extended to multiclass classification by either grouping outcomes into favourable and unfavourable ones or learning separate programs converting from one class to the other. A counterfactual intervention I is a sequence of actions. Each action is represented as a tuple \((f,x) \in \mathcal {A}\), composed of a function, f, and its argument, \(x \in \mathcal {X}_f\) (e.g., (change_income, 500)). When an action is performed for a certain user, it modifies their state by altering one of their attributes according to its argument. A library \(\mathcal {F}\) contains all the functions that can be called. This library and the corresponding DSL (Domain Specific Language) are typically defined a priori by experts to prevent changes to protected attributes (e.g., age, sex, etc.). Examples of such DSLs can be found in "Appendix B". Moreover, each function has pre-conditions, in the form of Boolean predicates over its arguments, describing the conditions that a user state must meet for the function to be called. The end of an intervention I is always specified by the STOP action.

We also define a cost function \(C: \mathcal {A}\times \mathcal {S} \rightarrow \mathbb {R}\), which mimics the effort made by a given user to perform an action in the current state. The cost is computed by looking at a causal graph \(\mathcal {G}\) (Pearl, 2009) whose nodes are the user’s features. This assumption encodes the concept of consequences and induces a notion of order over the intervention’s actions. For example, it might be easier to first get a degree and then a better salary rather than the other way around. The causal graph is problem-specific, and we can estimate it using domain knowledge or a domain expert. If we have observational data, we can also try to learn a candidate \(\mathcal {G}\) using automated methods (Tian & Pearl, 2001; Spirtes & Zhang, 2016), although inferring the “true” causal graph without interventions is not trivial. We use the former approach in our evaluation, by manually crafting the causal graphs. Figure 2 shows an example of a causal graph \(\mathcal {G}\) and of the corresponding costs.

Our goal is to train an agent that, given a user with an unfavourable outcome, generates counterfactual interventions that overturn it. Given a black-box classifier h, a user \(s_0\) for whom the prediction by h is unfavourable (i.e., \(h(s_0) = False\)), a causal graph \(\mathcal {G}\) and a set of possible actions \(\mathcal {A}\) (implicitly represented by the functions in \(\mathcal {F}\) and their arguments in \(\mathcal {X}\)), we want to generate a sequence \(I^*\) that, if applied to \(s_0\), produces a new state, \(s^* = I^*(s_0)\), such that \(h(s^*) = True\). This sequence must be actionable, meaning that the user must be able to perform its actions, and it must minimize the user’s cost. More formally:

$$\begin{aligned} I^* = \mathop {{{\,\mathrm{arg\!min}\,}}}\limits _{I} \quad & \sum _{t=0}^{T} C(a_t, s_t) \nonumber \\ \text {s.t.} \quad & I = \{a_t\}_{t=0}^{T}, \quad a_t \in \mathcal {A} \quad \forall t \nonumber \\ & s_t = I_{t-1}(s_{t-1}) \quad \forall t > 0 \nonumber \\ & h(I(s_0)) \ne h(s_0) \end{aligned}$$
(1)
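To make the setting concrete, the following sketch shows how a state, an action library and the cost of an intervention could be represented. It is only an illustration of the definitions above: the function names (change_income, get_degree), the example attributes and the cost callback are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]    # user attributes, e.g. {"income": 800, "education": "none"}
Action = Tuple[str, object]  # (function, argument) pair, e.g. ("change_income", 500)

# Hypothetical function library F: each entry alters one attribute of the state.
LIBRARY: Dict[str, Callable[[State, object], State]] = {
    "change_income": lambda s, x: {**s, "income": s["income"] + x},
    "get_degree":    lambda s, x: {**s, "education": x},
}

def apply_intervention(s0: State, intervention: List[Action]) -> State:
    """Apply the actions of an intervention I in order and return the final state."""
    s = dict(s0)
    for f, x in intervention:
        s = LIBRARY[f](s, x)
    return s

def intervention_cost(s0: State, intervention: List[Action],
                      cost: Callable[[Action, State], float]) -> float:
    """Sum of C(a_t, s_t) along the intervention; `cost` encodes the causal graph G."""
    total, s = 0.0, dict(s0)
    for f, x in intervention:
        total += cost((f, x), s)   # effort of the action in the *current* state
        s = LIBRARY[f](s, x)
    return total

# A candidate I is valid if h(apply_intervention(s0, I)) != h(s0);
# among the valid interventions, I* is the one with the lowest total cost.
```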
Fig. 2

Examples of interventions on a causal graph. A A causal graph and a set of candidate actions. B Examples of interventions together with their costs. Note that the green line (\(\sum C =15\)) has a lower cost than the red line (\(\sum C=28\)) thanks to a better ordering of the actions making up the intervention (Color figure online)

Fig. 3

Agent architecture. Given the user’s state \(s_t\), it outputs a function policy, \(\pi _f\), an argument policy \(\pi _x\) and an estimate of the expected reward from the state \(v_t\). These outputs are used to select the next best action \((f,x)_{t+1}\)

3.2 Model architecture

3.2.1 Overall structure


Figure 1 shows the complete FARE model architecture. It is composed of a binary encoder and an RL agent coupled with a Monte Carlo Tree Search procedure. The binary encoder converts the user’s features into a binary representation by one-hot-encoding the categorical features and discretizing the numerical features into ranges. In the following sections, we use \(s_t\) to directly denote the binarized version of the user’s state. Given a state \(s_t\), the RL agent generates candidate policies, \(\pi _{f}\) and \(\pi _{x}\), for function and argument generation respectively. MCTS uses these policies as priors for its exploration of the action space and extracts the best next action \((f, x)_{t+1}^*\). The action is then applied to the environment. The procedure ends when the STOP action is chosen (i.e., the intervention was successful) or when the maximum intervention length is reached, in which case the result is marked as a failure. During training, the reward is used to improve the MCTS estimates of the policies. Moreover, correct traces (i.e., traces of interventions leading to the desired outcome change) are stored in a replay buffer, and a sample of traces from the buffer is used to refine the RL agent.
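As a rough illustration of the binary encoder, one could combine one-hot encoding of the categorical features with discretization of the numerical ones, e.g. with scikit-learn (version 1.2 or later is assumed for the sparse_output flag; the feature values and the number of bins are invented for the example):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, KBinsDiscretizer

# Hypothetical raw features: two categorical columns and one numerical column.
categorical = np.array([["unemployed", "none"], ["clerk", "bachelor"]])
numerical = np.array([[800.0], [2500.0]])

onehot = OneHotEncoder(sparse_output=False).fit(categorical)
bins = KBinsDiscretizer(n_bins=4, encode="onehot-dense",
                        strategy="uniform").fit(numerical)

def binarize(cat_row, num_row):
    """Return the binary state s_t fed to the RL agent."""
    return np.concatenate([onehot.transform([cat_row])[0],
                           bins.transform([num_row])[0]])

s_t = binarize(["unemployed", "none"], [800.0])  # a flat 0/1 vector
```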

3.2.2 RL agent structure


The agent structure is inspired by previous program synthesis works (De Toni et al., 2021; Pierrot et al., 2019). It is composed of five components: a state encoder \(g_{enc}\), an LSTM controller \(g_{lstm}\), a function network \(g_{f}\), an argument network \(g_x\) and a value network \(g_{V}\). See Fig. 3 for an overview. We use simple feedforward networks to implement \(g_f\), \(g_x\) and \(g_V\).

$$\begin{aligned} g_{enc}(s_t) = e_t \qquad g_{lstm}(e_t, h_{t-1}) = h_{t} \end{aligned}$$
(2)
$$\begin{aligned} g_{f}(h_{t}) = \pi _{f} \quad g_{x}(h_{t}) = \pi _{x} \quad g_{V}(h_{t}) = v_t \end{aligned}$$
(3)

\(g_{enc}\) encodes the user’s state into a latent representation which is fed to the controller \(g_{lstm}\). The controller learns an implicit representation of the program used to generate the interventions. The function and argument networks then extract the corresponding policies, \(\pi _f\) and \(\pi _x\), by taking as input the hidden state \(h_t\) of \(g_{lstm}\). \(g_V\) represents the value function V and outputs the expected reward from the state \(s_t\). Here, we omit the state \(s_t\) when defining the policies and the value function output, since \(s_t\) is already embedded into the \(h_t\) representation. In our setting, we try to learn a single program, which we call INTERVENE.
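The following PyTorch sketch mirrors Eqs. (2)–(3). The hidden size and the single-layer heads are arbitrary choices for illustration, not the configuration used in the experiments.

```python
import torch
import torch.nn as nn

class FAREAgent(nn.Module):
    """g_enc -> g_lstm -> (g_f, g_x, g_V), as in Eqs. (2)-(3)."""

    def __init__(self, state_dim, n_functions, n_arguments, hidden=64):
        super().__init__()
        self.g_enc = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.g_lstm = nn.LSTMCell(hidden, hidden)
        self.g_f = nn.Linear(hidden, n_functions)   # function head
        self.g_x = nn.Linear(hidden, n_arguments)   # argument head
        self.g_V = nn.Linear(hidden, 1)             # value head

    def forward(self, s_t, lstm_state=None):
        e_t = self.g_enc(s_t)                        # latent encoding of the state
        h_t, c_t = self.g_lstm(e_t, lstm_state)      # implicit program representation
        pi_f = torch.softmax(self.g_f(h_t), dim=-1)  # policy over functions
        pi_x = torch.softmax(self.g_x(h_t), dim=-1)  # policy over arguments
        v_t = self.g_V(h_t)                          # expected reward estimate
        return pi_f, pi_x, v_t, (h_t, c_t)
```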

3.2.3 Policy


A policy is a distribution over the available actions (i.e., functions and their arguments) such that \(\sum _{i=0}^{N} \pi (i) = 1\). Our agent produces two policies: \(\pi _{f}\) on the function space, and \(\pi _{x}\) on the argument space. The next action, \((f,x)_{t+1}\), is chosen by taking the argmax over the policies:

$$\begin{aligned} f_{t+1} = \mathop {{{\,\mathrm{arg\!max}\,}}}\limits _{f \in \mathcal {F}} \pi _{f}(f) \quad x_{t+1} = \mathop {{{\,\mathrm{arg\!max}\,}}}\limits _{x \in \mathcal {X}_{f_{t+1}}} \pi _{x}(x\vert f_{t+1}) \end{aligned}$$

Each program starts by calling the program INTERVENE, and it ends when the action STOP is called.
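A minimal sketch of this greedy selection is given below; the masking of arguments whose pre-conditions are not met is our own assumption about how infeasible actions would be skipped.

```python
import torch

def select_action(pi_f, pi_x, functions, arg_values, arg_indices, state, preconditions):
    """Pick (f, x)_{t+1} by taking the argmax over the two policies.

    arg_indices[f] lists the positions in pi_x corresponding to X_f;
    preconditions[f] is a hypothetical Boolean predicate over (state, x).
    """
    f = functions[torch.argmax(pi_f).item()]
    best_x, best_p = None, float("-inf")
    for i in arg_indices[f]:                 # restrict the argmax to X_f
        x = arg_values[i]
        if preconditions[f](state, x) and pi_x[i].item() > best_p:
            best_x, best_p = x, pi_x[i].item()
    return f, best_x
```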

3.2.4 Reward


Once we have applied the intervention I, given the black-box classifier h, the reward, r, is computed as:

$$\begin{aligned} r = \lambda ^T R, \quad \lambda \in (0,1), \quad R = \left\{ \begin{array}{ll} 1 & \hbox {if } h(I(s)) \ne h(s)\\ 0 & \hbox {otherwise} \end{array}\right. \end{aligned}$$
(4)

where \(\lambda\) is a regularization coefficient and T is the length of the intervention. The \(\lambda ^T\) factor penalizes longer interventions in favour of shorter ones. Minimizing the intervention length is related to minimizing sparsity, which indicates how many features have been changed to obtain a successful counterfactual (Wachter et al., 2017). Sparsity is regarded as an important quality of counterfactual examples and algorithmic recourse (Miller, 2019).
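The reward of Eq. (4) reduces to a few lines. The sketch below reuses the hypothetical apply_intervention helper from the earlier sketch, and the value of lam is chosen purely for illustration.

```python
def reward(h, s0, intervention, lam=0.9):
    """Eq. (4): r = lambda^T * R, with R = 1 iff the outcome is overturned."""
    s_final = apply_intervention(s0, intervention)
    R = 1.0 if h(s_final) != h(s0) else 0.0
    T = len(intervention)                    # longer interventions are discounted
    return (lam ** T) * R
```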

3.3 Monte Carlo tree search

Monte Carlo Tree Search (MCTS) is a discrete heuristic search procedure that can successfully solve combinatorial optimization problems with large action spaces (Silver et al., 2018, 2016). MCTS explores the most promising nodes by expanding the search space based on random sampling of the possible actions. In our setting, each tree node represents the user’s state at time t, and each arc represents a possible action determining a transition to a new state. MCTS searches for a sequence of actions that minimizes the user’s effort and changes the prediction of the black-box model. We use the agent policies, \(\pi _f\) and \(\pi _x\), as a prior to explore the program space. The newly found sequence of actions is then used to train the RL agent. To select the next node, we maximize the UCT criterion (Kocsis & Szepesvári, 2006):

$$\begin{aligned} (f,x)_{t+1} = \mathop {{{\,\mathrm{arg\!max}\,}}}\limits _{f \in \mathcal {F},\, x \in \mathcal {X}_f} \left[ Q(s, (f,x)) + U(s, (f,x)) + L(s, (f,x)) \right] \end{aligned}$$
(5)

Here Q(s, (f, x)) returns the expected reward of taking action (f, x). U(s, (f, x)) is a term that trades off exploration and exploitation, based on how many times node s has been visited in the tree. L(s, (f, x)) is a scoring term defined as follows:

$$\begin{aligned} L(s, (f,x)) = {e^{-(l_{cost}((f,x),s) + l_{count}(f))}} \end{aligned}$$
(6)

where \(l_{cost}=C(a,s) \in \mathbb {R}\) represents the effort needed to perform the action \(a=(f,x) \in \mathcal {A}\), and \(l_{count} \in \mathbb {R}\) penalizes interventions that call the same function f multiple times. MCTS uses the simulation results to return an improved version of the agent policies, \(\pi _f^{mcts}\) and \(\pi _x^{mcts}\). We can also specify the depth of the search tree as a hyperparameter to balance the computational load required by the procedure.
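A sketch of the node-selection rule of Eq. (5) is shown below. The exploration term U is written in the standard PUCT form; since its exact expression is not spelled out here, this particular form, as well as the node attributes, are assumptions.

```python
import math

def uct_select(node, c_puct=1.0):
    """Pick the child action maximizing Q + U + L (Eq. 5). `node` is hypothetical:
    it exposes a visit count and a children mapping (f, x) -> child statistics."""
    best, best_score = None, float("-inf")
    for action, child in node.children.items():       # action = (f, x)
        Q = child.total_reward / max(child.visits, 1)  # expected reward estimate
        U = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        L = math.exp(-(child.l_cost + child.l_count))  # Eq. (6): effort + repetition penalty
        score = Q + U + L
        if score > best_score:
            best, best_score = action, score
    return best
```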

From the found intervention, we build an intervention trace, which is a sequence of tuples that stores, for each time step t: the input state, the output state, the reward, the hidden state of the controller and the improved policies. The traces are stored in the replay buffer, to be used to train the RL agent.
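The trace and the replay buffer admit a very simple representation; the field names below are illustrative only.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, List

@dataclass
class TraceStep:
    state_in: Any     # s_t before the action
    state_out: Any    # s_{t+1} after the action
    reward: float     # reward received at this step
    hidden: Any       # controller hidden state h_t
    pi_f_mcts: Any    # improved function policy from MCTS
    pi_x_mcts: Any    # improved argument policy from MCTS

class ReplayBuffer:
    def __init__(self, capacity: int = 10_000):
        self.traces: deque = deque(maxlen=capacity)

    def add(self, trace: List[TraceStep]):
        self.traces.append(trace)     # only correct (successful) traces are stored

    def sample(self, batch_size: int) -> List[List[TraceStep]]:
        return random.sample(list(self.traces), min(batch_size, len(self.traces)))
```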

3.4 Training the agent

The agent has to learn to replicate the interventions provided by MCTS at each step t. Given the replay buffer, we sample a batch of intervention traces and minimize, for each time step t, a loss \(\mathcal {L}\) combining a value regression term and the cross-entropy between the MCTS policies and the agent policies:

$$\begin{aligned} \mathop {{{\,\mathrm{arg\!min}\,}}}\limits _{\theta } \sum _{batch} \left[ (V-r)^2 -(\pi _{f}^{mcts})^T\log (\pi _{f}) -(\pi _{x}^{mcts})^T\log (\pi _{x}) \right] \end{aligned}$$
(7)

where \(\theta\) represents the agent’s parameters and V is the value function evaluation computed by the agent.
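A sketch of the loss in Eq. (7) for a batch of trace steps (PyTorch); using the batch mean rather than the plain sum is a minor implementation choice made for the example.

```python
import torch

def fare_loss(v, r, pi_f_mcts, pi_f, pi_x_mcts, pi_x, eps=1e-8):
    """Eq. (7): value regression plus cross-entropy towards the MCTS policies."""
    value_loss = (v.squeeze(-1) - r) ** 2
    ce_f = -(pi_f_mcts * torch.log(pi_f + eps)).sum(dim=-1)
    ce_x = -(pi_x_mcts * torch.log(pi_x + eps)).sum(dim=-1)
    return (value_loss + ce_f + ce_x).mean()

# Typical usage: loss = fare_loss(...); loss.backward(); optimizer.step()
```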

3.5 Generate interventions through RL

When training the agent, we learn a general policy that can be used to provide interventions for many different users. The inference procedure is similar to the one used for training. Given an initial state s, MCTS explores the tree search space using the learnt policies \(\pi _x\) and \(\pi _f\) coming from the agent as a “prior”, which gives it a hint of which node to select at each step. Once MCTS finds the minimal-cost trace that achieves recourse, we return it to the user. In principle, we could also use only \(\pi _x\) and \(\pi _f\) to obtain a viable intervention (e.g., by deterministically taking the action with the highest probability at each step). However, keeping the search component (MCTS) with a small exploration budget outperforms the RL agent alone. See Table 2 in Sect. 4 for the comparison between the agent-only model and the agent augmented with MCTS.

Learning a general policy to provide interventions is a powerful feature. However, the policy is encoded in the latent states of the agent, making it impossible for us to inspect it. We want to extract from the trained model an explainable version of this policy, which can then be used to explain why the model suggested a given intervention. Namely, besides providing users with a sequence of actions, we also want to show the reason behind each suggested action. The intuition to achieve this is the following: given a set of successful interventions generated by the agent, we can distil a synthetic automaton, or program (E-FARE), which condenses the policy into a graph-like structure that we can traverse.

3.6 Explainable intervention program

We now show how to build a deterministic program from the agent. Figure 4 shows the complete procedure and an example of the produced trace. First, we sample M intervention traces from the trained agent and extract a sequence \(\{(s_i, (f, x)_i)\}_{i=0}^T\) from each trace. Then, we construct an automaton graph, \(\mathcal {P}\), as follows:

  1. Given the function library \(\mathcal {F}\), we create a node for each available function f. We also add a starting node called INTERVENE and a “sink” node called STOP;

  2. We connect the nodes by unrolling the sampled traces. Starting from INTERVENE, we treat each action \((f,x)_{t}\) as a transition. We label the transition with (f, x) and connect the current node to the one representing the function f;

  3. For each node f, we store a collection of outgoing state-action pairs \((s_i, (f,x)_i)\). Namely, we store all the states s and the corresponding outward transitions that were decided by the model while at node f;

  4. Lastly, for each node \(f \in \mathcal {P}\), we train a decision tree on the tuples \((s_i,(f,x)_i)\) stored in the node to predict the transition \((f,x)_i\) given a user’s state \(s_i\).

The decision trees are trained only once, using the collection of traces sampled from the trained agent. The agent is frozen at this step and is not trained further. This construction corresponds to Steps 1 to 3 of Fig. 4. The pseudocode of the entire procedure is available in "Appendix A".
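A condensed sketch of steps 1–4 is given below, using a plain dictionary for the graph and one scikit-learn decision tree per node; the data layout of the traces and the label encoding are assumptions.

```python
from collections import defaultdict
from sklearn.tree import DecisionTreeClassifier

def build_automaton(traces):
    """traces: list of [(s_0, (f_0, x_0)), ..., (s_T, (f_T, x_T))] sampled from the agent."""
    # Steps 1-3: one node per function, connected by unrolling the traces,
    # with the outgoing (state, action) pairs stored in each node.
    node_data = defaultdict(list)           # node name -> [(state, next_action), ...]
    edges = set()
    for trace in traces:
        current = "INTERVENE"
        for state, action in trace:         # action = (f, x)
            node_data[current].append((state, action))
            edges.add((current, action, action[0]))
            current = action[0]             # move to the node of the called function
    # Step 4: a decision tree per node predicts the outgoing transition from the state.
    classifiers = {}
    for node, pairs in node_data.items():
        X = [list(s) for s, _ in pairs]     # binarized user states
        y = [str(a) for _, a in pairs]      # transition labels "(f, x)"
        classifiers[node] = DecisionTreeClassifier().fit(X, y)
    return edges, classifiers
```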

Fig. 4

Procedure to generate the explainable program from intervention traces. 1. For all \(f \in \mathcal {F}\), we add a new node. 2. Given the sampled traces, we add the transitions and store \((s_i, (f_i, x_i))\) in each node. 3. We train a decision tree for each node to predict the next action (consistently with the sampled traces). 4. At prediction time, we execute the program on the new instance, using the decision trees to decide the next action at each node. For each action, we extract a Boolean rule explaining it from the corresponding decision tree. On the right, an example of a generated intervention. The actions (f, x) are in black, while the explanations are in red (Color figure online)

3.7 Generate explainable interventions

An intervention is generated by traversing the graph \(\mathcal {P}\), starting from the INTERVENE node, until we reach the STOP node or the maximum intervention length; in the latter case, the program is marked as a failure. Given a node \(f \in \mathcal {P}\) and the state \(s_t\), we use the decision tree of that node to predict the next transition \((f',x')\). Moreover, we can extract from the decision tree interpretable rules which tell us why the next action was chosen. A rule is a Boolean proposition over the user’s features, such as \((income > 5000 \wedge education = bachelor)\). We then follow \((f',x')\), which is an arc going from f to the next node \(f'\), and apply the action to \(s_t\) to obtain \(s_{t+1}\). Again, the program is “fixed” at inference time and is not trained further. See Step 4 of Fig. 4 for an example of the inference procedure and of the produced explainable trace.
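At prediction time the per-node trees drive the traversal, and the explanation of each action can be read off the decision path of the tree that chose it. The sketch below shows one way to do this with scikit-learn; it is our own reconstruction, not necessarily the exact implementation used for E-FARE.

```python
import numpy as np

def explain_step(tree, feature_names, state):
    """Return the action predicted by a node's decision tree plus the rule that fired."""
    x = np.asarray(state, dtype=float).reshape(1, -1)
    action = tree.predict(x)[0]
    clauses = []
    for node_id in tree.decision_path(x).indices:   # nodes visited for this state
        feat = tree.tree_.feature[node_id]
        if feat < 0:                                # leaf node: no test to report
            continue
        thr = tree.tree_.threshold[node_id]
        op = "<=" if x[0, feat] <= thr else ">"
        clauses.append(f"{feature_names[feat]} {op} {thr:.2f}")
    return action, " AND ".join(clauses)
```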

4 Experiments

Our experimental evaluation aims at answering the following research questions: (1) Does our method provide better performance than its competitors in terms of the validity of the algorithmic recourse? (2) Does our approach allow us to complement interventions with action-by-action explanations in most cases? (3) Does our method minimize the interaction with the black-box classifier needed to provide interventions?

The code and the datasets of the experiments are available on Github to ensure reproducibility.Footnote 1 The software exploits parallelization through mpi4python (Dalcin & Fang, 2021) to improve inference and training time. We compared the performance of our algorithm with CSCF (Naumann & Ntoutsi, 2021), to the best of our knowledge the only existing model-agnostic approach that can generate consequence-aware interventions following a causal graph. Note, however, that earlier solutions still perform user-specific optimization, so our results in terms of generality, interpretability and cost (number of queries to the black-box classifier and computational cost) carry over to these alternatives. For the sake of a fair comparison, we built our own parallelized version of the CSCF model based on the original code. We developed the project to make it easily extendable and reusable by the research community. The experiments were performed on a Linux distribution on an Intel(R) Xeon(R) CPU E5-2660 2.20GHz with 8 cores and 100 GB of RAM (only 4 cores were used).

Table 1 Description of the datasets

4.1 Dataset and black-box classifiers

Table 1 gives a brief description of the datasets. They all represent binary (favourable/unfavourable) classification problems. The two real-world datasets, German Credit (german) and Adult Score (adult) (Dua & Graff, 2017), are taken from the relevant literature. Since in these datasets a couple of actions is usually sufficient to overturn the outcome of the black-box classifier, we also developed two synthetic datasets, syn and syn_long, where longer interventions are required, so as to evaluate the models in more challenging scenarios. The datasets contain both categorical and numerical features (e.g., monthly income, job type, etc.). Each dataset was randomly split into \(80\%\) train and \(20\%\) test. For each dataset, we manually define a causal graph, \(\mathcal {G}\), by looking at the available features. For the synthetic datasets, we sampled instances directly from the causal graph. See Fig. 10 in the Appendix for an example of these graphs. The black-box classifier for german and adult was obtained by training a 5-layer MLP with ReLU activations. The trained classifiers are reasonably accurate (\(\sim 0.9\) test-set accuracy for german, \(\sim 0.8\) for adult). The synthetic datasets (syn and syn_long) do not require any training since we directly use our manually defined decision function.

4.2 Models

We evaluate four different models: FARE, the agent coupled with MCTS (\(M_{\hbox {FARE}}\)); E-FARE, the explainable deterministic program distilled from the agent (\(M_{\hbox {E-FARE}}\)); and two versions of CSCF, one (\(M_{cscf}\)) with a large budget in terms of number of generations, n, and population size, p (\(n=50, p=200\)), and one (\(M_{cscf}^{small}\)) with a smaller budget (\(n=25, p=100\)). For \(M_{\hbox {FARE}}\), we set the MCTS exploration depth to 7 for all the experiments.

Fig. 5

Experimental results. (Left) validity (fraction of successful interventions); (Middle) Average length of a successful intervention; (Right) Average cost of a successful intervention. Results are averaged over 100 test examples

4.3 Evaluation

The left plot in Fig. 5 shows the average validity of the different models, namely the fraction of instances for which a model manages to generate a successful intervention (Wachter et al., 2017). \(M_{\hbox {FARE}}\) outperforms or is on par with the \(M_{cscf}\) and \(M_{cscf}^{small}\) models on both the real-world and synthetic datasets. The performance difference is more evident on the synthetic datasets, because the evolutionary algorithm struggles to generate interventions that require more than a couple of actions. The validity loss incurred in distilling \(M_{\hbox {FARE}}\) into a program (\(M_{\hbox {E-FARE}}\)) is rather limited. This implies that we are able to provide interventions with explanations for \(94\%\) (german), \(66\%\) (adult), \(99\%\) (syn) and \(87\%\) (syn_long) of the test users.Footnote 2 Moreover, \(M_{\hbox {E-FARE}}\) generates interventions similar to those of \(M_{\hbox {FARE}}\): the sequence similarity between their respective interventions for the same user is 0.89 (german), 0.72 (adult), 0.80 (syn) and 0.71 (syn_long), where 1.0 indicates identical interventions.

The main reason for the validity gains of our model is the ability to generate long interventions, something evolutionary-based algorithms struggle with. This effect can be clearly seen from the middle plot of Fig. 5. Both \(M_{cscf}\) and \(M_{cscf}^{small}\) rarely generate interventions with more than two actions, while our approach can easily generate interventions with up to five actions. A drawback of this ability is that intervention costs are, on average, higher (right plot of Fig. 5). On the one hand, this is due to the fact that our model is capable of finding interventions for more complex instances, while \(M_{cscf}\) and \(M_{cscf}^{small}\) fail. Indeed, if we compute lengths and costs on the subset of instances for which all models find a successful intervention, the difference between the approaches is less pronounced. See Fig. 6 for the evaluation. On the other hand, there is a clear trade-off between solving a new optimization problem from scratch for each new user, and learning a general model that, once trained, can generate interventions for new users in real-time and without accessing the black-box classifier.

We also conducted a quantitative analysis of the quality of the explanations generated by \(M_{\hbox {E-FARE}}\). We measured the average number of Boolean clauses in the rule associated with a suggested action. Our explanations need to be concise, thus involving a limited number of features, to be easily understandable; the literature indicates seven as the maximum acceptable number of concepts in an explanation (Miller, 2019, 1956). We obtain an average of 3 clauses for the syn, syn_long and german datasets, and an average of about 6.5 clauses for the adult dataset. The \(M_{\hbox {E-FARE}}\) model can thus generate compact explanations as Boolean predicates. The adult dataset requires a more complex recourse policy; therefore, the decision rules of the automaton are more complex, involving longer Boolean predicates. See Fig. 7 for examples of interventions coupled with rule-based explanations.

Fig. 6

Evaluation considering only the instances for which all the models provide a successful intervention. If we restrict the comparison to the subset of instances for which all models manage to generate a successful intervention, the difference in costs between methods shrinks substantially (top left vs bottom left). The same behaviour applies to the intervention length (top right vs bottom right)

Fig. 7

Example of Interventions with rule-based explanations. We show here two additional examples of successful interventions (syn and german datasets) combined with boolean predicates explaining why we suggested the given action. The black text indicates the action \((f,x)_t\), while the red text indicates the decision rule (Color figure online)

Table 2 Ablation study
Fig. 8

Number of queries. Total number of queries to the black-box classifier made by the models. \(M_{\hbox {E-FARE}}(predict)\) is not visible, as the automaton does not query the black-box classifier to generate interventions. Note that the number of queries is in logscale

Fig. 9

Validity of \(M_{\hbox {E-FARE}}\) when varying the training budget. We show the effect on increasing the sampling budget (from 100 to 700 traces) when training the \(M_{\hbox {E-FARE}}\) model

Figure 8 reports the average number of queries to the black-box classifier. Our approach requires far fewer queries than \(M_{cscf}\) (note that the plot is in logscale), and substantially fewer than \(M_{cscf}^{small}\) (which is in any case not competitive in terms of validity). Furthermore, most queries are made when training the agent (\(M_{\hbox {FARE}}(train)\)), which is done only once for all users. Once the model is trained, generating interventions for a single user requires around two orders of magnitude fewer queries than the competitors. Note that MCTS is crucial to allow the RL agent to learn a successful policy with a low budget of queries. Indeed, training an RL agent without the support of MCTS fails to converge within the given budget (between 50 and 100 iterations), leading to an unusable policy. By efficiently searching the space of interventions, MCTS manages to quickly correct inaccurate initial policies, allowing the agent to learn high-quality policies with a limited query budget. MCTS is also critical during inference, since it increases the validity of the results: given a trained agent, validity drops if we perform inference without the MCTS component. See Table 2 for the evaluation.

When turning to the program, building the automaton (\(M_{\hbox {E-FARE}}(train)\)) requires a negligible number of queries to extract the intervention traces used as supervision.

Using the automaton to generate interventions does not require querying the black-box classifier at all. This characteristic can substantially increase the usability of the system, as \(M_{\hbox {E-FARE}}\) can be employed directly by the user even without access to the classifier. Computationally speaking, the advantage of the two-phase approach is also dramatic: \(M_{cscf}\) takes on average \(\sim 693\,\textrm{s}\) per user to provide a solution (the same order of magnitude as training a model for all users with \(M_{\hbox {FARE}}\)), while \(M_{\hbox {FARE}}\) inference time is under 1 s, allowing real-time interaction with the user.

Additionally, Fig. 9 shows that it is possible to improve the performance of \(M_{\hbox {E-FARE}}\) by simply sampling more traces from the trained agent (\(M_{\hbox {FARE}}\)): validity increases on the adult, syn and syn_long datasets. We also notice that using a larger budget to train \(M_{\hbox {E-FARE}}\) produces longer explainable rules, while keeping the length and cost of the generated interventions almost constant. The total number of queries to the black-box classifier also slightly increases.

Overall, our experimental evaluation allows us to affirmatively answer the research questions stated above.

5 Conclusion

This work improves the state of the art on algorithmic recourse by providing a method, FARE (eFficient counterfActual REcourse), that can generate effective and interpretable counterfactual interventions in real-time. Our experimental evaluation confirms the advantages of our solution with respect to alternative consequence-aware approaches in terms of validity, interpretability and number of queries to the black-box classifier. Our work opens up several research directions that could address some of its limitations. First, following previous work on causality-aware intervention generation, we use manually crafted causal graphs and action costs. Learning them directly from the available data, minimizing human intervention, would make the approach applicable in settings where this information is unavailable or unreliable. Second, we showed how our method learns a general program by optimizing over multiple users. It would be interesting to investigate additional RL methods to optimize the interventions both globally and locally, so as to provide more personalized sequences to the users. Such methods could be coupled with interactive approaches eliciting preferences and constraints directly from the user, thus maximizing the chance of generating the most appropriate intervention for a given user.

6 Ethical Impact

The research field of algorithmic recourse aims at improving fairness, by providing unfairly treated users with tools to overturn unfavourable outcomes. By providing real-time, explainable interventions, our work makes a step further in making these tools widely accessible. As for other approaches providing counterfactual interventions, our model could in principle be adapted by malicious users to “hack” a fair system. Research on adversarial training can help in mitigating this risk.