EcoShap: Save Computations by only Calculating Shapley Values for Relevant Features

Jamshidi, Parisa; Nowaczyk, Sławomir; Rahat, Mahmoud

doi:10.1007/978-3-031-50396-2_2

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1947))

Included in the following conference series:

European Conference on Artificial Intelligence

410 Accesses

Abstract

One of the most widely adopted approaches for eXplainable Artificial Intelligence (XAI) involves employing of Shapley values (SVs) to determine the relative importance of input features. While based on a solid mathematical foundation derived from cooperative game theory, SVs have a significant drawback: high computational cost. Calculating the exact SV is an NP-hard problem, necessitating the use of approximations, particularly when dealing with more than twenty features. On the other hand, determining SVs for all features is seldom necessary in practice; users are primarily interested in the most important ones only. This paper introduces the Economic Hierarchical Shapley values (ecoShap) method for calculating SVs for the most crucial features only, with reduced computational cost. EcoShap iteratively expands disjoint groups of features in a tree-like manner, avoiding the expensive computations for the majority of less important features. Our experimental results across eight datasets demonstrate that the proposed technique efficiently identifies top features; at a 50% reduction in computational costs, it can determine between three and seven of the most important features.

You have full access to this open access chapter, Download conference paper PDF

Shapley values for cluster importance

Article Open access 06 December 2022

Unsupervised Feature Value Selection Based on Explainability

Shapley value: from cooperative game to explainable artificial intelligence

Article Open access 09 February 2024

Keywords

1 Introduction

In recent years, researchers are increasingly focusing on explaining machine learning models due to their ever-growing practical applications in industry, business, society, healthcare, and justice. Especially in safety-critical systems, it is essential to be able to interpret the output of a prediction model correctly. This builds trust among users, allows humans to understand the machine’s decision-making process, and provides insight into the model’s potential enhancements [1,2,3]. Among many diverse approaches to XAI, including feature importance, prototype explanations, rule-based systems, counterfactual analysis, and model distillation [4,5,6], explanations based on feature importance are arguably the most popular.

It is so because feature importance provides a straightforward and intuitive understanding and enables a deeper comprehension of the relationship between input features and prediction targets. This approach can increase transparency and build trust in the model by highlighting the most important features, especially for users without a technical background. It can also help data scientists identify and revise biased, irrelevant, or redundant features, improving model accuracy and reducing overfitting. Finally, only with humans understanding how and why ML made its decisions can AI provide knowledge discovery through actionable and robust insights exploiting super-human performance achievable in some tasks. Generally, feature importance scores are used to assess the influence each input feature has on a particular model predicting a specific variable. The fact that they are often easier to understand than other XAI techniques makes them a popular choice for many users [7,8,9].

Among the different approaches to feature importance, those based on cooperative game theory (CGT) have gained recognition in recent years. In contrast to other approaches, CGT concepts are axiomatically motivated. An example of this type of solution is the Shapley value (SV), built on a very strong theoretical foundation and characterized by fairness, symmetry, and efficiency [10]. However, computing the SV is NP-hard; even with significant progress related to calculating approximate Shapley values [11], the computational complexity still limits potential usage areas [12].

In this paper, we propose a method to calculate a limited number of the highest Shapley values, instead of all of them, in significantly reduced time. The solution builds on a recent idea introduced by [13], where SVs are calculated for a group of features instead of one feature at a time. We exploit the (typically assumed to hold) superadditivity property and the lower computational cost associated with calculating SVs for groups of features at once. This way, our approach allows calculating SVs for the most important features at a fraction of the cost (across eight popular ML datasets, we can always find the single most important feature in 30% of the calculations; and in half of the time, we can compute from three to seven highest ranked features).

The idea is motivated by the notion that, in most applications, only a select few of the most important features warrant in-depth analysis; the SVs for all the others are of little value. The primary goal for the user is to evaluate the impact a specific input feature has on predicting the target variable; thus, the explanations should highlight critical features and facilitate relative comparisons among them. SVs for low-importance features are rarely needed. In practice, simply not being among the top ones conveys sufficient information. Let us consider the three specific examples of uses of explanations mentioned earlier.

First, XAI increases transparency and fosters trust in the model. This is typically achieved by comparing the user’s (human expert’s) expectations against the ML model’s internal mechanisms. Trust is undermined when the model disregards features deemed important or assign considerable weight to features known to be irrelevant. Clearly, both of these scenarios can be identified based on SVs for the most important features. While a user may have intuitions concerning the relative significance of key features (e.g., that the top feature should be at least twice as important as the second one), it is difficult to envision similar phenomena for the lowest-ranked features. The second use of XAI involves model debugging and enhancement. Virtually all actions in this context can be carried out based on SVs for the top features as well. For instance, by dropping irrelevant features, a model can be simplified and made computationally cheaper – but precise SVs are not needed, only information regarding which features fall outside of the retained range. Another enhancement technique entails increasing the quality of key features, by allocating additional preprocessing and cleaning efforts where they yield the most benefit – on the highest-ranked features. Finally, knowledge creation or discovery of insights is, essentially, the opposite of trust building. It relies on finding discrepancies between the model’s operation and expert understanding. Specifically, in cases where the model outperforms the human, these discrepancies are discovery opportunities. Both unexpectedly high-importance features and those with unexpectedly low importance are valuable in this context – and both can be efficiently detected using ecoShap.

The remainder of the paper is organized as follows. In Sect. 2, we first present the background of this work in terms of Shapley values in the ML setting, the assumptions made, and the proposed method. In Sect. 3, we cover the experimental setting and dataset that was used. In Sect. 4, we present some computational results of the experiments and finish the paper with some conclusions and future works.

2 Methodology

2.1 Background

Lloyd Shapley introduced one of the most influential solution concepts in cooperative games, now referred to as the Shapley value [14]. When a group of players agrees to cooperate, the SV helps determine a fair payoff for each individual, considering that each player may have contributed to varying extents.

Although extensively studied from a theoretical standpoint, calculating the SV is an NP-hard problem [15]. Due to its strong theoretical properties, the SV has emerged as a favored explanation method for black-box models. Considerable efforts have been dedicated to approximate the SVs in cases where exact solutions are impractical. Nevertheless, even with those approximations, employing SVs for larger datasets continues to pose significant computational challenges.

Shapley Values. In a cooperative game, SVs are formally defined as a fair way to distribute payoffs to players according to their marginal contributions. The Shapley value ${\phi }^{Sh}_i$ for player i is calculated as follows:

$$\begin{aligned} {\phi }^{Sh}_i(v) =\sum _{S\subseteq N \setminus \left\{ i \right\} }^{}\frac{\left| S \right| !\ \bigl (\left| N \right| -\left| S \right| -1\bigr )!}{\left| N \right| !} \Bigl ( v\bigl (S\cup \left\{ i \right\} \bigr )-v(S) \Bigr ), \end{aligned}$$

(1)

where N is a set of all players, S is a partial coalition, and v(S) is the payoff (sometimes referred to as value or worth) created by coalition $S\subseteq N$ (so-called “characteristic function”). This formula computes the average marginal contribution of player i over all possible orderings of the players.

In recent years, SVs have been widely adopted in various machine learning settings, including explainable machine learning, feature selection, data valuation, multi-agent reinforcement learning, and ensemble model evaluation, all of which have specific cooperative game formulations, see for example [16, 17]. In this paper, we are particularly interested in using SVs to find the most important features in supervised machine learning problems, so we formulate cooperative games based on that setting. In principle, though, the core idea of the proposed approach could be extended to most, if not all, other formulations as well.

Let $\mathcal {D} = \left\{ \left( \vec{x^{i}},y^{i} \right) |i=1:M \right\} $ be a dataset, where the target variable y can be either categorical or continuous for classification or regression, respectively. Each instance $\vec{x_i} \in \mathcal {R}^n$ is described by n features $\vec{F} = [ f_1, \cdots , f_n ]$, and $\mathcal {A} : \mathcal {R}^n \rightarrow y$ is a black box model to predict the outputs.

Let $v(S) = g(y^{i}, \hat{y}^{i})$, where $g(\cdot )$ is the goodness of fit function, $y^{i}$ is the ground truth and $\hat{y}^i = \mathcal {A}^S(\vec{x^{i}})$ is the target predicted by $\mathcal {A}^S$, namely the model trained on a subset of features $S\subseteq \vec{F}$. The ${\phi }^{Sh}_i$ is SV of a single feature i, calculated according to Eq. 1. Then, ${\phi }^{Sh}_G$ is SV of a group of features G, where $G\subseteq \vec{F}$. To calculate ${\phi }^{Sh}_G$, we consider all features in G as a single unit:

$$\begin{aligned} {\phi }^{Sh}_G(v) =\sum _{S\subseteq F \setminus G }^{}\frac{\left| S \right| !\ \bigl (\left| F \right| -\left| S \right| - \left| G \right| \bigr )!}{(\left| F \right| - \left| G \right| )!} \Bigl (v\bigl (S\cup G \bigr )-v(S)\Bigr ). \end{aligned}$$

(2)

As can be seen from Eq. 2, calculating ${\phi }^{Sh}_G$ for $|G| \gg 1$ is significantly faster than ${\phi }^{Sh}_i$ from Eq. 1. Since coalitions are elements of the power set of $N\setminus \{G\}$, the power set of remaining players would therefore have much fewer elements when calculating the SV of a group than a single feature. The larger the group of players G is, the fewer elements in the power set and the fewer model evaluations are required according to Eq. 2.

2.2 EcoShap Assumptions

Intuitively, the Shapley value of a feature represents the extent of the contribution made by that feature to the machine learning model. SV for a group of features captures the combined contribution of all these features together. As a result, it is reasonable to expect that the SV for a set of features will be at least as high as the individual SVs of each of those features.

Incorporating additional features does not diminish the performance of an optimal machine learning model. An ideal model should be able to discern features that negatively affect the result and disregard them. In practice, of course, less robust machine learning algorithms fall prey to spurious features and overfit. However, with a sufficiently powerful model, one can expect that the introduction of more features will either cause the SV of the group to increase or maintain its current level.

This corresponds to an assumption of the superadditive characteristic function. Formally, if S and T are disjoint coalitions players $(S \cap T = \varnothing )$, then

$$\begin{aligned} v(G_1 \cup G_2) \ge v(G_1)+v(G_2). \end{aligned}$$

(3)

It means the value of two disjoint coalitions working together is at least as big as when they work separately. Of course, in game theory, superadditivity is not required for many coalition games; however, it seems natural for the machine learning formulation. From the above directly follows the following constraint:

$$\begin{aligned} v(G) \ge \max _{f_i\in G} v(f_i). \end{aligned}$$

(4)

Thus, if we calculate SV for a group of features G, and it is lower than some “threshold of interest”, there is no need to calculate individual SVs for any of the $f_i \in G$ since no feature in that group can be “good enough”.

2.3 EcoShap Algorithm

Our proposed approach follows the binary search tree idea. First, we intuitively describe how to find the single most important feature. We then generalize the approach to more features in ecoShap Algorithm 1.

In the first step, the set of all features F is randomly split into two disjoint subsets of equal sizes, $G_1$ and $G_2$. Given that we can identify which of these two subsets contains the most important feature $f^*$, then – by ignoring the other subset – we would be able to find $f^*$, recursively, in $log_2|F|$ steps by splitting one of the groups until we reach a leaf, i.e., a group comprising just one feature.

In the general case, of course, it is not possible to (efficiently) determine that with absolute certainty. However, it is easy to determine which branch is more likely to contain the most important feature(s) based on the SV of each group. As mentioned in Sect. 2.2, a group has a value greater than or equal to the maximum value of the features belonging to that group. Thus, we calculate SV for each of the two groups. Let us assume that the first subgroup has value ${\phi }^{Sh}_{G_1}$ and the second has value ${\phi }^{Sh}_{G_2}$, and without loss of generality ${\phi }^{Sh}_{G_1} < {\phi }^{Sh}_{G_2}$.

We can then suppose that $G_2$ contains $f^*$ and select that one to split first. Nevertheless, we store $G_1$ in a priority queue called $\mathcal {G}$, in case we need to revisit it later. Generally, we expect that $f^*$ belongs to the group with the highest SV among all the already evaluated groups. We call it $G^*$, and in each step, $G^*$ will split into two disjoint groups.

Continuing our example, if at some point we reach a state where ${\phi }^{Sh}_{G_1}$ is the largest SV, we will “backtrack” and split it as well. Since there is no upper bound on the SV of a group, it is conceivable that a number of individually weak features combine into a powerful impact on the model. While somewhat harmful to the computational performance of our algorithm, this backtracking procedure guarantees such phenomena do not affect the correctness – and they happen relatively rarely in practice, according to the experimental evaluation.

We repeat such splitting of $G^*$ until we find the single feature $f_i$. As long as $f_i$ has the highest value among all the unexpanded nodes in the current tree (please note that we do not need to evaluate this over all the leaves of the fully-expanded tree), we can be sure that $f_i$ has a higher SV than the features belonging to the other leaves. Therefore, it is the most important feature, i.e., $f_i = f^*$. For clarity, this example is visualized in Fig. 1.

$$\begin{aligned} f_i\equiv f^{*}, \quad if \quad \forall _{G \in \mathcal {G}} \quad {\phi }^{Sh}_{f_i} >{\phi }^{Sh}_G .\end{aligned}$$

(5)

After finding the single feature $f^*$ with the highest Shapley value ${\phi }^{Sh}_*$, we can identify the next-in-line $G^*$ on the remainder of the tree. This is the group that should be expanded next. As before, while there is no guarantee that it contains the second-best feature, we also cannot exclude that possibility. And whenever we find a single feature with the second-highest value among all leaves, we can be sure that it is the second-important feature.

Overall, whenever we find a single feature with a higher value than all the remaining groups within the $\mathcal {G}$ at any point in the search, we can be sure that this feature is more important than all the “yet unexplored” features, allowing us to efficiently calculate an arbitrary number of SVs.

3 Experiments

3.1 Datasets

We used eight well-known machine learning datasets of moderate size for the experiments. Four variants are very similar to each other, corresponding to data of Wave Energy Converters from four different cities. This allows us to compare the behavior of ecoShap on datasets with consistent characteristics. Note that we remove all categorical features from all datasets to avoid the arbitrary choice of encoding. An overview of the datasets used can be found in Table 1.

Table 1. Summary of datasets

Full size table

We intentionally selected relatively small datasets for the experimental evaluation to more clearly illustrate the behavior near the budget threshold. In practice, ecoShap is especially well-suited for datasets with hundreds or thousands of features, where “classical” Shapley approaches are infeasible. More formally, the complexity of ecoShap grows logarithmically with the number of features, whereas existing methods require a linear number of SV computations.

3.2 Baseline Shapley Value

All experiments were conducted using our custom Python^{Footnote 1} implementation of the Monte Carlo permutation method (MCshap) to approximate SVs. The implementation permits the approximation of the SV for either a single feature or a group of features, utilizing the same core algorithm. Given that the concept of calculating SVs for groups of features is relatively novel, we are not aware of any existing implementation that offers such versatility.

The MCshap method involves randomly permuting the feature values based on a subset of the training set, known as the “background,” and computing the difference in model predictions with and without the feature under evaluation. As ML models are trained with all features, directly excluding a feature during prediction is impossible. The permutation simulates such exclusion by replacing actual values with random values from the background samples. It effectively disrupts any existing meaningful patterns while preserving the structure of the data. The global SV for any feature is determined by repeating the above procedure for multiple test instances and calculating the mean absolute value. Consequently, the SV depends not only on the feature itself, but also on the test and background instances used in the computations.

3.3 Experimental Setup

To assess the global feature importance, ecoShap requires a trained model, a test set, and background data. The datasets have been randomly divided into training (75%) and testing (25%) sets, except for the MSD dataset, in which the train and test split is predefined. Two models, namely, Extreme Gradient Boosting (XGBoost^{Footnote 2}) [21] and Random Forest Regression^{Footnote 3}) [22], have been trained on each dataset. To estimate the global SV, we randomly select 100 data points as the test set and 100 data points as the background from the test set and training set, respectively.

We use ecoShap and MCshap to find different numbers of important features, repeating every experiment 50 times on each dataset. It is worth mentioning that both models performed similarly, though, for brevity, only the result of XGB is reported.

As mentioned in Sect. 2.1, calculating the SV is an NP-hard problem. Computing the exact SV for real-world datasets, particularly those with more than 20–25 features, is infeasible [15], and the lack of ground truth presents a significant challenge.

Therefore, we compare against the MCshap baseline to demonstrate that the ecoShap method does not introduce significant additional errors, beyond those inherent in the Monte Carlo approximation. We used MCshap 50 times to approximate the SVs of each dataset and considered their average as an SV of each feature. We refer to these values as close-to-ground-truth and use them to evaluate the ecoShap.

4 Results

In this section, we showcase the results of experiments that emphasize the advantages of the ecoShap method. We start by illustrating the computational efficiency of the ecoShap and explore how different dataset characteristics influence its performance. Finally, we verify the precision of the ecoShap results by comparing them to the baseline MCshap method.

4.1 Measuring Computational Efficiency of EcoShap

The initial experiment aims to demonstrate the computational effectiveness of the proposed ecoShap method in identifying the most important features. Specifically, we investigate the relationship between the computational cost and the number of highest-ranked features found. The findings for two representative datasets (MSD and SC) are presented in Fig. 2, while comprehensive results for all datasets can be found in Figs. 5 and 6.

The most straightforward way to compare the two methods is the number of times they need to call the SV function. Clearly, MCshap needs to calculate SV for all features and then sort them; thus, even to find the single most important feature, MCshap must call the SV function as many times as there are features. In contrast, ecoShap uses significantly fewer calls to identify the first feature, but computing SV for additional features incurs an extra cost. The leftmost panels of Fig. 2 illustrate (top one for MSD and bottom for SC datasets) the number of times each method calls the SV function (y-axis) to find the required number of important features (x-axis). The intersection of the blue line (corresponding to ecoShap) with the horizontal red line (MCshap) represents the break-even point for the proposed method, signifying that fewer function calls are required to calculate SVs for that many top features.

It is worth noting, though, that the ecoShap method often considers a group of features as a single entity for calculating SV; and somewhat counterintuitively, the computational cost for a group is smaller than for a single feature(see Sect. 2.2). Consequently, considering both a group and a single feature as a unit of computation underestimates the computational benefits of the ecoShap method.

To achieve a fairer comparison, therefore, we examine the time required by each method. A direct comparison of the time consumption reveals that the advantage of the ecoShap approach over MCshap is even more pronounced. As indicated in the middle panels of Fig. 2 (labeled b1 and b2), ecoShap discovers even more features up to the intersection point, confirming that the number of SV function calls is biased. However, measuring time directly introduces experimental uncertainties, particularly in an environment with shared resources, and creates undesired correlations with specific hardware, making future comparisons more challenging.

Consequently, as a final and most equitable comparison, we propose using the total number of sample evaluations by the ML model across the whole SV calculation process. Since estimating the SV for a group of features requires fewer permutations and fewer sample evaluations than for a single feature, this metric provides a more comprehensive and fair assessment. These findings are shown in the far-right panels of Fig. 2 (labeled c1 and c2). When comparing subfigures b1 against c1 and b2 against c2, the fidelity of the number of sample evaluations is superior to the SV function calls and is more reliable than direct time measurements.

4.2 EcoShap Performance on Budget

The first experiment focused on comparing the time efficiency of ecoShap and MCshap using different evaluation metrics. In this section, we explore how many top-ranked features ecoShap can calculate SVs for, while still conserving computations in comparison to MCshap. To this end, we consider the MCshap computational costs to be the “full budget.” Table 2 shows the number of features that ecoShap can identify by allocating different percentages of the budget (from 10% to 100%) for each of the eight datasets.

Table 2. The average number of features found based on the budget percentage

Full size table

Likewise, we are also interested in determining the amount of budget necessary to identify various numbers of important features. Table 3 illustrates the percentage of computations (MCshap full budget) required to find between one and ten most important features for each of the eight datasets.

These experiments demonstrate that ecoShap can always identify the top three most important features by using up to half of the budget, often much less. In some instances, it can discover as many as seven of the most significant features at half of the budget. These experiments demonstrate the computational advantages of ecoShap in identifying essential features with a limited budget.

Table 3. The mean and (std) of the budget rate were used to find the first to the tenth important features in eight datasets.

Full size table

4.3 Dataset Characteristics

Interestingly, there is a noticeable variation in the results across different datasets. This is evident even in the simplest scenario, where the objective is to find the first most important feature. In some datasets, ecoShap can discover the top feature by spending less than 10% of the budget, while in others, it requires more than 35% of the budget.

This relation between each dataset’s features SV pattern and the budget used by ecoShap can be explained based on the algorithm’s design. EchoShap can find the most important feature faster when the SV of the first feature is significantly higher than those of the remaining features. For instance, in the case of the SC dataset, ecoShap identified the first important feature with as little as 5% of the budget – because it has a notably higher SV than other features. The SV of the second feature is roughly half that of the first one. In contrast, the SV of the first sixteen significant features of the WEC_T dataset are all very similar to each other, and ecoShap expended more than 35% of the budget to identify the first one.

Intuitively, the more similar the SVs are, the more budget ecoShap will need to recognize the order of features. This phenomenon is a result of the ecoShap algorithm, which partitions feature groups based on their SV values. When the SV of the first important feature, $f^*$, is significantly higher than that of other features, it strongly impacts the SV of its group. Thus, any group it is placed into becomes $G^*$ most of the time. As a result, the algorithm does not need to evaluate any other groups and can always split the group containing $f^*$. On the other hand, if multiple features share similar high SVs, ecoShap needs to analyze several unwanted groups, and finding $f^*$ will require more SV calculations.

When considering more than just the top feature, the differences in slope in Fig. 3 can be similarly explained. For example, in the MSD dataset, there is a substantial increase in computational cost between the 5th and 6th features, which is much larger than that between the 4th and 5th features. A comparison with the corresponding SV plot (Fig. 4) reveals a more significant difference between the SVs of the 6th and 7th features than that between the 6th and 5th features.

4.4 Backtracking Cost

As discussed in Sect. 2.3, the proposed algorithm can backtrack when the wrong branch is chosen. In this experiment, we investigate ecoShap’s backtrack ability. For simplicity, we limit ourselves to the single most important feature. We first compute the expected number of SV function calls required to identify the most significant feature, assuming the correct branch is always selected (i.e., an oracle is used). In Table 4, we then compare this to the mean number of calls made by ecoShap, for each dataset.

For example, the algorithm finds the first important feature of the HP dataset by calling the SV function close to 12 times while the expected value is 10.44). It shows that the algorithm almost always selects the branch containing the most important feature for the split. Meanwhile, the true number of calls for the WEC_T dataset exceeds the expected value by a factor of three. This indicates that the algorithm explored many incorrect branches before finding the first feature. These results are consistent with earlier findings and supported by Fig. 4.

Table 4. Expected versus actual number of SV function calls.

Full size table

4.5 The Accuracy of the EcoShap

As the final step, we consider the accuracy of the proposed method. In principle, given our assumptions, ecoShap should not introduce any error in the calculations. In practice, though, due to the stochastic nature of all the algorithms involved, there is a possibility of errors accumulating in unfavorable ways. This section aims to demonstrate that these effects are negligible.

Given that there is no definitive ground truth for the SVs, we compare ecoShap results with their close-to-ground-truth (CtGT) MCshap counterparts to demonstrate that our proposed method does not significantly deviate from its baseline in approximating SVs. We use the “features on the whole budget” (FoB) metric, which refers to the most important features that ecoShap can identify using the MCshap computational budget. As a measure of accuracy, we use the sum of absolute errors (SAE) of the FoB features, defined as:

$$\begin{aligned} SAE = \sum _{f\in FoB}^{}\left| ecoShap(f)- CtGT(f)\right| \end{aligned}$$

(6)

Table 5. The mean and standard deviation of the SAE for each dataset.

Full size table

Table 5 presents the mean and standard deviation of the SAE for each dataset, indicating that there is essentially no approximation error caused by ecoShap. For example, in the HP dataset, the sum of errors for 14 features is 0.0011, which is quite negligible.

5 Conclusion

In this paper, we introduced the Economic Hierarchical Shapley value (ecoShap) method, which efficiently identifies the most important features and calculates their Shapley values, with computational savings of up to 95%. By utilizing group-wise efficient computation of Shapley values in the early stages of the search process, ecoShap serves as a filter, bypassing the unnecessary calculation of Shapley values for less important individual features.

Our method can be used based on the desired number of important features or the computational budget. Experimental results indicate that ecoShap performs much better in datasets whose features are well separated and feature importance levels differ. Additionally, ecoShap has consistently identified between three and seven most important features across all evaluated datasets while using less than half of the available budget. To verify the accuracy of ecoShap, it was compared with the close-to-ground-truth results obtained from a baseline, demonstrating that there is no significant approximation error introduced.

It is worth noting that in the current version of ecoShap, features are randomly split into groups. One idea for future work is to develop more effective grouping approaches, for example, based on correlations, that will allow ecoShap to avoid unnecessary divisions and perform an even smarter search.

Notes

References

Doshi-Velez, F., Kim, B.: Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608 (2017)
Lipton, Z.C.: The mythos of model interpretability: in machine learning, the concept of interpretability is both important and slippery. Queue 16(3), 31–57 (2018)
Article Google Scholar
Ribeiro, M.T., Singh, S., Guestrin, C.: “ why should I trust you?” explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (2016)
Google Scholar
Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10), 78–87 (2012)
Article Google Scholar
Kusner, M.J., Loftus, J., Russell, C., Silva, R.: Counterfactual fairness. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Google Scholar
Van Looveren, A., Klaise, J.: Interpretable counterfactual explanations guided by prototypes. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds.) ECML PKDD 2021. LNCS (LNAI), vol. 12976, pp. 650–665. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86520-7_40
Chapter Google Scholar
Ribeiro, M.T., Singh, S., Guestrin, C.: Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386 (2016)
Sundararajan, M., Najmi, A.: The many shapley values for model explanation. In: International Conference on Machine Learning, pp. 9269–9278. PMLR (2020)
Google Scholar
Karlaš, B., et al.: Data debugging with shapley importance over end-to-end machine learning pipelines. arXiv preprint arXiv:2204.11131 (2022)
Shapley, L.S., et al.: A Value for N-person Games: Princeton University Press, Princeton (1953)
Google Scholar
Lundberg, S.M., Lee, S.-I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Google Scholar
Sundararajan, M., Dhamdhere, K., Agarwal, A.: The shapley taylor interaction index. In: International Conference on Machine Learning, pp. 9259–9268. PMLR (2020)
Google Scholar
Harris, C., Pymar, R., Rowat, C.: Joint shapley values: a measure of joint feature importance. arXiv preprint arXiv:2107.11357 (2021)
Shapley, L.S.: “Notes on the n-person game – ii: the value of an n-person game,” RAND Corporation, Santa Monica, Calif., Technical Report (1951)
Google Scholar
Ancona, M., Oztireli, C., Gross, M.: Explaining deep neural networks with a polynomial time algorithm for shapley value approximation. In: International Conference on Machine Learning, pp. 272–281. PMLR (2019)
Google Scholar
Rozemberczki, B., et al.: The shapley value in machine learning. arXiv preprint arXiv:2202.05594 (2022)
Wang, G.: Interpret federated learning with shapley values. arXiv preprint arXiv:1905.04519 (2019)
Dua, D., Graff, C.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
Fernandes, K., Vinagre, P., Cortez, P.: A proactive intelligent decision support system for predicting the popularity of online news. In: Pereira, F., Machado, P., Costa, E., Cardoso, A. (eds.) EPIA 2015. LNCS (LNAI), vol. 9273, pp. 535–546. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23485-4_53
Chapter Google Scholar
Hamidieh, K.: A data-driven statistical model for predicting the critical temperature of a superconductor. Comput. Mater. Sci. 154, 346–354 (2018)
Article Google Scholar
Chen, T., Guestrin, C.: Xgboost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794 (2016)
Google Scholar
Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63, 3–42 (2006)
Article Google Scholar

Download references

Acknowledgments

This work was partially supported by the CHIST-ERA grant CHIST-ERA-19-XAI-012, funded by the Swedish Research Council.

Author information

Authors and Affiliations

Center for Applied Intelligent Systems Research (CAISR), Halmstad University, Halmstad, Sweden
Parisa Jamshidi, Sławomir Nowaczyk & Mahmoud Rahat

Authors

Parisa Jamshidi
View author publications
You can also search for this author in PubMed Google Scholar
Sławomir Nowaczyk
View author publications
You can also search for this author in PubMed Google Scholar
Mahmoud Rahat
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sławomir Nowaczyk .

Editor information

Editors and Affiliations

Halmstad University, Halmstad, Sweden
Sławomir Nowaczyk
Warsaw University of Technology, Warsaw, Poland
Przemysław Biecek
Warsaw University, Warsaw, Poland
Neo Christopher Chung
University of Huddersfield, Huddersfield, UK
Mauro Vallati
AGH University of Science and Technology, Kraków, Poland
Paweł Skruch
AGH University of Science and Technology, Kraków, Poland
Joanna Jaworek-Korjakowska
University of Huddersfield, Huddersfield, UK
Simon Parkinson
University of Huddersfield, Huddersfield, UK
Alexandros Nikitas
Universität Osnabrück, Osnabrück, Germany
Martin Atzmüller
University of Economics Prague, Prague, Czech Republic
Tomáš Kliegr
University of Bamberg, Bamberg, Germany
Ute Schmid
Jagiellonian University, Kraków, Poland
Szymon Bobek
Jožef Stefan Institute, Ljubljana, Slovenia
Nada Lavrac
HU University of Applied Sciences Utrecht, Utrecht, The Netherlands
Marieke Peeters
Rotterdam University of Applied Sciences, Rotterdam, The Netherlands
Roland van Dierendonck
Amsterdam University of Applied Sciences, Amsterdam, The Netherlands
Saskia Robben
University of Reims Champagne-Ardenne, Reims, France
Eunika Mercier-Laurent
Istanbul Technical University, Istanbul, Türkiye
Gülgün Kayakutlu
Wroclaw University of Economics and Business, Wrocław, Poland
Mieczyslaw Lech Owoc
University of Galway, Galway, Ireland
Karl Mason
University of Galway, Galway, Ireland
Abdul Wahid
University of Calabria, Rende, Italy
Pierangela Bruno
University of Calabria, Rende, Italy
Francesco Calimeri
Marche Polytechnic University, Ancona, Italy
Francesco Cauteruccio
University of Calabria, Rende, Italy
Giorgio Terracina
University of Bamberg, Bamberg, Germany
Diedrich Wolter
Coburg University of Applied Sciences, Coburg, Germany
Jochen L. Leidner
FAU Erlangen-Nürnberg, Erlangen, Germany
Michael Kohlhase
University of Leeds, Leeds, UK
Vania Dimitrova

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Reprints and permissions

Copyright information

About this paper

Cite this paper

Jamshidi, P., Nowaczyk, S., Rahat, M. (2024). EcoShap: Save Computations by only Calculating Shapley Values for Relevant Features. In: Nowaczyk, S., et al. Artificial Intelligence. ECAI 2023 International Workshops. ECAI 2023. Communications in Computer and Information Science, vol 1947. Springer, Cham. https://doi.org/10.1007/978-3-031-50396-2_2

Download citation

DOI: https://doi.org/10.1007/978-3-031-50396-2_2
Published: 21 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50395-5
Online ISBN: 978-3-031-50396-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

EcoShap: Save Computations by only Calculating Shapley Values for Relevant Features

Abstract