The role of explainability in creating trustworthy artificial intelligence for health care: a comprehensive survey of the terminology, design choices, and evaluation strategies

Artificial intelligence (AI) has huge potential to improve the health and well-being of people, but adoption in clinical practice is still limited. Lack of transparency is identified as one of the main barriers to implementation, as clinicians should be confident the AI system can be trusted. Explainable AI has the potential to overcome this issue and can be a step towards trustworthy AI. In this paper we review the recent literature to provide guidance to researchers and practitioners on the design of explainable AI systems for the health-care domain and to contribute to the formalization of the field of explainable AI. We argue that the reason for demanding explainability determines what should be explained, which in turn determines the relative importance of the properties of explainability (i.e. interpretability and fidelity). Based on this, we give concrete recommendations to choose between classes of explainable AI methods (explainable modelling versus post-hoc explanation; model-based, attribution-based, or example-based explanations; global and local explanations). Furthermore, we find that quantitative evaluation metrics, which are important for objective standardized evaluation, are still lacking for some properties (e.g. clarity) and types of explanators (e.g. example-based methods). We conclude that explainable modelling can contribute to trustworthy AI, but recognize that complementary measures might be needed to create trustworthy AI (e.g. reporting data quality, performing extensive (external) validation, and regulation).


Introduction
Artificial intelligence (AI) offers great opportunities for progress and innovation due to its ability to solve cognitive problems normally requiring human intelligence. Practical successes of AI in a variety of domains have already influenced people's lives (e.g. voice recognition, recommendation systems, and self-driving cars). In the future, AI is likely to play an even more prominent role. The International Data Corporation estimates that spending on AI will increase from $37.5 billion in 2019 to $97.9 billion in 2023 [1]. Due to the increasing availability of electronic health records (EHRs) and other patient-related data, AI also has huge potential to improve the health and well-being of people, for example by augmenting the work of clinicians in the diagnostic process, signaling opportunities for prevention, and providing personalized treatment recommendations. Although some simple assistive tools have been deployed in practice [2,3], there is no widespread use of AI in health care yet [4,5].
Lack of transparency is identified as one of the key barriers to implementation [5,6]. As it is the responsibility of clinicians to give the best care to each patient, they should be confident that AI systems (i.e. AI models and all other parts of the implementation) can be trusted. Health care is a domain with unique ethical, legal, and regulatory challenges, as decisions can have immediate impact on the well-being or life of people [7]. Often-mentioned concerns include potential algorithmic bias and lack of model robustness or generalizability. Other problems include the inability to explain the decision-making process of the AI system to physicians and patients, the difficulty of assigning accountability for mistakes, and vulnerability to malicious attacks. It is still unclear how to implement and regulate trustworthy AI systems in practice. We follow the definition of the High-Level Expert Group on AI (an independent group of experts set up by the European Commission) that trustworthy AI should satisfy three necessary conditions: AI systems should comply with all applicable laws and regulations (lawfulness), adhere to ethical principles and values (ethicality), and be safe, secure and reliable (robustness) [8].
It is difficult to ensure that these conditions hold as there are no proven methods to translate these conditions into practice [9].
A possible step towards trustworthy AI is to develop explainable AI. The field of explainable AI aims to create insight into how and why AI models produce predictions, while maintaining high predictive performance levels. In a recent report, the European Institute of Innovation and Technology Health [10] identified 'explainable, causal and ethical AI' as a potential key driver of adoption. Other guidelines mention explainability as a requirement of trustworthy AI [8,11]. However, it is undefined what a suitable explanation is and how its quality should be evaluated.
Previous research on explainable AI includes work on formal definitions [12,13], development of explainable AI techniques (for an extensive overview see Guidotti et al. [14]), and, to a lesser extent, evaluation methods (for a recent survey see Mohseni, Zarei and Ragan [15]). For high-level introductions to the field we refer to [16-18].
Murdoch, Singh, Kumbier, Abbasi-Asl and Yu [19] present a common vocabulary to help select and evaluate explainable AI methods. For a domain-specific introduction, we point to Ahmad et al. [7] who reviewed the notion of explainability and its challenges in the context of health care. For a recent review focusing on applications of explainable AI models in health care we refer to Payrovnaziri et al. [20]. Open problems include: a remaining lack of agreement on what explainability means [3,14,16], no clear guidance how to choose amongst explainable AI methods [19], and the absence of standardized evaluation methods [14,16,19].
In this paper we investigate how explainable AI can contribute to the bigger goal of creating trustworthy AI. We reviewed the literature of the last five years on research and developments in the field of explainable AI, focusing on papers that present conceptual frameworks or methodology, not on applications of existing techniques. As we are interested in recent advancements in the field, we also included preprints posted on arXiv. We wanted to answer the following questions:
§ What does explainability mean? (Section 2)
§ Why and when is explainability needed? (Section 3)
§ Which explainable AI methods are available? (Section 4)
§ How can explainability be evaluated? (Section 5)
§ How to choose amongst different explainable AI methods? (Section 6)
By answering these questions we aim to provide guidance to researchers and practitioners on the design of explainable AI systems for the health-care domain and to contribute to the formalization of the field of explainable AI.

What does explainability mean?
In this section we discuss the meaning of the terms explainability, interpretability, comprehensibility, intelligibility, transparency, and understandability in more detail. Lipton [12] points out that these terms are often ill-defined in the existing literature. Researchers do not specify what the terms mean, use the same term for different meanings, or refer to the same notion with different terms. Terms are also used differently in public versus scientific settings [16] and across AI communities [21]. There is a widely recognized need for a more formal definition of the properties that explanations should satisfy [3,14,16].

Explainability
Some use explainability and interpretability synonymously [22,23]. However, looking at the literature, we suggest following Gilpin et al. [24], who state that interpretability and fidelity are both necessary to reach explainability. The fidelity of an explanation expresses how accurately an explanation describes model behavior, i.e. how faithful an explanation is to the task model. Hence, they argue an explanation should be understandable to humans and correctly describe model behavior in the entire feature space. The importance of a faithful explanation is also recognized by others (e.g. [25]). It is a challenge in explainable AI to achieve both interpretability and fidelity simultaneously. In line with Arrieta et al. [17], we consider interpretability a property of an explanation and explainability a broader concept referring to all actions to explain. The explanation can be the task model or a post-hoc explanation. The task model is the model generating predictions. A post-hoc explanation accompanies an AI model and provides insights without knowing the mechanisms by which the model works (e.g. by showing feature importance). This leads to the following definition:

Definition 1: explainability
An AI system is explainable if the task model is intrinsically interpretable (here the AI system is the task model) or if the non-interpretable task model is complemented with an interpretable and faithful explanation (here the AI system also contains a post-hoc explanation).
We discuss different explainable AI methods in Section 4.

Interpretability and fidelity
For interpretability, various definitions exist in the literature. We distinguish three types of definitions:
1. Definitions based on formal aspects of system operations. For example, in the computer science field, a system is considered interpretable if the relation between input and output can be formally proven to be correct [26]. A common objection to this type of definition is that it does not focus enough on the user value of explanations [7].
2. Definitions focused on the explanatory value to the user. One commonly used definition is provided by Doshi-Velez and Kim [13], who define interpretability as the ability to explain or to present AI systems in understandable terms to a human. Although these definitions give the user a central role, a problem with this type of definition is that it remains unclear what it means to be understandable.
3. Definitions viewing interpretability as a latent property. Poursabzi-Sangdeh, Goldstein, Hofman, Vaughan and Wallach [27] argue that interpretability cannot be directly observed and measured. Instead, they define interpretability as the collection of underlying manipulable factors influencing model complexity (e.g. number of features, model representation) that influence different outcomes of interest.
To achieve a set of practical definitions, we split both interpretability and fidelity into generic underlying factors that together determine the quality of an explanation (definition type 3). This leads to the following definitions:

Definition 2: interpretability
An explanation is interpretable if: a. the explanation is unambiguous, i.e. it provides a single rationale (clarity), and b. the explanation is not overly complex, i.e. it is presented in a compact form (parsimony).
Interpretability describes the extent to which a human can understand an explanation.

Definition 3: fidelity
An explanation is faithful if [29]: a. the explanation describes the entire dynamic of the task model, i.e. it provides sufficient information to compute the output for a given input (completeness), and b. the explanation is correct, i.e. it is truthful to the task model (soundness).
Fidelity describes the descriptive accuracy of an explanation.
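For concreteness, soundness lends itself to a simple quantification; the following formalization is our own illustration, as the cited works do not prescribe a specific formula. Let f denote the task model, g the explanation viewed as a function over inputs, and X a set of evaluation instances:

\[ \mathrm{soundness}(f, g) \;=\; \frac{1}{|X|} \sum_{x \in X} \mathbb{1}\big[\, g(x) = f(x) \,\big] \]

i.e. the agreement rate between explanation and task model (cf. the fidelity metric discussed in Section 5). Completeness then requires that g is defined on the entire feature space, not only on the sampled set X.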
In Section 5 we investigate how to evaluate each of these properties quantitatively for different types of explainable AI methods. It is important to note that the usefulness of an explanation is influenced by the expertise (the level of AI or domain knowledge), preferences, and other contextual values of the target user [16]. The terms defined above thus depend on the user, which could be a developer, deployer (owner), or end-user of an AI application. Moreover, the usefulness of an explanation depends on the reason to demand explainability (more on this in Sections 3 and 6) [24,26].

Related terms
Other common terms in the literature are comprehensibility, intelligibility, transparency, and understandability. As with explainability and interpretability, some equate these terms to others, whereas others attach different meanings to them. Intelligibility is sometimes defined as the degree to which a human can predict how a change in the AI system will affect the output [30]. Others equate intelligibility, understandability, and interpretability [22,31]. We view intelligibility as one possible way to measure interpretability, but do not use it as a separate notion, as it does not describe a different goal. In this paper, we do not distinguish comprehensibility, intelligibility, and understandability from interpretability.

Why and when is explainability needed?
Some argue explainable AI is necessary for the field of AI to develop further [16]. However, the importance of explainability depends on the application domain and specific use case. In this section we explore why and when the need for explainability arises. Many different arguments are given in the literature: some mention potential problems of AI models that could be detected by explanations (e.g. use of a wrong or incomplete objective, distributional shift), others refer to model desiderata (e.g. reliability, legality) or end goals (e.g. enhancing user acceptance, building trust).
Adadi and Berrada [16] summarize the literature by formulating four motivations for explainability: 1) to justify decisions and comply with the 'right to explanation', 2) to enable user control by identifying and correcting mistakes, 3) to help improve models by knowing why a certain output was produced, and 4) to gain new insights by investigating learned prediction strategies. However, this taxonomy omits one commonly cited motivation for explainability: as a means to verify whether other model desiderata are satisfied [13]. Examples of such model desiderata are fairness (i.e. protected groups are not discriminated against), generalizability (i.e. the model is transferable to different data), privacy (i.e. sensitive information in the data is protected), robustness (i.e. performance is insensitive to small variations in the input data), and security (i.e. the model is not susceptible to adversarial attacks). Although model desiderata are often used to motivate explainability, we stress that explainability is not necessary to fulfill these model desiderata.
We classify explainable AI systems based on the need they strive to fulfill, as this influences the design choices for explainable AI systems. We distinguish three reasons why explainability can be required:
1. To assist in verifying other model desiderata. Explanations can provide insight into the learned prediction strategies and thereby help verify whether desiderata such as fairness, generalizability, privacy, robustness, and security are satisfied.
2. To manage social interaction. The need for explainability can also be motivated by the social dimension of explanations [32]. One reason why people generally ask for explanations is to create a shared meaning of the decision-making process. This is important to help build trust. Furthermore, it is important to comply with the 'right to explanation' in the General Data Protection Regulation (GDPR) of the European Union [33] and to allow the possibility of a human in the loop. Even when there is no legal obligation, decision makers are often expected to provide an explanation (e.g. in a clinical care setting).
3. To discover new insights. One can also demand explainability to learn from the model for knowledge discovery. Explainability enables comparison of learned strategies with existing knowledge and facilitates learning for educational and research purposes. These insights can be used to guide future research (e.g. new drug development).
However, explanations can be costly (time-consuming to design and to use) and might only be needed in certain AI applications. First, when the cost of misclassification is high, for example in safety-critical applications where the life and health of humans are involved or where there is a potential for large monetary losses. Second, when the AI system has not yet proven to work well in practice and user trust, satisfaction, and acceptance still need to be built. When the model has no significant impact or has sufficiently proven its performance, the cost of explanation may outweigh the benefit [34].

Which explainable AI methods are available?
There are many different explainable AI methods described in the literature. One way to achieve explainable AI is by explainable modelling, i.e. developing an AI model whose internal functioning is directly accessible to the user, so that the model is intrinsically interpretable. The alternative is post-hoc explanation, i.e. complementing the task model with an explanation that is generated after the model has been built. Different taxonomies have been proposed based on the explanation-generating mechanism, the type of explanator, the scope of the explanation, the type of model it can explain, or a combination of these features [14,35]. We classify explainable AI techniques according to the type of explanator and the scope of the explanation, as these properties have a strong influence on the interpretability and fidelity of an explanation and on how these can be evaluated. We distinguish three types of explanators: model-based explanations, attribution-based explanations, and example-based explanations. Each type of explanator can be used to provide a global or a local explanation. In explainable modelling there is no difference in scope, as the task model gives both explanations. We focus on post-hoc explanation methods that are model-agnostic.
The explainable AI methods discussed in this section are summarized in Table 1. We now discuss each type of explanator in more detail.

Model-based explanations
This class includes the methods that use a model to explain the task model. Model-based explanations occur in explainable modelling as well as in post-hoc explanation: either the task model itself is used as the explanation (explainable modelling) or another, more interpretable model is created to explain the task model (post-hoc explanation). Note that the task model can always provide an additional explanation to the user, even when post-hoc explanations are used, but this explanation might not be sufficient if the task model is non-interpretable.
In explainable modelling, the aim is to develop a task model that is by itself interpretable for the user. To achieve this, one can opt for a model class that is known to generate models interpretable for humans. Three model classes that are typically considered interpretable are sparse linear classifiers (e.g. linear/logistic regression, generalized additive models), discretization methods (e.g. rule-based learners, decision trees), and example-based models (e.g. k-nearest neighbors) [53]. However, interpretability is also influenced by other factors, such as the number and comprehensibility of the input features. Hence, even though a decision tree is typically considered easier to interpret than a neural network, a deep decision tree may be less interpretable than a compact neural network. Other ways to obtain an interpretable model include applying architectural modifications, developing hybrid models, or training the task model to provide explanations. Architectural modifications include regularization to enforce shape constraints [54], to stimulate higher layers in a neural network to represent an object [55], or to develop neural networks with the structure of an additive index model [56].
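As a minimal illustration of explainable modelling, sparsity can be enforced during training so that the resulting task model uses only a handful of input features. The sketch below uses scikit-learn with an illustrative dataset and an arbitrary regularization strength; it is an example of the model classes discussed above, not a recommended configuration.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0)

# L1 regularization drives most coefficients to zero, so the fitted model
# uses few features: one manipulable factor influencing interpretability.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
model.fit(X_train, y_train)

used = np.flatnonzero(model.coef_[0])
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
print(f"features used: {len(used)} of {data.data.shape[1]}")
for i in used:
    print(f"  {data.feature_names[i]}: coefficient = {model.coef_[0][i]:+.3f}")

The printed coefficients constitute the explanation: the sparse linear task model is itself the explanation, so no post-hoc step is needed.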

Attribution-based explanations
Attribution methods rank or measure the explanatory power of input features and use this to explain the task model. The majority of the available post-hoc explanation methods fall into this class. These methods are sometimes also called feature/variable importance, relevance, or influence methods. Attribution methods can be classified according to the explanation-generating mechanism into perturbation and backpropagation methods [60]. Methods based on backpropagation are not model-agnostic, as they are often designed for a specific model class or require the model function to be differentiable. We therefore only discuss example methods based on perturbation here.
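Because perturbation methods only require query access to the model, a minimal model-agnostic version fits in a few lines. The occlusion-style sketch below is illustrative only and does not reproduce any specific published method; the function name and the choice of a caller-supplied background value (e.g. the feature mean) are our own.

import numpy as np

def perturbation_attribution(predict_proba, x, background):
    # Attribution for one instance x: the importance of feature j is the
    # drop in the positive-class probability when x[j] is replaced by a
    # background value that removes its information.
    base = predict_proba(x.reshape(1, -1))[0, 1]
    scores = np.zeros(x.shape[0])
    for j in range(x.shape[0]):
        x_pert = x.copy()
        x_pert[j] = background[j]
        scores[j] = base - predict_proba(x_pert.reshape(1, -1))[0, 1]
    return scores

# Usage with the sparse model from the previous sketch:
# scores = perturbation_attribution(model.predict_proba, X_test[0],
#                                   X_train.mean(axis=0))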

How can explainability be evaluated?
Many papers claim to have reached their goal without formal evaluation ('I know it when I see it') [13]. Several researchers stress the need for formal evaluation metrics and a more systematic evaluation of explanation methods [14,16]. Before discussing the relevant literature, we point out that the goal of evaluation methods is twofold. First, evaluation allows a formal comparison of available explanation methods. Many methods have been proposed, often with a similar goal, but it is unclear which one is to be preferred. When evaluating post-hoc explanations, the problem is that there is no ground truth, as we do not know the real inner workings of the model [32]. Second, evaluation offers a formal method to assess whether explainability is achieved in an application. Here the focus lies on determining whether the offered form of explainability achieves the defined objective [12].
Doshi-Velez and Kim [13] divide evaluation approaches into application-grounded, human-grounded, and functionally grounded evaluation; the first two involve experiments with humans on real or simplified tasks. Quantitative metrics used in such experiments include measuring human-machine task performance in terms of accuracy, response time needed, likelihood to deviate from the advice, or ability to detect errors [27,61-63].
Although application-grounded evaluation approaches provide the strongest evidence of success [13], developing an AI system by repeatedly updating and evaluating on humans is an inefficient process. Experiments with humans are necessary and provide valuable information, but also have important disadvantages: they are expensive, time-consuming, and subjective. For an objective assessment of explanation quality and a formal comparison of explanation methods, we need purely quantitative metrics [32].
To evaluate quality, we are interested in the extent to which the properties of explainability, i.e. interpretability (consisting of clarity and parsimony) and fidelity (consisting of completeness and soundness), are satisfied (see Section 2). Note that some explainable AI methods always satisfy certain properties. Model-based explanations provide a full explanation of the task model and thus always satisfy the completeness property. Similarly, we know that soundness is satisfied when the task model itself is used as the explanation. Finally, as global explanations provide a rationale for the entire model at once, they usually satisfy the clarity property. An exception are model-based explanations that can provide multiple rationales because of ambiguity in the model itself, for example due to overlapping rules in an unordered rule-based system [36].
In the remainder of this section we discuss the quantitative proxy metrics available to evaluate the quality of model-based, attribution-based, and example-based explanations.
We focus on model-agnostic evaluation methods that are domain and task independent. Table 2 summarizes the availability of metrics. We now discuss methods to evaluate each type of explanator in more detail.

Evaluating model-based explanations
Model size is often used as an approximation for model complexity and as a measure of the level of model interpretability [14,64]. Examples of such metrics are the number of features used and the size of the model representation (e.g. the number of rules in a rule-based system) [36]. The authors of [36] define fidelity as the level of (dis)agreement between the task model and the model explanation and measure it as the percentage of predictions that are the same. Their measure of unambiguity (based on rule overlap and coverage of the feature space) is not model-agnostic, but can be used to quantify clarity for unordered rule-based systems that are used as a global explanation. We did not find a metric to assess the clarity of local model-based explanations.
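The percentage-agreement measure of fidelity is straightforward to compute for a global model-based post-hoc explanation. The sketch below uses synthetic data and illustrative model choices (it does not reproduce the setup of [36]): an interpretable surrogate is trained on the predictions of a black-box task model, and the agreement plus a model-size metric are reported.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(int)  # synthetic task

task_model = RandomForestClassifier(random_state=0).fit(X, y)

# Train the interpretable surrogate on the task model's outputs, not on y:
# fidelity concerns agreement with the model, not accuracy on the labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, task_model.predict(X))

X_eval = rng.normal(size=(1000, 10))
fidelity = np.mean(surrogate.predict(X_eval) == task_model.predict(X_eval))
print(f"fidelity (percentage agreement): {fidelity:.3f}")
print(f"model size (number of leaves): {surrogate.get_n_leaves()}")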

Evaluating attribution-based explanations
Different methods to evaluate attribution-based explanations are proposed in the literature. We distinguish between empirical and axiomatic evaluation approaches. We discuss both in turn.
Empirical evaluation approaches directly measure the performance of the attribution method. Some do this by artificially creating a ground truth for evaluation. Ribeiro et al. [29], for example, artificially mark a subset of features as untrustworthy and check whether the explanations reveal the predictions that rely on them; the model with the lowest number of untrustworthy predictions is considered most sound.
Arras, Osman, Müller and Samek [65] propose to evaluate methods using a simple toy task by adding or subtracting inputs, such that the true relevance value is known.
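In the same spirit, a toy task makes the true relevance known, so an attribution method can be checked against it. The sketch below is a toy construction of our own (not the exact task of [65]) and reuses the perturbation_attribution function sketched in Section 4; feature x2 has zero true relevance, so its attribution should be approximately zero.

import numpy as np

def toy_predict_proba(X):
    # Additive ground truth: only x0 and x1 influence the output.
    logits = 3 * X[:, 0] - 2 * X[:, 1]
    p = 1 / (1 + np.exp(-logits))
    return np.column_stack([1 - p, p])

x = np.array([0.5, -0.4, 0.9])
scores = perturbation_attribution(toy_predict_proba, x, np.zeros(3))
print(scores)  # expected: large magnitudes for x0 and x1, near zero for x2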
Other methods are based on measuring performance degradation using perturbation analysis. Samek, Binder, Montavon, Lapuschkin and Müller [66] propose to evaluate the quality of heatmaps by progressively removing information using perturbation and measuring the performance drop. Their method is generalizable beyond the evaluation of heatmaps by perturbing the most important input variables instead. Hooker, Erhan, Kindermans and Kim [67] also propose an empirical evaluation method, but point out that the performance drop found using the approach of Samek et al. [66] may be caused by the perturbed inputs falling outside the training distribution rather than by the removal of informative features; they therefore propose to retrain the model on the perturbed data before measuring the drop. Axiomatic evaluation approaches, in contrast, verify whether an attribution method satisfies certain desirable properties. One such property is continuity: nearly identical inputs should receive nearly identical explanations [69]. Furthermore, selectivity is defined as the ability of an explanation to give relevance to the variables that have the strongest impact on the prediction value [32], and it has been suggested to compute this using the method of Samek et al. [66]. Finally, Ancona et al. [60] state that an attribution method satisfies sensitivity-N when the sum of the attributions for any subset of features of cardinality N is equal to the variation of the output caused by removing the features in the subset. This property is also called completeness or summation to delta [32,68]. Tests for these properties can be used to assess the soundness of explanations, with the exception of continuity [69], which relates to the clarity of an explanation.
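As an illustration of the axiomatic approach, a sensitivity-N test might be sketched as follows. One practical relaxation (used here) is to report the correlation between attribution sums and output variations over random subsets rather than testing exact equality; the function signature and defaults are our own.

import numpy as np

def sensitivity_n(predict, attribution, x, baseline, n, trials=200, seed=0):
    # predict maps a 2-D array to a 1-D array of output scores,
    # e.g. predict = lambda X: model.predict_proba(X)[:, 1].
    rng = np.random.default_rng(seed)
    f_x = predict(x.reshape(1, -1))[0]
    attr_sums, output_deltas = [], []
    for _ in range(trials):
        subset = rng.choice(x.shape[0], size=n, replace=False)
        x_removed = x.copy()
        x_removed[subset] = baseline[subset]  # remove the subset's information
        attr_sums.append(attribution[subset].sum())
        output_deltas.append(f_x - predict(x_removed.reshape(1, -1))[0])
    # Exact equality for all subsets would mean sensitivity-N holds;
    # the correlation indicates how closely it is approximated.
    return np.corrcoef(attr_sums, output_deltas)[0, 1]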
Unfortunately, both types of approaches have important limitations. Although empirical evaluation approaches capture the essence of what an attribution method should achieve, they cannot distinguish the performance of the task model from the performance of the attribution method [68]. Axiomatic evaluation approaches, on the other hand, have the disadvantage that it is difficult to define what we expect from attribution methods. More research is necessary to define the desired behavior of explanation methods and to translate it into testable properties [32]. Moreover, we did not find metrics to assess the completeness of attribution-based explanations.

Evaluating example-based explanations
We did not find any evaluation methods for example-based explanations. Such methods might be underdeveloped because example-based techniques have slightly different objectives (i.e. selecting prototypes and criticisms, identifying influential instances, or creating counterfactual explanations) and because fewer methods of this class are available.

How to choose amongst different explainable AI methods?
How to best design explainable AI systems is a non-trivial problem. We argue that, depending on the reason to demand explainability (see Section 3), different properties of explainability may be more or less important (see Section 2). We believe fidelity is most important to assist in verifying other model desiderata or to discover new insights, as it is essential to find the true underlying mechanisms of the model. We argue interpretability, on the other hand, is most important to manage social interaction, as we know from the social sciences that humans tailor their explanations to their audience and do not necessarily give the most likely explanation [23]. Below we present a step-by-step guide with concrete design recommendations to choose amongst explainable AI methods.

6.1 Step 1: How important is explainability relative to predictive performance?
Predictive performance is of crucial importance. No one would be willing to adopt an AI system that has unsatisfactory performance. However, explanations can be costly and the importance of explainability depends on the application domain and specific use case (see Section 3). If explainability is not important and it is acceptable to have a black box model, one can look for the model with the best predictive performance.
However, when explainability is (somewhat or very) important, one needs to choose amongst explainable AI methods. Post-hoc explanations are approximations of the task model's inner workings and are by definition not completely faithful. These explanations have the potential to present plausible but misleading explanations [12,70]. As the goals of the explainer and the user of the explanations are not necessarily aligned, it is difficult to determine whether explanations can be trusted. Whereas the goal of the explainer could be to simply generate user trust (i.e. might be beneficial not to reveal mistakes), the user might want to understand limitations of the AI system (i.e. is interested in mistakes). Moreover, explainers can provide an empty explanation (i.e. without information content) to soothe users or carefully select one suiting their goals [32]. Hence, in cases where both interpretability and fidelity are very important, i.e. explainability is required, explainable modelling is most appropriate. When explainability is somewhat important, the next relevant question is whether a complex model is performing better than an interpretable model.

6.2 Step 2: Does a complex model perform better than an interpretable model?
Machine learning algorithms that are known for their promising predictive performance are often not interpretable (e.g. neural networks), and vice versa (e.g. linear regression).
Hence, a trade-off between predictive performance and interpretability might exist. It is important to note that this trade-off does not always occur, in which case a more interpretable model is generally preferred (i.e. explainable modelling) [71]. However, if the predictive performance decreases substantially when employing an interpretable model, one can opt for a post-hoc explanation.
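In practice, this question can be answered empirically, for example by cross-validating an interpretable and a complex candidate model on the same data. The sketch below is illustrative (the dataset, models, and metric are placeholders for a given use case); if the performance gap is negligible, the interpretable model is preferred.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
candidates = [
    ("logistic regression (interpretable)",
     make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("gradient boosting (complex)", GradientBoostingClassifier(random_state=0)),
]
for name, model in candidates:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {auc.mean():.3f} +/- {auc.std():.3f}")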

6.3 Step 3: Is fidelity or interpretability more important?
The benefit of choosing a post-hoc explanation is that the model with the best predictive performance can be used as the task model, regardless of its interpretability. The choice of the type of explanator then depends on whether fidelity or interpretability is deemed more important (see Section 2).

Discussion
Strategies to develop and regulate trustworthy AI systems are still under development [8,11,72]. As the demands for explainable AI and trustworthy AI are closely related, we investigated the role of explainability in creating trustworthy AI. We extended other recent surveys [14-19] by providing concrete recommendations to choose between classes of explainable AI methods. Furthermore, we proposed practical definitions and contributed to the existing literature by assessing the current state of quantitative evaluation metrics.
We now discuss how explainable AI can contribute to the bigger goal of creating trustworthy AI, and then highlight other measures to create trustworthy AI.

Developing explainable AI to create trustworthy AI
In Section 6, we argued that the reason to demand explainability determines what should be explained and introduced a step-by-step guide with concrete design recommendations.
When applied to make design choices for trustworthy AI, we note that ensuring the AI system is lawful, ethical, and robust coincides with the reason to assist in verifying other model desiderata. For this need, we believe explainability, and especially fidelity, is extremely important. Hence, explainable modelling is the preferred method.
Some even argue that explainable modelling is the only good choice in high-stakes domains [71]. However, prioritizing explainability at the cost of accuracy can also be argued to be unethical, and some experiments show that people prefer to prioritize the accuracy of the system in health care [73]. Although post-hoc explanations can be misleading, a potential solution could be to develop post-hoc explanation methods that include argumentative support for their claims [70]. If one opts for a post-hoc explanation, model-based explanations are the preferred type of explanator, as they satisfy completeness and have quantitative proxy metrics available to evaluate soundness.
More research is needed to investigate the performance of explainable models (e.g. rule-based systems, generalized additive models with or without interaction terms) in the health-care domain [7] and to improve explainable modelling methods. Furthermore, as interpretable features lead to more interpretable explanations, interpretable feature engineering is also important. We believe developing hybrid methods with data-driven and knowledge-driven elements for feature selection or engineering is another promising research direction to enhance interpretability.
When using explainable AI to create trustworthy AI, evaluating the quality of explanations is key. This part of the literature is currently underdeveloped and there are no standard evaluation methods in the community yet [14,16]. We outlined several properties of explainability that are important (Section 2) and assessed the current state of quantitative evaluation metrics (Section 5). It should be noted that although quantitative proxy metrics are necessary for an objective assessment of explanation quality and a formal comparison of explanation methods, they should be complemented with human evaluation methods before employing AI systems in real life, as good performance on proxy metrics may not give direct evidence of success [74]. We found that clarity is difficult to assess for local explanators and that quantitative evaluation metrics for example-based methods are still lacking. Another problem when evaluating the quality of explanations is that although interpretability is generally accepted to be user dependent, it is not quantified as such [14]. Determining a standard that indicates when AI systems are explainable for different users is thus an important direction for future work. Furthermore, outlining how different explanations can best be combined in a user interface [75], and how these combined AI systems should be evaluated, are open research problems.

Complementary measures to create trustworthy AI
Some argue that explanations are neither necessary nor sufficient to establish trust in AI (e.g. [76]). Other important influences at play are perceived system ability, control, and predictability [77]. Hence, although the field of explainable AI can contribute to trustworthy AI, it has its limits [78]. Therefore, we highlight some other measures that can be used alongside explainable AI to create trustworthy AI: • Reporting data quality. Training data may contain biases or mistakes, or be incomplete. Hence, understanding the data quality and how the data were collected is at least as important as explainability, since it allows one to understand the limitations of the resulting model [73]. An example of a framework that can be used to assess whether EHR data are suited for a specific use case is presented by Kahn et al. [79].
This framework evaluates data quality based on conformance, completeness, and plausibility, and can be used to communicate the findings in a structured manner to users of the AI system that uses the data.
• Performing extensive (external) validation. The concern that models are not robust or generalizable can be addressed using external validation. Replicating a prediction model on new data can be a slow process due to a lack of data standardization.
Although external validation is recognized as an area of improvement for clinical risk prediction models [80], it is increasingly feasible with the adoption of common data structures (e.g. OMOP-CDM [81]). The Observational Health Data Sciences and Informatics (OHDSI) network has developed standards and tools that allow patient-level prediction models to be developed and externally validated at a large scale in a transparent way following accepted best practices [82,83]. This also ensures reproducibility of the results. Other model desiderata can likewise be assessed directly instead of by using explainability.
Research in this area addresses, for example, stability [84], fairness [85], and privacy [86]. Developing quality checks for these model desiderata, as well as investigating how to incorporate model desiderata during model optimization, are promising alternatives to create more trustworthy AI.
• Regulation. Although regulation of AI systems is still under development, established regulation for other safety-critical applications (e.g. drug safety) suggests that it can be an effective way to generate trust in the long term. There are different possible forms of regulating AI. The first is to require the AI system to satisfy pre-defined requirements. However, it is difficult to define an exhaustive list of verifiable criteria that ensure an AI system is lawful, ethical, and robust.
Instead of regulating the end product (i.e. the AI system), an alternative is to control the development process by introducing standard development guidelines that should be followed. However, just as it is difficult to assess model quality on all desired points, it might be difficult to gain sufficient insight into the development process. Finally, we could introduce a licensing system to regulate developers, as suggested by Mittelstadt [9]. This allows professional accountability (e.g. malpractice can be punished by losing one's license) and can be compared to the licensing of doctors in the health-care domain. The U.S. Food and Drug Administration is currently investigating new types of regulation for digital technologies, among which is a shift of regulation from end products to firms (developers) [87].

Conclusion
The aim of this paper is to provide guidance to researchers and practitioners on the design of explainable AI systems for the health-care domain and to contribute to the formalization of the field of explainable AI. The existing literature addresses some of the questions answered in this paper separately, highlighting challenges and directions for future research. This survey provides a holistic view of the literature, connecting different perspectives and providing concrete design recommendations. We conclude that explainable modelling might be preferred over post-hoc explanations when using explainable AI to create trustworthy AI for health care. In addition, we recognize that explainability alone may not be sufficient and that complementary measures might be needed to create trustworthy AI (e.g. reporting data quality, performing extensive (external) validation, and regulation).