Privacy-Preserving Generative Adversarial Network for Case-Based Explainability in Medical Image Analysis

Although Deep Learning models have achieved remarkable results in medical image classification tasks, their lack of interpretability hinders their deployment in the clinical context. Case-based interpretability provides intuitive explanations, as it is a much more human-like approach than saliency-map-based interpretability. Nonetheless, since one is dealing with sensitive visual data, there is a high risk of exposing personal identity, threatening the individuals' privacy. In this work, we propose a privacy-preserving generative adversarial network for the privatization of case-based explanations. We address the weaknesses of current privacy-preserving methods for visual data from three perspectives: realism, privacy, and explanatory value. We also introduce a counterfactual module in our Generative Adversarial Network that provides counterfactual case-based explanations in addition to standard factual explanations. Experiments were performed on a dataset that is simultaneously biometric and medical, demonstrating the network's potential to preserve the privacy of all subjects and to retain the explanatory evidence while also maintaining a reasonable level of intelligibility.


I. INTRODUCTION
Deep Learning has achieved outstanding results in most image classification tasks, including medical imaging [1], [2]. Nonetheless, most of these models are "black-boxes" whose predictions are difficult for humans to understand and, consequently, trust. Moreover, their outstanding performance sometimes relies on confounding factors rather than application-related features [3], [4]. Due to the lack of interpretability of Deep Learning algorithms, their application in real-world contexts, namely in clinics, is hindered. To overcome this problem, several interpretability methods have been proposed to enhance the transparency of the decision-making process and improve trust in the models' results [5].
Case-based interpretability techniques are very much in line with human reasoning, providing intuitive explanations through the presentation of representative examples [6]- [8].
These examples are selected using retrieval systems that can find the most semantically similar cases from a pool of well-curated candidates, explaining the observation under analysis [9]- [12]. However, these methods cannot be applied in contexts where the data exposes identity, such as the medical scene, where privacy is a major concern. In [13], the authors show the weaknesses in the application of current privacy-preserving methods to medical data. Most of the current strategies fail to preserve relevant semantic features that serve as explanatory evidence in the context of case-based explanations. Furthermore, some privacy-preserving methods also fail to ensure privacy for all the subjects in the training data. This fact inhibits the use of these methods in the privatization of medical case-based explanations and highlights the need for new privacy-preserving approaches.
The main contributions of this work are:
• The proposal of a privacy-preserving generative model capable of privatizing case-based explanations in a clinical setting, enabling their use in real-world contexts.
• The generation of counterfactual explanations that increase the explanatory value of the deep learning system.

II. RELATED WORK
This section briefly introduces case-based interpretability, followed by a literature review of the current privacy-preserving methods for visual data. Since deep learning privacy-preserving methods use generative networks, we will also introduce some background on relevant deep generative models.

A. CASE-BASED INTERPRETABILITY
Case-based interpretability focuses on retrieving cases from the data, which may or may not be used to perform the predictive task, as explanations for a model's decision. There are various types of explanations that these methods can provide. Methods that establish a similarity metric to compare the data and retrieve the most similar cases produce factual explanations by similar examples [9]- [12]. The most well-known example of such a distance-based method is the traditional K-Nearest Neighbors algorithm. Prototype-based methods, which define prototypes representative of the data, produce explanations by typical examples [17]- [22]. Some methods produce counterfactual examples, whose purpose is to explain the alterations that should occur in the images to change their prediction [23], [24]. These methods usually generate the counterfactual explanations based on the original image rather than retrieving a case from the data. Finally, semi-factual examples have the same class as the original image but are closer to the decision boundary than the most similar case. Semi-factual explanations can be generated based on the original image [23] or retrieved from the data as a sample that is closer to a decision boundary than the case under analysis [25].

B. PRIVACY-PRESERVING METHODS FOR VISUAL DATA
Privacy-preserving methods have been applied in medical imaging with the purpose of increasing the availability of medical data to train artificial intelligence algorithms [26]. Anonymization and pseudonymization techniques remove or alter metadata associated with the medical images (e.g., the patients' names). However, the images themselves expose identity, which can be used to identify the patients through re-identification techniques [27]. Encryption [28] results in unintelligible images that cannot be shown to humans as case-based explanations. Other privacy-preserving techniques avoid disclosing sensitive information about the data during a model's training. For instance, Federated Learning [29] consists of training the models in the data owners' servers to avoid sharing the private medical data [30], [31]. Differential privacy [32] has also been applied to hide the contributions of individual patients during a model's training [33]. Nevertheless, these techniques cannot be applied to privatize case-based explanations, which are meant to be exposed to humans, as they act on the model and not on the data itself. No privacy-preserving method for medical imaging considers altering the image to remove a patient's identifiable features while preserving disease-related information and the image's intelligibility. In this section, we review state-of-the-art privacy-preserving methods capable of generating intelligible privatized images that have been applied in domains other than the medical field. We discuss the methods with regard to their application to case-based explanations. Furthermore, we consider that identity-related features in the images may be entangled with explanatory features that must be preserved. We divide these methods into traditional and Deep Learning methods.
Traditional privacy-preserving methods are applied over the whole input, as they cannot identify sensitive image regions. These methods require an additional pre-processing step to locate the image regions that need to be privatized. The most well-known traditional method consists of applying filters such as blur to an image [34]. The most significant issue in this type of method is that relevant explanatory features are lost at the same rate as identity features. As such, privatized images with acceptable degrees of privacy do not preserve explanatory evidence [13]. Another popular class of privacy-preserving techniques is the K-Same-based family [35], [36], which was developed for face de-identification. In these methods, the privatized images are an average of various training images, guaranteeing K-Anonymity, where the highest probability of a person being recognized in the image is 1/K. This technique imposes limitations both on privacy, as the privatization process directly uses images from other subjects in the database, and on the preservation of explanatory evidence. An alternative to those methods is face-swapping [37], which consists of replacing the faces in an image with models from a public database. Although this method guarantees privacy, if identity-related features and explanatory features are entangled, the replacement of the image regions that contain identity-related features will result in the loss of the associated explanatory features.
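Of the traditional methods above, K-Same is the easiest to make concrete. The following is a minimal sketch of the pixel-level variant, assuming grayscale images stored as NumPy arrays; the function name and signature are illustrative, not taken from [35], [36]:

```python
import numpy as np

def k_same_pixel(target, gallery, k=3):
    """Illustrative K-Same: replace `target` with the average of its k
    closest gallery images (pixel-space L2 distance). Because the output
    averages k different subjects, an attacker's best re-identification
    probability over the averaged set is bounded by 1/k."""
    dists = [np.linalg.norm(g.astype(float) - target.astype(float)) for g in gallery]
    closest = np.argsort(dists)[:k]
    return np.mean([gallery[i].astype(float) for i in closest], axis=0)
```

The sketch also makes the paper's criticism visible: the output is literally built from other subjects' images, and averaging dilutes any fine-grained explanatory features along with identity.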
In Deep Learning, privacy-preserving models usually comprise a generative network responsible for generating privatized images and an identity recognition network that guides the privatization process. Some models directly obtain identity vectors from the images by disentangling identity-related features from the remaining features, as is the case with the CLEANIR model [38] and the R2VAE model [39]. These identity vectors can then be altered to hide the original identity of the images. Other privacy-preserving strategies focus on creating privatized images that do not share the same identity as the original images by using a Siamese identity recognition network [14] to guide the generation of privacy-preserving images [40], [41]. These networks ensure image utility by maximizing the structural similarity between the original and privatized images.
The biggest problem with the previous deep learning methods is that none guarantees the preservation of the semantic features relevant to a particular classification task. Privacy-preserving methods that preserve task-related features use a task-related classifier to guide the feature preservation process. PPRL-VGAN [42], developed for privacy-preserving facial expression recognition, privatizes images through identity replacement. Although this model successfully hides the identity of the original image, it exposes the identities of other subjects in the data. As such, it still violates privacy. Furthermore, it only preserves the task-related class of the original image and not its explanatory features.
In general, none of the privacy-preserving models explores the explicit preservation of the original images' explanatory evidence. Furthermore, some of the models still possess privacy issues as they directly use training data in the privatization process.

C. GENERATIVE MODELS
Generative models model the probability distribution of the data and allow the generation of new data by sampling from the learned distribution. The most relevant generative models for this work are Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs).
GANs [43] capture the data distribution implicitly, through a minimax game between a generator and a discriminator. The generator is responsible for generating new samples, while the discriminator is responsible for the binary classification task of distinguishing between real and fake images. The goal of the generator is to trick the discriminator into classifying the generated images as real.
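This minimax game is the standard GAN objective from [43], reproduced here for reference:

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_d}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_z}\big[\log\!\big(1 - D(G(z))\big)\big]
```

The discriminator D maximizes this value by separating real from generated samples, while the generator G minimizes it by producing samples the discriminator accepts as real.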
VAEs [44] explicitly learn an approximation of the data distribution. These models maximize data likelihood using an encoder-decoder architecture. The encoder maps an image in the original data space into its representation in a latent space with a simpler distribution. The decoder learns to map points from the latent space into the original data space by reconstructing the original image from its representation. Then, to generate new images, we can sample points from the simple distribution used in the latent space and use the decoder to visualize them in the original data space. VAEs are trained using a reconstruction loss and a regularization loss based on the Kullback-Leibler (KL) divergence, which pushes the distribution of the latent representations toward the chosen prior.
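In its usual form (a squared-error reconstruction term is shown purely for illustration), the VAE objective combines the two losses just described:

```latex
\mathcal{L}_{\text{VAE}}
= \underbrace{\mathbb{E}_{q(z \mid I)}\big[\lVert I - \hat{I} \rVert^2\big]}_{\text{reconstruction}}
+ \underbrace{D_{\mathrm{KL}}\big(q(z \mid I)\,\Vert\,p(z)\big)}_{\text{regularization}}
```

Here q(z | I) is the encoder's approximate posterior, p(z) the latent prior (typically a standard Gaussian), and Î the decoder's reconstruction of image I.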

III. PROPOSED METHODOLOGY
Given an image that serves as a Deep Learning model's explanation, we aim to produce an intelligible image that does not expose the original image's identity and, at the same time, preserves the image's explanatory evidence.
We build our approach on top of one existing privacypreserving model: the PPRL-VGAN [42]. This work is motivated by [13], where the authors analyze the PPRL-VGAN model and its applicability to the privatization of case-based explanations, highlighting its potential but also identifying its weaknesses. In this work, we introduce new modules and loss functions to improve the PPRL-VGAN in terms of privacy, intelligibility, and preservation of explanatory evidence, in the context of medical imaging.

A. PPRL-VGAN MODEL
The PPRL-VGAN model proposed by Chen et al. [42] comprises a GAN with a conditional VAE as the generator and a multi-task discriminator, as shown in Figure 1. The generator is conditioned to generate an image with the target replacement identity c. The multi-task discriminator contains a real/fake classifier to promote realistic synthetic images, a multi-class identity recognition network to guide the privatization, and a task-related classification network to ensure the preservation of the original image's class.
There are various weaknesses in this model that prevent its use for the privacy-preservation of case-based explanations. The most critical ones are the privacy violation inherent to using identity replacement as the privatization mechanism and the non-preservation of the explanatory evidence as it exists in the original image [13]. Regarding the application of this model to medical data, the model also has difficulty disentangling identity-related factors in cases where most subjects only have images from one disease-related class [13]. Moreover, a multi-class identity recognition network is challenging to train when the data only has a small number of images per identity, as frequently happens in the medical context.

B. PRIVACY-PRESERVING NETWORK WITH MULTI-CLASS IDENTITY RECOGNITION
Using the PPRL-VGAN model as a base, we defined a novel privacy-preserving network for the privatization of case-based explanations.
To ensure that privacy is preserved for every subject in the training data, we removed the replacement identity given to the decoder. Instead of creating an image that looks like the replacement identity, we try to keep the identity recognition close to random guessing (i.e., close to a uniform distribution).
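As a loose sketch of this idea (the function and its arguments are illustrative, not the paper's implementation), the privacy objective can be written as a KL divergence between the identity classifier's softmax output on a privatized image and the uniform distribution:

```python
import math

def uniform_privacy_loss(id_probs):
    """Illustrative privacy term: KL(U || p), where p is the identity
    classifier's probability vector over the N training identities and
    U is uniform. Minimizing it drives the classifier toward random
    guessing (probability 1/N for every identity), without ever
    pointing the generator at someone else's identity."""
    n = len(id_probs)
    u = 1.0 / n
    return sum(u * math.log(u / max(p, 1e-12)) for p in id_probs)
```

The loss is zero exactly when every identity is equally likely, and grows as the classifier becomes confident about any single subject.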
By promoting a uniform distribution across identities, the generative task became more complex, leading to poor image quality and mode collapse problems. We pre-trained the identity recognition model and the task-related classifier on the dataset used to train the privacy-preserving model, to facilitate the generative task and improve image quality. In PPRL-VGAN, the mode intentionally collapsed to the identity given as replacement and to the task-related class from the original image. However, in our case, the mode collapse was unintentional and affected the explanatory value of the images, as they all looked identical. To fix this problem and improve image quality, we replaced the generative framework with a WGAN-GP network [56], using Wasserstein loss with gradient penalty to stabilize the discriminator.
We explicitly preserve explanatory evidence by using interpretability saliency maps to reconstruct relevant task-related features in the privatized images. Specifically, we use Deep Taylor [57] to create masks containing the relevant image features. We input these masks into the generative network and concatenate them with the original images inside the VAE's encoder, after feature extraction and before calculating the parameters of a Gaussian distribution. In the loss function, we use the squared L2 loss to reconstruct relevant features. We also ensure that the privatized images are assigned the same classification score as the original images to aid the preservation of explanatory features.
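The masked reconstruction term can be sketched as follows (a minimal NumPy illustration; the function name is ours, and the binary-mask simplification is an assumption, since saliency maps are generally continuous):

```python
import numpy as np

def explanatory_reconstruction_loss(original, privatized, saliency_mask):
    """Illustrative explanatory term: squared L2 distance restricted to
    the pixels the saliency map marks as task-relevant. Disease evidence
    inside the mask must be reconstructed, while identity cues outside
    it remain free to change during privatization."""
    diff = (original - privatized) * saliency_mask
    return float(np.sum(diff ** 2))
```

Only the masked region contributes to the loss, which is what lets the generator alter the rest of the image aggressively.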
We summarized the changes introduced to the PPRL-VGAN model in Figure 2. With these changes, we obtained a privacy-preserving model with three modules: a generative module, a privacy module, and an explanatory module.
Generative Module: The generative module is responsible for the generation of intelligible images, given an image I from the original data space's probability distribution p_d. It is composed of a GAN with a VAE as the generator G. The discriminator D is trained using the Wasserstein loss with gradient penalty, as shown in Equation 1, where x̂ corresponds to random samples and λ is the weight associated with the gradient penalty term. In the generator, there are two terms: a realness term to promote the generation of realistic images (Equation 2), and a regularization term in the VAE. The regularization term, shown in Equation 3, approximates the prior distribution on the latent space p(f(I)), where f(I) corresponds to image I's latent representation, and the conditional distribution q(f(I) | I) parameterized by the encoder.
Privacy Module: The privacy module is responsible for anonymizing the images, guaranteeing privacy for the subjects in the image and in the database. Using a pre-trained multi-class identity recognition network D_id, we promote a uniform identity distribution in the privatized images. As such, the generator contains a privacy term in the loss function, represented in Equation 4, in which U represents a uniform distribution with noise.
Explanatory Module: The explanatory module is responsible for guaranteeing the privatized images' explanatory value. We preserve the explanatory evidence through the reconstruction of explanatory features in the images, using Deep Taylor saliency maps M, obtained by applying the task-related classifier D_exp to the original images. We also approximate the privatized images' classification scores to those of the original images. The generator loss terms representative of this module are shown in Equation 5.
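The critic objective referenced as Equation 1 did not survive extraction; for reference, the standard WGAN-GP critic loss from [56], which the text describes, has the form:

```latex
\mathcal{L}_D
= \mathbb{E}_{\tilde{x} \sim p_g}\big[D(\tilde{x})\big]
- \mathbb{E}_{x \sim p_d}\big[D(x)\big]
+ \lambda \, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big]
```

Here p_g is the generator's distribution, and x̂ are the random samples mentioned in the text, interpolated between real and generated images, on which the gradient penalty (weighted by λ) enforces the critic's 1-Lipschitz constraint.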
Finally, the entire generator's loss is depicted in Equation 6, where each parameter λ_x controls the importance of the corresponding loss term.

C. PRIVACY-PRESERVING NETWORK WITH SIAMESE IDENTITY RECOGNITION
As it stands, our privacy-preserving model cannot be used in domains where the number of images per subject is scarce, which is frequently the case in the medical context, since a multi-class identity recognition network is hard to train in these scenarios. To widen the range of application of our model, we replace the multi-class identity recognition network with a Siamese network [14], pre-trained on the dataset used to train the privacy-preserving model. The Siamese identity recognition network compares the original image with its privatized version and computes their identity-related distance, which can be used to classify whether the images belong to the same identity or not. We trained this network using a contrastive loss [58], represented in Equation 7. In this equation, m represents a margin to limit the distance between images, Y represents the label assigned to an image pair (1 when the images belong to the same identity, and 0 otherwise), and ED represents the Euclidean Distance between the image pair embeddings.
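Equation 7 is the standard contrastive loss [58]; a minimal sketch for a single pair (function name ours) is:

```python
def contrastive_loss(ed, y, m=1.0):
    """Contrastive loss for one Siamese pair. `ed` is the Euclidean
    distance between the two embeddings; y = 1 for same-identity pairs
    (pulled together), y = 0 for different-identity pairs (pushed apart
    until their distance reaches the margin m)."""
    return y * ed ** 2 + (1 - y) * max(0.0, m - ed) ** 2
```

Different-identity pairs that are already farther apart than the margin contribute nothing, so the network focuses on pairs that are still confusable.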
By using this network, we ensure that the privatized image is different from the original image in terms of identity. To guarantee that the generated images also do not look like the images of the other identities present in the dataset, we use the Siamese network to increase the identity-related distance between the privatized image and the images from each of the subjects present in the database. In practice, at each epoch during training, we randomly select one image from each of the identities, and promote that this image is far from the privatized image.
The privacy term of the generator loss function, when using the Siamese network, is represented in Equation 8, where N is the number of identities that exist in the dataset.
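A loose sketch of this overall-privacy idea (names and the embedding representation are ours; the paper's Equation 8 may weight terms differently): each of the N sampled identity images is treated as a different-identity pair with the privatized image, penalized with the y = 0 hinge of the contrastive loss.

```python
def overall_privacy_term(priv_embedding, identity_embeddings, m=1.0):
    """Illustrative overall-privacy term: for one sampled image per
    dataset identity, penalize any embedding that lies within margin m
    of the privatized image's embedding, pushing the privatized image
    away from every subject in the database."""
    total = 0.0
    for emb in identity_embeddings:
        ed = sum((a - b) ** 2 for a, b in zip(priv_embedding, emb)) ** 0.5
        total += max(0.0, m - ed) ** 2
    return total / len(identity_embeddings)
```

When every identity is already beyond the margin, the term vanishes and the generator is free to optimize the remaining objectives.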

D. GENERATION OF COUNTERFACTUAL EXPLANATIONS
We also apply our model to the generation of counterfactual explanations. We add a counterfactual generation module to the previously defined privacy-preserving network in the form of a counterfactual decoder responsible for mapping an image's latent representation to its counterfactual. To generate counterfactual explanations, we aim to perform the smallest number of alterations to the privatized factual explanations to change their predicted class. As such, the counterfactuals' decoder is trained to minimize the pixel-wise distance between the factual and counterfactual explanations while changing the original image's task-related prediction. We use the saliency masks with the explanatory features to promote changes in the image regions relevant to the explanatory classification task while preserving the remaining image parts. This network's architecture is shown in Figure 3.
Regarding the training approach, we first train the factual decoder as in the previously presented networks, with the counterfactual decoder frozen. Then, we freeze the factual decoder and transfer its weights to the counterfactual decoder to train it. The generator's loss function used to train the counterfactual decoder is represented in Equation 9. In this equation, F (I) and C(I) denote the privatized factual and counterfactual explanations, respectively.
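A loose sketch of the counterfactual objective (the function, the flattened-image representation, and the pairing of the two weights with the two terms are our assumptions, not the paper's Equation 9):

```python
import math

def counterfactual_loss(factual, counterfactual, p_flipped, lam_pix=0.001, lam_cls=1.0):
    """Illustrative counterfactual objective: keep C(I) close to the
    privatized factual F(I) pixel-wise, while maximizing the task
    classifier's probability `p_flipped` of the opposite class
    (cross-entropy toward the flipped label)."""
    pix = sum((f - c) ** 2 for f, c in zip(factual, counterfactual))
    cls = -math.log(max(p_flipped, 1e-12))
    return lam_pix * pix + lam_cls * cls
```

The loss is minimized by a counterfactual that barely differs from the factual image yet is confidently assigned the opposite class.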

IV. EXPERIMENTS
For the experiments, we used the medical and biometric dataset Warsaw-BioBase-Disease-Iris v2.1 [15], [16], composed of 2,996 iris images with various eye pathologies acquired from 115 different patients. We only used the 1,795 images taken from the device IrisGuard AD100, and we focused on one of the pathologies, glaucoma. The images were labeled according to the presence or absence of glaucoma. In the pre-processing stage, we cropped the images to remove labels in their lower corners, horizontally flipped the patients' right eye images, and centered the iris of the eye in the middle of the image. The images' resolution was set to 64 × 64 and they were split into 65% for training, 15% for validation, and 20% for testing. To obtain masks with relevant glaucoma features located inside the iris, we generated iris segmentation masks and performed an AND operation between the Deep Taylor saliency maps and the iris segmentation masks.
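The final masking step can be sketched as follows (a minimal NumPy illustration; the function name and the 0.5 binarization threshold are our assumptions):

```python
import numpy as np

def glaucoma_mask(saliency_map, iris_segmentation, thresh=0.5):
    """Restrict Deep Taylor saliency to the iris: binarize the saliency
    map and AND it with the iris segmentation mask, discarding relevance
    that falls outside the iris (e.g. on eyelids or surrounding skin)."""
    return (saliency_map > thresh) & (iris_segmentation > 0)
```

The resulting binary mask keeps only the saliency pixels that are both task-relevant and anatomically inside the iris.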

A. PRIVACY-PRESERVING MODEL WITH MULTI-CLASS IDENTITY RECOGNITION
In the privacy-preserving model with multi-class identity recognition, we used the parameters λ1 = 0.4, λ2 = 1, λ3 = 2, λ4 = 0.001, and λ5 = 0.002. We used λ = 10 in the discriminator's loss, as suggested in the original WGAN-GP paper [56]. We used the Adam optimizer with a learning rate of 2e-5. The results are presented in Figure 4. Although the images possess some visible noise, they can be considered intelligible. We notice that the network has some difficulty creating a realistic eye structure surrounding the iris. In the visual results, the privatized images' Deep Taylor saliency maps closely resemble those of the original images, evidencing the correct preservation of explanatory evidence. We include in Table 1 the results achieved with this network. The identity recognition network's accuracy is evaluated on its ability to recognize the subject of the original image.
To evaluate privacy at the whole dataset's level, we analyze the maximum score that the identity recognition model assigns to an identity when making a prediction about a privatized image. We also evaluate the divergence between the privatized images' identity distribution and the uniform distribution, using KL Divergence. Finally, we assess the Glaucoma Recognition network's accuracy at detecting the original images' glaucoma score in the privatized images.
The low accuracy in identity recognition suggests that the privacy-preserving model succeeds at privatizing the images. The values for the maximum identity score and KL divergence suggest that the network has difficulty recognizing any identity, as these values are significantly lower than the baseline. Furthermore, the high values in glaucoma recognition accuracy attest to the network's high capacity to preserve explanatory evidence.
During the network's development, the most significant challenge we came across was managing the trade-off between privacy, intelligibility, and explanatory evidence. In most cases, improving one of these dimensions would result in worsening the remaining ones. In our model, the most sacrificed dimension was intelligibility, as the generated images have poorer quality than the original ones. When we remove one of the other dimensions, the image quality improves. For instance, removing the explanatory evidence term results in the higher-quality images shown in Figure 5.

B. PRIVACY-PRESERVING MODEL WITH SIAMESE IDENTITY RECOGNITION
Using the privacy-preserving model with Siamese identity recognition, with parameters λ1 = 0.4, λ2 = 5, λ3 = 2, λ4 = 0.001, λ5 = 0.002, and λ6 = 10, we obtained the results shown in Figure 6. This model provides higher-quality images than the previous multi-class identity recognition model. Nonetheless, it also suffers from the trade-off between privacy, intelligibility, and explanatory evidence. For instance, when we remove the overall privacy term (λ6 = 0), we obtain privatized explanatory features that more closely resemble those of the original images, as shown in Figure 7. Table 2 presents the results obtained with this model. To evaluate privacy, we use the previously developed multi-class identity recognition model as an evaluation network. We also use the Siamese identity recognition model's accuracy at recognizing that the original and privatized images belong to different identities. To calculate this accuracy, we verify whether the distance between image pairs is higher than 0.777, corresponding to the average distance obtained when applying the Siamese network to image pairs from the original testing set. To evaluate privacy over the whole dataset, we obtain the identity recognition accuracy when comparing the privatized images with an image from each identity available in the dataset. We also report the number of pairs that are considered to be from the same identity (real pairs). In this table, we expect low values in multi-class identity recognition and in the average number of real pairs, and high values in the remaining metrics.
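The threshold-based privacy accuracy just described can be sketched as follows (the function name is ours; 0.777 is the threshold reported in the text):

```python
def siamese_privacy_accuracy(distances, thresh=0.777):
    """Fraction of (original, privatized) pairs judged to belong to
    different identities, i.e. whose Siamese embedding distance exceeds
    the threshold derived from the average pair distance on the
    original testing set."""
    return sum(d > thresh for d in distances) / len(distances)
```

Higher values indicate better privacy, since more privatized images are judged to be a different person from their source image.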
We obtained a higher privacy degree by considering overall privacy, as seen by the lower accuracy in multi-class recognition and higher accuracy in Siamese identity recognition. Furthermore, when we consider overall privacy, there are fewer images from the dataset's subjects that are considered to be from the same identity as the privatized images. The privatized set with overall privacy also achieved higher glaucoma recognition accuracy.

C. COUNTERFACTUAL GENERATION
By adding a counterfactual generation module to the privacy-preserving model with multi-class identity recognition, with parameters λ7 = 0.001 and λ8 = 1, we were able to invert the glaucoma classification of the original image with 90.29% accuracy. With the model that uses Siamese identity recognition, we achieved 90.88% accuracy in inverting the images' glaucoma classification. Furthermore, in both models, the differences between the factual and the counterfactual explanations are located mainly in the iris region. An example of the results obtained using the Siamese identity recognition model is shown in Figure 8.
In this experiment, we used the Deep Taylor glaucoma masks to promote changes located inside the iris and, thus, avoid alterations in zones that are irrelevant to glaucoma classification that may occur as an adversarial attack. However, even with these masks, the counterfactual decoder may be performing an adversarial attack on the glaucoma classification network, tricking it into misclassifying the samples and generating adversarial samples instead of counterfactual explanations.

D. ABLATION STUDY
To verify how the generator used in the privacy-preserving models fares in comparison with other state-of-the-art architectures, we replaced it with a ResNet VAE, which contains ResNet [59] as the encoder and decoder, and with a UNET architecture [60]. We performed this experiment with the multi-class identity recognition version of the privacy-preserving model. The results are shown in Table 3 and in Figure 9. Although the results obtained with the ResNet VAE present higher privacy, the images lack intelligibility and explanatory value, hindering their use as explanations. The UNET has a higher capacity to preserve features, as verified by the higher accuracy in identity recognition and glaucoma recognition. Furthermore, since the image generated by the UNET is extremely similar to the original one, this network might be performing an adversarial attack on the identity recognition network instead of an adequate anonymization.
Given these results, we can conclude that the original generator with a standard convolutional VAE is the one that provides better and more balanced results, guaranteeing both privacy and the explanatory value of the images.

E. STATE-OF-THE-ART COMPARISON
In this section, we compare our privacy-preserving models with the state-of-the-art methods blurring, K-Same-Select [35], and PPRL-VGAN [42]. These methods had previously been applied to the Warsaw-BioBase-Disease-Iris v2.1 dataset in [13]. The results in terms of identity recognition and glaucoma recognition are summarized in Table 4. Our privacy-preserving models have a higher capacity to preserve explanatory features than the methods from the literature while obtaining comparable results in identity recognition. Furthermore, our models promote privacy for every patient in the dataset, unlike K-Same-Select and PPRL-VGAN, which directly use identities from the dataset in the privatization process (through image averaging or identity replacement). As such, our privacy-preserving models are the most appropriate for application to the domain of medical case-based explanations.

V. DISCUSSION AND CONCLUSIONS
In this paper, we developed a privacy-preserving model to privatize case-based explanations. The model tackles the most significant weaknesses of current privacy-preserving models, guaranteeing privacy, intelligibility, and preservation of explanatory evidence. At first, we used a multi-class identity recognition model to guide image privatization. Then, we widened the range of application of our model by using a Siamese identity recognition network to guide the privatization, enabling the model to be used when medical data only has a small number of images per subject.
Our approach regarding the preservation of explanatory evidence consisted of using interpretability saliency maps to reconstruct relevant features. However, post hoc techniques are often criticized for not reflecting a model's real reasoning [22]. As such, using these methods to preserve explanatory features when privatizing explanations obtained through intrinsic interpretability methods clashes with the intrinsic methods' goal of providing accurate representations of a model's reasoning. In such cases, if the intrinsic interpretability method defines a similarity measure to semantically compare two images, it should be possible to use this measure to approximate the privatized image to the original image in regard to explanatory features.
We have also applied the model to generate counterfactual explanations based on the privatized factual explanations. The counterfactual explanations highlight the changes in an image that would lead to a reversal of the class prediction. We used interpretability saliency maps to promote changes in image regions related to the classification task. Nonetheless, the resulting explanations may be adversarial examples whose alterations are not related to the concepts associated with the classification task. Even though we only considered a binary classification task in our work, the approach is generalizable to the multi-class scenario. To apply the counterfactual generation model to multi-class classification problems, the counterfactual decoder could be trained to receive the latent representation of an image and the target class of the counterfactual, allowing the retrieval of counterfactual explanations representative of each class.
Future work should consider integrating privacy in the image retrieval process to optimize the selection of explanatory cases and using causality to ensure that features preserved in the privacy-preserving explanations are causally related to the explanatory task.
In conclusion, this work contributes to enabling the use of case-based explanations in contexts where the data violates the privacy of individuals, like in medical imaging.