Interpretability-Aware Industrial Anomaly Detection Using Autoencoders

The past decade has witnessed wide applications of deep neural networks in anomaly detection. However, the dearth of interpretability in neural networks often hinders their reliability, especially for industrial applications where practical users heavily rely on interpretable methods to provide explanations for their decision-making. In this paper, we propose a reconstruction-based approach to unsupervised detection of anomalies in industrial defect data. Our algorithm employs an interpretability score during both the training and test phases. Specifically, we train an autoencoder with a loss function that incorporates an interpretability-aware error term. After training, the autoencoder processes a specific feature from the difference between the test image and the average of training images and produces an attention map that is used for detecting the anomalies. Our method not only achieves competitive performance compared with non-interpretability-aware methods but also produces attention maps that facilitate a direct explanation of detection results, which can potentially be useful for industrial practitioners.


I. INTRODUCTION
Anomaly detection is an important research field in machine learning that aims to detect unusual patterns within given data [1], [2], [3]. It is widely used in various fields, such as network intrusion detection [4], signal processing [5], abnormal behavior detection [6], and medical image analysis [7]. Early anomaly detection algorithms were primarily used in the field of data mining [8]. However, in recent years, with the development of computer vision and related technologies, there has been an increasing interest in applying anomaly detection to the field of image processing [9], [10], [11]. In particular, many research works have introduced techniques that utilize deep learning to detect anomalies in images [12], [13].
In industrial applications, anomaly detection is crucial for detecting visual defects in products. Industrial anomaly detection aims to find visible defects in the appearance of various industrial products, including fabrics, chips, pharmaceuticals, and even building materials [14], [15], [16], [17], [18]. These defects, though minor, may seriously affect the normal function of the product. In industrial anomaly detection, it is usually easy to obtain data that show a normal pattern, whereas it is often challenging to obtain data that represent possible defects. Therefore, the most natural scenario is the unsupervised learning setting, where the task is to use unannotated samples or normal samples to build a detection model and detect anomaly samples that differ from the expected pattern [19].
The associate editor coordinating the review of this manuscript and approving it for publication was Mehul S. Raval.
Benefiting from their powerful capability of feature extraction and representation learning, methods based on convolutional neural networks (CNNs) can greatly improve detection and localization accuracy [19], [20]. For high-resolution datasets from industrial applications, powerful anomaly detectors already exist [21], [22]. However, in addition to robustness and model performance, model interpretability is crucial for decision-making, given the strict inspection requirements for product quality and safety, especially in the manufacturing field [23], [24]. Although various methods exist for understanding CNNs [25], [26], they do not form a part of the anomaly detection model and thus cannot guarantee that the anomaly detection model is capable of correctly interpreting the results. In this paper, we use a gradient-based interpretation method [27] and include the heatmap used to explain the model as a loss term, so that the model extracts features that are crucial for explaining the anomaly detection. In other words, our model promotes decisions that align with human intuition, which are more explainable and helpful in industrial applications.
One well-known class of approaches in anomaly detection is reconstruction-based, or more specifically, based on the neural network architecture of an Autoencoder (AE) [28], [29]. The encoder transforms the input image into a latent variable, and the decoder maps the latent variable to a reconstructed version of the input image. The anomalous images can be detected since they usually have a larger reconstruction error between the input image and the reconstructed image. However, the reconstruction loss is usually unaware of the anomaly detection task and may produce uninterpretable results, e.g., a larger reconstruction error for normal pixels. In this paper, we address this problem from two aspects. First, we introduce a novel interpretability-aware loss to the AE. In particular, we discourage attention over a large region, where the attention is taken from an explainable anomaly heatmap. Second, we replace the reconstruction error used in decision-making with the heatmap for each image, which can be used for obtaining a localization map for interpreting the results. Specifically, the attention map is obtained by backpropagating the difference between averaged normal images and the candidate image, which is then processed to produce the anomaly score.
We summarize our contributions as follows.
• We introduce a novel interpretability-aware loss term for CNNs, which can be flexibly used in various models and produces interpretable anomaly detection results.
• We use this loss term to improve AE for anomaly detection. Accordingly, we further propose novel anomaly scores that are derived based on an explainable heatmap.
• The proposed anomaly score places greater emphasis on the defect area and can also detect multiple similar defects in a single image. This makes our model highly applicable and relevant for industrial practitioners.
• We conduct extensive experiments on industrial image datasets. The results, both quantitatively (in AUC scores) and qualitatively, show the effectiveness of our proposed model.

II. RELATED WORKS
We survey related works on unsupervised anomaly detection and interpretable CNNs in §II-A and §II-B, respectively.

A. UNSUPERVISED ANOMALY DETECTION
Unsupervised anomaly detection, also known as novelty detection, is a critical machine learning task used to identify anomalous samples by constructing a model based on normal samples only [30]. Among the existing approaches to unsupervised anomaly detection, the most related works can be categorized into two main categories: classification-based and reconstruction-based [31], [32], [33], [34]. Classification-based anomaly detection approaches aim to extract highly discriminative features from normal samples to identify anomalous samples [35], [36]. Recent examples of classification-based anomaly detection methods include OC-SVM [37], [38] and Deep SVDD [36].
On the other hand, the objective of the reconstruction-based approach is to reconstruct samples based on the extracted features, with the anticipation that anomalous samples will receive worse reconstruction results compared to normal samples based on the training information [28]. Compared to relatively early models such as K-means [39], recent reconstruction-based approaches adopt neural networks, especially autoencoders [40], [41], [42], variational autoencoders (VAEs) [43], [44], and GANs [30], [34]. Nevertheless, none of the above approaches considers an interpretability loss as our model does. When used in industrial contexts, these approaches can hardly produce meaningful interpretations.
One work that is specifically related to ours is [45], where the GradCAM attention map is integrated with a VAE model to visually explain the principle behind anomaly detection. However, their method requires the use of the special VAE architecture with latent space parametrization representing the mean and variance of the posterior. In contrast, our model relies solely on the reconstruction of an AE and contains a novel component in the loss function that incorporates the GradCAM output. This component is used for deriving the anomaly score. As a result, our approach allows us to obtain more interpretable results in anomaly detection, which is particularly useful in industrial applications.

B. INTERPRETING CONVOLUTIONAL NEURAL NETWORKS
The task of explaining CNNs has received considerable attention in recent years because it provides an understanding of the model's authenticity and enhances the reliability of its outcomes [46], [47]. Two commonly used general approaches to visual-attention-based CNN visualization are the response-based method and the gradient-based method [48], [49]. Response-based methods such as SAGAN [50], ABN [51], and Class Activation Mapping (CAM) [25] modify the original CNN architectures for auxiliary information but require specific CNN architectures. For instance, CAM implements the visualization of CNNs by modifying the model structures with a global average pooling layer. However, CAM has restrictions in that it requires a global average pooling layer to be applied to the convolutional feature maps. On the other hand, the gradient-based approach utilizes the gradients calculated through backpropagation. Similar to CAM, the Gradient-weighted Class Activation Mapping (Grad-CAM) [27] generates a weighted attention map for CNN models, but based on gradients computed through backpropagation. GradCAM can be implemented without any restrictions on CNN architectures. However, GradCAM has mostly been adopted only for validation and visualization purposes after training the CNN model. CNN interpretation has been found beneficial in various applications, such as 3D object recognition [52], diagnosis [53], human activity recognition [54], and metric learning [55]. In particular, CNN interpretation is crucial in industrial applications, which serves as a motivation for the current work. For instance, in [56], a visualized feature map is extracted using GradCAM to meet the requirements of process engineers. Similarly, GradCAM feature maps are also extracted for power equipment maintenance [57] and electromechanical system diagnosis [58]. Other visualization techniques have also been adopted. For instance, in [59], t-SNE is adopted to visualize the wafer defect maps.
Our method differs from the above works because we not only use GradCAM for interpretation but also incorporate it as part of the training process to improve our neural network model.

III. APPROACH
We review autoencoders and their losses in §III-A. We then introduce our novel interpretability-aware loss in §III-B and our attention map used for anomaly detection in §III-C.
to reconstruct the image, transforming $z^{(t)}$ into $\hat{x}^{(t)}$, which have the same dimensionality as and are similar to $\{x^{(t)}\}_{t=1}^{N}$. Here, $d$ represents the dimensionality of the latent variables $\{z^{(t)}\}_{t=1}^{N}$, while $c$, $h$, and $w$ respectively represent the number of channels, height, and width of the image. The parameters of $E$ and $D$ are learned by minimizing the reconstruction loss. Traditionally, the reconstruction error is computed using pixel-wise evaluation metrics, such as the $\ell_2$ loss, to generate an anomaly score map based on the discrepancy between the input image and its reconstruction. However, it has been shown in [27] that incorporating a structural similarity loss in autoencoder architectures enhances the model's ability to capture inter-dependencies between image regions. Consequently, this approach effectively identifies complex structural defects in images.
The Structural Similarity Index (SSIM) [60] is a method used to compare two images to determine their similarity. It compares local patterns of pixel intensities in two images, denoted as $x$ and $y$, based on three components: luminance $l(x, y)$, contrast $c(x, y)$, and structure $s(x, y)$. These components are defined by
$$l(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \tag{1}$$
$$c(x, y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \tag{2}$$
$$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}, \tag{3}$$
respectively, where $\mu_x$ denotes the average pixel luminance of the image $x$, $\sigma_x$ denotes the standard deviation of the pixel luminance of the image $x$, and $\sigma_{xy}$ denotes the covariance between the pixel luminances of $x$ and $y$. Here, the constants $C_1$, $C_2$, and $C_3$ are included to avoid zero denominators, with $C_3$ set to $C_2/2$. The SSIM index is then defined as a function of $l$, $c$, and $s$, or more specifically,
$$\mathrm{SSIM}(x, y) = l(x, y)\, c(x, y)\, s(x, y). \tag{4}$$

B. INTERPRETABILITY-AWARE LOSS
In this section, we introduce our Interpretability-Aware (IA) loss $L_{IA}$. Unlike the vanilla GradCAM [27], which backpropagates the CNN based on the score for a specific classification type, our proposed loss can be implemented for non-classification tasks. Specifically, we backpropagate the latent variable $z$ generated by the encoder (i.e., $z = E(x)$) until the gradient reaches a specified layer of the encoder. We remark that $z$ can be any feature in a CNN, allowing our proposed loss to be applied flexibly to various CNN architectures. In addition to AE, we will also demonstrate its application in a classifier in our experiments.
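Returning briefly to the SSIM index used in the reconstruction loss, the following minimal Python sketch evaluates the luminance, contrast, and structure terms over a whole image treated as a single window and multiplies them. The single-window simplification, the population (divide-by-n) statistics, the flat-list image representation, and the constants `k1 = 0.01` and `k2 = 0.03` (the conventional defaults associated with [60]) are assumptions made for illustration; practical implementations usually average SSIM over local sliding windows.

```python
import math


def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM of two equally sized grayscale images (flat lists).

    Illustrative sketch only: the whole image is one window, and k1, k2 are
    the conventional constants, not values taken from the paper.
    """
    n = len(x)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2.0  # C3 = C2 / 2, as stated in the text

    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)

    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    return luminance * contrast * structure


patch = [0.1, 0.5, 0.9, 0.3]
inverted = [1.0 - v for v in patch]
print(ssim_global(patch, patch))     # identical images
print(ssim_global(patch, inverted))  # structurally opposite images
```

An image compared with itself yields a value of one, while an intensity-inverted copy yields a negative structure term, which is why SSIM-based losses are sensitive to structural rather than purely pixel-wise discrepancies.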
To obtain the IA loss $L_{IA}$ for an input image $x$, encoded to a latent variable $z$, we backpropagate the gradient of each entry $z_i$ of $z$, $i = 1, \cdots, d$, with respect to the feature maps $A_j$ in the $j$-th layer of the encoder. This process generates the GradCAM attention map $M_{ij}$, which is obtained through a linear combination of the feature maps $A_j$ followed by a ReLU activation:
$$M_{ij} = \mathrm{ReLU}\Big(\sum_{k} \alpha_{ij}^{k} A_j^{k}\Big), \tag{5}$$
where $k$ is the channel index. In (5), $\alpha_{ij}^{k} \in \mathbb{R}$ is obtained by applying the global average pooling operation to the gradient of $z_i$ with respect to $A_j^{k}$:
$$\alpha_{ij}^{k} = \frac{1}{h_j w_j} \sum_{s=1}^{h_j} \sum_{t=1}^{w_j} \frac{\partial z_i}{\partial A_j^{k,st}}, \tag{6}$$
where $h_j \times w_j$ is the spatial size of $A_j^{k}$.
To ensure that the attention maps $M_{ij}$ generated from different layers are comparable, we conduct bilinear interpolation upsampling operations to bring them to the same size of 256 × 256. The IA loss $L_{IA}$ is derived from the upsampled attention maps. Specifically, we first calculate the mean of all the pixel values of the map $M_{ij}$ as follows:
$$\mu = \frac{1}{256^2} \sum_{s=1}^{256} \sum_{t=1}^{256} M_{ij}^{st}, \tag{7}$$
and then incorporate it into the regularization term as follows:
$$L_{IA} = \lambda \mu, \tag{8}$$
where $\lambda$ is the regularization coefficient of the IA loss. Note that due to the ReLU operation in (5), each pixel value $M_{ij}^{st}$ is non-negative. Consequently, $\mu$ can essentially be viewed as a LASSO [61] term, promoting sparsity in the attention map. This sparsity encourages the attention to focus on a small number of pixels, which is particularly important in industrial applications where defective regions in products are typically limited in size. It is worth noting that this approach differs from using an $\ell_2$-like loss, which is commonly employed to prevent overfitting.
As illustrated in Fig. 1, training the AE entails minimizing the sum of the SSIM loss, the MSE loss, and the IA loss. To distinguish our approach from other AEs, we refer to our model as IAAE. Once trained, the IAAE exhibits focused attention that is utilized in anomaly detection, as explained in the next section. Algorithm 1 summarizes the steps for training the IAAE.
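The steps in (5)-(8) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the feature maps and the gradients of a single latent entry are passed in as precomputed nested lists (in practice they would come from automatic differentiation), the bilinear upsampling to 256 × 256 is omitted, and the function names and the value of `lam` are hypothetical rather than taken from the paper.

```python
def gradcam_map(features, grads):
    """GradCAM-style attention map from K x H x W feature maps and gradients.

    `grads[k][s][t]` is assumed to hold the derivative of one latent entry
    with respect to `features[k][s][t]`.
    """
    num_channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    # Channel weights: global average pooling of each gradient map, as in (6).
    alphas = [sum(sum(row) for row in g) / (height * width) for g in grads]
    # Weighted sum of feature maps followed by ReLU, as in (5).
    return [
        [
            max(0.0, sum(alphas[k] * features[k][s][t] for k in range(num_channels)))
            for t in range(width)
        ]
        for s in range(height)
    ]


def ia_loss(attention_map, lam):
    """Mean of the (non-negative) attention map scaled by lam, as in (7)-(8)."""
    pixels = [v for row in attention_map for v in row]
    return lam * sum(pixels) / len(pixels)


# Toy example with two channels of size 2 x 2 (values are made up).
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 2.0], [0.0, 1.0]],
]
grads = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]
attention = gradcam_map(features, grads)
print(attention)
print(ia_loss(attention, lam=0.1))
```

Because every entry of the map is non-negative after the ReLU, its mean is proportional to its ℓ1 norm, so adding this term to the training objective penalizes attention that is spread over many pixels.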

C. GENERATING INTERPRETABILITY-AWARE ATTENTION MAP
For industrial products, the shooting conditions of their images may vary, making it inappropriate to retrieve a normal sample from the training stage and compare it directly with a test sample. To address this issue, we leverage an AE trained using an IA loss and utilize an Interpretability-Aware Attention Map (IAAM) for anomaly detection, as described below. It is important to note that IAAM differs from the GradCAM attention map discussed in the previous section. To be specific, IAAM focuses on the differences between a test sample and normal images and thus provides more suitable scores for anomaly detection. In contrast, GradCAM attention solely applies to the images themselves.
Algorithm 1 (recoverable steps only): for each batch $\{x^{(i)}\}_{i \in I}$ do; …; $L_{SSIM} = \mathrm{SSIM}(x^{(i)}, \hat{x}^{(i)})$, according to (4); …; compute $L_{IA}$ according to (5)-(8); …; end for.
The first step in constructing the IAAM is to obtain a difference map between an input image $y^*$ and the pixel-wise mean of all normal examples used in training, denoted as $\bar{x}$. Given that our model is applied to images of industrial products, the prior assumption is that in each anomalous image, the area of the defect is concentrated and often small compared to the entire product image. To locate these small, compact defects more accurately, we amplify the differences between $y^*$ and $\bar{x}$. The detailed difference map is defined as follows. For each $y^*$, let $i$ be an index for the pixels (ranging from 1 to $n = c \cdot h \cdot w$). We first calculate the mean of all the exponential differences
$$\mu^* = \frac{1}{n} \sum_{i=1}^{n} (\bar{x}_i - y_i^*)^p, \tag{9}$$
where the power $p$, applied to $\bar{x}_i - y_i^*$, is an energy measurement index, with a larger value of $p$ indicating a greater emphasis on the differences. Similarly, we calculate the standard deviation of all the exponential differences as follows:
$$\sigma^* = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \big((\bar{x}_i - y_i^*)^p - \mu^*\big)^2}. \tag{10}$$
At the end, we obtain the standardized image $\tilde{y}^*$ whose entries are given by
$$\tilde{y}_i^* = \frac{(\bar{x}_i - y_i^*)^p - \mu^*}{\sigma^*}. \tag{11}$$
Once $\tilde{y}^*$ is obtained, similarly to how we obtain the GradCAM attention map, we first encode it into a latent variable $z^*$, and then backpropagate the gradient of $z^*$ with respect to the feature maps $A_j^*$ in one of the encoder layers to generate the interpretability-aware attention map $M_j^*$. The procedure for obtaining the IAAM is summarized in Fig. 2.
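The standardization in (9)-(11) can be illustrated with a short Python sketch. It assumes images are given as flat lists of pixel values, uses the population standard deviation, and returns zeros when all powered differences are equal; the function name, the zero-variance handling, and the toy pixel values are illustrative assumptions rather than details from the paper.

```python
import math


def standardized_difference(mean_img, test_img, p=9):
    """Standardize the p-th powers of (mean image - test image), pixel by pixel."""
    powered = [(m - y) ** p for m, y in zip(mean_img, test_img)]
    n = len(powered)
    mu = sum(powered) / n                                       # mean, as in (9)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in powered) / n)  # std, as in (10)
    if sigma == 0.0:
        # Assumption: a test image with constant powered difference carries no signal.
        return [0.0] * n
    return [(v - mu) / sigma for v in powered]                  # entries, as in (11)


# Toy example: six pixels, one of which (index 4) deviates strongly from the mean image.
mean_img = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
test_img = [0.5, 0.52, 0.48, 0.5, 0.1, 0.51]
standardized = standardized_difference(mean_img, test_img, p=3)
print(standardized)
print(standardized.index(max(standardized)))
```

In this toy input the strongly deviating pixel dominates the standardized map, while the small fluctuations at the other pixels are pushed close to a common background value.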
The IAAM obtained through the above procedure can be used in anomaly detection tasks. Specifically, we use the sum of the pixel values in $M_j^*$ as the anomaly score, which is compared to a threshold. If the anomaly score is larger than the threshold, then $y^*$ is considered an anomalous sample. Algorithm 2 outlines the necessary steps for performing anomaly detection.
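The decision rule described above amounts to summing the attention map and comparing the result with a threshold. A minimal sketch follows, assuming the attention map is a nested list of non-negative values; the function names, the toy maps, and the threshold value are hypothetical, and in practice the threshold would need to be chosen for each product.

```python
def anomaly_score(attention_map):
    """Sum of all pixel values of an attention map given as a nested list."""
    return sum(v for row in attention_map for v in row)


def is_anomalous(attention_map, threshold):
    """Flag a sample when its anomaly score exceeds the chosen threshold."""
    return anomaly_score(attention_map) > threshold


# Toy attention maps (made-up values) and a hypothetical threshold.
quiet_map = [[0.0, 0.1], [0.0, 0.0]]
hot_map = [[0.0, 0.9], [0.8, 0.7]]
threshold = 1.0
print(is_anomalous(quiet_map, threshold))
print(is_anomalous(hot_map, threshold))
```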

IV. EXPERIMENT RESULTS
We validate the effectiveness of our IA loss by visualizing the results of a simple task in §IV-A. We compare our model with baseline methods and discuss its performance in §IV-B. All experiments reported in this paper were conducted on a GPU server with NVIDIA GeForce RTX 3090 GPUs (24G memory).

A. QUALITATIVE VALIDATION OF THE IA LOSS
We first qualitatively validate the application of the IA loss by visualizing the attention maps from models trained by minimizing a loss function with and without an additional IA loss, respectively. To this end, we first train a simple classifier using the above two alternatives and visualize the GradCAM attention maps. It is important to note that the GradCAM serves as an explanation of the classification results for practical users. By comparing the explanatory power of these attention maps, we can observe the benefits of our IA loss.
The dataset we use is the Casting Dataset [62], which consists of 7,348 grayscale images with dimensions of 300 × 300 pixels. The dataset primarily contains products from the casting manufacturing process. The training set consists of 2,875 normal images and 3,758 defect images, while the test set contains 262 normal images and 453 defect images. Defects in the dataset encompass various types, such as blow holes, pinholes, burr, shrinkage defects, mould material defects, pouring metal defects, metallurgical defects, and others. The inspection process for these products is typically carried out manually, which is time-consuming and subject to human error. Anomalies in this dataset typically manifest as small areas within the product images, and there may be multiple similar defects within the same image.
Our focus is on validating the effectiveness of our proposed IA loss. To achieve this, we train two classification models which distinguish normal and anomalous examples, using the same architecture but different losses. The first model utilizes a standard binary cross entropy (BCE) loss in addition to our proposed IA loss, while the second model only employs the BCE loss. The classifier is taken to be a ResNet-50 model, trained from scratch using labeled data from the Casting Dataset. Table 1 displays the hyperparameters utilized during the training process.
In Table 2, we present the results of defect product detection using the area under the receiver operating characteristic curve (AUC) as the evaluation metric. We compare the performance of models trained with and without the IA loss. We observe from the table that, although the model trained with the IA loss performs better than the model without the IA loss, both models achieve high AUC scores. This implies that practical users may find both models effective in detecting defective products. However, in order to understand why a product is considered defective, an interpretability method needs to be applied.
Next, we report the results obtained by observing the GradCAM attention maps for selected examples from the Casting Dataset. Fig. 3 presents the examples from the test data in the first row, followed by the GradCAM attention maps obtained from models trained with the IA loss in the second row and without the IA loss in the third row. From the visualization, we observe notable differences between the two sets of attention maps. In the case of the model without IA loss, the attention areas appear large and ambiguous, indicating that the model may struggle to accurately identify the correct reason for the detection. Consequently, the results from this model may be deemed untrustworthy, providing no guidance for improving the manufacturing processes. In contrast, the attention maps generated by the model with IA loss exhibit more focused and localized hot areas. Comparing the first and second rows, we can observe that the anomaly areas more closely correspond to human intuition. Additionally, the model is capable of identifying multiple defects within a single image. Evidently, our proposed IA loss assists the model in effectively focusing on the true defect areas, thereby enhancing its trustworthiness.
TABLE 3. Architecture of the AE used in the experiment. The index of layers refers to the convolution layer. After each convolution layer, a LeakyReLU activation function with a slope of 0.2 is applied. For convolution layers 1-6 and 11-16, batch normalization (BatchNorm) is applied to the output activations.

B. QUANTITATIVE RESULTS FOR ANOMALY DETECTION
1) EXPERIMENTAL SETUP
In this section, we evaluate the effectiveness of our proposed method by training an AE using Algorithm 1 and obtaining the IAAM for anomaly detection based on Algorithm 2. During the training stage, only normal samples are utilized, while a combination of normal and anomalous samples is used for testing.
For our experiments, we utilize the BeanTech Anomaly Detection (BTAD) Dataset [63], which is an industrial anomaly detection dataset with pixel-level annotations. This dataset consists of RGB images representing three different industrial products, with 400 training images for Product 1, 1,000 training images for Product 2, and 399 training images for Product 3.
To facilitate our experiments, we crop the images into patches of size 256 × 256. The precise architecture of the autoencoder network used in all experiments is provided in Table 3. We employ the Adam optimizer [64] for training, and the specific hyperparameters utilized during the training stage are presented in Table 4.

2) RESULTS
In our evaluation, we use the AUC metric to measure the performance of our proposed method and comparable methods, in line with previous works. Table 5 presents a comparison of the performance results for anomaly detection. We repeat each experiment three times under the same settings and record the mean and standard deviation of the results across the runs.
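As a reminder of how such numbers are obtained, the sketch below computes the image-level AUC from anomaly scores and binary labels using its rank interpretation (the probability that a randomly chosen anomalous sample scores higher than a randomly chosen normal one, with ties counted as one half), and then aggregates three runs. The scores and labels are toy values invented for illustration, not results from the paper, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def roc_auc(scores, labels):
    """AUC via pairwise comparison of anomalous (label 1) and normal (label 0) scores."""
    positives = [s for s, lab in zip(scores, labels) if lab == 1]
    negatives = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy labels and anomaly scores for three hypothetical runs.
labels = [0, 0, 1, 1, 0, 1]
runs = [
    [0.10, 0.40, 0.35, 0.80, 0.20, 0.90],
    [0.15, 0.30, 0.45, 0.70, 0.25, 0.95],
    [0.05, 0.50, 0.40, 0.85, 0.10, 0.60],
]
aucs = [roc_auc(scores, labels) for scores in runs]
print(aucs)
print(mean(aucs), stdev(aucs))
```

The pairwise formulation is quadratic in the number of samples and is used here only for clarity; library implementations typically rely on sorting instead.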
From the results, it is clear that our method outperforms the baselines for all three products. In particular, it is better than AE models trained without the IA loss. At the same time, it is consistently better than models not based on AE, including traditional models and GAN-based models.

3) ANALYSIS ON TWO IMPORTANT FEATURES
To further explore and validate the influence of different features on the model performance, we conduct a sensitivity analysis to compare the effects of alternative choices of hyperparameters. Specifically, there are two important features of our model: first, the layers towards which the GradCAM backpropagates in the training and testing processes, respectively; second, the energy measurement index, i.e., the exponential order p used in (9)-(11) for computing the IAAM. Next, we discuss the effect of changing these two features and report the corresponding numerical results. In addition to our primary focus on detection, we present the pixel-wise results in this section to provide a more comprehensive analysis.

a: GRADCAM USING BACKPROPAGATION TO DIFFERENT CONVOLUTIONAL LAYERS
The focus of GradCAM varies depending on the layer to which it backpropagates. Ablation studies conducted in [27] suggest that deeper convolutional layers tend to capture more high-level and abstract features of the image, while shallow layers tend to capture more local and basic features. In our context, the selection of the convolutional layer involves a tradeoff during training. Choosing a deep layer sacrifices resolution since deep layer features have smaller sizes and require upsampling before producing the GradCAM attention maps. On the other hand, choosing a shallow layer sacrifices explanatory power since the features contain fewer semantics. Therefore, we expect a layer in the middle to be most suitable for our task.
TABLE 5. Image-level AUC results for the BTAD Dataset. We report the results for all the individual products, as well as the mean of all three products.
TABLE 6. AUC scores for GradCAM using different convolutional layers on the BTAD Dataset. For each setting, there are two rows: the top row reports the pixel-wise score, and the bottom row reports the image-wise score.
To validate our choice, we compare alternative models by adjusting the layers used for GradCAM backpropagation during the training and testing stages, while keeping the other hyperparameters the same. We consider convolutional layers 0, 3, 6, 9, 12, 15, and 22. Table 6 displays the results for Product 1 of the BTAD Dataset. The quantitative comparison suggests that the choice of the convolutional layer does impact the model's performance, but it is not very sensitive, especially with respect to the layer used in the testing phase. Regarding the training phase, models trained on deeper convolutional layers generally perform better in terms of localization and classification tasks, indicating that deeper layers contain more useful semantics. However, using a very deep layer (Encoder.22) for training and extracting the attention map results in significantly worse performance due to the very low resolution of the attention map.
To ensure that our method is explainable during the testing phase, we visualize the performance of the alternative models in Fig. 4, while fixing the layer used during training to be Encoder.9. Testing on a shallower convolutional layer has the advantage of focusing on a smaller and concentrated area to depict the defects in the generated abnormal area. However, it may also mistakenly narrow down the estimated abnormal area when the defects are actually large. Therefore, to ensure that our model provides good interpretability when users examine the attention map, we choose the Encoder.9 convolutional layer for training and the Encoder.0 layer for visualization.

b: ENERGY MEASUREMENT INDEX
The energy measurement index p in (9)-(11) affects the concentration of the attention heat map generated by Grad-CAM during the testing phase. By increasing the exponential order p, the gradient becomes more polarized as it amplifies the already high gradients and increases the distance between these high gradients and the lower ones. Consequently, the GradCAM heat map exhibits a more concentrated hot area since it is derived from these gradients. This specific design aims to enhance the capability of IAAM in accurately identifying the defective part of abnormal industrial data. Furthermore, since the objective is to increase differentiation among pixels, we restrict p to odd numbers. The validation results for Product 1 of the BTAD Dataset using different values of p are presented in Table 7, and the corresponding heatmaps are visualized in Figure 5. It is evident that as the energy measurement index increases, both the pixel-level and image-level anomaly detection accuracy scores improve. This validates our choice of using a larger value of p = 9 for achieving better detection results and interpretability.
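The polarizing effect of the exponent can be seen in a small numerical sketch. It measures, for a few toy pixel differences, the share of the total magnitude held by the largest powered difference, and it also shows that an odd exponent keeps the sign of a difference whereas an even one discards it. The helper name `top_share` and the toy values are assumptions for illustration, and the sign-preservation reading of the odd-p restriction is one plausible interpretation rather than a statement from the paper.

```python
def top_share(diffs, p):
    """Fraction of the total powered magnitude held by the largest entry."""
    powered = [abs(d) ** p for d in diffs]
    return max(powered) / sum(powered)


# Toy pixel differences (mean image minus test image); values are made up.
diffs = [0.05, -0.1, 0.2, 0.4]
for p in (1, 3, 9):
    print(p, top_share(diffs, p))

# Odd exponents keep the sign of a difference, even exponents do not.
print((-0.1) ** 3 < 0, (-0.1) ** 2 > 0)
```

As p grows, the largest difference accounts for an increasing share of the total, which matches the more concentrated hot areas described above.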

V. CONCLUSION
In this paper, we have presented an interpretable deep learning-based algorithm for the detection of anomalies in industrial products. Our algorithm leverages the capabilities of neural networks for anomaly detection while ensuring model interpretability, making it suitable for industrial users who require actionable insights. The experimental results have demonstrated that our algorithm surpasses the performance of baseline anomaly detection methods in terms of accuracy and interpretability. Particularly, the attention maps generated by our algorithm offer valuable insights into its functioning and can be leveraged to enhance its performance.
While our proposed algorithm holds significant potential for various industrial applications such as quality control, product inspection, and defect prevention, we would like to acknowledge two potential limitations. Firstly, different types of data may necessitate the adjustment of hyperparameters, which should be considered alongside the selection of an appropriate threshold during practical implementation. Secondly, our model utilizes GradCAM attention maps twice, both during training and testing, which may introduce additional computational complexity.
In the future, our focus will be on enhancing the scalability of our algorithm to handle larger datasets with more complex anomalies. We will also extend our investigations to other types of data, including audio or sensor data, where interpretability is equally vital. Additionally, we will explore how our interpretability-aware algorithm can foster effective collaboration between humans and machines in industrial settings.

(Rui Jiang and Yijia Xue contributed equally to this work.)
RUI JIANG received the dual B.S. degree in data science from Duke Kunshan University and Duke University, in 2023. She is currently pursuing the master's degree in electrical and computer engineering with Duke University. Her research interests include leveraging machine learning and artificial intelligence techniques to enhance data analysis and facilitate more effective decision-making processes.
YIJIA XUE received the dual B.S. degree in data science from Duke Kunshan University and Duke University, in 2023. She is currently pursuing the master's degree in data science with Brown University. Her research interests include advancing the understanding and ethical implications of artificial intelligence. She is dedicated to exploring innovative approaches that promote transparency, interpretability, and fairness in machine learning models.
From 2017 to 2020, he was a Postdoctoral Researcher with the Institute for Mathematics and its Applications and the School of Mathematics, University of Minnesota, Twin Cities. He joined Duke Kunshan University, in 2020, where he is currently an Assistant Professor in data science with the Division of Natural and Applied Sciences. He is also affiliated with the Zu Chongzhi Center for Mathematics and Computational Sciences (CMCS) and the Data Science Research Center (DSRC). His research interests include the intersection of applied harmonic analysis, machine learning, and signal processing.