Finding Spurious Correlations with Function-Semantic Contrast Analysis

Bykov, Kirill; Kopf, Laura; Höhne, Marina M.-C.

doi:10.1007/978-3-031-44067-0_28

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1902))

Included in the following conference series:

World Conference on Explainable Artificial Intelligence

671 Accesses
1 Altmetric

The original version of this chapter was previously published non-open access. A Correction to this chapter is available at https://doi.org/10.1007/978-3-031-44067-0_33

Abstract

In the field of Computer Vision (CV), the degree to which two objects, e.g. two classes, share a common conceptual meaning, known as semantic similarity, is closely linked to the visual resemblance of their physical appearances in the data: entities with higher semantic similarity, typically exhibit greater visual resemblance than entities with lower semantic similarity. Deep Neural Networks (DNNs) employed for classification exploit this visual similarity, incorporating it into the network’s representations (e.g., neurons), resulting in the functional similarity between the learned representations of visually akin classes, often manifesting in correlated activation patterns. However, such functional similarities can also emerge from spurious correlations — undesired auxiliary features that are shared between classes, such as backgrounds or specific artifacts. In this work, we present the Function-Semantic Contrast Analysis (FSCA) method, which identifies potential unintended correlations between network representations by examining the contrast between the functional distance of representations and the knowledge-based semantic distance between the concepts these representations were trained to recognize. While natural discrepancy is expected, our results indicate that these differences often originate from harmful spurious correlations in the data. We validate our approach by examining the presence of spurious correlations in widely-used CV architectures, demonstrating that FSCA offers a scalable solution for discovering previously undiscovered biases, that reduces the need for human supervision and is applicable across various Image Classification problems.

You have full access to this open access chapter, Download conference paper PDF

Persistent Evidence of Local Image Properties in Generic ConvNets

How Well Do Vision Transformers (VTs) Transfer to the Non-natural Image Domain? An Empirical Study Involving Art Classification

Understanding Image Representations by Measuring Their Equivariance and Equivalence

Article Open access 18 May 2018

Keywords

1 Introduction

In recent years, Deep Learning has exhibited remarkable progress in addressing the diverse challenges in the field of Computer Vision (CV) [21], such as image classification [24, 79], object detection [77, 80], and semantic segmentation [39, 50]. This success can be largely attributed to the powerful hierarchical representations learned by DNNs, which are capable of capturing intricate patterns and features within visual data [7]. However, a prevailing concern remains the inherent obscurity of the learned representations within these models. As a consequence, DNNs are often referred to as “black-box” systems, since their internal mechanisms are not readily interpretable or comprehensible to human observers.

While being powerful in capturing intricate patterns within the data, DNNs are susceptible to learning spurious correlations—coincidental relationships, often driven by an unobserved confounding factor, which may lead the model to identify and rely on such misleading patterns [31, 68]. Model dependence on such artifactual features could lead to poor generalization performance of the model on different datasets and pose a substantial risk in the case of safety-critical areas. As such, the identification and subsequent mitigation of these spurious correlations within models are crucial for the development of robust and trustworthy Computer Vision systems.

In this work, we propose a new method called Function-Semantic Contrast Analysis (FSCA)^{Footnote 1}, which aims to identify spurious correlations between the neural representations (i.e. neurons) given that target concepts of each representation are known. The proposed approach is based on the idea of analyzing the contrast between two distinct metrics: the functional distance, which measures the relationships between representations based on the correlations in activation patterns, and the knowledge-based semantic distance between concepts, these representations were trained to encode. Hypothesizing that spurious correlations frequently arise between semantically distant classes due to the influence of an unobserved factor, FSCA analyzes the contrast between these two distance measures, ultimately identifying potentially spurious pairs with high disagreement between metrics. FSCA offers a scalable approach that considerably reduces dependence on human supervision, thereby providing a robust means for the comprehensive evaluation of “black-box” models.

2 Related Works

To address the problem of the opacity of Deep Neural Networks given their widespread popularity across various domains, the field of Explainable AI (XAI) has emerged [27, 36, 49, 61, 62]. The primary goal of XAI is to provide insights into the decision-making processes of complex AI systems, allowing humans to comprehend, trust, and effectively manage these systems [67]. One important class of explainability approaches, known as post-hoc explainability methods, seeks to explain the decision-making processes of trained models without interfering in their training procedure [38, 70]. These methods can be broadly classified into two types based on the scope of their analysis: local explanation methods, which focus on explaining model decisions for specific inputs, and global explanation methods, which aim to interpret general decision-making strategies, allowing for audits and investigations of models across diverse populations and shedding light on the roles of various model components.

Local explanation methods often provide explanations to the decision-making process on a given data point in the form of attribution maps, distributing relevance scores among features of the input, emphasizing the most critical attributes for the prediction. Various methods, such as Layer-wise Relevance Propagation (LRP) [3], GradCAM [64], LIME [58], Integrated Gradients [73], and SHAP [45] have been introduced, and have proven to be effective in explaining the decision-making process in Computer Vision models [15, 75], including Bayesian Neural Networks [13, 19]. To tackle the interpretability issue of attribution maps, several enhancing techniques were introduced, such as SmoothGrad [71], NoiseGrad [18], and Augmented GradCam [51]. Significant focus has been devoted to examining and assessing the effectiveness of local explanation techniques [30, 33, 34]. However, the primary limitation of local explanation methods lies in their ineffectiveness in probing the unknown behaviors of the models. While they prove beneficial for examining existing, known hypotheses, they are ineffective when it comes to uncovering unknown hypotheses, including the identification of unknown spurious correlations and shortcuts [1].

Conversely, global explanation methods aim to interpret the general decision-making strategies employed by the models by shedding light on the roles of specific components such as neurons, channels, or output logits, which are often referred to as representations. Such approaches enable a more general insight into the decision-making strategies of the models, thus facilitating the discovery of unknown and unexpected behavior within these models. Methods such as Network Dissection [4, 5], Compositional Explanations of Neurons [52], and MILAN [35] have been developed to explain the functionality of these latent representations by associating them with human-comprehensible concepts. Activation Maximization Methods [25], on the other hand, aim to explain the concept behind the model’s representations by identifying the inputs that maximally activate a particular neuron or layer in the network and hence, visualize the features that have been learned by the specific representation. These activation maximization images, also referred to as signals, embody the features that the representations have learned to detect and they could be either sampled from an existing dataset [11, 17] or generated artificially through an optimization procedure [53,54,55].

2.1 Spurious Correlations in Computer Vision Models

While excelling at various Computer Vision tasks by being able to learn complex and intricate representations of the data, Deep Neural Networks are susceptible to learning spurious correlations from data. Such correlations represent apparently related variables that, upon closer inspection, reveal a connection rooted in mere coincidence or an underlying, often obscured, factor [31, 68]. This phenomenon, commonly referred to as “Shortcut learning” or “Clever-Hans effect” [2, 42, 76], often manifesting in a strong contrast between desired and actual learning strategy within the model. In the following discussion, we provide a general overview of these correlations, which have been identified across a range of CV tasks.

Co-occurring Objects. In the field of image classification, as well as in other CV subdomains, images often contain not only the primary object of interest but also secondary objects in the background. Deep Neural Networks (DNNs) trained on such data can establish associations between the primary objects and frequently occurring secondary objects. Examples of such learned correlations could be fingers and band-aids [69], trains on tracks, or bees and flowers [40]. [66] observed that classifiers heavily depend on the context in which objects are situated, performing poorly in less common contexts, e.g., the absence of a typical co-occurring object. In experiments using the MS COCO dataset [44] it was discovered that classifiers for specific classes, like “Keyboard”, “Mouse”, and “Skateboard”, are highly sensitive to contextual objects and exhibit poor performance when encountered outside their usual context, such as for instance, keyboards often go unrecognized without a nearby monitor [66].

Object Backgrounds. A prevalent type of spurious correlation arises between background features and target labels [78]. For example, a classifier may rely on a snowy background to identify huskies in images, instead of focusing on the target feature “Husky” [58]. Such correlations could stem from selection bias in training datasets, as demonstrated by the Waterbirds dataset [60], where the target label (“Waterbird” or “Landbird”) is spuriously correlated with background features (water or land) in most training images [29]. Another example involves classifying cows and camels [6], where the target label (“Cow” or “Camel”) is spuriously correlated with background features (green fields or desert) in most images. [69] identified numerous instances of background spurious features in ImageNet.

Biases and Stereotypes. Racial and gender biases stand as notable examples of undesired behavior, leading to adverse real-world consequences, particularly for marginalized groups. These biases can materialize in various ways, such as underdiagnosis in chest radiographs among underrepresented populations [65], or racially discriminatory facial recognition systems that disproportionately misidentify darker-skinned females [16]. Researchers have found instances of racial, religious, and Americentric biases embedded in the representations of the CLIP model [57]. Generative models like Stable Diffusion [59] also exhibit biases that perpetuate harmful racial, ethnic, gendered, and class stereotypes [8]. Other harmful spurious correlations have been found, such as associations between skin tone and sports or gender and professions [72, 82].

Artifacts. Spurious correlations often arise from the presence of artifacts in images across various classes. These artifacts are secondary objects that hold no semantic connection with the primary class, and their coincidental association is unnatural and irregular. For instance, in the ImageNet dataset, Chinese watermarks have been found to influence numerous classes, such as “carton”, “monitor”, “broom”, “apron” and “safe” [20], resulting in up to 26.7% drop in performance when watermarks are added to every class in the validation dataset [43]. This phenomenon has also been observed in the PASCAL VOC 2007 dataset [26], where a photographer’s watermark frequently appears in images from the “horse” class [42]. Consequently, the trained model inadvertently learns this association, affecting its overall performance. Additionally, spurious correlations caused by artifacts have been noted in medical applications, such as skin lesion detectors, where artifacts like rulers and human-made ink markings or stains are present [10]. Similarly, hospital-specific metal tokens in chest X-ray scans [28, 81] and radiologist input artifacts in brain tumor MRI classification [76] have been found to impact the accuracy of these applications.

2.2 Finding and Suppressing Spurious Correlations

The primary challenge in identifying spurious correlations stems from the lack of a concrete definition or criteria that differentiate them from “permissible” correlations. This ambiguity is reflected in the majority of methods’ reliance on extensive human oversight. Spectral Relevance Analysis (SpRAy) was designed to aid in the identification of spurious correlations by clustering local attribution maps for future manual inspection [2, 42]. However, the dependence on local explanations restricts the range of spurious correlations identified to basic, spatially static artifacts. This limitation necessitates a significant amount of human supervision and tailoring of the method’s various hyperparameters to suit the specific problem at hand, which subsequently constrains the detection of unknown, unexpected correlations. The Data-Agnostic Representation Analysis (DORA) method approaches the problem of spurious correlations from a different angle, by analyzing relationships between internal representations [17]. The authors introduced the functional Extreme-Activation distance measure between representations, demonstrating that representations encoding undesired spurious concepts are often observed to be outliers in this distance metric.

A subsequent challenge is to revise or update the model after identifying spurious correlations. The Class Artifact Compensation framework was introduced, enabling the suppression of undesired behavior through either a fine-tuning process or a post-hoc approach by injecting additional layers [2]. An alternative method involves augmenting the training dataset after uncovering an artifact, so that the artifact is shared among all data points, rendering it an unusable feature for recognition by the model [43]. To suppress spurious behavior in transfer learning, a straightforward method was proposed to first identify representations that have learned spurious concepts, and then, during the fine-tuning phase, exclude these representations from the fine-tuning process [20].

2.3 Visual-Semantic Relationship in Computer Vision

In the field of Computer Vision, both visual and semantic similarities play crucial roles in the comprehension and interpretation of images and their underlying concepts. Visual similarity refers to the resemblance between two images based on their appearances, whereas semantic similarity denotes the extent of relatedness between the meanings or concepts associated with the images. A widely accepted definition of semantic similarity takes into account the taxonomical or hierarchical relationships between the concepts [32]. There is a general observation that semantic and visual similarities tend to be positively correlated, as an increase in semantic similarity between categories is typically accompanied by a rise in visual similarity [14, 23]. DNNs trained on Computer Vision tasks demonstrate the ability to indirectly learn class hierarchies [9].

3 FSCA: Function-Semantic Contrast Analysis

In this work, we propose a novel method called Function-Semantic Contrast Analysis (FSCA). This method allows to identify pairs of output representations that may possess spurious associations. FSCA capitalizes on the functional distance between representations, which can be calculated using the activations of representations on the given dataset, and the knowledge-based semantic distance between concepts, obtained from taxonomies or other knowledge databases. By examining the contrast between the two distance metrics, our primary focus lies in revealing pairs of representations that exhibit a high degree of functional similarity but whose underlying concepts are semantically very different, i.e., which are located in the first quadrant of Fig. 1. While disagreements between functional and semantic distances are often natural, as some concepts may share visual similarity while remaining semantically distinct [14], we observe that such behavior frequently results from undesired correlations present in the training data.

3.1 Method

Let us consider a neural network layer $\mathcal {F} = \{f_1, ..., f_k\}$, consisting of k distinct functions, $f_i(x): \mathbb {D} \rightarrow \mathbb {R}, \forall i \in {1, ..., k}$, referred to as neural representations, that are mappings from the data domain $\mathbb {D}$ to the activation of the i-th neuron in the layer. We further assume that the concepts associated with each representation are known, and define a set of concepts $\mathcal {C} = \{c_1, ..., c_k\}$, where $c_i$ denotes the concept underlying the representation $f_i(x), \forall i \in {1, ..., k}$. Thus, we can define the set $\mathcal {P} = \{(f_1, c_1), \dots , (f_k, c_k)\} \subset \mathcal {F}\times \mathcal {C}, $ as a collection of representation-concept pairs.

We consider that two distance metrics, $d_{\mathcal {F}}$ and $d_{\mathcal {C}}$, that are defined on the respective sets $\mathcal {F}$ and $\mathcal {C}$: $ d_{\mathcal {F}}: \mathcal {F} \times \mathcal {F} \rightarrow \mathbb {R}, \quad d_{\mathcal {C}}: \mathcal {C} \times \mathcal {C} \rightarrow \mathbb {R}, $ where $d_{\mathcal {F}}$ is measuring the functional distance between learned representations in the networks, and $d_{\mathcal {C}}$ measures the semantic distance between the concepts these representations are trained to encode. Accordingly, we define two $k \times k$ distance matrices, F and C, as follows:

$$\begin{aligned} F = \begin{bmatrix} d_{\mathcal {F}}(f_1, f_1) &{} \dots &{} d_{\mathcal {F}}(f_1, f_k) \\ \vdots &{} \ddots &{} \vdots \\ d_{\mathcal {F}}(f_k, f_1) &{} \dots &{} d_{\mathcal {F}}(f_k, f_k) \end{bmatrix}, \quad C = \begin{bmatrix} d_{\mathcal {C}}(c_1, c_1) &{} \dots &{} d_{\mathcal {C}}(c_1, c_k) \\ \vdots &{} \ddots &{} \vdots \\ d_{\mathcal {C}}(c_k, c_1) &{} \dots &{} d_{\mathcal {C}}(c_k, c_k) \end{bmatrix}. \end{aligned}$$

(1)

Given two neural representations $f_i, f_j \in \mathcal {F}$ with corresponding concepts $c_i, c_j \in C,$ one can assess the contrast between functional and semantic distance by comparing the values between $d_{\mathcal {F}}(f_i, f_j)$ and $d_{\mathcal {C}}(c_i, c_j).$ However, such an approach might not be optimal since functional and semantic distance measures can possess distinct scales. To overcome this challenge, we suggest a non-parametric approach, where the ranks of distances within their corresponding distributions are analyzed instead.

In the following, a collection of unique distances are collected from the upper triangular portion of the distance matrices, including the main diagonal and all elements above it:

$$\begin{aligned} F_{\varDelta } &= \left\{ d_{\mathcal {F}}(f_i, f_j) \mid \forall i \in \{1, \dots , k\}, \forall j \in \{i, \dots , k\}\right\} ,\end{aligned}$$

(2)

$$\begin{aligned} C_{\varDelta } &= \left\{ d_{\mathcal {C}}(c_i, c_j) \mid \forall i \in \{1, \dots , k\}, \forall j \in \{i, \dots , k\}\right\} . \end{aligned}$$

(3)

We define matrices $F^*, C^*$ as

$$\begin{aligned} F^* = \begin{bmatrix} d^*_{\mathcal {F}}(f_1, f_1) &{} \dots &{} d^*_{\mathcal {F}}(f_1, f_k) \\ \vdots &{} \ddots &{} \vdots \\ d^*_{\mathcal {F}}(f_k, f_1) &{} \dots &{} d^*_{\mathcal {F}}(f_k, f_k) \end{bmatrix}, \quad C^* = \begin{bmatrix} d^*_{\mathcal {C}}(c_1, c_1) &{} \dots &{} d^*_{\mathcal {C}}(c_1, c_k) \\ \vdots &{} \ddots &{} \vdots \\ d^*_{\mathcal {C}}(c_k, c_1) &{} \dots &{} d^*_{\mathcal {C}}(c_k, c_k) \end{bmatrix}, \end{aligned}$$

(4)

where $\forall i,j \in \{1, \dots , k\}$

$$\begin{aligned} d^*_{\mathcal {F}}(f_i, f_j) = \textrm{cdf}^{-1}_{F_{\varDelta }}\left( d_{\mathcal {F}}(f_i, f_j)\right) , \quad d^*_{\mathcal {C}}(c_i, c_j) = \textrm{cdf}^{-1}_{C_{\varDelta }}\left( d_{\mathcal {C}}(c_i, c_j)\right) , \end{aligned}$$

(5)

and $\textrm{cdf}^{-1}$ correspond to the inverse of the cumulative distribution function (percentile).

Finally, for every pair of neural representations we define the function-semantic contrast score based on the difference between the percentile of the functional distance, and the percentile from the semantic distance between corresponding concepts.

Definition 1

Given $\mathcal {P} = \{(f_1, c_1), \dots , (f_k, c_k)\} \subset \mathcal {F}\times \mathcal {C},$ as a collection of representation-concept pairs, corresponding to the outputs of a DNN and two metrics $d_{\mathcal {F}}, d_{\mathcal {C}}$ defined on $\mathcal {F}, \mathcal {C},$ respectively. Furthermore, let $F_{\varDelta }, C_{\varDelta }$ be a collection of unique distances among neural representations and concepts, respectively. For $p_i, p_j \in \mathcal {P}$ we define contrast score as

$$\begin{aligned} \textrm{fsc}\left( p_i, p_j\right) = \textrm{cdf}^{-1}_{C_{\varDelta }}\left( d_{\mathcal {C}}(c_i, c_j)\right) - \textrm{cdf}^{-1}_{F_{\varDelta }}\left( d_{\mathcal {F}}(f_i, f_j)\right) . \end{aligned}$$

(6)

Contrast scores range from −1 to 1, with high contrast scores indicating cases where representations display significant functional similarity, while the underlying concepts are semantically distinct. This particular type of function-semantic relationship is our primary focus and is illustrated in Fig. 1.

In practice, to detect spurious correlations within the output representations, each pair of representations is assigned a contrast score, and pairs are sorted in descending order. Pairs with the highest contrast scores highlight the discrepancy between the model’s perception and the human-defined semantic distance. Subsequently, each pair can be manually investigated by a human to determine the causal reason for such contrast.

3.2 Selecting a Distance Metric Between Representations

A crucial aspect of our proposed method’s performance lies in the choice of an appropriate distance metric for the comparison of the output representations, which must reflect the similarity in activation patterns between pairs of representations within the network. Consider the dataset $\mathcal {D} = \{x_1, \dots , x_N\} \subset \mathbb {D},$ consisting of N independent and identically distributed data points from the data distribution. For a layer $\mathcal {F}$ with k representations, we define vector $A_i = ({f_i(x_1), \dots , f_i(x_N)}) \subset \mathbb {R}^N, \forall i \in \{1, \dots , k\},$ which contains the activations of the i-th representation across the dataset. We assume that all vectors $A_i, \forall i \in \{1, \dots , k\}$ are standardized, with a sample mean of 0 and a standard deviation of 1.

Our approach permits flexibility in choosing the distance metric between representations. In this work, we utilize the Extreme-Activation (EA) distance metric, derived from the analysis of natural data [17]. Drawing inspiration from the study of Activation-Maximization signals (AMS), which are data points that maximally activate a given representation, the EA distance quantifies the extent to which two representations are activated by each other’s AMS. This provides insights into how the representations are influenced by the features present in the AMS.

To calculate the pair-wise Extreme-Activation distance, the dataset $\mathcal {D}$ is partitioned into n disjoint blocks, $D = \bigcup _{i = 1}^n D_t, D_t = \left\{ x_{td+1},..., x_{(t+1)d+1}\right\} , \forall t \in \{0, ..., n-1\}$, each of length d. Subsequently, for each representation $f_i \in \mathcal {F}, \forall i \in \{1, \dots , k\}$, we define a set of natural Activation-Maximization signals (n-AMS) as $S_i = \left\{ s^i_1,..., s^i_n\right\} ,$ where

$$\begin{aligned} s^i_t = \mathop {\mathrm {arg\,max}}\limits _{x\in D_t} f_i\left( x\right) , \forall t \in \{0, ..., n-1\}. \end{aligned}$$

(7)

For every two representations $f_i, f_j \in \mathcal {F}$, we define their pair-wise representation activation vectors (RAVs) $r_{ij}, r_{ji}$ as:

$$\begin{aligned} r_{ij} = \begin{pmatrix} \frac{1}{n}\sum _{t=1}^n f_i\left( s^i_t\right) \\ \frac{1}{n}\sum _{t=1}^n f_j\left( s^i_t\right) \end{pmatrix}, \quad r_{ji} = \begin{pmatrix} \frac{1}{n}\sum _{t=1}^n f_i\left( s^j_t\right) \\ \frac{1}{n}\sum _{t=1}^n f_j\left( s^j_t\right) \end{pmatrix}. \end{aligned}$$

(8)

Subsequently, we define the pair-wise Extreme-Activation distance between two representations as the cosine of the angle between their corresponding RAVs.

Definition 2 (Extreme-Activation distance)

Let $f_i, f_j$ be two neural representations, and $r_{ij}, r_{ji}$ be their pair-wise RAVs. We define a pair-wise Extreme-Activation distance as

$$\begin{aligned} d_\mathcal {F}\left( f_i, f_j\right) = \frac{1}{\sqrt{2}}\sqrt{1 - \cos \left( r_{ij}, r_{ji}\right) }, \end{aligned}$$

(9)

where $\cos (A, B)$ is the cosine of the angle between vectors A, B.

Extreme-Activation distance quantifies the activation of n-AMS between two representations, offering a valuable metric for examining the relationships among intricate non-linear functions [17]. In contrast to other metrics, such as Pearson correlation, the EA distance utilizes a small subset of n-AMS for each representation, enabling a straightforward visual inspection of Activation-Maximization signals. This metric, grounded in the measure of how two representations are co-activated on their most activating signals, allows practitioners to easily discern the shared visual features between two sets of n-AMS.

Figure 2 demonstrates the EA distance between two representations, corresponding to the “Snow leopard” and “Crossword puzzle” classes, derived from an ImageNet [22] pre-trained DenseNet161 network [37]. This figure enables an effortless assessment of the functional similarity between the two representations. We can observe that the RAVs are not perpendicular, implying a functional dependence between the representations. Moreover, a visual inspection of the n-AMS for both representations reveals a similar black-and-white texture pattern that both representations have learned to detect.

EA distance measure varies between 0 and 1. Low values correspond to small angles between RAVs, indicating that both representations are highly activated on each other’s AMS. Perpendicular RAVs, which represent cases where the representations are indifferent to each other’s AMS, yield a distance equal to $\frac{1}{\sqrt{2}} \approx 0.7071$. Higher EA distance signifies situations where the n-AMS of the representations negatively affect one another, meaning the AMS of one representation deactivates the other.

3.3 Selecting a Distance Metric Between Concepts

The choice of functional and semantic distances between representations and concepts, respectively, is critical. Semantic distance should encapsulate human-defined relationships, particularly ensuring that these distances do not rely on spurious or undesired correlations. Function-Semantic Contrast Analysis (FSCA) can utilize any concept metric, including expert-defined knowledge-based distance measures. For example, semantic distances can be derived from the WordNet database [48], which groups English words into synsets connected by semantic relationships.

In this work, we employ the Wu-Palmer (WUP) distance metric defined on the WordNet taxonomy database. The WUP distance is based on the depth of the least common subsumer (LCS), which is the most specific synset that is an ancestor of both input synsets [56]. The WUP distance computes relatedness by considering the depth of the LCS and the depths of the input synsets in the hierarchy.

Definition 3

Let $c_i, c_j \in \mathcal {C},$ be two concepts, and let $w_i, w_j$ be the corresponding synsets from the WordNet taxonomy database. The Wu-Palmer distance is defined as:

$$\begin{aligned} d_{\mathcal {C}}(c_i, c_j) = 1 - 2\frac{l(r, lcs(w_i, w_j))}{l(r, w_i) + l(r, w_j)}, \end{aligned}$$

where lcs(x, y) is the Least Common Subsumer [56] of two synsets x and y, r is the taxonomy root, and l(x, y) is the length of the shortest path between WordNet synsets x, y.

The WUP distance takes into account the specificity of the common ancestor, rendering it more robust to the structure of the WordNet hierarchy in comparison to other semantic distance metrics such as Shortest-Path distance or Leacock-Chodorow [17, 63]. Moreover, the Wu-Palmer distance offers a more fine-grained measure of relatedness. Figure 3 demonstrates the Wu-Palmer distance between 1000 ImageNet classes that share natural connections to WordNet synsets. The structure of the semantic distance matrix (center) aligns with the location of the primary groups of classes within the dataset, as illustrated in the figure to the right.

4 Experiments

This section provides a detailed examination of various implemented experiments. These include an evaluation of the performance of the FSCA method in light of the given ground truth. Furthermore, we explore the practical application of FSCA to the widely-employed DenseNet-161 model. Finally, we conduct a broad assessment of ImageNet-trained models, focusing on the relationship between performance and the functional similarities between representations.

4.1 Evaluation Given the Ground Truth

To evaluate the effectiveness and suitability of the proposed methodology, we investigated its capability to identify instances of representation pairs previously acknowledged to exhibit spurious correlations. This analysis utilized two ImageNet-trained models, specifically GoogLeNet [74] and DenseNet-161 [37], both previously reported to possess a significant proportion of output representations susceptible to watermark text detection [20].

Consider $\mathcal {P}_G, \mathcal {P}_D$ as collections of representation-concept pairs for 1000 output representations - essentially, the pre-softmax output logit representations from the two networks. For each of these models, employing a technique akin to that described in [20], we identified subsets $\mathcal {Z}_G \subset \mathcal {P}_G, \mathcal {Z}_D \subset \mathcal {P}_D$ of representation-concept pairs with high discriminatory capability (AUROC > 0.9) towards watermarked images, implying that such representations exhibit spurious correlations towards watermarked images and generally assign higher activations to the images, where the watermark is present. For GoogLeNet there were found $|\mathcal {Z}_G| = 21$ output representations, such as “carton”, “broom”, “apron” and others, while for DenseNet-161 there were found $|\mathcal {Z}_D| = 22$ high-discriminatory representations. We applied FSCA to both sets $\mathcal {P}_G$ and $\mathcal {P}_D$ using the functional Extreme-Activation distance, computed over $n = 10$ n-AMS with parameter $d = 5000$. For the semantic distance between concepts, we chose the Wu-Palmer distance, considering the inherent link between ImageNet concepts and the WordNet taxonomy. After calculating Function-Semantic Contrast (FSC) scores for each pair of representations, we compared these scores between two groups: those pairs known to be susceptible to spurious correlations and the rest. More specifically, we defined two sets:

$$\begin{aligned} \text {FSC}_G^{-} &= \left\{ \text {fsc}(p_i, p_j) \mid \forall p_i, p_j \in \mathcal {P}_G, i > j, p_j \in \mathcal {P}_G \setminus \mathcal {Z}_G \right\} ,\end{aligned}$$

(10)

$$\begin{aligned} \text {FSC}_G^{+} &= \left\{ \text {fsc}(p_i, p_j) \mid \forall p_i, p_j \in \mathcal {Z}_G, i > j\right\} , \end{aligned}$$

(11)

where $\text {FSC}_G^{-}$ denotes the set of FSC scores for representation-concept pairs from GoogLeNet, in which at least one representation was not identified as being susceptible to Chinese watermark detection. Conversely, $\text {FSC}_G^{+}$ represents the FSC scores for the representation-concept pairs where both representations were recognized to be susceptible to spurious correlations. We similarly defined sets $\text {FSC}_D^{-}$ and $\text {FSC}_D^{+}$ for the DenseNet-161 model. GoogLeNet and DenseNet-161, 210 and 231 pairs of representations were respectively flagged as exhibiting spurious correlation, among a total of 499500 pairs.

Figure 4 visually presents the differences between the $\text {FSC}^{-}$ and $\text {FSC}^{+}$ distributions for both models. For each model, the FSC scores for “watermark” pairs, defined as pairs of representations where both classes were identified as susceptible to watermark detection, are consistently higher than those for other representation pairs. This observation was further corroborated by the Mann-Whitney U test [46] under a standard significance level (0.05).

If we constrain the FSCA analysis solely to representation pairs exhibiting substantial functional similarity, specifically those falling within the top 2.5% ($d^*_{\mathcal {F}} \le 0.025$), the results for GoogLeNet indicate 8 spurious pairs (out of 210) among the top 1000 pairs with the highest FSC, 38 within the top 5000, and 52 within the top 10000. Implementing the same methodology with DenseNet-161 yields no spurious pairs (out of 231) within the top 1000, 32 pairs within the top 5000, and 42 within the top 10000. This infers that by focusing exclusively on representation pairs with high functional similarity, we can recover 25% (52 pairs out of 210) and 18% (42 pairs out of 231) of pairs displaying known spurious correlations, merely by scrutinizing 2% (10000 pairs) of the total representation pairs. Our results suggest that FSCA tends to allocate high FSC scores to pairs of representations known to be susceptible to spurious correlations, thereby lending further credibility to the proposed methodology. However, it’s important to note a limitation in this experiment: while we have a knowledge of spurious correlations due to the reliance on Chinese watermarks, we cannot ascertain potential correlations among other pairs.

4.2 Identifying Spurious Correlations in ImageNet Trained DenseNet-161

To demonstrate the potential utility and relevance of our proposed approach, we investigated in detail the results of the FSCA of the widely-used ImageNet-trained DenseNet-161 model. Hyperparameters for the analysis were kept the same as in the previous experiment, namely, we employed functional Extreme-Activation distance metric with $n = 10$, allowing us to analyze the co-activation of representations based on the 10 Activation-Maximization images, providing a straightforward method for interpreting the shared features that the representations are trained to recognize. Due to the impracticality of examining all pairs, our analysis focused solely on pairs with high functional similarity based on Extreme-Activation, specifically those within the top 1% of the smallest distances, and in total 1000 with the highest contrast scores were analyzed. We report several significant categories of correlations observed between the logit class representations of the DenseNet-161 model, found by the FSCA method.

Shared Visual Features. Since semantic distance offers a metric for evaluating the conceptual differences between entities, it is natural for some concepts, despite being semantically distinct, to share visual features with one another. Such relationships between representations could be considered natural to the image classification model.

Some of the most intriguing relationships we observed include the functional similarity between representations corresponding to the classes “geyser”, “steam locomotive”, “volcano”, and “cannon”, owing to the shared visual feature of smoke fumes. Representations for the classes “menu”, “website”, “envelope”, “book jacket”, and “packet” exhibit a high degree of functional relationship due to the shared textual feature, which could be considered a natural characteristic for such classes. Furthermore, representations for crossword “puzzle” and “snow leopard” share similar behavior in detecting black and white grid patterns (illustrated in Fig. 2), “waffle iron” and “manhole cover” representations display a high degree of similarity due to their ability to detect specific grid patterns, and “mailbox” and “birdhouse” logits demonstrate a strong degree of co-activation of each other’s n-AMS, resulting from the visual similarity of the objects. Several of the described relationships are illustrated in Fig. 5, by the pair-wise RAVs between their representations together with their n-AMS.

Co-occurring and Mislabeled Objects. This category refers to objects that frequently co-occur, allowing the network to learn associations between two objects, due to the constraints of the classification problem to assign one class per image. Examples of such relationships can be found in the representations of “cup”, “espresso maker”, “coffeepot”, and “teapot”, all reported to frequently co-occur in each other’s image backgrounds as secondary objects. Intriguing examples include the high similarity of “plate” and “dungeness crab” representations, as the n-AMS for the crab representation illustrates an already prepared crab on a plate, “cardoon” (flower) and “bee” representations, and “hay” and “harvester”. Additionally, we detected that functional similarity can be caused by misattribution of labels, such as between “tiger” and “tiger cat”, where we were able to determine that the latter class also contains images of tigers, even though the class description states that it is a specific breed of cats exhibiting textural patterns of black stripes, similar to tigers.

Object Backgrounds. FSCA analysis of the functionally most similar representations yielded several groups of representations, that exhibit functional similarity due to the shared background only. Such a conclusion could be derived from the fact that representations are significantly co-activated with each other’s n-AMS, while the only shared feature among them is the background.

Snow We can consider the snow background among the most interesting examples of such spurious correlations. This feature is shared between representations such as “snowmobile”, “ski”, and “shovel”.
Mountain The commonality of the mountain background is observed across representations including “alp”, “marmot”, “mountain bike”, and “mountain tent”, with the latter two possessing descriptive references to the background within their respective names.
Underwater The underwater background is shared between representations such as “snorkel”, “coral reef”, “scuba diver”, and “stingray”, which collectively share a bluish shade and describe natural marine environments.
Savannah The shared background of the savannah, characterized by golden or green grasslands, is observed across representations of animal species such as “zebra”, “impala”, “gazelle”, “prairie chicken”, and “bustard”.
Water The water background encompasses the view of the water surface, as well as the presence of animals or objects above the water, including “pier”, “speedboat”, “seashore”, and “killer whale”.

Artifacts. Among the reported pairs of representations yielded by FSCA, we were able to detect representations “safe”, “scale”, “apron”, “backpack”, “carton”, and “swab” that exhibited high functional similarities caused by the presence of Chinese watermarks in their n-AMS. This result is consistent with previous works that reported these classes as having a strong ability to differentiate between watermarked and non-watermarked images [20].

By employing FSCA we were able to identify the new unknown spurious correlation, manifesting in the dependence of several classes on the presence of young children in the image. A high functional similarity was reported between the “diaper” class, naturally containing a lot of young children in the images, and several other representations, including the “rocking chair” representation. Inspection of the training dataset revealed a significant amount of images of children (without diapers) sitting in a “rocking chair”. Since the ImageNet dataset does not have a specific class dedicated to children, this represents a latent factor that corresponds to the functional similarity of such classes.

Figure 6 illustrates the Extreme-Activation distance, alongside with the n-AMS for the representations “diaper” and “rocking chair”, and several examples from the ImageNet-2012 training dataset from the class “rocking chair”. This spurious correlation was unexpected and could be considered artifactual for this class. The fact that “rocking chair” employs the presence of children as additional evidence for prediction is demonstrated in Fig. 7, where the model’s prediction shifts towards the “rocking chair” class after adding an image of a child on top of the image of the chair. Furthermore, FSCA reported the following representations to have high functional similarity with the “diaper” representation: “crib”, “bassinet”^{Footnote 2}, “cradle”, “hamper”, “band-aid”, “bib”^{Footnote 3}, and “bath towel”.

Another intriguing and previously unknown spurious correlation that we identified involves the dependence of several classes on images of fishermen. This correlation was observed between the “reel” class and several fish classes, namely “coho” and “barracouta”. Figure 8 furnishes evidence that the relationship between the “reel” and “coho” representations is primarily based on the presence of fishermen, often paired with a specific water background. This is further underscored by the model’s prediction given an image of a fisherman - the model confidently assigns a fish label to the image, despite the absence of any fish in the picture. Although this correlation bears similarity to the previously reported correlation between the “tench” and the presence of human fingers [12], our findings show that representations like “coho” and “barracouta” display a broader dependency on the existence of a fisherman within the image. This is evidenced even in instances where human fingers are not visible in the image, as exemplified by the right-hand image in Fig. 8.

Summary. Our examination of the top 1000 pairs of representations, as ranked by function-semantic contrast scores, suggests that around half of the detected correlations might be explained as “unintended” correlations. These correlations can be linked to the frequent co-occurrence of objects (32%), dependencies on shared backgrounds (12.3%), or a shared unnatural factor (2.6%), as visualized in Fig. 9. Nevertheless, we recognize that such categorization might oversimplify the actual interconnections between representations. It is uncommon for a single specific factor to account for the functional similarities observed between neural representations.

4.3 Better Models Tend to Have Fewer Associations

The analysis of the DenseNet-161 models surfaced a variety of correlations, including those that might be deemed natural as well as those potentially regarded as undesired or even harmful. Subsequently, we were motivated to examine whether higher-performing models exhibited fewer correlations among their output representations. For this investigation, we gathered 78 different ImageNet classification models from the Torchvision library [47], with the weight parameters set to “IMAGENET1K_V1”. For each model, we computed the pairwise Extreme-Activation distance between output representations using the ImageNet-2012 validation dataset, leveraging parameters analogous to those in our preceding experiment. This process yielded 78 distance matrices $F_i \in \mathbb {R}^{1000 \times 1000}, i \in [1, 78].$ To quantify the degree of correlation between output representations within models, we calculated the Frobenius norm of the difference between the Extreme-Activation distance matrix $F_i$ and a matrix Q for each of the 78 models:

$$\begin{aligned} Q = \frac{1}{\sqrt{2}}\left( \mathbbm {1} - \mathbb {I}\right) , \end{aligned}$$

(12)

where $\mathbbm {1}$ is a $k \times k$ matrix with all elements equal to 1, $\mathbb {I}$ is the identity matrix, and $k = 1000$. Matrix Q is the distance matrix between representations in the ideal scenario of total disentanglement. Hence, the norm of the difference between $F_i$ and Q serves as an indicator of the interconnectivity of the representations.

Figure 10 illustrates the correlation between the extent to which the models’ representations are correlated (top graph, y-axis) and their Top-5 performance on ImageNet (top graph, x-axis). Our observations indicate that models delivering superior performance achieve a lower norm, suggesting that enhanced performance aligns with better disentanglement and reduced correlation among output layer representations. The bottom graph in the same figure provides a visual representation of this, displaying distance matrices calculated across various networks.

5 Discussion and Conclusion

In the present work, we introduce a new technique, Function-Semantic Contrast Analysis (FSCA), designed to uncover spurious correlations between representations, when target concepts are known. FSCA reduces human supervision by systematically scoring and ranking representation pairs based on the function-semantic contrast. We have demonstrated the feasibility of our approach by uncovering several potentially unrecognized class correlations as well as rediscovering known correlations.

The primary limitation of our method relates to its reliance on a semantic metric that, despite broadly reflecting visual similarity between objects, isn’t entirely accurate in assessing visual similarity between two concepts. We aim to research alternative semantic metrics, including expert-defined ones, that take visual similarity into account in our future work. Another challenge is the undefined nature of spurious correlations, necessitating human oversight to discern whether a correlation is harmful. Nevertheless, our study found that analyzing 1000 representation pairs from the DenseNet-161 model only required around 3 human hours, uncovering previously undetected artifacts, and hence, the demand for human supervision is significantly reduced by FSCA.

While we have demonstrated the applicability of FSCA on ImageNet-trained networks, this approach is scalable in terms of its application to other image classification problems. Since WordNet encompasses a broad range of synsets, it is often quite simple to connect classes and concepts, as shown in the example of CIFAR-100 [17, 41]. Moreover, semantic distance can be measured using other knowledge-based datasets or by relying on expert assessments.

As Deep Learning approaches are becoming more popular in various disciplines, it becomes increasingly imperative to audit these models for potential biases, ensuring the cultivation of fair and responsible machine learning frameworks. Our presented FSCA method offers a scalable solution for practitioners seeking to explain the often opaque and enigmatic behavior of these learning machines. By doing so, we contribute to a more transparent and ethically-grounded understanding of complex deep learning systems, promoting responsible and trustworthy AI applications across various domains.

Change history

14 February 2024
A correction has been published.

Notes

1.
The code for FSCA can be accessed via the following GitHub link: https://github.com/lapalap/FSCA.
2.
A basket (usually hooded) used as a baby’s bed.
3.
Top part of an apron; covering the chest.

References

Adebayo, J., Muelly, M., Abelson, H., Kim, B.: Post hoc explanations may be ineffective for detecting unknown spurious correlation. In: International Conference on Learning Representations (2022)
Google Scholar
Anders, C.J., Weber, L., Neumann, D., Samek, W., Müller, K.R., Lapuschkin, S.: Finding and removing clever hans: using explanation methods to debug and improve deep model. Inf. Fusion 77, 261–295 (2022)
Article Google Scholar
Bach, S., Binder, A., Montovon, G., Klauschen, F., Müller, K.R., Samek, W.: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10(7), e0130140 (2015)
Google Scholar
Bau, D., Zhou, B., Khosla, A., Oliva, A., Torralba, A.: Network dissection: quantifying interpretability of deep visual representations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6541–6549 (2017)
Google Scholar
Bau, D., et al.: GAN dissection: visualizing and understanding generative adversarial networks. arXiv preprint arXiv:1811.10597 (2018)
Beery, S., Van Horn, G., Perona, P.: Recognition in terra incognita. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 456–473 (2018)
Google Scholar
Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)
Article Google Scholar
Bianchi, F., et al.: Easily accessible text-to-image generation amplifies demographic stereotypes at large scale (2022)
Google Scholar
Bilal, A., Jourabloo, A., Ye, M., Liu, X., Ren, L.: Do convolutional neural networks learn class hierarchy? IEEE Trans. Visual Comput. Graphics 24(1), 152–162 (2017)
Article Google Scholar
Bissoto, A., Valle, E., Avila, S.: Debiasing skin lesion datasets and models? Not so fast. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 740–741 (2020)
Google Scholar
Borowski, J., et al.: Natural images are more informative for interpreting CNN activations than state-of-the-art synthetic feature visualizations. In: NeurIPS 2020 Workshop SVRHM (2020)
Google Scholar
Brendel, W., Bethge, M.: Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet (2019)
Google Scholar
Brown, K.E., Talbert, D.A.: Using explainable AI to measure feature contribution to uncertainty. In: The International FLAIRS Conference Proceedings, vol. 35 (2022)
Google Scholar
Brust, C.-A., Denzler, J.: Not just a matter of semantics: the relationship between visual and semantic similarity. In: Fink, G.A., Frintrop, S., Jiang, X. (eds.) DAGM GCPR 2019. LNCS, vol. 11824, pp. 414–427. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33676-9_29
Chapter Google Scholar
Buhrmester, V., Münch, D., Arens, M.: Analysis of explainers of black box deep neural networks for computer vision: a survey. Mach. Learn. Knowl. Extract. 3(4), 966–989 (2021)
Article Google Scholar
Buolamwini, J., Gebru, T.: Gender shades: intersectional accuracy disparities in commercial gender classification. In: Proceedings of the 1st Conference on Fairness, Accountability and Transparency, pp. 77–91. PMLR (2018)
Google Scholar
Bykov, K., Deb, M., Grinwald, D., Müller, K.R., Höhne, M.M.C.: DORA: exploring outlier representations in deep neural networks. arXiv preprint arXiv:2206.04530 (2022)
Bykov, K., Hedström, A., Nakajima, S., Höhne, M.M.C.: NoiseGrad-enhancing explanations by introducing stochasticity to model weights. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 6132–6140 (2022)
Google Scholar
Bykov, K., et al.: Explaining Bayesian neural networks. arXiv preprint arXiv:2108.10346 (2021)
Bykov, K., Müller, K.R., Höhne, M.M.C.: Mark my words: dangers of watermarked images in ImageNet (2023)
Google Scholar
Chai, J., Zeng, H., Li, A., Ngai, E.W.: Deep learning in computer vision: a critical review of emerging techniques and application scenarios. Mach. Learn. Appl. 6, 100134 (2021)
Google Scholar
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Google Scholar
Deselaers, T., Ferrari, V.: Visual and semantic similarity in Imagenet. In: CVPR 2011, pp. 1777–1784. IEEE (2011)
Google Scholar
Dosovitskiy, A., et al.: An image is worth 16$\times $16 words: transformers for image recognition at scale. In: ICLR (2021)
Google Scholar
Erhan, D., Bengio, Y., Courville, A., Vincent, P.: Visualizing higher-layer features of a deep network. Technical report, Univeristé de Montréal (2009)
Google Scholar
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes challenge 2007 (VOC2007) results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html
Gade, K., Geyik, S.C., Kenthapadi, K., Mithal, V., Taly, A.: Explainable AI in industry. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, pp. 3203–3204. Association for Computing Machinery, New York (2019)
Google Scholar
Gautam, S., Höhne, M.M.C., Hansen, S., Jenssen, R., Kampffmeyer, M.: Demonstrating the risk of imbalanced datasets in chest X-ray image-based diagnostics by prototypical relevance propagation. In: 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), pp. 1–5. IEEE (2022)
Google Scholar
Ghosal, S.S., Ming, Y., Li, Y.: Are vision transformers robust to spurious correlations? (2022)
Google Scholar
Guidotti, R.: Evaluating local explanation methods on ground truth. Artif. Intell. 291, 103428 (2021)
Article MathSciNet Google Scholar
Haig, B.D.: What Is a spurious correlation? Underst. Stat.: Stat. Issues Psycho. Educ. Soc. Sci. 2(2), 125–132 (2003)
Article Google Scholar
Harispe, S., Ranwez, S., Janaqi, S., Montmain, J.: Semantic similarity from natural language and ontology analysis. Synthesis Lect. Hum. Lang. Technol. 8(1), 1–254 (2015)
Article Google Scholar
Hedström, A., Bommer, P., Wickstrøm, K.K., Samek, W., Lapuschkin, S., Höhne, M.M.C.: The meta-evaluation problem in explainable AI: identifying reliable estimators with MetaQuantus. arXiv preprint arXiv:2302.07265 (2023)
Hedström, A., et al.: Quantus: an explainable AI toolkit for responsible evaluation of neural network explanations and beyond. arXiv preprint arXiv:2202.06861 (2022)
Hernandez, E., Schwettmann, S., Bau, D., Bagashvili, T., Torralba, A., Andreas, J.: Natural language descriptions of deep visual features. In: International Conference on Learning Representations (2021)
Google Scholar
Holzinger, A., Saranti, A., Molnar, C., Biecek, P., Samek, W.: Explainable AI methods - a brief overview. In: Holzinger, A., Goebel, R., Fong, R., Moon, T., Müller, K.R., Samek, W. (eds.) xxAI 2020. LNCS, vol. 13200, pp. 13–38. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-04083-2_2
Chapter Google Scholar
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
Google Scholar
Kenny, E.M., Ford, C., Quinn, M., Keane, M.T.: Explaining black-box classifiers using post-hoc explanations-by-example: the effect of explanations and error-rates in XAI user studies. Artif. Intell. 294, 103459 (2021)
Article MathSciNet Google Scholar
Kirillov, A., et al.: Segment anything. arXiv preprint arXiv:2304.02643 (2023)
Kolesnikov, A., Lampert, C.H.: Improving weakly-supervised object localization by micro-annotation (2016)
Google Scholar
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images. University of Toronto (2009)
Google Scholar
Lapuschkin, S., Wäldchen, S., Binder, A., Montavon, G., Samek, W., Müller, K.R.: Unmasking Clever Hans predictors and assessing what machines really learn. Nat. Commun. 10(1), 1096 (2019)
Article Google Scholar
Li, Z., et al.: A Whac-a-mole dilemma: shortcuts come in multiples where mitigating one amplifies others. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 20071–20082 (2023)
Google Scholar
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Chapter Google Scholar
Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Google Scholar
Mann, H.B., Whitney, D.R.: On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 50–60 (1947)
Google Scholar
Marcel, S., Rodriguez, Y.: Torchvision the machine-vision package of torch. In: Proceedings of the 18th ACM International Conference on Multimedia, pp. 1485–1488 (2010)
Google Scholar
Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
Article Google Scholar
Minh, D., Wang, H.X., Li, Y.F., Nguyen, T.N.: Explainable artificial intelligence: a comprehensive review. Artif. Intell. Rev. 1–66 (2022)
Google Scholar
Mo, Y., Wu, Y., Yang, X., Liu, F., Liao, Y.: Review the state-of-the-art technologies of semantic segmentation based on deep learning. Neurocomputing 493, 626–646 (2022)
Article Google Scholar
Morbidelli, P., Carrera, D., Rossi, B., Fragneto, P., Boracchi, G.: Augmented Grad-CAM: heat-maps super resolution through augmentation. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4067–4071. IEEE (2020)
Google Scholar
Mu, J., Andreas, J.: Compositional explanations of neurons. In: Advances in Neural Information Processing Systems, vol. 33, pp. 17153–17163 (2020)
Google Scholar
Nguyen, A., Yosinski, J., Clune, J.: Understanding neural networks via feature visualization: a survey. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 55–76. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28954-6_4
Chapter Google Scholar
Nguyen, A.M., Yosinski, J., Clune, J.: Innovation engines: automated creativity and improved stochastic optimization via deep learning. In: Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, pp. 959–966 (2015)
Google Scholar
Olah, C., Mordvintsev, A., Schubert, L.: Feature visualization. Distill 2(11), e7 (2017)
Article Google Scholar
Pedersen, T., Patwardhan, S., Michelizzi, J., et al.: WordNet::similarity-measuring the relatedness of concepts. In: AAAI, vol. 4, pp. 25–29 (2004)
Google Scholar
Radford, A., et al.: Learning transferable visual models from natural language supervision. In: Proceedings of the 38th International Conference on Machine Learning, pp. 8748–8763. PMLR (2021)
Google Scholar
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should i trust you?": explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (2016)
Google Scholar
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695 (2022)
Google Scholar
Sagawa, S., Koh, P.W., Hashimoto, T.B., Liang, P.: Distributionally robust neural networks for group shifts: on the importance of regularization for worst-case generalization (2020)
Google Scholar
Samek, W., Montavon, G., Lapuschkin, S., Anders, C.J., Müller, K.R.: Explaining deep neural networks and beyond: a review of methods and applications. Proc. IEEE 109(3), 247–278 (2021)
Article Google Scholar
Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.R.: Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, vol. 11700. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28954-6
Book Google Scholar
Scriver, A.: Semantic distance in WordNet: a simplified and improved measure of semantic relatedness. Master’s thesis, University of Waterloo (2006)
Google Scholar
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vision 128(2), 336–359 (2019)
Article Google Scholar
Seyyed-Kalantari, L., Zhang, H., McDermott, M.B.A., Chen, I.Y., Ghassemi, M.: Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nat. Med. 27(12), 2176–2182 (2021)
Article Google Scholar
Shetty, R., Schiele, B., Fritz, M.: Not using the car to see the sidewalk - quantifying and controlling the effects of context in classification and segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8218–8226 (2019)
Google Scholar
Shin, D.: The effects of explainability and causability on perception, trust, and acceptance: implications for explainable AI. Int. J. Hum Comput Stud. 146, 102551 (2021)
Article Google Scholar
Simon, H.A.: Spurious correlation: a causal interpretation. J. Am. Stat. Assoc. 49(267), 467–479 (1954)
Google Scholar
Singla, S., Feizi, S.: Salient ImageNet: how to discover spurious features in deep learning? In: International Conference on Learning Representations (2022)
Google Scholar
Slack, D., Hilgard, A., Singh, S., Lakkaraju, H.: Reliable post hoc explanations: modeling uncertainty in explainability. In: Advances in Neural Information Processing Systems, vol. 34, pp. 9391–9404 (2021)
Google Scholar
Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825 (2017)
Stock, P., Cisse, M.: ConvNets and ImageNet beyond accuracy: understanding mistakes and uncovering biases. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 498–512 (2018)
Google Scholar
Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: International Conference on Machine Learning, pp. 3319–3328. PMLR (2017)
Google Scholar
Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
Google Scholar
Tjoa, E., Guan, C.: A survey on explainable artificial intelligence (XAI): toward medical XAI. IEEE Trans. Neural Netw. Learn. Syst. 32(11), 4793–4813 (2020)
Article Google Scholar
Wallis, D., Buvat, I.: Clever Hans effect found in a widely used brain tumour MRI dataset. Med. Image Anal. 77, 102368 (2022)
Article Google Scholar
Wu, X., Sahoo, D., Hoi, S.C.: Recent advances in deep learning for object detection. Neurocomputing 396, 39–64 (2020)
Article Google Scholar
Xiao, K., Engstrom, L., Ilyas, A., Madry, A.: Noise or signal: the role of image backgrounds in object recognition (2020)
Google Scholar
Yu, J., Wang, Z., Vasudevan, V., Yeung, L., Seyedhosseini, M., Wu, Y.: CoCa: contrastive captioners are image-text foundation models. arXiv preprint arXiv:2205.01917 (2022)
Zaidi, S.S.A., Ansari, M.S., Aslam, A., Kanwal, N., Asghar, M., Lee, B.: A survey of modern deep learning based object detection models. Digit. Signal Process. 103514 (2022)
Google Scholar
Zech, J.R., Badgeley, M.A., Liu, M., Costa, A.B., Titano, J.J., Oermann, E.K.: Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 15(11), e1002683 (2018)
Article Google Scholar
Zhao, J., Wang, T., Yatskar, M., Ordonez, V., Chang, K.W.: Men also like shopping: reducing gender bias amplification using corpus-level constraints. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2979–2989. Association for Computational Linguistics, Copenhagen (2017)
Google Scholar

Download references

Acknowledgements

This work was partly funded by the German Ministry for Education and Research through the project Explaining 4.0 (ref. 01IS200551).

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
Kirill Bykov
Understandable Machine Intelligence Lab, Leibniz Institute for Agricultural Engineering and Bioeconomy, 14469, Potsdam, Germany
Kirill Bykov, Laura Kopf & Marina M.-C. Höhne
Department of Computer Science, University of Potsdam, 14476, Potsdam, Germany
Laura Kopf & Marina M.-C. Höhne
BIFOLD – Berlin Institute for the Foundations of Learning and Data, 10587, Berlin, Germany
Kirill Bykov & Marina M.-C. Höhne
UiT the Arctic University of Norway, 9037, Tromsø, Norway
Marina M.-C. Höhne

Authors

Kirill Bykov
View author publications
You can also search for this author in PubMed Google Scholar
Laura Kopf
View author publications
You can also search for this author in PubMed Google Scholar
Marina M.-C. Höhne
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Kirill Bykov .

Editor information

Editors and Affiliations

Technological University Dublin, Dublin, Ireland
Luca Longo

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Reprints and permissions

Copyright information

About this paper

Cite this paper

Bykov, K., Kopf, L., Höhne, M.MC. (2023). Finding Spurious Correlations with Function-Semantic Contrast Analysis. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1902. Springer, Cham. https://doi.org/10.1007/978-3-031-44067-0_28

Download citation

DOI: https://doi.org/10.1007/978-3-031-44067-0_28
Published: 21 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44066-3
Online ISBN: 978-3-031-44067-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Finding Spurious Correlations with Function-Semantic Contrast Analysis