1 Introduction

Unsupervised domain adaptation (UDA) (Ben-David et al., 2010; Tzeng et al., 2014; Long et al., 2015) aims to leverage a fully-labeled source domain to aid the learning of an unlabeled target domain. Existing UDA methods mainly attempt to generate domain-invariant representations, either by reducing the distribution discrepancy across domains with distance metrics such as Maximum Mean Discrepancy (MMD) (Tzeng et al., 2014), or by adversarial learning (Ganin et al., 2016) between a feature generator and a domain discriminator. However, they commonly make the strong assumption that the source and target domains share the same label set, which limits their applicability in many real-world applications.

In real tasks, the label sets of the source and target domains are usually different. For example, with the emergence of Big Data (Sagiroglu & Sinanc, 2013), large-scale labeled datasets like ImageNet-1K (Russakovsky et al., 2015) and Google Open Images (Krasin et al., 2017) are readily available as source domains, while the target domain may contain only a subset of the source classes, leading to so-termed Partial Domain Adaptation (PDA) (Cao et al., 2018). On the other hand, in open-world learning scenarios, the target domain usually contains unknown classes not covered by the source domain, leading to Open-Set Domain Adaptation (OSDA) (Panareda Busto & Gall, 2017). The learning goal is then to classify target data from known classes correctly, while rejecting data from all unknown classes as “unknown”. Recently, Universal Domain Adaptation (UniDA) (You et al., 2019), a general setting without prior knowledge on the label sets across domains, has attracted increasing attention. UniDA is a more realistic UDA setting, since target ground-truth labels, and hence the target label set, are commonly unavailable in real tasks.

A main challenge posed by such settings is how to identify target samples from unshared or unknown classes. Previous methods mainly define a sample “confidence” measure together with a pre-defined threshold to detect target unknowns, and then align the distributions of the shared classes across domains. Commonly adopted confidence criteria include prediction entropy (Saito et al., 2020), source similarity (Panareda Busto & Gall, 2017), classifier discrepancy (Liang et al., 2021), and minimum inter-class distance (Saito & Saenko, 2021). Despite great progress, it remains hard to pre-specify a versatile “confidence” criterion and threshold that adapt to various complicated real tasks. Furthermore, the mis-prediction of unknowns incurs mis-alignment of features in the shared classes, potentially leading to negative transfer. To this end, we propose a new UniDA method with adaptive unknown authentication by classifier paradox (UACP). Specifically, UACP adopts the prediction paradox between two types of predictors to adaptively identify target unknowns, since samples with paradoxical predictions probably belong to none of the source classes.

Fig. 1

Illustration of adaptive unknown authentication by the prediction paradox between the MC predictor and the corresponding OVA predictor. If the predictions for sample x from the MC and OVA predictors are consistent, the predicted label (PL) is among the known source classes; otherwise, the sample is predicted as “unknown”

In UACP, a composite classifier is designed with two types of predictors, for classification and verification, respectively. The MC predictor classifies samples into one of the multiple source classes, and the corresponding binary OVA predictor further verifies whether a sample belongs to the class predicted by the MC predictor. Samples with paradoxical predictions are rejected as unknowns. An illustration is shown in Fig. 1. The sample with true label (TL) “tiger”, denoted by the purple star, is classified as “cat” by the MC predictor and thus declined by the “airplane” and “dog” classes. Meanwhile, the corresponding OVA predictor (cat vs. others) gives the paradoxical prediction that it belongs to “others” rather than “cat”, so it is predicted as an “unknown” sample not included in the source classes. In contrast, the sample with TL “cat”, denoted by the orange star, receives consistent predictions from the two predictors and is thus classified into the known “cat” class. In fact, the decision space of the MC predictor contains only source classes, while the decision space of the OVA predictors covers both source and unknown classes; hence they commonly give consistent predictions for known-class samples but paradoxical predictions for unknown-class samples. Moreover, different from previous feature alignment over shared classes, implicit domain alignment is conducted in the output space by a domain-invariant classifier. Specifically, features are generated for both domains such that the classifier correctly classifies source samples and captures the target structure as well. In this way, samples across domains share the same decision boundary, though with different feature distributions. The main contributions of this paper are summarized as follows:

  • We propose adaptive unknown authentication for UniDA by classifier paradox, such that target unknowns are adaptively identified by prediction paradox from two types of predictors.

  • We propose implicit domain alignment for UniDA in the output space, such that samples across domains share the same decision boundary despite their feature discrepancy.

  • Empirical comparisons with state-of-the-art methods validate the proposed UACP in both open-set and universal UDA settings.

2 Related works

In this section, we briefly review the related UDA methods, including closed-set UDA, OSDA, and UniDA methods in separate sub-sections.

2.1 Unsupervised domain adaptation

Closed-set UDA is the classical scenario in which the source and target domains have a distribution shift but consistent label sets. UDA approaches commonly attempt to reduce the distribution discrepancy to obtain domain-invariant features across domains. The two main categories are statistics-based methods and adversarial-based methods. Statistics-based UDA methods directly minimize a discrepancy metric across domains, such as MMD (Tzeng et al., 2014), multi-kernel MMD (Long et al., 2015), joint MMD (Long et al., 2017), and correlation alignment (Sun & Saenko, 2016). Adversarial-based methods maximize domain confusion via adversarial learning between a feature generator and a domain discriminator, or between different classifiers (Bousmalis et al., 2016; Ganin et al., 2016; Saito et al., 2018). Besides, some works utilize learning strategies from other fields, such as curriculum learning (Choi et al., 2019), co-training (Wu et al., 2019), self-training (Yang et al., 2021), and entropy regularization (Wu et al., 2021).

2.2 Open-set domain adaptation

In OSDA, the target domain contains novel categories that are not observed in the source domain. Existing OSDA methods tackle this problem by first separating target unknowns from known data, and then performing feature alignment on the shared classes across domains. Panareda Busto and Gall (2017) assign each target sample to one of the known or unknown classes based on its distance to the source centroids. Saito et al. (2018) adopt an adversarial framework for OSDA and manually set a pre-defined threshold to either align target samples with source data or reject them as unknowns. Pan et al. (2020) present a self-ensemble method that exploits category-agnostic clusters for unknown detection. Liu et al. (2019) employ a coarse-to-fine weighting mechanism to progressively identify target unknowns. Bucci et al. (2020) utilize rotation recognition with tailored adjustments to separate known and unknown target samples, and align the distributions of shared classes across domains.

2.3 Universal domain adaptation

In UniDA, both domains may contain unshared or private classes, while no prior information about the target label set is provided. You et al. (2019) quantify sample-level transferability to distinguish shared and private classes in each domain. Later, Saito et al. (2020) apply neighborhood clustering and entropy separation to encourage known target samples to stay close to source prototypes and away from unknown classes. Fu et al. (2020) present calibrated multiple uncertainty to detect open classes more accurately. Li et al. (2021) utilize the intrinsic structure of target samples and provide a unified framework to deal with different sub-cases of UniDA. Saito and Saenko (2021) adopt the minimum inter-class distance in the source domain as the threshold for identifying target unknowns. Some recent studies also consider UniDA in more complicated scenarios. Yu et al. (2021) adopt the divergence between two classifiers as the sample confidence in noisy UniDA. Liang et al. (2021) develop an informative consistency score based on two classifiers to help distinguish unknown samples in source-free UniDA.

Different from existing works (Yu et al., 2021; Liang et al., 2021) that adopt the discrepancy between two classifiers of the same structure to describe sample “confidence”, our proposed UACP exploits the prediction paradox between two different types of predictors (MC and OVA) to directly identify unknowns. Besides, OVANet (Saito & Saenko, 2021) also utilizes OVA classifiers for unknown identification in UniDA, but it uses them to seek the minimum source inter-class distance as the unknown threshold, whereas UACP directly adopts the prediction paradox between the MC and OVA predictors for adaptive unknown authentication.

3 Methodology

In this section, we first revisit the problem setting of UniDA and describe the network architecture of UACP. After that, we present the individual components of UACP in detail.

Fig. 2

Overall framework of UACP, which includes a shared feature extractor F for the source and target domains, and a composite classifier with an MC predictor and K binary OVA predictors. The MC and OVA predictors share the output-layer neurons, i.e., the first K neurons construct the MC predictor, while the k-th and \((K+k)\)-th neurons construct the OVA predictor for the k-th class

3.1 Problem setting and network architecture

In UniDA, we are given a labeled source domain \({\mathcal {D}}_s={\{({x}_i^s,{y}_i^s)\}}_{i=1}^{N_s}\) with \({N}_s\) labeled source samples, and an unlabeled target domain \({\mathcal {D}}_t={\{({x}_i^t)\}}_{i=1}^{N_t}\) with \({N}_t\) unlabeled target samples, where \({x}_i^s\) and \({x}_i^t\) denote the source and target samples, respectively. \({y}_i^s\in {\{1,\ldots ,K\}}\) is the class label for sample \({x}_i^s\), and K is the number of source classes. Define \(L_s\) and \(L_t\) as the label sets of the source and target domains, respectively. Then the class set shared across domains is denoted as \(L_s\cap L_t\), while \(L_s-L_t\) and \(L_t-L_s\) represent the source-private and target-private class sets, respectively. We mainly focus on the scenario with \({L_t-L_s}{\ne } {\emptyset }\), including both OSDA and UniDA, and the learning goal is to classify target samples into \(|{L_s}\cap {L_t}|+1\) classes, that is, to classify known target samples into the shared source classes, while recognizing all target samples from target-private classes as “unknown”.

The architecture of UACP is given in Fig. 2. It contains two components: (i) a feature extractor F, which outputs an \(\ell_2\)-normalized feature vector, and (ii) a composite classifier CC composed of \({2}\times {K}\) neurons. In CC, an MC predictor built on the first K neurons classifies target samples into one of the K source classes. K OVA predictors further verify the prediction of the MC predictor, where the OVA predictor for the k-th class is built on the k-th and \((K+k)\)-th neurons. In addition, a memory bank stores the current features of all target samples.

3.2 Composite classifier with MC and OVA predictors

In order to adaptively identify unknown target samples, a novel composite classifier with both MC and OVA predictors is designed in UACP. The MC predictor outputs a K-dimensional vector to classify samples into one of the source classes. In addition, there are K OVA predictors \(OVA=\{ova^1,ova^2,\ldots ,ova^K\}\) for verifying the prediction of the MC predictor, one for each source class. Let \(p_{mc}(x)=\sigma (MC(F(x)))\in {{\mathbb {R}}^K}\) denote the probability output vector of the MC predictor for sample x, where \(\sigma\) is the softmax function and each dimension \(p_{mc}^k(x)\) gives the probability of x belonging to the k-th class. \(p_{{ova}^k}(x)=\sigma ({ova}^k(F(x)))\in {{\mathbb {R}}^2}\) denotes the probability output of the k-th OVA predictor for x, in which \(p_{{ova}^k}^+(x)\) and \(p_{{ova}^k}^-(x)\) are the probabilities of x belonging to the k-th (positive) class and the rest (negative) classes, respectively.
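
For concreteness, the composite classifier can be sketched in PyTorch as a single linear head with \(2\times K\) neurons on top of the normalized features. This is a minimal sketch of our own, consistent with Fig. 2, and not the released implementation; names such as CompositeClassifier and the absence of a bias term are assumptions.

```python
import torch
import torch.nn.functional as F_nn

class CompositeClassifier(torch.nn.Module):
    """Sketch of the composite classifier CC: one linear layer with 2K neurons.

    Neurons 0..K-1 form the MC predictor; for class k, neurons k and K+k
    form the binary OVA predictor (positive vs. negative logits).
    """

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.K = num_classes
        self.fc = torch.nn.Linear(feat_dim, 2 * num_classes, bias=False)

    def forward(self, features: torch.Tensor):
        # features are assumed to be L2-normalized by the extractor F
        logits = self.fc(features)                           # (B, 2K)
        p_mc = F_nn.softmax(logits[:, : self.K], dim=1)      # MC probabilities over K classes
        # OVA probabilities: softmax over the (k, K+k) logit pair for every class k
        ova_logits = torch.stack(
            (logits[:, : self.K], logits[:, self.K :]), dim=2)   # (B, K, 2)
        p_ova = F_nn.softmax(ova_logits, dim=2)              # [..., 0] = positive, [..., 1] = negative
        return p_mc, p_ova
```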

For the MC predictor, the cross-entropy loss is minimized with source supervision,

$$\begin{aligned} {\mathcal {L}}_{CE}={\mathbb {E}}_{(x_i^s,y_i^s)\in {{\mathcal {D}}_s}}\ell _{ce}(p_{mc}(x_i^s),y_i^s) \end{aligned}$$
(1)

where \(\ell _{ce}\) is the standard cross-entropy loss.

Since an OVA predictor does not force samples to belong only to the source classes, UACP adopts it to further verify the prediction of the MC predictor. For each source class, an OVA predictor learns the decision boundary between the positive in-class category and the negative out-of-class category, where the negative category actually covers both the other source classes and unknown classes. Meanwhile, UACP learns discriminative features among categories by maximizing the distance between similar categories (Padhy et al., 2020; Saito & Saenko, 2021). Specifically, for each source sample \(x_i^s\), the discrepancy between \(ova^{y_i^s}\) and the OVA predictor of the closest negative class is maximized,

$$\begin{aligned} {\mathcal {L}}_{SOVA}={\mathbb {E}}_{(x_i^s,y_i^s)\in {{\mathcal {D}}_s}}\left\{ -log(p_{{ova}^{y_i^s}}^+(x_i^s))+\mathop {max}\limits _{j\ne {y_i^s}}[log(p_{{ova}^j}^+(x_i^s))]\right\} \end{aligned}$$
(2)

where j represents the closest negative class for \(x_i^s\). By minimizing the above loss, the OVA predictors not only separate in-class and out-of-class samples, but also distinguish similar classes under source supervision.
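
A minimal PyTorch sketch of Eq. (2) is given below, assuming the (B, K, 2) OVA probability tensor produced by the composite-classifier sketch above; the helper name source_ova_loss is our own.

```python
import torch

def source_ova_loss(p_ova: torch.Tensor, labels: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Sketch of Eq. (2): the OVA predictor of the true class is pushed towards
    "positive", while the hardest negative class (the one most confidently
    claiming the sample) is pushed towards "negative".

    p_ova : (B, K, 2) probabilities, [..., 0] positive, [..., 1] negative.
    labels: (B,) source class labels in {0, ..., K-1}.
    """
    batch = torch.arange(labels.size(0), device=labels.device)
    loss_pos = -torch.log(p_ova[batch, labels, 0] + eps)        # -log p^+_{ova^{y_i}}

    log_pos_all = torch.log(p_ova[..., 0] + eps)                # log p^+_{ova^j} for all j
    mask = torch.zeros_like(log_pos_all, dtype=torch.bool)
    mask[batch, labels] = True                                  # exclude the true class
    loss_neg = log_pos_all.masked_fill(mask, float('-inf')).max(dim=1).values

    return (loss_pos + loss_neg).mean()
```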

3.3 Prediction paradox for target unknown authentication

Since some target categories are absent from the source domain, it is difficult to make accurate predictions for target samples directly with a source classifier, especially for target unknowns. To this end, UACP adopts the classifier paradox for adaptive unknown authentication: the MC predictor classifies a sample into one of the multiple source classes, and the corresponding OVA predictor further verifies whether the sample belongs to the predicted class. If the OVA predictor affirms the prediction of the MC predictor, i.e., MC and OVA have consistent predictions, we are confident to assign the sample to a known source class. Otherwise, if the OVA predictor fails the verification, i.e., the MC and OVA predictors have paradoxical predictions, the sample tends to belong to an unknown class. Finally, for each sample \(x_i\), let \(k=argmax(p_{mc}(x_i))\) denote the class predicted by the MC predictor, then

$$\begin{aligned} x_i\in {\left\{ \begin{aligned} C _k,~~~~~~~~&if~p_{{ova}^{k}}^+(x_i)\ge p_{{ova}^{k}}^-(x_i)\\ C _{unknown},&if~p_{{ova}^{k}}^+(x_i)< p_{{ova}^{k}}^-(x_i)\\ \end{aligned} \right. } \end{aligned}$$
(3)

where \(C _k\) and \(C _{unknown}\) denote the k-th known class and unknown class, respectively.
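
The decision rule in Eq. (3) can be sketched as follows; encoding “unknown” as -1 and the helper name predict_with_paradox are our own conventions.

```python
import torch

def predict_with_paradox(p_mc: torch.Tensor, p_ova: torch.Tensor, unknown_label: int = -1):
    """Sketch of Eq. (3): assign the MC class k when the k-th OVA predictor
    confirms it (p^+ >= p^-); otherwise the paradox marks the sample "unknown"."""
    k = p_mc.argmax(dim=1)                                   # class predicted by MC
    idx = torch.arange(k.size(0), device=k.device)
    confirmed = p_ova[idx, k, 0] >= p_ova[idx, k, 1]         # does OVA agree with MC?
    return torch.where(confirmed, k, torch.full_like(k, unknown_label))
```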

Further, we adopt an entropy-strengthened loss over target samples for the MC predictor, in order to strengthen the consistency between the MC and OVA predictors and to encourage low-density separation of target samples. Specifically, for a known target sample whose MC prediction is affirmed by the OVA predictor, we further encourage a sharper probability distribution, i.e., a more confident prediction from the MC predictor; for an unknown sample with a prediction paradox, a more uniform distribution, i.e., a less confident prediction, is encouraged instead. Finally, for each target sample \(x_i^t\), the entropy-strengthened loss is expressed as,

$$\begin{aligned} {\mathcal {L}}_{ESL}(x_i^t)={\left\{ \begin{aligned} -p_{mc}(x_i^t)log(p_{mc}(x_i^t)),&\,if~p_{{ova}^{k}}^+(x_i^t)> p_{{ova}^{k}}^-(x_i^t)+m\\ p_{mc}(x_i^t)log(p_{mc}(x_i^t)),&\,if~p_{{ova}^{k}}^+(x_i^t)< p_{{ova}^{k}}^-(x_i^t)-m\\ 0,~~~~~~~~~~~~&otherwise~~~~~~~~~~~ \end{aligned} \right. } \end{aligned}$$
(4)

and

$$\begin{aligned} {\mathcal {L}}_{ESL}={\mathbb {E}}_{(x_i^t)\in {\mathcal {D}}_t}{\mathcal {L}}_{ESL}(x_i^t) \end{aligned}$$
(5)

where m is the margin for selecting confident known and unknown samples. Note that the constraint is applied only to confident target samples through m, in order to exclude incorrect predictions from the MC predictor. In particular, owing to the special design of the composite classifier, the MC and OVA predictors tend to give consistent predictions for known-class samples due to their partially-shared neurons, and they cooperate with each other during learning.
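
A hedged sketch of Eqs. (4)-(5) follows; the function name, the per-batch mean reduction, and the small constant for numerical stability are assumptions.

```python
import torch

def entropy_strengthened_loss(p_mc: torch.Tensor, p_ova: torch.Tensor,
                              margin: float = 0.4, eps: float = 1e-8) -> torch.Tensor:
    """Sketch of Eqs. (4)-(5): sharpen MC predictions confirmed by OVA,
    flatten MC predictions contradicted by OVA, and ignore uncertain samples."""
    entropy = -(p_mc * torch.log(p_mc + eps)).sum(dim=1)     # per-sample MC entropy
    k = p_mc.argmax(dim=1)
    idx = torch.arange(k.size(0), device=k.device)
    gap = p_ova[idx, k, 0] - p_ova[idx, k, 1]                # p^+ - p^- of the predicted class

    known = (gap > margin).float()       # confident known: minimize entropy
    unknown = (gap < -margin).float()    # confident unknown: maximize entropy
    per_sample = known * entropy - unknown * entropy
    return per_sample.mean()
```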

3.4 Domain-invariant classifier for implicit domain alignment

Previous UniDA methods commonly reduce the domain shift by feature alignment over shared classes across domains, where a mis-identification of target unknowns further incurs mis-alignment across domains. In UACP, implicit domain alignment is instead conducted directly in the output space. Specifically, a domain-invariant classifier is trained such that samples across domains share the same decision boundary, though they exhibit different feature distributions.

First, source supervision is adopted for learning both the feature extractor and the classifier in Sect. 3.2. Due to the lack of target ground-truth, we further propose to leverage self-supervised knowledge from the target data. The idea is that nearby samples should be close to each other in the feature space, so as to generate well-clustered features for the target data. A memory bank \(V=\{v_1,v_2,\ldots , v_{N_t}\}\) is utilized, where \(v_i\) is the stored feature vector for \(x_i\); it is updated with the mini-batch features in each iteration. The similarity between feature \(f_i=F(x_i)\) and stored feature \(v_j\) with \(i\ne {j}\) is calculated as,

$$\begin{aligned} p_{i,j}=\frac{exp(v_j^\top f_i/\tau )}{\sum _{r=1,r\ne {i}}^{N_t}exp(v_r^\top f_i/\tau )} \end{aligned}$$
(6)

where the temperature \(\tau\) determines the level of concentration (Hinton et al., 2015). \(p_{i,j}\) describes the probability that feature \(f_i\) is a neighbor of \(v_j\). To enforce samples to be close to their nearby neighbors, a self-supervised feature clustering loss is adopted over target samples as,

$$\begin{aligned} {\mathcal {L}}_{SFC}=-{\mathbb {E}}_{(x_i^t)\in {\mathcal {D}}_t}\sum _{j=1,j\ne {i}}^{N_t}p_{i,j}log(p_{i,j}) \end{aligned}$$
(7)

Minimizing the above loss minimizes the entropy of each target sample's similarity distribution over the other target samples; it thus gathers similar target samples into compact clusters and separates target samples from different clusters in the feature space.
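
Eqs. (6)-(7) can be sketched as follows, assuming an L2-normalized mini-batch of target features, a memory-bank tensor holding all target features, and the bank indices of the current batch; all names and the bank-update rule in the trailing comment are our assumptions.

```python
import torch

def sfc_loss(feats: torch.Tensor, bank: torch.Tensor, batch_idx: torch.Tensor,
             tau: float = 0.05, eps: float = 1e-8) -> torch.Tensor:
    """Sketch of Eqs. (6)-(7): entropy of each target feature's similarity
    distribution over the memory bank, with the sample itself excluded.

    feats     : (B, d) L2-normalized mini-batch target features f_i.
    bank      : (Nt, d) memory bank V of stored target features.
    batch_idx : (B,) indices of the mini-batch samples inside the bank.
    """
    sim = feats @ bank.t() / tau                                  # scaled similarities, (B, Nt)
    mask = torch.zeros_like(sim, dtype=torch.bool)
    mask[torch.arange(feats.size(0), device=feats.device), batch_idx] = True
    sim = sim.masked_fill(mask, float('-inf'))                    # exclude self-similarity (r != i)
    p = torch.softmax(sim, dim=1)                                 # p_{i,j} in Eq. (6)
    return -(p * torch.log(p + eps)).sum(dim=1).mean()            # Eq. (7)

# After the backward pass, the bank rows of the current batch can be refreshed,
# e.g. bank[batch_idx] = feats.detach()  (our assumption on the update rule).
```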

Further, the low-density separation for target data has already been enforced on the MC predictor in Eq. (5). It is also conducted on the OVA predictors to seek a domain-invariant classifier across domains. Specifically, entropy minimization (Saito et al., 2019) is performed over each OVA predictor by,

$$\begin{aligned} {\mathcal {L}}_{TOVA}=-{\mathbb {E}}_{(x_i^t)\in {\mathcal {D}}_t}\sum _{k=1}^{K}\{p_{{ova}^{k}}^+(x_i^t)log(p_{{ova}^{k}}^+(x_i^t))+p_{{ova}^{k}}^-(x_i^t)log(p_{{ova}^{k}}^-(x_i^t))\} \end{aligned}$$
(8)

It is minimized to increase the prediction confidence of the OVA predictors. In this way, the shared classes are implicitly aligned across domains, while the target unknowns are kept away from the known classes.
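
A corresponding sketch of Eq. (8), using the same (B, K, 2) probability layout as above, is:

```python
import torch

def target_ova_entropy(p_ova: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Sketch of Eq. (8): sum of the binary entropies of all K OVA predictors,
    averaged over target samples. Minimizing it sharpens every OVA decision."""
    ent = -(p_ova * torch.log(p_ova + eps)).sum(dim=2)   # (B, K) binary entropies
    return ent.sum(dim=1).mean()
```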

Algorithm 1 The training procedure of UACP

3.5 Overall training objective for UACP

The final learning objective of UACP can be formulated as,

$$\begin{aligned} \mathop {min}\limits _{\theta _F,\theta _{CC}}({\mathcal {L}}_{CE}+{\mathcal {L}}_{SOVA})+(\alpha \cdot {\mathcal {L}}_{SFC}+\beta \cdot {\mathcal {L}}_{TOVA}+\gamma \cdot {\mathcal {L}}_{ESL}) \end{aligned}$$
(9)

where \(\alpha\), \(\beta\), and \(\gamma\) control the trade-off among the components in Eq. (9). In each iteration, the memory bank is updated with a batch of target features, and the network parameters \(\theta _F\) and \(\theta _{CC}\) are updated. The algorithm description for UACP is given in Algorithm 1.
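
Putting the pieces together, one training iteration of Eq. (9) may look as follows. This builds on the loss sketches above; the batch sampling, update order, and function names are our assumptions rather than the authors' released code.

```python
import torch

def train_step(F_net, classifier, bank, optimizer,
               src_x, src_y, tgt_x, tgt_idx,
               alpha=0.05, beta=0.1, gamma=0.05, margin=0.4):
    """One UACP iteration assembling Eq. (9) from the sketches above."""
    f_s, f_t = F_net(src_x), F_net(tgt_x)                        # L2-normalized features
    p_mc_s, p_ova_s = classifier(f_s)
    p_mc_t, p_ova_t = classifier(f_t)

    batch = torch.arange(src_y.size(0), device=src_y.device)
    l_ce = -torch.log(p_mc_s[batch, src_y] + 1e-8).mean()        # Eq. (1)
    l_sova = source_ova_loss(p_ova_s, src_y)                     # Eq. (2)
    l_sfc = sfc_loss(f_t, bank, tgt_idx)                         # Eq. (7)
    l_tova = target_ova_entropy(p_ova_t)                         # Eq. (8)
    l_esl = entropy_strengthened_loss(p_mc_t, p_ova_t, margin)   # Eq. (5)

    loss = (l_ce + l_sova) + alpha * l_sfc + beta * l_tova + gamma * l_esl   # Eq. (9)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    bank[tgt_idx] = f_t.detach()       # refresh the memory bank with the current batch
    return loss.item()
```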

4 Experiments

We validate the proposed UACP mainly with two adaptation settings, open-set domain adaptation and universal domain adaptation.

4.1 Setup

4.1.1 Datasets

We perform experiments on four popular benchmark datasets in UDA: Office-31 (Saenko et al., 2010), Office-Home (Venkateswara et al., 2017), VisDA (Peng et al., 2017), and DomainNet (Peng et al., 2019). Office-31 consists of 31 categories in 3 domains, i.e., DSLR (D), Amazon (A), and Webcam (W), with 4652 images in total. Office-Home contains 15,500 images from 4 different domains, i.e., Artistic images (Ar), Clipart images (Cl), Product images (Pr), and Real-World images (Re). VisDA is a more challenging dataset, which consists of 15K source synthetic images and 5K target natural images. DomainNet contains 345 classes and 6 domains; following Fu et al. (2020), we use the 3 domains Painting (P), Real (R), and Sketch (S). Similar to Li et al. (2021), we split the classes into three parts: common classes across domains \(|{L_s}\cap {L_t}|\), source-private classes \(|{L_s}-{L_t}|\), and target-private classes \(|{L_t}-{L_s}|\). The details of the category division for the four datasets are shown in Table 1.

Table 1 The division on label sets in each setting

4.1.2 Evaluation metrics

To evaluate the performance of UACP under both the OSDA and UniDA scenarios, we utilize the HOS metric (Bucci et al., 2020), defined as the harmonic mean of the average per-class accuracy on known classes, \(Acc_{kn}\), and the accuracy on unknown samples, \(Acc_{unk}\). HOS is formulated as,

$$\begin{aligned} HOS=2\times {\frac{Acc_{kn}\times {Acc}_{unk}}{Acc_{kn}+Acc_{unk}}} \end{aligned}$$
(10)

It fairly considers the performance on known and unknown data. Besides, instance-wise accuracy on known classes (Acc) and area under the ROC curve (AUC) are also adopted in Sect. 4.3.4, following the standard protocol of unknown detection (Hendrycks & Gimpel, 2016).
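
For reference, HOS in Eq. (10) can be computed as in the following sketch, assuming that the ground-truth labels of target-private samples and rejected predictions are both mapped to a single “unknown” label (-1 here); these conventions are our own.

```python
import numpy as np

def hos_score(y_true, y_pred, known_classes, unknown_label=-1):
    """Sketch of Eq. (10): harmonic mean of the average per-class accuracy on
    known classes and the accuracy on unknown samples."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] == c)
                 for c in known_classes if np.any(y_true == c)]
    acc_kn = float(np.mean(per_class))                                    # Acc_kn
    acc_unk = float(np.mean(y_pred[y_true == unknown_label] == unknown_label))  # Acc_unk
    return 2 * acc_kn * acc_unk / (acc_kn + acc_unk + 1e-12)
```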

4.1.3 Implementation details

We implement UACP in the PyTorch (Paszke et al., 2017) framework. For fair comparison, we build our network on a ResNet-50 (He et al., 2016) pre-trained on ImageNet (Russakovsky et al., 2015). Following Saito et al. (2019), the last linear layer is replaced by a new linear classification layer. The learning rates for the new linear layer and the fine-tuned layers, with inverse scheduling, are set to 0.01 and 0.001, respectively. We use a mini-batch SGD optimizer with momentum 0.9 and weight decay 0.0005 in all experiments. The temperature \(\tau\) is set to 0.05 following Ranjan et al. (2017). The trade-off parameters \(\alpha\), \(\beta\), and \(\gamma\) in UACP are fixed as \(\alpha = \gamma = 0.05\) and \(\beta = 0.1\). The margin m is set to 0.4 for Office-31 and Office-Home, and 0.5 for VisDA and DomainNet.
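
A minimal sketch of this optimizer configuration is shown below; the inverse learning-rate schedule is omitted, and the exact parameter grouping and the example value K = 65 (Office-Home) are assumptions.

```python
import torch
from torchvision import models

# Pre-trained backbone and new composite-classifier head (K = 65 as an example)
backbone = models.resnet50(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()                       # drop the original 1000-way head
head = torch.nn.Linear(2048, 2 * 65, bias=False)        # 2K output neurons

optimizer = torch.optim.SGD(
    [{"params": backbone.parameters(), "lr": 0.001},    # fine-tuned pre-trained layers
     {"params": head.parameters(),     "lr": 0.01}],    # newly added linear layer
    momentum=0.9, weight_decay=0.0005)
```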

Table 2 HOS (%) on Office-31 and VisDA for OSDA
Table 3 HOS (%) on Office-Home for OSDA

4.2 Results

In this section, we evaluate UACP by comparing it with state-of-the-art methods. The bolded value in each column indicates the best performance among the compared methods.

4.2.1 Open-set domain adaptation

We perform comparisons under OSDA scenario over Office-31, Office-Home and VisDA datasets. There are 6 tasks for Office-31 and 12 tasks for Office-Home. We compare with OSDA methods OSBP (Saito et al., 2018), STA (Liu et al., 2019) and ROS (Bucci et al., 2020), as well as UniDA methods UAN (You et al., 2019), DANCE (Saito et al., 2020), CMU (Fu et al., 2020), DCC (Li et al., 2021) and OVANet (Saito & Saenko, 2021).

The results on Office-31 and VisDA are reported in Table 2, and Table 3 records the performance on Office-Home. UACP achieves the best performance on 5 of the 6 tasks on Office-31. On average, it outperforms the state-of-the-art method OVANet by 3.0%. On VisDA, UACP significantly outperforms OVANet by 16.3%, and outperforms the second-best method DCC by 1.7%. Note that DCC takes the prior of the OSDA and UniDA settings into consideration, while our UACP has no prior knowledge of the private classes. From Table 3, UACP achieves the best results on 7 of the 12 tasks. On average, UACP surpasses all compared methods.

Table 4 HOS (%) on Office-31 and VisDA for UniDA
Table 5 HOS (%) on Office-Home for UniDA

4.2.2 Universal domain adaptation

For UniDA scenario, we perform UACP on Office-31, Office-Home, VisDA and DomainNet. We compare UACP with OSDA methods, including OSBP and ROS, and UniDA methods, including UAN, DANCE, CMU, DCC, and OVANet.

Table 4 shows the results on Office-31 and VisDA, where UACP outperforms all baselines on 5 of the 6 tasks, yielding a 4.5% improvement over OVANet on Office-31 and a 7.0% improvement on VisDA. The results on Office-Home are given in Table 5, in which UACP achieves the best average performance of 74.7%, an improvement of 2.9% over the second-best method OVANet. The performance on DomainNet is reported in Table 6. UACP outperforms the baselines on 2 of the 6 tasks and achieves the second-best average HOS of 50.3%, which is comparable to the best value of 50.7%. From these comparisons, it can be observed that the proposed UACP properly tackles different levels of category shift and performs well in different UDA scenarios.

Table 6 HOS (%) on DomainNet for UniDA

4.3 Analysis

In this section, more analyses will be provided to further investigate the effectiveness of UACP.

4.3.1 Varying the number of unknown classes

Performance with varying numbers of unknown classes on 2 tasks (Ar \(\rightarrow\) Re and Cl \(\rightarrow\) Pr) of Office-Home under both the OSDA and UniDA settings is presented in Fig. 3. We compare UACP with the UniDA methods DANCE, CMU, DCC, and OVANet. Results under the OSDA setting are illustrated in Fig. 3a, b, from which it can be seen that both CMU and OVANet suffer from performance degradation as the number of unknown classes increases, while UACP shows smaller fluctuations. Under the UniDA setting in Fig. 3c, d, UACP clearly outperforms the previous state-of-the-art methods by a large margin, indicating its effectiveness and robustness to different ratios of unknown classes. In summary, UACP yields consistent improvements on all tasks, demonstrating that it can effectively handle different levels of label shift between domains.

Fig. 3

HOS w.r.t. varying \(|{L_s-L_t}|\) on tasks \(\hbox {Ar}\rightarrow \hbox {Re}\) and \(\hbox {Cl}\rightarrow \hbox {Pr}\) for both OSDA and UniDA scenarios

4.3.2 Ablation study

In this sub-section, we verify the effectiveness of the individual components of UACP. Specifically, ablation studies on the difficult task Pr \(\rightarrow\) Re of Office-Home for both the OSDA and UniDA settings are presented in Table 7. Four variants of UACP are studied: (i) “w/o \({\mathcal {L}}_{ESL}+{\mathcal {L}}_{SFC}+{\mathcal {L}}_{TOVA}\)” is the variant trained only with source supervision. (ii) “w/o \({\mathcal {L}}_{ESL}\)” discards the entropy-strengthened loss for MC in Eq. (5). (iii) “w/o \({\mathcal {L}}_{SFC}\)” discards the self-supervised feature clustering on target samples in Eq. (7). (iv) “w/o \({\mathcal {L}}_{TOVA}\)” discards the entropy minimization on OVA predictors in Eq. (8).

Table 7 Ablation on losses
Fig. 4

Comparison results of convergence speed and performance difference on UACP and OVANet

Fig. 5

Sensitivity analysis for trade-off parameters over tasks a \(\hbox {D}\rightarrow \hbox {A}\) of Office-31 and b \(\hbox {Ar}\rightarrow \hbox {Pr}\) of Office-Home under UniDA scenario

From Table 7, each component of UACP contributes to the final target performance. Specifically, the entropy-strengthened loss \({\mathcal {L}}_{ESL}\) is essential for unknown detection: it increases \(Acc_{unk}\) significantly from 56.4% to 72.9% under the OSDA scenario, and from 60.4% to 75.2% under the UniDA scenario. Besides, removing \({\mathcal {L}}_{SFC}\) greatly hurts \(Acc_{kn}\). When employing the entropy minimization \({\mathcal {L}}_{TOVA}\) on the OVA predictors, the HOS is improved from 70.5% to 74.8% under the OSDA scenario, and from 79.1% to 83.3% under the UniDA scenario.

4.3.3 Convergence comparison

The performance of UACP in each iteration, compared with OVANet, is presented in Fig. 4. We plot \(Acc_{kn}\), \(Acc_{unk}\), and HOS w.r.t. the number of iterations on task A \(\rightarrow\) D of the OSDA setting and task A \(\rightarrow\) W of the UniDA setting, respectively. As illustrated in Fig. 4, UACP converges quickly within the first several hundred iterations and achieves better performance. Besides, its \(Acc_{kn}\), \(Acc_{unk}\), and HOS fluctuate much less than those of OVANet, demonstrating the stability and effectiveness of our proposal.

4.3.4 Hyper-parameter analysis

To illustrate the sensitivity of UACP to the trade-off parameters \(\alpha\), \(\beta\), and \(\gamma\), we perform experiments on the tasks D \(\rightarrow\) A and Ar \(\rightarrow\) Pr under the UniDA scenario. As shown in Fig. 5, we present the HOS, Acc, and AUC w.r.t. the trade-off parameters \(\alpha\), \(\beta\), and \(\gamma\) over the wide range [0, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]. Although the performance fluctuates to some extent as the parameter values change, it remains relatively flat and stable within a fairly wide range.

5 Conclusion

In this work, we propose a novel UniDA approach, UACP, to adaptively identify unknowns via classifier paradox. In UACP, a composite classifier is proposed to tackle both domain and category shifts: it distinguishes the source categories using the MC predictor and captures the concept of “unknown” through verification by the OVA predictors. Moreover, self-supervised knowledge is utilized to pursue well-clustered target features and low-density separation of target data, so as to conduct implicit domain alignment with a domain-invariant classifier. Finally, extensive experiments on four benchmarks validate UACP in both the OSDA and UniDA scenarios.