Discovering and incorporating latent target-domains for domain adaptation
Introduction
In real-world visual recognition problems, training and testing data commonly differ in various ways. For example, the training data may be collected from one domain (the source domain) while the testing data come from a different one (the target domain). Because of this domain discrepancy, a model trained on source-domain data may perform poorly on the target domain. Reducing the discrepancy between the source and target domains, and reusing the source-domain training data to build an accurate classifier for the target domain, are therefore central problems in domain adaptation. Many approaches have been proposed in the literature, including instance re-weighting (e.g. [1], [2]) and subspace learning for distribution alignment (e.g. [3], [4], [5], [6], [7], [8], [9]).
Most existing domain adaptation methods consider the “balanced” setting, in which the source and the target are each a single domain. In many circumstances, however, the training and testing data are diverse and contain multiple latent domains, so directly applying such methods may not be optimal. It has been observed that simply treating labeled data collected from multiple domains as a single domain may lead to poor adaptation performance [10]. This is largely because traditional distribution alignment methods generally assume that the source and target domains are compact and that their supports overlap, which may not hold when a domain is diverse. Partitioning a complex domain into multiple small, compact latent domains reduces the difficulty of distribution alignment, as verified by previous work on discovering latent domains in the source domain [10], [11]. Here, we argue that in real-world applications the testing data can be even more diverse than the training data, implying the existence of multiple “latent target domains”. For example, test images or videos may be acquired from arbitrary viewpoints, under different illuminations, or with different devices. However, most existing latent domain discovery methods cannot be applied directly to the target domain, because they rely on source-domain label information to learn the latent domains, and such label information is not available in the target domain.
In this paper, we address a new and challenging problem in unsupervised domain adaptation: discovering latent target domains to improve adaptation performance. Our intuition is that the main difficulty in domain adaptation for many visual recognition problems originates from the large diversity of the testing data. In other words, the testing data may come from different latent target domains, which makes the underlying distribution extremely complicated. We therefore propose to first partition the target domain into multiple compact and distinctive latent domains, so that the distribution of each latent domain becomes simpler and domain adaptation between the source and each latent target domain becomes less challenging. When partitioning the target domain, we also encourage each latent target domain to be as similar to the source domain as possible, which further facilitates knowledge transfer from the source domain. After learning the latent target domains, for each pair consisting of the source domain and a latent target domain, we apply a state-of-the-art subspace-based domain adaptation method, Joint Geometrical and Statistical Alignment (JGSA) [12], to map all the data into a latent feature space in which instances from the two domains are well aligned. Finally, to incorporate information from all the latent target domains, we propose an extended Multiple Kernel Learning (MKL) algorithm to train a robust classifier for making predictions on target data. Experiments on three benchmark datasets covering object recognition and human activity recognition demonstrate the effectiveness of the proposed approach for exploiting multiple latent target domains to improve domain adaptation performance.
The contributions of this paper are summarized as follows.
1. We focus on the practical problem in unsupervised domain adaptation that multiple latent domains are observed in the target domain. We propose an integrated solution by discovering and incorporating the latent target domains.
2. We propose a latent domain discovery scheme based on the inherent characteristics of the target domain and the external relationship between source and target domains.
3. We propose a method to jointly learn the mapping function based on multiple latent domains, which achieves superior performance on different computer vision tasks.
Section snippets
Related work
Traditional balanced domain adaptation approaches have focused on either subspace learning or instance re-weighting. For example, Huang et al. [1] proposed an instance-weighting scheme over source-domain data to minimize the distribution discrepancy between the source and target domains. Subspace-learning-based unsupervised domain adaptation assumes that there exists a latent space in which the distribution discrepancy between the source and target domains can be minimized [3], [8], [13], [14], and can be further
Proposed methodology
For consistency of presentation, we use boldface lowercase and uppercase letters to denote vectors and matrices, respectively; e.g., a denotes a vector and A denotes a matrix. The transpose of a vector or matrix is denoted by the superscript ⊤. The symbol ⊙ denotes the element-wise product between two vectors or matrices of the same size.
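As a small illustration of this notation, the following plain-Python sketch computes a transpose and an element-wise product. The helper names `transpose` and `hadamard` and the numeric values are made up for this example and do not come from the paper.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def hadamard(A, B):
    """Element-wise product of two matrices of the same size."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("operands must have the same size")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


# Two 2 x 3 matrices with illustrative values.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[10, 20, 30],
     [40, 50, 60]]

A_T = transpose(A)        # 3 x 2 matrix: [[1, 4], [2, 5], [3, 6]]
A_had_B = hadamard(A, B)  # 2 x 3 matrix: [[10, 40, 90], [160, 250, 360]]
```

The element-wise product is only defined for operands of identical shape, which is why the helper rejects mismatched sizes instead of broadcasting.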
Object recognition
We first use images collected from the Amazon (A), DSLR (D), Webcam (W) and Caltech-256 (C) datasets. Several samples from these four datasets are shown in Fig. 1. Ten categories common to all four datasets are used for evaluation. We use SURF features [36], quantized with a K-means codebook of 800 clusters, which gives an 800-dimensional feature vector for each image. In addition, Decaf6 features [37] are extracted from a pretrained AlexNet. We then consider Office31
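The codebook step for the SURF features can be sketched as a standard bag-of-visual-words pipeline: cluster local descriptors with K-means, then describe each image by a histogram of nearest-codeword assignments. The sketch below is only an illustration under stated assumptions. It uses toy 2-D descriptors and a 3-word codebook in place of SURF descriptors and 800 clusters, and the histogram normalization, the function names and the seed handling are assumptions rather than details taken from the paper.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns k cluster centers (the codebook)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous center if a cluster is empty
                centers[j] = [sum(c) / len(g) for c in zip(*g)]
    return centers


def bow_histogram(descriptors, codebook):
    """Normalized histogram of codeword assignments for one image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy local descriptors pooled over a (hypothetical) training set.
all_descriptors = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1],
                   [5.1, 5.0], [9.0, 0.1], [9.1, 0.0]]
codebook = kmeans(all_descriptors, k=3)

# Descriptors of a single image, mapped to one fixed-length feature vector.
image_descriptors = [[0.05, 0.05], [5.05, 5.0], [5.0, 5.05], [9.0, 0.0]]
feature = bow_histogram(image_descriptors, codebook)
```

The length of `feature` equals the codebook size, so a codebook of 800 clusters yields an 800-dimensional vector regardless of how many local descriptors an image contains.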
Conclusion and future work
In this paper, we propose a new method to discover latent target domains for unsupervised domain adaptation. In particular, we propose three criteria for latent domain discovery: minimizing the entropy within each latent domain, maximizing the distinctiveness among different latent domains, and minimizing the distinctiveness between the source domain and each latent target domain. After the latent target domains are learned, we leverage the latent target domain information by learning a common subspace for each
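One way to read the three discovery criteria together is as a single trade-off objective over a partition of the target domain. The expression below is only a schematic sketch, not the formulation used in the paper. The number of latent domains K, the entropy term H, the distinctiveness measure d, and the trade-off weights λ1 and λ2 are all symbols assumed here for illustration, with S denoting the source domain and T_k the k-th latent target domain.

```latex
\min_{\mathcal{T}_1,\dots,\mathcal{T}_K}\;
\underbrace{\sum_{k=1}^{K} H(\mathcal{T}_k)}_{\text{entropy within each latent domain}}
\;-\; \lambda_1 \underbrace{\sum_{k<k'} d(\mathcal{T}_k,\mathcal{T}_{k'})}_{\text{distinctiveness among latent domains}}
\;+\; \lambda_2 \underbrace{\sum_{k=1}^{K} d(\mathcal{S},\mathcal{T}_k)}_{\text{distinctiveness to the source domain}}
```

The signs mirror the wording of the criteria: the first and third terms are minimized, while the second enters with a negative sign because it is to be maximized.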
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
Wen Li is supported by the Major Project for New Generation of AI under Grant No. 2018AAA0100400 and the National Natural Science Foundation of China under Grant No. 61772118. This research was carried out at the Rapid-Rich Object Search (ROSE) Lab at the Nanyang Technological University, Singapore. The ROSE Lab is supported by the National Research Foundation, Singapore, and the Infocomm Media Development Authority, Singapore. Haoliang Li thanks the Wallenberg-NTU Presidential Postdoc Fellowship grant.
Haoliang Li obtained his B.Eng degree from University of Electronic Science and Technology of China in 2013, and the Ph.D. degree from Nanyang Technological University, Singapore, in 2018. He was a project officer in 2018 and a research fellow from July 2018 to May 2019 in Rapid-Rich Object Search Lab, NTU. He is now a Wallenberg-NTU presidential postdoc fellow in NTU. He received the doctorate innovation award from NTU in 2019.
References (43)
- et al., Semi-supervised transfer subspace for domain adaptation, Pattern Recognit. (2018)
- et al., Boosting for transfer learning from multiple data sources, Pattern Recognit. Lett. (2012)
- et al., Review on mining data from multiple data sources, Pattern Recognit. Lett. (2018)
- et al., Joint learning of multiple latent domains and deep representations for domain adaptation, IEEE Trans. Cybern. (2019)
- et al., Correcting sample selection bias by unlabeled data, NIPS (2006)
- et al., Direct importance estimation with model selection and its application to covariate shift adaptation, NIPS (2008)
- et al., Domain adaptation via transfer component analysis, IEEE Trans. Neural Netw. (2011)
- et al., Geodesic flow kernel for unsupervised domain adaptation, CVPR (2012)
- et al., Unsupervised visual domain adaptation using subspace alignment, ICCV (2013)
- et al., Transfer feature learning with joint distribution adaptation, ICCV (2013)
- Transfer joint matching for unsupervised domain adaptation, CVPR
- Domain adaptation for object recognition: An unsupervised approach, ICCV
- Subspace interpolation via dictionary learning for unsupervised domain adaptation, CVPR
- Reshaping visual datasets for domain adaptation, NIPS
- Discovering latent domains for multisource domain adaptation, ECCV
- Joint geometrical and statistical alignment for visual domain adaptation, CVPR
- Return of frustratingly easy domain adaptation, AAAI
- Scatter component analysis: a unified framework for domain adaptation and domain generalization, IEEE Trans. Pattern Anal. Mach. Intell.
- Learning transferable features with deep adaptation networks, ICML
- Unsupervised pixel-level domain adaptation with generative adversarial networks, CVPR
Wen Li received the Ph.D. degree from Nanyang Technological University, Singapore, in 2015. From 2015 to 2019, he was a Post-Doctoral Researcher with the Computer Vision Laboratory, ETH Zürich, Switzerland. He is currently a Professor with the School of Computer Science and Engineering, University of Electronic Science and Technology of China. His main interests include transfer learning, multi-view learning, multiple kernel learning, and their applications in computer vision.
Shiqi Wang received the B.S. degree in computer science from the Harbin Institute of Technology in 2008 and the Ph.D. degree in computer application technology from Peking University in 2014. From 2014 to 2016, he was a Post-Doctoral Fellow with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada. From 2016 to 2017, he was with the Rapid-Rich Object Search Laboratory, Nanyang Technological University, Singapore, as a Research Fellow. He is currently an Assistant Professor with the Department of Computer Science, City University of Hong Kong. He has proposed over 30 technical proposals to ISO/MPEG, ITU-T, and AVS standards. His research interests include video compression, image/video quality assessment, and image/video search and analysis.