Abstract
Unsupervised Domain Adaptation (UDA) seeks to transfer knowledge acquired from a source domain rich in labeled data to a target domain that contains only unlabeled data. Most existing UDA research learns domain-invariant feature representations for both domains by minimizing the domain gap with convolution-based neural networks. Recently, vision transformers have made significant strides across a variety of visual tasks. In this paper, we introduce a Bidirectional Cross-Attention Transformer (BCAT) for UDA, built upon vision transformers to further improve adaptation performance. The proposed BCAT employs an attention mechanism to extract implicit source and target mixup feature representations, thereby reducing the domain discrepancy. More specifically, BCAT is designed as a weight-sharing quadruple-branch transformer with a bidirectional cross-attention mechanism, allowing it to learn domain-invariant feature representations. Comprehensive experiments show that the proposed BCAT outperforms existing state-of-the-art UDA methods, both convolution-based and transformer-based, on four benchmark datasets.
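The abstract describes the core mechanism only at a high level. The following is a minimal, hypothetical PyTorch sketch of what a weight-sharing bidirectional cross-attention block could look like: source queries attend to target keys/values and vice versa through one shared attention module, so each branch yields an implicit mixup of source and target features. All class, function, and variable names here are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of bidirectional cross-attention:
# queries from one domain attend to keys/values from the other, and the same
# projection weights serve both directions. Names are illustrative only.
import torch
import torch.nn as nn


class BidirectionalCrossAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # One shared attention module for both directions, mirroring the
        # weight-sharing design described in the abstract.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, src_tokens: torch.Tensor, tgt_tokens: torch.Tensor):
        # Source queries attend to target keys/values: source features
        # implicitly mixed with target information.
        s2t, _ = self.attn(src_tokens, tgt_tokens, tgt_tokens)
        # Target queries attend to source keys/values: the reverse direction.
        t2s, _ = self.attn(tgt_tokens, src_tokens, src_tokens)
        # Residual connections keep each branch anchored to its own domain.
        return self.nor_s(src_tokens + s2t) if False else (
            self.norm_s(src_tokens + s2t), self.norm_t(tgt_tokens + t2s))


if __name__ == "__main__":
    src = torch.randn(4, 196, 768)  # source-domain patch tokens (ViT-style)
    tgt = torch.randn(4, 196, 768)  # target-domain patch tokens
    block = BidirectionalCrossAttention(dim=768)
    mixed_src, mixed_tgt = block(src, tgt)
    print(mixed_src.shape, mixed_tgt.shape)  # torch.Size([4, 196, 768]) x2
```

Sharing a single attention module across both directions echoes the weight-sharing, quadruple-branch design the abstract describes, though the actual BCAT architecture may differ in its branch layout and losses.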
Acknowledgments
This work was supported by NSFC Key Grant No. 62136005, NSFC General Grant No. 62076118, and Shenzhen Fundamental Research Program JCYJ20210324105000003.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, X., Guo, P., Zhang, Y. (2023). Unsupervised Domain Adaptation via Bidirectional Cross-Attention Transformer. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds.) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol. 14173. Springer, Cham. https://doi.org/10.1007/978-3-031-43424-2_19
DOI: https://doi.org/10.1007/978-3-031-43424-2_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43423-5
Online ISBN: 978-3-031-43424-2