Abstract
Unsupervised domain adaptation (UDA) is an important approach to the cross-domain problem in semantic segmentation. Existing segmentation UDA methods mainly treat the domain shift as the major challenge. This paper takes a different viewpoint and disentangles the cross-domain problem into two negative factors. Specifically, we find that, apart from the domain shift, the dispersed within-class distribution on the target domain also compromises cross-domain segmentation, and that this neglected dispersion is a challenge as crucial as the domain shift. To address the two factors jointly, we propose a “Pull-and-Concentrate” (PuCo) method comprising two consistencies: (1) a cross-domain consistency “pulls” the source and target distributions of the same class close to each other, based on a novel statistical style transfer; (2) an intra-domain consistency “concentrates” the within-class distribution on the target domain through a new unsupervised teacher-student method. Both consistencies are robust to (or insulated from) pseudo-label noise, which allows PuCo to bring consistent improvement over a battery of pseudo-label-based UDA methods. For example, on GTA5 to Cityscapes and SYNTHIA to Cityscapes, PuCo achieves \(60.3\%\) and \(57.2\%\) mean IoU, respectively. Code is available at https://github.com/Jarvis73/PuCo.
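The abstract names two generic building blocks, statistical style transfer and a teacher-student scheme, without giving their exact form. The sketch below is not the PuCo implementation. It assumes an AdaIN-like swap of per-channel feature statistics for the former and an exponential-moving-average (EMA) teacher update for the latter. The function names `transfer_channel_stats` and `ema_update`, the array shapes, and the momentum value are all hypothetical and chosen only for illustration.

```python
import numpy as np


def transfer_channel_stats(content, style, eps=1e-5):
    """Give `content` (C, H, W) the per-channel mean/std of `style` (C, H, W).

    Assumption: an AdaIN-like statistic swap, used here only to illustrate
    what a "statistical style transfer" could look like.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True) + eps
    # Whiten the content statistics, then re-color with the style statistics.
    return (content - c_mean) / c_std * s_std + s_mean


def ema_update(teacher, student, momentum=0.99):
    """Move every teacher parameter toward the student by an EMA step.

    Assumption: a mean-teacher style update; parameters are plain dicts of arrays.
    """
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k] for k in teacher}


rng = np.random.default_rng(0)
source_feat = rng.normal(0.0, 1.0, size=(4, 8, 8))  # hypothetical source-domain features
target_feat = rng.normal(3.0, 2.0, size=(4, 8, 8))  # hypothetical target-domain features

# Source features re-styled with target-domain channel statistics.
stylized = transfer_channel_stats(source_feat, target_feat)

# One EMA step of a toy teacher toward a toy student.
teacher = ema_update({"w": np.zeros(3)}, {"w": np.ones(3)}, momentum=0.9)
```

After the swap, `stylized` keeps the spatial layout of `source_feat` but carries the channel-wise mean and standard deviation of `target_feat`. The EMA step leaves the teacher a slowly moving average of the student, which is the usual reason a teacher gives more stable targets than the student itself.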
Data availability
The data used in the experiments are all open-source. For details, please refer to the source code at https://github.com/Jarvis73/PuCo.
Acknowledgements
Jian-Wei Zhang’s work on this paper was done during an internship at Baidu Research. This work is supported by the National Natural Science Foundation of China (62132017) and the Fundamental Research Funds for the Central Universities (226-2022-00235).
Author information
Contributions
Jian-Wei Zhang and Yifan Sun wrote the main manuscript text and Jian-Wei Zhang prepared all the figures. All the authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Communicated by A. Liu.
About this article
Cite this article
Zhang, JW., Sun, Y. & Chen, W. Pull and concentrate: improving unsupervised semantic segmentation adaptation with cross- and intra-domain consistencies. Multimedia Systems 29, 2633–2650 (2023). https://doi.org/10.1007/s00530-023-01131-9
DOI: https://doi.org/10.1007/s00530-023-01131-9