
Style-Hallucinated Dual Consistency Learning: A Unified Framework for Visual Domain Generalization

  • Published in: International Journal of Computer Vision

Abstract

Domain shift is pervasive in the visual world, and modern deep neural networks commonly suffer severe performance degradation under it due to poor generalization ability, which limits real-world applications. The domain shift mainly stems from the limited environmental variation in the source data and the large distribution gap between source and unseen target data. To this end, we propose a unified framework, Style-HAllucinated Dual consistEncy learning (SHADE), to handle such domain shift in various visual tasks. Specifically, SHADE is built on two consistency constraints: Style Consistency (SC) and Retrospection Consistency (RC). SC enriches the source situations and encourages the model to learn consistent representations across style-diversified samples. RC leverages general visual knowledge to prevent the model from overfitting to the source data, thereby keeping the representation largely consistent between the source and general visual models. Furthermore, we present a novel Style Hallucination Module (SHM) to generate the style-diversified samples that are essential to consistency learning. SHM selects basis styles from the source distribution, enabling the model to dynamically generate diverse and realistic samples during training. Extensive experiments demonstrate that our versatile SHADE significantly enhances generalization in various visual recognition tasks, including image classification, semantic segmentation, and object detection, with different models, i.e., ConvNets and Transformers.
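The style hallucination idea described above can be sketched as AdaIN-style statistic swapping: a feature map's channel-wise mean and standard deviation (its "style") are replaced by a random convex combination of basis styles drawn from the source distribution. The following is a minimal NumPy sketch of that mechanism only; the function name, shapes, and the Dirichlet sampling are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def hallucinate_style(feat, basis_means, basis_stds, rng=None, eps=1e-6):
    """Re-stylize a feature map of shape (C, H, W) by replacing its
    channel-wise statistics with a random convex combination of
    L basis styles (basis_means, basis_stds: each of shape (L, C)).

    Schematic sketch of style hallucination (AdaIN-style statistic
    swapping); hypothetical names and shapes, not the paper's code.
    """
    rng = np.random.default_rng() if rng is None else rng
    c = feat.shape[0]
    mu = feat.mean(axis=(1, 2), keepdims=True)           # original style mean, (C, 1, 1)
    sigma = feat.std(axis=(1, 2), keepdims=True) + eps   # original style std, (C, 1, 1)

    # Sample combination weights over the L basis styles; a Dirichlet draw
    # keeps the hallucinated style inside the convex hull of the basis set.
    w = rng.dirichlet(np.ones(len(basis_means)))         # (L,)
    new_mu = np.tensordot(w, basis_means, axes=1).reshape(c, 1, 1)
    new_sigma = np.tensordot(w, basis_stds, axes=1).reshape(c, 1, 1)

    # Normalize away the original style, then apply the hallucinated one.
    return new_sigma * (feat - mu) / sigma + new_mu
```

During training, the original and re-stylized features would then be fed through the same network, and a consistency loss applied between the two predictions.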



Data Availability

The datasets analysed during the current study are available as follows:

  • GTAV (Richter et al., 2016): https://download.visinf.tu-darmstadt.de/data/from_games/
  • SYNTHIA (Ros et al., 2016): http://synthia-dataset.net/
  • CityScapes (Cordts et al., 2016): https://www.cityscapes-dataset.com/
  • BDD100K (Yu et al., 2020): https://bdd-data.berkeley.edu/
  • Mapillary (Neuhold et al., 2017): https://www.mapillary.com/dataset/vistas
  • PACS (Li et al., 2017): https://domaingeneralization.github.io/#data
  • Urban-scene Detection (Wu & Deng, 2022): https://github.com/AmingWu/Single-DGOD

Code Availability

The code is available at https://github.com/HeliosZhao/SHADE-VisualDG.

References

  • Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein generative adversarial networks. In ICML

  • Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. (2020). End-to-end object detection with transformers. In ECCV

  • Carlucci, F. M., D’Innocente, A., Bucci, S., Caputo, B., & Tommasi, T. (2019). Domain generalization by solving jigsaw puzzles. In CVPR

  • Chen, H., Zhao, L., Zhang, H., Wang, Z., Zuo, Z., Li, A., Xing, W., & Lu, D. (2021a). Diverse image style transfer via invertible cross-space mapping. In ICCV

  • Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV

  • Chen, M., Zheng, Z., Yang, Y., & Chua, T. S. (2022). PiPa: Pixel- and patch-wise self-supervised learning for domain adaptive semantic segmentation. arXiv preprint arXiv:2211.07609

  • Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations. In ICML

  • Chen, Y., Wang, H., Li, W., Sakaridis, C., Dai, D., & Van Gool, L. (2021). Scale-aware domain adaptive faster R-CNN. IJCV, 129, 2223–2243.

  • Choi, S., Jung, S., Yun, H., Kim, J. T., Kim, S., & Choo, J. (2021). RobustNet: Improving domain generalization in urban-scene segmentation via instance selective whitening. In CVPR

  • Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In CVPR

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR

  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., & Uszkoreit, J. (2021). An image is worth 16 x 16 words: Transformers for image recognition at scale. In ICLR

  • Du, D., Chen, J., Li, Y., Ma, K., Wu, G., Zheng, Y., & Wang, L. (2022). Cross-domain gated learning for domain generalization. IJCV, 130, 2842–2857.

  • Dumoulin, V., Shlens, J., & Kudlur, M. (2017). A learned representation for artistic style. In ICLR

  • Fini, E., Sangineto, E., Lathuilière, S., Zhong, Z., Nabi, M., & Ricci, E. (2021). A unified objective for novel class discovery. In ICCV

  • French, G., Laine, S., Aila, T., Mackiewicz, M., & Finlayson, G. (2020). Semi-supervised semantic segmentation needs strong, varied perturbations. In BMVC

  • Gong, R., Li, W., Chen, Y., Dai, D., & Van Gool, L. (2021). DLOW: Domain flow and applications. IJCV, 129, 2865–2888.

  • Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. JMLR.

  • Halmos, P. R. (1987). Finite-dimensional vector spaces. Springer.

  • Hassaballah, M., Kenk, M. A., Muhammad, K., & Minaee, S. (2020). Vehicle detection and tracking in adverse weather using a deep learning framework. IEEE Transactions on Intelligent Transportation Systems, 22, 4230–4242.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR

  • He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In ICCV

  • He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In CVPR

  • Hendrycks, D., Mu, N., Cubuk, E. D., Zoph, B., Gilmer, J., & Lakshminarayanan, B. (2020). AugMix: A simple data processing method to improve robustness and uncertainty. In ICLR

  • Hoffman, J., Wang, D., Yu, F., & Darrell, T. (2016). FCNs in the wild: Pixel-level adversarial and constraint-based adaptation. arXiv preprint arXiv:1612.02649

  • Hoyer, L., Dai, D., & Van Gool, L. (2022). DAFormer: Improving network architectures and training strategies for domain-adaptive semantic segmentation. In CVPR

  • Huang, J., Guan, D., Xiao, A., & Lu, S. (2021). FSDR: Frequency space domain randomization for domain generalization. In CVPR

  • Huang, L., Zhou, Y., Zhu, F., Liu, L., & Shao, L. (2019). Iterative normalization: Beyond standardization towards efficient whitening. In CVPR

  • Huang, X., & Belongie, S. (2017). Arbitrary style transfer in real-time with adaptive instance normalization. In ICCV

  • Huang, Z., Wang, H., Xing, E. P., & Huang, D. (2020). Self-challenging improves cross-domain generalization. In ECCV

  • Kannan, H., Kurakin, A., & Goodfellow, I. (2018). Adversarial logit pairing. In ICML

  • Kim, J., Lee, J., Park, J., Min, D., & Sohn, K. (2022). Pin the memory: Learning to generalize semantic segmentation. In CVPR

  • Lee, S., Seong, H., Lee, S., & Kim, E. (2022). WildNet: Learning domain generalized semantic segmentation from the wild. In CVPR

  • Li, D., Yang, Y., Song, Y. Z., & Hospedales, T. M. (2017). Deeper, broader and artier domain generalization. In ICCV

  • Li, D., Yang, Y., Song, Y. Z., & Hospedales, T. (2018a). Learning to generalize: Meta-learning for domain generalization. In AAAI

  • Li, Y., Tian, X., Gong, M., Liu, Y., Liu, T., Zhang, K., & Tao, D. (2018b). Deep domain generalization via conditional invariant adversarial networks. In ECCV

  • Lin, C., Yuan, Z., Zhao, S., Sun, P., Wang, C., & Cai, J. (2021). Domain-invariant disentangled network for generalizable object detection. In ICCV

  • Liu, W., Rabinovich, A., & Berg, A. C. (2015). ParseNet: Looking wider to see better. arXiv preprint arXiv:1506.04579

  • Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In ICCV

  • Loshchilov, I., & Hutter, F. (2019). Decoupled weight decay regularization. In ICLR

  • MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability

  • Neuhold, G., Ollmann, T., Rota Bulo, S., & Kontschieder, P. (2017). The mapillary vistas dataset for semantic understanding of street scenes. In ICCV

  • Nuriel, O., Benaim, S., & Wolf, L. (2021). Permuted AdaIN: Reducing the bias towards global statistics in image classification. In CVPR

  • Pan, X., Luo, P., Shi, J., & Tang, X. (2018). Two at once: Enhancing learning and generalization capacities via IBN-Net. In ECCV

  • Pan, X., Zhan, X., Shi, J., Tang, X., & Luo, P. (2019). Switchable whitening for deep representation learning. In ICCV

  • Peng, D., Lei, Y., Liu, L., Zhang, P., & Liu, J. (2021). Global and local texture randomization for synthetic-to-real semantic segmentation. IEEE TIP, 30, 6594–6608.

  • Peng, D., Lei, Y., Hayat, M., Guo, Y., & Li, W. (2022). Semantic-aware domain generalized segmentation. In CVPR

  • Qi, C. R., Yi, L., Su, H., & Guibas, L. J. (2017). PointNet++: Deep hierarchical feature learning on point sets in a metric space. In NeurIPS

  • Qiao, F., Zhao, L., & Peng, X. (2020). Learning to learn single domain generalization. In CVPR

  • Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In NeurIPS

  • Richter, S. R., Vineet, V., Roth, S., & Koltun, V. (2016). Playing for data: Ground truth from computer games. In ECCV

  • Ros, G., Sellart, L., Materzynska, J., Vazquez, D., & Lopez, A. M. (2016). The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In CVPR

  • Roy, S., Liu, M., Zhong, Z., Sebe, N., & Ricci, E. (2022). Class-incremental novel class discovery. In ECCV

  • Sakaridis, C., Dai, D., & Van Gool, L. (2018). Semantic foggy scene understanding with synthetic data. IJCV, 126, 973–992.

  • Sakaridis, C., Dai, D., & Gool, L. V. (2019). Guided curriculum model adaptation and uncertainty-aware evaluation for semantic nighttime image segmentation. In ICCV

  • Sakaridis, C., Dai, D., & Van Gool, L. (2021). ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding. In ICCV

  • Shankar, S., Piratla, V., Chakrabarti, S., Chaudhuri, S., Jyothi, P., & Sarawagi, S. (2018). Generalizing across domains via cross-gradient training. In ICLR

  • Shui, C., Li, Z., Li, J., Gagné, C., Ling, C. X., & Wang, B. (2021). Aggregating from multiple target-shifted sources. In ICML

  • Shui, C., Chen, Q., Wen, J., Zhou, F., Gagné, C., & Wang, B. (2022a). A novel domain adaptation theory with Jensen–Shannon divergence. Knowledge-Based Systems, 257, 109808.

  • Shui, C., Wang, B., & Gagné, C. (2022b). On the benefits of representation regularization in invariance based domain generalization. Machine Learning, 111, 895–915.

  • Shui, C., Xu, G., Chen, Q., Li, J., Ling, C. X., Arbel, T., Wang, B., & Gagné, C. (2022c). On learning fairness and accuracy on multiple subgroups. In NeurIPS

  • Tang, Z., Gao, Y., Zhu, Y., Zhang, Z., Li, M., & Metaxas, D. (2021). SelfNorm and CrossNorm for out-of-distribution robustness. In ICCV

  • Tarvainen, A., & Valpola, H. (2017). Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In NeurIPS

  • Vapnik, V. (2013). The nature of statistical learning theory. Springer Science & Business Media.

  • Wang, H., Xiao, C., Kossaifi, J., Yu, Z., Anandkumar, A., & Wang, Z. (2021a). AugMax: Adversarial composition of random augmentations for robust training. In NeurIPS

  • Wang, P., Li, Y., & Vasconcelos, N. (2021b). Rethinking and improving the robustness of image style transfer. In CVPR

  • Wang, Z., Luo, Y., Qiu, R., Huang, Z., & Baktashmotlagh, M. (2021c). Learning to diversify for single domain generalization. In ICCV

  • Wu, A., & Deng, C. (2022). Single-domain generalized object detection in urban scene via cyclic-disentangled self-distillation. In CVPR

  • Wu, A., Liu, R., Han, Y., Zhu, L., & Yang, Y. (2021). Vector-decomposed disentanglement for domain-invariant object detection. In ICCV

  • Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., & Luo, P. (2021). SegFormer: Simple and efficient design for semantic segmentation with transformers. NeurIPS, 34, 12077–12090.

  • Yu, F., Chen, H., Wang, X., Xian, W., Chen, Y., Liu, F., Madhavan, V., & Darrell, T. (2020). BDD100K: A diverse driving dataset for heterogeneous multitask learning. In CVPR

  • Yuan, J., Ma, X., Chen, D., Kuang, K., Wu, F., & Lin, L. (2022). Domain-specific bias filtering for single labeled domain generalization. IJCV, 131, 552–571.

  • Yue, X., Zhang, Y., Zhao, S., Sangiovanni-Vincentelli, A., Keutzer, K., & Gong, B. (2019). Domain randomization and pyramid consistency: simulation-to-real generalization without accessing target domain data. In ICCV

  • Zhao, L., Liu, T., Peng, X., & Metaxas, D. (2020). Maximum-entropy adversarial data augmentation for improved generalization and robustness. In NeurIPS

  • Zhao, Y., Zhong, Z., Yang, F., Luo, Z., Lin, Y., Li, S., & Sebe, N. (2021). Learning to generalize unseen domains via memory-based multi-source meta-learning for person re-identification. In CVPR

  • Zhao, Y., Zhong, Z., Luo, Z., Lee, G. H., & Sebe, N. (2022). Source-free open compound domain adaptation in semantic segmentation. IEEE TCSVT, 32, 7019–7032.

  • Zhao, Y., Zhong, Z., Sebe, N., & Lee, G. H. (2022b). Novel class discovery in semantic segmentation. In CVPR

  • Zhao, Y., Zhong, Z., Zhao, N., Sebe, N., & Lee, G. H. (2022c). Style-hallucinated dual consistency learning for domain generalized semantic segmentation. In ECCV

  • Zheng, Z., & Yang, Y. (2020). Unsupervised scene adaptation with memory regularization in vivo. In IJCAI

  • Zheng, Z., & Yang, Y. (2021). Rectifying pseudo label learning via uncertainty estimation for domain adaptive semantic segmentation. IJCV.

  • Zheng, Z., & Yang, Y. (2022). Adaptive boosting for domain adaptation: Toward robust predictions in scene segmentation. IEEE TIP, 31, 5371–5382.

  • Zhong, Z., Zhu, L., Luo, Z., Li, S., Yang, Y., & Sebe, N. (2021). OpenMix: Reviving known knowledge for discovering novel visual categories in an open world. In CVPR

  • Zhong, Z., Zhao, Y., Lee, G. H., & Sebe, N. (2022). Adversarial style augmentation for domain generalized urban-scene segmentation. In NeurIPS

  • Zhou, K., Yang, Y., Qiao, Y., & Xiang, T. (2021a). Domain generalization with mixstyle. In ICLR

  • Zhou, Q., Feng, Z., Gu, Q., Pang, J., Cheng, G., Lu, X., Shi, J., & Ma, L. (2021b). Context-aware mixup for domain adaptive semantic segmentation. arXiv preprint arXiv:2108.03557

  • Zhou, Q., Feng, Z., Gu, Q., Cheng, G., Lu, X., Shi, J., & Ma, L. (2022). Uncertainty-aware consistency regularization for cross-domain semantic segmentation. Computer Vision and Image Understanding, 221, 103448.

  • Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV

Acknowledgements

This research/project is supported by the National Research Foundation Singapore and DSO National Laboratories under the AI Singapore Programme (AISG Award No: AISG2-RP-2020-016), the Tier 2 grant MOET2EP20120- 0011 from the Singapore Ministry of Education, the MUR PNRR project FAIR (PE00000013) funded by the NextGenerationEU, and the EU project Ai4Trust (No. 101070190).

Author information

Corresponding author

Correspondence to Yuyang Zhao.

Additional information

Communicated by Oliver Zendel.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, Y., Zhong, Z., Zhao, N. et al. Style-Hallucinated Dual Consistency Learning: A Unified Framework for Visual Domain Generalization. Int J Comput Vis 132, 837–853 (2024). https://doi.org/10.1007/s11263-023-01911-w
