Abstract
Domain shift exists widely in the visual world, and modern deep neural networks commonly suffer severe performance degradation under it due to poor generalization ability, which limits their real-world applications. The domain shift mainly lies in the limited source environmental variations and the large distribution gap between source and unseen target data. To this end, we propose a unified framework, Style-HAllucinated Dual consistEncy learning (SHADE), to handle such domain shift in various visual tasks. Specifically, SHADE is built on two consistency constraints: Style Consistency (SC) and Retrospection Consistency (RC). SC enriches the source situations and encourages the model to learn consistent representations across style-diversified samples. RC leverages general visual knowledge to prevent the model from overfitting to the source data, thereby keeping the representations largely consistent between the source and general visual models. Furthermore, we present a novel style hallucination module (SHM) to generate the style-diversified samples that are essential to consistency learning. SHM selects basis styles from the source distribution, enabling the model to dynamically generate diverse and realistic samples during training. Extensive experiments demonstrate that our versatile SHADE significantly enhances generalization in various visual recognition tasks, including image classification, semantic segmentation, and object detection, with different models, i.e., ConvNets and Transformers.
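The style hallucination idea sketched in the abstract can be illustrated in a few lines, under the common assumption (in the spirit of AdaIN, Huang & Belongie, 2017, cited below) that a "style" is the per-channel mean and standard deviation of a feature map, and that a hallucinated style is a convex combination of basis styles drawn from the source distribution. Function names (`hallucinate_style`, `random_weights`) and the pure-Python representation are hypothetical simplifications, not the paper's implementation:

```python
import random

def channel_stats(feat):
    # Mean and standard deviation of one flattened feature channel.
    n = len(feat)
    mu = sum(feat) / n
    var = sum((x - mu) ** 2 for x in feat) / n
    return mu, var ** 0.5

def hallucinate_style(feat, basis_styles, weights, eps=1e-6):
    # Re-stylize `feat` with a convex combination of basis styles.
    # basis_styles: list of (mean, std) pairs selected from the source
    # distribution; weights: combination coefficients summing to 1.
    mu, sigma = channel_stats(feat)
    new_mu = sum(w * m for w, (m, _) in zip(weights, basis_styles))
    new_sigma = sum(w * s for w, (_, s) in zip(weights, basis_styles))
    # AdaIN-style re-normalization: strip the original style, apply the new one.
    return [new_sigma * (x - mu) / (sigma + eps) + new_mu for x in feat]

def random_weights(k):
    # Random convex-combination weights (a simple stand-in for sampling
    # from a Dirichlet distribution, as style-mixing methods often do).
    raw = [random.random() for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]
```

Applying `hallucinate_style` with fresh `random_weights` at each training step yields a stream of style-diversified views of the same content, over which a style-consistency loss can then be enforced.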
Data Availability
The datasets analysed during the current study are available as follows:
GTAV (Richter et al., 2016): https://download.visinf.tu-darmstadt.de/data/from_games/
SYNTHIA (Ros et al., 2016): http://synthia-dataset.net/
CityScapes (Cordts et al., 2016): https://www.cityscapes-dataset.com/
BDD100K (Yu et al., 2020): https://bdd-data.berkeley.edu/
Mapillary (Neuhold et al., 2017): https://www.mapillary.com/dataset/vistas
PACS (Li et al., 2017): https://domaingeneralization.github.io/#data
Urban-scene Detection (Wu & Deng, 2022): https://github.com/AmingWu/Single-DGOD
Code Availability
The code is available at https://github.com/HeliosZhao/SHADE-VisualDG.
References
Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein generative adversarial networks. In ICML
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. (2020). End-to-end object detection with transformers. In ECCV
Carlucci, F. M., D’Innocente, A., Bucci, S., Caputo, B., & Tommasi, T. (2019). Domain generalization by solving jigsaw puzzles. In CVPR
Chen, H., Zhao, L., Zhang, H., Wang, Z., Zuo, Z., Li, A., Xing, W., & Lu, D. (2021a). Diverse image style transfer via invertible cross-space mapping. In ICCV
Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV
Chen, M., Zheng, Z., Yang, Y., & Chua, T. S. (2022). PiPa: Pixel- and patch-wise self-supervised learning for domain adaptive semantic segmentation. arXiv preprint arXiv:2211.07609
Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations. In ICML
Chen, Y., Wang, H., Li, W., Sakaridis, C., Dai, D., & Van Gool, L. (2021). Scale-aware domain adaptive faster R-CNN. IJCV, 129, 2223–2243.
Choi, S., Jung, S., Yun, H., Kim, J. T., Kim, S., & Choo, J. (2021). RobustNet: Improving domain generalization in urban-scene segmentation via instance selective whitening. In CVPR
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In CVPR
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., & Uszkoreit, J. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR
Du, D., Chen, J., Li, Y., Ma, K., Wu, G., Zheng, Y., & Wang, L. (2022). Cross-domain gated learning for domain generalization. IJCV, 130, 2842–2857.
Dumoulin, V., Shlens, J., & Kudlur, M. (2017). A learned representation for artistic style. In ICLR
Fini, E., Sangineto, E., Lathuilière, S., Zhong, Z., Nabi, M., & Ricci, E. (2021). A unified objective for novel class discovery. In ICCV
French, G., Laine, S., Aila, T., Mackiewicz, M., & Finlayson, G. (2020). Semi-supervised semantic segmentation needs strong, varied perturbations. In BMVC
Gong, R., Li, W., Chen, Y., Dai, D., & Van Gool, L. (2021). DLOW: Domain flow and applications. IJCV, 129, 2865–2888.
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. JMLR.
Halmos, P. R. (1987). Finite-dimensional vector spaces. Springer.
Hassaballah, M., Kenk, M. A., Muhammad, K., & Minaee, S. (2020). Vehicle detection and tracking in adverse weather using a deep learning framework. IEEE Transactions on Intelligent Transportation Systems, 22, 4230–4242.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In ICCV
He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In CVPR
Hendrycks, D., Mu, N., Cubuk, E. D., Zoph, B., Gilmer, J., & Lakshminarayanan, B. (2020). AugMix: A simple data processing method to improve robustness and uncertainty. In ICLR
Hoffman, J., Wang, D., Yu, F., & Darrell, T. (2016). FCNs in the wild: Pixel-level adversarial and constraint-based adaptation. arXiv preprint arXiv:1612.02649
Hoyer, L., Dai, D., & Van Gool, L. (2022). DAFormer: Improving network architectures and training strategies for domain-adaptive semantic segmentation. In CVPR
Huang, J., Guan, D., Xiao, A., & Lu, S. (2021). FSDR: Frequency space domain randomization for domain generalization. In CVPR
Huang, L., Zhou, Y., Zhu, F., Liu, L., & Shao, L. (2019). Iterative normalization: Beyond standardization towards efficient whitening. In CVPR
Huang, X., & Belongie, S. (2017). Arbitrary style transfer in real-time with adaptive instance normalization. In ICCV
Huang, Z., Wang, H., Xing, E. P., & Huang, D. (2020). Self-challenging improves cross-domain generalization. In ECCV
Kannan, H., Kurakin, A., & Goodfellow, I. (2018). Adversarial logit pairing. In ICML
Kim, J., Lee, J., Park, J., Min, D., & Sohn, K. (2022). Pin the memory: Learning to generalize semantic segmentation. In CVPR
Lee, S., Seong, H., Lee, S., & Kim, E. (2022). WildNet: Learning domain generalized semantic segmentation from the wild. In CVPR
Li, D., Yang, Y., Song, Y. Z., & Hospedales, T. M. (2017). Deeper, broader and artier domain generalization. In ICCV
Li, D., Yang, Y., Song, Y. Z., & Hospedales, T. (2018a). Learning to generalize: Meta-learning for domain generalization. In AAAI
Li, Y., Tian, X., Gong, M., Liu, Y., Liu, T., Zhang, K., & Tao, D. (2018b). Deep domain generalization via conditional invariant adversarial networks. In ECCV
Lin, C., Yuan, Z., Zhao, S., Sun, P., Wang, C., & Cai, J. (2021). Domain-invariant disentangled network for generalizable object detection. In ICCV
Liu, W., Rabinovich, A., & Berg, A. C. (2015). ParseNet: Looking wider to see better. arXiv preprint arXiv:1506.04579
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In ICCV
Loshchilov, I., & Hutter, F. (2019). Decoupled weight decay regularization. In ICLR
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability
Neuhold, G., Ollmann, T., Rota Bulo, S., & Kontschieder, P. (2017). The mapillary vistas dataset for semantic understanding of street scenes. In ICCV
Nuriel, O., Benaim, S., & Wolf, L. (2021). Permuted AdaIN: Reducing the bias towards global statistics in image classification. In CVPR
Pan, X., Luo, P., Shi, J., & Tang, X. (2018). Two at once: Enhancing learning and generalization capacities via IBN-Net. In ECCV
Pan, X., Zhan, X., Shi, J., Tang, X., & Luo, P. (2019). Switchable whitening for deep representation learning. In ICCV
Peng, D., Lei, Y., Liu, L., Zhang, P., & Liu, J. (2021). Global and local texture randomization for synthetic-to-real semantic segmentation. IEEE TIP, 30, 6594–6608.
Peng, D., Lei, Y., Hayat, M., Guo, Y., & Li, W. (2022). Semantic-aware domain generalized segmentation. In CVPR
Qi, C. R., Yi, L., Su, H., & Guibas, L. J. (2017). PointNet++: Deep hierarchical feature learning on point sets in a metric space. In NeurIPS
Qiao, F., Zhao, L., & Peng, X. (2020). Learning to learn single domain generalization. In CVPR
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In NeurIPS
Richter, S. R., Vineet, V., Roth, S., & Koltun, V. (2016). Playing for data: Ground truth from computer games. In ECCV
Ros, G., Sellart, L., Materzynska, J., Vazquez, D., & Lopez, A. M. (2016). The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In CVPR
Roy, S., Liu, M., Zhong, Z., Sebe, N., & Ricci, E. (2022). Class-incremental novel class discovery. In ECCV
Sakaridis, C., Dai, D., & Van Gool, L. (2018). Semantic foggy scene understanding with synthetic data. IJCV, 126, 973–992.
Sakaridis, C., Dai, D., & Van Gool, L. (2019). Guided curriculum model adaptation and uncertainty-aware evaluation for semantic nighttime image segmentation. In ICCV
Sakaridis, C., Dai, D., & Van Gool, L. (2021). ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding. In ICCV
Shankar, S., Piratla, V., Chakrabarti, S., Chaudhuri, S., Jyothi, P., & Sarawagi, S. (2018). Generalizing across domains via cross-gradient training. In ICLR
Shui, C., Li, Z., Li, J., Gagné, C., Ling, C. X., & Wang, B. (2021). Aggregating from multiple target-shifted sources. In ICML
Shui, C., Chen, Q., Wen, J., Zhou, F., Gagné, C., & Wang, B. (2022a). A novel domain adaptation theory with Jensen–Shannon divergence. Knowledge-Based Systems, 257, 109808.
Shui, C., Wang, B., & Gagné, C. (2022b). On the benefits of representation regularization in invariance based domain generalization. Machine Learning, 111, 895–915.
Shui, C., Xu, G., Chen, Q., Li, J., Ling, C. X., Arbel, T., Wang, B., & Gagné, C. (2022c). On learning fairness and accuracy on multiple subgroups. In NeurIPS
Tang, Z., Gao, Y., Zhu, Y., Zhang, Z., Li, M., & Metaxas, D. (2021). SelfNorm and CrossNorm for out-of-distribution robustness. In ICCV
Tarvainen, A., & Valpola, H. (2017). Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In NeurIPS
Vapnik, V. (2013). The nature of statistical learning theory. Springer Science & Business Media.
Wang, H., Xiao, C., Kossaifi, J., Yu, Z., Anandkumar, A., & Wang, Z. (2021a). AugMax: Adversarial composition of random augmentations for robust training. In NeurIPS
Wang, P., Li, Y., & Vasconcelos, N. (2021b). Rethinking and improving the robustness of image style transfer. In CVPR
Wang, Z., Luo, Y., Qiu, R., Huang, Z., & Baktashmotlagh, M. (2021c). Learning to diversify for single domain generalization. In ICCV
Wu, A., & Deng, C. (2022). Single-domain generalized object detection in urban scene via cyclic-disentangled self-distillation. In CVPR
Wu, A., Liu, R., Han, Y., Zhu, L., & Yang, Y. (2021). Vector-decomposed disentanglement for domain-invariant object detection. In ICCV
Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., & Luo, P. (2021). SegFormer: Simple and efficient design for semantic segmentation with transformers. NeurIPS, 34, 12077–12090.
Yu, F., Chen, H., Wang, X., Xian, W., Chen, Y., Liu, F., Madhavan, V., & Darrell, T. (2020). BDD100K: A diverse driving dataset for heterogeneous multitask learning. In CVPR
Yuan, J., Ma, X., Chen, D., Kuang, K., Wu, F., & Lin, L. (2022). Domain-specific bias filtering for single labeled domain generalization. IJCV, 131, 552–571.
Yue, X., Zhang, Y., Zhao, S., Sangiovanni-Vincentelli, A., Keutzer, K., & Gong, B. (2019). Domain randomization and pyramid consistency: simulation-to-real generalization without accessing target domain data. In ICCV
Zhao, L., Liu, T., Peng, X., & Metaxas, D. (2020). Maximum-entropy adversarial data augmentation for improved generalization and robustness. In NeurIPS
Zhao, Y., Zhong, Z., Yang, F., Luo, Z., Lin, Y., Li, S., & Sebe, N. (2021). Learning to generalize unseen domains via memory-based multi-source meta-learning for person re-identification. In CVPR
Zhao, Y., Zhong, Z., Luo, Z., Lee, G. H., & Sebe, N. (2022a). Source-free open compound domain adaptation in semantic segmentation. IEEE TCSVT, 32, 7019–7032.
Zhao, Y., Zhong, Z., Sebe, N., & Lee, G. H. (2022b). Novel class discovery in semantic segmentation. In CVPR
Zhao, Y., Zhong, Z., Zhao, N., Sebe, N., & Lee, G. H. (2022c). Style-hallucinated dual consistency learning for domain generalized semantic segmentation. In ECCV
Zheng, Z., & Yang, Y. (2020). Unsupervised scene adaptation with memory regularization in vivo. In IJCAI
Zheng, Z., & Yang, Y. (2021). Rectifying pseudo label learning via uncertainty estimation for domain adaptive semantic segmentation. IJCV.
Zheng, Z., & Yang, Y. (2022). Adaptive boosting for domain adaptation: Toward robust predictions in scene segmentation. IEEE TIP, 31, 5371–5382.
Zhong, Z., Zhu, L., Luo, Z., Li, S., Yang, Y., & Sebe, N. (2021). OpenMix: Reviving known knowledge for discovering novel visual categories in an open world. In CVPR
Zhong, Z., Zhao, Y., Lee, G. H., & Sebe, N. (2022). Adversarial style augmentation for domain generalized urban-scene segmentation. In NeurIPS
Zhou, K., Yang, Y., Qiao, Y., & Xiang, T. (2021a). Domain generalization with mixstyle. In ICLR
Zhou, Q., Feng, Z., Gu, Q., Pang, J., Cheng, G., Lu, X., Shi, J., & Ma, L. (2021b). Context-aware mixup for domain adaptive semantic segmentation. arXiv preprint arXiv:2108.03557
Zhou, Q., Feng, Z., Gu, Q., Cheng, G., Lu, X., Shi, J., & Ma, L. (2022). Uncertainty-aware consistency regularization for cross-domain semantic segmentation. Computer Vision and Image Understanding, 221, 103448.
Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV
Acknowledgements
This research/project is supported by the National Research Foundation Singapore and DSO National Laboratories under the AI Singapore Programme (AISG Award No: AISG2-RP-2020-016), the Tier 2 grant MOET2EP20120-0011 from the Singapore Ministry of Education, the MUR PNRR project FAIR (PE00000013) funded by the NextGenerationEU, and the EU project Ai4Trust (No. 101070190).
Additional information
Communicated by Oliver Zendel.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhao, Y., Zhong, Z., Zhao, N. et al. Style-Hallucinated Dual Consistency Learning: A Unified Framework for Visual Domain Generalization. Int J Comput Vis 132, 837–853 (2024). https://doi.org/10.1007/s11263-023-01911-w