Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration

  • Conference paper
Computer Vision – ECCV 2022 Workshops (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13802)

Abstract

Compression plays an important role in the efficient transmission and storage of images and videos through band-limited systems such as streaming services, virtual reality, or video games. However, compression unavoidably leads to artifacts and the loss of the original information, which may severely degrade the visual quality. For these reasons, quality enhancement of compressed images has become a popular research topic. While most state-of-the-art image restoration methods are based on convolutional neural networks, transformer-based methods such as SwinIR show impressive performance on these tasks.

In this paper, we explore the novel Swin Transformer V2 to improve SwinIR for image super-resolution, in particular in the compressed-input scenario. Using this method we can tackle the major issues in training transformer vision models, such as training instability, resolution gaps between pre-training and fine-tuning, and hunger for data. We conduct experiments on three representative tasks: JPEG compression artifact removal, image super-resolution (classical and lightweight), and compressed-image super-resolution. Experimental results demonstrate that our method, Swin2SR, improves the training convergence and performance of SwinIR, and is a top-5 solution in the “AIM 2022 Challenge on Super-Resolution of Compressed Image and Video”.

Our code can be found at https://github.com/mv-lab/swin2sr.
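The compressed-input scenario described in the abstract can be made concrete with a small sketch of how degraded training pairs are typically produced for this task: downscale a high-resolution image, then JPEG-compress the result. This is a generic illustration using Pillow; the degradation model (bicubic downscaling followed by aggressive JPEG compression) and the function name and parameters are assumptions for illustration, not necessarily the exact pipeline used by the Swin2SR authors.

```python
# Illustrative sketch (not the authors' exact pipeline): building a training pair
# for compressed-image super-resolution. The degradation model and the
# function name/parameters here are assumptions for illustration.
import io

from PIL import Image


def make_compressed_lr_pair(hr, scale=4, jpeg_quality=10):
    """Return (lr_compressed, hr): a low-res, JPEG-degraded input and its target."""
    lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
    buf = io.BytesIO()
    lr.save(buf, format="JPEG", quality=jpeg_quality)  # introduces blocking artifacts
    buf.seek(0)
    return Image.open(buf).convert("RGB"), hr


# Usage with a synthetic image (a real dataset such as DIV2K would be used in practice)
hr = Image.new("RGB", (128, 128), (200, 80, 40))
lr, target = make_compressed_lr_pair(hr)
print(lr.size, target.size)  # (32, 32) (128, 128)
```

The restoration network is then trained to map `lr` back to `target`, which couples super-resolution with compression-artifact removal in a single model.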


Notes

  1. Online leaderboard: https://codalab.lisn.upsaclay.fr/competitions/5076.

References

  1. Agustsson, E., Timofte, R.: NTIRE 2017 challenge on single image super-resolution: dataset and study. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 126–135 (2017)

  2. Ahn, N., Kang, B., Sohn, K.-A.: Fast, accurate, and lightweight super-resolution with cascading residual network. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 256–272. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_16

  3. Bevilacqua, M., Roumy, A., Guillemot, C., Morel, M.A.: Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In: British Machine Vision Conference, pp. 135.1–135.10 (2012)

  4. Cao, H., et al.: Swin-Unet: Unet-like pure transformer for medical image segmentation. arXiv preprint arXiv:2105.05537 (2021)

  5. Cao, J., Li, Y., Zhang, K., Van Gool, L.: Video super-resolution transformer. arXiv preprint arXiv:2106.06847 (2021)

  6. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13

  7. Cavigelli, L., Hager, P., Benini, L.: CAS-CNN: a deep convolutional neural network for image compression artifact suppression. In: 2017 International Joint Conference on Neural Networks, pp. 752–759 (2017)

  8. Chen, H., et al.: Pre-trained image processing transformer. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 12299–12310 (2021)

  9. Chu, X., Zhang, B., Ma, H., Xu, R., Li, Q.: Fast, accurate and lightweight super-resolution with neural architecture search. In: International Conference on Pattern Recognition, pp. 59–64. IEEE (2020)

  10. Conde, M.V., Burchi, M., Timofte, R.: Conformer and blind noisy students for improved image quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 940–950 (2022)

  11. Conde, M.V., Choi, U.-J., Burchi, M., Timofte, R.: Swin2SR: SwinV2 transformer for compressed image super-resolution and restoration. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops (2022)

  12. Conde, M.V., McDonagh, S., Maggioni, M., Leonardis, A., Pérez-Pellitero, E.: Model-based image signal processors via learnable dictionaries. Proc. AAAI Conf. Artif. Intell. 36(1), 481–489 (2022)

  13. Conde, M.V., Turgutlu, K.: Exploring vision transformers for fine-grained classification. arXiv preprint arXiv:2106.10587 (2021)

  14. Dai, T., Cai, J., Zhang, Y., Xia, S.-T., Zhang, L.: Second-order attention network for single image super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 11065–11074 (2019)

  15. Dong, C., Deng, Y., Change Loy, C., Tang, X.: Compression artifacts reduction by a deep convolutional network. In: IEEE International Conference on Computer Vision, pp. 576–584 (2015)

  16. Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 184–199. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10593-2_13

  17. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  18. Ehrlich, M., Davis, L., Lim, S.-N., Shrivastava, A.: Quantization guided JPEG artifact correction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12353, pp. 293–309. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58598-3_18

  19. Foi, A., Katkovnik, V., Egiazarian, K.: Pointwise shape-adaptive DCT for high-quality denoising and deblocking of grayscale and color images. IEEE Trans. Image Process. 16(5), 1395–1411 (2007)

  20. Fritsche, M., Gu, S., Timofte, R.: Frequency separation for real-world super-resolution. In: IEEE International Conference on Computer Vision Workshops, pp. 3599–3608 (2019)

  21. Gu, J., et al.: NTIRE 2022 challenge on perceptual image quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 951–967 (2022)

  22. Gu, J., Lu, H., Zuo, W., Dong, C.: Blind super-resolution with iterative kernel correction. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1604–1613 (2019)

  23. Haris, M., Shakhnarovich, G., Ukita, N.: Deep back-projection networks for super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1664–1673 (2018)

  24. Huang, J.-B., Singh, A., Ahuja, N.: Single image super-resolution from transformed self-exemplars. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5197–5206 (2015)

  25. Hui, Z., Gao, X., Yang, Y., Wang, X.: Lightweight image super-resolution with information multi-distillation network. In: ACM International Conference on Multimedia, pp. 2024–2032 (2019)

  26. Ji, X., Cao, Y., Tai, Y., Wang, C., Li, J., Huang, F.: Real-world super-resolution via kernel estimation and noise injection. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 466–467 (2020)

  27. Jiang, J., Zhang, K., Timofte, R.: Towards flexible blind JPEG artifacts removal. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4997–5006 (2021)

  28. Kim, J., Lee, J.K., Lee, K.M.: Accurate image super-resolution using very deep convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1646–1654 (2016)

  29. Lai, W.-S., Huang, J.-B., Ahuja, N., Yang, M.-H.: Deep Laplacian pyramid networks for fast and accurate super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 624–632 (2017)

  30. Li, W., Zhou, K., Qi, L., Jiang, N., Lu, J., Jia, J.: LAPAR: linearly-assembled pixel-adaptive regression network for single image super-resolution and beyond. arXiv preprint arXiv:2105.10422 (2021)

  31. Li, Z., Yang, J., Liu, Z., Yang, X., Jeon, G., Wu, W.: Feedback network for image super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3867–3876 (2019)

  32. Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: SwinIR: image restoration using Swin Transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1833–1844 (2021)

  33. Liu, D., Wen, B., Fan, Y., Change Loy, C., Huang, T.S.: Non-local recurrent network for image restoration. arXiv preprint arXiv:1806.02919 (2018)

  34. Liu, P., Zhang, H., Zhang, K., Lin, L., Zuo, W.: Multi-level wavelet-CNN for image restoration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 773–782 (2018)

  35. Liu, Z., et al.: Swin Transformer V2: scaling up capacity and resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12009–12019 (2022)

  36. Liu, Z., et al.: Swin Transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)

  37. Lugmayr, A., Danelljan, M., Timofte, R.: NTIRE 2020 challenge on real-world image super-resolution: methods and results. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 494–495 (2020)

  38. Luo, X., Xie, Y., Zhang, Y., Qu, Y., Li, C., Fu, Y.: LatticeNet: towards lightweight image super-resolution with lattice block. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12367, pp. 272–289. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58542-6_17

  39. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: IEEE International Conference on Computer Vision, pp. 416–423 (2001)

  40. Matsui, Y., et al.: Sketch-based manga retrieval using manga109 dataset. Multimed. Tools Appl. 76(20), 21811–21838 (2017)

  41. Mei, Y., Fan, Y., Zhou, Y.: Image super-resolution with non-local sparse attention. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3517–3526 (2021)

  42. Niu, B., et al.: Single image super-resolution via a holistic attention network. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12357, pp. 191–207. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58610-2_12

  43. Rawzor: image compression benchmark

  44. Sheikh, H.R.: Live image quality assessment database release 2 (2005). http://live.ece.utexas.edu/research/quality

  45. Tai, Y., Yang, J., Liu, X., Xu, C.: MemNet: a persistent memory network for image restoration. In: IEEE International Conference on Computer Vision, pp. 4539–4547 (2017)

  46. Timofte, R., Agustsson, E., Van Gool, L., Yang, M.-H., Zhang, L.: NTIRE 2017 challenge on single image super-resolution: methods and results. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 114–125 (2017)

  47. Timofte, R., De Smet, V., Van Gool, L.: Anchored neighborhood regression for fast example-based super-resolution. In: IEEE International Conference on Computer Vision, pp. 1920–1927 (2013)

  48. Timofte, R., De Smet, V., Van Gool, L.: A+: adjusted anchored neighborhood regression for fast super-resolution. In: Asian Conference on Computer Vision, pp. 111–126 (2014)

  49. Timofte, R., Rothe, R., Van Gool, L.: Seven ways to improve example-based single image super resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1865–1873 (2016)

  50. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers & distillation through attention. arXiv preprint arXiv:2012.12877 (2020)

  51. Vaswani, A., Ramachandran, P., Srinivas, A., Parmar, N., Hechtman, B., Shlens, J.: Scaling local self-attention for parameter efficient visual backbones. arXiv preprint arXiv:2103.12731 (2021)

  52. Vaswani, A., et al.: Attention is all you need. arXiv preprint arXiv:1706.03762 (2017)

  53. Wang, X., Xie, L., Dong, C., Shan, Y.: Real-ESRGAN: training real-world blind super-resolution with pure synthetic data. arXiv preprint arXiv:2107.10833 (2021)

  54. Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., Qiao, Y., Loy, C.C.: ESRGAN: enhanced super-resolution generative adversarial networks. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11133, pp. 63–79. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11021-5_5

  55. Wang, Z., Cun, X., Bao, J., Liu, J.: Uformer: a general u-shaped transformer for image restoration. arXiv preprint arXiv:2106.03106 (2021)

  56. Yamac, M., Ataman, B., Nawaz, A.: KernelNet: a blind super-resolution kernel estimation network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 453–462 (2021)

  57. Yang, R., Timofte, R., et al.: NTIRE 2021 challenge on quality enhancement of compressed video: methods and results. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2021)

  58. Yang, R., Timofte, R., et al.: AIM 2022 challenge on super-resolution of compressed image and video: dataset, methods and results. In: Proceedings of the European Conference on Computer Vision Workshops (ECCVW) (2022)

  59. Yang, R., Timofte, R., et al.: NTIRE 2022 challenge on super-resolution and quality enhancement of compressed video: dataset, methods and results. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2022)

  60. Zeyde, R., Elad, M., Protter, M.: On single image scale-up using sparse-representations. In: International Conference on Curves and Surfaces, pp. 711–730 (2010)

  61. Zhang, K., Li, Y., Zuo, W., Zhang, L., Van Gool, L., Timofte, R.: Plug-and-play image restoration with deep denoiser prior. IEEE Trans. Pattern Anal. Mach. Intell. 99, 1 (2021)

  62. Zhang, K., Liang, J., Van Gool, L., Timofte, R.: Designing a practical degradation model for deep blind image super-resolution. In: IEEE International Conference on Computer Vision (2021)

  63. Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 26(7), 3142–3155 (2017)

  64. Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep CNN denoiser prior for image restoration. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3929–3938 (2017)

  65. Zhang, K., Zuo, W., Zhang, L.: FFDNet: toward a fast and flexible solution for CNN-based image denoising. IEEE Trans. Image Process. 27(9), 4608–4622 (2018)

  66. Zhang, K., Zuo, W., Zhang, L.: Learning a single convolutional super-resolution network for multiple degradations. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3262–3271 (2018)

  67. Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., Fu, Y.: Image super-resolution using very deep residual channel attention networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 294–310. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_18

  68. Zhang, Y., Li, K., Li, K., Zhong, B., Fu, Y.: Residual non-local attention networks for image restoration. arXiv preprint arXiv:1903.10082 (2019)

  69. Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y.: Residual dense network for image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2472–2481 (2018)

  70. Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y.: Residual dense network for image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 43(7), 2480–2495 (2020)

  71. Zheng, B., Chen, Y., Tian, X., Zhou, F., Liu, X.: Implicit dual-domain convolutional network for robust color image compression artifact reduction. IEEE Trans. Circuits Syst. Video Technol. 30(11), 3982–3994 (2019)

  72. Zhou, S., Zhang, J., Zuo, W., Loy, C.C.: Cross-scale internal graph neural network for image super-resolution. arXiv preprint arXiv:2006.16673 (2020)

Acknowledgments

This work was partly supported by The Alexander von Humboldt Foundation (AvH).

Author information

Corresponding author

Correspondence to Marcos V. Conde.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Conde, M.V., Choi, UJ., Burchi, M., Timofte, R. (2023). Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13802. Springer, Cham. https://doi.org/10.1007/978-3-031-25063-7_42

  • DOI: https://doi.org/10.1007/978-3-031-25063-7_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25062-0

  • Online ISBN: 978-3-031-25063-7

  • eBook Packages: Computer Science, Computer Science (R0)
