Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration

  • Conference paper
Computer Vision – ECCV 2022 Workshops (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13802)

Abstract

Compression plays an important role in the efficient transmission and storage of images and videos through band-limited systems such as streaming services, virtual reality, or video games. However, compression unavoidably leads to artifacts and the loss of the original information, which may severely degrade the visual quality. For these reasons, quality enhancement of compressed images has become a popular research topic. While most state-of-the-art image restoration methods are based on convolutional neural networks, transformer-based methods such as SwinIR show impressive performance on these tasks.

In this paper, we explore the novel Swin Transformer V2 to improve SwinIR for image super-resolution, in particular in the compressed-input scenario. Using this method we can tackle the major issues in training transformer vision models, such as training instability, resolution gaps between pre-training and fine-tuning, and hunger for data. We conduct experiments on three representative tasks: JPEG compression artifact removal, image super-resolution (classical and lightweight), and compressed-image super-resolution. Experimental results demonstrate that our method, Swin2SR, improves the training convergence and performance of SwinIR, and is a top-5 solution in the “AIM 2022 Challenge on Super-Resolution of Compressed Image and Video”.

Our code can be found at https://github.com/mv-lab/swin2sr.
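The compressed-input scenario described in the abstract can be made concrete with a small sketch of how degraded training pairs are typically produced for this task: downscale a high-resolution image, then JPEG-compress the result. This is a generic illustration using Pillow; the degradation model (bicubic downscaling followed by aggressive JPEG compression) and the function name and parameters are assumptions for illustration, not necessarily the exact pipeline used by the Swin2SR authors.

```python
# Illustrative sketch (not the authors' exact pipeline): building a training pair
# for compressed-image super-resolution. The degradation model and the
# function name/parameters here are assumptions for illustration.
import io

from PIL import Image


def make_compressed_lr_pair(hr, scale=4, jpeg_quality=10):
    """Return (lr_compressed, hr): a low-res, JPEG-degraded input and its target."""
    lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
    buf = io.BytesIO()
    lr.save(buf, format="JPEG", quality=jpeg_quality)  # introduces blocking artifacts
    buf.seek(0)
    return Image.open(buf).convert("RGB"), hr


# Usage with a synthetic image (a real dataset such as DIV2K would be used in practice)
hr = Image.new("RGB", (128, 128), (200, 80, 40))
lr, target = make_compressed_lr_pair(hr)
print(lr.size, target.size)  # (32, 32) (128, 128)
```

The restoration network is then trained to map `lr` back to `target`, which couples super-resolution with compression-artifact removal in a single model.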


Notes

  1. Online leaderboard: https://codalab.lisn.upsaclay.fr/competitions/5076.

References

  1. Agustsson, E., Timofte, R.: NTIRE 2017 challenge on single image super-resolution: dataset and study. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 126–135 (2017)

  2. Ahn, N., Kang, B., Sohn, K.-A.: Fast, accurate, and lightweight super-resolution with cascading residual network. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 256–272. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_16

  3. Bevilacqua, M., Roumy, A., Guillemot, C., Morel, M.A.: Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In: British Machine Vision Conference, pp. 135.1–135.10 (2012)

  4. Cao, H., et al.: Swin-Unet: Unet-like pure transformer for medical image segmentation. arXiv preprint arXiv:2105.05537 (2021)

  5. Cao, J., Li, Y., Zhang, K., Van Gool, L.: Video super-resolution transformer. arXiv preprint arXiv:2106.06847 (2021)

  6. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13

  7. Cavigelli, L., Hager, P., Benini, L.: CAS-CNN: a deep convolutional neural network for image compression artifact suppression. In: 2017 International Joint Conference on Neural Networks, pp. 752–759 (2017)

  8. Chen, H., et al.: Pre-trained image processing transformer. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 12299–12310 (2021)

  9. Chu, X., Zhang, B., Ma, H., Xu, R., Li, Q.: Fast, accurate and lightweight super-resolution with neural architecture search. In: International Conference on Pattern Recognition, pp. 59–64. IEEE (2020)

  10. Conde, M.V., Burchi, M., Timofte, R.: Conformer and blind noisy students for improved image quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 940–950 (2022)

  11. Conde, M.V., Choi, U.-J., Burchi, M., Timofte, R.: Swin2SR: SwinV2 transformer for compressed image super-resolution and restoration. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops (2022)

  12. Conde, M.V., McDonagh, S., Maggioni, M., Leonardis, A., Pérez-Pellitero, E.: Model-based image signal processors via learnable dictionaries. Proc. AAAI Conf. Artif. Intell. 36(1), 481–489 (2022)

  13. Conde, M.V., Turgutlu, K.: Exploring vision transformers for fine-grained classification. arXiv preprint arXiv:2106.10587 (2021)

  14. Dai, T., Cai, J., Zhang, Y., Xia, S.-T., Zhang, L.: Second-order attention network for single image super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 11065–11074 (2019)

  15. Dong, C., Deng, Y., Change Loy, C., Tang, X.: Compression artifacts reduction by a deep convolutional network. In: IEEE International Conference on Computer Vision, pp. 576–584 (2015)

  16. Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 184–199. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10593-2_13

  17. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  18. Ehrlich, M., Davis, L., Lim, S.-N., Shrivastava, A.: Quantization guided JPEG artifact correction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12353, pp. 293–309. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58598-3_18

  19. Foi, A., Katkovnik, V., Egiazarian, K.: Pointwise shape-adaptive DCT for high-quality denoising and deblocking of grayscale and color images. IEEE Trans. Image Process. 16(5), 1395–1411 (2007)

  20. Fritsche, M., Gu, S., Timofte, R.: Frequency separation for real-world super-resolution. In: IEEE International Conference on Computer Vision Workshops, pp. 3599–3608 (2019)

  21. Gu, J., et al.: NTIRE 2022 challenge on perceptual image quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 951–967 (2022)

  22. Gu, J., Lu, H., Zuo, W., Dong, C.: Blind super-resolution with iterative kernel correction. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1604–1613 (2019)

  23. Haris, M., Shakhnarovich, G., Ukita, N.: Deep back-projection networks for super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1664–1673 (2018)

  24. Huang, J.-B., Singh, A., Ahuja, N.: Single image super-resolution from transformed self-exemplars. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5197–5206 (2015)

  25. Hui, Z., Gao, X., Yang, Y., Wang, X.: Lightweight image super-resolution with information multi-distillation network. In: ACM International Conference on Multimedia, pp. 2024–2032 (2019)

  26. Ji, X., Cao, Y., Tai, Y., Wang, C., Li, J., Huang, F.: Real-world super-resolution via kernel estimation and noise injection. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 466–467 (2020)

  27. Jiang, J., Zhang, K., Timofte, R.: Towards flexible blind JPEG artifacts removal. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4997–5006 (2021)

  28. Kim, J., Lee, J.K., Lee, K.M.: Accurate image super-resolution using very deep convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1646–1654 (2016)

  29. Lai, W.-S., Huang, J.-B., Ahuja, N., Yang, M.-H.: Deep Laplacian pyramid networks for fast and accurate super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 624–632 (2017)

  30. Li, W., Zhou, K., Qi, L., Jiang, N., Lu, J., Jia, J.: LAPAR: linearly-assembled pixel-adaptive regression network for single image super-resolution and beyond. arXiv preprint arXiv:2105.10422 (2021)

  31. Li, Z., Yang, J., Liu, Z., Yang, X., Jeon, G., Wu, W.: Feedback network for image super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3867–3876 (2019)

  32. Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: SwinIR: image restoration using Swin Transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1833–1844 (2021)

  33. Liu, D., Wen, B., Fan, Y., Change Loy, C., Huang, T.S.: Non-local recurrent network for image restoration. arXiv preprint arXiv:1806.02919 (2018)

  34. Liu, P., Zhang, H., Zhang, K., Lin, L., Zuo, W.: Multi-level wavelet-CNN for image restoration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 773–782 (2018)

  35. Liu, Z., et al.: Swin Transformer V2: scaling up capacity and resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12009–12019 (2022)

  36. Liu, Z., et al.: Swin Transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)

  37. Lugmayr, A., Danelljan, M., Timofte, R.: NTIRE 2020 challenge on real-world image super-resolution: methods and results. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 494–495 (2020)

  38. Luo, X., Xie, Y., Zhang, Y., Qu, Y., Li, C., Fu, Y.: LatticeNet: towards lightweight image super-resolution with lattice block. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12367, pp. 272–289. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58542-6_17

  39. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: IEEE International Conference on Computer Vision, pp. 416–423 (2001)

  40. Matsui, Y., et al.: Sketch-based manga retrieval using manga109 dataset. Multimed. Tools Appl. 76(20), 21811–21838 (2017)

  41. Mei, Y., Fan, Y., Zhou, Y.: Image super-resolution with non-local sparse attention. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3517–3526 (2021)

  42. Niu, B., et al.: Single image super-resolution via a holistic attention network. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12357, pp. 191–207. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58610-2_12

  43. Rawzor: image compression benchmark

  44. Sheikh, H.R.: Live image quality assessment database release 2 (2005). http://live.ece.utexas.edu/research/quality

  45. Tai, Y., Yang, J., Liu, X., Xu, C.: MemNet: a persistent memory network for image restoration. In: IEEE International Conference on Computer Vision, pp. 4539–4547 (2017)

  46. Timofte, R., Agustsson, E., Van Gool, L., Yang, M.-H., Zhang, L.: NTIRE 2017 challenge on single image super-resolution: methods and results. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 114–125 (2017)

  47. Timofte, R., De Smet, V., Van Gool, L.: Anchored neighborhood regression for fast example-based super-resolution. In: IEEE International Conference on Computer Vision, pp. 1920–1927 (2013)

  48. Timofte, R., De Smet, V., Van Gool, L.: A+: adjusted anchored neighborhood regression for fast super-resolution. In: Asian Conference on Computer Vision, pp. 111–126 (2014)

  49. Timofte, R., Rothe, R., Van Gool, L.: Seven ways to improve example-based single image super resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1865–1873 (2016)

  50. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers & distillation through attention. arXiv preprint arXiv:2012.12877 (2020)

  51. Vaswani, A., Ramachandran, P., Srinivas, A., Parmar, N., Hechtman, B., Shlens, J.: Scaling local self-attention for parameter efficient visual backbones. arXiv preprint arXiv:2103.12731 (2021)

  52. Vaswani, A., et al.: Attention is all you need. arXiv preprint arXiv:1706.03762 (2017)

  53. Wang, X., Xie, L., Dong, C., Shan, Y.: Real-ESRGAN: training real-world blind super-resolution with pure synthetic data. arXiv preprint arXiv:2107.10833 (2021)

  54. Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., Qiao, Y., Loy, C.C.: ESRGAN: enhanced super-resolution generative adversarial networks. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11133, pp. 63–79. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11021-5_5

  55. Wang, Z., Cun, X., Bao, J., Liu, J.: Uformer: a general u-shaped transformer for image restoration. arXiv preprint arXiv:2106.03106 (2021)

  56. Yamac, M., Ataman, B., Nawaz, A.: KernelNet: a blind super-resolution kernel estimation network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 453–462 (2021)

  57. Yang, R., Timofte, R., et al.: NTIRE 2021 challenge on quality enhancement of compressed video: methods and results. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2021)

  58. Yang, R., Timofte, R., et al.: AIM 2022 challenge on super-resolution of compressed image and video: dataset, methods and results. In: Proceedings of the European Conference on Computer Vision Workshops (ECCVW) (2022)

  59. Yang, R., Timofte, R., et al.: NTIRE 2022 challenge on super-resolution and quality enhancement of compressed video: dataset, methods and results. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2022)

  60. Zeyde, R., Elad, M., Protter, M.: On single image scale-up using sparse-representations. In: International Conference on Curves and Surfaces, pp. 711–730 (2010)

  61. Zhang, K., Li, Y., Zuo, W., Zhang, L., Van Gool, L., Timofte, R.: Plug-and-play image restoration with deep denoiser prior. IEEE Trans. Pattern Anal. Mach. Intell. 99, 1 (2021)

  62. Zhang, K., Liang, J., Van Gool, L., Timofte, R.: Designing a practical degradation model for deep blind image super-resolution. In: IEEE International Conference on Computer Vision (2021)

  63. Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 26(7), 3142–3155 (2017)

  64. Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep CNN denoiser prior for image restoration. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3929–3938 (2017)

  65. Zhang, K., Zuo, W., Zhang, L.: FFDNet: toward a fast and flexible solution for CNN-based image denoising. IEEE Trans. Image Process. 27(9), 4608–4622 (2018)

  66. Zhang, K., Zuo, W., Zhang, L.: Learning a single convolutional super-resolution network for multiple degradations. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3262–3271 (2018)

  67. Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., Fu, Y.: Image super-resolution using very deep residual channel attention networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 294–310. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_18

  68. Zhang, Y., Li, K., Li, K., Zhong, B., Fu, Y.: Residual non-local attention networks for image restoration. arXiv preprint arXiv:1903.10082 (2019)

  69. Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y.: Residual dense network for image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2472–2481 (2018)

  70. Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y.: Residual dense network for image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 43(7), 2480–2495 (2020)

  71. Zheng, B., Chen, Y., Tian, X., Zhou, F., Liu, X.: Implicit dual-domain convolutional network for robust color image compression artifact reduction. IEEE Trans. Circuits Syst. Video Technol. 30(11), 3982–3994 (2019)

  72. Zhou, S., Zhang, J., Zuo, W., Loy, C.C.: Cross-scale internal graph neural network for image super-resolution. arXiv preprint arXiv:2006.16673 (2020)

Acknowledgments

This work was partly supported by The Alexander von Humboldt Foundation (AvH).

Author information

Corresponding author

Correspondence to Marcos V. Conde.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Conde, M.V., Choi, UJ., Burchi, M., Timofte, R. (2023). Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13802. Springer, Cham. https://doi.org/10.1007/978-3-031-25063-7_42

  • DOI: https://doi.org/10.1007/978-3-031-25063-7_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25062-0

  • Online ISBN: 978-3-031-25063-7

  • eBook Packages: Computer Science, Computer Science (R0)
