Abstract
Modeling long-range dependencies is one of the central problems in sequence modeling. For video data, the commonly used convolutional and recurrent operations act as a kind of “local coding” of variable-length sequences and capture only local neighborhood information. We adopt the idea of non-local means to compensate for the shortcomings of repeated convolutional operations; most previous non-local methods for video super-resolution focus only on positional information or fail to capture temporal information directly. In this study, we propose a non-local bidirectional fusion network (NLBF) for the video super-resolution (VSR) task. The non-local network decouples multidimensional information to reduce memory consumption while capturing long-range dependencies across the temporal, spatial, and channel dimensions as far as possible. Within a multi-scale local and non-local hybrid framework, we further design a bidirectional spatial-temporal fusion module that balances the information obtained from other frames while refining features. Experimental results on benchmark datasets show that the proposed NLBF achieves state-of-the-art performance on the VSR task.
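The memory argument behind decoupling can be illustrated with a small NumPy sketch. It is not the NLBF architecture: it uses a generic, parameter-free non-local operation (each position becomes a similarity-weighted mean of all positions, plus a residual), and it assumes one possible decoupling, temporal attention per pixel followed by spatial attention per frame. The channel dimension is left out, and all function names (`non_local`, `joint_non_local`, `decoupled_non_local`, `affinity_entries`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def non_local(feats):
    """Generic non-local operation on an (N, C) array of N positions.

    Each output row is a similarity-weighted mean of all input rows,
    added back to the input as a residual. No learned projections.
    """
    weights = softmax(feats @ feats.T, axis=-1)  # (N, N) affinity matrix
    return feats + weights @ feats


def joint_non_local(video):
    """Attend over all T*H*W positions of a (T, H, W, C) clip at once."""
    T, H, W, C = video.shape
    return non_local(video.reshape(T * H * W, C)).reshape(T, H, W, C)


def decoupled_non_local(video):
    """Assumed decoupling: temporal attention, then spatial attention."""
    T, H, W, C = video.shape
    # Temporal step: for every pixel, attend across the T frames.
    x = video.reshape(T, H * W, C).transpose(1, 0, 2)  # (HW, T, C)
    x = np.stack([non_local(seq) for seq in x])        # (HW, T, C)
    # Spatial step: for every frame, attend across the H*W pixels.
    x = x.transpose(1, 0, 2)                           # (T, HW, C)
    x = np.stack([non_local(frame) for frame in x])    # (T, HW, C)
    return x.reshape(T, H, W, C)


def affinity_entries(T, H, W):
    """Number of affinity-matrix entries for the joint and decoupled forms."""
    joint = (T * H * W) ** 2
    decoupled = (H * W) * T * T + T * (H * W) ** 2
    return joint, decoupled
```

For a clip of 5 frames at 8×8 resolution, `affinity_entries(5, 8, 8)` gives 102400 entries for the joint form and 22080 for the decoupled form. The gap widens with resolution, since the joint affinity matrix grows with the square of T·H·W.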
Acknowledgments
This work was supported by the Natural Science Foundation of Chongqing under Grant cstc2021jcyj-msxmX0411 and Grant CSTB2022NSCQ-MSX0873, the Science and Technology Research Program of Chongqing Municipal Education Commission under Grant KJZDK202001105, and the Scientific Research Foundation of Chongqing University of Technology under Grant 2020zdz029 and Grant 2020zdz030.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, Q., Liu, Q., Chen, F., Wang, L., Peng, Z. (2023). Multi-scale Non-local Bidirectional Fusion for Video Super-Resolution. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol 14359. Springer, Cham. https://doi.org/10.1007/978-3-031-46317-4_15
DOI: https://doi.org/10.1007/978-3-031-46317-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46316-7
Online ISBN: 978-3-031-46317-4
eBook Packages: Computer Science, Computer Science (R0)