Abstract
This work proposes a reliable and efficient end-to-end No-Reference Video Quality Assessment (NR-VQA) model that fuses deep spatial and temporal features. Because both spatial (semantic) and temporal (motion) features strongly influence perceived video quality, we combine the two to build an effective and fast video quality predictor. ResNet-50, a well-known pre-trained image classification model, extracts semantic features from video frames, while I3D, a well-known pre-trained action recognition model, computes spatiotemporal features from short video clips. The extracted features are then passed through a regressor head consisting of a Gated Recurrent Unit (GRU) followed by a Fully Connected (FC) layer. Four popular and widely used authentic-distortion databases, LIVE-VQC, KoNViD-1k, LIVE-Qualcomm, and CVD2014, are used to validate performance. The proposed model achieves competitive results at considerably reduced computational complexity.
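The regressor head described in the abstract (a GRU over per-clip fused features, followed by an FC layer producing a scalar quality score) can be sketched as follows. This is a minimal illustrative NumPy implementation, not the paper's exact design: the fusion-by-concatenation step, the feature dimensions (2048-d ResNet-50, 1024-d I3D), the hidden size, and the random stand-in features are all assumptions for the sake of a self-contained example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRURegressorHead:
    """Single-layer GRU followed by a fully connected layer that maps
    the final hidden state to a scalar quality score (illustrative)."""

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.05  # small random initialisation (assumption)
        def W():  # input-to-hidden weight matrix
            return s * rng.standard_normal((hid_dim, in_dim))
        def U():  # hidden-to-hidden weight matrix
            return s * rng.standard_normal((hid_dim, hid_dim))
        # update gate (z), reset gate (r), candidate state (n)
        self.Wz, self.Uz, self.bz = W(), U(), np.zeros(hid_dim)
        self.Wr, self.Ur, self.br = W(), U(), np.zeros(hid_dim)
        self.Wn, self.Un, self.bn = W(), U(), np.zeros(hid_dim)
        # FC layer: final hidden state -> scalar quality score
        self.w_fc = s * rng.standard_normal(hid_dim)
        self.b_fc = 0.0
        self.hid_dim = hid_dim

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h + self.bz)        # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h + self.br)        # reset gate
        n = np.tanh(self.Wn @ x + self.Un @ (r * h) + self.bn)  # candidate
        return (1.0 - z) * h + z * n

    def predict(self, features):
        """features: (T, in_dim) sequence of fused per-clip vectors."""
        h = np.zeros(self.hid_dim)
        for x in features:          # recurrent pass over the clip sequence
            h = self.step(x, h)
        return float(self.w_fc @ h + self.b_fc)

# Toy usage: 8 clips, each fused by concatenating a 2048-d "ResNet-50"
# vector with a 1024-d "I3D" vector (random stand-ins here; real
# features would come from the pre-trained backbones).
rng = np.random.default_rng(42)
spatial = rng.standard_normal((8, 2048))   # per-clip semantic features
temporal = rng.standard_normal((8, 1024))  # per-clip motion features
fused = np.concatenate([spatial, temporal], axis=1)  # shape (8, 3072)

head = GRURegressorHead(in_dim=3072, hid_dim=64)
score = head.predict(fused)  # predicted quality score (untrained weights)
```

In a trained model the weights would of course be learned end-to-end against subjective quality scores; the sketch only shows the data flow of the fusion and regression stages.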
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Vishwakarma, A.K., Bhurchandi, K.M. (2023). An End-to-End Fast No-Reference Video Quality Predictor with Spatiotemporal Feature Fusion. In: Gupta, D., Bhurchandi, K., Murala, S., Raman, B., Kumar, S. (eds) Computer Vision and Image Processing. CVIP 2022. Communications in Computer and Information Science, vol 1777. Springer, Cham. https://doi.org/10.1007/978-3-031-31417-9_48
DOI: https://doi.org/10.1007/978-3-031-31417-9_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31416-2
Online ISBN: 978-3-031-31417-9
eBook Packages: Computer Science, Computer Science (R0)