
An End-to-End Fast No-Reference Video Quality Predictor with Spatiotemporal Feature Fusion

  • Conference paper
  • First Online:
Computer Vision and Image Processing (CVIP 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1777)


Abstract

This work proposes a reliable and efficient end-to-end No-Reference Video Quality Assessment (NR-VQA) model that fuses deep spatial and temporal features. Since both spatial (semantic) and temporal (motion) features significantly affect perceived video quality, we combine the two to build an effective and fast video quality predictor. ResNet-50, a well-known pre-trained image classification model, is employed to extract semantic features from video frames, whereas I3D, a well-known pre-trained action recognition model, computes spatiotemporal features from short video clips. The extracted features are then passed through a regressor head consisting of a Gated Recurrent Unit (GRU) followed by a Fully Connected (FC) layer. Four popular and widely used authentic-distortion databases, LIVE-VQC, KoNViD-1k, LIVE-Qualcomm, and CVD2014, are used to validate performance. The proposed model demonstrates competitive results with considerably reduced computational complexity.
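
The abstract describes the high-level pipeline: per-frame semantic features and per-clip motion features are fused and fed to a GRU-plus-FC regressor. Below is a minimal PyTorch sketch of such a fusion head, operating on pre-extracted features. The class name `SpatioTemporalQualityHead`, the feature dimensions (2048-d ResNet-50 pooled frame features, 1024-d I3D clip features), the hidden size, and the mean-pooling step are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SpatioTemporalQualityHead(nn.Module):
    """Sketch of a fusion regressor head: concatenates per-segment spatial
    (ResNet-50-style, 2048-d) and spatiotemporal (I3D-style, 1024-d) features,
    runs a GRU over the segment sequence, and regresses one quality score.
    All dimensions are assumptions for illustration."""

    def __init__(self, spatial_dim=2048, motion_dim=1024, hidden_dim=256):
        super().__init__()
        self.gru = nn.GRU(spatial_dim + motion_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, spatial_feats, motion_feats):
        # spatial_feats: (B, T, 2048); motion_feats: (B, T, 1024)
        fused = torch.cat([spatial_feats, motion_feats], dim=-1)  # (B, T, 3072)
        out, _ = self.gru(fused)                                  # (B, T, hidden)
        # Pool the GRU outputs over time, then regress a single score per video.
        return self.fc(out.mean(dim=1)).squeeze(-1)               # (B,)

# Usage with random stand-in features: 2 videos, 8 temporal segments each.
head = SpatioTemporalQualityHead()
s = torch.randn(2, 8, 2048)   # stand-in for ResNet-50 frame features
m = torch.randn(2, 8, 1024)   # stand-in for I3D clip features
print(head(s, m).shape)       # torch.Size([2])
```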



Author information

Corresponding author

Correspondence to Anish Kumar Vishwakarma.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Vishwakarma, A.K., Bhurchandi, K.M. (2023). An End-to-End Fast No-Reference Video Quality Predictor with Spatiotemporal Feature Fusion. In: Gupta, D., Bhurchandi, K., Murala, S., Raman, B., Kumar, S. (eds) Computer Vision and Image Processing. CVIP 2022. Communications in Computer and Information Science, vol 1777. Springer, Cham. https://doi.org/10.1007/978-3-031-31417-9_48


  • DOI: https://doi.org/10.1007/978-3-031-31417-9_48

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31416-2

  • Online ISBN: 978-3-031-31417-9

  • eBook Packages: Computer Science, Computer Science (R0)
