A multi-modal lecture video indexing and retrieval framework with multi-scale residual attention network and multi-similarity computation

  • Original Paper
  • Signal, Image and Video Processing

Abstract

Advances in technology have sharply increased the production of video and its storage on the Internet, so a huge number of videos from various sources are now available on websites. Retrieving the relevant lecture videos from this volume of multimedia is therefore difficult. This paper proposes a deep-learning-based method for indexing and retrieving lecture videos that considers several similarity measures over the video features. Lecture videos for training are taken from a standardized dataset. The optimal keyframes are selected from these videos using the Adaptive Anti-Coronavirus Optimization Algorithm, and the video contents are then segmented and arranged on the basis of the optimized keyframes. Textual content, such as semantic words and keywords, is recognized by Optical Character Recognition (OCR), and image features are extracted from the segmented frames with a Multi-scale Residual Attention Network (MRAN). The resulting pool of features is organized and stored in the database according to content. In the testing phase, text and video queries are given as input to the trained model, and the features of the text query and of the optimized keyframes of the video query are obtained with MRAN. The pooled query features are compared with the features stored in the database using the cosine, Jaccard, and Euclidean similarity indices. The resulting multi-similarity features are used to retrieve the videos relevant to the given query. The experimental results show that the proposed system performs video indexing and retrieval better and more efficiently than existing video retrieval methods.
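The multi-similarity comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the three indices are combined, so the equal-weight average below is an assumption, as is the mapping of Euclidean distance d to a similarity 1/(1 + d). The function names (`cosine_similarity`, `jaccard_similarity`, `euclidean_similarity`, `rank_videos`) are hypothetical, and the short vectors and keyword lists are toy stand-ins for MRAN image features and OCR keywords.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(words_a, words_b):
    """Size of the keyword intersection divided by the size of the union."""
    set_a, set_b = set(words_a), set(words_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def euclidean_similarity(a, b):
    """Assumed mapping from Euclidean distance d to a similarity in (0, 1]: 1 / (1 + d)."""
    return 1.0 / (1.0 + math.dist(a, b))


def rank_videos(query_vec, query_words, index):
    """Rank indexed videos by the equal-weight mean of the three similarities.

    `index` maps a video id to a (feature_vector, keyword_list) pair.
    The equal weighting is an assumption made for this sketch.
    """
    scores = []
    for video_id, (vec, words) in index.items():
        score = (
            cosine_similarity(query_vec, vec)
            + jaccard_similarity(query_words, words)
            + euclidean_similarity(query_vec, vec)
        ) / 3.0
        scores.append((video_id, score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy index: two "videos" with made-up features and keywords.
index = {
    "lecture_a": ([1.0, 0.0], ["graph", "search"]),
    "lecture_b": ([0.0, 1.0], ["matrix", "rank"]),
}
ranking = rank_videos([1.0, 0.0], ["graph", "search"], index)
print(ranking[0][0])  # the best-matching video id
```

In this toy index the query matches `lecture_a` exactly on both features and keywords, so it is ranked first. A real system would replace the equal weights with whatever fusion rule the full paper specifies.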

Data availability

The data underlying this article cannot be shared publicly owing to privacy restrictions.

Funding

This research did not receive any specific funding.

Author information

Contributions

All authors have made substantial contributions to conception and design, revising the manuscript, and the final approval of the version to be published. Also, all authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding author

Correspondence to A. Debnath.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 8452 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Debnath, A., Rao, K.S. & Das, P.P. A multi-modal lecture video indexing and retrieval framework with multi-scale residual attention network and multi-similarity computation. SIViP 18, 1993–2006 (2024). https://doi.org/10.1007/s11760-023-02744-3

  • DOI: https://doi.org/10.1007/s11760-023-02744-3
