A multi-modal lecture video indexing and retrieval framework with multi-scale residual attention network and multi-similarity computation

Debnath, A.; Rao, K. Sreenivasa; Das, Partha P.

doi:10.1007/s11760-023-02744-3

Original Paper
Published: 23 December 2023

Volume 18, pages 1993–2006, (2024)

Signal, Image and Video Processing

A. Debnath¹,
K. Sreenivasa Rao² &
Partha P. Das²

Abstract

Advances in technology have greatly increased the production of video and its storage on the Internet, so large numbers of videos from many sources are now available online. This volume makes it difficult to retrieve relevant lecture videos from multimedia collections. This paper therefore proposes a deep learning method for indexing and retrieving lecture videos that considers several similarities among the video features. Lecture videos for training are obtained from a standardized dataset. Optimal keyframes are selected from these videos using the Adaptive Anti-Coronavirus Optimization Algorithm, and the video contents are then segmented and arranged on the basis of the optimized keyframes. Text in the frames, such as semantic words and keywords, is recognized by Optical Character Recognition (OCR), and image features are extracted from the segmented frames with a Multi-scale Residual Attention Network (MRAN). The resulting pool of features is arranged and stored in the database according to content. In the testing phase, text and video queries are given as input to the trained model, and the features of the text query and of the optimized keyframes of the video query are obtained with MRAN. The pooled query features are compared with the features stored in the database using the Cosine, Jaccard, and Euclidean similarity indices, and the resulting multi-similarity features are used to retrieve the videos relevant to the query. The experimental results show that the proposed system performs video indexing and retrieval better and more efficiently than existing video retrieval methods.
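
The multi-similarity comparison described in the abstract can be illustrated with a minimal sketch. The abstract names the three indices (Cosine, Jaccard, Euclidean) but does not say how they are combined, so everything else here is an assumption made for illustration: the equal-weight average, the conversion of Euclidean distance to a similarity via 1 / (1 + d), the use of Jaccard on OCR keyword sets, and the hypothetical names `multi_similarity`, `retrieve`, `features`, `keywords`, and `video_id`. It is not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors (assumption)
    return dot / (norm_a * norm_b)


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the intersection over size of the union of two keyword sets."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0  # convention for two empty sets (assumption)
    return len(set_a & set_b) / len(set_a | set_b)


def euclidean_similarity(a, b):
    """Map Euclidean distance d to a similarity in (0, 1] as 1 / (1 + d).

    This mapping is an assumption; the abstract only names the index.
    """
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)


def multi_similarity(query, entry, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three similarities (hypothetical fusion rule)."""
    w_cos, w_jac, w_euc = weights
    return (
        w_cos * cosine_similarity(query["features"], entry["features"])
        + w_jac * jaccard_similarity(query["keywords"], entry["keywords"])
        + w_euc * euclidean_similarity(query["features"], entry["features"])
    )


def retrieve(query, index, top_k=3):
    """Return the ids of the top_k indexed videos, best match first."""
    scored = [(multi_similarity(query, e), e["video_id"]) for e in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [video_id for _, video_id in scored[:top_k]]


# Toy index: in the paper the vectors would come from MRAN and the
# keywords from OCR; the values below are made up for illustration.
index = [
    {"video_id": "lec1", "features": [1.0, 0.0], "keywords": ["fourier", "transform"]},
    {"video_id": "lec2", "features": [0.0, 1.0], "keywords": ["graph"]},
]
query = {"features": [1.0, 0.0], "keywords": ["fourier", "transform"]}
ranking = retrieve(query, index)
print(ranking)
```

In this toy setup the query matches the first entry on all three indices, so that entry is ranked first. A real system would replace the equal weights with whatever fusion the full paper specifies.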

Data availability

The data underlying this article cannot be shared publicly for privacy reasons.

Funding

This research did not receive any specific funding.

Author information

Authors and Affiliations

Indian Institute of Technology, Kharagpur, India
A. Debnath
Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India
K. Sreenivasa Rao & Partha P. Das

Contributions

All authors have made substantial contributions to conception and design, revising the manuscript, and the final approval of the version to be published. Also, all authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding author

Correspondence to A. Debnath.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 8452 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Debnath, A., Rao, K.S. & Das, P.P. A multi-modal lecture video indexing and retrieval framework with multi-scale residual attention network and multi-similarity computation. SIViP 18, 1993–2006 (2024). https://doi.org/10.1007/s11760-023-02744-3

Received: 30 June 2023
Revised: 02 August 2023
Accepted: 10 August 2023
Published: 23 December 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s11760-023-02744-3
