Abstract
With advances in technology, the production of video and its storage on the Internet have grown rapidly, so websites now host a huge number of videos from various sources. Retrieving relevant lecture videos from such multimedia collections is therefore difficult. This paper proposes a deep learning-based method for indexing and retrieving lecture videos that considers multiple similarity measures over the video features. Lecture videos are obtained from a standard dataset for training. Optimal keyframes are selected from these videos using the Adaptive Anti-Coronavirus Optimization Algorithm. The video content is then segmented and arranged on the basis of the optimized keyframes. Textual content, such as semantic words and keywords, is recognized by means of Optical Character Recognition (OCR), and image features are extracted from the segmented frames with a Multi-scale Residual Attention Network (MRAN). The resulting pool of features is organized and stored in the database according to content. In the testing phase, text and video queries are given as input to the trained model, and the features of the text query and of the optimized keyframes of the video query are obtained with MRAN. The pooled query features are compared with the features stored in the database using the Cosine, Jaccard, and Euclidean similarity indices. The resulting multi-similarity features are used to retrieve the videos relevant to the given query. The experimental results show that the proposed system for video indexing and retrieval performs better and is more efficient than existing video retrieval methods.
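The multi-similarity comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the three indices are combined, so the equal-weight average, the mapping of Euclidean distance to a similarity via 1/(1 + d), the use of Jaccard on keyword sets, and all function and variable names here are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors (assumption)
    return dot / (norm_a * norm_b)

def jaccard_similarity(words_a, words_b):
    """Size of the intersection over size of the union of two keyword sets."""
    set_a, set_b = set(words_a), set(words_b)
    if not set_a and not set_b:
        return 0.0  # convention for two empty sets (assumption)
    return len(set_a & set_b) / len(set_a | set_b)

def euclidean_similarity(a, b):
    """Map Euclidean distance d to (0, 1] via 1 / (1 + d) (assumed mapping)."""
    return 1.0 / (1.0 + math.dist(a, b))

def multi_similarity(query_vec, query_words, item_vec, item_words,
                     weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of the three indices (assumed fusion rule)."""
    scores = (
        cosine_similarity(query_vec, item_vec),
        jaccard_similarity(query_words, item_words),
        euclidean_similarity(query_vec, item_vec),
    )
    return sum(w * s for w, s in zip(weights, scores))

def retrieve(query_vec, query_words, database, top_k=3):
    """Rank stored videos by fused similarity and return the top_k ids.

    `database` maps a video id to a (feature_vector, keyword_set) pair.
    """
    scored = [
        (multi_similarity(query_vec, query_words, vec, words), video_id)
        for video_id, (vec, words) in database.items()
    ]
    scored.sort(reverse=True)
    return [video_id for _, video_id in scored[:top_k]]

if __name__ == "__main__":
    # Toy index with hypothetical video ids, features, and keywords.
    index = {
        "lec_sorting": ([1.0, 0.0], {"sorting", "algorithm"}),
        "lec_graphs": ([0.0, 1.0], {"graph", "traversal"}),
    }
    print(retrieve([1.0, 0.0], {"sorting"}, index, top_k=1))
```

In this sketch the feature vectors stand in for the pooled MRAN features and the keyword sets stand in for the OCR output. A real system would likely tune the weights or learn the fusion rather than average the indices equally.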
Data availability
The data underlying this article cannot be shared publicly due to privacy concerns.
Funding
This research did not receive any specific funding.
Contributions
All authors have made substantial contributions to conception and design, revising the manuscript, and the final approval of the version to be published. Also, all authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Debnath, A., Rao, K.S. & Das, P.P. A multi-modal lecture video indexing and retrieval framework with multi-scale residual attention network and multi-similarity computation. SIViP 18, 1993–2006 (2024). https://doi.org/10.1007/s11760-023-02744-3
DOI: https://doi.org/10.1007/s11760-023-02744-3