
Efficient text-based query based on multi-level and deep-semantic multimedia indexing and retrieval

Multimedia Tools and Applications

Abstract

Recent technological advancements have led to a significant increase in the quantity and accessibility of videos. Lower video acquisition costs and larger memory capacities have made it possible to store large video collections in computer systems. To exploit these collections effectively, tools that facilitate access and management are essential. In this paper, we present a multimedia retrieval approach that prioritizes the user’s needs by starting with a text-based query. The approach consists of two main parts: (i) a new multi-level and deep-semantic video classification indexing method, and (ii) a query expansion mechanism and relevance feedback system that improve the results based on the user’s feedback. Our contribution is demonstrated through the implementation of the Deep-VISEN prototype and experiments on a collection of 2,700 videos and 62,838 images. The results indicate that the approach is both effective and precise.
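To make the second part of the approach more concrete, the following is a minimal sketch of query expansion followed by relevance feedback. It uses the classic Rocchio update as a stand-in and is not claimed to be the method implemented in Deep-VISEN. The related-term table, the function names `expand_query` and `rocchio`, and the weights `alpha`, `beta`, and `gamma` are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical related-term table (assumption). A real system might derive
# such terms from a semantic network or thesaurus.
RELATED_TERMS = {"car": ["vehicle", "automobile"]}


def expand_query(terms, related=RELATED_TERMS, weight=0.5):
    """Give original terms weight 1.0 and add related terms at a lower weight."""
    vector = {t: 1.0 for t in terms}
    for t in terms:
        for r in related.get(t, []):
            vector.setdefault(r, weight)
    return vector


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio update on sparse term-weight vectors (dicts).

    The query is moved toward the mean of the items the user marked relevant
    and away from the mean of the items marked non-relevant.
    """
    updated = Counter()
    for term, w in query.items():
        updated[term] += alpha * w
    for doc in relevant:
        for term, w in doc.items():
            updated[term] += beta * w / len(relevant)
    for doc in non_relevant:
        for term, w in doc.items():
            updated[term] -= gamma * w / len(non_relevant)
    # Terms whose weight is not positive are dropped from the new query.
    return {t: w for t, w in updated.items() if w > 0}


# Toy example: annotations of results the user judged after a first search.
query = expand_query(["car"])
relevant = [{"car": 1.0, "race": 1.0}, {"car": 1.0, "track": 1.0}]
non_relevant = [{"car": 1.0, "toy": 1.0}]
updated = rocchio(query, relevant, non_relevant)
```

In this toy run the terms "race" and "track" enter the query because they occur in results judged relevant, while "toy" is dropped because it occurs only in a rejected result. The weights shown are commonly used defaults and would need tuning for any real collection.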



Author information

Corresponding author

Correspondence to Mohamed Hamroun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hamroun, M., Lajmi, S., Jallouli, M. et al. Efficient text-based query based on multi-level and deep-semantic multimedia indexing and retrieval. Multimed Tools Appl (2023). https://doi.org/10.1007/s11042-023-17256-y

  • DOI: https://doi.org/10.1007/s11042-023-17256-y
