Abstract
Video annotation tools are often compared in the literature; however, most reviews mix unstructured and semi-structured annotation tools with the very few that produce structured annotations. This paper is a comprehensive review of video annotation tools that generate structured data output for video clips, regions of interest, frames, and media fragments, with a focus on Linked Data support. The tools are compared in terms of supported input and output data formats, expressivity, annotation specificity, spatial and temporal fragmentation, the concept mapping sources used for Linked Open Data (LOD) interlinking, provenance data support, and standards alignment. Practicality and usability aspects of the user interface of these tools are highlighted. Moreover, this review distinguishes extensively researched yet discontinued semantic video annotation software from promising state-of-the-art tools that show new directions in this increasingly important field.
Notes
There are also cross-media annotation tools, such as IMAS and YUMA, which provide annotations for multiple media types (see Section 3).
It is common practice to abbreviate terms using the namespace mechanism, which substitutes a short prefix for a long (often symbolic) namespace URI, so that schema: abbreviates http://schema.org/ and foaf: abbreviates http://xmlns.com/foaf/0.1/. For example, foaf:depicts abbreviates http://xmlns.com/foaf/0.1/depicts.
In the example, concept names are written in PascalCase, role names in camelCase, and individual names in ALL CAPS, as per description logic best practices.
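The namespace abbreviation described in the notes above can be illustrated with a minimal sketch. The two prefix bindings are those given in the note; the PREFIXES table and the expand helper are hypothetical names introduced here for illustration only and are not part of any annotation tool or RDF library.

```python
# Prefix-to-namespace bindings; both entries are taken from the note above.
# PREFIXES and expand() are illustrative names, not a library API.
PREFIXES = {
    "schema": "http://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(abbreviated, prefixes=PREFIXES):
    """Expand an abbreviated term such as 'foaf:depicts' into its full URI."""
    prefix, sep, local_name = abbreviated.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {abbreviated!r}")
    # The full URI is the namespace URI followed by the local name.
    return prefixes[prefix] + local_name


print(expand("foaf:depicts"))  # http://xmlns.com/foaf/0.1/depicts
```

A term with an unregistered prefix raises an error rather than being passed through unchanged, which keeps unresolved abbreviations from being mistaken for full URIs.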
Cite this article
Sikos, L.F. RDF-powered semantic video annotation tools with concept mapping to Linked Data for next-generation video indexing: a comprehensive review. Multimed Tools Appl 76, 14437–14460 (2017). https://doi.org/10.1007/s11042-016-3705-7
DOI: https://doi.org/10.1007/s11042-016-3705-7