1 Introduction

In the last decade, substantial progress has been made in content-based analysis and multimedia streaming, facilitating the development of large-scale multimedia information systems. Together with recent progress on the Semantic Web, it is now possible to build a new generation of multimedia applications that enable large-scale semantic representation, analysis, and delivery of multimedia data from heterogeneous data sources. However, mature multimedia database systems capable of processing semantics-rich, large-volume multimedia data are still a long way off. The challenge is even greater when such systems must meet stringent functional and non-functional (e.g., QoS) requirements.

The goal of this special issue is to bring the Semantic Web community and the multimedia processing & computing community together and to provide a forum for multidisciplinary research opportunities, with a focus on how to apply semantic technologies to the acquisition, generation, transmission, storage, processing, and retrieval of multimedia information. Discussions of future challenges in multimedia information manipulation, as well as practical solutions for the design and implementation of multimedia database software systems, are also encouraged. Topics of interest include, but are not limited to, practical areas that span both semantic technologies and multimedia processing & computing.

2 The review process

This special issue solicited high-quality research papers in the following areas of interest:

  • Automatic generation of multimedia presentations

  • Semantic multimedia metadata extraction

  • Annotation tools and methods for multimedia semantics

  • Media ontology generation/learning/reasoning

  • Content-based multimedia analysis

  • Multimedia indexing, searching, and retrieving

  • Multimedia streaming

  • Semantic-based QoS control and scheduling

  • Semantic-based Internet data streaming and delivery

  • Multimedia standards (e.g., MPEG-7 and XMP) and Semantic Web

  • Semantics enabled multimedia applications (including annotation, browsing, storage, retrieval, and visualization)

  • Semantics enabled networking and middleware for multimedia applications.

Submissions related to any area of semantic computing for multimedia systems were welcomed. We received a very diverse pool of thirty-nine (39) submissions, whose contact authors were from China (5), United States (5), France (3), Greece (3), South Korea (3), Australia (2), Germany (2), Spain (2), Tunisia (2), Turkey (2), United Kingdom (2), Taiwan (1), Belgium (1), Brazil (1), India (1), Ireland (1), Malaysia (1), Portugal (1), and the Netherlands (1). Due to the interdisciplinary nature of the Call for Papers, the submissions represented a wide range of approaches and scenarios.

Most of the papers were reviewed by at least three experts in the topical area. We carefully selected reviewers with a track record of publications and review experience in the particular sub-area. The guest editors sent out more than 100 review invitations, and most of the invited reviewers finished their reviews on time. We are very grateful to the reviewers for their excellent work. In addition, the guest editors examined the reviews for each paper, and in-depth discussions were held among the organizers and reviewers to reach a consensus on each paper. In the end, sixteen (16) papers were recommended for acceptance. The contact authors of the accepted papers are from Greece (3), United States (3), Australia (1), Belgium (1), France (1), Germany (1), India (1), South Korea (1), Spain (1), the Netherlands (1), Tunisia (1), and United Kingdom (1).

3 Accepted papers

The sixteen (16) papers selected for publication span a variety of aspects of semantic computing for multimedia systems, including semantic multimedia content analysis (e.g., annotation, classification, and segmentation), semantic multimedia systems (e.g., multimedia streaming), and intriguing applications of semantic multimedia computing (e.g., multimedia computing in medical applications). The key ideas and contributions of these papers are summarized below.

3.1 Semantic multimedia content analysis

In the paper entitled “GAT: A Graphical Annotation Tool for Semantic Regions,” the authors Xavier Giro, Neus Camps, and Ferran Marques present a Graphical Annotation Tool based on a region-based hierarchical representation of images. The implementation uses MPEG-7/XML input and output data to allow interoperability with any type of Partition Tree. The annotation tool is publicly available as open-source software. The paper “A Robust Framework for Joint Background/Foreground Segmentation in Complex Video Scenes Filmed with Freely Moving Camera” by Walid Barhoumi, Slim Amri, and Ezzeddine Zagrouba explores a robust, general, region-based framework for discriminating between background and foreground objects within a complex video sequence. The proposed framework can identify novel and non-novel objects jointly while exploiting the semantic information offered by regions. In the paper entitled “Exploit Camera Metadata for Enhancing Interesting Region Detection and Photo Retrieval,” Zhong Li and Jianping Fan investigate a machine learning-based interesting region detection algorithm for consumer photos. Based on this algorithm, a computer can infer what interested the photographer and what the core content of a photo is. The paper “Content and Task-based View Selection from Multiple Video Streams” by Fahad Daniyal and Andrea Cavallaro introduces techniques for content-aware multi-camera selection that generate view-dependent ranking information using a multivariate Gaussian distribution. The best view is selected by a Dynamic Bayesian Network (DBN), which utilizes camera network information. In “personalTV—A TV Recommendation System based on Content-based Filtering,” Günther Hölbling, Michael Pleschgatternig, and Harald Kosch introduce a TV recommendation system based on detailed program information and the history of the user's interaction with the system.
In the paper entitled “MyOwnLife: Incremental and Hierarchical Classification of a Personal Image Collection on Mobile Devices,” Antoine Pigeau designs and implements a personal image classification system for mobile devices. In the paper entitled “A Service for Validating MPEG-7 Descriptions w.r.t. to Formal Profile Definitions,” the authors Raphael Troncy, Werner Bailer, Michael Hausenblas, and Martin Höffernig tackle the lack of formally grounded semantics for MPEG-7 elements. The proposed approach expresses the semantics explicitly by formalizing the constraints of various profiles using ontologies, logical rules, and ad-hoc programming. The authors also implemented the proposed approach as a full semantic validation web service that is publicly available over the Internet. Stamatia Dasiopoulou, Vassilis Tzouvaras, Ioannis Kompatsiaris, and Michael G. Strintzis present a systematic overview of the state of the art in MPEG-7 based ontologies in their paper entitled “Enquiring MPEG-7 based Ontologies.” They also highlight issues pertaining to the intended context of usage, obstacles hindering interoperability, and possible directions towards harmonizing these ontologies.

3.2 Semantic multimedia systems

In the paper entitled “NinSuna: a Fully Integrated Platform for Format-independent Multimedia Content Adaptation and Delivery using Semantic Web Technologies,” the authors Davy Van Deursen, Wim Van Lancker, Wesley De Neve, Tom Paridaens, Erik Mannens, and Rik Van de Walle present the design and functioning of a fully integrated platform for multimedia adaptation and delivery. Due to the use of format-agnostic adaptation engines (i.e., independent of the underlying coding format) and format-agnostic packaging engines (i.e., independent of the underlying delivery format), this platform is able to deal efficiently with the heterogeneity of the present-day multimedia ecosystem. In “A Multimedia Data Streams Model for Content-Based Information Retrieval,” Shi-Kuo Chang, Shenoda Guirguis, Rohit Kulkarni, and Lei Zhao provide a multimedia data streams model that furnishes a formal framework for designing a Multimedia Data Streams (MMDS) schema that performs efficiently with regard to content-based retrieval. Nikolaos Konstantinou, Emmanuel Solidakis, Anastasios Zafeiropoulos, Panagiotis Stathopoulos, and Nikolas Mitrou investigate the problem of the real-time integration and processing of multimedia metadata collected by a distributed sensor network in their paper entitled “A Context-aware Middleware for Real-Time Semantic Enrichment of Distributed Multimedia Metadata.” They propose an approach for the real-time, rule-based semantic enrichment of lower-level context features with higher-level semantics. In the paper entitled “Incorporating Packet Semantics in Scheduling of Real-time Multimedia Streaming,” Sungwoo Hong and Youjip Won develop a packet scheduling algorithm that properly incorporates the semantics of a packet. This research aims to improve the user-perceivable QoS instead of focusing on reducing packet loss, delay, or burstiness.

3.3 Intriguing applications of semantic multimedia computing

In the paper entitled “Content based Radiology Image Retrieval using a Fuzzy Rule based Scalable Composite Descriptor,” the authors Savvas A. Chatzichristofis and Yiannis S. Boutalis address the problem of creating an effective method for the indexing and retrieval of radiology images. One of the most important features of the proposed Fuzzy Rule Based Compact Composite Descriptor is that its size adapts to the storage capabilities of the application that uses it. Mathias Lux, Oge Marques, Klaus Schöffmann, Laszlo Böszörmenyi, and Georg Lajtai present a video summarization tool and demonstrate how it can be successfully used in the domain of arthroscopic videos, which are video streams generated by a small camera used in arthroscopic surgery. They discuss how this tool can leverage domain-specific aspects of arthroscopic videos without losing its ability to work on general-purpose videos. In “A MPEG-7 Authoring and Retrieval System for Dance Videos,” Rajkumar Kannan introduces a system for semi-automatic authoring of and access to dance archives. This system offers an MPEG-7 based semi-automatic authoring tool and a search engine based on spatial, temporal, and spatio-temporal features of the dancers using the tree-embedding technique. In the paper entitled “Ranking Canonical Views for Tourist Attractions,” Lin Yang, John K. Johnstone, and Chengcui Zhang leverage online photo collections to automatically rank the canonical views of a tourist attraction, that is, the subset of photographs that best summarizes a photo collection.