The objective of this special issue is to report on recent trends in digital and media technologies responding to the challenges of managing and accessing multimedia (images, audio, video, 3D/4D material, etc.). In a highly selective review procedure we accepted contributions describing recent work that aims at narrowing the large gap between low-level multimedia descriptors and the rich, subjective semantics of user queries and human interpretations of audiovisual data.

The articles in this special issue can be grouped into the following categories: Multimedia Analysis (Section 1), Multimedia Ontologies and Data Integration (Section 2), and Social Media and Retrieval (Section 3).

1 Multimedia analysis

The predominant mode of operation in multimedia analysis has traditionally been bottom-up, i.e., deriving high-level semantic features from low-level, easy-to-compute characteristics of the multimedia representation. In this special issue only one of the three papers in the area of multimedia analysis follows this path (emotion prediction [1], described below). The second paper uses a high-level desired representation of a film script and maps backwards and forwards to the image-processing output of film scenes [2]. Finally, the last paper in this section uses high-level semantics to guide the low-level task of keyframe detection [9], in a reversal of the traditional workflow.

Ricardo Calix and Gerald Knapp propose a methodology for actor-level emotion magnitude prediction in text and speech [1]. Their model feeds previously predicted emotion magnitudes into the prediction of the current state, which is also influenced by the current text and speech features. Their methodology is drawn from machine learning, i.e., selecting and fitting a model to the available data. They find that nonlinear Support Vector Regression models with radial basis functions work best and identify the contribution of the selected features to the prediction.
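To illustrate the general flavour of such an autoregressive regression setup, the following minimal Python sketch feeds the previously predicted magnitude back in as an extra feature of an RBF-kernel Support Vector Regression model. It is an assumption-laden illustration, not the authors' implementation; the feature layout and training scheme are placeholders.

    import numpy as np
    from sklearn.svm import SVR

    def fit_autoregressive_svr(features, magnitudes):
        # features: (n_utterances, n_feats) text/speech features in temporal order
        # magnitudes: (n_utterances,) annotated emotion magnitudes
        # Append the previous (here: gold) magnitude as an extra feature;
        # the first utterance gets a neutral value of 0.0.
        prev = np.concatenate(([0.0], magnitudes[:-1]))
        X = np.hstack([features, prev[:, None]])
        model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
        model.fit(X, magnitudes)
        return model

    def predict_sequence(model, features):
        # At prediction time the previously *predicted* magnitude is fed back in.
        prev, predictions = 0.0, []
        for x in features:
            y = model.predict(np.hstack([x, [prev]]).reshape(1, -1))[0]
            predictions.append(y)
            prev = y
        return np.array(predictions)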

In an example of high-level conceptual modelling that is brought to bear on low-level features, Benjamin Diemert, Ana Pinzari, Claude Moulin, Marie-Helene Abel and Marcus Shawky [2] define conceptual models of film scripts that represent information about realising the film shooting and serve (a) to guide the (amateur) cameraman beforehand, (b) to provide feedback during shooting based on image-processing algorithms, and (c) to support the selection of appropriate video sequences after the shooting. Diemert et al.'s work on mapping semantic scripts to the output of image-processing algorithms ultimately aims at leveraging amateur video material in a professional production.

Suet-Peng Yong, Jeremiah Deng and Martin Purvis hypothesise that visual salience can be modelled better with top-down mechanisms that incorporate object semantics than without them [9]. They predict that this approach of automated semantics extraction can be used to improve video summarisation, indexing and retrieval. In their own work they have developed a framework that models semantic contexts for keyframe extraction and monitors sequential changes in the semantic context. This results in an ability to detect and locate significant novelty in the video stream. In their concrete application, Yong et al. first segmented wildlife video frames, extracted features and matched image blocks in order to automatically construct a co-occurrence matrix of labels that ultimately represents the semantic context of a scene. They demonstrated that this approach yields better keyframe extraction than using only the low-level features.
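As a rough illustration of how label co-occurrence statistics can encode a semantic context, and of how a change in that context can flag a candidate keyframe, consider the simplified Python sketch below; the labelling step, the normalisation and the distance measure are assumptions made for illustration, not Yong et al.'s actual algorithm.

    import numpy as np

    def cooccurrence_matrix(frame_labels, vocab):
        # frame_labels: list of label sets, one per frame (from an upstream
        # segmentation/classification step); vocab: ordered list of all labels.
        idx = {lab: i for i, lab in enumerate(vocab)}
        C = np.zeros((len(vocab), len(vocab)))
        for labels in frame_labels:
            for a in labels:
                for b in labels:
                    if a != b:
                        C[idx[a], idx[b]] += 1
        return C / max(C.sum(), 1.0)  # normalise to a joint distribution

    def context_novelty(window_a, window_b, vocab):
        # Novelty = distance between the co-occurrence contexts of two
        # frame windows; a high value suggests a candidate keyframe.
        Ca = cooccurrence_matrix(window_a, vocab)
        Cb = cooccurrence_matrix(window_b, vocab)
        return float(np.abs(Ca - Cb).sum())  # simple L1 distance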

2 Multimedia ontologies and data integration

Ontologies play an important role in multimedia by enabling distributed information systems to exchange the semantics of multimedia content. In this special issue three papers discuss the role of ontologies and the integration of distributed multimedia sources. The authors of [7] survey the landscape of multimedia ontologies of the last decade, while [8] proposes and compares two ontology matching techniques for semantic image retrieval. In contrast, the authors of [3] present a mediator-based approach to integrating multiple heterogeneous data sources.

Mari Carmen Suarez-Figueroa, Ghislain Auguste Atemezing, and Oscar Corcho compare well-known ontologies in the multimedia domain [7]. To this end, they introduce a framework called FRAMECOMMON, which takes into account process-oriented dimensions, such as the methodological one, as well as outcome-oriented dimensions, such as multimedia aspects, understandability, and evaluation criteria.

Konstantin Todorov, Nicolas James, and Celine Hudelot present an approach for multimedia ontology matching with the aim of associating common-sense knowledge with multimedia concepts [8]. To this end, the authors extend textual concept matching approaches with visual representations of images and introduce a new multi-modal graph matching method. Their evaluation leads to the conclusion that the textual and visual modalities are complementary to each other.
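A simple way to picture the complementarity of the two modalities is a weighted combination of a textual and a visual similarity per concept pair, as in the illustrative Python sketch below; the cosine similarity, the weighting and the greedy matching are assumptions for illustration, not the multi-modal graph matching method of [8].

    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

    def match_score(concept_a, concept_b, alpha=0.5):
        # Each concept carries a textual embedding and a visual descriptor of
        # its example images; the two modalities are treated as complementary.
        text_sim = cosine(concept_a["text_vec"], concept_b["text_vec"])
        visual_sim = cosine(concept_a["visual_vec"], concept_b["visual_vec"])
        return alpha * text_sim + (1 - alpha) * visual_sim

    def best_matches(ontology_a, ontology_b, threshold=0.7):
        # Greedy one-directional matching: for each concept in A pick the
        # highest-scoring concept in B above a threshold.
        matches = {}
        for name_a, ca in ontology_a.items():
            score, name_b = max((match_score(ca, cb), nb) for nb, cb in ontology_b.items())
            if score >= threshold:
                matches[name_a] = name_b
        return matches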

Claudio Gennaro, Domenico Beneventano, Sonia Bergamaschi, and Fausto Rabitti describe a mediator-based approach for providing a unified multimedia access service over integrated heterogeneous multimedia sources [3]. Their approach builds on the MOMIS integration system and the MILOS multimedia data management system, which facilitate query-by-example access to multimedia collections. The argued strong points of this approach are MILOS's flexible similarity measures and the MOMIS mediator's exploitation of the ranks of local answers.
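The general mediator pattern of combining locally ranked answers into one global ranking can be sketched as follows in Python; the reciprocal-rank fusion used here is only a stand-in for illustration, not the actual ranking logic of MOMIS and MILOS.

    def fuse_local_rankings(local_rankings, k=60):
        # local_rankings: list of ranked lists of item ids, one per local source,
        # all answering the same query-by-example request.
        scores = {}
        for ranking in local_rankings:
            for rank, item in enumerate(ranking, start=1):
                scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    # Example: two sources answering the same query-by-example request.
    print(fuse_local_rankings([["img7", "img3", "img9"], ["img3", "img1", "img7"]]))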

3 Social media and retrieval

The remaining group of papers in this special issue concentrates on social media and retrieval. While [5] presents an image tag ranking system called i-TagRanker, [4] describes an approach for image retrieval using spatial relations. Further, [6] focuses on multiple-perspective interactive search, defining a paradigm for exploratory search and information retrieval on the web.

Jin-Woo Jeong, Hyun-Ki Hong, and Dong-Ho Lee propose in [5] a novel image tag ranking system called i-TagRanker, which exploits the semantic relationships between tags to re-order the tags according to their relevance to an image. The proposed system works in two phases, namely the tag propagation phase and the tag ranking phase. In the tag propagation phase, the most relevant tags from similar images are collected and propagated to an untagged image. In the tag ranking phase, tags are ranked according to their semantic relevance to the image. The system has been evaluated on a Flickr photo collection with over 30,000 images.
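A schematic Python sketch of this two-phase idea is given below; the image-similarity and semantic-relevance functions are placeholders, and the sketch is not the actual i-TagRanker implementation.

    def propagate_tags(untagged_image, tagged_images, image_similarity, top_k=5):
        # Phase 1: collect tags from the most visually similar tagged images.
        neighbours = sorted(tagged_images,
                            key=lambda img: image_similarity(untagged_image, img),
                            reverse=True)[:top_k]
        tags = set()
        for img in neighbours:
            tags.update(img["tags"])
        return tags

    def rank_tags(image, tags, semantic_relevance):
        # Phase 2: order the propagated tags by their semantic relevance to the image.
        return sorted(tags, key=lambda tag: semantic_relevance(tag, image), reverse=True)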

Carlos Arturo Hernandez-Gracidas, Luis E. Sucar, and Manuel Montes-y-Gomez report in [4] on their work on improving image retrieval by using spatial relations. They model the spatial relations with conceptual graphs. Additionally, they propose a term-weighting scheme and use more than one sample image for retrieval, applying several late-fusion techniques. Their methods were evaluated on a rich and complex image dataset, based on the 39 topics developed for the ImageCLEF 2008 photo retrieval task.
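For illustration, one common late-fusion scheme is to score each candidate image against every sample image and average the scores, as in the hedged Python sketch below; the actual fusion techniques and spatial-relation scoring of [4] may well differ.

    def late_fusion_retrieval(sample_images, collection, score):
        # score(query_img, candidate) returns a relevance score, e.g. based on
        # matching spatial relations encoded as conceptual graphs (assumed here).
        fused = {}
        for candidate in collection:
            per_query = [score(q, candidate) for q in sample_images]
            fused[candidate["id"]] = sum(per_query) / len(per_query)  # average fusion
        return sorted(fused, key=fused.get, reverse=True)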

Rahul Singh, Ya-Wen Hsu, and Naureen Moon focus in their article [6] on multiple-perspective interactive search, defining a paradigm for exploratory search and information retrieval on the web. Their system allows simultaneous and semantically correlated presentation of query results from different semantic perspectives. Users can explore the results either through a specific perspective or through a combination of perspectives via highly intuitive yet powerful interaction operators. In the proposed paradigm, hits obtained from executing a query are first analysed to determine latent content-based correlations between the pages. Next, the pages are analysed to extract different types of perceptual and informational cues. This information is used to organise and present the results through an interactive and reflective user interface which supports both exploration and search.
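The organisation of hits by several semantic perspectives can be pictured with the following minimal Python sketch; the perspective extractors (e.g. topic, location, media type) are assumed placeholders rather than the analysis actually performed in [6].

    def organise_by_perspectives(pages, perspectives):
        # perspectives: dict mapping a perspective name to a function that
        # extracts a cue (e.g. topic, location, media type) from a page.
        views = {name: {} for name in perspectives}
        for page in pages:
            for name, extract in perspectives.items():
                cue = extract(page)
                views[name].setdefault(cue, []).append(page)
        return views  # each view could drive one panel of an interactive UI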