Multimodal Generic Framework for Multimedia Documents Adaptation

Today, people are increasingly able to create and share documents (generally multimedia oriented) via the Internet. These multimedia documents can be accessed at any time and anywhere (city, home, etc.) on a wide variety of devices, such as laptops, tablets and smartphones. The heterogeneity of devices and user preferences raises a serious issue for multimedia content adaptation. Our research focuses on multimedia document adaptation, with a strong emphasis on interaction with users and the exploration of multimodality. We propose a multimodal framework for adapting multimedia documents based on a distributed implementation of W3C’s Multimodal Architecture and Interfaces applied to ubiquitous computing. The core of the proposed architecture is a smart interaction manager that accepts context-related information from sensors in the environment as well as from other sources, including information available on the web and multimodal user inputs. The interaction manager integrates and reasons over this information to predict the user’s situation and service use. A key to realizing this framework is the use of an ontology that undergirds communication and representation, and the use of the cloud to ensure service continuity on heterogeneous mobile devices. The smart city is assumed as the reference scenario.


I. Introduction
A smart city is an extension of a city that allows the intelligent use of resources to improve the quality of life in urban areas. Various aspects are considered to be the focus of the transformation towards smart cities, e.g. smart home, smart mobility, smart parking, smart lighting, and many more. Computing is moving toward smart, ubiquitous environments in which heterogeneous devices (smartphone, tablet, laptop, smart TV, etc.), applications and services are all expected to integrate and cooperate in support of human objectives: anticipating needs, negotiating for services, acting on our behalf, and delivering services anywhere (home, city, etc.) and at any time.
Currently, people are able to create and share documents (generally multimedia oriented) via the Internet according to their context. Context has been defined in [1] as "the knowledge that can be used to characterize the situation of any entity that is relevant for the (pervasive) system under consideration". Many context information models are limited in terms of interoperability. Thus, ontology-based context modeling is needed for many context-aware applications [2].
Multimodal interactions and multimodality refer to the process in which different devices and people are able to interact aurally, visually, by touch or by gesture. One of the main purposes of multimodality is to improve user interactions with devices such as smartphones, laptops, tablets, etc. [3] [4]. User interfaces should allow users to interact with content or a service through the most appropriate mode or through multiple modes, taking into account user preferences and context [5].
In this paper, we introduce a multimodal generic framework to support context-aware computing in smart fields (home, city, tourism, etc.). Our proposed approach uses a semantic multimodal user context to create a more natural communication system, adaptation decision making, and the cloud as a way to ensure service continuity, since a wide variety of devices have to coexist in a very heterogeneous environment.
The rest of the paper is organized as follows. In Section II, we survey the related work in the literature. In Section III, we introduce and discuss our contribution. Possible application scenarios are discussed in Section IV. The results and discussion are provided in Section V. Finally, we conclude the paper and present future work in Section VI.

II. Related Work
Making cities smarter has been spreading all over the world over the last five years. The term "smart city" was defined by [6] and [7] as a strategy to bring the modern urban production factors together in a common platform and to emphasize the importance of information and communication technologies over the last twenty years in increasing the competitive profile of a city.
In [8] García et al. described the difference between Smart Objects and Not-Smart Objects and their relation with the Internet of Things (IoT). In their review, they found that the combination of Smart Objects and the IoT can offer many advantages and improve people's lives, because it can interconnect the different objects and let them communicate to create more complex applications. Solanki et al. introduced in [9] an approach for irrigation and highway lamps using the IoT in order to save energy and resources such as water and electricity. The proposed approach is based on an advanced irrigation system for parks and roadside plantations, which groups together various peripherals using the IoT, and an advanced highway and high mast lighting system, which provides automatic control of the highway and high mast lights.
Smart city applications must be able to adapt and react dynamically to a heterogeneous and dynamic environment. They must adapt dynamically to the utilization context, which represents the relation between the user and the application (the user's preferences, physical capabilities, intentions, etc.), and to the execution environment, which concerns all the information related to the system, such as user interaction, sensors, networks, etc. [10]. Over the last decade, many approaches have been proposed to model device characteristics and user contexts that are further exploited by multimedia document adaptation processes. Some of these approaches provide an exclusively descriptive view of context information (e.g., CC/PP (Composite Capability / Preference Profiles) [11], UAProf (User Agent Profile) [12]), while others propose enhancements with some constraint expressions (e.g., CSCP (Comprehensive Structured Context Profiles) [13], Context-ADDICT [14], SGP (Semantic Generic Profile) [15]). Furthermore, these approaches lack dynamic context modeling dedicated to describing situation-dependent user information and preferences and to enabling multimedia document adaptation. WURFL (Wireless Universal Resource FiLe) [16] is an XML file describing the resources of mobile terminals. It contains information on the capabilities and features of mobile devices, and the project is intended for the adaptation of Web pages on mobile terminals [17]. In [18] a generic profile model specified in UML was proposed, which makes it possible to describe the structure and the semantics of any type of information or user profile. This contribution defines semantic links between elements and integrates the weighting of the elements. The resulting semantic graph is described with a description logic oriented approach [19] using the RDF/RDFS/OWL formalisms. However, these models do not make it possible to express actions (e.g. to disable sound).
An adaptive context-aware application for assisting tourists in a smart environment was proposed in [20]. This solution is able to collect unstructured data from heterogeneous sources and to develop recommendations for the user, in order to support a tourist inside a town. In [21] an active learning support system for context-aware ubiquitous learning environments was developed, using contextual information including the location, the current capacity of the learning object, the time available, etc. However, these approaches do not ensure service continuity on mobile devices. In a mobile environment, services can themselves be lost or fail to function correctly, for example because certain resources disappear.
A multidimensional context-aware social network architecture was proposed in [22] to develop a mobile ecosystem that enables context awareness in the development and utilization of mobile crowdsensing applications. Maarala et al. [23] proposed a system architecture for providing semantic data and a reasoning process with different Semantic Web technologies and methods in a context-aware IoT environment. However, these approaches do not allow a user to express his/her requirements using different interaction modes (e.g. speech).
In [24] Khari et al. proposed a model for the secure transmission of data in smart cities. They implemented their model using digital signatures for authentication and triple DES for data transmission. Khari et al. proposed another approach in [25], which focused on security classification based on a cloud layered framework.
Adaptation, and real-time adaptation in particular, raises complex scientific issues as well as new challenges for the execution and development of applications. Examples include collecting context information in a dynamic and heterogeneous environment, exploiting heterogeneous infrastructures while benefiting from the opportunities offered by these environments, and creating a more natural system that takes into account cultural issues, the working environment and physical capabilities.
Our work differs from all of the above in several respects. Our main objective is to design an innovative architecture enabling: (1) a semantic representation and manipulation of multimodal input information, sensed data and services; (2) dynamic, relevant adaptation services; and (3) cloud services as a way to ensure service continuity on mobile devices.

III. Multimodal Generic Context-Based Framework for Multimedia Documents Adaptation
Intuitively, people navigate information that generally contains multimedia objects (text, image, audio and video) and interact with multiple devices every day, at any time and anywhere. These physical interactions fundamentally change how we communicate with devices, because they influence how we process information and thus how we obtain knowledge.
Multimodal interfaces are the scope of our work; they allow users to process and display multimedia documents more effectively. In order to adapt these documents, we propose in this section a system that uses multimodal user input along with the user context to make it easier for users to display multimedia content according to their preferences.

A. System Architecture
We first describe the architecture, which is based on a distributed implementation of W3C's Multimodal Architecture and Interfaces [26] applied to ubiquitous computing. The proposed architecture is shown in Fig. 1.

Physical Plane
The first step in enabling smart services is to collect contextual information about the environment, location, device characteristics, user input, etc. For example, sensors can be used to continuously monitor a person's physiological activities and actions, such as health status and motion patterns. The Resource Manager, which is able to interact with the different components, coordinates their activities and collects data for the semantic layer. It provides uniform management of heterogeneous sensors.
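As a rough illustration of what "uniform management of heterogeneous sensors" could look like, the sketch below registers each sensor behind a single read callback and collects one snapshot for the semantic layer. The class name `ResourceManager`, its methods and the sensor names are assumptions made for this example; the paper does not specify the Resource Manager's interface.

```python
from typing import Callable, Dict


class ResourceManager:
    """Hypothetical sketch: exposes heterogeneous sensors through one interface."""

    def __init__(self) -> None:
        # Maps a sensor name to a zero-argument function returning its reading.
        self._sensors: Dict[str, Callable[[], object]] = {}

    def register(self, name: str, read: Callable[[], object]) -> None:
        """Register a sensor, whatever its underlying technology, by name."""
        self._sensors[name] = read

    def collect(self) -> Dict[str, object]:
        """Read every registered sensor once and return a snapshot."""
        snapshot: Dict[str, object] = {}
        for name, read in self._sensors.items():
            try:
                snapshot[name] = read()
            except Exception:
                # A sensor that fails or disappears yields None instead of
                # aborting the whole collection step.
                snapshot[name] = None
        return snapshot


# Illustrative usage with made-up sensors and values.
manager = ResourceManager()
manager.register("temperature_celsius", lambda: 21.5)
manager.register("heart_rate_bpm", lambda: 72)
snapshot = manager.collect()
```

Wrapping each sensor in a plain callable keeps the manager independent of any particular sensor API, which is one simple way to cope with heterogeneity.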

Cloud Plane
The emergence of mobile devices has made millions of people turn to cloud-based services. The reason behind this shift is the need for seamless syncing of contacts, emails, calendar events, and all kinds of data within a heterogeneous environment comprising different devices, operating systems and applications.

Service Plane
The service plane is composed of three components:
• The identification service identifies the user and gives them full access to the platform.
• The multimodal services allow the recognition of the input modalities and present their semantics [27], [28]. Many multimodal services are integrated in devices, such as Handwriting Recognition (HWR), Speech Recognition (SR) and EmotionML for user emotion recognition.
• The adaptation service allows the transformation of a multimedia object into another multimedia object satisfying a given profile.

Semantic Plane
The semantic plane is composed of three main components:
• The context ontology models the user context, such as personal characteristics (language, physical capabilities, etc.), preferences, the capacity of the terminal (screen size, battery, etc.), the characteristics of the network (type, bandwidth, etc.), etc.
• The service ontology models the services used in our system, such as the identification service, the multimodal service, etc.
• The Interaction Manager (IM) is the core of the architecture presented in this paper. Its role is to manage the interactions between the multimodal interface and the user. To access the platform functionalities, users must identify themselves; for that, the IM sends a request to the identification service. Once the user is identified, the IM receives the multimodal input fragments from the user's device and processes them with the multimodal services to obtain a meaningful input. The IM then analyzes the user request and retrieves data from the context ontology. From all the information gathered in the analysis step, the IM generates a set of actions, obtains the adapted document from the adaptation services and displays it to the user. If there is no action, the process starts again and waits for another request. When the battery level is low, the data flow can be stored in the cloud as a way to ensure service continuity on mobile devices.
The algorithm (see Fig. 2) begins by testing whether the user is already logged into the system using the identified() method. If the user is not yet identified, an identification service is called using the method identificationService(). Once the user is logged in, the system allows the user to interact more naturally and obtains their requests through the method multimodalService.getRequest(). The analysis is performed by analyse(), which receives as parameters the user request, the multimedia document and the context ontology (described in the next section). Depending on the result of the analysis, an adaptation service is then called through the method adaptationService().
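The control flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of Fig. 2 itself: the Python names mirror the methods mentioned in the text (identified(), identificationService(), multimodalService.getRequest(), analyse(), adaptationService()), while the callback signatures, the `store_in_cloud` hook, the action strings and the 20% battery threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

LOW_BATTERY_THRESHOLD = 20  # percent; hypothetical value, not from the paper


@dataclass
class InteractionManager:
    """Illustrative control flow only; every collaborator is a stand-in stub."""
    identification_service: Callable[[str], bool]
    get_request: Callable[[], Optional[str]]        # multimodalService.getRequest()
    analyse: Callable[[str, str, dict], List[str]]  # (request, document, context)
    adaptation_service: Callable[[str, List[str]], str]
    store_in_cloud: Optional[Callable[[str], None]] = None
    identified_users: Set[str] = field(default_factory=set)

    def identified(self, user: str) -> bool:
        return user in self.identified_users

    def handle(self, user: str, document: str, context: dict) -> Optional[str]:
        # 1. Identify the user if this has not been done yet.
        if not self.identified(user):
            if not self.identification_service(user):
                return None
            self.identified_users.add(user)
        # 2. Obtain the (already recognized) multimodal request.
        request = self.get_request()
        if request is None:
            return None
        # 3. Analyse request, document and context to derive actions.
        actions = self.analyse(request, document, context)
        if not actions:
            return None  # no action: the caller waits for another request
        # 4. Ask the adaptation service for the adapted document.
        adapted = self.adaptation_service(document, actions)
        # 5. On low battery, hand the result to the cloud for continuity.
        low_battery = context.get("battery_level", 100) < LOW_BATTERY_THRESHOLD
        if self.store_in_cloud is not None and low_battery:
            self.store_in_cloud(adapted)
        return adapted


# Illustrative usage with stub services.
stored: List[str] = []
im = InteractionManager(
    identification_service=lambda user: user == "alice",
    get_request=lambda: "translate to French",
    analyse=lambda request, document, context: (
        ["translate:fr"] if "translate" in request else []
    ),
    adaptation_service=lambda document, actions: f"{document} [{', '.join(actions)}]",
    store_in_cloud=stored.append,
)
result = im.handle("alice", "menu.html", {"battery_level": 15})
```

Passing the services in as callables keeps the sketch independent of how identification, recognition and adaptation are actually deployed (on the device or in the cloud).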

B. The Proposed Context-Aware Ontology for Multimedia Documents Adaptation
In order to realize the multimodal generic architecture for multimedia documents adaptation, we need to develop an ontology for enabling knowledge sharing and reasoning.

The Context Ontology Representation
The first step in building the context ontology is to collect data from different sources, such as ubiquitous sensors, social networks and mobile devices. The raw resource data is summarized in Table I.
We develop an ontology (Fig. 3) for the semantic modeling of context information using the OWL language. OWL is one of the emerging Semantic Web technologies endorsed by the W3C for building ontologies [29], [30]. Table II shows some classes of the context ontology and their contextual data.

Table I. Raw Resource

Sensor data
• Environment data (temperature, humidity, pollution)
• Health data (heart rate, blood pressure, stress level, physical capability)

Multimodal data

Table II. Contextual Data

Environment
• Environment data (temperature, humidity, pollution)
• Health data (heart rate, blood pressure, stress level, physical capability)

Multimedia data
• Text/Audio/Video/Image

Resource Context
• Hardware (battery level, device screen size, etc.)
• Software

The contextual data is updated automatically according to changes in the environment and the situation.
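To make the structure of Table II more concrete, the following sketch models the three groups of contextual data as plain Python data classes and shows one possible way of updating them when new readings arrive. It is only a picture of the data, not the OWL ontology itself; all class names, field names and the `apply_readings` helper are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional, Tuple

MEDIA_TYPES = {"text", "audio", "video", "image"}


@dataclass
class EnvironmentContext:
    # Environment data
    temperature: Optional[float] = None
    humidity: Optional[float] = None
    pollution: Optional[float] = None
    # Health data
    heart_rate: Optional[int] = None
    blood_pressure: Optional[str] = None
    stress_level: Optional[str] = None
    physical_capability: Optional[str] = None


@dataclass
class MultimediaData:
    media_type: str

    def __post_init__(self) -> None:
        if self.media_type not in MEDIA_TYPES:
            raise ValueError(f"unknown media type: {self.media_type}")


@dataclass
class ResourceContext:
    battery_level: Optional[int] = None            # hardware
    screen_size: Optional[Tuple[int, int]] = None  # hardware, in pixels
    software: List[str] = field(default_factory=list)


@dataclass
class Context:
    environment: EnvironmentContext
    multimedia: MultimediaData
    resource: ResourceContext


def apply_readings(context: Context, readings: dict) -> List[str]:
    """Copy readings onto matching fields; return the names that changed."""
    changed: List[str] = []
    for section in (context.environment, context.resource):
        known = {f.name for f in fields(section)}
        for key, value in readings.items():
            if key in known and getattr(section, key) != value:
                setattr(section, key, value)
                changed.append(key)
    return changed


# Illustrative usage with made-up values.
ctx = Context(EnvironmentContext(temperature=21.0),
              MultimediaData("text"),
              ResourceContext(battery_level=80))
changed = apply_readings(ctx, {"temperature": 30.0, "battery_level": 15})
```

Returning the list of changed fields gives a reasoning step a simple trigger for deciding when the context needs to be re-evaluated.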
The User class contains information about the user. It is divided into two subclasses (see Fig. 4):
• The Non_Security class, which represents general information such as the user name, age, gender and physical capability.
• The Security class, which relates to user identification.
Next, we introduce the service ontology, which is composed of different types of services.

The Service Ontology Representation
• The adaptation service generates the transformation process of a multimedia object into another multimedia object satisfying a given decision.
-Transforming: it allows the content to be changed without changing the media type and format, e.g., text summarization, language translation, etc.
• The identification service allows the user to be identified and gives them full access to the system.
• The multimodal service enables the recognition of user multimodal inputs.
• The host service is responsible for migrating contextual data and services to the cloud for further processing when the computing power of mobile devices is limited. The Local class contains information about fixed devices or mobile devices. The Cloud class contains information about the cloud server that can be used for hosting contextual data and services [31].

Case Study on Context Reasoning
In this section, we define rules that can, for example, be represented using the generic rule language of the Jena reasoner, which we intend to use in our prototype.
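As an indication of the kind of rule meant here, the sketch below expresses three condition/action rules in plain Python. It is a stand-in for illustration only: it is not Jena rule syntax, and the rule names, context keys, action strings and the 20% battery threshold are all hypothetical, loosely inspired by situations mentioned elsewhere in this paper (a blind user receiving text, a language mismatch, a low battery level).

```python
from typing import Callable, List, NamedTuple


class Rule(NamedTuple):
    name: str
    condition: Callable[[dict], bool]  # evaluated against the context
    action: str                        # symbolic adaptation action


# Hypothetical rules; the context keys are illustrative assumptions.
RULES = [
    Rule(
        "blind_user_text_to_audio",
        lambda c: c.get("physical_capability") == "blind"
        and c.get("media_type") == "text",
        "transmode_text_to_audio",
    ),
    Rule(
        "language_mismatch_translate",
        lambda c: "document_language" in c
        and "user_language" in c
        and c["document_language"] != c["user_language"],
        "translate_to_user_language",
    ),
    Rule(
        "low_battery_use_cloud",
        lambda c: c.get("battery_level", 100) < 20,
        "migrate_to_cloud",
    ),
]


def infer_actions(context: dict, rules: List[Rule] = RULES) -> List[str]:
    """Return the actions of all rules whose condition holds, in rule order."""
    return [rule.action for rule in rules if rule.condition(context)]


# Illustrative usage with a made-up context.
context = {
    "physical_capability": "blind",
    "media_type": "text",
    "document_language": "fr",
    "user_language": "fr",
    "battery_level": 15,
}
actions = infer_actions(context)
```

In an ontology-based prototype, the same conditions would be stated over ontology classes and properties and evaluated by the reasoner rather than by Python functions.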

IV. Application Scenarios: Smart Cities
The introduced system may serve people in different application scenarios. We list some of them below to illustrate the practicality and benefits of using the developed mobile application.

• Tourism
Traveling somewhere new with little or no knowledge of the local language is a real obstacle, especially when you want to navigate local websites (restaurants, malls, museums, etc.) that offer no translation. With the language translation functionalities, tourists are no longer distracted or even blocked by language barriers in foreign countries.

• International Students Commuting
Many students prefer to study abroad. The language barrier is one of the biggest problems that international students face. With the system, students can get information, adapt to the city's systems and display any multimedia document regardless of language barriers. For instance, a student may display course videos, which can be subtitled automatically in their own language.

• Health
People navigate and display content using different methods depending on their preferences, skills, and abilities. Thus, they can display multimedia content according to their physical capabilities. For example, a blind person who receives a text would rather receive audio instead, which means that the text needs to be transformed into audio.

V. Result and Discussion
We have partially implemented the proposed architecture on Android platforms using the AndroJENA framework. This framework makes it possible to maintain OWL descriptions on tablets and smartphones and to query them with SPARQL. Fig. 6 shows screenshots of the speech input queries.
Our work aims to go beyond previous work by building an intelligent multimodal context-aware framework for multimedia document adaptation. Some mobile computing research projects have explored context awareness as a means to improve the user interfaces of mobile devices. In our framework, the notion of context awareness goes beyond the basic sensing of how the devices are being used. The context ontology is explicitly represented using an ontology language (i.e. OWL) that undergirds communication and representation and supports preference sharing, prediction of user mobility and prediction of service use.

VI. Conclusion
The heterogeneity of devices and user preferences has raised the problem of adapting multimedia documents according to the user's context and condition. This paper applied a semantic multimodal approach to multimedia documents and their adaptation through a multimodal architecture that allows users to interact with their devices so that their preferences are satisfied according to their context.
As discussed above, more work remains to be carried out to cover all aspects of adapting multimedia documents and to improve our semantic and multimodal architecture by integrating a trust model for data sharing in smart cities. Another line of work aims to build a context-based agent architecture in which the various components are implemented as autonomous agents.

Fig. 5 presents the services composition in a simple OWL ontology.

Fig. 5. A snapshot of the service ontology.

Table I. Raw Resource