Review of Interactive Video–Romanian Project Proposal

In recent years, the globalization and massification of video-based education have brought more and more eLearning scenarios into universities. This article gives an overview of interactive video. We analyze the background: the eLearning campuses used by virtual universities around the world, the MOOC movement over the last year, and the related interactive video platforms in the education field. At the same time, we pay particular attention to the technical aspects of interactive video: the definition of the concept, types of video metadata, media fragments and types of annotations, the primary elements that bring interactivity. We tested several free and commercial interactive web applications and gathered the resulting ideas. We propose a framework for a web-based interactive system built on the following main modules: video resource management (production, transcoding and storage), annotations, Linked Open Data, distribution medium, player interface, data analytics and recommendation system. Along the way, we offer our findings, together with our recommendations for an annotation interface and a player module. This is our proposal for Politehnica University Timisoara, either as a standalone solution or as a complement to the current virtual campus (http://cv.upt.ro), depending on future development plans and financial aspects.


Digital Native
Advances in science, technology and the eLearning field have gradually changed our personal lives and society. "Today's students are no longer the people our educational system was designed to teach" (Prensky, 2001). Students have unlimited and unrestricted access to information and a different approach to work and learning (Tapscott, 1999). They were born in a digital age, and technologies are an integral part of their lives. They acquire information faster, using multiple sources. They are surrounded by digital technologies and spend much of their time watching television, surfing the Internet, playing games, using mobile phones, etc. (Yong & Gates, 2014). The literature classifies students as Generation Y or Digital Natives, with birth years 1977-1994, and Generation Z or Net Generation, with birth years 1995-present (Kinash, Wood, & Knight, 2013). They have a "hypertext mind", "leap around" (Prensky, 2001), and have a "parallel cognitive structure and not sequential" (Yong & Gates, 2014). They are characterized by multitasking, openness to sharing content (Oblinger & Oblinger, 2005), random access, functioning best when networked (Yong & Gates, 2014), constant connectivity, speed in the delivery of information (Prensky, 2001), and a unique attitude towards education (Corrin, Bennett, & Lockyer, 2010). There are many such classifications, and each differs in the way researchers use it; nevertheless, in general the terms are used interchangeably (Jones & Shao, 2011). To meet students' needs, teachers must know how to capture students' attention and interest in and after the classroom. Digital native students have spent thousands of hours watching television and communicating through email, cell phones and instant messaging, and far less time reading books (Prensky, 2001). Students are both consumers and creators of electronic media material (Torres & Ross, 2014).
Understanding this is "vitally important" (Teo, 2013) to allow teachers to build new materials, improve their skills and use new technologies (Yong & Gates, 2014). Modern educational concepts and universities must align with these goals, which is why learning programs and online infrastructure have been improved continuously from their first appearance to the present day. Contemporary eLearning involves virtual educational environments, (interactive) video lectures and new (video) platforms such as MOOCs or interactive-video-based ones. These are widely used and seem to be well regarded (Jones & Shao, 2011).
Vol. 9, No. 3; images and strictly textual information.
On the other hand, a Cisco study reveals that video will account for 73% of all Internet traffic by 2017 (Cisco, 2013). These observations from the literature, the continuing trend of improving (educational) video content, and the infrastructure of MOOCs and of universities' online campuses worldwide provide the premises for a complete video experience, with productive interaction between the video itself and information related to the images, themes and concepts it presents. This is the path to generating interactive video elements. The literature defines interactive video as:
• one of the most exciting types of media, combining the power of moving images, the story of the video, and the depth and wealth of information enriched by interactivity (Chen, 2012);
• video or hypervideo: video material improved by various methods with interactive elements that provide a non-linear way to transmit information, similar to World Wide Web hyperlinks (Petan & Vasiu, 2013);
• a convergence of interactive television with the Internet that brings many benefits in areas like eLearning and business (Lytras, Lougos, Chozos, & Pouloudi, 2002).
In practice, and from an instructional point of view, adapting classical course material requires redesigning the course and restructuring its content, which increases video production costs for such platforms (Jermann, Bocquet, Raimond, & Dillenbourg, 2014). Video interactivity, not widely implemented so far, comes as a timely complement to the educational platform; it provides "depth information" and diversity, with extra resources reachable from a central video element. Moreover, researchers such as Zahn, Krauskopf, Kiener and Hesse (2014) and Jensen (2008) note that the classical type of video display generates apathy in education instead of active learning, which is the main purpose of transmitting information digitally. Studies in the literature call for an evolution of video beyond the "passive unidirectional TV type experience" in order to facilitate collaborative processes, direct students' attention, question students online, and move gradually from one stage of learning to another (Pea, 2006), thereby increasing students' interest in and personal satisfaction with the content (Marchioria, Blanco, Torrente, Martinez-Ortiz, & Fernandez-Manjon, 2011).

Technical Factors
Technically speaking, in the context of creating and distributing interactive video materials over the Internet, a primary factor is video metadata. Metadata is structured information that "describes, explains, locates, or otherwise makes it easier to retrieve, use or manage an information resource" (NISO Press, 2001). It is the descriptive part of an individual video clip, beyond the mere playback of frames. It can be generated either manually by a human creator or automatically by various types of video processing. The information thus generated describes the video material and can be used to identify additional information associated with the themes appearing in the video. Without metadata, a video is a black box in terms of the content and concepts it presents: from the point of view of the web browser, a video clip embedded in a page this way is completely opaque. From a technical standpoint, metadata can be stored either in the video files themselves, in particular fields defined by existing standards, or in dedicated management systems (databases). In the first case, metadata is stored in the header of the video, in the same file; the available fields are determined by the structure of the video standard used. This option has the advantage of keeping metadata directly attached to the video it refers to, but access is slower because it requires processing the video file, which is usually large. A much faster alternative is to store the metadata in dedicated management systems, which respond to queries more quickly but have the disadvantage of being separate from the video files they describe. Video materials that possess relevant descriptive metadata become visible to users who are looking for specific information. There are several types of metadata:
• Technical metadata: obtained through automatic analysis of the video. It is a particular type of administrative metadata describing properties of the digital video: format, compression rate, audio and video codecs, resolution, bitrate, file size, video length, and information on the equipment used to capture the material;
• Descriptive metadata: describes a resource so that it can be discovered and identified, and provides a summary of the video's content. The information is recorded in a consistent form so that the collection remains interoperable. This metadata is entered manually by the producer or by ordinary users: video title, description, category, tags, associated related videos, secondary annotations;
• Administrative metadata: provides access to, stores and helps organize the digital collection. The information does not directly describe the resource itself but helps to manage it; copyright is one example.
These types of metadata are complemented by interactivity information related to annotations, decisions and links between clips, as well as usability information (e.g., viewing statistics, viewed clips, chosen decisions, decision annotations).
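As an illustration of the second storage option, the following minimal Python sketch keeps metadata in a dedicated database, separate from the video files. The table layout, the column names and the helper functions (open_store, add_metadata, find_videos) are assumptions made for this example only; they are not part of any standard or of the framework proposed here, and an in-memory SQLite database merely stands in for a real management system.

```python
import sqlite3

# Hypothetical layout: one row per (video, metadata type, field).
# The three allowed types mirror the classification given in the text.
SCHEMA = """
CREATE TABLE video_metadata (
    video_uri TEXT NOT NULL,
    kind      TEXT NOT NULL
              CHECK (kind IN ('technical', 'descriptive', 'administrative')),
    field     TEXT NOT NULL,
    value     TEXT NOT NULL,
    PRIMARY KEY (video_uri, kind, field)
)
"""


def open_store():
    """Create an in-memory metadata store (a stand-in for a real database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_metadata(conn, video_uri, kind, field, value):
    """Attach one metadata field to a video; unknown types are rejected."""
    conn.execute(
        "INSERT INTO video_metadata VALUES (?, ?, ?, ?)",
        (video_uri, kind, field, str(value)),
    )


def find_videos(conn, kind, field, value):
    """Return the URIs of all videos whose metadata matches, without
    opening or processing any video file."""
    rows = conn.execute(
        "SELECT video_uri FROM video_metadata "
        "WHERE kind = ? AND field = ? AND value = ? ORDER BY video_uri",
        (kind, field, str(value)),
    )
    return [row[0] for row in rows]
```

A query such as find_videos(conn, "descriptive", "category", "education") is answered from the database alone, which is the speed advantage described above; the price is that the rows must be kept in sync with the video files they refer to.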
In the process of creating rich experiences through interactive video and adapting video to the World Wide Web paradigm, linking to additional information is unavoidable. A regular website usually contains hyperlinks to other sections of the same page, to other pages within the same site, and to web pages located on other sites and servers. Hyperlinks can also point to images, Linked Data resources or multimedia materials. Similarly, an interactive video system can be regarded as a video system containing references to information within the same clip, to other video and multimedia resources managed on the same server, or to external information. In the first case, where the reference points to another subsection of the same video clip, we can use the notion of a video fragment, similar to a chapter of a DVD movie.
The World Wide Web Consortium (W3C), a multi-organizational entity led by the founder of the WWW, Tim Berners-Lee, aims to define recommendations and directions for the long-term development of the World Wide Web. For video, one of the key specifications (W3C Consortium, 2013) defines the notion of media fragments. The main purpose of this specification is to address subsections inside videos, similar to HTML anchors. Such an anchor, written as a # followed by the anchor name, refers to a subsection of the current page that has been annotated as such. The term media fragment describes a portion or segment of a media object (Li, Wald, Omitola, Shadbolt, & Wills, 2012), a fact also highlighted by the name itself (media + fragment). An example is a 30-second fragment of a video clip lasting 2 minutes. In terms of structure, media fragments have three main components, #, t and xywh, described as follows:
• #: indicates that the preceding part is the physical location of the file and that the following part identifies an excerpt of an image or a video; the excerpt has two possible dimensions, temporal and spatial;
• t: the temporal dimension (W3C, 2013); it can take two values representing the beginning and the end of the fragment, in seconds;
• xywh: the spatial dimension (W3C, 2013); it takes four values: the first two, x and y, are the coordinates of the fragment, and w and h are its width and height.
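The three components described above can be illustrated with a short Python sketch that splits a media fragment URI into the file location, the temporal interval and the spatial region. The function name parse_media_fragment and the shape of the returned dictionary are our own assumptions; the sketch handles only plain seconds for t and plain pixel values for xywh, not the other time formats and unit prefixes that the W3C specification also allows.

```python
from urllib.parse import urlsplit


def parse_media_fragment(uri):
    """Split a media fragment URI into resource, temporal and spatial parts.

    Simplified sketch: 't' is read as 'start,end' in seconds and 'xywh'
    as four pixel values 'x,y,w,h'.
    """
    parts = urlsplit(uri)
    result = {
        # Everything before '#': the physical location of the file.
        "resource": parts._replace(fragment="").geturl(),
        "t": None,      # (start, end) in seconds; end is None if omitted
        "xywh": None,   # (x, y, width, height) in pixels
    }
    for pair in parts.fragment.split("&"):
        name, sep, value = pair.partition("=")
        if not sep:
            continue  # no fragment, or a plain anchor name
        if name == "t":
            start, _, end = value.partition(",")
            result["t"] = (
                float(start) if start else 0.0,
                float(end) if end else None,
            )
        elif name == "xywh":
            x, y, w, h = (int(v) for v in value.split(","))
            result["xywh"] = (x, y, w, h)
    return result
```

For example, parsing http://www.name.ro/videoname.mp4#t=30,60&xywh=5,10,640,480 yields the file location http://www.name.ro/videoname.mp4, the interval from second 30 to second 60, and a 640 by 480 pixel region whose corner is at coordinates (5, 10).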
So, to uniquely identify a portion of a video, the URI that addresses the video material has the following structure: http://www.name.ro/videoname.mp4#t=t_start,t_end&xywh=5,10,640,480. The process described above, in which the video content creator attaches to a given spatial region or video segment (media fragment) additional resources, such as other media materials embedded within the page, hyperlinks to external resources, or information obtained using Semantic Web principles (Li, Wald, Omitola, Shadbolt, & Wills, 2012), is part of video annotation. To cover all cases that may arise, annotations can be classified as:
• Conceptual: the video clip as a whole is associated with a generic concept;
• Temporal: a particular idea, object or person occurs between second t1 and second t2;
• Spatial: a portion of the image has a definite meaning;
• Temporal/spatial: a combination of spatial and temporal annotations;
• Subject in motion: the subject of the annotation moves within the video frame.
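The classification above can be expressed as a small data model. The Python sketch below is only one possible encoding, under our own assumptions: the class Annotation, its fields (interval, region, track) and the helpers annotation_kind and is_active are hypothetical names, and a moving subject is approximated by a list of (time, region) keyframes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Region = Tuple[int, int, int, int]  # x, y, width, height in pixels


@dataclass
class Annotation:
    label: str
    interval: Optional[Tuple[float, float]] = None       # (t1, t2) in seconds
    region: Optional[Region] = None                       # fixed image region
    track: Optional[List[Tuple[float, Region]]] = None    # keyframes of a moving subject


def annotation_kind(a: Annotation) -> str:
    """Map an annotation to one of the five classes listed in the text."""
    if a.track:
        return "subject in motion"
    if a.interval is not None and a.region is not None:
        return "temporal/spatial"
    if a.interval is not None:
        return "temporal"
    if a.region is not None:
        return "spatial"
    return "conceptual"


def is_active(a: Annotation, t: float) -> bool:
    """Tell whether the annotation applies at playback time t (seconds)."""
    if a.track:
        # Active between the first and the last keyframe.
        return a.track[0][0] <= t <= a.track[-1][0]
    if a.interval is not None:
        t1, t2 = a.interval
        return t1 <= t <= t2
    # Conceptual and purely spatial annotations hold for the whole clip.
    return True
```

A player module could call is_active for each annotation at the current playback time and render only those that apply, using annotation_kind to choose between, for instance, a clip-level label and an overlay drawn on a region.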

Major Players
Interactive video platforms mostly serve the education and entertainment domains. They are released under free or commercial licenses and result from project collaborations, company efforts or individual initiatives.
We searched the Internet for the main actors in generating online interactive materials, and the results are revealed in the

Wiremax
It is the product of a company that now has headquarters in New York, London and Venice. It provides a cloud-based, end-to-end pipeline for video ingestion, decomposition, analysis and delivery, with the ability to add advanced annotations in real time.

Zaption
A San Francisco-based tech startup that offers an online annotation system; details are given in the next subsection.

Zaption Testing
To test and verify some of the theoretical principles of annotation, we chose this platform because it focuses on the development and use of interactive video in education, both in Romania and worldwide. Addressed to all learning cycles, Zaption aims to encourage teachers, educators, trainers and content publishers to transform video materials into an interactive and engaging experience. Using the Zaption application requires creating an account. We chose the Pro Membership Plan, at $89/year. After logging in, we noticed that the application offers three components: tours, videos and groups. The video component has features such as searching for videos in the Zaption databases, administering one's own materials, and uploading videos. The group component allows creating a new group or enrolling in an existing one. The tour component offers the same functionality as the video component (e.g., search, add, administer), but for a new user it is where annotation itself begins. A tour starts with choosing a video: a local upload, a video from the Zaption databases, or one from another video sharing platform: Vimeo, PBS, National Geographic, TED, Discovery, NASA, Edutopia, Vsauce, Crashcourses, Scishow, CGP Grey etc.
The annotations are varied: real-time drawing; open-response questions; numerical-answer questions; multiple-choice questions; checkbox questions; questions answered by drawing; discussions, in which users can ask questions or post comments; replaying the video; and skipping a certain portion of the video. Each annotation has position, behavior and duration settings. The position setting places the annotation to the right of or above the video frame. The behavior setting allows choosing between two actions: stopping or playing the video. Duration, as the name suggests, sets the period during which the annotation is displayed. Once the work is finished, the final step is publishing and sharing the interactive video material. Once the tour is published, it can be posted to the Zaption Gallery for everyone to see. The author can add a description and tags and select a category and an age level. From the perspective of technology profile and technical metadata, the Zaption application was developed using the solutions listed in Table 7.

Conclusions
The wave of popularity that accompanies video-based platforms generates more and more research opportunities. Most of this research focuses on the impact of video on higher education and on new platforms that offer much more than a passive learning experience. Video material must provide a wealth of useful information according to the user's wishes, whether it is watched online, on an Internet-connected TV or on a mobile device, generating a pattern of active learning. This opens the way to collaborative activities, quizzing and broad information search, to efforts to gain students' attention, and to clear learning objectives. In terms of technology, it is the right time for interactive video, which combines elements such as video production, transcoding, streaming, semantics, recommendation systems, data analytics, etc. The time has come for educational platforms to treat interactivity as a necessity. It is the moment when universities all over the world have to invest time, money and effort either to improve their online video education platforms or to develop a new, challenging interactive video platform from scratch. This is our proposal for Politehnica University Timisoara, either as a standalone solution or as a complement to the current virtual campus (http://cv.upt.ro), depending on future development plans and financial aspects.
To make the proposal closer to current needs, in this paper we highlighted: the existing educational infrastructure, locally and internationally (the MOOC movement, virtual campuses for universities); the concept and progress of video as a central part of eLearning; the essential elements from a technical point of view; related projects in the field; and platform testing (free platforms and some commercial ones).
internal/external resources, data analytics and recommendation system. Blocks such as transcoding, spatial annotation and the interactive player were partially built by Sorin Petan for his final Ph.D. thesis, but all these pieces were put together in a discussion group in January 2015. A concrete, stable version will require at least one year of work and, as a minimum starting point, a team of specialists in web development, the Semantic Web, audio/video streaming and transcoding. We want to present this framework to the Board of Politehnica University Timisoara, bring together the needed specialists and work on a real scenario until the beginning of next year, in order to gather feedback from our students, identify possible problems and improve the user interface and overall usability.