Multi Layered Multi Task Marker Based Interaction in Information Rich Virtual Environments

Simple and cheap interaction plays a key role in the operation and exploration of any Virtual Environment (VE). In this paper, we propose an interaction technique that provides two different kinds of interaction (information and control) with complex objects in a simple and computationally cheap way. The interaction is based on the use of multiple embedded markers in a specialized manner. The proposed marker acts as an interaction peripheral that works like a touch pad and can perform any type of interaction in a 3D VE. The proposed marker is used for interaction not only with Augmented Reality (AR) but also with Mixed Reality. A biological virtual learning application was developed for evaluation and experimentation. We conducted our experiments in two phases. First, we compared a simple VE with the proposed layered VE. Second, we conducted a comparative study of the proposed marker, a simple layered marker, and multiple single markers. The proposed marker gave improved learning, easier interaction, and comparatively lower task execution time. The layered VE also gave improved learning compared with the simple VE.


I. Introduction
Realistic and effective interaction is essential for exploring the physical or behavioral characteristics of a VE along with its spatial objects [1]. VEs can be viewed as an innovative model for human computer interaction which not only allows external examination of virtual entities but also interactively involves the user in the 3D virtual space [2], [3]. The virtual environment provides a base for the integration of different VE methodologies and information visualization, which is carried out to discover the associations between entities, environment, and information [4]. Different studies have addressed the same concepts in different areas such as scientific visualization [5], VEs [6]-[8], psychology of perception [9]-[12], and information visualization [2], [13]. An Information Rich Virtual Environment (IRVE) is a combination of accurate 3D information with improved mental and non-spatial information [4]. The objective of an IRVE is to provide systematic multi-form representations [14] that build precise ideas [15], [16] and cognitive representations of the system [17]. VEs have been successfully applied in areas such as engineering design, information visualization, and educational training [3].
Different IRVEs, such as Venue [2], Habitat [15], and HUD [18], have been developed to deliver information related to VEs in different forms. To achieve maximum information delivery with limited cognitive load on users, efficient human computer interaction is needed [3]. There is a need to develop interaction techniques that provide simple access to information along with control over the system [2]. AR markers have been used for different purposes in AR applications. For interaction in AR environments, fiducial markers are used effectively because of their low cost and flexible nature. Rehman et al. [19]-[22] used multiple single markers for interaction in VEs.
Tateno et al. [23] proposed a nested marker with a hierarchical structure, which is used to increase the range of viewpoint movement. Rabbi et al. [24] proposed a layered marker to cover the tracking distance of large indoor AR spaces. These markers are limited to a single function, i.e. they are unable to perform other interaction tasks such as selection, navigation, and manipulation, and they cannot work with complex AR environments where multiple functionalities are desired.
In this paper, we propose a new interaction technique with a twofold function, i.e. realistic interaction along with information delivery in a simple manner, using a newly designed Multi Layered Multi Task (MLMT) marker. It is used for selection, navigation, rotation, and scaling of virtual objects as well as for the provision of information in a systematic manner. The MLMT marker is used for interaction with both AR and VR. In the first phase, we used MLMT for different types of interaction, i.e. visualization, scaling, and rotation of AR 3D objects. In the second phase, the marker is used for interaction with complex 3D objects in VR.
Section II presents related work, Section III describes the MLMT marker, Section IV presents the system architectural model, Section V describes the biological VR application, Section VI covers the experiments and evaluation, and Section VII concludes the paper.

II. Related Work
The use of IRVEs in learning, training, and other related fields has produced valuable results in these areas [16]. Bowman et al. [2], in Virtual Venue, inserted different types of audio, textual, special animation, imagery, and empirical information in the virtual environment. A comparative study was carried out between the IRVE and traditional hypermedia or paper-based information systems. Interaction with the system was made using a pen and tablet and a hand menu. For visualization and interaction with the system, costly devices such as an HMD, a joystick, and a pen and tablet were used [2]. AnthroGloss, a desktop-based VE, was developed by [18]. Information related to human anatomy was displayed in textual form in the system, and various tags were used to join the perceptual information and textual labels. In Mobile Augmented Reality Systems (MARS), information was visualized over the real environment [25]. A virtual zoo exhibit was developed by Bowman et al. [15]. The system was used to educate students in the design of exhibits, and in a comparative study with traditional lectures, students showed improved learning. Chen et al. [3] developed an immersive IRVE in which they compared two navigation techniques, GoGo [26] and HOMER [26], for search and exploration tasks. These previous systems mostly stressed the delivery of abstract information about simple virtual objects/concepts, but they were unable to provide systematic delivery of information concerning complex objects.
Fiducial markers are used in various types of AR systems. Marker based tracking is carried out using different types of toolkits such as ARToolKit [28], ARToolKitPlus [29], ARTag [29], and ALVAR [30]. These toolkits use different types of markers placed in real world scenes and tracked by the AR systems. For the development of AR applications, these toolkits provide the basic framework. ARToolKit uses square shaped markers placed in the 3D space [31].
Jun et al. [32] used a large room space for fiducial marker tracking to avoid occlusion among multiple markers. Khan et al. [33], [34] identified different factors affecting fiducial marker tracking. Rehman et al. [19]-[22] used multiple single markers for interaction such as navigation, selection, and manipulation of objects in a virtual assembly environment and an interactive writing board. Azhar et al. [35] used a single marker for interaction in a biological IRVE. Tateno et al. [23] used a hierarchically structured nested marker to extend the range of viewpoint movement. They used four markers inside a marker, each of which in turn consisted of four other markers, making a total of three layers. A nested marker may lead to inter-marker confusion between inner and outer layer markers. Because of its limited hierarchical structure, it covers a limited tracking distance and therefore cannot be used in large indoor applications. Recently, Rabbi et al. [24] proposed a layered marker for extending the tracking distance of the fiducial marker. The use of these markers is limited to a single function.
We propose the MLMT marker as an interaction tool for augmented and virtual environments, i.e. a single marker with multiple functional capabilities at the same time. We also propose a new interaction technique for IRVEs which provides textual information related to complex objects having multiple sub-parts in a simple, easy, and interest-oriented manner. The information is delivered in an interest-based, step by step (layered) manner.

III. MLMT Marker
The newly designed MLMT marker (see Fig. 1) is an ARToolKit [31] marker whose functionality we have extended so that it can be used for different purposes. In its design, multiple markers are placed in a nested/layered fashion. Each layer represents a single marker with its own unique pattern, while the innermost layer consists of two different markers. The complete description of each layer along with its selection procedure is given in Table I. The MLMT marker can be used for different tasks such as visualization, rotation, and scaling with the same marker and without any extra marker, i.e. we can call it "all in one". In this study, we have used the MLMT marker as an interaction tool that can perform navigation, selection, rotation, and exploration.
The addition of the individual markers M and E in the innermost layer enables the MLMT marker to switch dynamically between operations such as visualization, rotation, and scaling, with no need for extra markers. These markers make the MLMT marker dynamic: they let it behave differently by changing its mode.

A. Applications of MLMT Marker in AR
The use of MLMT is very simple and easy. Each layer is responsible for performing a specific operation. An inner marker can be visualized by occluding the upper layer above it by simply putting a finger over a section of the upper layer (see Fig. 2). If the marker is completely visible to the camera, it means that it is the first layer. Occlusion of the first layer leads to the visualization of the second layer and so on up to the innermost layer. A detailed description of the use of MLMT is given in Table I.
There are various applications of MLMT in different fields. A few applications of MLMT in AR are given below:

Multitasking Operation
The main purpose of designing the MLMT marker is to perform multiple tasks with a single marker. The same marker can be used for different types of operations such as visualization, scaling, rotation, selection, and navigation.

a) Visualization
Different 3D objects can be visualized in AR via a single MLMT marker. Visualization of each layer of the MLMT marker displays a different 3D object. The first layer displays a cube, the second a sphere, the third a teapot, and the fourth a cone, as shown in Fig. 2.

b) Task Selection
Different tasks on objects can also be selected in AR via the MLMT marker. The innermost layer of the MLMT marker consists of two independent markers, M and E, which are both used for task selection (see Fig. 3). Visualization of marker M makes the MLMT marker perform rotation, while visualization of marker E makes it perform scaling.

c) Rotation
The third application of MLMT is the rotation of any 3D object via the MLMT marker. Visualization of M switches the MLMT marker to the rotation task. In this mode, each layer of the MLMT marker rotates the 3D object about a specific axis. The 3D object rotates about the x-axis when layer 1 of the marker is visible to the camera. Visualization of the second layer rotates the object about the y-axis, the 3rd layer about the z-axis, and the 4th layer about the xy-axis (see Fig. 4).

d) Scaling
The fourth application of MLMT in AR is the scaling of 3D objects. Visualization of marker E switches the MLMT marker to scaling, which is then performed with each layer. Visualization of the outermost layer scales the object to its biggest size, the second layer scales it down to a medium size, and so on. Scaling down is performed by moving from bigger to smaller layers, and scaling up by moving in the reverse direction (see Fig. 5).
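The layer-to-task mapping described in this subsection can be summarized as a small lookup structure. The following Python sketch is only an illustration under stated assumptions: the class and function names are hypothetical, the numeric scale factors are invented (the text only speaks of "biggest", "medium", and so on), and keeping the current task when both or neither of M and E are visible is an assumption, not something the text specifies.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Layer-to-action tables taken from the description above (layer 1 = outermost).
OBJECT_BY_LAYER = {1: "cube", 2: "sphere", 3: "teapot", 4: "cone"}
AXIS_BY_LAYER = {1: "x", 2: "y", 3: "z", 4: "xy"}
# Hypothetical scale factors: the text gives no numeric values.
SCALE_BY_LAYER = {1: 2.0, 2: 1.5, 3: 1.0, 4: 0.5}


@dataclass
class MlmtArController:
    """Remembers the selected task and maps a visible layer to an action."""

    task: str = "visualize"  # one of "visualize", "rotate", "scale"

    def select_task(self, m_visible: bool, e_visible: bool) -> None:
        # Marker M alone selects rotation, marker E alone selects scaling.
        # Assumption: any other combination leaves the task unchanged.
        if m_visible and not e_visible:
            self.task = "rotate"
        elif e_visible and not m_visible:
            self.task = "scale"

    def action_for_layer(self, layer: int) -> Tuple[str, Union[str, float]]:
        # Return the action name together with its parameter for this layer.
        if self.task == "rotate":
            return ("rotate", AXIS_BY_LAYER[layer])
        if self.task == "scale":
            return ("scale", SCALE_BY_LAYER[layer])
        return ("show", OBJECT_BY_LAYER[layer])
```

For example, a controller in its initial state maps layer 3 to `("show", "teapot")`; after `select_task(True, False)` the same layer maps to `("rotate", "z")`.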

IV. System Architectural Model
The proposed system is a mixture of AR and VR in which AR works as the backend tool and VR as the frontend (see Fig. 6). AR deals with marker detection, marker ID, and pose calculation, while VR is responsible for interaction with the VE. The AR part uses ARToolKit [31] to detect and process fiducial markers in the real scene captured by a camera. First, a video path is initialized for the calculation of camera parameters, and then the pattern file database is searched for the marker patterns. Marker detection is carried out when the marker is visible to the camera. In the pattern matching phase, the pattern file of the specific marker layer is searched in the database. After identifying a particular layer, the system calculates the position, orientation, and ID of the layer. Normally, ARToolKit [28] recognizes all registered and visible markers at the same time, whereas most situations require identifying and using only one specific marker. This multiple marker identification problem also arises with MLMT: the marker consists of multiple layers, each of which is a unique marker, so only a single marker should be handled at a time. To cope with this problem, we proposed and implemented a new algorithm, described by the flowchart in Fig. 7, which identifies a specific marker while ignoring all others. The algorithm simply selects the uppermost visible layer of the MLMT marker and ignores all the other markers. In the first phase, all the layers of the MLMT marker are searched in the library. The first marker (I = 1) represents the outermost layer (L1), while the innermost marker (I = N) represents the Nth layer (LN). If the outermost layer is visible, the algorithm identifies this marker and performs the associated task while ignoring all inner layers (i.e. L2 to LN).
If the first layer is not visible, the algorithm searches for the second layer (L2); if it is visible, its associated task is performed while the remaining inner layers are ignored. The algorithm repeats this process for all N layers. If the visible layer is LN, the algorithm keeps track of its two markers: if marker M is visible and E is occluded, the task (new scenario) associated with M is performed, and vice versa.
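The selection rule described above (take the uppermost visible layer, ignore everything inside it, and inspect M and E only at the innermost layer) can be sketched in a few lines of Python. This is a minimal sketch and not the implementation behind Fig. 7: the function name, the marker identifiers "L1", "L2", "L3", "M", and "E", and the treatment of the innermost layer as "visible" whenever M or E is detected are all assumptions made for illustration. The set of visible identifiers is assumed to come from the tracking library.

```python
from typing import Optional, Set, Tuple


def select_active_layer(
    visible: Set[str], n_layers: int = 4
) -> Tuple[Optional[int], Optional[str]]:
    """Return (layer index, mode marker) for the uppermost visible layer.

    `visible` holds the IDs of all markers the tracker currently reports.
    Layers 1..N-1 are single markers "L1".."L(N-1)"; layer N is assumed to
    be represented by the two markers "M" and "E".
    """
    # Scan from the outermost layer inwards and stop at the first visible one,
    # so that every marker inside it is ignored.
    for index in range(1, n_layers):
        if f"L{index}" in visible:
            return index, None

    # No outer layer is visible: inspect the innermost layer's two markers.
    m_visible = "M" in visible
    e_visible = "E" in visible
    if m_visible and not e_visible:
        return n_layers, "M"   # E occluded: perform the task tied to M
    if e_visible and not m_visible:
        return n_layers, "E"   # M occluded: perform the task tied to E
    if m_visible and e_visible:
        return n_layers, None  # innermost layer reached, no mode chosen yet
    return None, None          # nothing of the MLMT marker is visible
```

With the fully visible marker, `select_active_layer({"L1", "L2", "L3", "M", "E"})` yields `(1, None)`; covering part of L1 with a finger removes it from the set and the result becomes `(2, None)`.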

Marker ID and Position
The VR system performs different interaction tasks in the VE based on the acquired marker ID and position. The OpenGL library is used for creating the VE and for realistic interaction in it.

A. Mode of Interaction
Individual markers M and E in the innermost layer enable the MLMT marker to perform different operations dynamically such as visualization, rotation, scaling, etc. at the same time without needing any extra markers.
The system allows interaction with VE in two different modes. These modes are represented via 3D interactive labels i.e. Explore and Manipulate (see Fig. 8).
The selection of the interaction mode is done using the MLMT's innermost layer markers. Visibility of marker M leads to Manipulation mode while the visibility of marker E to Exploration mode.

Exploration Mode
The selection of the exploration label leads the system to the exploration mode in the VE. This mode is responsible for the delivery of object-related textual information to the user. Exploration of the object is carried out in a step by step fashion.

a) Textual Information
The system displays information about the object at each layer in textual form. The first layer displays just the name, the second layer displays some detail such as the function of the object, and so on. The delivery of information therefore depends on the interest of the user: a more interested user can move towards the inner layers and receive more and more information about the object.

Manipulation Mode
The selection of the manipulation mode allows the user to perform a manipulation operation with each layer. The first layer of the marker rotates the object by 90°, the second by 180°, and so on, up to 360°. As the user moves inwards by selecting layers, the system rotates the object accordingly. Upon visualization of the innermost layer, the system displays the sub-objects of the parent object.
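The per-layer behavior of the two modes can be written as two small functions. The sketch below is illustrative only: the function names are hypothetical, the assignment of the skull sentences to particular layers is assumed for the example, and capping the rotation at 360° is one reading of "up to 360°".

```python
from typing import List

# Example information levels for the skull; which sentence belongs to which
# layer is an assumption made for this illustration.
SKULL_INFO: List[str] = [
    "Human Skull",
    "Human skull is the uppermost part of the human body.",
    "It encloses the important parts, i.e. brain and eyes.",
]


def exploration_text(info_levels: List[str], layer: int) -> str:
    """Exploration mode: layer 1 shows the name, deeper layers show more detail."""
    # Layers beyond the available levels repeat the most detailed entry.
    index = min(layer, len(info_levels)) - 1
    return info_levels[index]


def manipulation_angle(layer: int, step_deg: int = 90) -> int:
    """Manipulation mode: 90 degrees for layer 1, 180 for layer 2, at most 360."""
    return min(layer * step_deg, 360)
```

Here `exploration_text(SKULL_INFO, 1)` returns the object name, while `manipulation_angle(2)` returns 180.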

B. Interaction With Objects
When the camera observes the innermost layer of the marker, the system displays the sub-objects of the parent object. A complex parent object may have more than one sub-object. A specific object can be selected simply by intersecting the virtual pointer with it. After the selection of an object, all other objects disappear from the screen. The user can then interact with that object in either Exploration or Manipulation mode. The Exploration mode is activated by simply occluding marker M, and the Manipulation mode by occluding marker E. In Exploration mode, the user can explore object-related information in a step by step fashion, while Manipulation mode allows the user to manipulate the object by visualizing different layers of the marker. In both modes, upon reaching the innermost layer, the system enters the next phase, i.e. it displays the sub-objects of this object.
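The drill-down from a parent object to its sub-objects can be modeled as navigation over a tree. The following Python sketch is a simplified illustration with hypothetical class names; it replaces pointer intersection with selection by name and assumes a four-layer marker.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualObject:
    name: str
    children: List["VirtualObject"] = field(default_factory=list)


@dataclass
class Navigator:
    """Tracks the current object and the sub-objects offered for selection."""

    current: VirtualObject
    n_layers: int = 4
    candidates: List[VirtualObject] = field(default_factory=list)

    def on_layer(self, layer: int) -> None:
        # Only the innermost layer exposes the sub-objects of the current object.
        if layer == self.n_layers:
            self.candidates = list(self.current.children)
        else:
            self.candidates = []

    def select(self, name: str) -> bool:
        # The chosen sub-object becomes the current object and the remaining
        # candidates disappear; the same process then repeats for it.
        for obj in self.candidates:
            if obj.name == name:
                self.current = obj
                self.candidates = []
                return True
        return False
```

For the biological application, the tree would be a skull with brain and eye as children: after `on_layer(4)` both sub-objects are offered, and `select("eye")` makes the eye the current object.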

V. Biological VR Application
A detailed description of the proposed VR system is shown in Fig. 9. The VE displays a complex virtual object and allows users to interact with it. The object has further internal sub-objects. Interaction in the VE consists of selection, manipulation, exploration, and navigation inside the object in a layered fashion. First, the system allows the user to select an interaction mode; then the system visualizes and identifies each layer of the marker; and finally, the task associated with each layer is performed. At the innermost (Nth) layer, the system visualizes the sub-objects of the parent and allows users to select any one of them. After the selection of a specific sub-object, the whole process is repeated for the selected sub-object. We have developed a biological application for evaluation and experimental purposes. This VE is a room-like structure where a human skull is visualized as the main/parent object, as shown in Fig. 8. The human skull encloses various organs, of which the eyes and brain are the most important, so we analyze these objects in our study. The VE also contains the Explore and Manipulate objects used for the selection of the interaction mode.
When loaded, the VE contains only the parent object and the Explore and Manipulate labels. Interaction with the system is carried out using the MLMT marker. The virtual pointer follows the movement of a completely visible MLMT marker. First, the user selects the Explore or Manipulate mode. Visualization of object-related information at different layers, the sub-objects of the parent object, and the rotation of the object during manipulation are shown in Fig. 10.

VI. Experiments and Evaluation
In the experimental section, we will perform two types of studies. The experimental setup is shown in Fig. 13.

A. Comparison of Simple VE Vs Layered VE
To perform a comparative study, we designed two VEs, i.e. a Simple VE and a Layered VE. In the Simple VE, three different 3D human organs, i.e. the human skull, eye, and brain, are visualized (see Fig. 14). To interact with these virtual objects in the Simple VE, we designed three different markers as shown in Fig. 15. Initially, all three markers are visible to the camera. Occlusion of a specific marker results in information visualization for the related virtual object, i.e. occlusion of the Skull marker displays information related to the human skull, while occlusion of the Brain and Eye markers displays information related to the human brain and eye. Upon occlusion of a specific marker, the system displays the complete information related to the corresponding virtual object as one big chunk, as shown in Fig. 16.

[Fig. 15: the three single markers, labeled Skull, Brain, and Eye.]
The layered VE consists of a 3D human skull as shown in Fig. 8. The MLMT marker is used for interaction with the VE in a layered fashion, i.e. Layer1 displays the name, Layer2 displays some details, and so on up to LayerN-1, which displays detailed information related to the skull. LayerN displays the subparts, i.e. the human brain and eye. The brain or eye can be selected by keeping one of the markers E and M visible and occluding the other. Occlusion of E leads to the visualization of the human eye, and occlusion of M displays the 3D human brain. After the selection of the brain or eye, the system allows the user to display related information via the MLMT marker in a layered fashion as discussed above.
[Fig. 16: information displayed for the human skull in the Simple VE: "Human skull is the uppermost part of the human body. It encloses the important parts, i.e. brain and eyes. It consists mainly of the upper dome shape called the cranium and the bones at the base of the skull. It also consists of the nasal bridge, left and right cavities, and the maxillary and mandibular bones of the skull."]
We will experimentally examine the learning effect and task execution time in both VEs.

Protocol
To investigate learning improvement, task execution time, and ease of interaction, we randomly selected thirty (30) participants. All the participants were SSC (Secondary School Certificate) level science (biology) students from three different schools. The topic was included in their course work. The students had no previous experience with VR. The study was designed to use all the VE features.

Task
The students were divided into two groups (G1 and G2) of 15 students each. All the students were given a demonstration and trained in the use of the VE. The task was to interact with and study each object in sequence (explore and manipulate).
The students of G1 performed three trials of the task in the simple VE, while those of G2 performed three trials of the same task in the layered VE. The task execution time was recorded for both groups. After that, they filled in a questionnaire to evaluate their learning enhancement.

Results Analysis
In this section, we analyze the questionnaire filled in by both groups. The questions aimed to assess learning enhancement and ease of interaction, and the students had to answer questions related to the learning content. For the objective analysis, the task execution time of both groups was also recorded.

a) Learning
The analysis of variance (ANOVA) for learning enhancement shows a significant difference between G1 and G2 (F(1,28) = 35.087, p < 0.05). The mean and standard deviation (SD) of both groups, G1 (48.80, 11.143) and G2 (68.8, 9.096), are shown in Fig. 17. The results show that the students of G2, who used the layered VE, gained more knowledge than those of G1. The main reason for the improved learning of G2 may be the provision of information in small chunks in a stepwise manner, since a small amount of information is easier to learn than one big chunk.

b) Task Execution Time
The ANOVA for the task execution time of both groups is significant (F(1,28) = 60.222, p < 0.05). The mean and SD are (161.6, 24.20) for G1 and (218.93, 15.25) for G2, as shown in Fig. 18. This means that the students of G1, who used the simple VE, completed the task faster than those of G2, who used the layered VE. The reason behind the better performance of G1 was the simple selection of one marker among the three markers placed in front of them, whereas in the layered VE users need more cognitive and physical work, i.e. selecting different layers in sequence, selecting objects, etc., which took more time.
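As a plausibility check, a one-way ANOVA F statistic can be recomputed from the reported group means and SDs. The sketch below assumes equal group sizes (15 students per group, as stated in the task description) and that the reported SDs are sample standard deviations; the function name is hypothetical. Under these assumptions the task execution time figures give an F value of roughly 60.3, close to the reported 60.222, with the small gap attributable to rounding of the published means and SDs.

```python
from typing import Sequence


def anova_f_from_summary(
    means: Sequence[float], sds: Sequence[float], n: int
) -> float:
    """One-way ANOVA F for k groups of equal size n, from means and sample SDs."""
    k = len(means)
    grand_mean = sum(means) / k
    # Between-group mean square, with k - 1 degrees of freedom.
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    # Within-group mean square: with equal n this is the average group variance.
    ms_within = sum(sd ** 2 for sd in sds) / k
    return ms_between / ms_within


# Task execution time: G1 (simple VE) and G2 (layered VE), 15 students each.
f_time = anova_f_from_summary([161.6, 218.93], [24.20, 15.25], n=15)
print(f"F(1,28) = {f_time:.2f}")
```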

B. Comparison of MLMT With Single and Multi Layered Markers
In this study, we evaluated the effect of the interaction tool on students' task execution time, learning, and usability in the VE. For this purpose, we compared our proposed interaction tool (i.e. the MLMT marker) with multiple single markers [31] and multi layered markers [24].

Single Marker
Every single marker is uniquely designed for one type of function/operation, as shown in Fig. 19. For example, a single marker may rotate an object or display the name of an object. We designed 24 unique markers, i.e. 8 markers for the human skull (4 for the 4 layers of exploration, 4 for rotation), 8 for the human brain (4 for the 4 layers of exploration, 4 for rotation), and 8 for the human eye (4 for the 4 layers of exploration, 4 for rotation).

Multi Layered Marker
We designed six different multi layered markers, i.e. two for skull exploration and manipulation, two for brain exploration and manipulation, and two for eye exploration and manipulation, as shown in Fig. 20. Each multi layered marker consists of 4 layers, of which the last/inner layer is a marker with some letters. Each layer performs its specific function. In the case of skull exploration, the first layer is used for the visualization of simple information (i.e. Human Skull), Layer2 visualizes some detail, and so on, until LayerN displays in-depth information about the skull. In the case of manipulation, the visibility of each layer leads to some type of manipulation task, e.g. rotation or scaling. These markers can perform only a single task and thus lack dynamicity. Interaction with the VE is performed by occluding each layer one by one from the outer to the inner layer.

MLMT Marker
We designed an MLMT marker for interaction with the VE, as shown in Fig. 21. This single marker is responsible for all interaction, which includes navigation, selection, and manipulation in the layered VE. The MLMT marker is a multi layered marker consisting of four nested layers, with the innermost layer consisting of two unique markers M and E. The addition of markers M and E extends its capabilities so that it can operate dynamically in different situations.

Protocol and Task
We selected another group of 30 students for this experimental study and randomly divided them into three groups (G1, G2, and G3). G1 was assigned the single markers, while G2 and G3 used the layered and MLMT markers, respectively, for interaction with the layered VE. All the students were first briefed on the use of their assigned marker. After that, they used the VE for 10 minutes before the actual experiment. After training, they performed the experimental task, which was to interact with and explore all three 3D objects, i.e. skull, brain, and eye.

Results Analysis
In this section, we perform both objective and subjective analysis of the three groups. In the objective analysis, we compared the task execution time of the three groups. In the subjective analysis, we first used a questionnaire to assess learning enhancement with the assigned system. After that, we used the System Usability Scale (SUS) [36] for evaluation based on the students' opinions.

a) Task Execution Time
The ANOVA for the task execution time of all groups, i.e. G1, G2, and G3, is significant (F(2,27) = 60.289, p < 0.05). The mean and SD for G1, G2, and G3 are (55.00, 9.274), (37.20, 6.630), and (20.50, 41.16), respectively, as shown in Fig. 22. This means that group G3 completed the task in less time than G1 and G2. The low performance with single markers is due to the need to search for a specific marker among multiple markers for each task.

b) Learning
To assess the students' learning, the questionnaire consisted of different questions about the information given in the biological application. The ANOVA for the students' learning in all groups, i.e. G1, G2, and G3, is significant (F(2,27) = 14.152, p < 0.05). The mean and SD for G1 is (82.60, 6.569), G2 is (69.40, 9.383), and G3 is (60.80, 11.153), as shown in Fig. 23. From the above results, we can conclude that G3 is comparatively better in learning enhancement than G1 and G2. The provision of step-wise information in small, easy to read chunks using MLMT improves learning as compared to the other tools. The reason for the reduced learning of G1 and G2 may be the cognitive work required to identify and select a specific marker among multiple markers.

C. Usability
We used a standard usability test, the SUS, to evaluate these interaction tools (i.e. markers) based on the students' opinions. The SUS consists of ten questions that cover ease of use, learnability, efficiency, effectiveness, and user satisfaction. Each question has options ranging from strongly disagree (1) to strongly agree (5). The score for each odd-numbered question is the scale value minus 1, i.e. strongly disagree scores 0 and strongly agree scores 4. The score for each even-numbered question is 5 minus the scale value, i.e. strongly disagree scores 5 - 1 = 4 and strongly agree scores 5 - 5 = 0. For example, in Table II, for question 1, 7 students chose Strongly agree and 3 chose Agree, so the average score of the ten students is ((5-1) x 7 + (4-1) x 3)/10 = 3.7. For question 2, 3 students chose Strongly disagree and 7 chose Disagree, so the average score is ((5-1) x 3 + (5-2) x 7)/10 = 3.3.
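The scoring rule above is easy to express in code. The following Python sketch reproduces the two worked examples for questions 1 and 2; the function names are hypothetical, and the final multiplication by 2.5 to reach a 0 to 100 scale is the standard SUS convention rather than something stated in the text.

```python
from typing import Sequence


def question_score(question_no: int, counts: Sequence[int]) -> float:
    """Average 0-4 score of one SUS question.

    counts[i] is the number of students who chose option i + 1, where
    option 1 is strongly disagree and option 5 is strongly agree.
    """
    total = 0
    for option, count in enumerate(counts, start=1):
        if question_no % 2 == 1:
            points = option - 1   # odd-numbered question: scale value minus 1
        else:
            points = 5 - option   # even-numbered question: 5 minus scale value
        total += points * count
    return total / sum(counts)


def sus_score(question_scores: Sequence[float]) -> float:
    """Overall SUS score: the sum of the ten question scores times 2.5."""
    return sum(question_scores) * 2.5


q1 = question_score(1, [0, 0, 0, 3, 7])  # 3 students Agree, 7 Strongly agree
q2 = question_score(2, [3, 7, 0, 0, 0])  # 3 Strongly disagree, 7 Disagree
print(round(q1, 1), round(q2, 1))        # prints 3.7 3.3
```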
The SUS questionnaire results for the layered markers are good overall, with a SUS score of 80.25 (see Table III). The students were satisfied with their assigned tool but gave lower scores for consistency and user-friendliness. The students who used multiple single markers gave an average SUS score of 70 (see Table IV). The results for the MLMT marker (see Table II) show that students selected the most favorable options, giving a SUS score of 88. The responses to questions 1 and 9 show that all students were satisfied with the proposed interaction tool, i.e. the MLMT marker. The results of questions 2, 3, 7, and 8 show that the proposed marker is user-friendly. The results of questions 5 and 6 show that the proposed system is well integrated, while the results of questions 4 and 10 show that the system's learnability is very good.

Table II. SUS questionnaire responses for the MLMT marker (number of students choosing each option, from 1 = strongly disagree to 5 = strongly agree, and average score)
No. | Question | 1 | 2 | 3 | 4 | 5 | Score
3 | I thought the system was easy to use. | 0 | 0 | 0 | 2 | 8 | 3.8
4 | I think that I would need the support of a technical person to be able to use this system. | 8 | 2 | 0 | 0 | 0 | 3.8
5 | I found the various functions in this system were well integrated. | 0 | 0 | 0 | 6 | 4 | 3.4
6 | I thought there was too much inconsistency in this system. | 5 | 5 | 0 | 0 | 0 | 3.5
7 | I imagine that most people would learn to use this system very quickly. | 0 | 0 | 0 | 1 | 9 | 3.9
8 | I found the system very cumbersome to use. | 9 | 1 | 0 | 0 | 0 | 3.9
9 | I felt very confident using the system. | 0 | 0 | 0 | 2 | 8 | 3.8
10 | I needed to learn a lot of things before I could get going with this system. | | | | | |

VII. Conclusion and Future Work
Providing information easily and effectively is the most important prerequisite of any information rich virtual environment. We propose a novel interaction technique for the selection, manipulation, and exploration (textual information delivery) of complex objects in virtual environments. Exploration consists of interest-based, step by step (layered) information delivery to users. A newly designed MLMT fiducial marker is used for interaction with virtual objects. The MLMT marker was used for navigation, selection, and manipulation of virtual objects. A biological virtual learning application was used for evaluation and experimental purposes. We performed a comparative study of the proposed MLMT marker, a simple layered marker, and multiple single markers. The experiments showed improved learning, easier interaction, and comparatively lower task execution time with the MLMT marker.
In the future, we will use the proposed marker in different areas such as for interaction with interactive writing boards [22] and interactive games. We also plan to work on the occlusion of the MLMT marker, i.e., to differentiate between intentional and unintentional hiding of the markers.