EZ-Manipulator: Designing a mobile, fast, and ambiguity-free 3D manipulation interface using smartphones

Interacting with digital content in 3D is an essential task in various applications such as modeling packages, gaming, and virtual reality. Traditional interfaces using keyboard and mouse or trackball usually require a non-trivial amount of working space as well as a learning process. We present the design of EZ-Manipulator, a new 3D manipulation interface using smartphones that supports mobile, fast, and ambiguity-free interaction with 3D objects. Our system leverages the built-in multi-touch input and gyroscope sensor of smartphones to achieve 9 degrees-of-freedom axis-constrained manipulation and free-form rotation. Using EZ-Manipulator to manipulate objects in 3D is easy: the user merely has to perform intuitive single- or two-finger gestures and rotate the hand-held device to perform manipulations at fine-grained and coarse levels respectively. We further investigate the ambiguity in manipulation introduced by indirect manipulations using a multi-touch interface, and propose a dynamic virtual camera adjustment to effectively resolve the ambiguity. A preliminary study shows that our system has significantly lower task completion time than a conventional keyboard-mouse interface, and provides a positive user experience to both novices and experts.


Introduction
3D object manipulation is a key task in many applications, including virtual reality, gaming, and commercial modeling packages such as 3DS Max and Maya. Manipulation is typically achieved through a traditional keyboard-mouse interface using a virtual trackball [1,2] or a 3D transformation widget [3,4]. Researchers have done extensive work on designing novel input devices with more degrees of freedom (DOF) [5][6][7][8] and mid-air interaction using hand gestures [9][10][11]. However, those systems usually demand specialised hardware (e.g., depth sensors [10,11]) and require a non-trivial environment. For instance, most mouse-like devices need to be placed or moved on a desk, while a depth sensor is capable of tracking hand gestures only within a fixed range of distances. These requirements largely limit the usability and mobility of the above systems.
Motivated by the rapid development of mobile devices with multi-touch interfaces, a large body of work has adapted tactile paradigms to 3D manipulation [12][13][14][15][16][17][18][19][20][21][22]. Amongst existing systems, indirect manipulation techniques [17,18] that disassociate the transformations from visual widgets have been shown to be easier to use than direct ones [20,21]. However, they assume multi-touch devices with mid- to large-sized displays (e.g., a tablet or touch-table), and hence the proposed two- or three-finger gestures are unsuitable for small touch screens such as those found on a smartphone. Naive adaptation of these methods leads to "fat finger" and hand occlusion problems. Moreover, indirect manipulation using multi-touch gestures frequently causes ambiguous manipulation, where the 2D projection of a 3D transformation widget results in overlapping coordinate axes on the 2D screen. Thus, while multi-touch interfaces are promising for 3D manipulation, we still lack a feasible approach for smartphones.
In this work, we present EZ-Manipulator, a new 3D manipulation interface that leverages the built-in multi-touch input and gyroscope sensor in a smartphone to support mobile, fast, and ambiguity-free 9 DOF axis-constrained manipulation, with 3 DOF each for translation, rotation, and scaling, and free-form rotation. Specifically, we build on the work of Au et al. [18] to achieve 9 DOF axis-constrained manipulation through a small set of multi-touch gestures using intuitive single- and two-finger operations (see Fig. 1(a)). To tackle ambiguous manipulation, we develop a dynamic camera system that monitors the 2D projection of the 3D transformation widget and automatically adjusts the viewpoint of the virtual camera to resolve ambiguity. To freely rotate 3D objects, our system virtually associates the coordinate system of the object with that of the physical smartphone, and directly controls the object's orientation using the reference orientation from the gyroscope sensor (see Fig. 1(b)). The handiness and wireless connection capability of smartphones further enable a scenario of mobile and remote interaction with 3D applications using EZ-Manipulator (see Fig. 1(c)).
A preliminary study shows that our system not only outperforms conventional keyboard-mouse interfaces in terms of task completion time, but also that its efficacy and intuitiveness are highly valued by both novices and experts. We have also demonstrated the usability of EZ-Manipulator in several application scenarios, including remote interaction with a large display environment, rapid prototyping of interior designs, multi-user collaboration, and gaming (see the video in the Electronic Supplementary Material (ESM)).

Multi-touch interaction
The rapid development of multi-touch display devices has motivated much work on the design of 2D multi-touch gestures for 3D manipulation tasks [12,13,15-18,21-24]. For instance, Hancock et al. [12,13] explored how to map multi-touch inputs to 6 DOF unconstrained 3D manipulation, i.e., translation and free-form rotation. Cohé et al. [21] presented tBox, a novel box-like widget designed for touch screens to facilitate 3D interaction using a 2D widget. Au et al. [18] further achieved a widget- and button-free constrained 3D manipulation interface using a small but effective set of multi-touch gestures. Beyond basic 3D manipulation, some work utilizes multi-touch input in specific contexts, including 3D modeling [23,24], organic environment construction [22], automation engineering [25], and stereo workspace construction [19]. However, all of these systems require a large display to support interaction with the underlying applications, and hence are largely limited in mobility. A naive adaptation to the smartphone paradigm (with a small display) leads to "fat finger" and hand occlusion problems. In contrast, our work carefully adapts the multi-touch gestures in Ref. [18] to smartphones, and leverages the built-in sensors to enable both constrained and unconstrained 3D manipulations with 9 DOF.

Mid-air interaction
Another line of work proposes to interact with 3D content using natural freehand or body gestures [10,11,26,27]. Wang et al. [10] presented a markerless hand tracking system with a set of tailor-made gestures for performing fundamental tasks in CAD applications. Song et al. [11] introduced a handle bar metaphor to provide effective and accurate control over pose and scale manipulations of 3D objects. Nancel et al. [27] designed a set of mid-air gestures to support pan-and-zoom interaction with digital content shown on a wall-sized display. Hilliges et al. [26] leveraged the space above a tabletop display to support intuitive and direct 3D manipulations using a see-through display. However, these tracking-based systems are functional only in non-trivial settings, where users have to perform the gestures within a working area sensed by an RGB-D camera [28] or wear a colored glove [9] as a means of hand tracking and gesture recognition. In addition, performing mid-air gestures can cause fatigue and is hence unsuitable for long-term use. Our system overcomes the above issues by using a ubiquitous smartphone for the interface, which takes negligible effort to use.

Fig. 1 EZ-Manipulator is a new 3D manipulation interface for smartphones that supports mobile, fast, and ambiguity-free interaction with 3D objects. The system utilizes (a) a small set of intuitive multi-touch gestures to achieve 9 DOF axis-constrained manipulation, and (b) performs free-form rotation using the built-in gyroscope sensor. (c) A user remotely manipulating 3D objects on a large-size display using EZ-Manipulator.

Interaction using mobile devices
Some work has shown the advantages of using mobile devices in remote interaction and collaboration [29][30][31][32][33]. Boring et al. [32] introduced the Touch Projector, an interface implemented on a mobile device that allows users to manipulate content shown on a distant display. The system has been further extended to support interaction of multiple users and has been applied to media facades through live video [31]. Chapuis et al. [29] presented Smarties, an input interface using a mobile device that supports complex interaction and collaboration among users for wall-sized display applications. Chang et al. [33] implemented object tracking and annotation on a mobile device to support remote collaboration in a dynamic video-mediated workspace. Similarly, our system utilizes the mobility and handiness of a smartphone to support intuitive and effective 3D manipulation.

System overview
Since EZ-Manipulator is designed to enable single or multiple users to manipulate 3D objects on a remote display, we employ a client-server protocol to communicate between the smartphones (clients) and the virtual environment (server). EZ-Manipulator has three modes of use in 3D modeling. In the key manipulation mode, users may perform both fine-grained and coarse 3D manipulations of virtual objects through intuitive multi-touch gestures and by rotating the smartphone by hand. Without loss of generality, we assume the dominant hands of users to be their right hands. The other two modes are camera and selection modes, in which users may respectively control the camera view and select single or multiple 3D objects. Thanks to the small touch screen of the smartphone, users can seamlessly switch modes using different gesture patterns.

Multi-touch gestures: axis-constrained manipulation
Indirect control of a 3D transformation widget using multi-touch gestures has been shown to be effective for constrained manipulation of 3D objects [18]. As our interface shares the same idea of indirect manipulation using multi-touch gestures, we design a visual transformation widget for each 3D object. Given an object selected for editing, we define candidate axes for the object using its local coordinate frame and render these axes in distinguishable colors to represent a visual transformation widget drawn at the centroid of the object. We then redesign the multi-touch gestures for smartphones based on the following observations.

One finger is better than two
A pilot study by Kashiwakuma et al. [34] indicated that single-finger pan offers more precise control than using two fingers, and is preferred by most users. Furthermore, a single finger, which touches a smaller area, can pan more freely on a small touch screen than two fingers. Thus we base gestures on single-finger interactions.

Locations of touched points matter
It is easy to see that when we hold the smartphone with one or two hands, the right (or left) thumb can easily touch regions near the right (or left) border of the screen. This enables us to increase the dimensionality of multi-touch gestures based on the locations of the touched points. Figure 2 illustrates the design of multi-touch gestures for 9 DOF axis-constrained manipulation.
We divide the touch screen into two regions, one for triggering finger touch events and the other for detecting single- and two-finger pan and pinch gestures. The former region, the TouchZone, is a vertical rectangular area aligned with the left border of the touch screen. We roughly classify the location of touched points within the TouchZone into three areas, top, middle, and bottom, which are visualized using circular shapes. We therefore obtain 12 combinations in total, using 4 finger touching states from the non-dominant hand and 3 finger movement patterns from the dominant hand.
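The combination count above can be made concrete with a small lookup table. The following is a minimal sketch, not the authors' implementation: the state labels, the reading of the fourth touching state as "no touch", the equal-thirds split of the TouchZone, and the function names are all assumptions made for illustration. Only the gesture bindings described in the text are filled in.

```python
from itertools import product

# Hypothetical labels: the text names the three TouchZone areas; "none"
# (no touch) is assumed to be the fourth touching state.
TOUCH_STATES = ("none", "top", "middle", "bottom")
MOVEMENTS = ("one_finger_pan", "two_finger_pan", "pinch")

# Bindings described in the text; gestures described without a TouchZone
# touch are assumed to use the "none" state.
BINDINGS = {
    ("none", "one_finger_pan"): "axis-constrained translation",
    ("bottom", "one_finger_pan"): "axis-constrained scaling",
    ("none", "pinch"): "uniform scaling",
    ("none", "two_finger_pan"): "axis-constrained rotation",
    ("top", "one_finger_pan"): "camera rotation",
    ("middle", "one_finger_pan"): "object selection",
}


def classify_touch(y, zone_height):
    """Map a vertical position in the TouchZone (0 = top edge) to an area.

    Equal thirds are an assumption; the text only says "roughly".
    """
    if not 0 <= y <= zone_height:
        raise ValueError("touch lies outside the TouchZone")
    third = zone_height / 3
    if y < third:
        return "top"
    if y < 2 * third:
        return "middle"
    return "bottom"


def all_combinations():
    """Enumerate every (touch state, movement pattern) pair."""
    return list(product(TOUCH_STATES, MOVEMENTS))


def lookup(touch_state, movement):
    """Return the bound operation, or None if the pair is unassigned."""
    return BINDINGS.get((touch_state, movement))
```

Under these assumptions, `len(all_combinations())` is 12, of which six pairs carry an operation described in the text.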
To control the axis-constrained transformations using a single-finger pan gesture, our system automatically compares its movement direction with 2D lines, which represent the projections of candidate axes on the view plane, and selects the one with the smallest angular difference as the candidate axis. The magnitude of the transformation is calculated from the displacement. Specifically, axis-constrained translation is achieved by a gesture consisting of a single-finger pan along the desired axis (see Fig. 2(a)). Axis-constrained scaling is performed by a two-handed gesture that combines a single-finger touch on the bottom of the TouchZone and a single-finger pan along the desired axis (see Fig. 2(b)). Uniform scaling is triggered using a direction-independent two-finger pinch gesture (see Fig. 2(c)). Finally, we employ the same design as Ref. [18] to perform axis-constrained rotation using a two-finger pan perpendicular to the desired axis (see Fig. 2(d)).
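The axis selection step for a single-finger pan can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the projected axes are treated as undirected 2D lines, and the transformation magnitude is taken to be the signed length of the pan projected onto the chosen line, which is one plausible reading of "calculated from the displacement".

```python
import math


def line_angle(direction, axis_2d):
    """Unsigned angle in [0, pi/2] between a direction and an undirected 2D line."""
    norm = math.hypot(*direction) * math.hypot(*axis_2d)
    if norm == 0.0:
        raise ValueError("zero-length vector")
    cos_abs = abs(direction[0] * axis_2d[0] + direction[1] * axis_2d[1]) / norm
    return math.acos(min(1.0, cos_abs))


def select_axis(pan, projected_axes, eps=1e-6):
    """Pick the projected axis closest in angle to the pan direction.

    pan:            (dx, dy) finger displacement in screen units.
    projected_axes: dict mapping an axis name to its 2D projection.
    Returns (axis name, signed displacement along that axis).
    """
    # Axes that project to (almost) a point cannot be chosen by direction.
    usable = {n: v for n, v in projected_axes.items() if math.hypot(*v) > eps}
    if not usable:
        raise ValueError("no axis has a usable projection")
    name = min(usable, key=lambda n: line_angle(pan, usable[n]))
    ax, ay = usable[name]
    # Assumed magnitude: signed length of the pan along the chosen line.
    signed = (pan[0] * ax + pan[1] * ay) / math.hypot(ax, ay)
    return name, signed
```

For example, with projections `{"x": (1, 0), "y": (0, 1), "z": (-0.7, -0.7)}`, a pan of `(10, 1)` selects the x axis, while a pan of `(-5, -4.5)` selects the z axis.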

Gyroscope: free-form rotation
In augmented reality applications, the advantage of using the mobile device's sensors for interaction with 3D content is that the 3D scene can be viewed and controlled intuitively by rotating and moving the physical device. This inspires us to base the free-form rotation control on physical smartphones in such a way that users can freely control object orientation simply by rotating the phone by hand. This can be done by associating the object orientation with the reference orientation data from the built-in gyroscope sensor (see Fig. 3).
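One way to associate the object orientation with the device orientation is to apply the device's rotation since the start of the gesture to the object's starting orientation. The sketch below assumes that the phone reports its orientation as a unit quaternion (w, x, y, z) and that the device and scene frames are aligned; the class and function names are hypothetical, and the paper does not specify this formulation.

```python
import math


def q_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def q_conj(q):
    """Conjugate, which equals the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def q_from_axis_angle(axis, angle):
    """Unit quaternion for a rotation of `angle` radians about `axis`."""
    n = math.sqrt(sum(c * c for c in axis))
    s = math.sin(angle / 2) / n
    return (math.cos(angle / 2), axis[0] * s, axis[1] * s, axis[2] * s)


class FreeFormRotation:
    """Applies the device's rotation since gesture start to the object."""

    def __init__(self, object_start, device_start):
        self.object_start = object_start
        self.device_start_inv = q_conj(device_start)

    def update(self, device_now):
        # Rotation taking the device from its starting pose to its current pose.
        delta = q_mul(device_now, self.device_start_inv)
        # Apply the same rotation to the object's starting orientation.
        return q_mul(delta, self.object_start)
```

Recording the device orientation at gesture start means the object does not jump when the user begins rotating, whatever pose the phone is held in.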

Camera control and object selection
Camera control is a basic but essential operation for users to see 3D objects from different viewpoints. To offer a basic control mechanism, we constrain the camera view to always be centered on the centroid of the manipulated object, while users can rotate the camera using a two-handed gesture that combines a single-finger touch on top of the TouchZone and a single-finger pan (see Fig. 4(a)).
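A camera that stays centered on the object centroid can be modelled as an orbit camera parameterized by yaw, pitch, and radius. The following sketch is an assumption about how such a constraint might be realized, not a description of the authors' code; the class name, the sensitivity value, the pitch limit, and the drag sign conventions are all illustrative.

```python
import math


class OrbitCamera:
    """Camera constrained to look at a fixed target from a fixed distance."""

    def __init__(self, target, radius, yaw=0.0, pitch=0.0):
        self.target = target
        self.radius = radius
        self.yaw = yaw      # rotation about the vertical (y) axis, radians
        self.pitch = pitch  # elevation above the horizontal plane, radians

    def rotate(self, dx, dy, sensitivity=0.01):
        """Turn a single-finger pan (dx, dy) into an orbit around the target."""
        self.yaw += dx * sensitivity
        # Keep the camera away from the poles, where the view basis degenerates.
        limit = math.radians(89)
        self.pitch = max(-limit, min(limit, self.pitch + dy * sensitivity))

    def position(self):
        """Camera position on the sphere of the given radius around the target."""
        cp = math.cos(self.pitch)
        tx, ty, tz = self.target
        return (
            tx + self.radius * cp * math.sin(self.yaw),
            ty + self.radius * math.sin(self.pitch),
            tz + self.radius * cp * math.cos(self.yaw),
        )
```

Because only the two angles change, the camera always remains at the same distance from the centroid and keeps the object in the center of the view.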
Another frequently used operation is to select one or more objects in the virtual environment. We use a two-handed gesture similar to the one used for camera control, except that the single-finger touch is now on the middle of the TouchZone, while the single-finger pan gesture moves a virtual cursor in 2D (see Fig. 4(b)). The system automatically highlights each object pointed to by the virtual cursor; users can confirm the selection by releasing the touch finger on the TouchZone.

Fig. 3 Free-form rotation using a reference rotation inferred from the gyroscope sensor.

Ambiguity
Determination of a unique candidate axis is essential for valid axis-constrained manipulation. In our system, this task is done by comparing the orientation of 2D lines, the projections of candidate axes on the view plane, with either the direction of a single-finger pan gesture (for translation or non-uniform scaling) or the orientation defined by two touch points (for rotation). When 2D lines are nearly collinear in the view plane, the system cannot effectively compute a unique candidate axis, and we call such a scenario an ambiguous manipulation (see Fig. 5(a)). Although one can resolve the ambiguity by altering the camera view, this extra effort will typically slow down the manipulation task. We thus provide a simple yet effective system to automatically resolve the ambiguity problem. The key insight lies in formulating an optimization problem that finds a new camera view which is ambiguity-free and does not deviate too much from the original view.

Dynamic camera adjustment
Given a camera view C_i, we denote the projection of the candidate axes on the view plane of C_i as a triple {X_i, Y_i, Z_i}. To detect the occurrence of ambiguity, we use a delta function δ(C_i) over the pairwise angles between the projected axes, where f_XY denotes the angle between the 2D vectors X and Y. We also measure how far the current camera view is from an ambiguous configuration. Since the camera view is constrained to be centered on the object, we model the distance between two camera views in terms of their positions, where p_i is the 3D position of C_i. Overall, if δ(C_i) = 0, our goal is to find a new C_i that minimizes the objective function of Eq. (1). With proper normalization of the first two terms in Eq. (1), we find a local minimum by randomly sampling around the current camera view (15 samples) and taking the one with the best value.
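The sampling strategy can be illustrated with the sketch below. It is a stand-in under stated assumptions rather than the paper's formulation: the ambiguity test (smallest pairwise angle between projected axes below a threshold), the cost (normalized camera displacement minus normalized smallest angle), the threshold, the weight, the sampling range, and all function names are hypothetical. Only the use of 15 random samples around the current view and the object-centered camera come from the text.

```python
import math
import random


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(a):
    n = math.sqrt(_dot(a, a))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (a[0] / n, a[1] / n, a[2] / n)


def project_axes(cam_pos, target, axes, world_up=(0.0, 1.0, 0.0)):
    """Project 3D axis directions onto the view plane of a look-at camera."""
    forward = _normalize(_sub(target, cam_pos))
    right = _normalize(_cross(forward, world_up))
    up = _cross(right, forward)
    return [(_dot(a, right), _dot(a, up)) for a in axes]


def min_pair_angle(lines, eps=1e-6):
    """Smallest angle between any two undirected 2D lines, in [0, pi/2]."""
    best = math.pi / 2
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            a, b = lines[i], lines[j]
            na, nb = math.hypot(*a), math.hypot(*b)
            if na < eps or nb < eps:
                return 0.0  # an axis pointing at the viewer is unusable
            c = min(1.0, abs(a[0] * b[0] + a[1] * b[1]) / (na * nb))
            best = min(best, math.acos(c))
    return best


def adjust_camera(cam_pos, target, axes, threshold=math.radians(15),
                  samples=15, max_step=math.radians(30), weight=0.5, rng=None):
    """Return a nearby ambiguity-free camera position, or cam_pos if none is needed or found."""
    rng = rng or random.Random()
    if min_pair_angle(project_axes(cam_pos, target, axes)) >= threshold:
        return cam_pos  # current view is not ambiguous
    offset = _sub(cam_pos, target)
    radius = math.sqrt(_dot(offset, offset))
    yaw = math.atan2(offset[0], offset[2])
    pitch = math.asin(offset[1] / radius)
    limit = math.radians(80)
    best_pos, best_cost = cam_pos, float("inf")
    for _ in range(samples):
        y = yaw + rng.uniform(-max_step, max_step)
        p = max(-limit, min(limit, pitch + rng.uniform(-max_step, max_step)))
        cand = (target[0] + radius * math.cos(p) * math.sin(y),
                target[1] + radius * math.sin(p),
                target[2] + radius * math.cos(p) * math.cos(y))
        angle = min_pair_angle(project_axes(cand, target, axes))
        if angle < threshold:
            continue  # candidate view is still ambiguous
        move = _sub(cand, cam_pos)
        # Assumed cost: small camera displacement, large axis separation.
        cost = (weight * math.sqrt(_dot(move, move)) / (2 * radius)
                - (1 - weight) * angle / (math.pi / 2))
        if cost < best_cost:
            best_pos, best_cost = cand, cost
    return best_pos
```

Sampling on the sphere around the target keeps the camera centered on the object, and rejecting candidates that are still ambiguous means the returned view is either ambiguity-free or the unchanged original.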

Evaluation
Three experimental studies were conducted to evaluate the performance of EZ-Manipulator. In the first experiment, we compared our interface with a conventional keyboard-mouse interface by measuring the completion time for a series of 3D manipulation tasks executed by the participants.
In the second experiment, we evaluated the effectiveness of our proposed dynamic camera adjustment for ambiguous manipulations with a set of tailor-made 3D manipulation tasks.
Finally, in the third experiment, we collected user experiences from both novices and experts to evaluate the usability of our system.
We recruited thirteen novice users (11 males and 2 females, mean age 23), who participated in all experiments. A further five experts (1 male and 4 females, mean age 25) participated in the third experiment. No novice user had prior experience of 3D manipulation, while all experts had at least three years of experience in 3D modeling packages such as 3DS Max or Maya. All the participants were well-trained smartphone users.

3D object manipulation
In Experiment I, each participant was asked to accomplish 12 tasks of adjusting the pose of a 3D object to a specified configuration using both a keyboard-mouse interface and EZ-Manipulator. The total task completion time for each participant was recorded. Note that each re-posing task was specifically designed so that participants would have to perform translation, rotation, and scaling operations to finish the task. Each participant was given 5 training tasks using both interfaces before starting the assessed tasks. Results are shown in Fig. 6(a). They indicate that our interface is much more efficient than a conventional keyboard-mouse interface, with a significant reduction in task completion time.

Ambiguous manipulation
Experiment II was conducted to evaluate the effectiveness of our dynamic camera adjustment system in helping users to resolve ambiguous manipulations. We prepared a set of 10 object re-posing tasks similar to those in the previous experiment. Each task involved only translation and was designed to guarantee more than one occurrence of ambiguity while finishing the task. Each participant was asked to accomplish the 10 tasks using our interface with and without the assistance of dynamic camera adjustment, and we recorded the total completion time for each participant. We can see from Fig. 6(b) that the dynamic camera did indeed help reduce the total task completion time for 3D object manipulations. Our observations showed that participants typically spent much time adjusting camera views when dynamic camera adjustment was unavailable.

Usability
Experiment III comprised a set of questionnaires given to each participant after the two previous user studies, to evaluate the usability of EZ-Manipulator for both experts and novices. The mean response of the 17 participants and associated standard deviation bars can be found in Fig. 7. Responses from novices were in general higher than those from experts, especially for the opinion "a dynamic camera is helpful". The responses from the experts indicate that they can readily resolve the ambiguity using a combination of keyboard and mouse input, and that for them the dynamic camera system is unnecessary. A further common opinion voiced by participants is that the axis-constrained rotation could not achieve the demanded precision using a two-finger gesture. Overall, however, both experts and novices agree that EZ-Manipulator is a fast, easy, and intuitive 3D manipulation interface that is particularly suitable for fast prototyping.

Applications
We further used EZ-Manipulator in three different applications to illustrate its potential (see Fig. 8). The first application demonstrates how the user can quickly re-arrange the furniture in an interior design using our system. The second application shows the advantage of using our system to support remote interaction and multi-user collaboration (two users collaborate to re-arrange furniture) on a wall-sized display. The third application implements a multi-player Jenga game in which the user can easily pick a piece and remove it using the axis-constrained manipulations offered by EZ-Manipulator. We refer the reader to the supplementary video in the ESM for demonstrations.

Limitations
EZ-Manipulator has several limitations. (i) Because manipulation is performed via the multi-touch interface on a small-screen device, accurate manipulation requires tiny finger movements, which are difficult and time-consuming. (ii) During multi-user collaboration, camera control can cause problems because all users share a single camera on the workspace. If one user controls the camera, it will immediately impact the other users. To a certain extent, our dynamic camera also has this problem. Therefore, we do not permit dynamic camera operations when multiple users are manipulating objects in a scene.

Fig. 7 User ratings on a Likert scale from 1 (strongly disagree) to 5 (strongly agree) for each user study.

Conclusions and future work
In this paper, we have proposed EZ-Manipulator, a new 3D manipulation interface using smartphones that provides 9 DOF axis-constrained manipulation and free-form rotation. To overcome ambiguous manipulation problems in our interface, we have implemented a dynamic camera, which can effectively resolve the ambiguity. According to user feedback, our system provides a positive user experience for both novice users and experts. Results of our user study indicate that our interface is faster than a keyboard-mouse interface for novice users. In the future, we hope to enrich and improve our interface according to the feedback from experts, while keeping the advantages of the current interface. Moreover, we hope to investigate further applications based on EZ-Manipulator.

Fig. 4 Multi-touch gestures for (a) camera control and (b) object selection.

Fig. 5 (a) Examples of ambiguous configurations. (b) The ambiguity on the right in (a) is resolved using dynamic camera adjustment.

Fig. 6 Results for (a) Experiment I and (b) Experiment II.

Fig. 8 Applications using EZ-Manipulator. Left: rapid prototyping of interior designs. Centre: multiple user collaboration and interaction using a wall-sized display. Right: multi-player gaming.