DOI: 10.1145/3613904.3642511

CamTroller: An Auxiliary Tool for Controlling Your Avatar in PC Games Using Natural Motion Mapping

Published: 11 May 2024

Abstract

Natural motion mapping enhances the gaming experience by reducing cognitive burden and increasing immersion. However, many players still use the keyboard and mouse in recent commercial PC games. To resolve the conflict between complex avatar motion and the limited interaction system, we introduce CamTroller, an auxiliary tool for commercial one-to-one avatar mapping PC games following the concept of a NUI (natural user interface). To validate this concept, we selected PUBG as the application scenario and developed a proof-of-concept system that helps players achieve a better experience by naturally mapping selected human motions to in-game avatars through an RGB webcam. In a within-subject study, 18 non-professional players practiced the common operation scheme (Basic), the professional players' operation scheme (Pro), and CamTroller. Results showed that performance with CamTroller was as good as with Pro and significantly higher than with Basic. The subjective evaluation also showed that CamTroller achieved significantly higher intuitiveness than Basic and Pro.


1 INTRODUCTION

A more intuitive interaction method usually yields higher performance and a better experience. This is why graphical user interfaces prevailed over command lines and touchscreen devices triumphed over keypad phones. Creating an interaction system that is more natural for people to use could fundamentally change how people interact with technology, an idea at the heart of the theory of the NUI [60]. A NUI allows users to interact through modalities like touch, gesture, and voice, making interaction easier, more fun, and more natural, since it draws on more basic user skills than traditional interaction does [33].

Figure 1: Natural mapping for different motions.

According to the Enhanced Framework for Intuitive Interaction (EFII) proposed by Alethea Blackler, a higher extent of natural mapping leads to higher intuitiveness [3]. According to Skalski et al., there are four types of natural mapping with increasing naturalness: directional natural mapping (first level), kinesic natural mapping (second level), incomplete tangible natural mapping (third level), and realistic tangible natural mapping (highest level) [46]. The more naturally players' actions map to the avatar's, the lower the learning cost and the greater the advantage in intuitiveness. Using a naturally-mapped controller, users can perform tasks while relying less on their memory [30]. This type of controller can also improve the sense of presence, interactivity, realism, and enjoyment, as well as enhance performance and accuracy in a game [1, 44].

Motion control, an intuitive approach based on the principle of natural mapping and the concept of NUI that lets players control avatars with real-life actions rather than a traditional game controller or a mouse-and-keyboard combination, has attracted great attention since the launch of the revolutionary Nintendo Wii in 2006 [23]. Such motion controllers have high intuitiveness since they offer a more familiar and tangible interaction for the user [52]. Motion control also provides a more natural way to interact with game systems than pressing keys or rotating joysticks, as shown by Skalski et al. [46]. Rachel L. Franz et al. further found that players may prefer motion control techniques over button-and-joystick input, even when those techniques perform worse and impose higher perceived workloads, as long as the techniques are enjoyable and make them feel present [9].

Although motion controllers substantially enhance user experience, many players still use traditional input methods in the most popular games of recent years. There are two main reasons: 1) the usage of motion controllers is largely limited to motion-controlled console games, because traditional games are not specifically designed to support motion control; even when developers have tried to port motion controllers to traditional games, fundamental differences in game design leave the controller poorly adapted. 2) The physical fatigue of performing actions like standing, walking, holding objects, or dancing as input quickly tires players and stops them from playing for long sessions, such as more than one hour. Even with higher intuitiveness and better gaming performance, the discomfort of holding a physical motion controller without support (like a gun-shaped controller) for a long time can strongly deteriorate the user experience [45].

Even the leader in motion control games, Nintendo, has shifted the role of motion from primary to auxiliary in its most recently released titles, such as Super Mario Odyssey [38] and The Legend of Zelda: Breath of the Wild [39]. In these games, motion input is used for supplementary actions rather than essential ones, or as an alternative to buttons and joysticks. Only sports games aiming more at fitness than entertainment still use motion as the main input method for vital operations. The arm fatigue caused by holding the hands in the air also occurs in Virtual Reality and greatly affects the user experience [17, 21, 22]. Hence, motion control is unsuitable as the primary input method for long play sessions, since fatigue and tiredness impede the gaming experience, while incorporating it as an auxiliary input is a more appropriate application.

Although Nintendo adopts motion control as an auxiliary input method to improve the game experience with less fatigue, these advantages remain confined to the narrow scenario of the Nintendo platform and have not reached most commercial PC games, which contribute more than one-fifth of the whole gaming market [32]. This paper introduces CamTroller, a new systematic design concept that combines webcam-based motion tracking with a mouse/keystroke emulator, based on the concepts of NUI and natural mapping, for typical closed-source computer games with complicated avatar actions.

1.1 Limitations in Current Commercial PC Games

Video games hold immense significance in the gaming market, providing immersive and interactive experiences that foster social connectivity and artistic expression. Upon analyzing the distribution of video game sales by genre, it becomes evident that action games dominate the market with a significant share of 26.9%, followed by shooting games at 20.9% and role-playing games at 11.3% [59].

In order to bring more naturalness into the interaction systems of current commercial PC games, we focus on PC games in which one player controls exactly one role. This one-player-to-one-avatar mapping is the most suitable and direct situation for natural mapping: what the player does can map directly to what the avatar does. By finding the most comfortable and natural way for players to act, we explore how technology can adapt to player needs and preferences through a NUI [13].

In the early stage, role-playing shooting games on PC like Quake [18], with a limited set of fundamental actions mapped to "WASD", were easy to learn, and the fingers were adequate for control; there was no demand for extra input methods such as motion control. However, newly released shooting games, as well as other role-playing and action games, aim to create a more realistic experience by adding many actions, and this richer action control pleases players and wins success in the market. A strong conflict therefore exists between complicated actions and a limited interaction system in which people can control the avatar only through the mouse and keyboard. This conflict bars general players from enjoying the free combination of different actions because of the associated memory burden, physical finger shortage, learning cost, and health problems of keyboard-and-mouse use.

1.1.1 Memory burden.

It is a well-accepted design to use WASD for directional movement while the mouse moves the field of view and attacks, and it has become a general rule across genres of 3D games in which the player controls an avatar, such as action games, role-playing games, and shooter games [5, 15, 53]. These "WASD" motions are directionally mapped [46] and easy to learn and perform. Although this is the lowest level of naturalness (first level) in natural mapping, players still benefit from its intuitiveness. The key layout also fits the natural structure of the human hand from an ergonomic perspective [14], bringing a comfortable operating experience. However, newly released games aim to create a more realistic experience by adding many actions, such as peeking, crouching, and boosting with medical items. With more actions and features added, game complexity increases and raises the memory burden of keybindings: some actions are assigned to keys without any natural mapping, making them difficult to remember, and players must also remember which finger goes to which key to minimize the chance of pressing a wrong one.

1.1.2 Finger shortage.

Additionally, players using the mouse and keyboard often face 'finger shortage', the difficulty of pressing two keys simultaneously with a single finger. When using the WASD layout for directional movement, the left hand is typically placed on the keyboard with the index, middle, and ring fingers on 'D', 'W', and 'A'. This placement makes the pinkie and thumb naturally rest on 'Shift' and 'Spacebar', which are usually assigned to frequently performed actions like rushing and jumping. Keys around WASD are assigned to other actions and are mainly pressed by the index and ring fingers, which is the root cause of the 'finger shortage' problem. If a player needs to press 'D' with the index finger to move right but also needs to press another key such as 'E', 'R', or 'F' to perform an action, the index finger cannot make the second key press because it is already occupied. The same holds when the player uses the ring finger to press 'A' to move left and then needs to press 'Q' or 'Tab' for another action. Some pro players overcome this problem by borrowing an 'idle' finger, like the middle finger or thumb, but this requires twisting the hand into an unnatural posture and brings discomfort, while a normal player simply uses the ring finger to press either 'A' for moving left or another key for another avatar action, but not both. In other words, pro players adopt more complex finger actions to let their avatars perform multiple actions simultaneously and thereby enhance their performance.

1.1.3 Learning cost.

The more there is to memorize, the more effort learning takes. As mentioned, two things need to be learned: which key is assigned to which avatar action, and which finger should be assigned to which key. All players familiar with a given game have mastered the first. Coping with the 'finger shortage' created by the increased fun and complexity of games is the second, and it is what distinguishes pro players from normal players: professional players invest more effort in difficult finger actions to perform better. In our user study, we investigated the pro operation that professional players perform and compared it with our newly developed natural mapping system and the general player's basic operation.

1.1.4 Examples in one-to-one avatar mapping PC games.

To understand the influence of 'finger shortage' and the related limits on the gaming experience across the large field of commercial PC games, this section provides specific examples of one-to-one avatar mapping PC games. In hardcore shooting games, the player controls a shooter across different maps, a typical one-to-one avatar mapping setting usually found on PC platforms, as in Arma 3 [19], Insurgency [20], Escape from Tarkov [11], and PUBG [49]. All of them share the typical pain points of the gaming experience: 'finger shortage', complex actions assigned to many different keys, limited cognitive capacity, and a lack of degrees of freedom in hand control. PUBG, a representative hardcore shooting game, assigns left and right peeking to 'Q' and 'E', which means peeking conflicts with moving in the same direction. Crouching likewise conflicts with moving right, since it is assigned to 'C', which cannot be pressed together with 'D' under normal operation.

Beyond shooting games, many recent action and role-playing PC games face a similar issue. In the well-known action-adventure game Red Dead Redemption 2 [12], 'Q' is used to enter/exit cover, so this action conflicts with moving left with 'A', making it impossible to repeatedly peek out of cover to shoot while moving left. In the famous action-adventure game Cyberpunk 2077 [41], the key for using consumables to heal is 'X', which is difficult to press while moving right with 'D'. A similar problem arises in Elden Ring [10], a masterpiece of action role-playing games, where healing items are used with 'R' while moving right with 'D'. Remapping these frequently performed actions to the side buttons of the mouse can alleviate the problem to a great extent, but side buttons are few and usually already occupied by other high-frequency actions.

1.1.5 Related health issues.

Moreover, playing computer games with a keyboard and mouse for long periods may also cause health issues. Multiple studies point out that long-term keyboard and mouse use can lead to discomfort, pain [7], and even orthopedic injuries [51] and tendinopathies [31]. A review found a negative impact of video game playtime on the musculoskeletal system in about 92% of studies [50]. Fine motor strain from repetitive actions is considered one cause of these musculoskeletal hazards; another is sitting for extended periods, often with poor posture. Hence, Ahmed K. Emara et al. strongly recommend frequent stretching and flexion, extension, and rotation of the cervical, thoracic, and lumbar spine [6]. It would therefore be helpful for players to complement the keyboard and mouse with a motion controller that adds neck, arm, and body movement during long gaming sessions.

1.2 Why CamTroller?

The memory burden of keybindings, the difficulty of forming muscle memory for 'touch typing' in games, and the 'finger shortage' problem result in high entry difficulty and steep learning curves, significantly affecting the gaming experience of beginners and ordinary players. These obstacles show that current commercial one-to-one avatar mapping PC games lack naturalness in their interaction systems. This is where the NUI should be brought in to overcome these drawbacks and improve intuitiveness for a large group of players. To reduce these limitations, we developed the concept of CamTroller, an auxiliary tool that adds natural mapping control as supplemental input to closed-source commercial games. Letting players trigger some avatar actions by performing the same or similar movements with their own bodies helps those who have not memorized the keybindings. Naturally mapping players' motion to the avatar also lets them act without looking down at the keyboard to find the keys ('touch typing') when fighting enemies in games, avoiding the risk of being defeated for losing sight of the screen. Moreover, motion control can turn body parts other than the hands, such as the head, torso, and feet, into 'extra fingers' for input, addressing 'finger shortage' in an intuitive way.

Based on a detailed investigation of game players, this work introduces CamTroller, an auxiliary tool following the NUI concept that brings natural mapping control to current one-to-one avatar mapping PC games to overcome the aforementioned problems. The work also validates this concept by building a system for one specific game through the combination of a motion-tracking solution and a mouse/keystroke emulator.

For the structure of this work, we first illustrate the advantages of NUI and natural mapping control, investigate the gaming experience limitations of commercial one-to-one avatar mapping PC games, and introduce the NUI concept of CamTroller to enhance player experience by decreasing memory burden, eliminating finger shortage, reducing learning cost, and allowing body movement to avoid the muscle fatigue of long keyboard-and-mouse use. We then review existing work on motion controllers and compare different technical routes in the related works section. A motion-tracking approach based on a commodity RGB camera is selected since it has the lowest hardware requirements and requires nothing to be worn on the body. We chose PUBG as the testing platform for verifying our concept, since the game is phenomenally popular and has a tremendous number of players. In the concept design part, we surveyed players' habits to confirm their needs and then analyzed which actions should be mapped and how they can be mapped naturally. Next, the technical implementation section covers everything from 3D position and rotation measurement, with an accuracy test, to the detailed posture calculation and triggering criteria of each selected motion. All technical algorithms of CamTroller were first confirmed by a feasibility test and then evaluated in a user study with objective performance measurement and subjective intuitiveness assessment. Finally, we discuss the results and conclude with current limitations and potential extensions.


2 RELATED WORKS

Motion controllers can be physical or non-physical. Since the motion controller in this project acts as an auxiliary input device, it must not occupy the hands; hence, related work on wearable motion controllers is investigated first.

2.1 Wearable Motion Controller

Many researchers and developers have attempted to develop small-size wearable motion controllers that can be categorized into passive and active tracking devices.

2.1.1 Passive Tracking Device.

In some research, the infrared camera on the Wii remote is used to track the infrared marker, which can be mounted to the goggles, earphones, or head straps, and then the data is used to calculate the head orientation and position [2, 25, 54]. The Wii remote is also applied for hand and gesture tracking [27, 58]. A commercial product under this principle, named TrackIR, comprises two parts: an infrared camera and a TrackClip containing retro-reflective markers [36]. This product tracks only the head and mainly targets the driving simulator player. The requirement for an infrared camera and special markers to be put on the player’s body makes the cost relatively high and inconvenient to use.

2.1.2 Active Tracking Device.

Active tracking devices embed an Inertial Measurement Unit (IMU) in a controller attached to the human body, sensing the player's motion as input signals instead of capturing input from joysticks or buttons. IMUs are widely used for tracking, from large headgear like VR/AR headsets [26] to small devices like true wireless earphones [42]. Beyond head tracking, researchers have also developed motion-tracking systems for the upper body using low-cost inertial sensors [24, 62, 63].

TrackIMU, a small commercial wearable head-tracking motion controller released in 2016, uses a 9-DOF IMU to track the player's pitch, yaw, and roll and drive the avatar's view orientation in games, achieving functions similar to those of a console controller [34]. However, it is designed specifically for driving and flight simulator games and is unsuitable for other genres.

Tilted, a more versatile product created by Akila Zhang, was released in 2018 on Kickstarter [43]. It detects head movements to simulate keyboard keystrokes or mouse movement, eliminating the need for native game support, and can drive the avatar to perform motions similar to the player's. Multiple Tilted units can also be attached to different parts of the body for composed motions. However, the product was withdrawn from the market even though it was much more affordable than passive tracking devices like TrackIR. The reason could be that a physically attached device is still too unnatural and inconvenient. Therefore, a controller-less motion-control system should be considered for this study.

2.2 Non-physical Motion Controller

Non-physical motion-sensing devices, such as the Kinect, use a depth sensor and an RGB camera to detect players' actions, allowing games to be played directly with body motion. Much research has studied how the Kinect enhances the game experience [8, 35, 40]. However, the extra cost and complex hardware of the Kinect and of other mature commercial products, like TrackHat, prevent wider adoption by players [61].

Motion-sensing applications that need only a commodity camera can spread easily for economic reasons. Wang et al. used a commodity webcam to track the face position, effectively improving role-playing and presence [57]. However, only 2D information about the center of the face could be retrieved, which made the head orientation angle inaccurate; their face recognition also ran below 10 FPS, which is insufficient for gaming; and the motion control worked only in a specific prototype game, constraining versatility. The same problems appear in the research of Sko Torben et al. [47, 48]: their approach is only executable for a self-developed game or a game with an adjustable engine, and stability suffered when the scenario moved from a clean lab background to a home environment with more interfering objects [47]. Moreover, the FaceAPI used in that research is a commercial product that is difficult to access. In this study, only open-source toolkits are considered for posture recognition.


3 CONCEPTUAL DESIGN

The key concept of CamTroller is to drive the in-game avatar by detecting the player's similar motions with an RGB camera and free posture recognition toolkits. To cover a large target group, the popular computer game PUBG is selected as the platform for CamTroller in this study. Before digging into the technical algorithms, the initial question is which in-game actions should be mapped by CamTroller. To answer it, we analyzed the detailed interaction system of keyboard-and-mouse shooting games and surveyed our focus group.

3.1 Player Survey

As PUBG was selected to validate the concept of CamTroller, we first conducted an online survey. The questionnaire was launched on an online platform (Sojump) and spread mainly among college student players of PUBG. It contains 28 multiple-choice questions in total, covering: 1. players' gaming habits regarding directional movement, peeking, and item consumption; 2. why they assign their fingers in a certain way and whether they have explored new ways; and 3. their experience with different approaches to item consumption, to identify possible pain points. A participant answers at most 18 questions, since later questions depend on the options selected earlier. The details of each question and the dependencies can be viewed in the appendix.

There were 190 responses in total, of which 122 were valid; 68 responses from non-players were rejected at the start.

For performing 'WASD', the results show that 92% and 87% of players used the middle finger to move forward and backward, respectively. 88% of them use the ring finger to move left and the index finger to move right. None of them remapped 'WASD' to custom keys.

For peeking, the responses show that 11% of players never use the peeking operation. Among players who peek, 80% use the ring finger for left peeking and the index finger for right peeking (basic action), while 8% and 12% use the middle finger for left and right peeking, respectively (pro action). 57% of players who peek never tried to move and peek simultaneously; the two main reasons are that they never thought about the operation (53%) or believe the same finger cannot press two keys simultaneously (42%). These results confirm our insight that most amateur players use the same fingers for left/right walking and left/right peeking, which leads to 'finger shortage', and that more than half of them never even tried to perform the two simultaneously.

For the approaches to taking boost items, 58% of players open the inventory, the basic approach taught in the initial PUBG tutorial, while 24% use the wheel menu and only 18% use the corresponding keys. It turned out that most PUBG players chose more cumbersome approaches to lighten the burden of memorizing keys. However, when asked how often they feel unable to use boost items while engaging enemies, half reported often being stuck on this problem; only 2% of players were never disturbed by it.

In conclusion, few players remap keys for specific actions. For peeking and moving, since most players use the same finger for both actions, they do not even consider performing them simultaneously. In other words, the physical lack of operational freedom (finger shortage) directly limits, at a cognitive level, their willingness to try new operation combinations. Hence, it is predictable that if CamTroller provides more degrees of freedom for action control, the way players interact with the gaming platform could change at a fundamental cognitive level. The survey also showed that boost-item taking is a pain point in the gaming experience that CamTroller could address.

3.2 Motion Selection and Analysis

The survey confirmed that there are issues the concept of CamTroller can address. However, the design details of how player motions map to specific avatar actions still need to be settled. In this section, we first select the avatar motions whose keyboard-and-mouse operation could be improved. Second, we extract detectable features from those avatar motions. Finally, we consider the basic posture of a computer game player and find player motions with detectable features similar to those of the avatar motions, to maintain a high level of intuitiveness.

3.2.1 Step1: Motion selection.

First, we determined which actions should remain on the keyboard and which avatar actions CamTroller should handle to benefit the user experience in PUBG. One finger can be ready for only a very limited set of motions, and "WASD" is already a consensus in gaming, occupying the ring, middle, and index fingers. To address 'finger shortage', and in line with the survey results, peeking was chosen first. Because the avatar often needs to jump over objects and crouch to avoid headshots, crouching and jumping were also selected. Free-looking was selected as well, since it involves moving the mouse while holding the 'Alt' key, which can interfere with aiming once the key is released. Finally, when engaging enemies, most players feel they are too slow at using boost items, largely because more than half of them use the cumbersome approach of opening the inventory and clicking the corresponding item, which is difficult in a hurry. Hence, the most frequently used boost items, including energy drinks, painkiller pills, and adrenaline syringes, were added to the CamTroller mapping system.

3.2.2 Step2: Analysis features of avatar motion.

After determining the avatar actions to be mapped, the next step is analyzing the detectable features of each avatar motion, such as angles, distances, and positions. To find body features suitable for motion sensing, we first analyzed how humans perform these avatar motions in the physical world (Figure 2). Participants were asked to imitate five movements of the in-game avatar: peeking, crouching, and consuming three types of boost items. A neutral posture was also performed, acting as the 'reference zero'. To observe peeking and crouching, participants held a toy pistol to simulate the in-game scenario. For the boost item observation, vitamin pills simulated the painkillers, a can of cola simulated the energy drink, and a pen simulated the adrenaline syringe. In total, five participants took part in the observation.

Figure 2: Avatar motions in the physical world and corresponding detectable features illustration for (a) crouching, (b) neutral position, (c) peeking, (d) taking pills, (e) taking a drink, and (f) injecting drugs.

The observations indicated that peeking is usually performed by leaning the torso, accompanied by a lateral bend of the head, as shown in Figure 2 (c), while crouching (Figure 2 (a)) moves the whole body down; head orientation angle and position can therefore serve as detection parameters. For the boost items, using energy drinks, painkiller pills, and adrenaline syringes in the real world involves distinct gestures. The fingers flex at a large angle when holding an energy drink bottle, whereas taking pills keeps the fingers straight or only slightly curved; both motions bring the hand close to the mouth, so the hand-mouth distance becomes significantly smaller than normal, as shown in Figure 2 (d) and (e). An adrenaline injection, shown in Figure 2 (f), requires a hand gesture similar to drinking, with the fingers wrapped around the syringe, but the hand position differs, as it is applied to the heart. Hence, the finger flexion angle and the distance from hand to mouth can be used as detection parameters.

3.2.3 Step3: Mapping player motion to avatar.

We then considered how similar player actions could drive the avatar under the intuitiveness principles [3] in the game-playing situation. All motions are performed while players sit on a chair and use the keyboard and mouse for basic actions (WASD and aiming), which means they cannot literally crouch or peek like the avatar during play. Hence, player motions exhibiting the corresponding detectable features are adopted for the mapping, following the principle of being 'natural' or intuitive [60].

Ideally, the avatar should accurately mimic the player's movements in real time: when the player leans or lowers the head by a certain angle or distance, the avatar should perform the corresponding action. In the game, however, peeking and crouching are treated as state transitions, meaning the avatar switches to another state when the corresponding key is pressed; there is no half-peeking or half-crouching. Thus, a proper threshold is needed for triggering peeking and crouching. When the player bends the head laterally, left or right, beyond the threshold, the key for left or right peeking, typically 'Q' or 'E', is pressed. Similarly, when the player moves the head down or up past a specific distance, the avatar's crouch or jump is triggered.

As for using boost items in the game, the related key presses are triggered by hand gestures and positions. When the hand is close enough to the mouth, with the distance below the critical value, a flat hand triggers the key press for painkillers, whereas a flexed hand triggers drinking the energy drink. Curved fingers trigger the key press for adrenaline injection when the hand is close to the heart.

Figure 3: Illustration figures for (a) head motion, (b) coordinate in the normal condition, and (c) coordinate in MediaPipe.

When looking around without moving the torso (free-look), the change of view is mainly accomplished by axial rotation of the head. Free-looking in the game is a continuous process in which the avatar's view follows the head rotation. The rotation of the avatar's view is linearly related to the mouse movement distance, so while the player holds the free-look key, CamTroller tracks the head's axial rotation and flexion-extension angles, as shown in Figure 3 (a), and moves the view correspondingly by simulating mouse movement.


4 TECHNICAL IMPLEMENTATION

From the perspective of NUI, CamTroller should let novices trigger avatar actions quickly and let skilled practitioners do so seemingly effortlessly. This requires more than the simple idea of being 'natural' or intuitive [60] by taking an existing posture detection system not designed for this scenario and mapping its output to the avatar. We carefully considered the technical details of each detectable feature of each player motion, as well as the triggering mechanics of the closed-source commercial PUBG, to minimize system latency and enhance the user experience.

To estimate head posture, we need depth data for landmarks on the face mesh. Since only a general commercial RGB camera is considered, an algorithm or solution capable of estimating depth is essential. Many face recognition algorithms only process 2D images for face segmentation and landmark assignment and cannot estimate depth.

MediaPipe [16] is a free, open-source ML solution developed by Google, and its Face Mesh module can estimate depth for every mesh point on the face, which is why it was chosen for this research. MediaPipe provides landmark positions after processing the input image: the position data x and y are normalized to [0.0, 1.0] by the image's width and height, while the depth z uses the same scale as x and is measured relative to the center of the face rather than as an absolute distance from the camera. However, MediaPipe does not provide head orientation angles directly, so the angles must be derived from the 3D coordinates of the mesh points.
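For readers unfamiliar with the API, the following minimal Python sketch shows how normalized Face Mesh landmarks can be read from a webcam frame using the classic MediaPipe Solutions interface; treating landmark index 1 as the nose tip is our assumption for illustration.

```python
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)  # default webcam
with mp.solutions.face_mesh.FaceMesh(max_num_faces=1) as face_mesh:
    ok, frame = cap.read()
    if ok:
        # MediaPipe expects RGB input; OpenCV captures BGR.
        results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_face_landmarks:
            lm = results.multi_face_landmarks[0].landmark
            # x, y are normalized to [0, 1]; z shares x's scale and is
            # relative to the face center. Index 1 ~ nose tip (assumed).
            print(lm[1].x, lm[1].y, lm[1].z)
cap.release()
```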

As for hand gesture detection, MediaPipe also provides a 'Hands' solution that outputs coordinate data for the knuckles; its landmarks are shown in Figure 7 (a). The knuckle coordinates are used to estimate the gesture and to calculate the spatial relationships between fingers.

4.1 Head Position Estimation

The planar head position can be represented by the nose tip position \(x_{nose}\) and \(y_{nose}\). Since the sitting posture gradually drifts, the nose coordinates alone are unsuitable; a reference is needed, and here the shoulders are chosen. The positions of both shoulders can be retrieved with the MediaPipe 'Pose' solution; the left and right shoulder landmarks, 11 and 12, are shown in Figure 7 (b). From the left and right shoulder coordinates, \((x_{shoulder_L}, y_{shoulder_L})\) and \((x_{shoulder_R}, y_{shoulder_R})\), the midpoint \((x_{shoulder_M}, y_{shoulder_M})\) is computed. The head's relative position is then \(x_{s2n}=x_{nose}-x_{shoulder_M}\) and \(y_{s2n}=y_{nose}-y_{shoulder_M}\), and the vertical distance between nose and shoulders is \(L_{s2n}=|y_{s2n}|\).
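A minimal sketch of this computation, assuming the nose and shoulder landmarks have already been read from the Face Mesh and Pose solutions as normalized (x, y) pairs:

```python
def head_relative_position(nose, shoulder_l, shoulder_r):
    """Return (x_s2n, y_s2n, L_s2n) from normalized (x, y) landmark pairs."""
    x_mid = (shoulder_l[0] + shoulder_r[0]) / 2.0  # shoulder midpoint
    y_mid = (shoulder_l[1] + shoulder_r[1]) / 2.0
    x_s2n = nose[0] - x_mid
    y_s2n = nose[1] - y_mid
    return x_s2n, y_s2n, abs(y_s2n)  # L_s2n = |y_s2n|
```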

4.2 Head Orientation Estimation

Due to the complex structure of the cervical spine, head motions are deeply coupled, so the angles must be separated to obtain a straightforward estimation method. Flexion-extension mainly depends on the joint between C0 (the occipital bone, the bone of the head connecting to the cervical spine) and C1 (the atlas, the first cervical vertebra); the C1-C2 (atlas-axis) joint, between the first and second vertebrae, contributes most of the axial rotation; and lateral bending relies more on the inferior vertebrae [4]. We may therefore simplify head motion as occurring in the sequence of lateral bending, axial rotation, and flexion-extension.

Since MediaPipe generates data in a coordinate system different from the one generally used, we first introduce the angle calculation in a general coordinate system and then map MediaPipe's coordinates onto it.

4.2.1 Angles Calculation Under General Coordinate.

In a general coordinate system for head pose, the center of the head is the origin of a right-handed coordinate system whose X-Axis points in the left-hand direction, and the facing direction of the head is described by Euler angles in the ZYX sequence (Figure 3 (b)).

Figure 4: Face image with MediaPipe Face Mesh and indication for points for head angles calculation.

Lateral bending is related only to the planar coordinates x and y. Two mesh points symmetric about the mid-plane of the head (the Left and Right points in Figure 4) suffice to calculate the lateral bending angle; the left point's coordinates carry the subscript L, and likewise for the right. The lateral bending angle is: (1) \(\theta _{LB}=\arctan \left(\frac{y_L-y_R}{x_L-x_R}\right)\)

Axial rotation depends not only on the coordinates x and y but also on the depth z. Two symmetric mesh points are still adequate. Accounting for the lateral bending, the equation is: (2) \(\theta _{AR}=\arctan \left(\frac{z_R-z_L}{\left|\left(x_R-x_L\right)/\cos \left(\theta _{LB}\right)\right|}\right)\)

Flexion-extension is determined from two mesh points located at the upper and lower parts of the head. Accounting for the previous two angles, it is calculated by: (3) \(\theta _{FE}=\arctan \left(\frac{\left(z_{Upper}-z_{Lower}\right)/\cos \left(\theta _{AR}\right)}{\left|\left(y_{Upper}-y_{Lower}\right)/\cos \left(\theta _{LB}\right)\right|}\right)\)

4.2.2 Angles Calculation Under MediaPipe Coordinate.

In MediaPipe's coordinate system, the directions of the Y-Axis and Z-Axis are reversed, and the origin is placed at the top-left corner of the image. Since the equations involve differences of mesh point coordinates rather than absolute positions, converting only the difference terms for y and z is sufficient. The MediaPipe coordinate system is illustrated in Figure 3 (c).

Meanwhile, the landmark position data x and y are normalized to [0, 1] by the image's width and height, and the depth z has a similar scale to x. To obtain correct angles, the position data x, y, and z must be scaled back to the actual scale with zoom factors kx, ky, and kz, where kx = kz and kx/ky = CameraAspectRatio. Equations (1)-(3) then become: (4) \(\theta _{LB}=\arctan \left(\frac{-k_y\left(y_L-y_R\right)}{k_x\left(x_L-x_R\right)}\right)\) (5) \(\theta _{AR}=\arctan \left(\frac{-k_z\left(z_R-z_L\right)}{\left|k_x\left(x_R-x_L\right)/\cos \left(\theta _{LB}\right)\right|}\right)\) (6) \(\theta _{FE}=\arctan \left(\frac{-k_z\left(z_{Upper}-z_{Lower}\right)/\cos \left(\theta _{AR}\right)}{\left|k_y\left(y_{Upper}-y_{Lower}\right)/\cos \left(\theta _{LB}\right)\right|}\right)\)
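A sketch of equations (4)-(6) in Python; the specific mesh indices for the symmetric and upper/lower point pairs are not fixed by the text, so the points are left as parameters supplied by the caller.

```python
import math

def head_angles(left, right, upper, lower, aspect_ratio):
    """Estimate (theta_LB, theta_AR, theta_FE) in radians from MediaPipe
    landmarks given as (x, y, z) tuples in normalized image coordinates."""
    k_x = k_z = 1.0           # k_x = k_z; absolute scale cancels in the ratios
    k_y = k_x / aspect_ratio  # k_x / k_y = camera aspect ratio

    # Eq. (4): lateral bending from the symmetric point pair.
    theta_lb = math.atan(-k_y * (left[1] - right[1]) /
                         (k_x * (left[0] - right[0])))
    # Eq. (5): axial rotation, correcting the x-span for lateral bending.
    theta_ar = math.atan(-k_z * (right[2] - left[2]) /
                         abs(k_x * (right[0] - left[0]) / math.cos(theta_lb)))
    # Eq. (6): flexion-extension, correcting for the previous two angles.
    theta_fe = math.atan((-k_z * (upper[2] - lower[2]) / math.cos(theta_ar)) /
                         abs(k_y * (upper[1] - lower[1]) / math.cos(theta_lb)))
    return theta_lb, theta_ar, theta_fe
```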

4.3 Accuracy Evaluation Test

Since the estimation method for head orientation angles is based on a machine learning solution whose accuracy has not been validated for this purpose, we conducted a test to evaluate it.

4.3.1 Apparatus.

Because measuring the angle of a real human head is difficult and error-prone, a head model is more suitable for accuracy validation. A mock head model is installed on a two-degree-of-freedom (DOF) tripod head (CIMAPRO LD-2R) that can pan and tilt to simulate the motion of a human head: panning provides axial rotation, while tilting provides lateral bending and flexion-extension. Scales on the tripod head give the ground-truth angles. A webcam (Rapoo C270AF) serving as the image-capturing device is connected to a laptop (Zephyrus G14 2022) running the program, and a grid transparency sheet with a scale is fixed to the desk as the camera distance reference. Figure 5 shows the apparatus.

Figure 5: Apparatus for the evaluation of head orientation estimation accuracy

4.3.2 Procedure.

Three head mono-motions are tested. The test ranges for lateral bending, axial rotation, and flexion-extension are -27 deg to 27 deg, -50 deg to 50 deg, and -30 deg to 30 deg, with increment steps of 5 deg, 5 deg, and 3 deg, respectively. In each test, the tripod is leveled first so that the detected angles of the mock head model are all zero, and the model is then set to the minimum limit as the starting point. Each time the tripod head is advanced by one step toward the maximum limit, the measured angle is stored in an Excel file and an image of the mock head model is saved.

4.3.3 Results.

The error percentages for lateral bending, axial rotation, and flexion-extension are 3.88%, 5.37%, and 6.7%, respectively. Head posture angle results are displayed in Figure 6. As the figure shows, lateral bending is the most accurate since it relies only on the planar coordinates x and y, with no estimated depth z involved. Axial rotation has only minor error within 40 deg; beyond that, the landmarks used for the calculation become invisible and MediaPipe must predict their locations, which is the likely source of the error. It can also be observed that extension is estimated better than flexion. Overall, the accuracy is adequate for motion detection.

Figure 6: Results of the head orientation estimation.

4.4 Hand Gesture and Position Estimation

Figure 7: Landmarks of MediaPipe for (a) hand solution and (b) pose solution.

The coordinate data of the right hand's middle finger are used to estimate the hand gesture. To distinguish the different boost items, the curving condition of the hand is required. The coordinates of hand landmarks 9, 10, and 11 (Figure 7 (a)) are used to calculate the angle between the line segments from landmark 9 to 10 and from 10 to 11. Since the depth data z is not provided, the projected angle of the middle finger is applied: (7) \(\theta _{proj}=\left|\mathrm{atan2}\left(y_9-y_{10},x_9-x_{10}\right)-\mathrm{atan2}\left(y_{11}-y_{10},x_{11}-x_{10}\right)\right|\) The palm is deemed flat when θproj is smaller than 20 deg and curved when θproj is greater than 30 deg.

The right-hand position \((x_{hand}, y_{hand})\) is represented by the coordinates of landmark 9, \((x_9, y_9)\). However, knowing the hand's position alone is not enough to distinguish the boost items: the positions of the mouth \((x_{mouth}, y_{mouth})\) and of the heart must also be retrieved. The mouth can be obtained from the Face Mesh landmarks, while the heart requires another MediaPipe solution, 'Pose'. Although 'Pose' has no landmark for the heart, landmark 11 marks the left shoulder \((x_{shoulder}, y_{shoulder})\), which is close to the heart and is therefore used to approximate it. The distances from hand to mouth and from hand to shoulder are then: (8) \(d_{h2m}=\sqrt {(x_{hand}-x_{mouth})^2+(y_{hand}-y_{mouth})^2}\) (9) \(d_{h2s}=\sqrt {(x_{hand}-x_{shoulder})^2+(y_{hand}-y_{shoulder})^2}\)
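A sketch of the gesture and distance checks in Python. The bend angle is computed between consecutive segment directions, which reads zero for a straight finger; this is equivalent to Eq. (7) up to angle wrapping and matches the flat (<20 deg) and curved (>30 deg) thresholds. The distance threshold value is an assumption for illustration.

```python
import math

def finger_bend_angle(p9, p10, p11):
    """Bend of the middle finger at landmark 10, in degrees (0 = straight).
    Equivalent to Eq. (7) up to angle wrapping."""
    v1 = (p10[0] - p9[0], p10[1] - p9[1])    # segment 9 -> 10
    v2 = (p11[0] - p10[0], p11[1] - p10[1])  # segment 10 -> 11
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return abs(math.degrees(math.atan2(cross, dot)))

def dist(a, b):
    """Eqs. (8)-(9): planar Euclidean distance between two landmarks."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def classify_boost_motion(hand, mouth, shoulder, theta_proj,
                          d_thre=0.12, flat_deg=20.0, curved_deg=30.0):
    """Map gesture plus position to a boost item; d_thre is an assumed value."""
    if dist(hand, mouth) < d_thre:
        if theta_proj < flat_deg:
            return "painkiller"      # flat hand near the mouth
        if theta_proj > curved_deg:
            return "energy_drink"    # curved hand near the mouth
    if dist(hand, shoulder) < d_thre and theta_proj > curved_deg:
        return "adrenaline"          # curved hand near the heart (shoulder)
    return None
```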

4.5 Program Structure

4.5.1 Calibration.

In most cases, the camera is placed above the monitor on its middle plane, so the initial axial rotation θAR is very close to zero. However, the camera is usually not normal to the player's face but inclined from a higher or lower position, which produces an initial flexion-extension angle \(\theta _{FE_0}\); the real flexion-extension angle is therefore \(\theta _{FE}^{\prime }=\theta _{FE}-\theta _{FE_0}\). As a result, a calibration function is required to retrieve the initial values first. The initial nose-to-shoulder distance \(L_{s2n_0}\) is also obtained at this step.
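A minimal calibration sketch under these assumptions: the player holds a neutral posture for a short window while the measured angle and distance are averaged (the window length and helper names are ours):

```python
def calibrate(sample_fn, n_frames=60):
    """Average theta_FE and L_s2n over a neutral-posture window.

    sample_fn() is assumed to return (theta_fe, l_s2n) for one frame.
    """
    fe_sum = l_sum = 0.0
    for _ in range(n_frames):
        theta_fe, l_s2n = sample_fn()
        fe_sum += theta_fe
        l_sum += l_s2n
    # Later frames use theta_fe_prime = theta_fe - theta_fe0.
    return fe_sum / n_frames, l_sum / n_frames  # (theta_fe0, l_s2n0)
```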

4.5.2 Motion Detection Criteria.

All selected motions in the shooting game are mapped to keyboard interaction except free-looking. The initial triggering criteria were based on the angle or displacement alone, but this method causes a noticeable lag when the threshold is set large to avoid unwanted triggering, because the head is not fixed in a steady state and gently twitches at random. More details can be viewed in Figure 8.

Figure 8: Illustration figure for the relationship between threshold width and (a) time lag, (b) stability.

Conversely, setting the threshold to a small value causes instability, and a value that fits a large group of people is hard to determine. We therefore use the angular velocity of lateral bending \(\dot{\theta }_{LB}\) and the linear velocity of head movement along the Y-Axis \(\dot{y}_{nose}\), the derivatives of the angles and displacements. A 'dead zone', a setting commonly applied to game controllers, is used to enhance stability and avoid false recognition, in the form of an angle θDZ, a position yDZ, and a factor kDZ. The criteria under which each motion is considered to occur, based on the collectible data, are listed in Table 1; Figure 9 and Figure 10 illustrate them.

Motion: Criteria
Left Peeking: \(\dot{\theta }_{LB}\lt -\dot{\theta }_{thre}\) and \(\theta _{LB}\lt -\theta _{DZ}\)
Release Left Peeking: (\(\dot{\theta }_{LB}\gt \dot{\theta }_{thre}\) and \(\theta _{LB}\lt -\theta _{DZ}\)) or (\(\theta _{LB}\gt -\theta _{DZ}\))
Right Peeking: \(\dot{\theta }_{LB}\gt \dot{\theta }_{thre}\) and \(\theta _{LB}\gt \theta _{DZ}\)
Release Right Peeking: (\(\dot{\theta }_{LB}\lt -\dot{\theta }_{thre}\) and \(\theta _{LB}\gt \theta _{DZ}\)) or (\(\theta _{LB}\lt \theta _{DZ}\))
Crouching: (\(\dot{y}_{nose}\gt \dot{y}_{thre}\) and \(k_{thre}\times L_{s2n_0}\lt L_{s2n}\lt k_{DZ}\times L_{s2n_0}\)) or (\(L_{s2n}\lt k_{thre}\times L_{s2n_0}\))
Release Crouching: (\(\dot{y}_{nose}\lt -\dot{y}_{thre}\) and \(k_{thre}\times L_{s2n_0}\lt L_{s2n}\lt k_{DZ}\times L_{s2n_0}\)) or (\(L_{s2n}\gt k_{DZ}\times L_{s2n_0}\))
Jumping: \(y_{shoulder_M}\lt y_{DZ}\) and \(\dot{y}_{nose}\lt -\dot{y}_{thre}\)
Drinking Energy Drink: \(d_{h2m}\lt \Delta d_{thre}\) and \(\theta _{proj}\gt \alpha _{large}\)
Taking Painkiller Pills: \(d_{h2m}\lt \Delta d_{thre}\) and \(\theta _{proj}\lt \alpha _{small}\)
Injecting Adrenaline: \(d_{h2s}\lt \Delta d_{thre}\) and \(\theta _{proj}\gt \alpha _{large}\)

Table 1: Criteria for Different Motions
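A sketch of the velocity-plus-dead-zone trigger for left peeking, following the first two rows of Table 1; the threshold values and the key-emulation callbacks are assumptions for illustration:

```python
class LeftPeekTrigger:
    """Hold-mode left-peek trigger per Table 1 (angles in degrees)."""

    def __init__(self, press, release, rate_thre=30.0, dead_zone=5.0):
        self.press, self.release = press, release  # e.g. emulate 'Q' down/up
        self.rate_thre = rate_thre                 # deg/s, assumed value
        self.dead_zone = dead_zone                 # deg, assumed value
        self.active = False
        self._prev = None

    def update(self, theta_lb, dt):
        rate = 0.0 if self._prev is None else (theta_lb - self._prev) / dt
        self._prev = theta_lb
        if not self.active:
            # Trigger: fast leftward bend that has left the dead zone.
            if rate < -self.rate_thre and theta_lb < -self.dead_zone:
                self.press()
                self.active = True
        # Release: bending back fast, or returning inside the dead zone.
        elif (rate > self.rate_thre and theta_lb < -self.dead_zone) \
                or theta_lb > -self.dead_zone:
            self.release()
            self.active = False
```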

Figure 9: Graphic display for the criteria of triggering (a) peeking (both left and right) under a neutral state, (b) the release of right peeking when right peeking is performed, (c) the release of left peeking when left peeking is performed, (d) the drinking of energy drink, (e) the taking of painkiller, (f) the injection of adrenaline.

Figure 10: Graphic display for the criteria of triggering (a) the release of crouching when the torso is flexing, (b) crouching under a neutral state, (c) jumping when the torso is moving up.

Free-looking is the one motion mapped to mouse interaction. Axial rotation of the head triggers horizontal mouse movement, while flexion-extension of the head drives the vertical movement. While the free-look key, usually 'Alt', is held, free-looking is considered active, and the head's axial rotation angle θAR and flexion-extension angle θFE drive the view.

4.5.3 Keystroke and Mouse Movement Emulation.

When a specific motion is detected, the corresponding keystroke is emulated. Hardcore shooting games typically offer two input modes for state-shifting movements: 'toggle' and 'hold'. In both modes, the avatar performs a specific move when the key is pressed; in toggle mode, the avatar remains in that posture after the key is released, while in hold mode it returns to the normal state. In PUBG, the default mode for peeking is 'hold', while crouching is in 'toggle' mode. When the player's head bends laterally, the 'peeking' key is held down until the head returns to its neutral position. For the downward head movement, once the threshold is reached, the 'crouch' key is pressed and released; when the head returns to the neutral position, the key is pressed and released again. For each action in 'toggle' mode, a Boolean parameter indicates whether the avatar is in the state: it is set to true when the player's motion triggers the state-shifting keystroke, and to false when the avatar returns to neutral after another emulated keystroke.
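A sketch of the two emulation modes using pynput, a common Python input-emulation library; whether such synthetic input reaches a given game's raw-input or anti-cheat layer is not addressed here and is our assumption:

```python
from pynput.keyboard import Controller

kb = Controller()

def hold_mode(key, motion_active):
    """'Hold': keep the key down exactly while the player motion persists."""
    if motion_active:
        kb.press(key)
    else:
        kb.release(key)

class ToggleAction:
    """'Toggle': one tap flips the avatar state; track it with a Boolean."""

    def __init__(self, key):
        self.key = key
        self.in_state = False

    def tap(self):
        kb.press(self.key)
        kb.release(self.key)
        self.in_state = not self.in_state

crouch = ToggleAction('c')  # PUBG's default crouch key, toggle mode
```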

For free-looking, while the free-look key ('Alt') is pressed, the mouse position is set to: (10) \(x_{mouse}=x_{mouse_0}+k_x\times \theta _{AR}\) (11) \(y_{mouse}=y_{mouse_0}+k_y\times \theta _{FE}^{\prime }\) in which kx and ky are adjustable factors controlling the mouse sensitivity and (xmouse0, ymouse0) is the initial position of the mouse. A flowchart of CamTroller can be viewed in the appendix for more details.
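A sketch of equations (10)-(11) with pynput's mouse controller; the sensitivity factors are assumed values:

```python
from pynput.mouse import Controller as Mouse

mouse = Mouse()

def free_look(theta_ar, theta_fe_prime, origin, kx=40.0, ky=40.0):
    """Eqs. (10)-(11): map head angles (deg) to an absolute mouse position."""
    x0, y0 = origin  # mouse position captured when free-look started
    mouse.position = (x0 + kx * theta_ar, y0 + ky * theta_fe_prime)
```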

4.6 Feasibility Validation Test

We conducted one pilot test to verify CamTroller's feasibility and one user study in the real gaming scenario (PUBG). Both the CamTroller software and PUBG ran on a ROG Zephyrus G14 laptop (processor: AMD Ryzen 7 6800HS; graphics card: AMD Radeon RX 6700S 8GB; memory: 8GB DDR5) with an external commodity webcam, a Rapoo C270AF (FOV: 85.8 deg; resolution: 1920x1080), for image capture. The game content was shown on an external monitor (Xiaomi Curved Gaming Monitor 30", model RMMNT30HFCW). A wireless gaming mouse (Logitech G304) and a wireless mechanical keyboard (Dareu EK810) served as input devices. Participants sat in chairs facing the monitor, about 50 cm away.

Nine participants attended the feasibility test and were asked to perform eleven motions to drive the avatar in the game: 1) Head Moving Up for jumping, 2) Head Moving Down for crouching, 3) Head Left Bending for left peeking, 4) Head Right Bending for right peeking, 5) Head Left Rotation for free looking left, 6) Head Right Rotation for free looking right, 7) Head Flexion for free looking down, 8) Head Extension for free looking up, 9) Drink Energy Drink, 10) Take Painkiller, and 11) Inject Adrenaline. Each motion was performed three times, so each participant performed 33 motions in total, following the experimenter's instructions in a fully random order. A trial was considered successful if the avatar performed the correct motion.

The success rates for the eleven motions are shown in Table 2. The overall success rate is 99.33%, indicating that CamTroller is feasible for gaming from the perspective of success rate.

Motion: Success Rate
Head Moving Up for jumping: 100%
Head Moving Down for crouching: 96.3%
Head Left Bending for left peeking: 100%
Head Right Bending for right peeking: 100%
Head Left Rotation for free looking left: 100%
Head Right Rotation for free looking right: 100%
Head Flexion for free looking down: 100%
Head Extension for free looking up: 100%
Drink Energy Drink: 100%
Take Painkiller: 96.3%
Inject Adrenaline: 100%

Table 2: The success rate of motions in feasibility validation test

4.7 Computation Power Occupation and Performance Test

We also tested how many computational resources CamTroller consumes when running on the ROG Zephyrus G14 laptop alongside PUBG. In training mode, CamTroller took only 5% to 10% of the CPU without causing a noticeable frame-rate drop (within 5 FPS on average). As a comparison, we ran the same test with a Kinect: connected to the same computer as an input device, its essential data-collection software, Azure Kinect Viewer, occupied 30% to 40% of the CPU and caused a frame-rate drop of over 30 FPS. The study by Shengmei Liu et al. [29] pointed out that the quality of experience (QoE) decreases when the frame rate drops. Moreover, CamTroller ran at 25-30 FPS while PUBG was playing, corresponding to a response time of about 0.033 s to 0.04 s, whereas the Azure Kinect Viewer ran within 5 FPS, suggesting a response time larger than 0.2 s that would influence the gaming experience.

We also measured the time players took to trigger different motions. The apparatus was the same as in the feasibility test. While participants performed motions, a camera recorded both the player and the screen; the video was used to measure the interval between the start of the player's motion and the start of the avatar's motion, which was taken as the trigger time. Five participants performed five types of motion (peeking, crouching, drinking energy drinks, taking painkillers, and injecting adrenaline) three times each. Trigger times for peeking and crouching mostly fall between 0.3 s and 0.5 s, with means of 0.39 s (std=0.056) and 0.41 s (std=0.059). For boost item consumption, most participants triggered drinking energy drinks, taking painkillers, and injecting adrenaline within 0.5 s to 0.8 s, with means of 0.68 s (std=0.088), 0.66 s (std=0.069), and 0.69 s (std=0.083). The variation is relatively large since the trigger time depends strongly on the participant's motion speed, and the boost item motions take longer than peeking and crouching because participants must move a hand from the desk to the mouth or heart.

Figure 11: An illustration of taking boost items with three different approaches

Figure 12: An illustration of an ideal peeking loop in three different approaches


5 USER STUDY ON PERFORMANCE AND INTUITIVENESS

To compare the performance of CamTroller with the original keyboard-and-mouse input, we conducted a two-part study covering boost item consumption and peeking. We measured objective task performance for peeking, and subjective intuitiveness via the QUESI questionnaire for both the peeking and boost item tasks. The QUESI questionnaire [37] was adopted as it is better suited to evaluating interaction systems; the INTUI questionnaire [55] was not chosen as it relates more to tangible products.

5.1 Apparatus and Participants

The apparatus is the same as in the Feasibility Validation Test. A total of 18 participants without cervical or upper-limb problems were recruited, all students or employees of the university. All participants had experience playing PUBG or similar shooting games but were not professional players. Before the experiment, they were required to pass a small test of basic shooting-game operations, such as using the WASD keys to move and the mouse to aim and shoot.

5.2 Procedure

5.2.1 Experiment part 1: Boost Item Consumption.

All participants listened to the experimenter's introduction and watched a demonstration of how to trigger boost-item consumption in the game with the three approaches (Figure 11). They then sat on the chair with their hands on the mouse and keyboard and underwent the calibration. Participants had three operation approaches to trigger the consumption of a boost item: 1) opening the inventory by pressing the 'Tab' key ('Basic'), the most popular approach among non-professional players as confirmed by our survey; 2) pressing the back-quote key to use the wheel panel ('Pro'), an advanced approach typically adopted by professional players [56]; and 3) using CamTroller ('CamTroller'). Participants were instructed to consume each boost item (energy drink, painkiller, and adrenaline syringe) three times with each approach. At the end, participants filled out the QUESI questionnaire to report their experience with the three approaches.

5.2.2 Experiment part 2: Peeking.

After part 1, all participants were required to perform peeking with three different approaches (Figure 12): 1) ('Basic') pressing the 'Q' and 'E' keys with the ring finger and the index finger, which leads to the issue of 'finger shortage'; this is the most popular operation among non-professional players, as shown by our survey. 2) ('Pro') pressing the 'Q' and 'E' keys with the middle finger; this method allows moving and peeking simultaneously, as shown in the red dashed rectangles in Figure 12, but is hard to perform, as the photos of such key-pressing postures suggest. 3) ('CamTroller') lateral bending of the head, which is easier to perform than 'Pro' and also allows walking while peeking.

The task in this part was to peek left and right from behind a tree to glance at an enemy. The tree is wider than the avatar, so the player needs to press the 'A' and 'D' keys to reach the tree's left and right sides and then perform the peek. The player was asked to watch an enemy crouching at a fixed position, ready to shoot at the player's avatar. Participants were asked to perform the loop as fast as possible to avoid being hit. Figure 12 illustrates the ideal loop of each approach given the same ideal key-pressing period. Compared to Pro and CamTroller, the ideal performance of the Basic approach, the one adopted by the majority of PUBG players, is slow.

Before the experiment, each participant trained on the Pro and CamTroller approaches for as long as they needed to feel ready; the training usually took about 20 minutes in total. Each participant then performed the task with the three interaction approaches in a counterbalanced order, each for around 90 seconds. Before the 90-second screen capture started, the participant practiced the peeking loop of the given approach until both the participant and the experimenter felt it was being performed smoothly. At the end, participants filled out the QUESI questionnaire to report their intuitiveness-related experience of using the three approaches for the peeking task.

5.3 Data Analysis

All statistical tests were executed in SPSS 27. The QUESI results, with five subscales for each of the three approaches on the boost-item and peeking tasks, were analyzed through a repeated-measures ANOVA with a Greenhouse–Geisser correction, followed by post hoc pairwise comparisons with Bonferroni adjustment.

For peeking performance, a full 90-second clip usually contains a few errors, such as the avatar failing to reach out far enough to see the enemy. We therefore retrieved an error-free 15-second section with relatively stable performance to measure the peeking time on both sides. To estimate the time precisely, a peeking loop is defined to start from the leftmost view, reach the rightmost side, and return to the leftmost view, as shown in Figure 12. The mean loop time is calculated as an indication of each participant's mean performance. The mean performance time of each approach was tested through an independent-samples Kruskal-Wallis test with post hoc tests.
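For readers who prefer an open-source pipeline, the sketch below reproduces the same analysis steps in Python rather than SPSS, using the pingouin and scipy packages on a hypothetical long-format table; the file names and column names are assumptions of this illustration, not part of the study.

```python
import pandas as pd
import pingouin as pg
from scipy import stats

# Hypothetical long-format data: one QUESI score per participant x approach.
df = pd.read_csv("quesi_scores.csv")  # columns: participant, approach, score

# Repeated-measures ANOVA; correction=True applies the Greenhouse-Geisser
# correction to the degrees of freedom.
anova = pg.rm_anova(data=df, dv="score", within="approach",
                    subject="participant", correction=True)
print(anova)

# Post hoc pairwise comparisons with Bonferroni adjustment.
posthoc = pg.pairwise_tests(data=df, dv="score", within="approach",
                            subject="participant", padjust="bonf")
print(posthoc[["A", "B", "p-corr"]])

# Kruskal-Wallis test on the mean loop times of the three approaches.
groups = [g["loop_time"].values
          for _, g in pd.read_csv("loop_times.csv").groupby("approach")]
print(stats.kruskal(*groups))
```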

5.4 Results

Figure 13: The illustration of subjective QUESI questionnaire results of (a) boost item task, (b) peeking task, and (c) a box plot of objective performance - loop time in peeking task

For the boost-item task, the repeated-measures ANOVA on the QUESI average score showed a significant within-subject effect of approach (F(1.69, 28.80)=16.79; p<0.001, partial eta squared = 0.496). The pairwise comparisons show that CamTroller (M=4.596, SE=0.064) has a significantly higher mean intuitiveness score than Basic (M=3.220, SE=0.281, p<0.001) and Pro (M=3.472, SE=0.247, p<0.001), while the mean QUESI scores do not differ between Basic and Pro (p=0.692). Each subscale shows the same significance structure across approaches. As Figure 13 (a) indicates, the intuitiveness of the 'CamTroller' operation is significantly higher than that of the 'Basic' and 'Pro' operations on all five subscales.

For the peeking task, the statistical results are similar to those of the boost-item task (Figure 13(b)). The repeated-measures ANOVA again revealed a significant within-subject effect of approach on the average score (F(1.916, 32.57)=16.362; p<0.001, partial eta squared = 0.490). The pairwise comparisons showed that the CamTroller approach (M=4.678, SE=0.089) had a significantly higher mean QUESI score than the Basic approach (M=3.067, SE=0.299, p<0.001) and the Pro approach (M=3.341, SE=0.269, p<0.001). There was no significant difference in the mean QUESI scores between the Basic and Pro approaches (p=0.276). These findings were consistent across all five subscales, indicating that the CamTroller approach demonstrated significantly higher intuitiveness in operation than both the Basic and Pro approaches.

The loop time of each approach is shown in Figure 13(c). The independent-samples Kruskal-Wallis test showed a significant effect of approach on the time to perform a peeking loop (H=32.407, p<0.001). The pairwise comparisons with Bonferroni correction showed that CamTroller is comparable to the Pro operation (p=0.116), while the mean loop times of Pro and CamTroller are both significantly shorter than that of the 'Basic' operation (p<0.001 and p=0.001, respectively).


6 DISCUSSION

Based on NUI theory and natural mapping, this work first developed the concept of CamTroller, a hybrid interaction that switches between the commonly used keyboard-mouse input system and motion gestures as an auxiliary tool, so that the large group of keyboard-mouse users can be included. Then, by investigating current commercial games, we found several gaps that CamTroller could fill; we hence applied the CamTroller concept to a real scenario in PUBG and validated that the developed system contributes to both players' objective gaming performance and their subjective perception of intuitiveness. In this section, we discuss our user-study results, the generalizability of CamTroller to one-to-one avatar-mapping PC games, the comparison between CamTroller and existing alternatives, and the limitations of this work.

6.1 User Study Results

The pairwise comparisons on the QUESI questionnaire showed that the natural mapping of CamTroller performed better not only in overall intuitiveness but also in each subscale: subjective mental workload, perceived achievement of goals, perceived effort of learning, familiarity, and perceived error rate. Based on these results, we believe CamTroller could help players perform better in intense competitive situations that demand high levels of attention, focus, and decision-making. In such situations, players need to rapidly assess the scene, evaluate multiple options, and choose the most appropriate actions, and even a small reduction in mental workload can make a difference. Hence, although CamTroller shows no advantage over Pro in the simple peeking task, we hypothesize that in complicated playing conditions human performance with CamTroller can exceed that of the professional players' approach. Because experimental control and performance measurement are difficult in such complicated situations, we did not collect data in that setting; we plan to further enhance and evaluate performance with CamTroller under more complicated playing conditions.

Gaming with CamTroller may also benefit neck health. Playing computer games with a keyboard and mouse for long periods can cause health issues: multiple studies point out that long-term keyboard and mouse use can lead to discomfort and pain [7], and even orthopedic injuries [51] and tendinopathies [31]. A review reported a negative impact of video-game playtime on the musculoskeletal system in about 92% of studies [50]. Fine motor strain from repetitive actions is considered one cause of these musculoskeletal hazards; another is sitting for extended periods, often with poor posture. Hence, Ahmed K. Emara et al. strongly recommend frequent stretching and flexion, extension, and rotation of the cervical, thoracic, and lumbar spine [6]. It should therefore be helpful for players to combine the keyboard and mouse with a motion controller that introduces extra neck, arm, and body movement during long gaming sessions.

6.2 The Generalizability of CamTroller for Other Games

Generally speaking, based on the design concept, CamTroller is applicable for all one-to-one avatar mapping PC games. For the specially designed CamTroller system for the PUBG game, a typical hardcore shooting game, we discussed here its generalizability to other games.

The games easiest to transplant CamTroller to are other hardcore shooting games sharing similar avatar actions, such as Arma 3, Insurgency, Escape from Tarkov, and the shooting controls in Cyberpunk 2077. As for other one-to-one avatar-mapping PC games, for instance action role-playing games such as Elden Ring and Sekiro: Shadows Die Twice, dodging in different directions is common. CamTroller could be adapted to trigger a dodge in the direction the head is quickly moving: when the avatar needs to dodge right, the player triggers it with a sudden move of the head towards the right, as sketched below. Likewise, consumables appear in many types of games, and triggering their consumption with hand gestures can be easier and faster than traditional methods. Overall, more recently released games integrate modules that were previously spread across different genres; in Cyberpunk 2077, for example, you can find features from traditional shooting games, action games, and role-playing games. In other words, large-scale games are becoming freer and more complex, and the experience enabled by this free combination of game features is restricted by the limited interaction methods (keyboard and mouse) of computer games. We introduce CamTroller to help lift this restriction.
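To make the dodge adaptation concrete, the sketch below shows one way it could be built with the same MediaPipe tracking [16] that CamTroller relies on: the horizontal velocity of the nose landmark is thresholded, and a quick head move emits the corresponding dodge key. The key bindings, threshold values, and use of pynput for key emulation are assumptions of this sketch, not part of CamTroller; many games also require lower-level (DirectInput) emulation instead.

```python
import time

import cv2
import mediapipe as mp
from pynput.keyboard import Controller

DODGE_KEYS = {"left": "q", "right": "e"}  # hypothetical in-game dodge bindings
VEL_THRESHOLD = 0.8   # normalized image widths per second (tune per setup)
COOLDOWN = 0.5        # seconds between dodges, to avoid retriggering

kb = Controller()
cap = cv2.VideoCapture(0)
prev_x, prev_t, last_dodge = None, None, 0.0

with mp.solutions.pose.Pose() as pose:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if not result.pose_landmarks:
            continue
        nose = result.pose_landmarks.landmark[mp.solutions.pose.PoseLandmark.NOSE]
        now = time.time()
        if prev_x is not None and now > prev_t:
            # +x is rightward in the (unmirrored) camera image
            vel = (nose.x - prev_x) / (now - prev_t)
            if abs(vel) > VEL_THRESHOLD and now - last_dodge > COOLDOWN:
                key = DODGE_KEYS["right" if vel > 0 else "left"]
                kb.press(key)    # tap the dodge key once
                kb.release(key)
                last_dodge = now
        prev_x, prev_t = nose.x, now
```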

6.3 The Comparison between CamTroller and Existing Alternatives

In different types of games, not only shooting games but also action and role-playing games, developers have attempted to alleviate the burden of remembering complicated key bindings by indicating the bindings on the UI and adding pop-up hints inside the game. For example, when the avatar approaches a door or a closet that can be opened, a tip pops up to remind the player which key performs the corresponding action; when the avatar is at low HP, a pop-up indicates which key consumes the healing items. This method works well except in intense battle scenarios. When engaging enemies, players have no time to read the UI to figure out which key to press. Unlike the layout of buttons on a gamepad, which are easy to distinguish from one another, keys on a keyboard are highly similar, making it challenging to operate without looking at the keyboard, especially for players who are not proficient in touch-typing. Looking down to find a key in an intense battle puts the player in great danger because they lose their view, possibly leading to failure. Getting familiar with the keys and forming the muscle memory to operate without looking down requires repetitive practice that is tedious and time-consuming. Compared to the existing solution adopted by professional players, CamTroller is easier for novices to learn and execute.

The gamepad, typically used in action games, is another alternative. A gamepad greatly increases the control ability of a limited number of fingers: for example, one thumb is enough to move the avatar in all four directions on a gamepad, whereas three fingers are needed for WASD control on a keyboard. However, because neither its buttons nor its joysticks allow aiming at a target as quickly and accurately as a mouse, the gamepad is ill-suited to shooting games. Besides, a gamepad occupies both hands most of the time, so players cannot use the keyboard and mouse simultaneously with it; they are forced to pick one or the other. CamTroller is an auxiliary tool: although we only discuss it in combination with keyboard-and-mouse input, it could also add extra degrees of freedom through head or hand motions to games played with a gamepad.

6.4 Limitations

While CamTroller has demonstrated its potential to give players a more efficient and intuitive game experience, it still has certain limitations as a technical tool. One limitation is that CamTroller operates as open-loop control and receives no feedback from the gaming system, so it cannot determine whether a motion was triggered correctly. Sometimes the emulated input generated by CamTroller interferes with input sent from the keyboard and gets ignored by the game. The approach proposed by Yu-Hsin Lin et al. for detecting video-game events by monitoring the video, audio, and controller I/O [28] could be adopted in CamTroller to solve this problem. In addition, self- or auto-adjusted thresholds will be considered in future work.
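As a rough illustration of how such feedback could close the loop, the sketch below re-sends an emulated key until a designated on-screen UI region changes, in the spirit of the video channel in [28]. The region coordinates, change threshold, and retry policy are entirely hypothetical; we assume the mss screen-capture and numpy packages together with a pynput keyboard controller.

```python
import time

import mss
import numpy as np
from pynput.keyboard import Controller

kb = Controller()
UI_REGION = {"left": 1650, "top": 980, "width": 120, "height": 40}  # hypothetical HUD area
CHANGE_THRESHOLD = 12.0  # mean per-pixel difference that counts as "UI changed"

def grab(sct):
    """Capture the UI region as a float array for frame differencing."""
    return np.asarray(sct.grab(UI_REGION), dtype=np.float32)

def press_until_confirmed(key, retries=3, wait=0.3):
    """Emit a key tap and verify via the screen that the game reacted."""
    with mss.mss() as sct:
        for _ in range(retries):
            before = grab(sct)
            kb.press(key)
            kb.release(key)
            time.sleep(wait)                 # give the game time to update
            diff = np.abs(grab(sct) - before).mean()
            if diff > CHANGE_THRESHOLD:      # HUD changed -> input was accepted
                return True
    return False                             # input apparently ignored
```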

Besides, the validation could involve more objective performance measures and user-experience questions, for example the learning curves of the CamTroller and Pro approaches, and players' willingness to adopt CamTroller in their own gaming.


7 CONCLUSION

By investigating the problems caused by complex avatar actions and the limited traditional PC game input, we identified the gap that the NUI concept of CamTroller can fill as an auxiliary tool that adds extra degrees of freedom for controlling the avatar through natural mapping. Based on its natural-mapping properties, CamTroller is applicable to one-to-one avatar-mapping PC games, where one player controls one avatar in the game.

To develop a proof-of-concept system of CamTroller, we chose the popular game PUBG as the platform. The feasibility test showed that a standard RGB camera, paired with a machine-learning pose-estimation algorithm, can translate players' movements to in-game avatars accurately and with low latency. According to the user study, CamTroller achieves significantly higher performance and intuitiveness than the basic operation that most PUBG players use.

Mapping motions naturally can significantly reduce the working-memory burden and effectively alleviate the problem of 'finger shortage'. With CamTroller, players do not have to memorize complex key bindings or frequently reposition their fingers to press the desired keys. Composite motions that require multiple keys to be pressed simultaneously, previously feasible only for professionals, can now be performed easily by newcomers, helping players with little or no prior experience get accustomed to 'realistic' shooting games faster and more efficiently. Additionally, from a health perspective, gaming with head and hand motions can relieve pressure on the neck and shoulders by encouraging frequent movement, reducing the cervical and shoulder pain that can result from holding the same posture for a long time.

Although CamTroller currently focuses on one-to-one avatar-mapping games, the basic concept of pairing a commercial RGB camera with motion detection can also be extended to other situations, such as video watching and document editing, enabling more flexible interaction with the computer, for example performing a 'shush' gesture to mute the sound or swiping a hand to turn the page.
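As one hedged example of such an extension, the sketch below uses MediaPipe hand tracking [16] to detect a horizontal hand swipe and emit a page-turn key. The swipe distance, time window, and key choice are assumptions of this illustration, not a described feature of CamTroller.

```python
import time

import cv2
import mediapipe as mp
from pynput.keyboard import Key, Controller

SWIPE_DIST = 0.35   # fraction of the frame width the wrist must travel
SWIPE_TIME = 0.4    # seconds within which the travel must happen

kb = Controller()
cap = cv2.VideoCapture(0)
history = []  # (timestamp, wrist_x) samples

with mp.solutions.hands.Hands(max_num_hands=1) as hands:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if not result.multi_hand_landmarks:
            history.clear()
            continue
        wrist = result.multi_hand_landmarks[0].landmark[0]  # landmark 0 = wrist
        now = time.time()
        history.append((now, wrist.x))
        history = [(t, x) for t, x in history if now - t <= SWIPE_TIME]
        if history and history[-1][1] - history[0][1] > SWIPE_DIST:
            kb.press(Key.right)   # "next page" in most document viewers
            kb.release(Key.right)
            history.clear()
```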


ACKNOWLEDGMENTS

The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. GRF/PolyU 15607922).

Footnotes

  1. Both authors contributed equally to this research.

  2. Corresponding Author.


Supplemental Material

  1. Video Presentation (mp4, 202.7 MB)

  2. CamTroller Demo Video: a video demonstrating the use of CamTroller in the game PUBG (mp4, 42.8 MB)

References

  1. Monthir Ali and Rogelio E. Cardona-Rivera. 2020. Comparing Gamepad and Naturally-Mapped Controller Effects on Perceived Virtual Reality Experiences. In ACM Symposium on Applied Perception 2020 (Virtual Event, USA) (SAP '20). Association for Computing Machinery, New York, NY, USA, Article 10, 10 pages. https://doi.org/10.1145/3385955.3407923
  2. L Bharath, S Shashank, V S Nageli, Sangeeta Shrivastava, and S Rakshit. 2010. Tracking method for Human Computer Interaction using Wii Remote. In INTERACT-2010. IEEE, New York, NY, USA, 133–137. https://doi.org/10.1109/INTERACT.2010.5706202
  3. Alethea Blackler (Ed.). 2018. Intuitive interaction: Research and application. CRC Press, Boca Raton. https://doi.org/10.1201/b22191
  4. Nikolai Bogduk and Susan Mercer. 2000. Biomechanics of the cervical spine. I: Normal kinematics. Clinical Biomechanics 15, 9 (2000), 633–648. https://doi.org/10.1016/S0268-0033(00)00034-6
  5. Delwin Clarke and P. Robert Duimering. 2006. How Computer Gamers Experience the Game Situation: A Behavioral Study. Comput. Entertain. 4, 3 (July 2006), 6–es. https://doi.org/10.1145/1146816.1146827
  6. Ahmed K. Emara, Mitchell K. Ng, Jason A. Cruickshank, Matthew W. Kampert, Nicolas S. Piuzzi, Jonathan L. Schaffer, and Dominic King. 2020. Gamer's health guide: optimizing performance, recognizing hazards, and promoting wellness in esports. Current Sports Medicine Reports 19, 12 (2020), 537–545. https://doi.org/10.1249/JSR.0000000000000787
  7. Garrick N Forman and Michael WR Holmes. 2023. Upper-Body Pain in Gamers: An Analysis of Demographics and Gaming Habits on Gaming-Related Pain and Discomfort. Journal of Electronic Gaming and Esports 1, 1 (2023), 1–8. https://doi.org/10.1123/jege.2022-0018
  8. Rita Francese, Ignazio Passero, and Genoveffa Tortora. 2012. Wiimote and Kinect: Gestural User Interfaces Add a Natural Third Dimension to HCI. In Proceedings of the International Working Conference on Advanced Visual Interfaces (Capri Island, Italy) (AVI '12). Association for Computing Machinery, New York, NY, USA, 116–123. https://doi.org/10.1145/2254556.2254580
  9. Rachel L. Franz, Jinghan Yu, and Jacob O. Wobbrock. 2023. Comparing Locomotion Techniques in Virtual Reality for People with Upper-Body Motor Impairments. In Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility (New York, NY, USA) (ASSETS '23). Association for Computing Machinery, New York, NY, USA, Article 39, 15 pages. https://doi.org/10.1145/3597638.3608394
  10. FromSoftware. 2022. Elden Ring. Game [PlayStation 4/5, Xbox One/Series (X/S), Microsoft Windows].
  11. Battlestate Games. 2017. Escape from Tarkov. Game [Microsoft Windows].
  12. Rockstar Games. 2018. Red Dead Redemption 2. Game [PlayStation 4, Xbox One, Microsoft Windows, Stadia].
  13. Bill Gates. 2011. The power of the natural user interface. https://www.gatesnotes.com/the-power-of-the-natural-user-interface
  14. Kathrin M. Gerling, Matthias Klauser, and Joerg Niesenhaus. 2011. Measuring the Impact of Game Controllers on Player Experience in FPS Games. In Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments (Tampere, Finland) (MindTrek '11). Association for Computing Machinery, New York, NY, USA, 83–86. https://doi.org/10.1145/2181037.2181052
  15. Kostas Gkikas, Dimitris Nathanael, and N Marmaras. 2007. The evolution of FPS games controllers: how use progressively shaped their present design. In Panhellenic Conference on Informatics (PCI) (Patras, Greece). Greek Computer Society (EPY), Athens, Greece, 37–46.
  16. Google. 2023. MediaPipe. https://developers.google.com/mediapipe
  17. Juan David Hincapié-Ramos, Xiang Guo, Paymahn Moghadasian, and Pourang Irani. 2014. Consumed Endurance: A Metric to Quantify Arm Fatigue of Mid-Air Interactions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Toronto, Ontario, Canada) (CHI '14). Association for Computing Machinery, New York, NY, USA, 1063–1072. https://doi.org/10.1145/2556288.2557130
  18. id Software. 1996. Quake. Game [MS-DOS, Linux, Microsoft Windows].
  19. Bohemia Interactive. 2013. Arma 3. Game [Microsoft Windows, OS X, Linux].
  20. New World Interactive. 2014. Insurgency. Game [Microsoft Windows, OS X, Linux].
  21. Hasan Iqbal, Seemab Latif, Yukang Yan, Chun Yu, and Yuanchun Shi. 2021. Reducing arm fatigue in virtual reality by introducing 3D-spatial offset. IEEE Access 9 (2021), 64085–64104.
  22. Sujin Jang, Wolfgang Stuerzlinger, Satyajit Ambike, and Karthik Ramani. 2017. Modeling Cumulative Arm Fatigue in Mid-Air Interaction Based on Perceived Exertion and Kinetics of Arm Motion. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI '17). Association for Computing Machinery, New York, NY, USA, 3328–3339. https://doi.org/10.1145/3025453.3025523
  23. Steven E. Jones and George K. Thiruvathukal. 2012. Codename Revolution: The Nintendo Wii Platform. The MIT Press, Cambridge, Massachusetts, USA.
  24. Yujin Jung, Donghoon Kang, and Jinwook Kim. 2010. Upper body motion tracking with inertial sensors. In 2010 IEEE International Conference on Robotics and Biomimetics. IEEE, New York, NY, USA, 1746–1751. https://doi.org/10.1109/ROBIO.2010.5723595
  25. Jongshin Kim, Kyoung Won Nam, Ik Gyu Jang, Hee Kyung Yang, Kwang Gi Kim, and Jeong-Min Hwang. 2012. Nintendo Wii remote controllers for head posture measurement: accuracy, validity, and reliability of the infrared optical head tracker. Investigative Ophthalmology & Visual Science 53, 3 (2012), 1388–1396. https://doi.org/10.1167/iovs.11-8329
  26. Steven M. LaValle, Anna Yershova, Max Katsev, and Michael Antonov. 2014. Head tracking for the Oculus Rift. In 2014 IEEE International Conference on Robotics and Automation (ICRA) (Hong Kong, China). IEEE, New York, NY, USA, 187–194. https://doi.org/10.1109/ICRA.2014.6906608
  27. Johnny Chung Lee. 2008. Hacking the Nintendo Wii Remote. IEEE Pervasive Computing 7, 3 (2008), 39–45. https://doi.org/10.1109/MPRV.2008.53
  28. Yu-Hsin Lin, Yu-Wei Wang, Pin-Sung Ku, Yun-Ting Cheng, Yuan-Chih Hsu, Ching-Yi Tsai, and Mike Y. Chen. 2021. HapticSeer: A Multi-Channel, Black-Box, Platform-Agnostic Approach to Detecting Video Game Events for Real-Time Haptic Feedback. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI '21). Association for Computing Machinery, New York, NY, USA, Article 418, 14 pages. https://doi.org/10.1145/3411764.3445254
  29. Shengmei Liu, Atsuo Kuwahara, James J Scovell, and Mark Claypool. 2023. The Effects of Frame Rate Variation on Game Player Quality of Experience. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI '23). Association for Computing Machinery, New York, NY, USA, Article 573, 10 pages. https://doi.org/10.1145/3544548.3580665
  30. Mitchell W. McEwan, Alethea L. Blackler, Daniel M. Johnson, and Peta A. Wyeth. 2014. Natural Mapping and Intuitive Interaction in Videogames. In Proceedings of the First ACM SIGCHI Annual Symposium on Computer-Human Interaction in Play (Toronto, Ontario, Canada) (CHI PLAY '14). Association for Computing Machinery, New York, NY, USA, 191–200. https://doi.org/10.1145/2658537.2658541
  31. Caitlin McGee and Kevin Ho. 2021. Tendinopathies in video gaming and esports. Frontiers in Sports and Active Living 3 (2021), 689371.
  32. Ahmad Merheb. 2023. PC Games Statistics (Updated Statistics). https://ahmadmerheb.com/pc-games-statistics/
  33. Ditte Hvas Mortensen. 2023. Natural user interfaces – what does it mean & how to design user interfaces that feel natural. https://www.interaction-design.org/literature/article/natural-user-interfaces-what-are-they-and-how-do-you-design-user-interfaces-that-feel-natural
  34. MovSens. 2017. TrackIMU: Head Tracking For Video Games Using IMU. https://www.hackster.io/movsensllc/trackimu-head-tracking-for-video-games-using-imu-7b6daf
  35. Akihito Nakai, Aung Pyae, Mika Luimula, Satoshi Hongo, Hannu Vuola, and Jouni Smed. 2015. Investigating the effects of motion-based Kinect game system on user cognition. Journal on Multimodal User Interfaces 9 (2015), 403–411. https://doi.org/10.1007/s12193-015-0197-0
  36. NaturalPoint. 2023. TrackIR. https://www.trackir.com/trackir5/
  37. Anja Naumann and Jörn Hurtienne. 2010. Benchmarks for Intuitive Interaction with Mobile Devices. In Proceedings of the 12th International Conference on Human Computer Interaction with Mobile Devices and Services (Lisbon, Portugal) (MobileHCI '10). Association for Computing Machinery, New York, NY, USA, 401–402. https://doi.org/10.1145/1851600.1851685
  38. Nintendo. 2017. Super Mario Odyssey. Game [Nintendo Switch].
  39. Nintendo. 2017. The Legend of Zelda: Breath of the Wild. Game [Nintendo Switch].
  40. Pujana Paliyawan and Ruck Thawonmas. 2017. UKI: universal Kinect-type controller by ICE Lab. Software: Practice and Experience 47, 10 (2017), 1343–1363. https://doi.org/10.1002/spe.2474
  41. CD Projekt. 2020. Cyberpunk 2077. Game [PlayStation 4/5, Xbox One/Series (X/S), Microsoft Windows, Stadia].
  42. Meera Radhakrishnan, Kushaan Misra, and V. Ravichandran. 2021. Applying "Earable" Inertial Sensing for Real-time Head Posture Detection. In 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops) (Kassel, Germany). IEEE, New York, NY, USA, 176–181. https://doi.org/10.1109/PerComWorkshops51409.2021.9430988
  43. Joe Rice-Jones. 2018. Review: Tilted - is this the world's most versatile PC gaming wearable? https://knowtechie.com/review-tilted-gaming-wearable/
  44. Daniel M Shafer, Corey P Carbonara, and Lucy Popova. 2014. Controller required? The impact of natural mapping on interactivity, realism, presence, and enjoyment in motion-based video games. Presence: Teleoperators and Virtual Environments 23, 3 (2014), 267–286. https://doi.org/10.1162/PRES_a_00193
  45. Daniel M. Shafer, Corey P. Carbonara, and Lucy Popova. 2014. Controller Required? The Impact of Natural Mapping on Interactivity, Realism, Presence, and Enjoyment in Motion-Based Video Games. Presence: Teleoperators and Virtual Environments 23, 3 (Oct. 2014), 267–286. https://doi.org/10.1162/PRES_a_00193
  46. Paul Skalski, Ron Tamborini, Ashleigh Shelton, Michael Buncher, and Pete Lindmark. 2011. Mapping the road to fun: Natural video game controllers, presence, and game enjoyment. New Media & Society 13, 2 (March 2011), 224–242. https://doi.org/10.1177/1461444810370949
  47. Torben Sko, Henry Gardner, and Michael Martin. 2013. Studying a Head Tracking Technique for First-Person-Shooter Games in a Home Setting. In Human-Computer Interaction – INTERACT 2013, Paula Kotzé, Gary Marsden, Gitte Lindgaard, Janet Wesson, and Marco Winckler (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 246–263.
  48. Torben Sko and Henry J. Gardner. 2009. Head Tracking in First-Person Games: Interaction Using a Web-Camera. In Human-Computer Interaction – INTERACT 2009, Tom Gross, Jan Gulliksen, Paula Kotzé, Lars Oestreicher, Philippe Palanque, Raquel Oliveira Prates, and Marco Winckler (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 342–355.
  49. PUBG Studios. 2017. PUBG: Battlegrounds. Game [Microsoft Windows].
  50. Chuck Tholl, Peter Bickmann, Konstantin Wechsler, Ingo Froböse, and Christopher Grieben. 2022. Musculoskeletal disorders in video gamers–a systematic review. BMC Musculoskeletal Disorders 23, 1 (2022), 1–16. https://doi.org/10.1186/s12891-022-05614-0
  51. P Truong, L Truong, TKK Le, and K Kuklova. 2020. Orthopedic injuries from video games: a literature review and implications for the future. Int Arch Orthop Surg 3, 2 (2020), 20. https://doi.org/10.23937/2643-4016/1710020
  52. Phil Turner. 2008. Towards an account of intuitiveness. Behaviour & Information Technology 27, 6 (Nov. 2008), 475–482. https://doi.org/10.1080/01449290701292330
  53. Tyler Wilde. 2021. How WASD became the standard PC control scheme. PC Gamer. https://www.pcgamer.com/how-wasd-became-the-standard-pc-control-scheme/
  54. Mauricio Ubilla, Domingo Mery, and Rodrigo F. Cádiz. 2010. Head Tracking For 3d Audio Using The Nintendo Wii Remote. In Proceedings of the 2010 International Computer Music Conference (New York, USA), Vol. 2010. Michigan Publishing, Ann Arbor, Michigan, USA, 494–497.
  55. Daniel Ullrich and Sarah Diefenbach. 2010. INTUI. Exploring the Facets of Intuitive Interaction. In Mensch & Computer, Ziegler Jürgen and Schmidt Albrecht (Eds.). Vol. 10. Oldenbourg Verlag, Düsseldorf, Germany, 251–260. https://doi.org/10.1524/9783486853483.251
  56. WackyJacky101. 2018. GUIDE: How to fluently LEAN & PEEK (Using Q and E) in PUBG +Keyboard Cam. https://www.youtube.com/watch?v=aBfF5rK6CFw
  57. Shuo Wang, Xiaocao Xiong, Yan Xu, Chao Wang, Weiwei Zhang, Xiaofeng Dai, and Dongmei Zhang. 2006. Face-Tracking as an Augmented Input in Video Games: Enhancing Presence, Role-Playing and Control. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Montréal, Québec, Canada) (CHI '06). Association for Computing Machinery, New York, NY, USA, 1097–1106. https://doi.org/10.1145/1124772.1124936
  58. Zehan Wang. 2011. Using a Wii remote for Finger Tracking and Gesture Recognition (1st ed.). LAP LAMBERT Academic Publishing, London, United Kingdom.
  59. WePC. 2023. Video Game Industry Statistics, Trends and Data In 2023. https://www.wepc.com/news/video-game-statistics/
  60. Daniel Wigdor and Dennis Wixon. 2011. Brave NUI World: Designing Natural User Interfaces for Touch and Gesture (1st ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
  61. Zhengyou Zhang. 2012. Microsoft Kinect sensor and its effect. IEEE MultiMedia 19, 2 (Feb. 2012), 4–10. https://doi.org/10.1109/MMUL.2012.24
  62. Huiyu Zhou and Huosheng Hu. 2005. Inertial motion tracking of human arm movements in stroke rehabilitation. In IEEE International Conference Mechatronics and Automation, 2005, Vol. 3. IEEE, New York, NY, USA, 1306–1311. https://doi.org/10.1109/ICMA.2005.1626742
  63. Huiyu Zhou, Thomas Stone, Huosheng Hu, and Nigel Harris. 2008. Use of multiple wearable inertial sensors in upper limb motion tracking. Medical Engineering & Physics 30, 1 (2008), 123–133. https://doi.org/10.1016/j.medengphy.2006.11.010
