Development of Mouse System for Physically Disabled Person by Face Movement Using Kinect

We developed a system that operates a computer by facial movement using Kinect. Recognizing a person's facial movement makes it possible to operate a computer. The user moves the mouse cursor by changing the face direction and performs a mouse click by opening the mouth or closing an eye. In this paper, we evaluate the effect of face direction on operability and the effect of distance on the recognition rate.


Introduction
Physically disabled people cannot move their limbs freely, so it is difficult for them to use a computer. As the information society develops, the importance of the computer increases. Therefore, support for computer operation is required. Interfaces have recently been developed so that physically disabled people can use a computer. There are two types of interface. One is the contact type, which is operated while the device is attached to the body. The other is the noncontact type, which is operated by recognizing the movement of the body. Although the contact type has the advantage of easy detection of body movement, the user must attach it directly during use. On the other hand, the noncontact type requires parameters to be adjusted for each user. Resolving these problems reduces the burden on the user.
One of the physical movements available to a physically disabled person is facial movement (that is, the face direction and the opening and closing of the mouth or eyes). If a computer can be operated by recognizing facial movement, a physically disabled person can use it. We developed a system which operates a computer by facial movement. Our system uses Kinect to obtain the face direction and to extract feature points of the face. The user moves the mouse cursor by changing the face direction and performs a mouse click by opening the mouth or closing an eye. This reduces the burden on the user.

Kinect
Kinect is a motion capture device developed by Microsoft Corporation. It is equipped with an RGB camera, a depth sensor, and a microphone array, and it can obtain RGB images, depth images, and audio information. Moreover, joint positions of the whole body can be estimated from the obtained depth images. It can also extract detailed facial feature points; there are 1347 feature points that it can extract.

System overview
Our system uses Kinect for operating a computer. The user moves his face in front of Kinect, and the movement is recognized on the basis of information acquired using Kinect. This makes it possible to operate a computer. Our system has three functions for a physically disabled person. Firstly, it controls the mouse cursor according to the face direction. Secondly, it performs a mouse click by recognizing an open mouth. Finally, it performs a mouse click by judging whether the eyes are open or closed. In the following subsections, these functions are explained in detail.

Control of a mouse cursor by face direction
We acquire the angles of face direction using Face Tracking of Kinect for Windows SDK 2.0. The angles used are pitch and yaw, which express the angle (in degrees) of the vertical and horizontal face direction, respectively. If the user turns his face toward Kinect, the values of the face direction are pitch = 0 and yaw = 0. However, the user watches a monitor while using a computer. Therefore, we move the origin of the face direction coordinates to the center of the monitor by considering the difference between the angle when watching the monitor and the angle when watching Kinect. As shown in Fig. 1, this difference θ is expressed using the following equation: θ = arctan(y / z), where y denotes the length between the center of the monitor and Kinect, and z denotes the depth distance between the face of the user and Kinect, which is obtained using Kinect. We define the face direction vector a as a = (yaw, pitch - θ). The mouse cursor is controlled according to the face direction vector a: it is moved in the same direction as the vector, and its moving speed changes according to the magnitude of the vector. When |a| < 10, we regard the face as turned to the front, and the mouse cursor is not moved.
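The cursor control described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the offset angle is θ = arctan(y / z) from the side-view geometry, and the names `offset_angle_deg`, `cursor_step`, and the gain constant `GAIN` are hypothetical. Screen-axis sign conventions and the actual speed mapping are not given in the text and are left out.

```python
import math

DEAD_ZONE_DEG = 10.0  # |a| < 10 is treated as "facing the front" (from the text)
GAIN = 0.5            # hypothetical speed factor (pixels per frame per degree)


def offset_angle_deg(y, z):
    """Angle between looking at Kinect and looking at the monitor center.

    Assumes theta = arctan(y / z), with y and z in the same length unit.
    """
    return math.degrees(math.atan2(y, z))


def cursor_step(yaw, pitch, y, z, gain=GAIN):
    """Return the (dx, dy) cursor displacement for one frame.

    The face direction vector is a = (yaw, pitch - theta). The cursor moves
    along a with a speed proportional to |a|, and stays still in the dead zone.
    """
    ax = yaw
    ay = pitch - offset_angle_deg(y, z)
    if math.hypot(ax, ay) < DEAD_ZONE_DEG:
        return (0.0, 0.0)
    return (gain * ax, gain * ay)
```

A linear gain is only one possible way to make the speed depend on the magnitude of the vector; the paper does not state the mapping it uses.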

Recognition of open mouth
We extract four feature points of the mouth (top, bottom, left, and right) using HD Face of Kinect for Windows SDK 2.0, as shown in Fig. 2. The vertical length of the mouth is found from the coordinates of the top and bottom feature points. The horizontal length of the mouth is found from the coordinates of the left and right feature points. The rate of the open mouth Rm is expressed as Rm = hm / wm, where hm and wm denote the vertical and horizontal lengths of the mouth, respectively. Let Thm be the threshold for judging whether the mouth is open or closed. When Rm > Thm, the mouth is recognized as open. In the present study, the value of Thm is set to 0.4, which was determined experimentally.
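As a small illustration of this test, the sketch below computes Rm from four mouth points and compares it with Thm = 0.4. The function names are hypothetical, and measuring the lengths as Euclidean distances between 2D points is an assumption; the text only says the lengths are found from the feature point coordinates.

```python
import math

TH_M = 0.4  # threshold Thm from the text


def mouth_open_ratio(top, bottom, left, right):
    """Return Rm = hm / wm for four (x, y) mouth feature points."""
    h_m = math.dist(top, bottom)   # vertical length of the mouth
    w_m = math.dist(left, right)   # horizontal length of the mouth
    if w_m == 0:
        return 0.0                 # degenerate input: treat as closed
    return h_m / w_m


def is_mouth_open(top, bottom, left, right, threshold=TH_M):
    """The mouth is recognized as open when Rm > Thm."""
    return mouth_open_ratio(top, bottom, left, right) > threshold
```

Because Rm is a ratio of two lengths, it does not depend on the scale of the face in the image as long as both lengths are measured in the same units.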

Judgment of opened and closed eyes
We extract four feature points of the eye (top, bottom, left, and right) using HD Face of Kinect for Windows SDK 2.0, as shown in Fig. 3. The range on the X-axis is the difference in x-coordinate between the left and right points, and the range on the Y-axis is the difference in y-coordinate between the top and bottom points. We define these ranges as the eye region.
The judgment of an open or closed eye uses binary images. For the binarization, the eye region is first detected in the RGB image, and its RGB data are converted to luminance values. Next, we make a histogram of the luminance values in the eye region. The binarization threshold is decided from the histogram using the discriminant analysis method [2]. Our system determines the threshold in the first frame and uses it for the judgment in subsequent frames. The luminance values are converted to two levels using the threshold. The binary image is made by the above procedure.
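The binarization steps can be sketched as below. The discriminant analysis method is implemented here in its common form, which picks the threshold that maximizes the between-class variance of the histogram (often called Otsu's method). The luminance weights (ITU-R BT.601) and the function names are assumptions, since the text does not state which conversion formula is used.

```python
def luminance(r, g, b):
    """Convert 8-bit RGB to a luminance value (BT.601 weights, an assumption)."""
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class
    sum0 = 0    # sum of luminance values in the dark class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray_rows, t):
    """Return a binary image: 1 for dark pixels (<= t), 0 otherwise."""
    return [[1 if v <= t else 0 for v in row] for row in gray_rows]
```

In line with the text, `otsu_threshold` would be called once on the eye region of the first frame, and the resulting threshold reused by `binarize` in later frames.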
Fig. 2. Feature points of mouth extracted using Face Tracking.
The vertical and horizontal lengths of the eye are determined from a histogram made from the binary image. The rate of the open eye Re is given by Re = he / we, where he and we denote the vertical and horizontal lengths of the eye, respectively. Let The be the threshold for judging whether the eye is open or closed. When Re > The, the eye is judged to be open. When Re < The, the eye is judged to be closed. In the present study, the value of The is set to 0.3, which was determined experimentally.
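One way to obtain he and we from a binary image is sketched below. It assumes that the lengths are taken from projection histograms, that is, the number of rows and columns that contain at least one dark pixel. The text does not spell out this step, so the approach and the function names are illustrative only.

```python
TH_E = 0.3  # threshold The from the text


def eye_extent(binary):
    """Return (h_e, w_e) from a binary eye image (1 = dark pixel).

    Assumption: lengths are the counts of non-empty rows and columns in the
    horizontal and vertical projection histograms.
    """
    row_hist = [sum(row) for row in binary]        # dark pixels per row
    col_hist = [sum(col) for col in zip(*binary)]  # dark pixels per column
    h_e = sum(1 for n in row_hist if n > 0)
    w_e = sum(1 for n in col_hist if n > 0)
    return h_e, w_e


def is_eye_open(binary, threshold=TH_E):
    """The eye is judged open when Re = he / we exceeds The."""
    h_e, w_e = eye_extent(binary)
    if w_e == 0:
        return False  # no dark pixels: treated as closed in this sketch
    return h_e / w_e > threshold
```

For example, a dark region 4 rows high and 10 columns wide gives Re = 0.4 and is judged open, while a single dark row of the same width gives Re = 0.1 and is judged closed.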

Mouse click processing
Mouse click processing is performed by recognizing an intentional movement of the user. A movement is regarded as intentional under either of the following conditions: the open-mouth state lasts for a duration t1, or the closed-eye state lasts for a duration t2, where t1 and t2 are arbitrary times. In the present study, the values of t1 and t2 are set to 1.0 s and 0.8 s, respectively. Mouse click processing is carried out only in the case of |a| < 10.
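The duration conditions can be illustrated with a small dwell timer. This is a sketch under assumptions: the class name `DwellClicker` is hypothetical, and resetting the timer when the face leaves the frontal range (|a| >= 10) and firing only once per gesture are design choices that the text does not specify.

```python
T1_MOUTH_S = 1.0  # t1 from the text: open-mouth duration
T2_EYE_S = 0.8    # t2 from the text: closed-eye duration


class DwellClicker:
    """Fires one click when a state has lasted for at least hold_s seconds."""

    def __init__(self, hold_s):
        self.hold_s = hold_s
        self.start = None   # time at which the current gesture began
        self.fired = False  # whether this gesture already produced a click

    def update(self, active, now, facing_front):
        """Feed one frame; return True exactly once per sustained gesture.

        active: the state (open mouth or closed eye) is detected in this frame.
        now: timestamp of the frame in seconds.
        facing_front: True when |a| < 10.
        """
        if not (active and facing_front):
            self.start = None
            self.fired = False
            return False
        if self.start is None:
            self.start = now
        if not self.fired and now - self.start >= self.hold_s:
            self.fired = True
            return True
        return False
```

One instance created with `T1_MOUTH_S` and another with `T2_EYE_S` would cover the two click conditions.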

Conditions
The experiment was performed in the following computational environment: the PC was an HP ENVY 700-560jp (CPU: Intel(R) Core(TM) i7-4790 CPU 3.60 GHz, memory: 8.00 GB); the OS was Microsoft Windows 8.1 Pro; the development environment was Microsoft Visual C++ 2013 Express Edition. The images were captured by an Xbox One Kinect sensor placed in front of the PC, as shown in Fig. 4.

Method
We conducted three kinds of experiments, each of which was performed by five subjects. In the first experiment, we measured the time taken to move the mouse cursor by face direction from a circle drawn on the screen to another circle drawn at a distance of 20 cm. Each subject moved the cursor in eight directions (vertical, horizontal, and diagonal). This experiment was conducted to assess how operability differs with direction. In the second experiment, we examined the recognition rate of mouse click by open mouth. The distance between Kinect and the subjects was set to 0.6 m, 0.8 m, 1.0 m, 1.5 m, and 2.0 m to evaluate the influence of distance on the recognition rate. Each subject performed the opening movement three times, for one second each. The recognition rate is the proportion of movements for which click processing was actually carried out. In the last experiment, we evaluated the judgment of open or closed eyes. This experiment was conducted at distances of 0.6 m, 0.8 m, 1.0 m, 1.5 m, and 2.0 m between Kinect and the subjects to evaluate the influence of distance on the judgment accuracy. We evaluate the accuracy of judgment with the F-measure, which is expressed by the following equation: F = 2PR / (P + R), where P is the precision rate (the proportion of frames in which the eye is actually closed among the frames judged as closed), R is the recall rate (the proportion of frames judged as closed among the frames in which the eye is actually closed), and F is the harmonic mean of P and R. The ground truth for closed eyes was determined by visual observation.
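The two evaluation measures can be computed as in the following sketch. The function names and the per-frame boolean representation are assumptions made for illustration; the frame lists are not data from the experiment.

```python
def precision_recall_f(judged_closed, truly_closed):
    """Return (P, R, F) for per-frame closed-eye judgments.

    judged_closed: booleans, True when the system judged the eye closed.
    truly_closed: booleans, True when the eye was closed by visual observation.
    """
    hits = sum(1 for j, t in zip(judged_closed, truly_closed) if j and t)
    n_judged = sum(1 for j in judged_closed if j)
    n_true = sum(1 for t in truly_closed if t)
    p = hits / n_judged if n_judged else 0.0
    r = hits / n_true if n_true else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0  # harmonic mean of P and R
    return p, r, f


def recognition_rate(clicks_recognized, attempts):
    """Proportion of intentional movements that actually produced a click."""
    return clicks_recognized / attempts
```

As a hypothetical worked example, with five subjects performing three attempts each there are 15 attempts per distance, so 13 recognized clicks would give a rate of about 86.7%.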

Fig. 5 shows the time taken to move the cursor from one circle to another. Averaged over the five subjects, movement in the bottom direction took the longest time. The second and third longest were movement in the bottom-right and bottom-left directions, respectively. This is probably because the frame of the subject's glasses interfered with his operation.

Mouse click recognition by open mouth
Fig. 6 shows the recognition rate of mouse click by open mouth. The recognition rate at 0.8 m was 86.7%, which was the highest rate. In contrast, the recognition rate at 1.5 m was 73.3%, which was the lowest rate. The cause of the recognition failures was that the rate of open mouth fell below the threshold momentarily because the subject did not open his mouth wide enough.

Judgment accuracy of opened and closed eyes
Fig. 7 shows the precision rate, recall rate, and F-measure of judging opened and closed eyes. The F-measure at 0.8 m was the highest. Conversely, the F-measure decreased greatly at 1.5 m and over. One of the failure cases was that the feature points extracted at 1.5 m and over were out of alignment, so the misplaced eye region was not judged correctly. Moreover, in some cases the shadow around a closed eye was rendered black in the binary image. As a result, the rate of the open eye Re did not fall below The.

Conclusion
We developed a mouse system for physically disabled people using Kinect. In this system, the user can move the mouse cursor by face direction and perform a mouse click by opening the mouth or closing an eye. The results showed no conspicuous influence of distance on the recognition of mouse click by open mouth. The F-measure of the eye judgment decreased greatly at distances over 1.0 m. However, because computer operation is usually performed at 1.0 m or less, the computer can be operated by facial movement. As future work, we intend to develop additional functions by recognizing other facial movements.

Fig. 1. Positional relation of the user and Kinect, viewed from the side.

Fig. 7. Precision rate, recall rate, and F-measure in the experiment on judging opened and closed eyes.