Hand Gesture Recognition Using Python and OpenCV

Hand gesture recognition is a system that detects hand gestures in real-time video, classifying each gesture within a certain area of interest. In this study, designing the hand gesture recognition system is a complicated job that involves two major problems. The first is the detection of the hand. The second is creating signs that are suitable for use with one hand at a time. This project concentrates on how a system can detect, recognize and interpret hand gestures through computer vision despite challenging factors such as variability in pose, orientation, location and scale. To perform well, different types of gestures, such as numbers and sign language, need to be created in this system. Each image taken from the real-time video is analysed with a Haar-cascade classifier to detect the appearance of a hand in the frame before further image processing is done. The detection of the hand is carried out using the theory of the Region of Interest (ROI) via Python programming. The explanation of the results focuses on the simulation part, since the only difference in the hardware implementation is the source code that reads the real-time input video. Hand gesture recognition using Python and OpenCV can be implemented by applying the theories of hand segmentation and a hand detection system that uses the Haar-cascade classifier.


Introduction
In day-to-day life, hand gesture recognition is a system that can detect hand gestures in real-time video, classifying each gesture within a certain area of interest. Designing such a system is the goal of this project. The task of recognizing hand gestures is one of the main and important issues in computer vision. With the latest advances in information and media technology, human-computer interaction (HCI) systems increasingly involve hand processing tasks such as hand detection and hand gesture recognition.
The first step in any hand processing system is to detect and locate the hand in the real-time video from the webcam. Hand detection is challenging because of variation in pose, orientation, location and scale; different intensities of light in the room add to the variability. According to Mohamed [1], hand gesture recognition generally involves multiple stages: image acquisition, pre-processing, feature extraction and gesture recognition. Image acquisition involves capturing images from the video frame by frame using a webcam. The captured images then go through pre-processing, which involves colour filtering, smoothing and thresholding. Feature extraction is a method of extracting features of the hand image, such as hand contours, while gesture recognition is a method of recognizing the hand gesture from the extracted features.
In this study, designing the hand gesture recognition system is a complicated job that involves two major problems. The first is the detection of the hand. The user's hand is detected using a webcam in real-time video; the difficulties here are unstable brightness, noise, poor resolution and contrast. The detected hand in the video is then recognized to identify the gesture; this stage involves segmentation and edge detection. According to Karishma [2], various image information such as colour, hand posture and hand shape in real time affects the recognition of gestures. The second problem is creating signs that are suitable for use with one hand at a time. The extraction of the hand must then determine each number and sign used. According to Amiraj [3], the extraction of the hand involves the contour and convexity defects. A convexity defect raises the issue of how to calculate the depth of the defect: some defects have far greater depth than others, so identifying the depth requires some equations. The objectives of this project are: (1) to establish a complete system for detecting, recognizing and interpreting hand gestures through computer vision using Python and OpenCV, and (2) to create the numbers and sign-language gestures shown in the system, in keeping with the name of the project.
This study covers how to implement a complete system that can detect, recognize and interpret the hand using Python and OpenCV under any intensity of light, pose or orientation of the hand. In order to accomplish this, a real-time gesture-based system is developed. In the proposal stage, a system is designed that detects the hand through its contour. The contour of the hand refers to a curve, for a function of two variables, along which the function has a constant value. Besides, to detect the appearance of the hand in a frame of the real-time video, the Haar-cascade classifier is used to track the appearance of the hand before image processing is done. The resulting hand detections are validated for analysis. For the hardware implementation, a wired web camera is used so that the hand gesture recognition can be carried out.
For this project, the prototype itself has limitations in terms of hardware implementation. First of all, the video capturing device used for this project can only be configured up to a 640 by 480 pixel resolution, which may hinder reliable hand detection. To compensate, the web camera must be in a fixed position, up to 30 inches (75 cm) from the hand, to help capture the video more efficiently. Another limitation is the variation in plane and pose; variations can be classified as rotation, translation and scaling of the camera pose [4], [5]. The direction of the web camera is important to obtain a good video without any shadowing effects, provided the intensity of light in the room is sufficient. According to Ray [6], by selecting the correct light intensity and direction, shadows can be reduced to a minimum.

Methodology
The hand gesture recognition project is implemented in the Python programming language with OpenCV [7], [8] as the library. Python produces simple and easily understood code. The Python package used here is NumPy [9], [10]. The image captured by the web camera is processed inside a Region of Interest (ROI), which acts as the wanted area while the outside region, called the background, is ignored [11].
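A minimal sketch of how a frame can be cropped to the ROI so that everything outside it is ignored as background (the coordinates below are hypothetical, not taken from the project):

```python
import numpy as np

# Hypothetical ROI position and size (x, y, width, height); adjust to the camera view.
ROI_X, ROI_Y, ROI_W, ROI_H = 300, 100, 200, 200

def extract_roi(frame):
    """Crop the wanted region from a frame; pixels outside it are treated as background."""
    return frame[ROI_Y:ROI_Y + ROI_H, ROI_X:ROI_X + ROI_W]
```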

Hand Segmentation Methods
According to Nobuyuki Otsu [12], segmentation is done to convert a gray-scale image into a binary image so that there are only two objects in the image: the hand and the background. Otsu's algorithm is used for this segmentation purpose. A very good segmentation requires selecting an adequate gray-level threshold to extract the hand from the background: no part of the hand should contain background, and the background should not contain any part of the hand. In general, the selection of an appropriate segmentation algorithm depends largely on the type of images and the application area. The Otsu segmentation algorithm was tested and found to give good segmentation results for the hand gesture. Otsu's algorithm is a nonparametric and unsupervised method of automatic threshold selection. The basic theory of hand segmentation uses the following equation:

p_i = n_i / MN    (1)

where the pixels of a given picture are represented in L gray levels [1, 2, 3, ..., L] in a digital image of size M × N, the number of pixels at level i is denoted by n_i, and the total number of pixels in the image is MN = n_1 + n_2 + n_3 + ... + n_L. The normalized histogram thus has components p_i. The formula was adapted from Rafael [13]. To apply formula (1), the image of each frame captured from the real-time video is treated as pixels and compared frame to frame. When a pixel holds an equal colour value in consecutive frames, that pixel is eliminated (removed); this is applied to the other pixels within each frame, and the frames are compared with respect to time. When a pixel between two frames does not match, the pixel in the first frame retains its content.
Otsu's thresholding method iterates through all possible threshold values and calculates a measure of spread for the pixel levels on each side of the threshold, i.e. for the pixels that fall in the foreground and in the background. The aim is to find the threshold value where the sum of the foreground and background spreads is at its minimum. Figure 1 shows how the hand image is obtained from the real-time video through a gray-scale filter and then converted by Otsu's algorithm into the thresholded image. The limitation of Otsu's thresholding method is that, as mentioned, the algorithm only considers a bimodal image, i.e. an image whose histogram has two peaks. For such an image, Otsu's method takes a value in the middle of those peaks as the threshold, calculating it automatically from the image histogram. This works only after a frame has been captured from the video and converted from RGB format to gray scale.

Track Gesture Of Hand Using Haar-Cascade Classifier
In order for the program to continuously detect the hand at all times, especially when the hand is static and not moving, the program needs to learn the hand image. This is where machine learning comes in. In this project, a machine learning technique was used to track the gesture of the hand before image processing is done. As suggested by Sander [14], a suitable and readily available machine learning technique for image processing is the cascade classifier. According to Viola [15], a cascade classifier encodes the features the machine needs in order to classify the target object in each frame of images. The type used in this project is the Haar cascade. Viola [15] also suggested that the Haar-cascade classifier is a simple classifier for a machine learner to detect any object.

Region of Interest (ROI)
The Region of Interest, commonly abbreviated ROI, is the region considered the wanted area, while the outside region, called the background, is ignored. As proposed by Sander [14], to detect a hand, the detected region must coincide with the region of interest. In other words, as the classifier tracks the wanted object, in this case the bounding rectangle around the hand contour, the area of the hand must overlap with the region of interest. By using a conditional test for overlapping regions, the system can identify whether the selected region, the rectangle around the hand contour, overlaps with the other active region, the hand itself.
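The conditional overlap test described above reduces to a simple rectangle intersection check; a minimal sketch:

```python
def rects_overlap(a, b):
    """Return True when two (x, y, w, h) rectangles overlap.
    Here a is the bounding rectangle around the hand contour and b the static ROI."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # No overlap if one rectangle lies entirely left of, right of,
    # above, or below the other.
    return not (ax + aw <= bx or bx + bw <= ax or
                ay + ah <= by or by + bh <= ay)
```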
By applying this theory of the ROI in the simulation and hardware implementation, the process of detecting the appearance of the hand is much simpler than other methods such as [16]. Figure 2 shows the criteria under which two ROIs are considered overlapping; these criteria are needed to make sure the program can identify the detection of the hand gesture [16]. For this project the Region of Interest (ROI) is used to observe the gesture of the hand, because the ROI implementation is suitable for detecting two overlapping regions. The system detects only two outputs for each hand gesture recognition, as explained in Table 1.

Table 1. Application of Region of Interest (ROI).

Region of Interest (ROI)    Region implemented for
ROI1                        An active region, in this case the hand itself.
ROI2                        The rectangle sub-window on screen. This region is static (it cannot move outside the frame of the real-time video) and highlights the area involved.

Implementation on Simulation and Hardware prototype
The detection of the hand gesture was computed from the space between the convex hull and the contour of the hand. The convex hull is applied to find the edges of the user's hand by forming a descriptor in which the smallest convex set contains the edges. In this project, the OpenCV library files suitable for these image processing techniques are used. Referring to the flowchart in Figure 3, a Python file for both the simulation and the hardware prototype can be generated. The main program detects the appearance and gesture of the hand using the theory of the Region of Interest (ROI). To track the rectangle around the hand contour, the program must identify whether the selected area of interest, in this project the rectangle, overlaps with the detected hand. If it overlaps, the gesture type, such as a number or sign, is shown together with the area within the region of interest. If there is no overlap (the hand is outside the region of interest), the program displays "Show your hand in a box". To plan the region of interest for this project, a group of coordinates is defined to represent the area of the bounding rectangle around the hand contour. Since there is one area of bounding rectangle, there is one region of interest to be implemented for each type of gesture. After generating the Python main file, the simulation begins: the detection of the hand is tested and the type of gesture is recognized based on the selected area of interest. To run the Python file, the OpenCV environment must be linked and run through the command window.
Then the same procedure applies to the hardware prototype. For this prototype, a USB video web camera is used, as in Nayana [17]. The video capture section of the Python source code must be modified to change the camera from the laptop web camera to the USB video web camera. The procedure for detecting and analysing the gesture of the hand using the USB video web camera remains the same as in the simulation part. Notice that the OpenCV environment is used once again to detect the hand via the Haar-cascade classifier.

Results And Discussion
The explanation of the results focuses on the simulation part, since the only difference in the hardware implementation is the source code for the real-time input video. These results are analysed throughout the project scope. First, the result of the Haar-cascade classifier is explained in detail to show how the hand is detected before image processing. Then the simulation part begins, where the area of interest is calculated. The Python file is the command file that runs the specified program through the Python command script. Table 1 shows the details of the generated Haar-cascade classifier. (1) This section covers the type of classifier used in this project: the OpenCV Haar-cascade classifier.
(2) This is the part where the window size plays a role. The window size is the size of the specified hand image; by default it is fixed to 24 by 24 pixels.
(3) The variation of the threshold and detection window is set. Generally, this depends on the type of camera used to detect hands and recognize gestures.
(4) Each posture was repeated about 10 times to analyse the cascade classifier.
In this project, the data were collected with around 10 repetitions at different scales for each posture, to determine whether the bounding rectangle could detect the hand gesture. Figure 4 shows how the convex hull is applied to find the edges of the user's hand. The convex hull is compared with the hand contour to detect convexity defects, the points where the two differ. First, the vertices of the convex hull, where it changes direction, are calculated. Then, for the hand contour segment between each pair of consecutive convex hull vertices, the pixel in the segment that maximizes the distance from the convex hull is found. This maximum distance is called the defect depth, and these points are called convexity defects.
The proposed hand gesture recognition requires a webcam to capture the user's gesture, preferably against a clear background. Testing was done under normal lighting; the system can recognize and track the user's hand movement. The segmentation process is able to detect the user's movement and count the contours. Each signal is differentiated using a different ROI. Figure 5, Figure 6 and Figure 7 show the output when the Python file is executed. All of the numbers and signs were tested up to the number 5. The experimental results are tabulated in Table 3 and Table 4: in total, around 100 samples were collected, with different scales for each posture, and each posture at each scale was captured 10 times to analyse the gesture. The Haar-cascade classifier determines whether the box in the frame detects the hand gesture. Across all of the different signs, the box in the frame recognized every sign. However, the program has some limitations: changes in the lighting environment still influence the segmentation process, in particular the background removal, and the user's hand and the webcam must be in fixed positions so that the webcam captures the image in the exact area needed to identify the gesture.

Conclusion
In conclusion, based on the results of the project, hand gesture recognition using Python and OpenCV can be implemented by applying the theories of hand segmentation and a hand detection system that uses the Haar-cascade classifier. The system has accomplished the objectives of this project: (1) it establishes a complete system for detecting, recognizing and interpreting hand gestures through computer vision using Python and OpenCV, and (2) it creates the numbers and sign-language gestures shown in the system, in keeping with the name of the project.
As a future recommendation, the system could include additional gestures that allow users with different skin colours and palm sizes to perform more functions easily. The current system only uses the right hand within a specific ROI to perform gestures; a desirable enhancement would be to use both of the user's hands to perform different signs for computer operations. Additionally, background subtraction algorithms could be used for more effective performance. A Graphical User Interface (GUI) could also be implemented so that users can translate a gesture from its meaning to the sign or number and vice versa.