Hand gesture recognition using discrete wavelet transform and hidden Markov models

ABSTRACT


INTRODUCTION
This technology can be used for human physical recognition [1][2][3][4][5][6][7][8][9][10]. One such application is hand recognition, which can serve as a communication tool [11][12][13]. Previous research used DWT with support vector machine (SVM) classification [14]. That study obtained 94% accuracy by performing cross-validation five times on 50 data samples from seven actions. When 231 samples were used as training data and the remaining 119 as test data, the accuracy was 93.27%. Tests with 256×256-pixel images and level 5 decomposition produced an accuracy of 93.14%. DWT provides time and frequency information simultaneously, and wavelets can be chosen and adapted as needed [15]. HMM has the advantage of being able to solve the problems of evaluation, inference and learning [16]. HMMs are used in many applications, have an effective learning algorithm and can handle variations in record structure [2]. According to [17], static hand gesture recognition using HMM achieved an average accuracy of 93.38%.
The purpose of this research was to design a hand gesture recognition system based on digital images, using DWT for feature extraction and HMM as the classification algorithm, and then to test and analyze the system's performance. The research questions were: how to design and simulate a hand gesture recognition system for the dataset using DWT and HMM classification; how changing the input parameters affects system performance; how well the system performs; and how its accuracy and computation time compare with another classification method. The dataset was self-collected and consists of hand gesture images captured with a 13 MP smartphone camera.

RESEARCH METHOD
Discrete wavelet transform
The DWT method was used to extract hand features. It creates a feature matrix from an image whose values represent the matrix of that image. The feature extraction process using DWT is explained with the example in Table 1. First, the average of each pair of pixels along the rows of a hand gesture contour image is calculated, as in Table 2, giving the result in Table 3. Second, the average of each pair of pixels along the columns is calculated from the result in Table 3, as illustrated in Table 4. This produces the contour image values in the LL, LH, HL and HH sub-bands, illustrated in Table 5. Finally, the DWT feature extraction is repeated until every contour image has been extracted successfully [18][19][20][21].
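The row-then-column averaging described above can be sketched as one level of an unnormalized Haar transform. This is a minimal illustration, assuming each pair produces an average (low-pass) and a half-difference (high-pass); the helper names `haar_pairs`, `column_step`, `haar_dwt2` and `ll_features` are hypothetical, and the LH/HL naming (first letter = row filter) is one common convention, not necessarily the one used in Table 5. Repeating the step on the LL band gives the higher decomposition levels; a 128×128 contour image yields a 4,096-value LL vector at level 1.

```python
def haar_pairs(values):
    """Average and half-difference of each consecutive pair (unnormalized Haar step)."""
    pairs = range(0, len(values) - 1, 2)
    averages = [(values[i] + values[i + 1]) / 2 for i in pairs]
    details = [(values[i] - values[i + 1]) / 2 for i in pairs]
    return averages, details


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def column_step(half):
    """Apply haar_pairs down every column; return (low, high) as lists of rows."""
    lows, highs = zip(*(haar_pairs(column) for column in transpose(half)))
    return transpose(lows), transpose(highs)


def haar_dwt2(image):
    """One-level 2D Haar DWT of an image (list of rows) with even height and width.

    Returns (LL, LH, HL, HH); the first letter is the row filter and the second
    the column filter (an assumed naming convention).
    """
    # Rows first (cf. Tables 2-3): pair averages give L, pair differences give H.
    row_low, row_high = zip(*(haar_pairs(row) for row in image))
    # Then the columns of both halves (cf. Tables 4-5).
    ll, lh = column_step(row_low)
    hl, hh = column_step(row_high)
    return ll, lh, hl, hh


def ll_features(image, level=1):
    """Decompose the LL band `level` times and flatten it into a feature vector."""
    band = image
    for _ in range(level):
        band = haar_dwt2(band)[0]
    return [value for row in band for value in row]


ll, lh, hl, hh = haar_dwt2([[1, 3], [5, 7]])
print(ll, lh, hl, hh)  # [[4.0]] [[-2.0]] [[-1.0]] [[0.0]]
```

Libraries such as PyWavelets use the orthonormal Haar filters (division by √2 instead of 2), which rescales the coefficients but keeps the same sub-band structure.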

Hidden Markov models
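Each hidden Markov model is defined by its states, state probabilities, transition probabilities, emission probabilities and initial probabilities. To describe a complete HMM, $\lambda = (A, B, \pi)$, the following five elements must be specified:
a. N, the number of states in the model, with state set $S = \{S_1, S_2, \ldots, S_N\}$; $q_t$ denotes the current state at time $t$.
b. M, the number of distinct observation symbols per state, $V = \{v_1, \ldots, v_M\}$. If the observations are continuous, M is infinite.
c. The state transition probability distribution $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of being in state $S_j$ at time $t+1$ given that the state at time $t$ is $S_i$:
$$a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i), \quad 1 \le i, j \le N.$$
The transition probabilities must satisfy the normalization constraints
$$a_{ij} \ge 0, \quad 1 \le i, j \le N, \qquad \sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N.$$
d. The observation symbol probability distribution in each state, $B = \{b_j(k)\}$, where $b_j(k)$ is the probability that symbol $v_k$ occurs in state $S_j$:
$$b_j(k) = P(o_t = v_k \mid q_t = S_j), \quad 1 \le j \le N, \; 1 \le k \le M,$$
where $o_t$ is the observation symbol at time $t$, drawn from the alphabet $V$. The following stochastic constraints must be met:
$$b_j(k) \ge 0, \qquad \sum_{k=1}^{M} b_j(k) = 1, \quad 1 \le j \le N.$$
e. The initial state distribution $\pi = \{\pi_i\}$, where $\pi_i$ is the probability that the model is in state $S_i$ at the initial time:
$$\pi_i = P(q_1 = S_i), \quad 1 \le i \le N.$$
Before further analysis, two basic HMM problems must be solved for an observation sequence $O = o_1 o_2 \cdots o_T$:
a. Evaluation (forward and backward procedures), computed with a scaling function.
− Scaling function: at each time step the forward variables are normalized, $\hat{\alpha}_t(i) = c_t\,\bar{\alpha}_t(i)$ with $c_t = 1 / \sum_{i=1}^{N} \bar{\alpha}_t(i)$, where $\bar{\alpha}_t$ is given by the initialization ($t = 1$) or by the recursion applied to $\hat{\alpha}_{t-1}$. Then $\log P(O \mid \lambda) = -\sum_{t=1}^{T} \log c_t$.
− Forward
 − Initialization: $\alpha_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N$
 − Recursion: $\alpha_{t+1}(j) = \Big[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\Big]\, b_j(o_{t+1}), \quad 1 \le t \le T-1, \; 1 \le j \le N$
 − Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$
− Backward (scaled with the same coefficients $c_t$)
 − Initialization: $\beta_T(i) = 1, \quad 1 \le i \le N$
 − Recursion: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = T-1, \ldots, 1, \; 1 \le i \le N$
b. Learning. The following quantities are computed to solve the learning problem:
$$\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}, \qquad \gamma_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{P(O \mid \lambda)}.$$
The parameters A, B and π are then re-estimated:
$$\bar{\pi}_i = \gamma_1(i), \qquad \bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad \bar{b}_j(k) = \frac{\sum_{t=1,\, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}.$$
This process is repeated until a satisfactory value is obtained [22][23][24][25][26][27].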
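The evaluation step with scaling can be sketched as follows. This is a minimal illustration for a discrete HMM, assuming observations are already symbol indices (e.g. k-means cluster labels); the function name `forward_scaled` and the list-of-lists representation of A and B are illustrative choices, not the authors' code.

```python
import math


def forward_scaled(pi, A, B, obs):
    """Scaled forward algorithm for a discrete HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[j][k] : probability of emitting symbol k in state j
    obs     : observation sequence as a list of symbol indices
    Returns (scaled alphas, scaling coefficients c_t, log P(O | lambda)).
    """
    n = len(pi)
    alphas, scales = [], []
    for t, symbol in enumerate(obs):
        if t == 0:
            # Initialization: alpha_1(i) = pi_i * b_i(o_1)
            alpha = [pi[i] * B[i][symbol] for i in range(n)]
        else:
            # Recursion: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1}),
            # computed from the previous *scaled* alphas.
            prev = alphas[-1]
            alpha = [sum(prev[i] * A[i][j] for i in range(n)) * B[j][symbol]
                     for j in range(n)]
        # Scaling: c_t = 1 / sum_i alpha_t(i); assumes the sequence has nonzero probability.
        c = 1.0 / sum(alpha)
        alphas.append([value * c for value in alpha])
        scales.append(c)
    # Termination: log P(O | lambda) = -sum_t log c_t
    return alphas, scales, -sum(math.log(c) for c in scales)


log_p = forward_scaled([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]],
                       [[0.9, 0.1], [0.2, 0.8]], [0, 1])[2]
print(round(math.exp(log_p), 6))  # 0.209
```

Working in log space with per-step scaling keeps the values from underflowing when the observation sequence is long, which is why the scaling function is computed before the forward pass.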

Image pre-processing
In this research, a system was designed to recognize hand gestures from images. The general design is illustrated in Figure 1. In the training stage, shown in Figure 1 (a), the inputs were RGB training images from the dataset, and DWT was used as the feature extraction method. The final step was to train the HMM parameters of each class with the forward and backward procedures, using the feature vectors of the training images as input. In the testing stage, shown in Figure 1 (b), the inputs were RGB test images from the dataset, from which contour images were generated by image resizing and skin color segmentation. These were then processed with DWT and HMM: the forward parameters were calculated and the class with the highest probability was selected.
The image pre-processing shown in Figure 1 proceeded as follows. First, the image was resized to 128×128 pixels. Second, the image was converted from RGB to YCbCr and to the blue layer; the input to this step was the RGB hand gesture image. Third, the skin was segmented by thresholding the pixel values, producing a segmented image. Fourth, denoising removed noise from the signal while preserving its characteristics. Fifth, the noise that could not be removed in the previous step was filled in. Sixth, dilation thickened the edges of the segmented image so that the required pixels could be detected. Seventh, erosion eroded the edges of the segmented image so that unnecessary pixels were removed. The output was the YCbCr-layered hand gesture contour. The main purpose of pre-processing was to separate the background from the object, which in this research was the right hand, as shown in Figure 2.
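The colour conversion, skin thresholding, dilation and erosion steps can be sketched in plain Python. This is a minimal illustration under stated assumptions: the conversion uses the full-range ITU-R BT.601 (JPEG) coefficients, the Cb/Cr ranges `CB_RANGE` and `CR_RANGE` are hypothetical illustration values (the paper does not report its thresholds), and resizing, denoising and hole filling are omitted.

```python
CB_RANGE = (77, 127)   # hypothetical skin thresholds, not taken from the paper
CR_RANGE = (133, 173)  # hypothetical skin thresholds, not taken from the paper


def rgb_to_ycbcr(r, g, b):
    """Full-range ITU-R BT.601 (JPEG) conversion of one 8-bit RGB pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def skin_mask(rgb_image):
    """Binary mask: 1 where the pixel's Cb and Cr both fall inside the skin ranges."""
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            _, cb, cr = rgb_to_ycbcr(r, g, b)
            inside = (CB_RANGE[0] <= cb <= CB_RANGE[1]
                      and CR_RANGE[0] <= cr <= CR_RANGE[1])
            mask_row.append(1 if inside else 0)
        mask.append(mask_row)
    return mask


def _morph(mask, combine):
    """3x3 binary morphology; `combine` is any() for dilation, all() for erosion.

    Neighbours that fall outside the image are skipped.
    """
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(h):
        out.append([
            1 if combine(
                mask[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ) else 0
            for x in range(w)
        ])
    return out


def dilate(mask):
    """Thicken the segmented region: a pixel is on if any 3x3 neighbour is on."""
    return _morph(mask, any)


def erode(mask):
    """Thin the segmented region: a pixel stays on only if all 3x3 neighbours are on."""
    return _morph(mask, all)
```

A skin-toned pixel such as (200, 140, 120) has Cb ≈ 108 and Cr ≈ 160 and therefore lands inside the assumed ranges, while a saturated blue pixel does not.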

Feature extraction and classification
In this research, DWT was used to find the hand features and to create a feature matrix representing the values of the image. The result was the contour image values in the LL, LH, HL and HH sub-bands, as in the example in Table 5. In the HMM classification process illustrated in Figure 3, the input was the combined feature vector of the training images obtained from DWT feature extraction. HMM also requires A, B, π, the number of states and the number of clusters. The number of states had to be set, and the cluster values used as the observation symbols were computed with k-means. The next step was calculating the forward variable, which consists of initialization [10,28], recursion and termination [2].
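Turning DWT features into discrete HMM observation symbols with k-means can be sketched as vector quantization. This is a minimal illustration, assuming each image yields a sequence of feature vectors (a scalar feature is a one-element vector); using the first k vectors as initial centroids is a simplifying assumption, not the paper's initialization, and `kmeans` and `quantize` are hypothetical helper names.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        # Assignment step: each vector joins the group of its nearest centroid.
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            groups[nearest].append(v)
        # Update step: move each centroid to the mean of its group (keep it if empty).
        for c, members in enumerate(groups):
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def quantize(vectors, centroids):
    """Map every feature vector to the index of its nearest centroid (an HMM symbol)."""
    return [min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors]


vectors = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(quantize(vectors, kmeans(vectors, 2)))  # [0, 0, 1, 1]
```

The number of clusters k plays the role of M, the alphabet size of the discrete HMM, which is why the cluster count is tuned as an HMM parameter.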
Before the forward calculation, the scaling function was computed. Next came the backward algorithm, which consists of two stages: initialization and recursion. The variables $\xi_t(i,j)$ and $\gamma_t(i)$ were then calculated from the forward and backward variables. After these four variables were obtained, the parameters A, B and π were re-estimated. Finally, each test image was assigned to the class with the highest probability value, which gave the final hand gesture classification.
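One Baum-Welch re-estimation step and the highest-probability decision rule can be sketched as follows. This is a minimal illustration for a single discrete observation sequence (training on many images would sum the expected counts over all sequences); `forward_backward`, `baum_welch_step`, `log_likelihood` and `classify` are hypothetical names, and ξ and γ are normalized per time step, which cancels the scaling coefficients.

```python
import math


def forward_backward(pi, A, B, obs):
    """Scaled forward (alpha) and backward (beta) variables for a discrete HMM."""
    n, T = len(pi), len(obs)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [pi[i] * B[i][obs[0]] for i in range(n)]
        else:
            a = [sum(alpha[-1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                 for j in range(n)]
        c = 1.0 / sum(a)
        alpha.append([x * c for x in a])
        scale.append(c)
    beta = [[0.0] * n for _ in range(T)]
    beta[T - 1] = [scale[T - 1]] * n  # beta_T(i) = 1, scaled by c_T
    for t in range(T - 2, -1, -1):
        beta[t] = [scale[t] * sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                                  for j in range(n)) for i in range(n)]
    return alpha, beta, scale


def baum_welch_step(pi, A, B, obs):
    """One re-estimation of (pi, A, B); assumes len(obs) >= 2."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, _ = forward_backward(pi, A, B, obs)
    gamma, xi = [], []
    for t in range(T):
        g = [alpha[t][i] * beta[t][i] for i in range(n)]
        total = sum(g)
        gamma.append([x / total for x in g])  # gamma_t(i)
    for t in range(T - 1):
        x = [[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
              for j in range(n)] for i in range(n)]
        total = sum(map(sum, x))
        xi.append([[v / total for v in row] for row in x])  # xi_t(i, j)
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1)) for j in range(n)]
             for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k)
              / sum(gamma[t][j] for t in range(T)) for k in range(m)]
             for j in range(n)]
    return new_pi, new_A, new_B


def log_likelihood(pi, A, B, obs):
    """log P(O | lambda) from the scaling coefficients."""
    return -sum(math.log(c) for c in forward_backward(pi, A, B, obs)[2])


def classify(models, obs):
    """Return the label whose (pi, A, B) model gives the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(*models[label], obs))
```

Each Baum-Welch step never decreases the likelihood of the training sequence, so repeating it until the change becomes negligible corresponds to iterating "until a satisfactory value is obtained".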

RESULTS AND ANALYSIS
System testing was performed on the self-collected dataset, with images resized to 128×128 pixels. The purpose of the testing was to compare accuracy and system performance and to find the best-performing parameters for the hand gesture recognition system. In total, 250 images from the dataset were used. The hand gesture images cover 5 classes, each consisting of 50 images.

Testing the system parameters
The goal of parameter testing was to find the parameters with the best performance, specifically in terms of accuracy and computation time.
− Layer-type parameters impact
This test used one layer type at a time; DWT feature extraction was then performed and the features were classified with HMM, as shown in Table 6. Table 6 shows that the best results were obtained in the YCbCr color space, with the blue layer giving the highest accuracy. This is attributed to the pixel distribution of the blue layer shown in Figure 4: compared with the other layer types, it has a high frequency of pixels, ranging from 0 to 45, over intensity values 0 to 231.
− Sub-band-type parameters impact
This test used the layer that performed best in the previous test, the blue layer, and varied the DWT sub-band over the four types: low-low (LL), low-high (LH), high-low (HL) and high-high (HH). The results are given in Table 7, which shows that the best sub-band type was LL. The LL sub-band is smoother than the other sub-band types, as shown in Figure 5.

− Decomposition level parameters impact
This test analyzed DWT decomposition levels 1, 2, 3 and 4 on the dataset, using the best parameters from the two previous tests: the blue layer and the LL sub-band. The performance results are given in Table 8, and the features obtained at each decomposition level are plotted in Figure 6. Increasing the decomposition level reduced the number of features obtained. The smaller the decomposition level, the faster the computation time. However, this did not hold for accuracy: several decomposition levels gave feature values distinct enough to differentiate between classes.
− Mother wavelet parameters impact
The tests were carried out with four mother wavelets: Haar, db3, db5 and db7, using the best parameters from the previous tests: the blue layer, LL sub-band and level 1 decomposition. The performance results are listed in Table 9. The best results were obtained with the Haar mother wavelet. The graph in Figure 7 shows that different mother wavelets produce differently shaped features for the same class. Thus, the choice of mother wavelet can give each class distinctive features so that the classes can be distinguished from each other.
Figure 7. Feature values of various mother wavelets
− Amount of cluster parameters impact
This test examined the number of clusters used in HMM classification. The numbers of clusters analyzed were 50, 100, 200, 400, 800 and 1000, using the best parameters from the previous tests. Table 10 shows that the best number of clusters was 800. The graph in Figure 8 shows that with 50 clusters, the features of the same class gave lower accuracy than with 800 clusters.
− Number of state impact
The next step was to test how the number of states used in HMM classification affects system accuracy and computation time. The numbers of states tested were 4, 5, 25, 50, 100 and 150. The best performance was obtained with 5 states; the remaining results are listed in Table 11. This is because an HMM essentially divides the data into as many parts as there are states, so if the number of states is inappropriate, the test data become difficult to identify.

Testing the data batch
The tested data splits are shown in Table 12. The results show that the DWT and HMM recognition system identifies gestures well when the training and test images make up 60% and 40%, respectively, of the data in each class.
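A per-class (stratified) split keeps the same train/test ratio in every class; with 50 images per class, a 60% training fraction gives 30 training and 20 test images per class. This is a minimal sketch; `split_per_class`, the fixed random seed and the sample identifiers in the usage line are hypothetical.

```python
import random


def split_per_class(samples_by_class, train_fraction=0.6, seed=0):
    """Split each class separately so every class keeps the same train/test ratio.

    samples_by_class: dict mapping a class label to a list of samples.
    Returns (train, test) as lists of (sample, label) pairs.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(s, label) for s in shuffled[:cut]]
        test += [(s, label) for s in shuffled[cut:]]
    return train, test


# Hypothetical sample identifiers for the five gesture classes, 50 images each.
classes = ["A", "B", "C", "point", "five"]
dataset = {c: [f"{c}_{i:02d}" for i in range(50)] for c in classes}
train, test = split_per_class(dataset, train_fraction=0.6)
print(len(train), len(test))  # 150 100
```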

Classification testing
Classification testing compared the accuracy and computation time of two classification methods, K-Nearest Neighbor (K-NN) and HMM. Two scenarios were used: training data tested against training data, and training data tested against test data, as shown in Table 13. Based on Table 12, HMM had lower accuracy in the training-to-training scenario than in the training-to-test scenario. This is because the training-to-training scenario used a 50/50 data split, whereas the training-to-test scenario used a 60/40 split. Table 13 also shows that the test was performed on another dataset, the Marcel static hand posture database [29,30], and that accuracy and computation time were better on the authors' dataset. The Marcel dataset, with images resized to 76×66 pixels, performed worse than the authors' dataset; in this research, the right hand was used, as shown in Figure 9.
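For comparison, a K-NN classifier on the same feature vectors can be sketched as below. This is a minimal illustration; the value k = 3, the Euclidean distance and the names `knn_predict` and `accuracy` are assumptions, since the paper does not state its K-NN settings.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors.

    train: list of (feature_vector, label) pairs; distance is Euclidean.
    Ties go to the label that appears first among the sorted neighbours.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Percentage of test pairs whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return 100.0 * correct / len(test)


train = [([0, 0], "A"), ([0, 1], "A"), ([5, 5], "B"), ([6, 5], "B")]
print(knn_predict(train, [0.2, 0.3]))  # A
```

Unlike the HMM, K-NN needs no training phase, but every prediction compares the query against all stored training vectors.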

CONCLUSION
This paper proposed a hand gesture recognition system for 5 gestures: letter A, letter B, letter C, point and number 5 (five). The best parameters were the blue layer, LL sub-band, level 1 decomposition and Haar mother wavelet for DWT, and 800 clusters and 5 states for HMM. The resulting accuracy and computation time were 72% and 53 seconds, respectively. The best data split was 30 training images and 20 test images per class. Layers with high accuracy had a good contrast and brightness ratio. The dataset images had high contrast and brightness in the blue layer due to its high pixel frequencies, ranging from 0 to 45, over intensity values 0 to 231, compared with the other layers.
DWT had three test parameters: sub-band type, decomposition level and mother wavelet. The sub-band parameter yielded smooth image features in the LL sub-band. The decomposition level converts the image into a simpler form to obtain good distinctive features. Different mother wavelets produced different, distinctive features. HMM classification had two test parameters: the number of clusters and the number of states. The cluster parameter determines which features are used. The number of clusters must be chosen appropriately; otherwise the feature values of different classes become similar. Training-to-training testing had a lower accuracy of 55% than training-to-test testing, which reached 72%. This is because the training-to-training scenario used a 50/50 data split, whereas the training-to-test scenario used a 60/40 split. We decided to create our own dataset because the Sebastien Marcel dataset gave only 58% accuracy with 76×66-pixel images, using level 2 decomposition and the db5 mother wavelet. This is because feature extraction with level 2 DWT followed by HMM classification compressed the data three times, so the gestures taken from the images became so small that they were harder to classify. Hence, to increase accuracy we produced our own dataset with good brightness and contrast values. In addition, the resolution was increased to 128×128 pixels, raising the accuracy by 14 percentage points to 72%.

TELKOMNIKA
Telecommun Comput El Control  Hand gesture recognition using discrete wavelet transform and... (Erizka Banuwati Candrasari) 2271 accuracy, there were several values for level decomposition parameters that had clear characteristic values to able to made different between classes.

Figure 4. Histogram of blue layer images
Figure 5. Illustration of images in sub-band

Table 2. Illustration of the calculation process of the average pixel pair based on the row

Table 3. Illustration of the pixel pairs resulting from the row calculation

Table 4. Illustration of the process of calculating the average pixel pair

Table 5. Illustration of the result of calculating the average pixel pair based on the column

Table 6. Layer-type parameter performance

Table 7. Sub-band-type parameter performance

Table 9. Mother wavelet parameter performance (columns: mother wavelet, total testing data, total correct data, accuracy (%), computation time (s))

Table 10. Number-of-clusters parameter performance

Table 11. Number-of-states parameter performance (columns: number of states, total testing data, total correct data, accuracy (%), computation time (s))

Table 12. Data batch testing performance

Table 13. Classification testing performance of the classification methods