Abstract

Recent years have witnessed renewed interest in developing skin segmentation approaches. Skin feature segmentation has been widely employed in computer vision applications, including face detection and hand gesture recognition systems, mostly because skin colour is an attractive and effective feature for object segmentation. However, using human skin colour as a feature to segment dynamic hand gestures poses certain challenges, owing to varying illumination conditions, complicated environments, and the computation time required for real-time operation. These challenges have rendered many skin colour segmentation approaches insufficient. Therefore, to produce simple, effective, and cost-efficient skin segmentation, this paper proposes a skin segmentation scheme. The scheme includes two procedures for calculating generic threshold ranges in Cb-Cr colour space. The first procedure uses threshold values trained online from nose pixels of the face region. The second, known as the offline training procedure, uses thresholds trained from skin samples and a weighted equation. The experimental results showed that the proposed scheme achieved good performance in terms of efficiency and computation time.

1. Introduction

Dynamic hand gesture segmentation is the foundation step of the whole hand gesture tracking and recognition system. Since the quality of its results affects the follow-up procedures of the hand gesture recognition system [1], the dynamic hand gesture segmentation process requires improving the segmentation accuracy, achieving real-time segmentation, reducing the influence of lighting conditions, and precisely segmenting the moving hand gesture from a complicated environment or under different brightness conditions.

In previous years, skin colour feature segmentation played a pivotal role in dynamic hand gesture segmentation, since skin colour is invariant to hand scale changes and posture variations [2]. The existing body of studies on skin colour segmentation has suggested the use of colour spaces to perform human skin segmentation by a particular skin colour threshold [3–5]. This is because the skin colour threshold approach can easily and simply segment skin features, including hand skin, from the background for use within dynamic hand gesture segmentation methods [6]. However, the skin colour threshold approach is limited by illumination variations, as well as by interference from background objects of similar colour to skin [7]. Therefore, many researchers have proposed approaches that depend on a set of conditions derived from the region occupied by skin in 2D or 3D colour spaces. Such methods test each pixel against these conditions to decide the class of that pixel. A number of methods have been developed to segment human skin colour, including hand skin. Mahmoodi et al. [8] combined ternary images with the outcome of a frame differencing technique in a skin-motion segmentation scheme, which improved the Bayesian classifier and feedback mechanism. However, they [8] stated that the initial seed generator based on skin-motion segmentation is crucial and requires further improvement. In addition, there is a need for a mechanism to cope with strongly lit areas.

Qiu-yu et al. [9] presented a method based on YCbCr colour space and the k-means clustering algorithm. However, their method is unsuitable under complex situations and is limited in real-time performance.

Tan et al. [10] developed a framework based on a smoothed 2D histogram and a Gaussian model. However, the success of their method relies on eye detector algorithms, and they [10] mentioned that the framework is still limited and subject to further enhancement.

Gupta and Chaudhary [11] employed an automated colour space switching method based on various colour spaces, the statistical mean of the skin pixel values in the image, and Bayesian approaches. Nonetheless, their model [11] incurs a high computational time. Moreover, it is limited by varying lighting conditions and different backgrounds.

Yeo et al. [12] utilised the skin colour in the Y, Cb, and Cr colour space components, which were extracted, smoothed by a morphology operation to reduce clutter, and finally combined using a logical "AND" operation. Nonetheless, the algorithm was found to function better in indoor situations under normal lighting conditions and may degrade under very dark or very bright illumination.

Asaari et al. [13] used a generic thresholding skin segmentation algorithm based on skin thresholds in Cb-Cr colour space, which were calculated using an equation from the mean and standard deviation of the pixels of cropped skin regions. Unfortunately, the obtained thresholds showed a reduced capability in detecting and segmenting skin colour features, including hand skin, under low illumination conditions.

Kawulok et al. [14] introduced two strategies combined in a hybrid adaptation system: the first strategy uses the detected facial area, and the other utilises a self-adaptive scheme that uses the global model response to create a local skin colour model. The extracted local skin colour model was used to obtain seeds for the geodesic distance transform that defines the skin region boundaries. Although the introduced method offers an essential improvement in skin detection accuracy, the results remain unsatisfactory, and the algorithm requires further improvement to reduce the false positive rate and to become more adaptive.

Thus, to ensure correct, simple, and low-computation-time segmentation of skin colour, including hand skin, under varying illumination conditions and in complex environments, this paper proposes an enhanced threshold selection technique within a skin colour segmentation scheme. This scheme includes two procedures for calculating generic threshold ranges in Cb-Cr colour space. The first procedure performs skin segmentation using threshold values trained online from nose pixels of the face region, based on the Viola–Jones method and the Fast Marching Method (FMM) [15, 16]. The second procedure, named the offline training procedure, is executed as an alternative when the online training procedure fails to detect the face region, that is, under the particular circumstances in which the Viola–Jones algorithm degrades. The offline training procedure calculates the threshold range offline from skin samples using a weighted equation.

2. The Proposed Skin Segmentation Scheme

Skin colour is considered a significant feature for discriminating between skin and nonskin regions in an image. Skin colour information is also robust against geometric variations caused by scaling, rotation, or translation. RGB is the default colour space, commonly used for processing, storing, and displaying digital images. Any other colour space derived from RGB through a linear or nonlinear transformation can be converted back again, with the computational cost depending on the colour space. The colour space transformation acts to reduce the overlap between skin and nonskin distributions, thus facilitating skin pixel classification and improving robustness against illumination influences. Prior studies have revealed that skin colours differ more in brightness than in chrominance [17, 18].

In this study, YCbCr colour space was adopted to represent skin colour since the transformation from RGB to YCbCr is less complicated than that of other colour spaces, which makes YCbCr a favourable choice for a skin segmentation task [13, 17]. YCbCr colour space is derived in such a way that the illumination (luminance) information is concentrated in a single component (Y), while the colour (chrominance) information is concentrated in the Cb and Cr components. The conversion from RGB to YCbCr is obtained from a linear relationship in which Y represents the grey-level information, whereas Cb and Cr describe the colour variance with respect to the blue and red colour channels.
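
The linear RGB-to-YCbCr relationship can be illustrated with a minimal sketch. The coefficients below are an assumption: they are the widely used full-range BT.601 weights (as in JPEG/JFIF), and the coefficients of the study's own conversion equation may differ. The function name rgb_to_ycbcr is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr.

    Assumption: full-range BT.601 weights (JPEG/JFIF style); the
    coefficients used in the study may differ.
    """
    # Luminance: weighted sum of the three channels (grey-level information).
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Chrominance: blue-difference and red-difference, offset to centre on 128.
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


if __name__ == "__main__":
    # A neutral grey pixel carries no colour, so Cb and Cr sit at the 128 offset.
    print(rgb_to_ycbcr(128, 128, 128))
```

Because each chrominance row of weights sums to zero, any grey pixel maps to Cb = Cr = 128 regardless of its brightness, which is why brightness changes mostly affect Y.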

To reduce the effects of brightness on skin colour regions, this study adopts the view of previous studies [13, 17, 19], which consider only the chrominance components (Cb and Cr) and discard the brightness component (Y) for the skin segmentation process. The brightness component (Y) is discarded because illumination can vary greatly over a skin region under various lighting effects, which makes it difficult to select the range of skin colour values.

To segment skin regions, the study in [13] used the mean and standard deviation of skin region samples in Cb-Cr colour space to calculate the decision boundary (threshold values) for skin colour segmentation. However, the calculated thresholds gave weak or incorrect segmentation of skin regions under low lighting, including the misdetection of skin colour pixels in somewhat dim or strongly lit situations. This degraded the skin colour segmentation scheme, which directly affected the results of the hand gesture segmentation algorithm.

To cope with the illumination problem, this study proposes a new skin colour segmentation scheme. This scheme calculates a new decision boundary (threshold values) based on the maximum and minimum range values in Cb-Cr colour space. The thresholds for range-based segmentation were calculated using either the online training procedure, from nose pixels of the face region, or the offline training procedure, from a number of skin samples. Figure 1 displays the flowchart of the skin colour segmentation scheme.

The proposed skin colour segmentation scheme begins with the online training process. In this process, the Viola–Jones algorithm [20, 21] was applied in YCbCr colour space to detect the face region and then the nose region. The threshold values were selected from the detected nose region by calculating the minimum and maximum values of the Cb and Cr chrominance components, giving the range (minCr, maxCr, minCb, and maxCb). Figure 2 shows a sample of online training thresholds. Moreover, to limit the computational complexity, the online training procedure was applied only once. The face region was later discarded from the image. Consequently, the binary information of skin areas in the image was segmented via the thresholding operation depicted in (4).

Furthermore, the Fast Marching Method (FMM) [15, 16] was applied to the segmented skin feature from the previous step to correct the boundary and to compensate for holes or missing pixels in the hands and other skin colour regions of the image frame, which arise from unequal brightness distribution over face parts and other skin colour regions in the frames of the video sequence. The FMM was able to track the boundary of moving objects and segment them from the image with low computation time [22], using a formula of the form BW = FMM(W, MASK, THRESH), where BW represents the segmented image; W holds the weights for every pixel of the input array; MASK gives the seed locations; and THRESH is a positive scalar in the range [0, 1]. THRESH identifies the level at which the output of the FMM is thresholded to obtain the output binary image BW. Here, THRESH is set to 0.001 based on experiments and observations.

However, under low lighting conditions and face rotation, the Viola–Jones algorithm exhibited decreased performance in detecting the face and/or nose regions in the video frames, which caused the online training procedure to fail and stop. To deal with this issue, the offline training procedure was run using thresholds (value ranges) that were calculated separately in advance. In the offline procedure, the threshold values were calculated by the weighted equation (3) from a number of picked skin colour samples in Cb-Cr colour space. Specifically, 11 skin samples were manually cropped from the face regions of 11 individuals, who were randomly selected from video files of the IBGHT dataset. The maximum and minimum values of Cb and Cr for every skin sample were extracted as threshold values; Figure 3 illustrates the skin colour samples. Finally, the threshold values for the Cb and Cr components required for skin colour segmentation were obtained using (3), where α is the weight, set to 0.02 by experiments and observation; thr1 and thr2 are the first and second extracted threshold vectors based on the minimum and maximum range values Cr_min, Cr_max, Cb_min, and Cb_max of the skin samples (Figure 3); and thresholdsvect represents the calculated thresholds for skin colour segmentation based on the Cr-Cb range. The equation is inspired by the one used in [23] for adaptive human motion feature extraction. Figure 4 shows the threshold values calculated for skin segmentation.
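
The exact form of the weighted equation is not recoverable from the text here, so the following is only a sketch under a stated assumption: that thr1 and thr2 are blended as a convex combination, (1 - α)·thr1 + α·thr2, in the style of running-average updates common in adaptive motion extraction. The function name blend_thresholds, the blending form, and the sample numbers are all hypothetical; only α = 0.02 and the vector layout (Cr_min, Cr_max, Cb_min, Cb_max) come from the text.

```python
ALPHA = 0.02  # weight value reported in the text


def blend_thresholds(thr1, thr2, alpha=ALPHA):
    """Blend two threshold vectors (Cr_min, Cr_max, Cb_min, Cb_max).

    Assumption: a convex combination (1 - alpha) * thr1 + alpha * thr2.
    The study's actual weighted equation may take a different form.
    """
    if len(thr1) != len(thr2):
        raise ValueError("threshold vectors must have the same length")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(thr1, thr2)]


if __name__ == "__main__":
    # Hypothetical threshold vectors, for illustration only.
    thr1 = [130, 170, 80, 120]
    thr2 = [140, 180, 90, 130]
    print(blend_thresholds(thr1, thr2))
```

With a small α such as 0.02, the result under this assumed form stays close to thr1 and is nudged only slightly towards thr2.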

Finally, image binarisation was performed to extract the skin regions from images by the thresholding operation depicted in (4).
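
A minimal sketch of such a range-based thresholding operation follows. It assumes that a pixel is labelled skin (1, white) when both its Cb and Cr values fall inside the trained ranges, with inclusive bounds; the inclusive comparison, the dictionary keys, and the function name segment_skin are assumptions made for illustration.

```python
def segment_skin(cb, cr, thresholds):
    """Return a binary mask: 1 where (Cb, Cr) lies inside the skin range.

    cb, cr: 2D lists (rows of pixel values) of equal shape.
    thresholds: dict with keys minCb, maxCb, minCr, maxCr.
    Assumption: bounds are inclusive on both ends.
    """
    t = thresholds
    mask = []
    for cb_row, cr_row in zip(cb, cr):
        mask.append([
            1 if (t["minCb"] <= b <= t["maxCb"] and t["minCr"] <= r <= t["maxCr"]) else 0
            for b, r in zip(cb_row, cr_row)
        ])
    return mask


if __name__ == "__main__":
    # Hypothetical 2x3 chrominance planes and threshold ranges.
    cb = [[100, 110, 90], [105, 130, 108]]
    cr = [[150, 155, 150], [140, 152, 160]]
    ranges = {"minCb": 100, "maxCb": 120, "minCr": 145, "maxCr": 160}
    for row in segment_skin(cb, cr, ranges):
        print(row)
```

Only the two chrominance planes are consulted, which matches the decision to discard the brightness component.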

3. The Experimental Results and Discussion

For performance evaluation, unfortunately, no standard datasets appropriate for video skin colour segmentation algorithms have yet been collected [8]. The Feeval dataset [24] is the only one available; however, it is of insufficient quality, with imprecise ground truths. Additionally, Mahmoodi et al. [8] used their self-made SDD dataset [25], which includes 33 videos. Yet, their dataset was not made available to other researchers.

Therefore, video files of the IBGHT dataset [26] were used, as they contain the highlighted challenges for hand gesture segmentation and detection approaches. The skin segmentation scheme proposed in this study concentrates on skin feature segmentation, including hand gesture skin, in video frames for a dynamic hand gesture segmentation and detection method. The IBGHT video sequences were captured by a low-cost USB camera at 352 × 288 image resolution [26]. In addition, the IBGHT dataset comprises 60 video sequences with indoor and outdoor scenes.

However, the IBGHT dataset does not include ground truth for skin feature segmentation. Thus, the performance of the proposed scheme was subjectively compared with skin segmentation based on the thresholds of previous studies. In addition, its computation time was compared with that of the skin segmentation scheme developed by [13].

As depicted in Figure 1, the proposed algorithm for skin colour segmentation started with an online training procedure in which the input image frame of the video sequence had already been converted into YCbCr colour space. Thereafter, as observed in Figure 5, the Viola–Jones algorithm was applied to detect the face region, where the output of face detection is the boundary box around the face region (face_box). The face_box parameter was represented by a numeric vector in which x and y give the corner point of the boundary box in the x and y directions, followed by the width and height of the boundary box. Next, after the face region was successfully detected, the Viola–Jones algorithm was applied to detect the nose region. The output of the nose detection is the boundary box around the nose region (nose_box), as shown in Figure 5.

As illustrated in Figures 5 and 6, the Cb and Cr nose images were cropped based on the nose boundary box (nose_box) in terms of x, y, width, and height. After that, the threshold values for Cb and Cr were trained online by calculating the maximum and minimum values inside each cropped Cb and Cr image of the detected nose region, giving the required parameters (minCr, maxCr, minCb, and maxCb).
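
The cropping and min/max step above can be sketched as follows. This is an illustration under assumptions: the box is taken as (x, y, width, height) with a zero-based top-left corner, where x indexes columns and y indexes rows, and the helper names crop and train_online_thresholds are hypothetical. The detection of nose_box itself is outside the sketch.

```python
def crop(channel, box):
    """Crop a 2D list to box = (x, y, width, height).

    Assumption: (x, y) is the zero-based top-left corner,
    x indexing columns and y indexing rows.
    """
    x, y, w, h = box
    return [row[x:x + w] for row in channel[y:y + h]]


def train_online_thresholds(cb, cr, nose_box):
    """Return min/max Cb and Cr values found inside the nose box."""
    cb_vals = [v for row in crop(cb, nose_box) for v in row]
    cr_vals = [v for row in crop(cr, nose_box) for v in row]
    if not cb_vals or not cr_vals:
        raise ValueError("nose box does not overlap the image")
    return {
        "minCr": min(cr_vals), "maxCr": max(cr_vals),
        "minCb": min(cb_vals), "maxCb": max(cb_vals),
    }


if __name__ == "__main__":
    # Hypothetical 4x4 chrominance planes; Cr is Cb shifted by 40.
    cb = [[100 + 4 * r + c for c in range(4)] for r in range(4)]
    cr = [[v + 40 for v in row] for row in cb]
    print(train_online_thresholds(cb, cr, (1, 1, 2, 2)))
```

Taking plain minima and maxima keeps the training step cheap, at the cost of sensitivity to a single outlying pixel inside the nose box.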

As depicted in Figure 7, the binary information of skin regions was then segmented from the YCbCr input image of the video sequence using the calculated thresholds in (4). However, unequal brightness distribution over face parts and other skin areas in the image resulted in a loss of boundary and/or holes inside the segmented regions (inaccurate segmentation). As observed in Figure 7, this problem was handled by applying the Fast Marching Method (FMM) to the segmented skin image. Consequently, the FMM contributed to filling the holes and/or missing parts of skin for hands and other regions inside the image.

On the other hand, the offline training procedure is switched on under low illumination conditions and face rotation, where the Viola–Jones algorithm fails to detect the face and nose regions needed by the online training procedure. Figure 8 illustrates the skin colour segmentation results using offline training thresholds in the Cb-Cr chrominance components, where the threshold values are calculated by (3) from a number of skin colour samples taken from the adopted IBGHT dataset, as depicted in Figures 3 and 4. Finally, the skin feature was segmented into a binary image using (4), where white pixels correspond to skin.

As shown in the first column of Figure 8, the Viola–Jones algorithm detected the face region but not the nose region, since the hand and arm partially occluded the face, leading to a failure in segmenting skin information using the online training procedure. Therefore, the offline training procedure was switched on as an alternative to handle this problem.
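
The switch between the two procedures can be sketched as a simple fallback. The sketch assumes that a failed nose detection is signalled by None and that the offline thresholds were precomputed; the function and parameter names are hypothetical.

```python
def choose_thresholds(nose_box, train_online, offline_thresholds):
    """Pick the threshold source for the current video.

    nose_box: detected nose box, or None when detection failed
              (assumed convention for a failed detection).
    train_online: callable mapping a nose box to a threshold dict.
    offline_thresholds: threshold dict precomputed from skin samples.
    Returns a (procedure_name, thresholds) pair.
    """
    if nose_box is None:
        # Detection failed (for example low light, rotation, or occlusion):
        # fall back to the offline thresholds.
        return "offline", offline_thresholds
    return "online", train_online(nose_box)


if __name__ == "__main__":
    # Hypothetical threshold values, for illustration only.
    offline = {"minCr": 135, "maxCr": 175, "minCb": 85, "maxCb": 125}
    fake_trainer = lambda box: {"minCr": 145, "maxCr": 150, "minCb": 105, "maxCb": 110}
    print(choose_thresholds(None, fake_trainer, offline))
    print(choose_thresholds((1, 1, 2, 2), fake_trainer, offline))
```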

Figure 9 demonstrates that the face region was detected and that the skin colour segmentation algorithm managed to remove the face and reduce the noise caused by unrelated objects.

For further evaluation, Figure 10 compares the proposed skin segmentation scheme with other state-of-the-art approaches based on pretrained thresholds in the Cb-Cr range. The comparison shows the difference in performance between the threshold values calculated by the proposed scheme and those of previous studies [13, 27] on the video frames of the IBGHT dataset [26].

The images in the second column of Figure 10 show that the proposed skin segmentation scheme performed better than the previous methods of [13, 27]. In addition, the proposed scheme showed better potential under various lighting and background conditions. For example, the proposed scheme correctly segmented skin colour, including that of the hand region, whereas the skin segmentation procedure of [13] failed to detect the hand region. Moreover, the method of [27] detected the skin region with additional noise that affects hand gesture segmentation. In addition, computational time was compared against the skin segmentation methods of [13, 27], as shown in Table 1.

In summary, the results in Figure 10 and Table 1 show that the developed skin colour segmentation algorithm achieved a trade-off between accuracy and computational time.

4. Conclusion

In this paper, a skin segmentation scheme has been proposed that classifies every pixel of randomly selected video frames into skin and nonskin classes. The proposed scheme includes two alternately running procedures based on thresholds in Cb-Cr colour space, trained using either an online or an offline training procedure. The online training procedure calculates the threshold ranges from nose pixels of the face: the Viola–Jones algorithm is used to detect the face and then the nose region, from which the minimum and maximum values of Cb and Cr are calculated. However, under out-of-plane rotation and different lighting situations, the Viola–Jones algorithm may fail to detect the nose region; in that case, to prevent degradation and failure, this paper proposed an offline training procedure. The offline training procedure segments skin information using skin sample data and a weighted equation to obtain the Cb-Cr thresholds. Two calculated threshold ranges of maximum and minimum Cb-Cr values were combined in the weighted equation, based on the alpha weight, to make the process adaptive to skin colour variations under different circumstances. As a qualitative measurement, the proposed method was subjectively compared with previous studies, and computational time was also compared. The experimental results showed the ability of the proposed scheme to adapt to different illumination conditions and complicated environments by achieving a balanced performance between computation time and accuracy. In our future work, we intend to perform more quantitative experiments against previous studies based on precision and recall, which may help enhance the scheme and reduce the false positive rate for image regions that do not represent skin. In addition, the extracted skin feature is yet to be combined with other features for the dynamic hand gesture segmentation method.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.