Skin Segmentation Using Ensemble Technique

Localizing potential skin regions in a color image is a significant step in applications such as face detection, face recognition, face verification, face tracking, gesture analysis, content-based image retrieval and human-computer interaction. In this study, we present a pixel-based skin segmentation algorithm with an ensemble approach using a Gaussian Mixture Model (GMM) classifier. Skin color features are extracted in the RGB, HSV, YCbCr and CIELab color spaces and ensembled into a single feature vector, which is used to train the GMM classifier. Comprehensive experiments have been conducted on three datasets: the SFA Database, the ECU Skin Database and the UCI Skin database. The skin detection rate of the proposed classifier is observed to be better than that of existing works.


INTRODUCTION
Skin detection has recently been gaining importance, since many applications such as face detection, face recognition, face tracking, gesture analysis, content-based image retrieval and human-computer interaction use it as a preliminary step to localize skin regions. Kakumanu et al. (2007) discussed why detecting skin color is a challenging task: skin color in images is affected by illumination, background, camera characteristics and race. Skin color segmentation methods can be categorized as pixel-based methods and region-based methods. A pixel-based method builds a skin classifier by explicitly defining a skin cluster boundary in some color space. The main advantage of this approach is that it yields a fast classifier, since it relies on simple skin detection rules. Kovac et al. (2003) and Ahlberg (1999) concluded that high recognition rates can be achieved only by choosing a good color space together with appropriate decision rules. Kruppa et al. (2002), Jedynak et al. (2003) and Yang et al. (1998) presented region-based methods that analyze the spatial arrangement of skin pixels during the detection stage.
Most existing works on skin detection are based on pixel-based methods, and many studies on skin color pixel classification have been reported in the past decade. Cho et al. (2001) proposed an adaptive skin color filter that adjusts its threshold values to effectively separate skin color regions. Sun (2010) proposed a dynamic skin color model that uses a local skin model to shift a globally trained skin model and adapt the final skin model. Cheddad et al. (2009) proposed a new color space containing an error signal derived by differencing the grayscale map and its non-red encoded grayscale version. Phung et al. (2005) examined the issues pertaining to skin color pixel classification, investigating eight color representations, seven levels of color quantization and nine color pixel classification algorithms. A simple probabilistic model for classifying pixels as skin or non-skin is proposed in Fkihi et al. (2009). Rehg and Jones (2002) discussed a skin color detection algorithm based on a self-adaptive skin color model that depends on the luminance Y. Hassanpour et al. (2008) proposed an adaptive skin color segmentation method using a Gaussian Mixture Model whose parameters adapt to changing imaging conditions such as lighting and noise. Chen and Ellis (2014) proposed a self-adaptive Gaussian mixture model for segmenting the foreground from the background; it uses a dynamic learning rate with adaptation to global illumination to cope with illumination changes. Pun and Ng (2014) proposed skin color segmentation using texture and k-means clustering, combining color and texture features to improve the accuracy of skin detection.
The objective of this research work is to segment potential skin regions in color images in spite of variations in race, background and illumination. In this study, we propose a skin segmentation algorithm based on a feature ensemble built from several color spaces, in order to improve recall and accuracy. The skin detection rate of our classifier is better than that of the existing works of Phung et al. (2005) and Casati et al. (2013).

LITERATURE REVIEW
A pixel-based skin color segmentation algorithm classifies each pixel as skin or non-skin using a thresholding technique. In this study, four color spaces are used for feature ensembling; they are discussed below.
RGB and normalized RGB color model: RGB is the most widely used color model for storing and representing digital images. It specifies the intensity of red, green and blue, each on a scale of 0 to 255. The three channel values can be packed into a single integer using RGB value = Red + (Green * 256) + (Blue * 256 * 256). The RGB color model is not suitable for all applications, since its red, green and blue components are highly correlated.
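The packing formula above is a storage encoding rather than a color transform (it does nothing about the channel correlation). A minimal Python sketch of it and its inverse follows; the function names are illustrative only.

```python
from typing import Tuple


def pack_rgb(red: int, green: int, blue: int) -> int:
    """Pack 8-bit channels as Red + Green*256 + Blue*256*256."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("channel values must lie in 0..255")
    return red + green * 256 + blue * 256 * 256


def unpack_rgb(value: int) -> Tuple[int, int, int]:
    """Invert pack_rgb by reading back the three base-256 digits."""
    return value % 256, (value // 256) % 256, value // (256 * 256)
```

For example, `pack_rgb(0, 0, 255)` returns 16711680, i.e. 255 × 65536.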
The RGB components are normalized in order to reduce the dependence on lighting. Normalized RGB is obtained by dividing each component by the sum of the three:

$$ r = \frac{R}{R+G+B}, \qquad g = \frac{G}{R+G+B}, \qquad b = \frac{B}{R+G+B} $$

Because r + g + b = 1, the third component carries no additional information and can be dropped to reduce the dimensionality of the space. Normalized RGB is popular among researchers because of its simple transformation and because, when the ambient light is ignored, it is invariant to changes of surface orientation relative to the light source (Brown et al., 2001; Skarbek et al., 1994).

Hue Saturation (HS) color model: It is more intuitive to describe a color by its perceptual properties, such as Hue (H), Saturation (S) and Intensity (I). Color spaces of this family do exactly that: hue defines the dominant color (such as red, green, purple or yellow) of an area, and saturation measures the colorfulness of an area in proportion to its brightness. The intensity, lightness or value is related to the color luminance. In the HSV color model, the values are obtained by a non-linear transformation of the RGB primaries. With R, G and B scaled to [0, 1], max = max(R, G, B) and min = min(R, G, B), the standard (hexcone) RGB-to-HSV transformation is:

$$ V = \max, \qquad S = \begin{cases} \dfrac{\max - \min}{\max} & \max > 0 \\ 0 & \max = 0 \end{cases}, \qquad H = 60^\circ \times \begin{cases} \dfrac{G - B}{\max - \min} \bmod 6 & \max = R \\ \dfrac{B - R}{\max - \min} + 2 & \max = G \\ \dfrac{R - G}{\max - \min} + 4 & \max = B \end{cases} $$

(H is undefined when max = min.) The RGB-to-HSV transformation is invariant to high intensity at white lights, ambient light and surface orientation relative to the light source; hence, it can be a good choice for skin detection (Kakumanu et al., 2007). The HSI color space (hue, saturation and intensity) likewise attempts to give a more intuitive representation of color, with the I axis representing the luminance information. The conversion from RGB to HSI is:

$$ H = \cos^{-1}\!\left(\frac{\tfrac{1}{2}\left[(R-G)+(R-B)\right]}{\sqrt{(R-G)^2 + (R-B)(G-B)}}\right), \qquad S = 1 - \frac{3\min(R,G,B)}{R+G+B}, \qquad I = \frac{R+G+B}{3} $$

with H replaced by 360° − H when B > G. The main drawback of this color space is that hue is undefined when saturation is zero.
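The normalization and the HSV conversion described above can be sketched with Python's standard `colorsys` module, which implements the hexcone HSV model with H, S and V all in [0, 1] (H as a fraction of a full turn, so multiply by 360 for degrees). The helper names, and returning zeros for a black pixel whose chromaticity is undefined, are assumptions for illustration.

```python
import colorsys


def normalized_rgb(red: float, green: float, blue: float) -> tuple:
    """Return chromaticities (r, g, b) = (R, G, B) / (R + G + B)."""
    total = red + green + blue
    if total == 0:
        # Black pixel: chromaticity is undefined; return zeros by convention.
        return 0.0, 0.0, 0.0
    return red / total, green / total, blue / total


def rgb_to_hsv_8bit(red: int, green: int, blue: int) -> tuple:
    """Convert 8-bit RGB to hexcone HSV, each output in [0, 1]."""
    return colorsys.rgb_to_hsv(red / 255.0, green / 255.0, blue / 255.0)
```

For example, `normalized_rgb(200, 150, 100)` gives approximately (0.444, 0.333, 0.222), whose components sum to 1, so keeping only (r, g) loses nothing.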
YCbCr color model: The YCbCr color space was defined in response to increasing demands for digital algorithms for handling video information, and it has become a widely used model in digital video (Boykov and Kolmogorov, 2001). In the YCbCr color model, color is represented by luminance (Y), computed as a weighted sum of the RGB values, and chrominance (Cb and Cr), computed by subtracting the luminance from the blue and red components:

$$ Y = 0.299R + 0.587G + 0.114B, \qquad C_b = B - Y, \qquad C_r = R - Y $$

Because of its simple transformation and explicit separation of luminance and chrominance, this color space can be a good choice for skin tone detection.
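A minimal sketch of the unscaled transform (Cb = B − Y, Cr = R − Y) with the ITU-R BT.601 luma weights. Digital video standards additionally scale and offset Cb and Cr (for example, 8-bit JPEG uses Cb = 128 + 0.564(B − Y)); that scaling is omitted here, and the function name is illustrative.

```python
def rgb_to_ycbcr(red: float, green: float, blue: float) -> tuple:
    """Unscaled YCbCr: Y = 0.299R + 0.587G + 0.114B, Cb = B - Y, Cr = R - Y."""
    y = 0.299 * red + 0.587 * green + 0.114 * blue  # luminance
    return y, blue - y, red - y                     # (Y, Cb, Cr)
```

A neutral gray such as (128, 128, 128) maps to Cb = Cr = 0 up to floating-point rounding, which illustrates how chrominance is separated from luminance.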

CIELab color model:
CIELab is based on one channel for luminance (L) and two color channels (a and b). It is an accurate mathematical model that emulates normal human color vision under standard viewing conditions, light sources and a defined standard observer set by the CIE. This color model is device independent, and it is more suitable for digital manipulation than the RGB space (Amanpreet and Kranthi, 2012). CIELab is defined directly from CIE XYZ: RGB values are first mapped to XYZ by a linear transform (after removing the gamma encoding), and then, with (X_n, Y_n, Z_n) denoting the reference white:

$$ L^* = 116\, f\!\left(\frac{Y}{Y_n}\right) - 16, \qquad a^* = 500\left[f\!\left(\frac{X}{X_n}\right) - f\!\left(\frac{Y}{Y_n}\right)\right], \qquad b^* = 200\left[f\!\left(\frac{Y}{Y_n}\right) - f\!\left(\frac{Z}{Z_n}\right)\right] $$

$$ f(t) = \begin{cases} t^{1/3} & t > (6/29)^3 \\ \dfrac{t}{3(6/29)^2} + \dfrac{4}{29} & \text{otherwise} \end{cases} $$

Gaussian Mixture Model (GMM): A Gaussian Mixture Model is a parametric probability density function represented as a weighted sum of Gaussian component densities. GMMs find application in background subtraction, moving object detection, speaker recognition, biometrics and skin pixel detection. GMM parameters are estimated from training data using the iterative Expectation-Maximization (EM) algorithm or Maximum A Posteriori (MAP) estimation. The likelihood of a sample x under a single Gaussian component is:

$$ g(x \mid \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu_i)^{\top}\Sigma_i^{-1}(x-\mu_i)\right) $$

where d is the dimension of the sample x, µ_i is the mean and Σ_i is the covariance matrix of the Gaussian. A Gaussian mixture model is a weighted sum of M Gaussian components:

$$ p(x \mid \lambda) = \sum_{i=1}^{M} w_i\, g(x \mid \mu_i, \Sigma_i) $$

where x is a d-dimensional feature vector, λ = {w_i, µ_i, Σ_i} denotes the model parameters, w_i (i = 1, ..., M) are the mixture weights, which satisfy w_i ≥ 0 and Σ_i w_i = 1, and g(x | µ_i, Σ_i) (i = 1, ..., M) are the Gaussian components.
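The single-component density and the mixture density above can be evaluated with NumPy as sketched below. Working in the log domain and combining components with log-sum-exp is an implementation choice not stated in the text; it avoids underflow when component densities are tiny, as they are for multi-dimensional feature vectors. The function names are illustrative.

```python
import numpy as np


def gaussian_log_pdf(x, mean, cov):
    """log g(x | mu, Sigma) for a d-dimensional full-covariance Gaussian."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.shape[0]
    diff = x - mean                                # shape (N, d) or (d,)
    sign, logdet = np.linalg.slogdet(cov)          # log|Sigma| without overflow
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    solved = np.linalg.solve(cov, diff.T).T        # Sigma^{-1} (x - mu)
    mahal = np.sum(diff * solved, axis=-1)         # (x - mu)^T Sigma^{-1} (x - mu)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + mahal)


def gmm_log_likelihood(x, weights, means, covs):
    """log p(x | lambda) = log sum_i w_i g(x | mu_i, Sigma_i), via log-sum-exp."""
    comp = np.stack([np.log(w) + gaussian_log_pdf(x, m, c)
                     for w, m, c in zip(weights, means, covs)], axis=-1)
    peak = comp.max(axis=-1, keepdims=True)        # subtract the max for stability
    return (peak + np.log(np.exp(comp - peak).sum(axis=-1, keepdims=True)))[..., 0]
```

With `x` of shape (N, d) the result has one log-likelihood per row; this is the quantity that is later compared with the threshold τ.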

Proposed skin segmentation technique:
Identifying skin pixels in an image is a vital step in most face processing applications. The traditional approach to skin segmentation extracts pixel values from a single color space in order to classify skin pixels. In this research work, we propose to ensemble the pixel values from the RGB, HSV, YCbCr and CIELab color spaces into a single feature vector. The advantage of this ensemble approach is improved recall, since the different color spaces contribute complementary information.
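A sketch of the per-pixel feature ensemble, assuming 8-bit sRGB input. The paper does not state its scaling or conversion constants, so the following choices are assumptions: RGB scaled to [0, 1], hexcone HSV with hue as a fraction of a turn, unscaled YCbCr (Y = 0.299R + 0.587G + 0.114B, Cb = B − Y, Cr = R − Y), and CIELab computed from sRGB under a D65 white point. The feature order is hypothetical. Because YCbCr is a linear transform of RGB, the resulting 12-D vectors are linearly redundant, so a full covariance estimated from them is singular; a small diagonal ridge (or dropping redundant channels) is needed when fitting full-covariance Gaussians.

```python
import numpy as np

# Linear sRGB -> CIE XYZ matrix and the D65 reference white (X_n, Y_n, Z_n).
_RGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                        [0.2126, 0.7152, 0.0722],
                        [0.0193, 0.1192, 0.9505]])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def _hsv(rgb):
    """Hexcone HSV for rgb in [0, 1], shape (N, 3); hue as a fraction of a turn."""
    maxc, minc = rgb.max(axis=1), rgb.min(axis=1)
    delta = maxc - minc
    safe = np.where(delta == 0, 1.0, delta)        # avoid division by zero for grays
    r, g, b = rgb.T
    hue = np.where(maxc == r, (g - b) / safe,
                   np.where(maxc == g, 2.0 + (b - r) / safe, 4.0 + (r - g) / safe))
    hue = np.where(delta == 0, 0.0, (hue / 6.0) % 1.0)  # hue undefined for grays: use 0
    sat = delta / np.where(maxc == 0, 1.0, maxc)         # 0 for black since delta = 0
    return np.stack([hue, sat, maxc], axis=1)


def _lab(rgb):
    """sRGB in [0, 1] -> CIELab under D65."""
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    t = (linear @ _RGB_TO_XYZ.T) / _WHITE_D65            # X/X_n, Y/Y_n, Z/Z_n
    f = np.where(t > (6 / 29) ** 3, np.cbrt(t), t / (3 * (6 / 29) ** 2) + 4 / 29)
    return np.stack([116 * f[:, 1] - 16,
                     500 * (f[:, 0] - f[:, 1]),
                     200 * (f[:, 1] - f[:, 2])], axis=1)


def ensemble_features(rgb_uint8):
    """One 12-D vector per pixel: [R G B | H S V | Y Cb Cr | L a b]."""
    rgb = np.asarray(rgb_uint8, dtype=float).reshape(-1, 3) / 255.0
    r, g, b = rgb.T
    y = 0.299 * r + 0.587 * g + 0.114 * b
    ycbcr = np.stack([y, b - y, r - y], axis=1)
    return np.hstack([rgb, _hsv(rgb), ycbcr, _lab(rgb)])
```

For an H × W × 3 image, `ensemble_features(image)` returns an array of shape (H·W, 12).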
Our proposed method extracts pixel values from a training set that contains only skin pixels. For each skin pixel, the values in the color spaces mentioned above are obtained and ensembled into a single feature vector containing complementary information. To identify the number of ethnic groups in the training set, which gives M, the number of Gaussian mixture components, a spectral clustering algorithm was applied, since spectral clustering can indicate the number of potential clusters in a given data set (Ng et al., 2001). The mean and covariance of each cluster obtained from spectral clustering are used as the initial parameters for modeling the skin distribution as a Gaussian mixture.
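The text does not say how the number of clusters is read off the spectral decomposition. A common choice, assumed here, is the eigengap heuristic: build an RBF affinity matrix A, normalize it as D^{-1/2} A D^{-1/2} (as in Ng et al.), and take M as the position of the largest drop between consecutive leading eigenvalues. The bandwidth `gamma` is a hypothetical parameter that must be tuned, and the affinity matrix is N × N, so this is meant for a random subsample of skin pixels. The cluster labels themselves can come from any spectral clustering routine (for example scikit-learn's `SpectralClustering`); `initial_parameters` then turns them into starting weights, means and covariances. Both function names are illustrative.

```python
import numpy as np


def estimate_num_components(features, gamma, max_components=10):
    """Estimate the number of clusters M with the eigengap heuristic.

    features: (N, d) array, ideally a small random subsample (affinity is N x N).
    gamma: RBF bandwidth of the affinity exp(-gamma * ||x_i - x_j||^2).
    """
    x = np.asarray(features, dtype=float)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    affinity = np.exp(-gamma * sq_dists)
    np.fill_diagonal(affinity, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalized = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    eigvals = np.linalg.eigvalsh(normalized)[::-1]                     # descending order
    top = eigvals[: max_components + 1]
    gaps = top[:-1] - top[1:]
    # M is the position of the largest drop between consecutive eigenvalues.
    return int(np.argmax(gaps)) + 1


def initial_parameters(features, labels, reg=1e-6):
    """Per-cluster weights, means and covariances used to initialise EM."""
    x = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    weights = np.array([np.mean(labels == k) for k in clusters])
    means = np.array([x[labels == k].mean(axis=0) for k in clusters])
    covs = np.array([np.cov(x[labels == k], rowvar=False) + reg * np.eye(x.shape[1])
                     for k in clusters])
    return weights, means, covs
```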
To classify pixels as skin or non-skin, the log-likelihood of each pixel's feature vector is computed under the Gaussian mixture model:

$$ \log p(x \mid \lambda) = \log \sum_{i=1}^{M} w_i\, g(x \mid \mu_i, \Sigma_i) $$

where the weights w_i, means µ_i and covariances Σ_i are obtained with the Expectation-Maximization (EM) algorithm. A pixel is classified as a skin pixel if its log-likelihood is above a threshold τ, and as a non-skin pixel otherwise. The threshold τ is fixed using the Equal Error Rate (EER) method. The EER is the operating point at which the false positive and false negative rates are equal; it is also referred to as the crossover rate or Crossover Error Rate (CER). The lower the equal error rate, the higher the accuracy of the GMM classifier. The threshold τ is predetermined from the log-likelihood values returned by the GMM classifier for the training samples. The algorithm for skin segmentation using the ensemble technique is presented in Algorithm 1.
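Computing an EER needs scores for both classes, whereas the training set described above contains only skin pixels. The sketch below therefore assumes that labelled log-likelihoods are available for some skin and non-skin pixels (for instance, from the SFA non-skin samples); it scans the observed scores as candidate thresholds and keeps the one where the false negative and false positive rates are closest. The function name is hypothetical.

```python
import numpy as np


def eer_threshold(skin_scores, nonskin_scores):
    """Pick tau where the false negative and false positive rates are closest.

    A pixel is labelled skin when its log-likelihood is strictly above tau.
    Returns (tau, eer), with eer the mean of the two rates at tau.
    """
    skin = np.sort(np.asarray(skin_scores, dtype=float))
    nonskin = np.sort(np.asarray(nonskin_scores, dtype=float))
    candidates = np.unique(np.concatenate([skin, nonskin]))
    # Fraction of skin scores <= tau: skin pixels rejected (false negative rate).
    fnr = np.searchsorted(skin, candidates, side="right") / skin.size
    # Fraction of non-skin scores > tau: non-skin pixels accepted (false positive rate).
    fpr = 1.0 - np.searchsorted(nonskin, candidates, side="right") / nonskin.size
    best = int(np.argmin(np.abs(fnr - fpr)))
    return float(candidates[best]), float((fnr[best] + fpr[best]) / 2)
```

Sorting once and using `searchsorted` evaluates every candidate threshold in O(n log n) rather than rescanning all scores per candidate.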

Algorithm 1: Skin segmentation using the ensemble technique
1: function SKIN SEGMENTATION
2:   for all images I in the training set Tr do
3:     for all pixels in image I do
4:       Extract the pixel values in the RGB, HSV, YCbCr and CIELab color spaces.
5:       Form the ensembled feature vector by combining the extracted values into a single vector.
6:     end for
7:   end for
8:   Apply spectral clustering in the ensembled feature space to find the number of mixture components M.
9:   Build the GMM classifier with M mixture components, using the obtained means µ and covariances Σ as initial parameters.
10:  Fix the threshold τ using the log-likelihood values obtained in the training phase.
11:  for all images I in the test set Tt do
12:    for all pixels in image I do
13:      if log-likelihood value > τ then
14:        Label the pixel as a skin pixel
15:      else
16:        Label the pixel as a non-skin pixel
17:      end if
18:    end for
19:  end for
20: end function
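Steps 9 to 19 can be sketched with NumPy alone. This is an illustration under assumptions, not the authors' implementation: the initial weights, means and covariances are taken as given (for example, from the clusters found in step 8), EM then refines them, a small ridge `reg` is added to each covariance to keep it invertible (an assumption motivated by the linear redundancy of the ensembled features), and τ is supplied by the caller. All function names are hypothetical.

```python
import numpy as np


def _log_gauss(x, mean, cov):
    """Row-wise log N(x | mean, cov) for x of shape (N, d)."""
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    mahal = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + mahal)


def _weighted_log_probs(x, weights, means, covs):
    """(N, M) matrix whose entry (n, i) is log w_i + log g(x_n | mu_i, Sigma_i)."""
    return np.stack([np.log(w) + _log_gauss(x, m, c)
                     for w, m, c in zip(weights, means, covs)], axis=1)


def log_likelihood(x, weights, means, covs):
    """Per-pixel log p(x | lambda) of the mixture."""
    return np.logaddexp.reduce(_weighted_log_probs(x, weights, means, covs), axis=1)


def fit_gmm_em(x, weights, means, covs, n_iter=100, reg=1e-6, tol=1e-6):
    """Step 9: refine initial (weights, means, covs) with EM on skin features x (N, d)."""
    x = np.asarray(x, dtype=float)
    weights, means, covs = (np.array(a, dtype=float) for a in (weights, means, covs))
    previous = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities, normalised in the log domain.
        comp = _weighted_log_probs(x, weights, means, covs)
        total = np.logaddexp.reduce(comp, axis=1)
        resp = np.exp(comp - total[:, None])
        # M-step: closed-form updates of weights, means and covariances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ x) / nk[:, None]
        for i in range(len(weights)):
            diff = x - means[i]
            covs[i] = (resp[:, i, None] * diff).T @ diff / nk[i] + reg * np.eye(x.shape[1])
        current = total.mean()  # mean log-likelihood before this M-step
        if current - previous < tol:
            break
        previous = current
    return weights, means, covs


def classify(x, weights, means, covs, tau):
    """Steps 11-19: a pixel is skin when its log-likelihood exceeds tau."""
    return log_likelihood(np.asarray(x, dtype=float), weights, means, covs) > tau
```

In practice `classify` would be applied to the ensembled feature vectors of every pixel in a test image, and the boolean result reshaped to the image's height and width.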

RESULTS AND DISCUSSION
Comprehensive experiments were conducted to evaluate our proposed ensemble technique to segment skin regions.

Data:
To assess the effectiveness of the proposed work, we tested it on three databases, namely the SFA Database, the ECU Skin Database (Phung et al., 2005) and the UCI Skin database. The SFA database (Casati et al., 2013) contains 3354 skin samples and 5590 non-skin samples. Both the skin and non-skin samples vary in size from 1 pixel to 35×35 pixels. For training we used 4,108,650 (35×35×3354) skin pixels. The database also contains a test set (original images) along with the ground truth. The ECU skin database contains 4,000 color images covering four races, with varying illumination and background, along with the ground truth. From this database we used 2500 images to train the Gaussian classifier and the remaining 1500 images for testing. The UCI skin dataset (Bache and Lichman, 2013) contains RGB values of skin and non-skin pixels collected from face images of the FERET and PAL databases. It covers three races (white, black and Asian), with 50859 skin pixels and 194198 non-skin pixels. For the UCI database, we used 60% of the samples to train the GMM classifier and the remaining 40% for testing.

Evaluation criteria:
The performance of the proposed skin classifier is evaluated with accuracy (acc), precision (p) and recall (r). Accuracy measures how well a classifier labels a given sample as skin or non-skin; it is the number of correct results divided by the total number of samples. Precision is the fraction of predicted skin pixels that are actually skin, and recall is the fraction of actual skin pixels that are detected. With TP, TN, FP and FN denoting true positives, true negatives, false positives and false negatives, they are calculated as:

$$ acc = \frac{TP + TN}{TP + TN + FP + FN}, \qquad p = \frac{TP}{TP + FP}, \qquad r = \frac{TP}{TP + FN} $$

Performance of the proposed skin classifier:
The performance of our proposed ensemble technique was compared with that of the individual color spaces on the SFA Database, the ECU Skin Database and the UCI Skin database. In this study we built five Gaussian mixture models (four GMMs for the four color spaces and one GMM for the ensembled color space) to model the skin distribution in an image. To train the GMMs we used only the skin pixels from each database, and for testing we used the test images in the database. The output image produced by the classifier was compared pixel-wise with the corresponding ground truth image. With the ensemble technique we obtained an improvement in both accuracy and recall. Table 1 shows the results of the proposed ensemble technique for the SFA database. For the SFA database, the HSV and YCbCr color spaces produced almost the same accuracy, but YCbCr produced a higher recall than HSV. By combining all four color models we obtained a recall of 0.9563 and an accuracy of 0.9876, both higher than those of the individual color spaces.
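The three formulas can be checked on a pair of binary masks with a short NumPy function that compares a predicted skin mask with its ground truth pixel by pixel, as described above. The function name is illustrative, and returning 0.0 when a denominator is zero is an assumed convention.

```python
import numpy as np


def pixel_metrics(predicted_mask, ground_truth_mask):
    """Accuracy, precision and recall of a binary skin mask against its ground truth."""
    pred = np.asarray(predicted_mask, dtype=bool).ravel()
    truth = np.asarray(ground_truth_mask, dtype=bool).ravel()
    tp = np.sum(pred & truth)      # skin predicted as skin
    tn = np.sum(~pred & ~truth)    # non-skin predicted as non-skin
    fp = np.sum(pred & ~truth)     # non-skin predicted as skin
    fn = np.sum(~pred & truth)     # skin predicted as non-skin
    accuracy = (tp + tn) / pred.size
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(accuracy), float(precision), float(recall)
```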
We also obtained improved results for the UCI and ECU skin databases; the results of the proposed ensemble classifier on these databases are presented in Table 2 and Table 3, respectively. For the UCI database, the proposed ensemble technique achieved a recall of 0.9776 and an accuracy of 0.9955, higher than those of the individual color spaces. Similarly, for the ECU database we obtained a recall of 0.9879 and an accuracy of 0.9788, again higher than those of the individual color spaces.
For all three databases, the ensemble technique gave a high recall. Since it makes use of complementary information from several color spaces, it achieves better recall than any individual color space.

Comparison with the existing works:
The results of the proposed work are compared with the methods proposed by Phung et al. (2005) and Casati et al. (2013). The comparison with existing work on the ECU database is shown in Table 4:

Method | Accuracy
(Phung et al., 2005) | 89.79%
Multilayer perceptrons (Phung et al., 2005) | 89.49%
Gaussian of skin using YCbCr (Phung et al., 2005) | 82.67%
Gaussian of skin using CbCr (Phung et al., 2005) | 85.57%
Gaussian of skin and non-skin using YCbCr (Phung et al., 2005) | 88.92%
Gaussian of skin and non-skin using RGB (Phung et al., 2005) | 85.76%
Proposed method | 97.88%

For the ECU database we obtained improved accuracy: 97.88%, which is 9% higher (about 8.1 percentage points) than the best method of Phung et al. (2005). We also obtained better accuracy than the method proposed by Casati et al. (2013); these results are shown in Table 5. The proposed method gives a 7% increase in accuracy on the SFA database and a 10% increase on the UCI database.
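The "9% higher" figure is a relative gain. A quick check with the ECU accuracies listed above (the function name is illustrative):

```python
def accuracy_gain(new_pct, baseline_pct):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new_pct - baseline_pct
    return absolute, 100.0 * absolute / baseline_pct


# Proposed method vs. the best Phung et al. (2005) entry on the ECU database.
points, percent = accuracy_gain(97.88, 89.79)
```

This gives about 8.09 percentage points, or roughly a 9.0% relative improvement.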

CONCLUSION AND RECOMMENDATIONS
In this research work we have proposed a novel method for segmenting skin regions in a given image using an ensemble technique. The proposed method extracts skin pixel values in four color spaces, namely RGB, HSV, YCbCr and CIELab, and ensembles them into a single feature vector, which is then used to train the Gaussian classifier. Each pixel in an image is then classified as skin or non-skin by thresholding its log-likelihood value.
We tested the new approach on three well-known databases, namely the SFA Database, the ECU Skin Database and the UCI Skin database. Of these, the highest accuracy, 0.9955 (with a recall of 0.9776), was obtained on the UCI database, and the highest recall, 0.9879, on the ECU database. For all three databases the ensemble also gave better recall than the individual color spaces. The proposed method also performs better than the methods proposed by Phung et al. (2005) and Casati et al. (2013). On the ECU database, our method gave a 9% improvement in accuracy over the best method of Phung et al. (2005). Compared with the method proposed by Casati et al. (2013), we obtained a 6% improvement on the SFA database and a 9% improvement on the UCI database.
In the future we intend to parallelize the proposed ensemble technique to reduce the processing time.

Table 1: Result of the proposed ensemble technique for SFA database

Table 4: Comparison of the proposed ensemble technique using

Table 5: Comparison of the proposed ensemble technique using SFA