GLCM-Based Feature Extraction for Alpha Matting on Natural Images

The main objective of this research is to determine the optimal threshold value in the unknown region for the alpha matting of natural images. Alpha matting pulls a matte from the image for use in segmentation. The alpha value strongly influences segmentation quality, which in turn depends on how accurately the threshold value is chosen. Threshold determination begins by dividing the grayscale image into several sub-images using Region of Interest (RoI). Features are extracted from each sub-image with the Gray Level Co-occurrence Matrix (GLCM), using the contrast, energy, and entropy parameters at angles of 0°, 45°, 90°, and 135°. The resulting feature values are averaged and normalized in each sub-image, and the normalized value is used as the local threshold in the alpha matting operation. Experiments were carried out on 12 natural images from the image matting dataset to evaluate the performance of the proposed algorithm. The measurements show an increase in accuracy of up to 63% compared with the adaptive threshold calculated using the Fuzzy C-Means algorithm.


Introduction
The transition of broadcasting standards from analogue to digital, together with the move in video coding standards from Advanced Video Coding (AVC, H.264, or MPEG-4) to High-Efficiency Video Coding (H.265 or MPEG-H), has caused the demand for multimedia content to increase rapidly. This increase in turn raises the need for video editing applications, in which the object extraction stage plays an important role.
Object extraction is classified into two types: automatic and semi-automatic. In automatic methods [1], [2], object separation is carried out based on special features such as gestures, colour, texture, and movement [3]. Such methods make it very difficult to separate objects according to user perception. In the semi-automatic method [4], by contrast, the user manually provides constraints on the image that indicate the foreground and background areas appropriate to the video context. In both methods, the task of extracting objects from still images or single frames can be done using matting techniques.
Porter and Duff [5] introduced the matting technique, using an alpha channel (unknown region) to control the linear interpolation that determines foreground and background colours, with an antialiasing function in the image. The technique is meant to eliminate the jaggies effect that results from combining foreground and background, and is called the "pulling matte" or "digital matting" technique in the segmentation process. Digital matting greatly affects the quality of object extraction, so related research has continued on a large scale over the last few decades. The accuracy of object extraction determines the quality of compositing in image and video editing, which plays an important role in applications of the rapidly growing multimedia network industry.
Matte extraction produces a new colour by combining foreground and background pixels that have a semi-transparent colour. The result is a gradation from black to white, in which black represents the background and white represents the foreground. The unknown region is represented by a mixed colour obtained by averaging the foreground and background colours.
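The mixing described above can be illustrated with the standard compositing equation C = αF + (1 - α)B. The following is a minimal sketch, not the authors' implementation; the function name `composite` and the sample colours are assumptions made for illustration.

```python
def composite(foreground, background, alpha):
    """Blend two RGB pixels with C = alpha * F + (1 - alpha) * B."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))


white_fg = (255, 255, 255)  # definite foreground appears white in the matte
black_bg = (0, 0, 0)        # definite background appears black

# A pixel in the unknown region with alpha = 0.5 is the average of both colours.
mixed = composite(white_fg, black_bg, 0.5)
```

With alpha equal to 1 the pixel is pure foreground and with alpha equal to 0 it is pure background; values in between give the grey gradation of the unknown region.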
The quality of foreground separation is important in the matte extraction process, and the separation accuracy is strongly influenced by the choice of threshold value. Matte extraction results that meet the requirements have an even colour distribution (black and white equally distributed). If the extracted area correlates with the foreground, white will be the dominant pixel colour; if it correlates with the background, black will dominate. Matting and segmentation differ in the extraction process. Segmentation is carried out strictly on the pixels along the edge, so that each pixel is assigned definitively to the foreground or the background. This causes a jaggies effect, so the segmentation results are rough. The matting technique therefore treats the unknown region with a threshold value (α, alpha) that provides an anti-aliasing effect and reduces the jaggies on the edge.
The process of pulling the alpha (alpha matting) separates the pixels into foreground and background parts, with graded values ranging from 0.0 (fully transparent) to 1.0 (fully opaque). The threshold is set to a value within this range. Initially, the alpha threshold is set between 0.17 - 0.15, on the assumption that the image characteristics fall within that range. The threshold value is defined according to the user's perception of the characteristics of the image being extracted. Special skill is therefore needed to choose a threshold value that yields a good matte. A user error in choosing the threshold value reduces the matte quality, because the border becomes too transparent or too opaque.
Previously, matte extraction was performed to separate the foreground and background of an image or video with the threshold value as the main parameter. However, determining the threshold value requires user intervention, which frequently leads to classification errors, so this intervention needs to be reduced. Moghaddam et al. [6] separated the appearance of dirt and scratches in images of marble surfaces to determine the threshold value. The global threshold value was calculated using the Otsu algorithm and classification was done with the K-Means algorithm. The results show an increase in accuracy, though it does not reach an optimal value. Therefore, the Fuzzy C-Means algorithm [3] and linear optimization [5] were applied adaptively, considering the image intensity, to determine the global threshold value. In matte extraction, this technique shows better accuracy than the Otsu algorithm.
Illuminance is important in image analysis because differences in light intensity cause some pixel blocks to be lighter or darker than others. Errors occur when the threshold value is calculated globally [7], [8]. In this research, therefore, the threshold value is determined locally, by first extracting image features. Image feature extraction is performed using the GLCM (Gray Level Co-occurrence Matrix) algorithm, a statistical method of texture analysis. The features considered are contrast, correlation, energy and entropy, computed on the grayscale image by considering the spatial relationship among pixels [9]. The next step is to define the RoI (Region of Interest) by dividing the image into several sub-images (blocks). The blocks are the basis for the analysis of, and experiments on, the extracted features. The features (contrast, correlation, energy and entropy) are calculated for each block and the results are averaged, which gives a more detailed calculation at the boundary between foreground and background in the unknown region.
In previous research [9], the threshold value was determined by calculating over the pixels globally and was used as the threshold for the whole image. This research proposes calculating the threshold value locally for each sub-image / RoI (Region of Interest). The results are set as the alpha value in matte extraction so that the extracted object is obtained. The performance of the proposed algorithm is measured with PSNR by comparing the extracted objects with the ground truth.
The main problem in image segmentation is that it is ill-posed: the image carries no semantic information, so user intervention is needed for initialization. Wang and Cohen applied a trimap [7], [8], [10], [11], [12], [13] as a pre-segmentation of the image (user intervention) to distinguish foreground and background in unknown regions. This technique produces near-perfect accuracy in separating foregrounds. However, its limitation is that users find it difficult to define trimaps on images with complex colour samples such as hair, feathers and falling snow [14]. Levin et al. [15] therefore replaced the trimap-based user constraint with scribbles that mark the foreground and background areas, while the unknown area (α) is determined by a threshold value in the range 0 - 1.
Basuki et al. [16] adapted the determination of the threshold value in the unknown region, in the range 0 - 1, using the Fuzzy C-Means algorithm. The threshold is obtained by collecting clusters in areas with similar and nearby values. The clusters are calculated on the grayscale image based on the intra-class and inter-class distances, so that the separation of foreground and background is more pronounced. This technique has been applied repeatedly in semi-automatic segmentation of video objects [9], [16].
Determining the threshold value is important in image matting. Manikandan et al. [17] defined a threshold value locally, based on matting, to separate foreground and background. Experiments were carried out on natural image datasets with Image-Based Segmentation (Gray Histogram and Gradient Based) and Region-Based Segmentation (Thresholding and Region Operation) techniques, using KNN Matting for the classification. The experiments show that the homogeneity, texture and structural character of the image have a significant effect on the quality of image matting.
In general, thresholds fall into two classes: global and local. A global threshold applies the same value T to all pixels, so under a change in illumination some parts of the image are treated as lighter and others as darker than they are (e.g. object shadows in the original image). This problem is resolved with local thresholds, which are applied adaptively in the Niblack, Sauvola, Bernsen, and Yanowitz and Bruckstein methods and in Maximum Entropy [18]. Image segmentation can reach an optimal value with a local adaptive threshold, because calculating the threshold value within each sub-image alone gives a more detailed calculation.
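The difference between a global and a local threshold can be made concrete with a small sketch. It uses a Niblack-style rule, T = mean + k · std, evaluated once per block rather than in a sliding window. The block size, the value k = -0.2 and the toy image are assumptions chosen for illustration; this is not the method proposed in this paper.

```python
import statistics


def global_threshold(image):
    """One threshold T for the whole image: here simply the mean intensity."""
    pixels = [p for row in image for p in row]
    return statistics.fmean(pixels)


def local_thresholds(image, block, k=-0.2):
    """Niblack-style threshold T = mean + k * std for each block x block tile."""
    thresholds = {}
    for top in range(0, len(image), block):
        for left in range(0, len(image[0]), block):
            tile = [p for row in image[top:top + block]
                    for p in row[left:left + block]]
            mean = statistics.fmean(tile)
            std = statistics.pstdev(tile)
            thresholds[(top // block, left // block)] = mean + k * std
    return thresholds


# Left half lies in shadow, right half is brightly lit.
image = [[20, 40, 180, 220],
         [40, 20, 220, 180],
         [20, 40, 180, 220],
         [40, 20, 220, 180]]

t_global = global_threshold(image)
t_local = local_thresholds(image, block=2)
```

The single global value falls between the two illumination levels, so it cannot separate the 20/40 texture in the shadowed half, whereas each local threshold sits inside the intensity range of its own tile.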
Moghaddam et al. [6] applied the K-Means algorithm for classification to segment marble surface images, with the aim of separating dirt and scratches. The steps are: align the input image as pre-processing; convert the image to grayscale; perform image morphology to reduce dimensions; increase pixel intensity; binarize the image to simplify computation; define the global threshold with the Otsu algorithm; and filter with the Mean Filter algorithm to refine the image.
The threshold value is then used to improve accuracy, and the results show a significant increase. However, across these stages the Otsu algorithm did not perform optimally in defining the threshold that determines the alpha value [4]. In this research, the threshold is calculated locally, following related previous work in which the maximum entropy value is obtained in each Region of Interest (RoI). The value is normalized and treated as the input alpha threshold. Feature extraction is carried out using GLCM (Gray Level Co-occurrence Matrix), which analyzes textures by considering the spatial relationship among pixels. The result of this extraction provides the value used as the threshold input [19].
The aim of this research is to obtain an optimal threshold value for more precise matte extraction, which reduces classification errors and increases the accuracy of object segmentation. The approach performs pixel calculations in each sub-image with GLCM-based feature extraction, and the result is set as the threshold. Because the threshold is calculated locally, the accuracy of the segmentation results is expected to increase. The contribution of this research is threshold adaptation that yields optimal alpha values.
This paper is structured as follows. Section 2 reviews previous research related to image matting. Section 3 describes the framework of the research carried out on image matting. Section 4 presents the discussion, observations and experiments on determining the local adaptive threshold, matte extraction (alpha matting) and evaluation of the proposed algorithm. The last part gives the conclusions of the research.

Research Methods
Feature extraction for GLCM-based alpha matting is carried out in stages, as shown in Figure 1. In the experiments, RGB images obtained from the image matting dataset are converted into grayscale images to simplify the computation. The alpha value is calculated locally, so the image is split into several windows with a size of 16 x 16 pixels called RoI (Region of Interest). Then the GLCM (Gray Level Co-occurrence Matrix) features, consisting of contrast, correlation, energy and entropy, are calculated at angles of 0°, 45°, 90° and 135°. The results of the GLCM feature extraction are normalized and then treated as the threshold for image matting operations, which are carried out with the closed-form solution [15]. To test the performance of the proposed algorithm, the extracted image objects are compared with the ground truth (reference image) available in the image dataset using PSNR (Peak Signal-to-Noise Ratio).

Discussions
Experiments to test the proposed algorithm were carried out on 12 images from the image matting dataset (alex.bmp, artem.bmp, dandelion.bmp, fire.bmp, hair.bmp, kid1.bmp, kid2.bmp, peacock.bmp, rabbit.bmp, teddy.bmp, teddy_ear.bmp, and vitaily.bmp). Each image was converted to grayscale and divided into 16 sub-images using RoI (Region of Interest). Features of each sub-image were extracted (local threshold) using GLCM with the contrast, correlation, energy and entropy parameters. The feature values were then normalized and set as threshold values.
Image transformation is an image conversion that aims to simplify the computational process. An image composed of intensity values 0 - 255 in one channel is simpler to process than one with three channels. The transformation is carried out by Equation 1.

\[ \mathrm{GrayImage} = 0.2989R + 0.5870G + 0.1140B \qquad (1) \]

where R is the red channel, G is the green channel and B is the blue channel. With this simplification, computation is easier and processing time is shorter. In addition, the threshold value lies only in the 0-1 range.
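As a minimal sketch of the grayscale transformation, the code below applies a weighted sum of the three channels. It assumes the commonly used luma weights 0.2989, 0.5870 and 0.1140, which is our reading of Equation 1. The nested-list image layout and the name `to_grayscale` are illustrative assumptions.

```python
def to_grayscale(rgb_image):
    """Convert a nested list of (R, G, B) tuples to single-channel grey levels."""
    weights = (0.2989, 0.5870, 0.1140)  # assumed luma weights for R, G, B
    return [[round(sum(w * c for w, c in zip(weights, pixel))) for pixel in row]
            for row in rgb_image]


rgb = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]
gray = to_grayscale(rgb)  # one 0-255 value per pixel instead of three
```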
The Block Processing method [20] is used to determine the RoI. The grayscale image produced from the input image is divided into 16 blocks of the same size, as shown in Fig. 3. The blocks are then defined as sub-images, and each block is labelled to initialize the foreground and background areas.
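The division into 16 equal blocks can be sketched as follows. The 4 x 4 grid layout is an assumption (the text only states 16 blocks of the same size), and `split_into_blocks` is a hypothetical helper name.

```python
def split_into_blocks(image, grid=4):
    """Split a 2-D image (list of rows) into grid x grid equally sized blocks."""
    height, width = len(image), len(image[0])
    if height % grid or width % grid:
        raise ValueError("image size must be divisible by the grid size")
    bh, bw = height // grid, width // grid
    blocks = {}
    for r in range(grid):
        for c in range(grid):
            # Key (r, c) labels the block by its row and column in the grid.
            blocks[(r, c)] = [row[c * bw:(c + 1) * bw]
                              for row in image[r * bh:(r + 1) * bh]]
    return blocks


image = [[y * 8 + x for x in range(8)] for y in range(8)]  # toy 8 x 8 image
blocks = split_into_blocks(image)                          # 16 blocks of 2 x 2
```

Keying the blocks by grid position keeps the label of each sub-image available for the later foreground and background initialization.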
GLCM is used to extract features based on second-order texture statistics, which describe the relationship between pairs of pixels in the original image. The sub-image blocks generated from the RoI are the data source for the testing process. The GLCM feature analysis covers contrast, correlation, energy and entropy [21]. The feature extraction is calculated over a 5x5 pixel window taken from the upper left corner of each sub-image, as shown in Fig. 3. The value of each feature is determined as follows.

a) Contrast
It is a measure of the variation in grey level among the image pixels, as given by Equation 2. The terms of Equation 2 are the leveling value used in the computation, the pixel intensity value, the smallest pixel intensity value in the image and the largest pixel intensity value in the image.

b) Correlation
It is a measure of the linear dependence among grey-level values in the image, as given by Equation 3. The terms of Equation 3 are the leveling value used in the computation, the smallest and largest pixel intensity values in the image, and the standard deviation of all pixel intensities in the GLCM matrix.

c) Energy
It is an intensity measurement based on the variation of the region area, calculated with Equation (4). The additional term in Equation (4) is the average pixel intensity in the GLCM matrix.

d) Entropy
It expresses the degree of grey-level irregularity in the image, as given by Equation (5).
The feature values were determined at rotation angles of 0°, 45°, 90° and 135°, applied to all windows. The average values of the features are summed and divided by the number of analyzed features.
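The feature computation over the four directions can be sketched as below. The code uses the common textbook definitions of the four GLCM features (contrast as the sum of (i - j)² p, correlation as the normalized covariance, energy as the sum of p², entropy as the negative sum of p log2 p). These may differ in notation or scaling from Equations 2 to 5. The pixel distance of 1, the offset convention, the number of grey levels and the toy window are all assumptions.

```python
import math

# (row, column) offsets for a distance of 1 at 0, 45, 90 and 135 degrees
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(window, offset, levels):
    """Normalized co-occurrence matrix of a window with `levels` grey levels."""
    dr, dc = offset
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[window[r][c]][window[r2][c2]] += 1
    total = sum(map(sum, counts)) or 1  # avoid division by zero
    return [[n / total for n in row] for row in counts]


def features(p):
    """Contrast, correlation, energy and entropy of a normalized GLCM `p`."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    mu_i = sum(i * v for i, j, v in cells)
    mu_j = sum(j * v for i, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, j, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for i, j, v in cells))
    covariance = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "correlation": covariance / (sd_i * sd_j) if sd_i and sd_j else 0.0,
        "energy": sum(v * v for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for i, j, v in cells if v > 0),
    }


def mean_features(window, levels):
    """Average each feature over the four directions."""
    per_angle = [features(glcm(window, off, levels)) for off in OFFSETS.values()]
    return {k: sum(f[k] for f in per_angle) / len(per_angle) for k in per_angle[0]}


# Toy 5 x 5 window already quantized to 4 grey levels (0..3).
window = [[0, 0, 1, 1, 2],
          [0, 0, 1, 1, 2],
          [0, 2, 2, 2, 3],
          [2, 2, 3, 3, 3],
          [2, 2, 3, 3, 3]]
averaged = mean_features(window, levels=4)
```

Quantizing the grey levels before building the matrix keeps it small; a full 256-level matrix on a 5 x 5 window would be almost entirely empty.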
A good-quality matte has an even colour distribution in which neither black nor white dominates. The extracted area correlates with the foreground if the pixels in the unknown region are predominantly white, and with the background if they are predominantly black. Hard segmentation and image matting differ technically in how they separate foreground from background. In hard segmentation, the separation is carried out strictly on the pixels around the edge, so the result looks rough because of the "jaggies" effect. In image matting, the separation takes into account the unknown area (alpha), in which pixels can belong to both foreground and background. Separation in this area therefore requires a more precise threshold to improve quality.
Foreground and background are separated on the assumption that dominant white pixels (α=1) correspond to definite foreground and black pixels (α=0) to definite background. The basic matting problem is determining the threshold value in the unknown area, and its accuracy largely determines the quality of the extracted foreground. Image matting operations are generally performed by setting a threshold value in the range 0 to 1.
In previous research, Levin et al. defined a threshold value in the range of 0.17 - 0.15, on the assumption that the noise lies within that range. In this technique, the threshold value is chosen according to the user's perception, considering the image characteristics and the set threshold range. Special skill is therefore needed to choose the value that gives the best results, and user error greatly affects the quality of the extracted matte [22].
Basuki et al. proposed an adaptive threshold algorithm that applies Fuzzy C-Means to determine the threshold value in the unknown area by considering the characteristics of the extracted image [4], [23].
In this study, the threshold value is set locally in each sub-image generated from the RoI. The average value of the GLCM feature extraction (Equations 2 - 5) is normalized by dividing the entropy value by the average of the GLCM features, as shown in Equation 6.
The terms of Equation 6 are the contrast and correlation, the energy and entropy, and the average image feature, where GLCM denotes the feature value in the sub-image. The result of Equation 6 is treated as the threshold value for alpha matting [14] and as the input to Equation 7.
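Because Equation 6 did not survive in this copy, the sketch below shows only one literal reading of the sentence describing it: the entropy divided by the mean of the four averaged features. The function name, the clamping to the interval [0, 1] and the sample feature values are assumptions, and the actual normalization used by the authors may differ.

```python
def local_threshold(contrast, correlation, energy, entropy):
    """Hypothetical reading of Equation 6: entropy / mean of the four features."""
    mean_feature = (contrast + correlation + energy + entropy) / 4.0
    if mean_feature == 0:
        return 0.0
    ratio = entropy / mean_feature
    # Clamping is our assumption so that the result is usable as an alpha value.
    return min(1.0, max(0.0, ratio))


# Illustrative feature values for one sub-image (not measured data).
alpha = local_threshold(contrast=4.0, correlation=0.5, energy=0.5, entropy=1.0)
```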
The pixels in the extracted matte are replaced with those of the original image to obtain the desired object, as illustrated in Fig. 4. A window of 5x5 pixels is taken in each sub-image. A 5 x 5 pixel window is used because it is the smallest block that GLCM can use given the processing time required for computation. It serves as the GLCM analysis area from which the normalized feature value is obtained and set as the threshold value (α) in image matting, as shown in Equation (7).
The threshold value obtained from the locally performed GLCM feature extraction was then set as the alpha value in image matting (the process of pulling the matte from the image). This process was carried out using the closed-form solution algorithm [13]. The matte pixels obtained from this process (pixel value of 1) were then filled with the pixel values of the original image at the same coordinates to obtain the foreground image.
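The final step, copying original pixels into the matte, can be sketched as follows. Treating a matte value at or above the threshold as "pixel value of 1" is our reading of the text. The threshold of 0.5, the black fill for background pixels and the toy data are illustrative assumptions.

```python
def extract_foreground(image, matte, threshold):
    """Keep original pixels where the matte reaches the threshold, black elsewhere."""
    return [[pixel if a >= threshold else (0, 0, 0)
             for pixel, a in zip(img_row, matte_row)]
            for img_row, matte_row in zip(image, matte)]


image = [[(200, 10, 10), (10, 200, 10)],
         [(10, 10, 200), (90, 90, 90)]]
matte = [[0.9, 0.1],
         [0.4, 0.6]]
foreground = extract_foreground(image, matte, threshold=0.5)
```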
The performance of the extracted matte is measured using PSNR (Peak Signal-to-Noise Ratio) by comparing the extracted matte with the matte in the reference image from the dataset.
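The comparison can be sketched with the usual definitions, MSE as the mean of the squared pixel differences and PSNR = 10 log10(peak² / MSE). The peak value of 255 assumes 8-bit grey levels, and the two small mattes are made-up values for illustration.

```python
import math


def mse(reference, estimate):
    """Mean squared error between two equally sized grey-level images."""
    diffs = [(r - e) ** 2
             for ref_row, est_row in zip(reference, estimate)
             for r, e in zip(ref_row, est_row)]
    return sum(diffs) / len(diffs)


def psnr(reference, estimate, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE); infinite for identical images."""
    error = mse(reference, estimate)
    return math.inf if error == 0 else 10.0 * math.log10(peak ** 2 / error)


ground_truth = [[0, 255], [255, 0]]  # reference matte
extracted = [[0, 250], [255, 5]]     # matte produced by the method under test
score = psnr(ground_truth, extracted)
```

A smaller MSE gives a larger PSNR, so a higher score means the extracted matte is closer to the reference.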

Results
Experimental results from the dataset used show that the object extraction process using local threshold calculations can optimize the alpha value so that image matting accuracy increases.
The increase in accuracy is shown using PSNR (Peak Signal-to-Noise Ratio), as in Equation 8. The image object extracted as a matte by the proposed model is compared with the ground truth (the reference object from the dataset). The difference between the two objects gives the MSE (Mean Square Error) value, as in Equation 9. The smaller the difference between the two objects, the more accurate the extraction. The MSE value is then input into the PSNR calculation, so a small MSE yields a large PSNR value. The terms of Equations 8 and 9 are the ground-truth image used as the reference, the matte extracted by the system, and the size of the processed image.
From the experiment, matte extraction using the proposed algorithm shows better accuracy, as seen in the performance evaluation graph in Figure 5. The average PSNR value of the proposed algorithm is higher than that of matte extraction based on a closed-form solution with the adaptive global threshold from Fuzzy C-Means [4], [9], [16]. Overall, local threshold calculations increased accuracy by up to 63%. However, local calculation also increases processing time.
The experiment performed on public image datasets shows an increase in accuracy of up to 63%, as evaluated using the Peak Signal-to-Noise Ratio (shown in Figure 5). Thus, feature extraction using GLCM with the contrast, correlation, energy and entropy parameters, calculated locally on images, can improve the quality of image-matting-based image segmentation.

Figure 1 illustrates the proposed feature-based object extraction system. In Step 1, the extraction begins with image acquisition as the source of the analyzed data. To keep the computation simple, each RGB image is transformed into grayscale. It is then divided into 16 sub-image blocks using the RoI method. The blocks are the basis for local feature extraction with the chosen parameters at angles of 0, 45, 90 and 135 degrees using GLCM. The result of feature extraction is

Figure 3. Region of interest calculation

Figure 5. Performance evaluation using PSNR

Conclusions
GLCM-based object extraction for alpha matting on natural images has been tested locally on 12 images obtained from the image matting dataset. Each image is first transformed to grayscale. Next, the grayscale image is divided into 16 sub-images using RoI (Region of Interest). Features of each sub-image are extracted (local threshold) using GLCM with the contrast, correlation, energy and entropy parameters. The feature values are then normalized and set as the threshold value, which is used as the alpha value in image matting. The experiment performed on public image datasets shows an increase in accuracy, as evaluated using the Peak Signal-to-Noise Ratio.