Noise-Against Skeleton Extraction Framework and Application on Hand Gesture Recognition

Extracting stable skeletons from noisy images is a challenging problem because skeletonization methods are easily affected by inner and border noise. Although many methods have been proposed to improve the noise robustness of skeletonization, most of them either overcome only border noise or reduce the effects of both types of noise at the cost of lost topology. In this paper, we propose a skeleton extraction framework that enhances the robustness of existing skeletonization methods against both inner and border noise. In our approach, we first smooth the input image with Gaussian filters at different scales to obtain multiple representations. Then, binarization and skeletonization are performed to produce a series of binary images and a series of skeletal images. Next, we apply our measure to these binary and skeletal images to find the most suitable skeleton. Since our measure considers both the skeleton image changes and the binary image changes caused by filtering, the selected skeleton is sufficiently robust and retains all the necessary skeletal branches. Inner noise and border noise experiments are conducted for comparison. In terms of the rate of variation in the skeleton, the proposed framework reduces the effect of inner noise by approximately 92% and that of border noise by approximately 40%. In addition, the experiment on static hand gesture recognition demonstrates that introducing our framework can increase the mean recognition accuracy by up to 11%.


I. INTRODUCTION
Skeletons are popular descriptors since they can preserve the original topology and connectivity of an object [1] in an image. They are widely applied in many fields, such as hand gesture recognition [2], human action recognition [3], image matching [4], hepatic vascular analysis and vessel segmentation [5], sketch-based modeling [6], human character animation [7] and quantitative structure imaging [8].
Skeletal images can be either obtained directly from the depth image captured by a depth camera such as Kinect or extracted by using the traditional skeletonization method from an image captured by a regular camera. For approaches based on a depth camera, the generated skeleton may not be affected by lighting, shade, and color; therefore, recognition based on it tends to yield better results. However, the cost, size, and availability of depth cameras limit their use [9]. In contrast, the traditional method has a broader range of applications since it requires only a regular camera. For the traditional method, it is necessary to convert RGB images into grayscale images, followed by binarization to extract the region of interest (ROI) as a foreground object. Then, skeletons can be extracted by applying the skeletonization method to the binary images.
One of the challenges of the traditional method is that the produced skeleton is not stable due to the existence of noise. This noise can be observed in the binary image and may significantly influence the resulting skeleton. From the perspective of binary images, noise can be divided into two types: border noise and inner noise. Border noise is caused by tiny changes in the edge of the foreground, and this type of noise may produce many unwanted branches in the skeletal image. Inner noise appears inside the foreground, and this type of noise may create many unnecessary skeletal rings. Both types of noise and their influence on the skeleton are shown in Fig. 1. In Fig. 1, the gray region is the foreground in the binary image, and the black region is the skeleton extracted by skeletonization. For visualization, the effects of the noise on the skeleton are marked with red rectangles.
The associate editor coordinating the review of this manuscript and approving it for publication was Hossein Rahmani.
VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Many different algorithms have been proposed in recent decades to obtain a more stable skeleton when using the traditional method. There are three different approaches to addressing the problem caused by noise: the skeletonization-based approach, the pruning-based approach, and the scale-space-based approach.
The skeletonization-based approach concentrates on improving the robustness of the skeletonization method itself with respect to noise, as in [10], [11], [12], [13], [14], and [15]. The merit of this kind of method is that it can reduce the effects of border noise without extra operations and therefore has a lower computational cost. The drawback is that it fails to alleviate the effects caused by inner noise.
Pruning-based approaches, such as [16], [17], [18], [19], [20], [21], [22], and [23], introduce a postprocessing step after skeletonization to remove insignificant or unwanted branches. Almost all pruning-based algorithms are based on a salience measurement of the skeleton branches or their corresponding contour. They remove the skeleton branches whose salience value is less than a given threshold, which usually requires manual tuning.
The merit of this kind of method is that it can significantly offset the problem caused by border noise, even when the extent of the border noise is significant. However, similar to the skeletonization-based method, the pruning-based method is still unable to reduce the influence of inner noise. Scale-space-based approaches [24], [25], [26] adopt filters to smooth the image and remove noise.
The advantage of this approach is that it can deal with both inner and border noise. The drawback is that it requires an adequately set filter parameter. When the filter parameter is small, the ability to filter noise is limited. When the filter parameter is large, the original geometrical and topological features may not be preserved.
In this paper, we propose a noise-against skeleton extraction framework, which makes the resulting skeleton more stable by reducing the influence of noise while retaining the necessary skeleton branches. Our method first generates multiple skeletons for an image according to different filter parameters. Then, our framework automatically selects the most suitable filter parameter and its corresponding skeleton by using our proposed measure.
The main contributions of this paper are summarized as follows:
• We propose a skeleton extraction framework in which a novel measure based on both skeleton information and region information is used to select a suitable representation of the skeleton, which strengthens the robustness of the existing skeletonization method.
• The robustness of the proposed framework against inner and border noise is demonstrated in the artificial noise experiments. In these experiments, we also observe that the proposed framework does not suffer from the distortion problem.
• We apply the proposed framework to the task of static hand gesture recognition and show that its use can help to improve recognition accuracy.
The remainder of this paper is organized as follows. Section II presents a review of several well-known denoising methods. Section III describes our framework. Inner and border noise experiments are detailed in Section IV. Section V presents a comparison of the results for static hand gesture recognition when using the skeleton extracted by our method and those of other methods. Section VI concludes this paper. Finally, limitations and future work are presented in Section VII.

II. RELATED WORKS
As mentioned in the previous section, there are three different denoising approaches. In this section, we review several popular or recent methods for each approach.

A. SKELETONIZATION-BASED APPROACH
Skeletonization methods can be further divided into three basic types [1]: Voronoi-based methods, continuous curve propagation approaches, and digital approaches. Skeletonization-based denoising methods are skeletonization algorithms with the ability to suppress border noise. Most of them belong to Voronoi-based [12], [14] and digital approaches [9], [13].
Durix et al. [13] recently proposed one-step compact skeletonization, a Voronoi-based antinoise skeletonization method. This method directly computes a simple skeleton with few branches by propagating selected Voronoi circles within the shape while discarding propagation directions that carry negligible information. They demonstrated that their method can produce a clean skeleton and avoid creating branches caused by noise.
Among digital denoising skeletonization approaches, thinning algorithms have attracted much attention since these methods tend to use devised templates or criteria to extract skeletons, which are easy to modify to improve denoising. For example, in 1993, Shih and Wong [14] proposed an efficient thinning algorithm against border noise by applying 69 group templates. In recent years, Ma et al. [10] proposed a fully parallel skeletonization algorithm against border noise, which requires 13 group templates.
In addition to the two types of antiborder-noise skeletonization approaches mentioned above, Yang et al. [12] proposed a new kind of method based on skeleton grafting. Their approach, inspired by tree grafting, generates a skeleton in a coarse-to-fine fashion.
Compared with normal skeletonization methods, noise-against skeletonization methods are more robust but can deal only with border noise. They are still sensitive to inner noise.

B. PRUNING-BASED APPROACH
Pruning-based approaches are postprocessing approaches that are applied after the skeleton is obtained. These methods have an outstanding ability to remove unwanted branches.
One of the most cited pruning algorithms was proposed by Bai et al. [16]. That method is based on contour partitioning with discrete curve evolution. They linked every skeleton point to a boundary point that is tangential to its maximal circle and then deleted all skeleton points whose corresponding boundary point lies on the same contour segment.
Shen proposed a pruning method with a bending potential ratio [17] in which the pruning of the skeletal branches depends on the context of the boundary segment that corresponds to the branch.
A pruning method based on information fusion was proposed by Liu et al. [18]. They considered redundant branch length, region reconstruction, and local salience degree to determine the pruning of a branch.
Many other pruning methods have been proposed. Among them, Andres's method [23], Stelios's method [19], and Siyu's method [21] are also notable. In addition, some pruning methods have been proposed along with skeletonization for specific tasks in recent years, such as the methods described in [27] and [28].
The advantage of the pruning-based approach is that it can completely remove unwanted branches caused by border noise, even when the amount of border noise is significant. However, similar to the skeletonization-based approach, it also fails to suppress inner noise.

C. SCALE-SPACE-BASED APPROACH
The scale-space-based methods adopt Gaussian filters to smooth the image and to remove noise. They are promising methods since they can suppress inner and border noise.
A scale-space method was proposed by Hoffman and Wong [24] for thinning binary and grayscale images. First, they extracted the union of the topographical features, which consists of the peak, the ridge, and the saddle point, by applying the Gaussian filter, whose scale keeps increasing, to the original image. Then, these topographical features formed the skeleton.
Cai proposed a method [25] that can decrease the effects of noise by using oriented Gaussian filters, which help determine principal directions and help classify ridges, valleys, and edges. Their method is robust to interference from different types of noise. Their strategy is applied to handwriting and fingerprint image enhancement.
Houssem proposed an adaptive framework [26] that uses scale-space filtering to make thinning algorithms robust against noise. In their framework, multiple skeletons are first generated within various filtering scales, followed by using their proposed measure to select a suitable skeleton. Experiments have demonstrated that their method can yield good results in sketch images. However, their approach may break the original topology and connectivity since their evaluation measure considered only the skeleton information.

III. PROPOSED FRAMEWORK
The proposed framework is an improved version of Houssem's framework. Like that framework, it first uses scale-space filtering to generate multiple representations of the input image over a wide range of filtering scales. Then, our modified measure is used to select the most suitable skeleton. Our modified measure considers the relative variation between the skeleton generated from the filtered image and that generated from the original image, as well as the relative change between the binary image produced from the filtered image and that produced from the original image. In addition, the inside characteristics of the skeleton generated from the filtered image are also considered. Therefore, the selected skeleton tends to have few unwanted branches and rings, and all necessary skeletal branches are retained.
Before formally presenting the details of our framework, we assume, for simplicity, that there is only one object (connected component) in the foreground image. In addition, we assume that the input image is in grayscale.
The Gaussian kernel is used in our framework. The value of the element in the Gaussian kernel whose coordinate is (x, y) can be denoted as G_n(x, y, σ), which can be computed according to the following formulas:
where σ is the smoothing parameter of the Gaussian kernel that controls the scale and k is the kernel size, whose value depends on σ. ⌈·⌉ is the ceiling function that computes the smallest integer that is greater than or equal to the input number.
By increasing σ from the initial value σ_init to σ_max, whose value is a multiple of σ_init, in steps of σ_init, a series of different Gaussian kernels can be obtained. Then, these kernels are applied to the original input grayscale image to generate a series of filtered images. After binarizing those filtered grayscale images, we can obtain different binary images. Next, multiple skeletons can be extracted from these binary images by using the skeletonization algorithm.
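The kernel construction and the σ sweep described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the authors' implementation: the function names are hypothetical, the kernel is normalized so that its entries sum to one, and the kernel size rule `2 * ceil(3 * sigma) + 1` is an assumed common choice, since the paper's own size formula is not reproduced here.

```python
import math


def kernel_size(sigma):
    # Assumed rule (not taken from the paper): an odd size spanning about
    # three standard deviations on each side of the centre.
    return 2 * math.ceil(3 * sigma) + 1


def gaussian_kernel(sigma):
    """Return a k x k Gaussian kernel whose entries sum to 1."""
    half = kernel_size(sigma) // 2
    raw = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]


def sigma_schedule(sigma_init, sigma_max):
    """Scales sigma_init, 2*sigma_init, ..., sigma_max (a multiple of sigma_init)."""
    steps = int(round(sigma_max / sigma_init))
    return [sigma_init * i for i in range(1, steps + 1)]
```

With the settings used later in the experiments, `sigma_schedule(1, 12)` yields the twelve scales 1 through 12, each of which gives one kernel, one filtered image, one binary image, and one candidate skeleton.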
A suitable σ can result in a proper binary image and a proper skeleton, in which the number of unwanted branches from the noise is as small as possible and the number of necessary branches is as large as possible. To select this suitable σ, we develop a novel measure under which an improper σ generates an overly large value and a proper σ produces a low value.
For a given σ, a Gaussian kernel can be determined. Then, the binary image and skeletal image derived from the grayscale image that is blurred by this Gaussian kernel can be denoted by B_σ and S_σ, respectively. Similarly, the binary and skeletal images derived from the original grayscale image can be denoted by B_o and S_o, respectively. The value of the proposed measure M for the given σ can be computed from B_σ, S_σ, B_o, and S_o as the sum of three functions:
\[ M(\sigma) = F_1 + F_2 + F_3 \tag{4} \]
From Eq. 4, it is clear that the proposed measure is composed of three functions: F_1, F_2, and F_3. First, F_1 is defined as follows.
where H and W are the height and width of S_σ, respectively. B_of is defined as a particular binary image whose foreground is the outer background of the original pattern.
Function F_1 generates a large value in two cases. The first is when there are many cross points in the skeleton, which indicates that it may still be a noisy skeleton with many unwanted branches or rings. The second is when the skeleton derived from the filtered image is distorted so that some skeletal pixels are located in the background region of the original pattern. We differentiate the penalization for the outer background (foreground pixels in B_of) and the inner background because, in principle, the skeleton should lie inside the original object.
As the second part of the measure, function F_2 is defined as follows:
where α is a threshold that controls the penalty. If the ratio of the number of foreground pixels of the skeleton extracted from the filtered image to that of the original skeleton is higher than α, then there is no penalization. Otherwise, penalization is introduced. In our framework, we set α to 0.42. Area() counts the number of foreground pixels of the input binary or skeletal image. Overall, F_2 is used to penalize the case in which there is a significant difference between the filtered skeleton and the original skeleton.
The last function F_3 is defined as follows:
In Eq. 12, β is a threshold for controlling the penalty. The penalty is introduced only when the relative difference in the number of foreground pixels is above β. Here, we set β to 0.1. Function F_31 is used to ensure that the changes in the foreground pixels of the binary image caused by the introduction of the filter are within a reasonable range. In Eq. 13, the N_region(image) function computes the total number of connected components in the input image. Therefore, F_32 is used to ensure that the number of connected components of the binary image B_σ remains the same as that of the original binary image B_o. In B_o, there is only one connected component according to our assumption at the beginning of this section. Overall, function F_3 is used to detect the distortion caused by an improper Gaussian kernel from the perspective of the binary image and to introduce a sufficient penalty.
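The conditions under which F_2, F_31, and F_32 start to penalize a candidate can be illustrated with a short Python sketch. It only reports whether each condition is triggered, because the size of each penalty comes from the paper's equations, which are not reproduced here. The function names, the use of 8-connectivity in `n_regions`, and the normalization of the area difference by the area of B_o are assumptions made for illustration. Images are lists of rows with 0 for background and 1 for foreground, and the original images are assumed to be non-empty.

```python
from collections import deque


def area(img):
    """Number of foreground pixels, as Area() is described in the text."""
    return sum(1 for row in img for value in row if value)


def n_regions(img):
    """Number of 8-connected foreground components (connectivity is assumed)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def f2_triggers(s_sigma, s_o, alpha=0.42):
    # No penalty while the filtered skeleton keeps more than alpha of the pixels.
    return area(s_sigma) / area(s_o) <= alpha


def f31_triggers(b_sigma, b_o, beta=0.1):
    # Penalty only when the relative change in foreground area exceeds beta.
    return abs(area(b_sigma) - area(b_o)) / area(b_o) > beta


def f32_triggers(b_sigma, b_o):
    # Penalty when filtering changes the number of connected components.
    return n_regions(b_sigma) != n_regions(b_o)
```

For example, a filtered skeleton that keeps only 40% of the original skeleton pixels triggers the F_2 condition under α = 0.42, while one that keeps 50% does not.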
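After obtaining the value of our measure for each σ, which is the sum of the values of the three subfunctions, the best scale σ_best can be selected. A σ is deemed σ_best only when its derived skeleton and binary image produce the minimum output under our measure. Then, the skeleton derived from σ_best is considered the best skeleton, which becomes the output of our framework. The entire procedure of the proposed framework is summarized in Algorithm 1.
Algorithm 1 Proposed skeleton extraction framework
Input: grayscale image I, σ_init, σ_max
Output: skeleton S_output
  obtain B_o and S_o from I by binarization and skeletonization
  initialize M and S as empty lists
  for σ = σ_init to σ_max in steps of σ_init do
    I_σ = I smoothed by the Gaussian kernel of scale σ
    B_σ = Binarize(I_σ)
    S_σ = skeleton extracted from B_σ
    m = value of the measure for B_σ, S_σ, B_o, and S_o
    M.append(m)
    S.append(S_σ)
  end for
  Index = argmin(M)
  S_output = S[Index]
  return S_output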
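The selection loop of Algorithm 1 can be sketched in Python as follows. This is an illustrative outline and not the authors' MATLAB code: `smooth`, `binarize`, `skeletonize`, and `measure` are hypothetical callables supplied by the caller, standing in for the Gaussian filtering, the binarization, the chosen skeletonization method (for example FPSA or SPSM), and the measure M.

```python
def select_skeleton(image, sigmas, smooth, binarize, skeletonize, measure):
    """Return the skeleton with the smallest measure value and its sigma.

    All four callables are placeholders for the corresponding steps of the
    framework; this function only fixes the order in which they are applied.
    """
    # Reference images derived from the unfiltered input.
    b_o = binarize(image)
    s_o = skeletonize(b_o)

    scores, skeletons = [], []
    for sigma in sigmas:
        i_sigma = smooth(image, sigma)      # filtered grayscale image
        b_sigma = binarize(i_sigma)         # binary image at this scale
        s_sigma = skeletonize(b_sigma)      # candidate skeleton
        scores.append(measure(b_sigma, s_sigma, b_o, s_o))
        skeletons.append(s_sigma)

    # argmin over the recorded measure values
    index = min(range(len(scores)), key=scores.__getitem__)
    return skeletons[index], sigmas[index]
```

Passing the steps in as callables keeps the sketch independent of any particular skeletonization method, which mirrors how the framework is later combined with both FPSA and SPSM.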

IV. INNER AND BORDER NOISE EXPERIMENT
The proposed framework (PF) and Houssem's ATF framework are implemented in MATLAB on an Intel Core i7 CPU. Since both of these frameworks require calling a skeletonization method to extract the skeleton, we also implemented the FPSA algorithm and the SPSM algorithm, which are the skeletonization methods described in [10] and [11], respectively.
To objectively evaluate the skeleton extracted by the PF, several performance measures are defined as follows:
• Number of endpoints (NEP): This measure counts the total number of endpoints among all foreground pixels in skeleton S, which can be computed by:
• Number of crosspoints (NCP): This measure counts the total number of cross points among all foreground pixels in skeleton S, which can be computed by:
• Rate of variation in the skeleton (RVS): This measure calculates the relative deviation between two skeletons, one extracted from a noisy image and one extracted from a clean image. The value of RVS can be calculated by using the following formula:
where S_n denotes the skeleton extracted from a noisy image, S_c denotes the skeleton extracted from a clean image, and S_c \ S_n denotes those foreground pixels that belong to set S_c but do not belong to set S_n. For Area(), please refer to Eq. 10.
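As a rough illustration of these measures, the sketch below counts endpoints and cross points with the crossing number of each skeleton pixel, that is, the number of background-to-foreground transitions met when walking once around its eight neighbours. Using the crossing number (1 for an endpoint, 3 or more for a cross point) is an assumption made here and may differ from the paper's own NEP and NCP equations. The helper `difference_area` computes only the size of the set difference S_c \ S_n that enters RVS, and the normalization used in the RVS formula is not reproduced. All function names are hypothetical.

```python
# Eight neighbours in clockwise order, starting from the pixel above.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(skel, r, c):
    """Count 0 -> 1 transitions around pixel (r, c); outside counts as 0."""
    h, w = len(skel), len(skel[0])
    ring = [
        1 if 0 <= r + dr < h and 0 <= c + dc < w and skel[r + dr][c + dc] else 0
        for dr, dc in RING
    ]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def _count(skel, predicate):
    return sum(
        1
        for r, row in enumerate(skel)
        for c, value in enumerate(row)
        if value and predicate(crossing_number(skel, r, c))
    )


def nep(skel):
    """Number of endpoints: skeleton pixels with crossing number 1."""
    return _count(skel, lambda n: n == 1)


def ncp(skel):
    """Number of cross points: skeleton pixels with crossing number >= 3."""
    return _count(skel, lambda n: n >= 3)


def difference_area(s_c, s_n):
    """Area of S_c \\ S_n: pixels set in the clean skeleton but not the noisy one."""
    return sum(
        1
        for row_c, row_n in zip(s_c, s_n)
        for a, b in zip(row_c, row_n)
        if a and not b
    )
```

On a plus-shaped skeleton, this sketch reports four endpoints and one cross point, which matches the intuitive reading of the shape.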
In the following noise experiments, the noise robustness of six skeleton extraction methods is studied. Four methods are combinations of the frameworks and skeletonization methods: PF+FPSA, PF+SPSM, ATF+FPSA, and ATF+SPSM. The other two are the SPSM skeletonization method and the FPSA skeletonization method. For all extraction frameworks, σ_init is set to 1, and σ_max is set to 12.
To better conduct the comparison, a simple and clean human-shape image (named dude8), which comes from the well-known benchmark Kimia-99, is selected as the base image. Dude8 was used in both noisy experiments.
The skeletons extracted from dude8 by all six methods are presented in Fig. 3. They are considered clean skeletons and are later compared with the skeletons extracted from noisy images when computing the RVS. Several points about Fig. 3 are worth noting. First, for visualization, we use gray to denote the original pattern and black to indicate the extracted skeleton. All the experimental images presented later share the same style as this image. Next, according to human visual perception of this human shape, a good skeleton should retain five skeletal branches, representing one head, two arms, and two legs. However, the skeletons extracted by ATF+SPSM and ATF+FPSA have only three and four skeletal branches, respectively, which demonstrates that they have suffered some distortion.

A. INNER NOISE EXPERIMENT
The inner noise experiment is divided into five subtests according to the distinct noise levels, which start at 2% and gradually rise to 10%. In each subtest, inner noise is randomly added to the inner region of the original object. Then, the six methods are used to extract the skeletons. Figs. 4, 5, and 6 present the skeletons extracted by the six methods from the noisy image under noise levels of 2%, 6%, and 10%. In addition, the RVS, NEP, and NCP values of the skeleton extracted by each method are measured and recorded in each subtest. The whole inner noise experiment was independently conducted 100 times, and the statistical descriptions of RVS, NEP, and NCP are presented in Table 1, Table 2, and Table 3.
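One possible way to generate such inner noise is sketched below in Python. The text does not specify how the noise level is measured, so this sketch assumes that a level of, say, 2% means that 2% of the interior foreground pixels are switched to background, where an interior pixel is a foreground pixel whose four direct neighbours are all foreground. The function names and the `seed` parameter are hypothetical.

```python
import random


def interior_pixels(binary):
    """Foreground pixels whose four direct neighbours are all foreground."""
    h, w = len(binary), len(binary[0])
    result = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            if (binary[r][c] and binary[r - 1][c] and binary[r + 1][c]
                    and binary[r][c - 1] and binary[r][c + 1]):
                result.append((r, c))
    return result


def add_inner_noise(binary, level, seed=None):
    """Return a copy with a fraction `level` of interior pixels set to 0."""
    rng = random.Random(seed)
    candidates = interior_pixels(binary)
    count = round(level * len(candidates))
    noisy = [list(row) for row in binary]
    for r, c in rng.sample(candidates, count):
        noisy[r][c] = 0
    return noisy
```

Because only interior pixels are eligible, the object boundary is left untouched, so the sketch produces holes inside the foreground without adding border noise. Repeating the call with different seeds gives independent trials of the kind averaged in the tables.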
Figs. 4 to 6 give an intuitive sense of the robustness of the various methods to inner noise. In Fig. 4, it is clear that the two pure skeletonization methods, SPSM and FPSA, are easily affected by the inner noise and produce many meaningless rings, even when the noise level is very small. In contrast, the four framework-based methods can create a relatively stable skeleton under 2% inner noise. Their skeletons differ only slightly from those shown in Fig. 3. From the perspective of skeleton completeness, in Fig. 4, the two ATF-based methods both suffer from skeleton distortion since both ATF+SPSM and ATF+FPSA have only three skeletal branches. In contrast, the two methods based on our framework can generate a complete skeleton with all five skeletal branches.
In Fig. 5 and Fig. 6, as the level of inner noise increases, more and more rings appear in the results of the SPSM and FPSA methods. In contrast, all the framework-based methods produce skeletons similar to those shown in Fig. 4, which suggests that they are robust to inner noise. Among the framework-based methods, one difference is that the methods based on the PF appropriately preserve all the necessary skeletal branches, while the ATF-based methods do not.
In Table 1, the mean RVS values of the SPSM method and the FPSA method are much higher than those of the four framework-based methods in all subtests, and they keep increasing as the noise level increases. This is consistent with our visual perception in Fig. 4 to Fig. 6. Among these methods, the two methods based on the PF have the two lowest RVS values in each subtest, which demonstrates that they are the two methods most robust to inner noise.
In addition, by comparing each value in the third row of Table 1 with the corresponding value in the first row, it can be seen that using the PF reduces the effect of noise by approximately 92% on average. Similar results are obtained when comparing each value in the sixth row with the corresponding value in the fourth row.
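The row-by-row comparison can be expressed as a relative reduction in RVS. The Python sketch below shows one plausible reading, in which the reduction is computed per noise level and then averaged. The text does not state whether the average is taken this way, so this is an assumption, and the numbers in the usage note are hypothetical placeholders rather than values from Table 1.

```python
def relative_reduction(baseline, with_framework):
    """Fraction by which the framework lowers a measure relative to the baseline."""
    return (baseline - with_framework) / baseline


def mean_reduction(baseline_row, framework_row):
    """Average of the per-level relative reductions (assumed averaging scheme)."""
    ratios = [
        relative_reduction(b, f) for b, f in zip(baseline_row, framework_row)
    ]
    return sum(ratios) / len(ratios)
```

For instance, with hypothetical baseline RVS values of 0.50 and 1.00 and framework values of 0.05 and 0.08, `mean_reduction([0.50, 1.00], [0.05, 0.08])` gives about 0.91, that is, a reduction of roughly 91%.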
From Table 2, we can also learn that the average NEP for both PF-based methods is five, which means that the proposed method has the potential to maintain skeleton completeness. From Table 3, the proposed framework-based methods can also maintain the value of NCP sufficiently stably with an increasing noise level.

B. BORDER NOISE EXPERIMENT
Similar to the inner noise experiment, the border noise experiment also consists of five subtests according to the noise level. The initial noise level is 30%, which gradually increases to 50% in steps of 5%. In each subtest, border noise is randomly added to the boundary of the original image, and then the six methods are used to extract the skeleton from the noisy image. Each subtest is conducted independently 100 times.
It is clear that in Fig. 8, the SPSM method and the FPSA method produced seven and six skeletal branches, respectively. These are two and one more than the number of skeleton branches they extracted from the clean image, respectively. The ATF+SPSM and ATF+FPSA methods still suffered from excessive erosion. PF+SPSM and PF+FPSA can generate a satisfactory skeleton, although there are some tiny position changes in the skeleton. In Fig. 9 and Fig. 10, we can see that as the border noise level increases, an increasing number of unwanted branches appear in the results of the SPSM and FPSA methods. ATF+SPSM and ATF+FPSA are still hindered by the distortion defect. Only the two methods based on the proposed framework can suppress border noise and maintain the original structure.
Table 4, Table 5, and Table 6 present the mean values of RVS, NEP, and NCP, respectively, from which it can be seen that with an increasing level of border noise, the values of all three parameters of SPSM and FPSA rapidly increase. In contrast, this trend is not apparent in the framework-based methods, especially when considering NEP and NCP. In addition, under the various noise levels, the methods based on our framework retain a strong ability against border noise, and their RVS is the lowest when compared with the other techniques that use the same skeletonization method. Furthermore, the NEP of the methods based on our framework is always 5, which indicates that using our framework does not introduce distortion.
In addition, by comparing each value in the third row of Table 4 with the corresponding value in the first row, it can be seen that using the PF reduces the effect of noise by approximately 40% on average. Similar results are obtained when comparing each value in the sixth row with the corresponding value in the fourth row.
From the inner noise experiment and border noise experiment, it is confirmed that the PF can enhance the robustness of the skeletonization algorithm, and the introduction of the PF will not cause skeleton distortion. Therefore, the PF is an excellent option for improving the stability of existing skeletonization algorithms.

V. STATIC HAND GESTURE RECOGNITION EXPERIMENT
To further explore the performance of SPSM, ATF+SPSM, PF+SPSM, FPSA, ATF+FPSA, and PF+FPSA, static hand gesture recognition experiments are conducted. All the static hand gesture images used in this experiment are part of a well-known public dataset named MU_HandImages_ASL [29]. Nine different hand gesture classes are considered in our experiment, and each class contains 70 RGB images. As a result, there are a total of 630 RGB images. Examples of the nine classes of hand gestures are shown in Fig. 10.

A. OVERVIEW
In our experiment, for static hand gesture recognition, we adopted a standard machine learning pipeline, which includes feature extraction, model training, and evaluation. During the feature extraction procedure, the original RGB static hand gesture image is first converted into a grayscale image, and the skeleton and its corresponding binary image (pattern image) are extracted. For the framework-based methods, it is easy to extract both the skeleton and the pattern image from the grayscale image (the skeleton should be derived from this pattern) because the image binarization operation is embedded in the framework. However, since skeletonization methods can process only binary images, SPSM and FPSA must introduce an extra binarization operation. The result of the binarization is considered the pattern and saved. After obtaining the skeleton and pattern image, the next step is to convert them into a 7-dimensional feature vector, which is presented in the following subsection.
Four well-known classification models are adopted in the current experiment: the decision tree model (DT), the bagging tree model (BT), the support vector machine model (SVM), and the k-nearest neighbor model (KNN). All these models are created in MATLAB, and their parameters are set to default values. For better training and evaluation of these models, the original dataset is randomly divided into two subsets, the training-validation set and the testing set. There are 500 images in the training-validation set, which is used in model training, and 130 images in the testing set, which is used in model evaluation. During the training procedure, a 10-fold cross-validation strategy is adopted. Since the present experiment is a balanced multiclass classification task, accuracy is used to evaluate the models' performance. The formula for the accuracy is as follows:
\[ \mathrm{Accuracy}(f; D) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{I}\big(f(x_i) = y_i\big) \]
where D is a set consisting of all the feature vectors and corresponding labels, m is the number of pairs of a feature vector x and its corresponding actual label y, f(x) is the predicted output of a classifier when the input feature vector is x, and \mathbb{I}(·) equals 1 when its argument is true and 0 otherwise. In Fig. 11, a block diagram of the entire procedure from feature extraction to model evaluation is presented.
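The accuracy measure and the 10-fold partition of the training-validation set can be illustrated with a small Python sketch. It is a simplified stand-in for the MATLAB tooling used in the experiments: the classifier is any callable that maps a feature vector to a label, and `k_fold_indices` is a hypothetical helper that assigns indices to folds in a round-robin fashion without shuffling.

```python
def accuracy(classifier, dataset):
    """Fraction of (x, y) pairs for which classifier(x) equals the label y."""
    correct = sum(1 for x, y in dataset if classifier(x) == y)
    return correct / len(dataset)


def k_fold_indices(n_samples, k=10):
    """Split indices 0..n_samples-1 into k folds of near-equal size."""
    return [list(range(fold, n_samples, k)) for fold in range(k)]
```

With 500 training-validation images and k = 10, each fold holds 50 images. In each round, one fold serves for validation while the other nine are used for training. A real experiment would normally shuffle the indices first so that each fold contains a mix of gesture classes.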

B. FEATURE VECTOR EXTRACTION FROM SKELETON AND PATTERN IMAGES
After a skeletal image and its pattern image are obtained from an input image, it is necessary to transform the skeletal image along with its pattern image into a 7-dimensional feature vector that is used in later classification. This 7-dimensional vector includes the number of endpoints (NEP), the number of crosspoints (NCP), whether an inner hole exists (EIH), the rate of deviation of the thickness of the endpoints (RDTE), the average distance between the thickest point in the pattern image and the endpoints in the skeletal image (ADTPE), the distance between the pattern thickest point and the skeletal thickest point (DPSP), and the average angle between the endpoint and the main axis (AAEP).
The values of NEP and NCP can be calculated by using the formulas in Eq. 14 and Eq. 16. The EIH can be determined by subtracting the skeleton matrix from an all-ones matrix and counting the number of connected regions in the result. EIH is one when the number of regions is above one; otherwise, it is zero.
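The EIH test can be sketched in Python as follows. The sketch inverts the skeleton image and counts the connected regions of the result. It assumes 4-connectivity for the inverted image, so that background cannot leak through diagonal gaps of an 8-connected skeleton, and it assumes that the skeleton does not touch the image border, since otherwise the outer background could itself be split into several regions. The function names are hypothetical.

```python
from collections import deque


def count_regions_4(img):
    """Number of 4-connected foreground regions in a 0/1 image."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and img[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def eih(skel):
    """1 if the inverted skeleton image has more than one region, else 0."""
    inverted = [[1 - value for value in row] for row in skel]
    return 1 if count_regions_4(inverted) > 1 else 0
```

A ring-shaped skeleton encloses a background region that is cut off from the outer background, so the inverted image has two regions and EIH is one. A simple line encloses nothing, so EIH is zero.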
Before presenting the definitions of RDTE, ADTPE, and DPSP, the concept of thickness is first introduced. The thickness of a pixel is defined as the distance between this pixel and its closest pixel located on the boundary of the pattern image. Boundary pixels are the foreground pixels that have at least one background pixel among their 4-neighbors. In Fig. 2(a), the region marked in brown is the boundary.
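The thickness definition can be made concrete with a brute-force Python sketch. It assumes the Euclidean distance and treats positions outside the image as background, and neither choice is stated in the text. The function names are hypothetical, and the quadratic search is meant only for small illustrative images.

```python
import math


def boundary_pixels(pattern):
    """Foreground pixels with at least one background pixel among their 4-neighbours."""
    h, w = len(pattern), len(pattern[0])
    result = []
    for r in range(h):
        for c in range(w):
            if not pattern[r][c]:
                continue
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                outside = not (0 <= nr < h and 0 <= nc < w)
                if outside or not pattern[nr][nc]:
                    result.append((r, c))
                    break
    return result


def thickness(pattern, r, c, boundary=None):
    """Distance from (r, c) to the closest boundary pixel of the pattern."""
    if boundary is None:
        boundary = boundary_pixels(pattern)
    return min(math.hypot(r - br, c - bc) for br, bc in boundary)


def thickest_pixel(pattern):
    """Foreground pixel with the largest thickness (first one found on ties)."""
    boundary = boundary_pixels(pattern)
    foreground = [
        (r, c)
        for r, row in enumerate(pattern)
        for c, value in enumerate(row)
        if value
    ]
    return max(foreground, key=lambda p: thickness(pattern, p[0], p[1], boundary))
```

For a filled 5 x 5 square, the centre pixel has thickness 2 and is returned by `thickest_pixel`, while every boundary pixel has thickness 0.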
For a given skeleton that has n endpoints, all endpoints form a set S_EP, in which the i-th endpoint is denoted as S_EP^i. The thickness of S_EP^i is denoted as T_EP^i. The set formed by all T_EP^i is denoted as T_{S_EP}. Then, the RDTE for this skeleton can be computed by using the following formula:
We assume that the coordinates of the thickest pixel in the pattern image are P_x and P_y, and that its thickness is T_p. We suppose that in a skeletal image, there are n endpoints. The coordinates of the i-th endpoint are denoted as EP_x^i and EP_y^i. Then, the ADTPE can be calculated by using the following formula:
Assuming that the coordinates of the thickest pixel in the pattern image are P_x and P_y, and that the coordinates of the thickest pixel in the skeletal image are S_x and S_y, the DPSP can be calculated according to the following formula:
To obtain the value of the AAEP, the main axis is first defined by the thickest point in the pattern image and the endpoint in the skeletal image that is farthest from that point. Based on that, it is easy to calculate the relative angle of each remaining endpoint to this axis, and the AAEP is the mean of these angles. If the number of endpoints is less than 2, the AAEP is set to 0.
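The AAEP description can be sketched in Python as follows. The text does not fix how the relative angle is measured, so this sketch assumes the unsigned angle, in degrees, between the main axis and the ray from the thickest point to each remaining endpoint. The function name and the (row, column) coordinate convention are hypothetical.

```python
import math


def aaep(thickest, endpoints):
    """Mean unsigned angle (degrees) of the non-axis endpoints to the main axis.

    `thickest` is the (row, col) of the thickest pattern pixel and `endpoints`
    is a list of (row, col) skeleton endpoints.
    """
    if len(endpoints) < 2:
        return 0.0
    py, px = thickest
    # The main axis runs from the thickest point to the farthest endpoint.
    far = max(
        range(len(endpoints)),
        key=lambda i: math.hypot(endpoints[i][0] - py, endpoints[i][1] - px),
    )
    axis = math.atan2(endpoints[far][0] - py, endpoints[far][1] - px)
    angles = []
    for i, (ey, ex) in enumerate(endpoints):
        if i == far:
            continue
        delta = math.atan2(ey - py, ex - px) - axis
        # Wrap into [-pi, pi) and drop the sign.
        delta = abs((delta + math.pi) % (2.0 * math.pi) - math.pi)
        angles.append(math.degrees(delta))
    return sum(angles) / len(angles)
```

For a thickest point at the origin with endpoints at (0, 10), (5, 0), and (-5, 0), the main axis points to (0, 10) and both remaining endpoints lie at 90 degrees to it, so the sketch returns 90.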

C. RESULTS OF THE EXPERIMENT
Classifier accuracy may drift across different test and validation sets, which would interfere with our core task of studying how different skeletonization methods influence final recognition accuracy. We therefore conducted the training and evaluation of the different classifiers 80 times independently, with the images in the test set and validation set randomly selected in each run. Table 7 presents the maximum validation accuracy when training different classifiers on feature vectors derived from skeletons extracted by different skeletonization methods. The table shows that, for every classifier, the maximum validation accuracy of PF+SPSM exceeds that of SPSM and ATF+SPSM, and the maximum validation accuracy of PF+FPSA exceeds that of FPSA and ATF+FPSA. For example, when using the BT classifier, the validation accuracy of PF+SPSM and PF+FPSA reaches 89.60%. The confusion matrix for maximum validation accuracy when using the BT classifier and various skeletonization methods can be seen in Fig. 12. Tables 8 and 9 present the mean and minimum validation accuracy when using different skeletonization methods and classifiers, from which we can observe that the validation accuracy of the PF-based skeletonization methods is higher than that of the other skeletonization methods.
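The repeated-evaluation protocol can be sketched as below. This is a simplified illustration under stated assumptions: it uses a single held-out set per run rather than separate validation and test sets, the 20% hold-out fraction is an arbitrary placeholder (the split sizes are not given here), and `repeated_split_accuracy`, `train_and_score`, and `majority_baseline` are hypothetical names. Any classifier can be plugged in through the `train_and_score` callable.

```python
import random
from statistics import mean


def repeated_split_accuracy(samples, labels, train_and_score,
                            runs=80, holdout_fraction=0.2, seed=0):
    """Evaluate on `runs` independent random splits; return (max, mean, min)."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    # Assumption: a fixed fraction of the data is held out in every run.
    n_holdout = max(1, int(len(indices) * holdout_fraction))
    scores = []
    for _ in range(runs):
        rng.shuffle(indices)  # new random selection for each run
        held, train = indices[:n_holdout], indices[n_holdout:]
        scores.append(train_and_score(
            [samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in held], [labels[i] for i in held],
        ))
    return max(scores), mean(scores), min(scores)


def majority_baseline(train_x, train_y, test_x, test_y):
    """Toy stand-in for a classifier: always predict the most common training label."""
    majority = max(set(train_y), key=train_y.count)
    return sum(1 for y in test_y if y == majority) / len(test_y)
```

Calling `repeated_split_accuracy(features, gestures, majority_baseline)` returns the three summary statistics that correspond to the maximum, mean, and minimum accuracy tables; replacing `majority_baseline` with a function that trains and scores a real classifier on 7-dimensional feature vectors gives the per-method comparison.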
In addition, by comparing the entries in the third row of Table 8 with those in the first row, it is clear that the maximum increase in the mean accuracy on the validation set is approximately 11%, which indicates that the use of the PF can greatly improve recognition accuracy.

Fig. 12. Confusion matrix for maximum validation accuracy when using a BT and different skeletonization methods, which corresponds to the third column in Table 7.

Similar to Table 7, Table 10 presents the maximum testing accuracy when testing different classifiers by using feature vectors derived from skeletons extracted by various skeletonization methods. The testing accuracy based on the PF+SPSM method is higher than that of the SPSM method and ATF+SPSM, and the accuracy of the PF+FPSA method is higher than that of FPSA and ATF+FPSA. The maximum testing accuracy based on PF+SPSM is 94.29% and that based on PF+FPSA is 93.45% when using the BT classifier. Similar to Fig. 12, Fig. 13 shows the confusion matrix for maximum testing accuracy when using a BT and different skeletonization methods. Table 11 and Table 12 present the mean and minimum testing accuracy when using different skeletonization methods and classifiers.

Fig. 13. Confusion matrix for maximum testing accuracy when using a BT and different skeletonization methods, which corresponds to the third column in Table 10.
In addition, by comparing the entries in the third row of Table 11 with those in the first row, it is clear that the maximum increase in the mean accuracy on the testing set is also approximately 11%.
Using the PF can significantly increase the validation and testing accuracy of the four classifiers in the static gesture recognition experiment. The reason is that the skeleton extracted by the PF is more robust and preserves the necessary branches, which avoids distortion.

VI. CONCLUSION
In this work, we proposed a novel noise-against skeleton extraction framework. Our framework enhances the robustness of existing skeletonization methods against both inner and border noise, as demonstrated by the two noise experiments. In addition, the proposed framework appropriately preserves the essential skeleton and avoids the distortion problem. These properties make it promising for use alongside existing skeletonization methods in the pattern recognition field. The results of the experiments on static hand gesture recognition support the claim that using the proposed framework can improve the validation accuracy and the testing accuracy of four well-known classifiers.

VII. LIMITATIONS AND FUTURE WORK
One limitation of this paper is that, in the static hand gesture recognition experiments, we focus only on how introducing the proposed framework influences recognition accuracy and temporarily omit some other important factors.
In the future, it may be possible to further improve recognition accuracy by considering these factors. Since feature selection may greatly affect recognition accuracy, it is necessary to identify more informative features and select the optimal subset from them. The configuration of the classifiers may also alter recognition accuracy; therefore, more effort should be put into tuning them. It will also be interesting to consider other classifiers. In addition, our paper covers only nine types of static hand gestures, and it will be worthwhile to explore the performance on more types of static hand gestures.
Deep learning methods for skeleton extraction have emerged in recent years, such as [30] and [31]. Their robustness to noise is still far from satisfactory. Combining our framework with these new methods is an attractive direction for future work.

VIKTAR YUREVICH TSVIATKOU received the Ph.D. degree from the Belarusian State University of Informatics and Radioelectronics, in 1999. He works at the Belarusian State University of Informatics and Radioelectronics, where he is currently a Professor and the Dean of the Department of Infocommunication Technologies, Faculty of Information Security. His research interests include digital image processing, pattern recognition, signal processing, and information theory.
ANATOLIY ANTONOVICH BORISKEVICH is a Professor with the Belarusian State University of Informatics and Radioelectronics. His research interests include digital signal processing, machine learning, and deep learning.

VOLUME 11, 2023