p-norms of histogram of oriented gradients for X-ray images

ABSTRACT


INTRODUCTION
The histogram of oriented gradients (HOG) is a well-known feature extraction algorithm used especially as a descriptor for human detection [1]. The HOG descriptor is based on the location and orientation of edges. HOG descriptors have been used in the literature in conjunction with linear support vector machine (SVM) classifiers [2] and other data mining algorithms [1]. The strength of the HOG algorithm lies in its ability to capture edge and gradient information while decreasing the weight of irrelevant features due to illumination conditions [3]. However, many researchers have aimed to improve the HOG algorithm to enhance detection performance in terms of accuracy, computational costs, and classification. For example, the CoHOG [4] approach uses pairs of gradient orientations to form histograms. HOG-LBP, which was developed in [5] and [3], combines a local binary pattern (LBP) and the HOG descriptor to produce better results. HOG was also enhanced by a complementary descriptor in the proposed eHOG algorithm [6] to handle the scale variation of pedestrians. In [7], the authors reduced the dimensions of the HOG features by combining HOG and greedy algorithms to select HOG descriptors. Other work has also been conducted to enhance HOG features, including [8] and [9], among others.
HOG [2] is an effective feature descriptor technique that computes edge direction by dividing an image into blocks from which it extracts the histogram gradient information. The HOG feature descriptor, defined in [2], extracts useful information from a given image and discards extraneous information. In general, the HOG process contains three phases for the divided blocks, as described in [1]: i) conducting image normalization, ii) computing the image gradients for the x and y directions, and iii) collecting HOG descriptors for all blocks. In the original algorithm proposed in [2], the Euclidean norm was utilized to calculate the gradient magnitude. Since p-norms are crucial in both pure and applied mathematics, other norms can be used to calculate length or magnitude. In fact, $L_p$ spaces (or Lebesgue spaces) have a key role in mathematical analysis and are used in many disciplines, such as computer science, engineering, physics, statistics, and finance [10]. The p-norm or $L_p$-norm of a vector $x = (x_1, x_2)$ in $\mathbb{R}^2$ is defined as follows:
$$\|x\|_p = \left(|x_1|^p + |x_2|^p\right)^{1/p},$$
where p is a real number ≥ 1. If p = 1 (called the $L_1$ norm), the distance between two points is the total distance traveled: $\|x\|_1 = |x_1| + |x_2|$. Thus, p = 1 counts the total changes in both the x and y directions. If p = 2, we have the Euclidean norm ($L_2$ norm), i.e., the shortest distance between two points: $\|x\|_2 = \sqrt{x_1^2 + x_2^2}$. It is widely used and is the only p-norm invariant under any unitary transform, such as rotation. The case p = ∞ considers the highest gradient change and ignores the smallest one, where the $L_\infty$ norm of $x = (x_1, x_2)$ is defined as:
$$\|x\|_\infty = \max(|x_1|, |x_2|).$$
Given the power of p-norms in maximizing performance or minimizing error, it could be argued that the Euclidean norm (or 2-norm) in HOG descriptors is not necessarily the only choice for detecting the actual histogram gradient in an image. In reality, image scaling and resolution are known to affect the performance of feature detection. Different p-norms may enhance the capturing of actual distances for a suitable value of p. Different p-norm values are expected to affect the performance of the HOG feature detector in different ways.
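To make the p-norm definitions above concrete, the following minimal sketch evaluates them for a single gradient vector. The function name `p_norm` and the sample vector (3, 4) are illustrative assumptions, not part of the original implementation.

```python
import math


def p_norm(gx, gy, p):
    """Return the p-norm of the 2-D vector (gx, gy) for p >= 1 or p = math.inf."""
    ax, ay = abs(gx), abs(gy)
    if math.isinf(p):
        # L-infinity norm: only the largest component counts.
        return max(ax, ay)
    if p < 1:
        raise ValueError("p must be >= 1")
    big, small = max(ax, ay), min(ax, ay)
    if big == 0.0:
        return 0.0
    # Factor out the larger component so that large p does not overflow.
    return big * (1.0 + (small / big) ** p) ** (1.0 / p)


# Hypothetical gradient components, chosen only for illustration.
for p in (1, 2, 10, 20, math.inf):
    print(f"p = {p}: {p_norm(3.0, 4.0, p):.4f}")
```

For a fixed vector, the value decreases from the sum of the components (p = 1) toward the largest component (p = ∞) as p grows, which is why the choice of p changes how much weight the smaller gradient component receives.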
In $\mathbb{R}^n$, all norms are equivalent [10]; that is, for any two norms $\|\cdot\|_a$ and $\|\cdot\|_b$, there are positive constants c and k such that $c\|x\|_a \le \|x\|_b \le k\|x\|_a$ for all $x \in \mathbb{R}^n$. In other words, there is only one norm topology in $\mathbb{R}^n$. Consequently, the convergence of a sequence of vectors in $\mathbb{R}^n$ is independent of the choice of p-norm. Nevertheless, different norms offer flexibility to prove convergence. In numerical analysis, choosing a suitable norm plays a role in efficiently determining convergence. Convergence in infinite-dimensional vector spaces depends on the choice of p-norm, as p-norms in infinite-dimensional vector spaces are not equivalent [10]. Usually, one norm is more suitable than others for solving certain problems. For instance, the 1-norm (rather than the 2-norm) can be used to find the total distance traveled in a rectangular street grid from a location marked as the origin to the destination point (x, y). In approximation theory, optimization problems depend on the choice of p-norm to obtain optimal solutions [11], [12]. In short, solutions to a problem can vary with different norms.
p-norms are widely used in machine learning and artificial intelligence and are powerful tools for evaluating and improving machine learning models. Prediction in machine learning relies on detecting patterns and inferences, rather than explicit instructions. Using sample data when building machine learning algorithms requires testing the predictive models to achieve the best performance. Maximizing the performance or minimizing the error of a model, in other words, aims to minimize the cost function. Norms are useful in measuring such errors [13]. In addition, solving an optimization problem means finding the input that best minimizes some output penalty [14]. Norms assign a magnitude to these outputs and hence enable penalties to be minimized.
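The norm equivalence stated earlier in this section can be made concrete in $\mathbb{R}^2$. The bound below is a standard illustration added here, not a result taken from the cited works, and the symbols $x = (x_1, x_2)$ and $m$ are assumed notation. For any $p \ge 1$, let $m = \max(|x_1|, |x_2|)$. Then $m^p \le |x_1|^p + |x_2|^p \le 2m^p$, and taking p-th roots gives
$$\|x\|_\infty \le \|x\|_p \le 2^{1/p}\,\|x\|_\infty .$$
For example, for $x = (3, 4)$ we obtain $4 \le \|x\|_2 = 5 \le \sqrt{2}\cdot 4 \approx 5.66$ and $4 \le \|x\|_1 = 7 \le 8$. The norms therefore differ by at most a factor of 2 in two dimensions, yet they rank individual gradient vectors differently, which is what a histogram of magnitudes is sensitive to.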
In machine learning, different norms can be used for regularization and feature selection, as a loss function, and so on. Choosing which norm to use depends on the problem to be solved, as each norm has its own pros and cons [15]. The principle of parsimony in machine learning is commonly used to create a prediction model with good sparse approximation. Regularization techniques with different norms are applied to address overfitting, outliers, and feature selection in a model [13].
The $L_1$ norm is often used to calculate the Manhattan or taxicab distance, mean absolute error (MAE), and the least absolute shrinkage and selection operator (LASSO). LASSO uses $L_1$ regularization to reduce the huge number of features in a model by removing less important features, since $L_1$ is robust towards outliers and missing data [13]. On the other hand, the $L_2$ norm is often used to calculate Euclidean distance, mean squared error (MSE) and least squares error, and the ridge operator, which uses $L_2$ regularization to handle overfitting [13]. There are many efficient methods available for the widely used $L_2$ norm; however, the $L_2$ norm is sensitive to outliers due to enormous squared error values.
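The different outlier sensitivity of the two norms can be seen in a small numerical sketch. The error values below are invented purely for illustration, and the helper names `mae` and `mse` are assumptions.

```python
def mae(errors):
    """Mean absolute error: an L1-based measure."""
    return sum(abs(e) for e in errors) / len(errors)


def mse(errors):
    """Mean squared error: an L2-based measure."""
    return sum(e * e for e in errors) / len(errors)


# Hypothetical residuals; the last value of the second list is an outlier.
clean = [1.0, -1.0, 2.0, -2.0]
with_outlier = clean + [20.0]

print("MAE:", mae(clean), "->", mae(with_outlier))
print("MSE:", mse(clean), "->", mse(with_outlier))
```

Adding the single outlier raises the MAE by a factor of roughly 3.5 but the MSE by a factor of roughly 33, because the squared term lets one large error dominate the sum.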
Moreover, extracting meaningful features requires robust feature selection methods that can eliminate noisy points. The authors of [16] proposed joint $L_{1,2}$ norm minimization on both the loss function and regularization to make feature selection more efficient. This idea reflects the effect of using more than one norm within one technique. Researchers have worked on improving the framework of SVMs (along with other algorithms) using p-norms. Some have proposed a 1-norm SVM to achieve more sparse classifiers [17]. Others have introduced a new approach using a 0 < p < 1 norm [18], which was shown to be more effective than the 1-norm SVM.

In this paper, we investigate different p-norm values in the HOG algorithm (p-HOG) to achieve better performance in classifying medical X-ray images. To test different norms in the proposed modification, we used a dataset of X-ray images from COVID-19 patients and recorded the results of comparing the original HOG and p-HOG algorithms using different p-norm values. Both were implemented in Python. The paper is organized as follows. In section 2, we describe the steps of including the p-norm in the HOG algorithm and present the experiments performed on the dataset. We display and discuss the results in section 3, then conclude the paper in section 4.

RESEARCH METHOD
2.1. p-HOG algorithm
As mentioned in the introduction, p-norms are widely used in machine learning to improve predictive models. The choice of p-norm is therefore expected to affect the performance and accuracy reported in our findings. In this section, we propose the p-HOG algorithm, which changes how distance is measured by using different p-norms instead of the Euclidean norm. The goal is to improve the HOG descriptor's detection process. In the original HOG algorithm, it is necessary to extract the main feature descriptor to identify image features. The information in each 8-pixel × 8-pixel cell is compacted to a nine-dimensional space consisting of nine angular bins, which are equally divided over 0° to 180° according to their gradient directions. The following steps explain the use of the p-norm in the algorithm. All steps except step 3 are derived from the HOG algorithm. Select the main block with a size ratio of 1:2. Divide the main block into 8-pixel × 8-pixel cells to compute the histogram of gradients in the x and y directions (denoted as gx and gy, respectively), as shown in Figure 1. Figure 1 illustrates the histogram generated for a single cell.
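The modification described above, replacing the Euclidean gradient magnitude with a p-norm, can be sketched for a single 8 × 8 cell as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `p_norm_magnitude` and `cell_histogram` are hypothetical, gradients use simple centered differences with zero borders, and each pixel votes into a single bin rather than being interpolated between neighboring bins.

```python
import numpy as np


def p_norm_magnitude(gx, gy, p):
    """Element-wise p-norm of the gradient (gx, gy); p >= 1 or np.inf."""
    stacked = np.stack([gx, gy]).astype(float)
    # With an integer axis, np.linalg.norm computes vector norms along it.
    return np.linalg.norm(stacked, ord=p, axis=0)


def cell_histogram(cell, p=2, n_bins=9):
    """Nine-bin orientation histogram of one cell, weighted by p-norm magnitude."""
    cell = np.asarray(cell, dtype=float)
    gx = np.zeros_like(cell)
    gy = np.zeros_like(cell)
    # Centered differences; border pixels keep a zero gradient (simplification).
    gx[:, 1:-1] = cell[:, 2:] - cell[:, :-2]
    gy[1:-1, :] = cell[2:, :] - cell[:-2, :]

    magnitude = p_norm_magnitude(gx, gy, p)
    # Unsigned orientation in [0, 180) degrees.
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0

    bin_width = 180.0 / n_bins
    bins = np.minimum((orientation // bin_width).astype(int), n_bins - 1)

    hist = np.zeros(n_bins)
    # Hard assignment: each pixel adds its magnitude to one bin.
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist


# Hypothetical cell: an intensity ramp along the diagonal.
ramp = np.add.outer(np.arange(8), np.arange(8))
for p in (1, 2, np.inf):
    print(p, cell_histogram(ramp, p=p))
```

In this sketch the orientation bins do not depend on p; only the weight each pixel contributes changes. Axis-aligned gradients receive the same weight under every p, while diagonal gradients are weighted most heavily by p = 1 and least by p = ∞.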

Experiments
Coronavirus disease 2019 (COVID-19) is a widespread disease caused by SARS-CoV-2 [19]. The disease first hit Wuhan, China, in late December 2019. As the number of confirmed cases increased rapidly, COVID-19 was declared a pandemic on March 11, 2020 [20]. COVID-19 can be diagnosed based on a combination of symptoms, including fever (87.9%), dry cough (67.7%), fatigue (38.0%), and sputum production (33.4%), among others [21]. On March 27, 2020, the World Health Organization (WHO) announced that the outbreak included 509,164 confirmed cases, which resulted in 23,335 deaths across 201 countries [22], [23], a death rate of approximately 4.6%. The growth in the number of diagnosed cases is due to close contact and human-to-human transmission [20], [24]. Scientists all over the world are working hard to overcome this health crisis, which poses a severe threat to public health, especially to older patients with chronic diseases, due to the unpredictable jump in the number of COVID-19 patients. Chest X-rays can be used to detect the features of pneumonia [21], [24]; therefore, this research conducts comparison experiments using a set of X-ray images.

Dataset selection
In general, datasets of COVID-19 X-ray images are still evolving. The dataset used in this paper included two categories, +COVID-19 and -COVID-19, which indicate scans of patients with and without COVID-19, respectively. The +COVID-19 X-ray images used in this research were collected by [25]. From the original data provided in [25], only positive COVID-19 cases were included; other images related to SARS and MERS were ignored. In total, 25 +COVID-19 images were included. The dataset used was relatively small; however, as these experiments were performed as a proof of concept and since this type of image is still not attainable at a large scale, this dataset is considered acceptable [26]. The -COVID-19 data was downloaded from [27], where images of pneumonia were collected and stored in the Kaggle repository. However, [28] collected 25 images from the repository to avoid noisy, mislabeled, and blurry images. In this research, we use the final data from [28].

Experimental setup
Python 3.7.3 was set up with the necessary packages, such as skimage, numpy, and OpenCV, to perform the experiments presented in this paper. The specifications of the computer system used were as follows: Intel® Core™ i7-8750H CPU (3.70 GHz, 9 M Cache) and 16.00 GB RAM. We used 10-fold cross-validation to ensure more reliable results from the generated models.
For each image, both the original HOG and the p-HOG feature descriptors were applied. The original HOG descriptor was extracted from the original Python implementation in the sklearn package, which depends on the scale-invariant feature transform (SIFT) algorithm [29]. The p-HOG implementation was based on the implementation found in [30]. The modifications were implemented in the methods by adding the "norm" parameter to the magnitude method and using the new value in the orientation, gradient, and HOG calculations.
The generated HOG and p-HOG descriptors for all images were fed separately into the SVM algorithm to generate a different model for each, which was later used in classification. The model was evaluated using the unseen testing set, and the results were recorded and compared. To generate a full picture of the p-norm's effect on the results, different p-norm values were tested: p-norm = 1, 2, 10, 20, and ∞. Moreover, 10-fold cross-validation was used to ensure the reliability of the produced results, and a t-test was used to record whether the differences between results were statistically significant.
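The evaluation procedure described above can be sketched as follows. Several parts are assumptions made only for illustration: random vectors stand in for the HOG and p-HOG descriptors (the helper `fake_descriptors` is hypothetical), the SVM uses scikit-learn defaults, and a paired t-test on per-fold accuracies is used because the text does not state which t-test variant was applied.

```python
import numpy as np
from scipy import stats
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# 25 negative and 25 positive cases, matching the dataset size in the text.
labels = np.array([0] * 25 + [1] * 25)


def fake_descriptors(shift):
    """Hypothetical stand-in for real HOG / p-HOG feature vectors."""
    feats = rng.normal(size=(labels.size, 36))
    feats[labels == 1] += shift
    return feats


features_hog = fake_descriptors(0.5)
features_phog = fake_descriptors(0.7)

# The same 10 folds are reused so that per-fold scores can be paired.
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)


def fold_accuracies(features, kernel):
    """Accuracy of an SVM with the given kernel on each of the 10 folds."""
    return cross_val_score(SVC(kernel=kernel), features, labels,
                           cv=folds, scoring="accuracy")


for kernel in ("linear", "rbf", "sigmoid"):
    acc_hog = fold_accuracies(features_hog, kernel)
    acc_phog = fold_accuracies(features_phog, kernel)
    # Paired t-test over folds (an assumption; see the lead-in).
    t_stat, p_value = stats.ttest_rel(acc_phog, acc_hog)
    print(f"{kernel}: HOG={acc_hog.mean():.3f} "
          f"p-HOG={acc_phog.mean():.3f} p-value={p_value:.3f}")
```

With real descriptors, `features_hog` and `features_phog` would be replaced by the feature matrices extracted from the images, one row per image. The numbers printed by this sketch come from synthetic data and say nothing about the results reported in the paper.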

Performance measures
Different tools are used to compare the results of different data mining algorithms [31]. One such tool is the confusion matrix, which is used as "an indication of the properties of a classification (discriminant) rule" [31]. The confusion matrix has four values that indicate the number of cases correctly and incorrectly classified for each class. In this research, there are two classes: +COVID-19 and -COVID-19. True positives (TP) are positive cases that are correctly classified, and false positives (FP) are normal cases that are incorrectly classified as positive. True negatives (TN) are normal cases that are correctly classified, and false negatives (FN) are positive cases that are incorrectly classified as normal. Although accuracy is not the only indicator used, it can be considered one of the most important. Accuracy is computed using (5):
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$
Another indicator is recall or sensitivity, which shows how well the positive cases are identified and is calculated using (6) [31]:
$$\text{Recall} = \frac{TP}{TP + FN} \tag{6}$$
Precision shows how many positively classified cases were relevant. High precision indicates that cases labeled as positive were indeed positive, with a very small number of FPs:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Specificity is another useful indicator that describes the proportion of cases without the disease that test negative [31]:
$$\text{Specificity} = \frac{TN}{TN + FP}$$
Finally, the F-measure combines precision and recall to ensure that they are balanced. The F-measure is calculated as follows [31]:
$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
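The performance measures named above follow the standard confusion-matrix definitions and can be computed as in the sketch below. The function name `classification_metrics` and the example counts are hypothetical and do not come from the reported experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard measures derived from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)          # sensitivity
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Hypothetical counts for 25 positive and 25 negative images.
metrics = classification_metrics(tp=23, fp=1, tn=24, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The sketch does not guard against empty classes; a zero denominator (for example, no predicted positives) would raise `ZeroDivisionError` and would need explicit handling in practice.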

RESULTS AND DISCUSSION
The results in Table 1 reveal that using the linear kernel with the original HOG algorithm resulted in 94.8% accuracy and precision and recall values of 97% and 91.8%, respectively. Using the p-HOG algorithm (with any norm value) increased accuracy to 95% ($L_2$, $L_\infty$) or 96% ($L_1$). To determine whether the differences were statistically significant, a t-test was conducted between the original HOG and p-HOG results. The t-test showed significant differences (p-value < 0.1) in precision, recall, and specificity (p-value = 0.03, 0.08, and 0.03, respectively). Although accuracy was not significantly different, the improvements in recall and specificity seem to indicate a positive effect of p-HOG. Another step was taken to further explore the recorded results, that is, to compare the norm values in the p-HOG implementation. In terms of recall and accuracy, $L_1$ had significantly better results than the other norm values (norm=2 and norm=10, respectively).
The results in Table 2 reveal that using the RBF kernel with the original HOG algorithm resulted in 95.2% accuracy. These results were very similar to the p-HOG results using different norm values, with the exception of the $L_1$ result, which was high at 97.0%. The difference in accuracy is also reflected in the recall, F-measure, and specificity scores, where the p-HOG ($L_1$) results surpass (with statistical significance) the results of the original HOG algorithm and those of the p-HOG algorithm with other norm values. Again, $L_1$ showed better results than any other norm value and the original HOG algorithm, reinforcing the earlier conclusion.
Finally, in Table 3, the results using the sigmoid kernel were better in general for p-HOG than for the original HOG. However, based on the t-test results, the differences are not statistically significant except for the recall value, where p-HOG shows statistically significant improvement.
In general, exploring different p-norm values enhanced the SVM's performance in classifying images. This result reveals that using p-HOG with $L_1$ to detect +COVID-19 X-ray images is promising. Our results suggest that $L_1$ is robust towards outliers and consequently achieves better results in detecting +COVID-19 X-ray images. Edges in X-rays occur where the intensity changes abruptly from white to black (or gray), or vice versa, which yields a large gradient magnitude compared with gradual changes in intensity.

Figure 1. Histogram generated for a single cell

Figure 2. Histogram generated for a block of four cells

Table 1. The results of SVM with linear kernel using HOG and p-HOG with different norms

Table 2. The results of SVM with RBF kernel using HOG and p-HOG with different norms

Table 3. The results of SVM with sigmoid kernel using HOG and p-HOG with different norms