Classification of cervical cancer using deep learning and machine learning approach

Human and material resources are scarce in developing countries, where the rate of cervical cancer is high. In such an environment, the introduction of automatic diagnostic technology that can replace specialists is urgent. Identifying the best of the known methods can accelerate the adoption of computer-aided diagnostic tools for cervical cancer. In this paper, we investigate which approach, machine learning or deep learning, achieves higher classification performance in diagnosing cervical cancer. Using 4,119 images, cervical cancer was classified as positive or negative with ResNet-50 for deep learning and with XGB, SVM, and RF for machine learning. In both experiments, square images from which the vaginal wall regions had been cropped were used. For machine learning, 10 major features were selected from a total of 300 features. We


Introduction
Cervical cancer is the second most common cancer in women worldwide, with a death rate of 60%. In particular, about 85% of these deaths each year occur among women in developing countries [1][2][3]. Cervical cancer is characterized by a long period of pre-invasive stages. It is largely preventable because screening tests enable effective treatment of precancerous lesions [4,5]. Nevertheless, the mortality rate in developing countries is reported to be particularly high because women there do not receive the benefits of preventive policies such as free vaccination programs and national examination programs provided by the state [6,7].
Cervicography is a typical method of detecting cervical cancer [8]. It is a test in which morphological abnormalities of the cervix are read by external professional readers after 5% acetic acid is applied to the cervix and the cervix is magnified up to 50 times with a camera [9]. However, this method requires sufficient human and material resources, given that accurate reading of these tests requires a professional reader with an international reading license [10]. In addition, because inter- and intra-observer errors may exist, objectivity must be increased through systematic and regular quality control of readers. Results may also vary depending on the subjective view of the reader and the researcher's health condition [11,12].
To compensate for these shortcomings, computer-aided diagnostic tools based on classic machine learning (ML) and deep learning (DL), which recognize patterns using computers, have recently been used for medical diagnosis [13,14]. ML is a broader concept that includes DL; it refers to a series of processes that analyze and learn from data and make decisions based on the learned information [15]. Artificial neural networks, modeled after the neurons of the human brain, are part of ML. Such networks were limited in that they could not learn from or handle new data because of gradients converging to zero and training becoming trapped in local optima. These limits of existing artificial neural networks have since been overcome using pre-training and dropout methods, and the resulting approach is called DL [16]. In this paper, the concepts of ML and DL are treated separately. In the 2000s, ML-based cervical lesion screening techniques began to be actively studied [17].
In 2009, an artificial intelligence (AI) research team in Mexico conducted a study to classify cervicography images as negative or positive using the k-nearest neighbor algorithm (k-NN). With images from about 50 people, k-NN classified negative and positive images with a sensitivity of 71% and a specificity of 59% [18].
In another case of ML classification, in 2020 in Indonesia, image processing was applied to cervicography images and negative and positive images were classified using a support vector machine (SVM), resulting in an accuracy of 90% [19].
Since 2016, when the Fourth Industrial Revolution drew attention, numerous studies have focused on image classification by DL. In the field of cervical research, many research teams around the world are paying attention to detection and classification using DL [20]. In 2019, Utah State University in the United States used a faster region-based convolutional neural network (F-RCNN) to automatically detect the cervical region in cervicography images and classify dysplasia and cancer, with an AUC of 0.91 [21]. In 2017, a team in Japan conducted an experiment in which 500 images of cervical cancer were classified into three grades [severe dysplasia, carcinoma in situ (CIS), and invasive cancer (IC)] using the research team's self-developed neural network, showing an accuracy of about 50% as an early-stage experiment [22]. ML and DL are still actively studied in the medical field, especially in cervical lesion screening, with various techniques.
As mentioned earlier, DL is known to produce higher performance as a way of supplementing the limitations of ML. In this study, we classify cervicography images as negative or positive using ML and DL techniques, both computer-aided diagnostic tools, under the same environment and evaluate their performance. Comparing the two methods and finding the one that performs better is important because it is closely related to the faster introduction of automatic diagnostic technology in developing countries. Through this verification process, algorithms that are more suitable for classifying cervical cancer as negative or positive are identified for clinical application. Such algorithms can be used to assist the diagnosis of cervical cancer.
Data pre-processing. Cervicography images were generally wider than they were tall. The cervical area was located in the center of the image, and the vaginal wall was often photographed on the left and right sides. In the ML feature analysis stage, the entire input area is scanned and features are extracted from it, so areas other than the target area should be removed. Assuming that the cervical region was centered in the image, the left and right ends were cropped by the same amount so that the width equaled the height and the image became square.
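The cropping step described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `center_crop_to_square` and the list-of-rows image representation are assumptions made for the example, not the implementation used in the study.

```python
def center_crop_to_square(image):
    """Crop equal widths from the left and right so that width == height.

    `image` is a list of rows (each row a list of pixel values), assumed
    to be at least as wide as it is tall, with the cervix centred.
    """
    height = len(image)
    width = len(image[0])
    if width < height:
        raise ValueError("expected width >= height")
    left = (width - height) // 2  # columns removed from the left edge
    return [row[left:left + height] for row in image]


# Toy 2x6 "image": two columns are removed from each side.
toy = [[0, 1, 2, 3, 4, 5],
       [6, 7, 8, 9, 10, 11]]
square = center_crop_to_square(toy)
print(square)
```

When the difference between width and height is odd, this sketch removes one more column on the right than on the left.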

Methods
Likewise, in DL, the same pre-processed image was used as input to meet the same conditions.
Study design for ML analysis. The overall process of ML is shown in Figure 1. Training sets were pre-processed and converted into grayscale images as described earlier. After more than 300 features were extracted from the pre-processed and converted images in the feature extraction stage, only the major variables affecting the classification were selected through the Lasso model. First, a first-order feature is a value that relies only on the individual pixel values of the image and is used to analyze one-dimensional characteristics such as the mean, maximum, and minimum that are not directly expressed in the image.
Second, the GLCM, a second-order feature, is a matrix that takes into account the spatial relationship between the reference pixel and an adjacent pixel. Adjacent pixels lie to the east (0°), north-east (45°), north (90°), and north-west (135°) of the reference pixel. Third, the GLRLM, also a second-order feature, is a matrix that calculates how many consecutive pixels have the same value along a given direction and length. GLSZM is a matrix that considers zones of nine adjacent pixels and calculates how many connected pixels share the same value. Finally, LoG-filtered first-order is a method of applying the Laplacian of Gaussian filter and then selecting the first-order features. The LoG filter applies a Laplacian filter after smoothing the image with a Gaussian filter, a technique commonly used to find contours, that is, the points of rapid change in the image.
ML generally adopts only key features in order to create models that are easy to understand, perform better, and learn quickly. The lasso feature selection method using L1 regularization was adopted, with only a few important variables selected and the coefficients of the other variables reduced to zero. This method is known to be simpler and more accurate than other models. Thus, it is often used to select variables [24]. with a lower value indicating more homogeneity in size zone volume [25].
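To make the GLCM description concrete, the following sketch counts grey-level co-occurrences for the four directions named above. It is an illustration under assumptions: the names `OFFSETS` and `glcm` are hypothetical, the pixel distance is fixed at 1, the matrix is not symmetrized or normalized, and the study's actual feature extraction software is not specified here.

```python
# Pixel offsets (row, col) for east, north-east, north, and north-west.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, angle, levels):
    """Count pairs (reference grey level, neighbour grey level) at `angle`."""
    d_row, d_col = OFFSETS[angle]
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + d_row, c + d_col
            # Skip reference pixels whose neighbour falls outside the image.
            if 0 <= nr < rows and 0 <= nc < cols:
                matrix[image[r][c]][image[nr][nc]] += 1
    return matrix


# Toy 3x3 image with grey levels 0, 1, and 2.
toy = [[0, 0, 1],
       [1, 2, 2],
       [2, 2, 0]]
east = glcm(toy, 0, levels=3)
for row in east:
    print(row)
```

Texture statistics such as contrast or homogeneity are then computed from matrices of this kind.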
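The way L1 regularization drives the coefficients of unimportant variables to exactly zero can be illustrated with a small coordinate-descent sketch. This is an assumption-laden toy example in plain Python: the function names, the toy data, and the value of `alpha` are hypothetical, and it fits a simple linear target rather than reproducing the study's Lasso model.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero; values inside the band become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n) if norm else 0.0
    return w


# Toy data: the target depends only on the first of three features.
X = [[1.0, 0.1, -0.2],
     [2.0, -0.1, 0.1],
     [3.0, 0.2, 0.0],
     [4.0, -0.2, 0.1]]
y = [2.0 * row[0] for row in X]
w = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, coef in enumerate(w) if coef != 0.0]
print(w, selected)
```

Features whose coefficients remain nonzero are the ones kept for the classifier; a larger `alpha` keeps fewer features.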

ML classification architectures.
We used the XGB, RF, and SVM architectures for ML classification. XGB is a boosting method that combines weak predictive models to create a strong predictive model [26]. As shown in Figure 2-A, a pre-pruning method is used to compensate for the error of the previous tree when creating the next tree. The RF in Figure 2
Study design for DL analysis. The entire DL process is shown in Figure 3. After pre-processing in the same way as for ML, the model was created with the ResNet-50 architecture. The generated model was applied to the test set, and model performance was evaluated through 5-fold cross-validation.
DL classification architecture. We used the ResNet-50 architecture, a type of convolutional neural network (CNN) (Figure 4-A). As shown in Figure 4-B, the traditional CNN method finds the optimal value of input x through the learning layer, while ResNet finds the optimal F(x)+x by adding the input x after the learning layer. This approach has the advantage of optimizing input values for the next layer [29].
In this study, we used weights generated from ImageNet, the dataset most commonly used in the field of image recognition, which consists of 1.25 million real-life images and 1,000 classes. These weights were applied through transfer learning [30]. The learning parameters were set to a batch size of 40 and 300 epochs, which suited the available computing power. The learning rate was set to 0.0001 to prevent significant changes in the transfer learning weights. For a reasonable learning speed, the images were resized to 256x256.
Evaluation process. Cross-validation is an evaluation method used to prevent overfitting and improve accuracy when evaluating model performance. In this paper, we validated the classification performance of the two types of algorithms with 5-fold cross-validation, a method in which every part of the dataset is tested exactly once over a total of five validations. For implementation under the same conditions, the same five training sets and test sets were used for each method.
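The difference between a plain block and a residual block can be shown with a toy sketch. The function `tiny_layer` is a hypothetical stand-in for the learned layers F, and the element-wise lists stand in for feature tensors; none of this is taken from the study's code.

```python
def plain_block(x, layer):
    """Traditional block: the output is whatever the layer computes, F(x)."""
    return [layer(v) for v in x]


def residual_block(x, layer):
    """Residual block: the input x is added back to the layer output, F(x) + x."""
    return [layer(v) + v for v in x]


def tiny_layer(v):
    # Hypothetical stand-in for learned layers: a scaled ReLU.
    return 0.5 * max(v, 0.0)


x = [-1.0, 0.0, 2.0]
plain_out = plain_block(x, tiny_layer)
residual_out = residual_block(x, tiny_layer)
print(plain_out)
print(residual_out)
```

Because the input is carried through the shortcut, the layers only need to learn the difference between the desired output and x.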
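A fixed 5-fold split that can be reused for every model might look like the sketch below. The function name `k_fold_indices`, the seed, and the unstratified shuffling are assumptions for illustration; the source does not state how the folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train, test) index pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Same splits for every classifier, so that the comparison is like for like.
splits = k_fold_indices(4119, k=5)
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

Generating the splits once and passing the same index lists to each method keeps the training and test sets identical across the ML and DL experiments.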

Results
Visualization. The bar graph in Figure 5 shows the 10 selected features and the importance of each feature in the ML method. Features with values greater than zero had positive linear relationships, while features with values smaller than zero had negative linear relationships.

To determine which areas ResNet relied on when classifying an image as negative or positive, results of the test set were visualized using a Class Activation Map (CAM) to show which areas were given more weight (Figure 6).
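In its standard formulation, a class activation map is a weighted sum of the final convolutional feature maps, with the weights taken from the classification layer for the class of interest. The sketch below illustrates that computation on toy values; the function name, the nested-list representation, and the numbers are assumptions, and the exact CAM variant used in the study is not stated.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum over channels: cam[y][x] = sum_k w_k * f_k[y][x]."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


# Two toy 2x2 feature maps and the (hypothetical) weights of one class.
maps = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 2.0]]]
weights = [0.5, 1.0]
cam = class_activation_map(maps, weights)
print(cam)
```

The resulting map is usually upsampled to the input size and overlaid on the image as a heat map.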

Evaluation.
To evaluate the performance of XGB, SVM, RF, and ResNet-50, results were validated by 5-fold cross-validation and evaluated in terms of precision, recall, F1-score, and accuracy, as shown in Figure 7. (Figure 8).
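The four indicators follow directly from the counts of true and false positives and negatives. The sketch below computes them for a toy set of labels; the function name `binary_metrics` and the example labels are hypothetical and do not reproduce any result from the study.

```python
def binary_metrics(y_true, y_pred):
    """Return precision, recall, F1-score, and accuracy for labels 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy labels: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a 5-fold setting, these values would be computed on each test fold and then averaged.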

Discussion
Principal findings. In this study, we compared performance by automatically classifying negative and positive cervical images using DL and ML, both previously known artificial intelligence techniques. The performance of the ResNet-50 architecture was 15% higher than the average performance of the ML methods XGB, RF, and SVM.

Results.
In this study, we compared the performance of the ML algorithms XGB, SVM, and RF and the DL algorithm ResNet-50, all automatic classification techniques for cervical images, to determine which algorithm would be more suitable for helping clinicians make an accurate diagnosis. Additionally, we will add a DL-based cervical detection model to the process in the future.
This study used arbitrary cropping as a pre-processing method, but if a cervical area detection model is used, a more accurate comparative study will be possible because only the exact desired area can be analyzed.

Strengths and limitations.
This study is the first to compare the performance of DL and ML in the field of automatic cervical cancer classification. Compared to other studies that have produced results using only one method, either DL or ML, this work has the advantage of enabling cervical clinicians to objectively evaluate which automation algorithm is better as a computer-aided diagnostic tool.
In pre-processing, the same width was cut from both ends to remove the vaginal wall areas captured at both ends of the image, on the assumption that the cervix was exactly in the middle.
However, not all images have the cervix in the center. In addition, the cervix does not have the exact form of a circle in every image. In other words, the cropped image may still contain unnecessary vaginal wall, or part of the cervix that should be analyzed may have been cut off. This may reduce the accuracy of the comparison. In addition, when selecting ML features, the lasso technique was adopted and 10 features were selected. However, adopting a different feature selection method or selecting more or fewer than 10 features might produce completely different results. Because human intervention is involved in the ML process itself, the results may be less exact, which makes an accurate comparison with DL difficult.

In this study, the performance of the existing ML and DL techniques was objectively evaluated and compared in the classification of negative and positive cervical cancer under the same environment.
The results of this study can serve as a criterion for objective evaluation of which technique clinicians will choose as a computer-assisted diagnostic tool in the future. In addition, when diagnosing cervical cancer, it can help to consider diagnostic factors in various ways by outputting both the automatically selected features (DL) and the randomly selected features (ML).
In future studies, a more accurate comparison of cervical cancer classification performance will be conducted by adding a detection model that accurately detects and analyzes only the cervix, and by minimizing human intervention in ML through finding and adopting the optimal feature selection technique.
We are convinced that the results of these additional studies will accelerate the introduction of computer-assisted diagnostic tools for the automatic diagnosis of cervical cancer that produce more accurate and reliable results in countries or regions where there is an urgent need.
Learning method of the existing CNN (left) and ResNet (right). By adding shortcuts that add the input value to the output value every two layers, errors are reduced faster than in existing CNNs.