Abstract

Cervical cancer is the fourth most common type of cancer and is also a leading cause of mortality among women across the world. Various types of screening tests are used for its diagnosis, but the most popular one is the Papanicolaou smear test, in which cell cytology is carried out. It is a reliable tool for early identification of cervical cancer, but there is always a chance of misdiagnosis because of possible errors in human observations. In this paper, an auto-assisted cervical cancer screening system is proposed that uses a convolutional neural network trained on Cervical Cells database. The training of the network is accomplished through transfer learning, whereby initializing weights are obtained from the training on ImageNet dataset. After fine-tuning the network on the Cervical Cells database, the feature vector is extracted from the last fully connected layer of convolutional neural network. For final classification/screening of the cell samples, three different classifiers are proposed including Softmax regression (SR), Support vector machine (SVM), and GentleBoost ensemble of decision trees (GEDT). The performance of the proposed screening system is evaluated for two different testing protocols, namely, 2-class problem and 7-class problem, on the Herlev database. Classification accuracies of SR, SVM, and GEDT for the 2-class problem are found to be 98.8%, 99.5%, and 99.6%, respectively, while for the 7-class problem, they are 97.21%, 98.12%, and 98.85%, respectively. These results show that the proposed system provides better performance than its previous counterparts under various testing conditions.

1. Introduction

Cervical cancer is the leading cause of cancer-related deaths in females. It arises from the cervix, i.e., the lower and narrow end of the uterus, as shown in Figure 1. It starts due to abnormal growth of cells that have the ability to spread into other parts of the body. Human Papilloma Virus (HPV) infection is the major risk factor for cervical cancer. There are no symptoms in the beginning of the disease, while with the passage of time symptoms may include abnormal vaginal bleeding, pelvic pain, and pain during sexual intercourse. It can be diagnosed earlier through regular medical check-ups [1].

There are many diagnostic tests for cervical cancer identification. The Papanicolaou (PAP) smear is the most commonly used test for cervical cancer screening worldwide. In the conventional PAP smear procedure [2], a speculum is inserted into the vagina to widen the walls so that the vaginal smear can be viewed. Several weeks are required to obtain the final results of the PAP smear test. The process is time-consuming and laborious, as it requires microscopic examination of hundreds of thousands of cells for the diagnosis of precancerous and cancerous cells. In conventional screening, roughly one in every 10 to 15 positive cases may be missed [3].

The incidence of cervical cancer is lower in the USA and other developed countries because of early detection and better screening methods [4]. In some Nordic countries, its rate of occurrence has dropped by 80% since screening systems were introduced. In Sweden [5], it has dropped by 65% during the last four decades, and cervical cancer incidence and mortality figures have been stable over the last decade. However, improved screening systems are still unavailable in underdeveloped countries, partly due to the complexity and tedious nature of manual screening of abnormal cells from a cervical cytology specimen [6, 7]. While auto-assisted mass screening techniques can boost efficiency, they are not accurate enough to be used as a primary tool for cervical screening [8].

During the past few years, extensive research has been carried out for the development of computer-assisted automated reading systems based on cell image analysis [7, 9, 10]. The manual screening process is normally initiated with the collection of cervical cell samples from the uterine cervix and their placement on a glass slide. After visual inspection under a microscope, these are classified into different categories. The shape, size, texture, and nucleus-to-cytoplasm ratio are the main characteristics for the classification task. Hence, for an automated system, the first step may include segmentation of images of cell samples to extract regions of interest, containing single cells with nucleus and cytoplasm, from the noncell regions. This initial segmentation is then followed by separation of the main cell components, including the nucleus and cytoplasm, and extraction of their shape/textural features. However, the separation of main components and shape feature extraction is not an integral part of an automated screening system, as the schemes proposed in the literature include both options, i.e., with and without prior geometrical feature extraction.

For a system that includes prior feature extraction, accurate segmentation of the nucleus from the cytoplasm in cervical cell images is a difficult task and is prone to error, thus limiting the success of the overall system. Large irregular shapes, appearance dissimilarities, and cell clusters among malignant and benign cell nuclei are the major problems in accurately segmenting the cytoplasm and nucleus. Various segmentation algorithms have been proposed by researchers to segment out cell components. An iterative algorithm for assigning pixels based on a statistical criterion function was proposed in [11] to separate the nucleus, cytoplasm, and background. In other studies [12, 13], Gabor filters were applied to exploit the textural variation of the cervical cells and segment out regions of interest. Fuzzy C-means clustering was used in [14, 15] to segment single cell images into nucleus, cytoplasm, and background. However, if overlapping cells are taken into account, the classification accuracy decreases significantly. Therefore, a majority of the presented segmentation approaches [11, 12, 14, 16–20] perform well only for single and clear cervical cell images; in the case of overlapping cells or other shape changes, their accuracy degrades.

To overcome this dependency on segmentation, many techniques have been proposed during the past few years, which do not include prior segmentation and directly classify the unsegmented cell images. A pixel-level classification method is proposed in [21] to classify normal and abnormal cells without prior segmentation using block-wise feature selection and extraction techniques. However, the validation accuracy of the proposed algorithm is not up to the mark. In [22], block image processing was proposed that includes cropping arbitrary image blocks prior to feature extraction, and the cropped blocks are then classified using SVM. However, in their approach, arbitrary cropping could potentially separate a full cell into distinct patches.

Recently, feature representation in image classification problems based on deep learning methods has become more popular [23]. In particular, convolutional neural networks (ConvNets) [24] have achieved unprecedented results in the 2012 ImageNet Large Scale Visual Recognition Challenge, which consisted of classifying natural images in the ImageNet dataset into 1000 fine-grained categories [25]. Besides, they have drastically increased the accuracy in the field of medical imaging [26, 27], specifically classification of lung diseases and lymph nodes in CT images [28, 29], and detecting cervical intraepithelial neoplasia based on cervigram images [30] or multimodal data [31]. ConvNets have also shown superior performance in the classification of cell images for diagnosis of pleural cancer [32].

However, large datasets are essential to achieve high performance and to overcome the problem of overfitting with ConvNets [33]. This is a major limitation in applying ConvNets to the cervical cell classification problem, as only a limited number of annotated datasets are available for cervical cells. For instance, the Herlev dataset [34] contains only 917 cervical cells, 675 abnormal and 242 normal, which is insufficient for training ConvNets. To overcome this limitation, image data augmentation techniques have recently been proposed to virtually increase the size of training datasets and reduce the problem of overfitting [25]. Data augmentation can be achieved by linear transformations of the data such as mirroring, scaling, translation, rotation, and color shifting, provided that the information about the object in the image remains intact. Transfer learning [21, 22, 35–39] is another way to mitigate overfitting. In transfer learning, a ConvNet is first trained on a large-scale natural image dataset and is then fine-tuned on the target dataset, which is limited in size.

In this paper, an automatic screening system is proposed to classify malignant and benign cell images without prior segmentation using ConvNets. Due to the limited size of the Herlev dataset, transfer learning is used to initialize the weights, and the network is then fine-tuned on the dataset. The feature vector at a fully connected layer is extracted after fine-tuning and passed to various classifiers. To show the efficacy of the proposed approach, its performance is evaluated on the Herlev dataset for 2-class and 7-class problems. Malignant and benign cells are considered in the 2-class problem, while in the 7-class problem, all seven categories of cervical cells are explored. The research contributions of the presented work are summarized as follows:

(1) Our work is aimed at developing a tool for automatic classification of cervical cells using a convolutional neural network. Unlike previous methods, it does not require prior segmentation or hand-crafted features. The method automatically extracts hierarchical features embedded in the cell image for the classification task.

(2) A data augmentation technique is used to avoid overfitting. Overfitting is reduced when the data augmentation strategy is applied to train our network. This approach helps the network learn the most discriminative features of cervical cells and thus achieve superior classification results.

(3) Transfer learning is also explored for pretraining, and the initial weights are reassigned to another network for fine-tuning on cervical cell images. Training from scratch requires a large amount of labeled data, which is extremely difficult to obtain in medical diagnosis. Moreover, designing and adjusting the hyperparameters is challenging with respect to overfitting and other issues. Transfer learning is the easiest way to overcome such problems.

(4) We also conduct extensive malignant and benign cell assessment experiments on the Herlev dataset. Our results clearly demonstrate the effectiveness of the proposed convolutional neural architecture. The experimental results are compared with recently proposed methods, and our approach provides superior performance compared with existing systems for cervical cell classification.

The paper is organized as follows: the proposed methodology is presented in Section 2; experiments and results are given in Section 3; result-related discussion is presented in Section 4; and conclusion and future work are summarized in Section 5.

2. Proposed Methodology

The proposed automatic mass screening system for cervical cancer detection using ConvNets is shown in Figure 2. There are four steps: (1) data collection, (2) preprocessing, (3) feature learning, and (4) classification of cervical cells. These steps are explained in the following sections.

2.1. Data Collection

The publicly available Herlev Pap smear dataset is used for the training and testing purpose. It contains 917 single cervical cell images with ground truth classification and segmentation. The cells are categorized into seven different classes. These seven classes are diagnosed by doctors and cytologists to increase the reliability of the diagnosis. Furthermore, these seven classes are broadly categorized into two groups, i.e., malignant and benign. The first class to third class is normal or benign, while fourth to seventh class is abnormal or malignant. The class’s distribution is shown in Table 1.

Normal and abnormal cell images are shown in Figure 3. It can be seen that the size of the nucleus in malignant or abnormal cells is larger than that of the normal cells. The difficult task from classification perspective is that the normal columnar cells have nucleus size quite similar to that of severe nucleus, and also chromatin distribution is same.

2.2. Preprocessing

Herlev dataset consists of images that contain multiple cells in a single image. The data preprocessing phase includes image patch extraction from the original cervical cell images and augmentation of data for training ConvNet 2.

2.2.1. Image Patch Extraction

The proposed approach, like previous patch-based classification methods, does not operate directly on the original images in the Herlev dataset, which contain multiple cells at a time [40–43]. Image patches, each containing a single cell, are extracted first. In order to extract an individual cell, presegmentation of the cytoplasm is required [44]. The nuclei are first detected, and fixed-size image patches, each centered on a nucleus, are then generated. These patches embed not only the size and scale information of the nucleus but also the textural information of the cytoplasm surrounding the nucleus. The scale and size of the nucleus are very important discriminative features between malignant and benign cells.
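The nucleus-centered patch extraction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `extract_patch`, the list-of-lists image representation, and the choice to shift the window inward near the image border are all assumptions, and the patch size used here is arbitrary.

```python
def extract_patch(image, center_row, center_col, size):
    """Crop a size x size patch centered (as closely as possible) on a nucleus.

    `image` is a 2-D list of pixel values. When the nucleus lies too close
    to the border, the window is shifted inward so the patch stays inside
    the image (an assumed border policy).
    """
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("patch size exceeds image dimensions")
    half = size // 2
    top = min(max(center_row - half, 0), height - size)
    left = min(max(center_col - half, 0), width - size)
    return [row[left:left + size] for row in image[top:top + size]]


# Toy 8 x 8 "image" whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(8)] for r in range(8)]
# Hypothetical nucleus location near the top-right corner.
patch = extract_patch(image, center_row=1, center_col=6, size=4)
```

Because the window is clamped to the image, every patch has the same size, which is what a fixed-input network requires.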

2.2.2. Data Augmentation

An image data augmentation technique is used to virtually increase the size of the training dataset and reduce overfitting [25]. As the cervical cells are invariant to rotations, they can be rotated over a range of angles with a fixed step angle. In the data augmentation process, rotations and translations are performed: horizontal translations of up to 15 pixels for each normal cell and vertical translations of up to 15 pixels for each abnormal cell. Hence, we generate 300 image patches from a single normal cell and 160 image patches from each abnormal cell. This yields a more balanced class distribution, as abnormal cell images outnumber normal cell images in the original data. The generated image patches are sized to cover the cytoplasm region and are then upsampled using bilinear interpolation. These upsampled image patches, as shown in Figure 4, are used in ConvNet 2 for initiating layer transfer and training [28].

The malignant cells in the Herlev dataset are 3 times more than the benign cells. Therefore, it is natural that the classifier tends to be more biased towards the majority class, i.e., the malignant cells. The unfair distribution of data is commonly solved by normalization of data prior to classification, whereby the ratio of positive and negative samples of data is evenly distributed [45]. This normalization process improves not only the convergence rate of training of the ConvNets but also the classification accuracy [25]. In the proposed approach, the training dataset is made balanced by unequal augmentation of benign and malignant cells, in which a higher proportion of benign training samples are generated as compared to malignant training samples.
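The effect of unequal augmentation on class balance can be checked with simple arithmetic. The sketch below uses the class sizes and per-cell patch counts stated in the text (242 normal and 675 abnormal cells; 300 and 160 patches per cell). As a simplifying assumption it applies them to the whole dataset rather than to the training folds only, and the function names are illustrative.

```python
def augmented_counts(class_sizes, patches_per_cell):
    """Number of image patches per class after augmentation."""
    return {label: class_sizes[label] * patches_per_cell[label]
            for label in class_sizes}


def imbalance_ratio(counts, majority, minority):
    """Ratio of majority-class to minority-class sample counts."""
    return counts[majority] / counts[minority]


herlev = {"normal": 242, "abnormal": 675}    # cell counts from the text
per_cell = {"normal": 300, "abnormal": 160}  # patches generated per cell

after = augmented_counts(herlev, per_cell)
before_ratio = imbalance_ratio(herlev, "abnormal", "normal")
after_ratio = imbalance_ratio(after, "abnormal", "normal")
```

Under these assumptions the abnormal-to-normal ratio falls from about 2.79 to about 1.49, so the augmented set is closer to balanced, although abnormal patches remain the majority.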

2.3. Feature Learning

ConvNets can learn discriminative features automatically for an underlying task. In this work, a typical deep model is used consisting of 2 ConvNets, named ConvNet 1 and ConvNet 2. At first, the base network ConvNet 1 is pretrained on the ImageNet database, which consists of over 15 million labeled high-resolution images belonging to roughly 22,000 categories [46]. The images were collected from the web and labeled through human judgment using Amazon’s Mechanical Turk crowdsourcing tool. In all, there are roughly 1.2 million training images, 50,000 validation images, and 150,000 testing images [25]. ConvNet 1 contains five convolutional layers, denoted as conv1–conv5, three pooling layers, and three fully connected layers. These layers are transferred to ConvNet 2, which is the network used for feature extraction, to set its initial parameters. This new network is then fine-tuned on the single cervical cell images of the Herlev database. This procedure is shown in Figure 5.

As described earlier, the convolutional and pooling layers are transferred from ConvNet 1 to the same locations in ConvNet 2, so both networks share the same structure across these layers. However, the fully connected layers are modified in ConvNet 2 because the number of output classes differs from that of ConvNet 1: the output layer of ConvNet 2 has 7 neurons for the 7-class problem and 2 neurons for the 2-class problem. The fully connected layers of ConvNet 2 are initialized with values drawn from random Gaussian distributions. Local response normalization is set according to the parameters in [25]. The hidden layers use the rectified linear unit activation function. There are three fully connected layers in the proposed network. The feature vector for the final classification task is taken from the last fully connected layer before the output layer, because it contains more specific and abstract details of the images. One such feature vector is extracted for each training sample. The configuration of ConvNet 2 is listed in Table 2.
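The layer transfer described above can be illustrated with a framework-free sketch. It assumes, purely for illustration, that a network is a dictionary mapping layer names to weight matrices (rows are output units); the layer names `conv1`, `fc7`, and `fc8`, the standard deviation, and the function name are hypothetical and not taken from the paper.

```python
import random


def transfer_and_reinit(pretrained, num_classes,
                        fc_names=("fc6", "fc7", "fc8"), std=0.01, seed=0):
    """Copy non-fully-connected layers; re-draw fully connected layers.

    Fully connected layers are re-initialized from a zero-mean Gaussian,
    and the last one is resized to `num_classes` output units.
    """
    rng = random.Random(seed)
    new_net = {}
    for name, weights in pretrained.items():
        if name in fc_names:
            rows = num_classes if name == fc_names[-1] else len(weights)
            cols = len(weights[0])
            new_net[name] = [[rng.gauss(0.0, std) for _ in range(cols)]
                             for _ in range(rows)]
        else:
            # Transferred layer: copy the pretrained weights unchanged.
            new_net[name] = [row[:] for row in weights]
    return new_net


# Toy "pretrained" network with a 5-way output layer.
pretrained = {
    "conv1": [[0.1, 0.2], [0.3, 0.4]],
    "fc7": [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
    "fc8": [[1.0, 1.0, 1.0] for _ in range(5)],
}
seven_class_net = transfer_and_reinit(pretrained, num_classes=7)
two_class_net = transfer_and_reinit(pretrained, num_classes=2)
```

Only the output layer changes shape between the 2-class and 7-class variants; the transferred layers are identical in both.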

2.4. Classification

Deep features are extracted from the outer layer of ConvNet 2 for cervical cells classification task. The classification score is then calculated using three different classifiers including SR, SVM, and GEDT. The details of three classifiers are also presented as follows.

2.4.1. Softmax Regression (SR)

For the multiclass dataset, SR is used to classify unknown samples that are first preprocessed according to the described approach. Unlike ConvNets, SR uses the cross-entropy function for classification, and the sigmoid function is replaced by the softmax function. Mathematically, it is represented by the following equation:

\[
P\left(y = j \mid z^{(i)}\right) = \frac{e^{z_j^{(i)}}}{\sum_{k=1}^{K} e^{z_k^{(i)}}},
\]

where $K$ is the number of classes and we define the network input as

\[
z = w^{T}x + b,
\]

where $w$ is a weight vector, $x$ is the feature vector of a training sample, and $b$ is the bias. This softmax function computes the probability score that training sample $x^{(i)}$ belongs to class $j$ given the network input $z^{(i)}$. The probability score is generated at the softmax layer of ConvNet 2, next to the fully connected layer. The cross-entropy function is used for classification at the final layer of ConvNet 2. The softmax layer of the ConvNet is shown in Figure 6.
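The softmax score and cross-entropy loss described above can be sketched in a few lines. This is a generic illustration with made-up weights and a made-up feature vector; subtracting the maximum before exponentiating is a standard numerical-stability step and is an addition here, not something stated in the text.

```python
import math


def net_input(weights, biases, x):
    """Per-class network input: one weight vector and bias per class."""
    return [sum(w * xi for w, xi in zip(w_row, x)) + b
            for w_row, b in zip(weights, biases)]


def softmax(z):
    """Convert network inputs into class probabilities."""
    shift = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Cross-entropy loss for a single sample with a known class index."""
    return -math.log(probs[true_class])


# Hypothetical 3-class example with a 2-dimensional feature vector.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
x = [2.0, 1.0]

z = net_input(weights, biases, x)
probs = softmax(z)
predicted = probs.index(max(probs))
loss = cross_entropy(probs, true_class=0)
```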

2.4.2. Support Vector Machine (SVM)

SVM is a supervised learning model that uses an optimization method to identify support vectors $s_i$, weights $\alpha_i$, and bias $b$. These are used to classify vectors $x$ according to the following equation:

\[
c = \sum_{i} \alpha_i \, k(s_i, x) + b,
\]

where $k$ is the kernel function, which depends on the model assumed for the decision boundary. In the case of a linear kernel, $k$ is a dot product. If $c \geq 0$, then $x$ is classified as a member of the first group and otherwise the second group. An error-correcting code classifier is trained using the support vector machine. The batch size is set to 256. The training set is presented to the classifier as deep hierarchical feature vectors extracted using ConvNet 2. Validation data are used in SVM to calculate validation accuracy.
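The decision rule described above, a kernel-weighted sum over support vectors plus a bias, can be sketched as follows. The support vectors, weights, and bias are invented toy values rather than trained parameters, and the function names are illustrative. The threshold at zero follows the usual convention for this form of the decision function.

```python
def linear_kernel(u, v):
    """Linear kernel: the dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def svm_score(x, support_vectors, alphas, bias, kernel=linear_kernel):
    """Weighted sum of kernel evaluations against each support vector."""
    return sum(a * kernel(s, x)
               for a, s in zip(alphas, support_vectors)) + bias


def svm_classify(x, support_vectors, alphas, bias, kernel=linear_kernel):
    """Assign the first group when the score is non-negative."""
    score = svm_score(x, support_vectors, alphas, bias, kernel)
    return "first" if score >= 0 else "second"


# Toy parameters (hypothetical, not fitted to any data).
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
alphas = [0.5, -0.5]
bias = 0.0

label_a = svm_classify([2.0, 0.0], support_vectors, alphas, bias)
label_b = svm_classify([-1.0, -3.0], support_vectors, alphas, bias)
```

A multiclass problem such as the 7-class task would combine several such binary decisions; that combination step is not shown here.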

2.4.3. GentleBoost Ensemble of Decision Tree (GEDT)

Regression trees or GEDT are used to predict the response of the data. The classification decisions are made when the query sample follows the path from the initial or root node to the end or leaf node.

In GEDT, an ensemble of trees is used that is based on majority voting. It is trained on the training data, numbers of trees are set to 100, and batch size is set to 256. Validation accuracy of the Herlev dataset is evaluated using validation data.
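For readers unfamiliar with GentleBoost, the following is a minimal sketch of the general algorithm using one-level regression trees (stumps) on a binary problem with labels in {-1, +1}. It is not the configuration used in this work: the stump depth, the number of rounds, and the toy data are assumptions chosen to keep the example short. In this formulation the ensemble decision is the sign of the summed regression outputs.

```python
import math


def fit_stump(X, y, w):
    """Weighted least-squares regression stump: (feature, threshold, a, b)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]

            def wmean(idx):
                total = sum(w[i] for i in idx)
                return sum(w[i] * y[i] for i in idx) / total if total > 0 else 0.0

            a, b = wmean(left), wmean(right)
            err = sum(w[i] * (y[i] - (a if X[i][j] <= t else b)) ** 2
                      for i in range(len(X)))
            if best is None or err < best[0]:
                best = (err, j, t, a, b)
    return best[1:]


def stump_predict(stump, x):
    j, t, a, b = stump
    return a if x[j] <= t else b


def gentleboost(X, y, rounds=10):
    """Fit an additive ensemble of regression stumps with GentleBoost."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        ensemble.append(stump)
        # Re-weight samples: poorly fitted samples gain weight.
        w = [wi * math.exp(-yi * stump_predict(stump, xi))
             for wi, yi, xi in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    score = sum(stump_predict(s, x) for s in ensemble)
    return 1 if score >= 0 else -1


# Toy one-feature data: small values are class -1, large values class +1.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [-1, -1, -1, 1, 1, 1]
model = gentleboost(X, y, rounds=5)
train_predictions = [predict(model, x) for x in X]
```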

2.4.4. Aggregated Score

Evaluation of the cervical cell classification task is done using 5-fold cross validation on the Herlev dataset for both 2-class and 7-class problems. The performance metrics used for evaluation include accuracy, F1 score, area under the curve, specificity, and sensitivity. Finally, the count of correct classification score is obtained for each cell from all the categories in the Herlev dataset.
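The 5-fold cross-validation protocol can be sketched as an index split. The shuffling seed and the interleaved fold assignment below are illustrative choices; the text does not specify how the folds were drawn, and this sketch does not stratify by class.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# The Herlev dataset contains 917 cell images.
splits = list(k_fold_indices(917, k=5))
fold_sizes = [len(test) for _, test in splits]
```

Each image appears in exactly one test fold, so every sample is used for testing once and for training four times.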

3. Experiments and Results

3.1. Experimental Protocol

In the training stage, the conv and pool layers of AlexNet, i.e., ConvNet 1, as shown in Figure 5, are used as the initial layers of ConvNet 2. The fully connected layers are initialized with random weights. In order to train ConvNet 2, a fixed-size patch is cropped randomly from each augmented image to make the training/test images compatible with the input nodes of the network. To achieve zero-center normalization, a mean image over the dataset is subtracted. Stochastic gradient descent (SGD) is used to train ConvNet 2 for 30 epochs. Small batches of image patches are fed to ConvNet 2, and the validation accuracy of the batches is evaluated. The mini-batch size is set to 256. The initial learning rate for the convolutional and pooling layers is set to 0.0001 and is decreased by a factor of 10 after every 10 epochs. L2 regularization and momentum can be tuned to reduce overfitting and speed up the learning process of ConvNet 2 [25]. L2 regularization (weight decay) and momentum are empirically set to 0.0005 and 0.9, respectively. Finally, the network is trained using a randomly selected subset of epochs and validated for its accuracy. The model with the minimum validation error is used for classification.
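The training schedule above (initial learning rate 0.0001, divided by 10 every 10 epochs, with momentum 0.9 and weight decay 0.0005) can be written out as a small sketch. The update rule shown is the common SGD-with-momentum form with L2 weight decay added to the gradient; treating it as the exact rule used in the experiments is an assumption, as are the function names and zero-based epoch numbering.

```python
def learning_rate(epoch, initial=1e-4, drop_factor=10.0, drop_every=10):
    """Step schedule: divide the rate by drop_factor every drop_every epochs."""
    return initial / (drop_factor ** (epoch // drop_every))


def sgd_momentum_step(weight, grad, velocity, lr,
                      momentum=0.9, weight_decay=5e-4):
    """One SGD update for a single scalar weight.

    Weight decay enters as an extra gradient term (L2 regularization).
    Returns the updated weight and velocity.
    """
    velocity = momentum * velocity - lr * (grad + weight_decay * weight)
    return weight + velocity, velocity


# Learning rate used in each of the 30 epochs (epochs numbered from 0).
schedule = [learning_rate(epoch) for epoch in range(30)]

# A single illustrative update on a made-up weight and gradient.
new_weight, new_velocity = sgd_momentum_step(
    weight=1.0, grad=0.5, velocity=0.0, lr=learning_rate(0))
```

With this schedule the rate is 0.0001 for epochs 0 to 9, 0.00001 for epochs 10 to 19, and 0.000001 for epochs 20 to 29.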

In order to test the system against an unseen image, multiple cropped patches of test images, each having single cell, are generated from the original images containing multiple cells. Abnormal score of each crop is generated by the classifier. The abnormal scores of all patches of the test image are aggregated to generate the final score [47]. Patches of test image are generated same as for training images. Furthermore, ten cropped images (four corner, center of cell, i.e., nucleus portion, and their mirrored images) are generated from each of test patch. These image patches are input to the classifier. The prediction score of the classifiers (SR, SVM, and GEDT) is then aggregated to calculate the final score, as shown in Figure 7.
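The multiple crop testing scheme (four corner crops, one center crop, and their mirrored versions, followed by score aggregation) can be sketched as follows. Averaging is used as the aggregation rule here as an assumption, since the text only states that the scores are aggregated; `score_fn` stands in for any of the trained classifiers, and the image is a plain 2-D list.

```python
def crop(image, top, left, size):
    """Cut a size x size window whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def mirror(image):
    """Horizontal mirror of a 2-D image."""
    return [row[::-1] for row in image]


def ten_crops(image, size):
    """Four corner crops, the center crop, and the mirror of each."""
    h, w = len(image), len(image[0])
    offsets = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
               ((h - size) // 2, (w - size) // 2)]
    crops = [crop(image, t, l, size) for t, l in offsets]
    return crops + [mirror(c) for c in crops]


def aggregate_score(image, size, score_fn):
    """Average the classifier score over all ten crops (assumed rule)."""
    scores = [score_fn(c) for c in ten_crops(image, size)]
    return sum(scores) / len(scores)


# Toy 4 x 4 image whose pixel value encodes its position (row * 4 + col).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
crops = ten_crops(image, size=2)
```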

3.2. Experimental Results and Evaluation
3.2.1. ConvNet 2 Learning Results

ConvNet 2 is fine-tuned on the Herlev dataset for 2-class and 7-class problems using 30 epochs. It is observed that, after 10 epochs, the validation accuracy reaches its maximum value, i.e., 0.9935 for the 2-class problem and 0.8745 for the 7-class problem. Figure 8 illustrates a fine-tuning process of ConvNet 2 during 30 training epochs.

These results are improved by considering various classifiers. GEDT provides better performance with reference to both classes because it exploits the randomness of data more efficiently as compared to other classifiers. Table 3 shows the comparison of SR, SVM, and GEDT.

The layers of the network are also inspected after passing a test image through the fine-tuned network. Features learned at the first convolutional layer are more generic, as shown in Figure 9.

It can be seen that these learned filters contain gradients of different frequencies, blobs, and orientations of colors. As we go deeper into the convolutional layers, the features become more prominent and provide more information. Figure 10 shows the feature learning results in a deeper convolutional layer for a test cervical cell image.

The strongest activation is also shown in Figure 11 at the pool layer. The white pixels show strong positive activation, while black pixels provide strong negative activation and gray does not activate strongly. It is also observed that the strongest activation initiates negatively on right edges and positively on left edges.

The feature set at fully connected layer is also explored and it is observed that features are more abstract as compared to the previous layers, as shown in Figure 12. Fully connected layer provides features learned for 7 classes.

3.2.2. ConvNet 2 Testing Results

It is common knowledge that a single evaluation metric is not appropriate for evaluating the performance of a given algorithm, due to the presence of imbalanced classes in the dataset or a large number of training labels [48]. Therefore, the performance of the deep model is reported in terms of four distinct metrics, namely accuracy, sensitivity, specificity, and F1 score, as proposed in previous studies [49]. These performance parameters are calculated using the following equations:

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Sensitivity} = \frac{TP}{TP + FN},
\]

\[
\text{Specificity} = \frac{TN}{TN + FP}, \qquad
F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}},
\]

where the precision and recall are expressed as

\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}.
\]

In the above equations, true positive (TP) is defined as the number of malignant cell images classified as malignant, and true negative (TN) is the number of benign cell images classified as benign. False positive (FP) is the number of benign cell images identified as malignant cell images, and false negative (FN) is the number of malignant cell images classified as benign.
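These metrics follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the counts in the example are invented for illustration and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary metrics; malignant is the positive class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 100 malignant and 50 benign test images.
metrics = classification_metrics(tp=90, tn=40, fp=10, fn=10)
```

On an imbalanced test set such as this one, accuracy alone can look favorable while specificity is noticeably lower, which is why several metrics are reported together.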

For ConvNet 2, a test set of cervical images is evaluated using the multiple crop testing scheme with three classifiers, i.e., SR, SVM, and GEDT. GEDT again achieves higher classification accuracy than SR and SVM on the test set. The results are presented in Table 4.

By analyzing the class-wise accuracy, one can observe that the proposed method can predict the cervical cell images well. The classification accuracy of each of the seven cell categories is calculated by feeding all the images as test to the classifiers. It is observed that GEDT shows superior performance on class 1, class 2, class 4, and class 5 because of its ability to eliminate irrelevant features and to extract decision rules from decision trees efficiently. The performance slightly deteriorates for class 3 and class 6 because their features are very close to each other causing confusion. The classification accuracy of class 3 and class 6 is 97.50 % and 97.79 %, respectively. Classification accuracy for class 7 is 99.20 %. The average accuracy achieved by GEDT for underlying task is 99.21%. These results are illustrated in Figure 13.

The evaluation parameters of the classification performance, i.e., accuracy, F1 score, area under the curve, specificity, and sensitivity of the trained ConvNet 2, are displayed in Tables 5 and 6, where the performance of the proposed work is compared with [13, 39] and [50–56]. We have proposed two scenarios with different classifiers, i.e., SVM and GEDT. The mean values of accuracy, F1 score, area under the curve (AUC), sensitivity, and specificity of fine-tuned ConvNet 2 with GEDT classifier are 99.6%, 99.14%, 99.9%, 99.30%, and 99.35%, respectively, for the 2-class problem. These are 98.85%, 98.77%, 99.8%, 98.8%, and 99.74%, respectively, for the 7-class problem.

The accuracy of our system, i.e., ConvNet with GEDT is 99.6% for 2-class and 98.85% for 7-class compared to 99.5% and 91.2% for [39], respectively. This indicates that the prediction accuracy of our classification model is better than the existing models. Similarly, the sensitivity of 99.38% and 99.30% implies better performance of the proposed method compared to the existing methods in classifying the cervical cell images. Likewise, the values of specificity and accuracy of proposed system for the 2-class problem are better than previous methods in [15, 16, 34, 36, 38, 41, 56].

The images of cervical cells that are correctly classified or misclassified are also analyzed. Figure 14 shows the correctly classified malignant cell images; columns 1 to 4 are mild dysplasia, moderate dysplasia, severe dysplasia, and carcinoma, respectively.

Figure 15 shows the result for test cell images that are misclassified (normal misclassified as abnormal and abnormal misclassified as normal).

3.2.3. Computational Complexity

In the training phase, ConvNet 1 is trained on a Core i7 machine with a 2.8 GHz clock speed, an NVidia 1080Ti GPU, and 8 GB of memory, using MATLAB R2017b. The average training time for ConvNet 2, running for 30 epochs, is about 4 hours and 30 minutes for the 2-class problem and 8 hours and 20 minutes for the 7-class problem. In the testing phase, using multiple crop testing, i.e., classification of each crop followed by score aggregation, the system takes around 8 seconds on average to classify one test cell image into normal and abnormal classes.

The experimental results presented in this study suggest the following key observations:

(1) Compared with traditional feature extraction schemes, the proposed cervical cell screening system is more effective and robust, because ConvNets encode cervical cancer-specific features automatically. In traditional methods such as [11, 12, 14, 15], the cell feature extraction strategies are hand-crafted, which limits the success of the overall system. Moreover, large irregular shapes, appearance dissimilarities between malignant and benign cell nuclei, and cervical cell clusters are the major obstacles to accurately segmenting the cytoplasm and nucleus. In contrast, the proposed method extracts cervical cell features automatically to encode the cancerous cell representation and thus achieves superior classification accuracy across a range of cell severities.

(2) To prevent overfitting, a data augmentation technique suited to the underlying task of cervical cell grading has been proposed. The training and validation losses over 30 epochs were evaluated to analyze the impact of the proposed data augmentation on classification accuracy. Overfitting is greatly reduced when the data augmentation strategy is applied to train our classification model. The smaller gap between training and validation losses obtained with data augmentation is presented in Figure 16. It indicates how this approach helps the classification model learn the most discriminative features for the desired task. Furthermore, the proposed model works across a variety of cervical cells and preserves the discriminative information during training, while in the testing stage a cell image with an arbitrary level of severity can be readily classified into its true grading level. Hence, this suggests that our method is effective at keeping the classification model from overfitting and that its classification accuracy is robust to the varying nature of cervical cells.

(3) A multiple crop testing scheme is also used with three classifiers to calculate the accuracy for each individual class of cervical cell images. The class-wise accuracy displayed in Figure 13 shows that the clearer the cervical cells, the more robust the classification ability of our system. For example, the classification accuracy for class 1, class 2, class 4, and class 5 is the highest among all classes, i.e., 100%, because these types of cells can be identified more effectively by the underlying model. The classification accuracy for class 7 is 99.20%. Conversely, the classification performance degrades slightly for class 3 (97.50%) and class 6 (97.79%) because their features are very close to each other, owing to less cervical cell-specific discriminative information being presented to the model.

(4) In general, a single performance metric can give a misleading picture of classification performance when some classes in the dataset are imbalanced or the number of training samples is too small or too large. The existing methods on cervical cell images, such as [13, 39] and [50–56], report classification performance in terms of the accuracy metric only. In contrast, we have considered four distinct evaluation metrics: accuracy, sensitivity, specificity, and area under the curve. The experimental results displayed in Tables 5 and 6 show the consistent performance of our proposed models in cervical cell image classification across the different evaluation metrics. We have proposed two scenarios with different classifiers, i.e., SVM and GEDT. The proposed scheme outperforms all previous approaches, despite the fact that the training and test images also contained images of overlapping cells. This exceptional performance is mainly due to the following reasons: (a) during the training stage, transfer learning is used and the network is finally fine-tuned on the Herlev dataset; (b) the trained network is used only for extraction of deep features; and (c) the extracted features are then fed to more robust classifiers, SVM and GEDT, which perform the final classification. This suggests the effectiveness of our method for the underlying task in the presence of a wide variety of cervical cell images ranging from class 1 to class 7.

(5) The structure of the different layers of the fine-tuned network is also explored. The features learned in the initial layers are more general, and as we move deeper into the network, the extracted features tend to become more abstract. The features learned at the fully connected layers are displayed in Figures 9–12.

(6) Despite the high performance of the deep learning-based cervical cell screening system, it has some limitations. The classification time for a single cropped cell image is 8 seconds, which is very slow for a clinical setting because one Pap smear slide yields a large number of samples. This limitation can be addressed by skipping the data augmentation step for the test data and using only multiple crop testing for classification. This speeds up the system, which then requires only 0.08 seconds per classification, but the accuracy drops by 1.5%. Although the classification accuracy of the system on the Herlev dataset is high, there is room for further improvement.
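
The point made in observation (4), that accuracy alone can mislead on imbalanced data, can be illustrated with a minimal Python sketch. The function name `binary_metrics` and the toy labels below are hypothetical and made up for illustration; they are not taken from the Herlev experiments, and the sketch covers only three of the four metrics (area under the curve needs continuous scores and is omitted).

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels, 1 = abnormal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # abnormal, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # abnormal, missed
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


# Hypothetical imbalanced toy set: 90 abnormal cells and 10 normal cells.
y_true = [1] * 90 + [0] * 10
# A degenerate classifier that labels every cell as abnormal.
y_pred = [1] * 100

acc, sens, spec = binary_metrics(y_true, y_pred)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy case the accuracy looks high even though no normal cell is ever recognized, which only the specificity reveals. This is the kind of failure that reporting several metrics is meant to expose.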

5. Conclusions and Future Work

This paper proposes an automatic cervical cancer screening system using a convolutional neural network. Unlike previous methods, which are based on cytoplasm/nucleus segmentation and hand-crafted features, our method automatically extracts deep features embedded in the cell image patch for classification. The system requires cell images with a coarsely centered nucleus as the network input. Transfer learning is used for pretraining: initial weights or feature maps are transferred from a pretrained network to a new convolutional neural network, which is then fine-tuned on the cervical cell dataset. The features learned by the fine-tuned network are extracted and given as input to different classifiers, i.e., SR, SVM, and GEDT. The validation results for the 2-class and 7-class problems are analyzed. To test a single cell image, test image patches are generated in the same way as for the training data, and the multiple crop testing scheme is applied to all patches to obtain classifier scores, which are then aggregated into the final score. The proposed method yields the highest performance compared with previous state-of-the-art approaches in terms of classification accuracy, sensitivity, specificity, F1 score, and area under the curve on the Herlev Pap smear dataset. It is anticipated that a segmentation-free, highly accurate cervical cell classification system of this type is a promising approach for the development of auto-assisted cervical cancer screening systems.
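
The aggregation of per-patch classifier scores in multiple crop testing can be sketched as follows. The aggregation rule is not specified in this section, so the sketch assumes simple averaging of per-class scores followed by an arg-max; the function name `aggregate_scores` and the example scores are hypothetical.

```python
def aggregate_scores(patch_scores):
    """Average per-class scores over patches; return (mean_scores, predicted_class).

    patch_scores: list of per-patch score lists, one score per class.
    Assumption: the final score is the plain mean over patches.
    """
    n_patches = len(patch_scores)
    n_classes = len(patch_scores[0])
    mean_scores = [
        sum(patch[c] for patch in patch_scores) / n_patches
        for c in range(n_classes)
    ]
    # The predicted class is the index with the highest averaged score.
    predicted = max(range(n_classes), key=lambda c: mean_scores[c])
    return mean_scores, predicted


# Hypothetical scores for three crops of one cell image (2-class problem).
patch_scores = [
    [0.7, 0.3],
    [0.4, 0.6],
    [0.8, 0.2],
]
mean_scores, predicted = aggregate_scores(patch_scores)
print(mean_scores, predicted)
```

Averaging lets a single atypical crop (the second one here) be outvoted by the others. Other rules, such as majority voting over per-patch decisions, would be equally plausible readings of the text.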

In the future, the performance of the system on field-of-view images containing overlapping cells is to be analyzed. The system should avoid misclassifying overlapping objects, and specific classifiers relying on deep learning may be used to address these problems. Moreover, deep learning-based cervical cell classification still needs to be explored for high-precision diagnosis.

Data Availability

The data used to support the findings of this study will be made available on request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.