Segmentation of Overlapping Cervical Cells in Normal Pap Smear Images Using Distance-Metric and Morphological Operation

The automatic interpretation of Pap Smear images is challenging in several respects. Accurate segmentation of each cell is an important procedure that must be done so that no information is lost during the evaluation process. However, the presence of overlapping cells in Pap Smear images makes the automated analysis of these cytology images more difficult. In most studies, cytoplasm segmentation is the difficult stage because the boundaries between cells are very thin. In this study, we propose an algorithm that can segment the overlapping cytoplasm. First, the cytoplasm is segmented using morphological operations and global thresholding. Second, the overlapping area in the cytoplasm region is separated using morphological operations and a distance criterion on each pixel. The proposed method has been evaluated against the results of manual tracing by experts. The experimental results show that the proposed method can segment the overlapping cytoplasm similarly to the experts, i.e., 2.897 ± 3.632 (mean ± std) using Hausdorff distance.


I. INTRODUCTION
Pap Smear is one of the methods used to examine cervical cells. The cells are examined under a microscope to observe the alteration or acuteness of cervical epithelial cells as an initial sign of the presence of cancer. Cervical cancer is the fourth most common cancer in women. There were more than 530,000 new cases of cervical cancer in 2012 [1]. Cervical cancer arises from the cervix, the part of the female reproductive system located at the upper part of the vagina and the lower portion of the uterus. Nowadays, the Pap Smear test for examining cervical cells is used extensively in developing countries. It can significantly decrease mortality rates caused by cervical cancer. However, Pap Smear images are difficult to interpret. This is caused by clustered cells, overlapping cells, the presence of inflammatory cells, blood stains, low contrast, and variation in illumination that results from inconsistent staining, such as differing dye concentration. Figure 1(a) shows a Pap Smear image that contains inflammatory cells, whereas Fig. 1(b) shows a Pap Smear image that contains overlapping cells. These difficulties make interpretation time-consuming and require highly trained personnel to avoid high error rates. Consequently, in recent years, much research has focused on building systems that can interpret Pap Smear images automatically.
Received: Jan. 30, 2017; received in revised form: May 2, 2017; accepted: May 10, 2017; available online: May 11, 2017.
Research on cervical cancer in the field of computer vision is ongoing, which reflects the pressing need to combat this disease. In a previous study [2], clustering using Morphological Reconstruction successfully detected most of the nuclei in Pap Smear images. A similar study detected nuclei in Pap Smear images using Fuzzy C-Means Clustering (FCM) [3]. Other techniques that have been successfully used to segment nuclei in Pap Smear images are deformable templates [4], pixel classification schemes [5], and morphological operations with watershed transformation [6]. Almost all of these studies targeted only the detection, analysis, and segmentation of nuclei regions, assuming that the accuracy of Pap Smear results is strongly influenced by the general appearance of nuclei in the image. The nuclei show significant changes when the cell is affected by the disease. The detection of cytoplasm regions is also important because cytoplasm features have proved very useful for the identification of abnormal cells. The cytoplasm is the part of the cell that surrounds the nucleus and is enclosed by the cell membrane. The studies conducted by Yang-Mao et al. [7] and Li et al. [8] successfully segmented the nuclei and cytoplasm areas using the Gradient Vector Flow (GVF) snake. However, those studies used only a single cell as the input to be analyzed. Each preparation can contain up to 300,000 cervical cells [9], so an analytical approach based on a single cell cannot represent the actual case. A similar study that uses multiple cells was done in Ref. [10]. However, it did not overcome cytoplasm overlapping. Figure 2 shows the difference between a Pap Smear image containing a single cell and a Pap Smear image containing multiple cells and overlapping cells.
This study aims to assist the automatic identification of cervical epithelial cells in Pap Smear images through the segmentation of overlapping cervical cells. We focus on nuclei and cytoplasm segmentation in Pap Smear images that contain multiple cells. The two main steps proposed in this study to segment the overlapping cervical cells are nuclei segmentation and cytoplasm segmentation.

A. Nuclei Segmentation
The nuclei are segmented automatically using the technique we proposed in Ref. [12]. Nuclei are segmented to obtain the actual boundaries of nuclei. Figure 3 illustrates the steps for nuclei segmentation.
In the first step, because Pap Smear images tend to have low contrast, as shown in Fig. 4(a), preprocessing is needed to increase the contrast between nuclei and background. The technique used to increase the contrast is the h-minima transform operation [12]. The h-minima transform is applied to each color space layer, and a subtraction operation between two images is performed. Figure 4(b) shows the result of preprocessing. We can see that the contrast of the image in Fig. 4(b) becomes higher. The nuclei segmentation process becomes easier and more precise when it is performed on a high contrast image (Fig. 4(b)) than on a low contrast image (Fig. 4(a)).
In the second step, segmentation using morphological operations and global thresholding is performed to identify nucleus candidates. The global thresholding operation used in this step is based on the method proposed in Ref. [13]. Furthermore, the modified watershed transformation proposed in Ref. [14] is used to address overlapping nuclei. Figure 5(b) shows the result of the second step. All nucleus candidates that can be identified are marked by "x".
The last step is clustering using FCM to separate nuclei from non-nuclei objects, such as inflammatory cells or blood stains. Inflammatory cells, which exist in almost all Pap Smear images, can disturb the automatic analysis process. The features used to separate nuclei from non-nuclei objects are the minor axis length, the average radius, the equivalent diameter, the major axis length, the uniformity, the foreground-background contrast in red, the compactness, the circularity, and the eccentricity. These nine features are used to identify nuclei based on Ref. [12].
In that study, the nine features were tested on four different datasets and gave high accuracy for nuclei clustering. Figure 5(c) shows the result of the clustering process. Identified nuclei are marked by "+", whereas non-nuclei objects are marked by "•". Figure 5(d) shows the result of nuclei identification after eliminating non-nuclei objects.
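The clustering step can be illustrated with a minimal Fuzzy C-Means sketch in plain Python. It is not the implementation of Ref. [12]: the fuzzifier m = 2, the fixed iteration count, the caller-supplied initial centers, and the function name `fuzzy_c_means` are all assumptions made for illustration, and the feature vectors would be the nine shape and contrast features listed above.

```python
def fuzzy_c_means(points, init_centers, m=2.0, iterations=100):
    """Fuzzy C-Means on feature vectors given as lists of floats.

    points       : list of feature vectors (one per nucleus candidate)
    init_centers : initial cluster centers (assumed to be supplied by the caller)
    m            : fuzzifier > 1 (m = 2 is assumed here, not taken from the paper)
    Returns (centers, memberships); memberships[i][k] is the degree to which
    point i belongs to cluster k, and each row sums to 1.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def memberships_for(centers):
        exponent = 1.0 / (m - 1.0)
        rows = []
        for p in points:
            # Clamp to avoid division by zero when a point sits on a center.
            d = [max(sq_dist(p, c), 1e-12) for c in centers]
            # u_k = 1 / sum_j (d_k / d_j)^(1/(m-1)), with squared distances.
            rows.append([1.0 / sum((dk / dj) ** exponent for dj in d) for dk in d])
        return rows

    centers = [list(c) for c in init_centers]
    dims = len(points[0])
    for _ in range(iterations):
        u = memberships_for(centers)
        for k in range(len(centers)):
            # Each center is the mean of all points weighted by u^m.
            w = [row[k] ** m for row in u]
            total = sum(w)
            centers[k] = [sum(wi * p[j] for wi, p in zip(w, points)) / total
                          for j in range(dims)]
    return centers, memberships_for(centers)
```

Each candidate can then be assigned to the cluster with the highest membership; deciding which cluster corresponds to nuclei is left to the caller in this sketch.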

B. Cytoplasm Segmentation
1) Cytoplasm Boundaries Detection:
Cytoplasm segmentation is more difficult than nuclei segmentation. Nuclei intensities are darker than cytoplasm intensities, so nuclei are easier to distinguish from the background. In contrast, cytoplasm tends to have an intensity similar to that of the background. Hence, the first step of cytoplasm segmentation is contrast enhancement using the h-minima transform on each color space layer in order to make the cytoplasm more homogeneous. The h-minima values at each color space layer (red, green, and blue) are calculated by Eq. (1) from µ, the average intensity, and I, the intensity value in the original image. Each enhanced color layer is then converted to grayscale using Eq. (2), where $h_r$, $h_g$, and $h_b$ are the h-minima values at each color space layer (red, green, and blue). After the grayscale image of the h-minima transform image is obtained, the morphological opening operation is applied to it:

$$I_{opened} = I_g \circ SE \qquad (3)$$

where $\circ$ denotes the opening operation [15], $I_g$ is the grayscale image of the h-minima transform image, and SE is a flat disk-shaped structuring element with a radius of 5 pixels. The h-minima transformation with h = 15 is then applied again to $I_{opened}$ in order to get more uniform color intensities in the cytoplasm regions.
The output image of this stage, called $H_i$, has more homogeneous color intensity in the cytoplasm regions.
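The grayscale opening used in this stage (an erosion followed by a dilation with the same flat structuring element) can be sketched in plain Python as follows. The disk here is a simple Euclidean disk and pixels outside the image are ignored; both are assumptions of this sketch, so results near the border and for small radii may differ from a library disk element. The helper names are hypothetical.

```python
def disk_offsets(radius):
    """Offsets (dy, dx) of a flat, Euclidean disk-shaped structuring element."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]

def _filter(image, offsets, reduce_fn):
    """Apply min (erosion) or max (dilation) over the element at every pixel."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Out-of-bounds neighbours are skipped (assumed border handling).
            out[y][x] = reduce_fn(image[y + dy][x + dx]
                                  for dy, dx in offsets
                                  if 0 <= y + dy < h and 0 <= x + dx < w)
    return out

def opening(image, radius):
    """Grayscale opening: erosion followed by dilation with the same disk."""
    offsets = disk_offsets(radius)
    eroded = _filter(image, offsets, min)
    return _filter(eroded, offsets, max)
```

Opening removes bright details smaller than the structuring element while leaving larger bright regions largely intact, which is why it helps make the cytoplasm regions more uniform.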
The cytoplasm regions in that image are extracted using Eq. (4), where $I_g$ is the grayscale image obtained using Eq. (2), and $H_i$ is the h-minima transform image obtained from the previous process. Then, a contrast enhancement filter is applied to the complement of the resulting image $C_n$ in order to enhance its color intensities.
Morphological dilation using a flat disk-shaped structuring element with a radius of 3 pixels is then applied:

$$I_{dilate} = I_{ce} \oplus SE \qquad (5)$$

where $\oplus$ denotes the dilation operation [15] and $I_{ce}$ denotes the contrast-enhanced image. Another grayscale image, $I_{mark}$, which will be used as the marker image, is obtained by Eq. (6) from G, the grayscale version of the original image, $I_{dilate}$, the image obtained from Eq. (5), and $I_g$, the image obtained from Eq. (2). After that, the binary image of $I_{mark}$ is obtained by thresholding each pixel of $I_{mark}$. The thresholding process is given in Alg. 1, where $D_I$ is the dimension size of the image and T is obtained from Eq. (7). After the marker image M is obtained, a morphological opening is applied to M:

$$BM = M \circ SE \qquad (8)$$

where SE is a flat disk-shaped structuring element with a radius of 15 pixels. The last process is noise reduction in the BM image by omitting objects that have an area of less than 1000 pixels. To ensure that no cytoplasm-related information is lost, the threshold value for eliminating noise was selected experimentally through observations by an anatomical pathologist. The final result of this part is shown in Fig. 6.
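The final noise-reduction step, which discards objects smaller than a given area, can be sketched as a connected-component filter. This is a minimal illustration in plain Python; the use of 8-connectivity and the function name `remove_small_objects` are assumptions, since the text only specifies the area threshold of 1000 pixels.

```python
from collections import deque

def remove_small_objects(mask, min_area):
    """Keep only connected foreground objects with at least min_area pixels.

    mask is a binary image (list of rows of 0/1); 8-connectivity is assumed.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first search collects one connected object.
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Objects below the area threshold are treated as noise and dropped.
            if len(component) >= min_area:
                for y, x in component:
                    out[y][x] = 1
    return out
```

With the setting described above, the call would be `remove_small_objects(BM, 1000)`, where `BM` stands for the opened marker image as a binary list of rows.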

2) Segmentation of Overlapping Cytoplasm:
The segmentation of overlapping cytoplasm is important because accurate cytoplasm segmentation prevents the loss of important information for the subsequent feature extraction process. Several works have proposed separating the overlapping cytoplasm using color intensity [16], because the overlapping cytoplasm has lower intensities than the non-overlapping regions. Figure 7(a) shows such an overlapping cytoplasm region. However, there are overlapping cytoplasm regions whose intensities do not differ from those of the non-overlapping regions, as shown in Fig. 7(b). This shows that a cytoplasm segmentation method that relies only on color intensity can fail when there is no color intensity difference within the cytoplasm regions.
The proposed method for segmenting the overlapping cytoplasm is given in Algorithm 2.
First, we label the output image of the cytoplasm boundaries detection stage. Then, for each labeled cytoplasm region, we count the number of nuclei contained in it. If a cytoplasm region contains more than one nucleus, the distance between each pixel coordinate in the cytoplasm region and the centroid coordinates of each nucleus is calculated to determine the nearest centroid for each pixel. Each pixel in the cytoplasm region is then classified according to its nearest nucleus centroid. The resulting image, called $D_c$, is the cytoplasm region classified pixel by pixel. Next, morphological dilation is applied repeatedly to the segmented nuclei, using a structuring element shaped like the nucleus itself, until the area of the dilated result exceeds the area of the segmented cytoplasm (this image is called $M_d$). Finally, $D_c$ and $M_d$ are merged using the 'OR' operator. Figure 8 shows the estimated segmentation results using the proposed method. The result indicates that the proposed method can segment the overlapping cytoplasm.
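The distance-based classification step can be sketched as a nearest-centroid assignment. This is a minimal illustration, assuming the Euclidean distance (the text does not name the metric explicitly) and a hypothetical function name; it covers only the construction of the pixel classification, not the repeated dilation or the final 'OR' merge.

```python
def split_by_nearest_nucleus(cytoplasm_pixels, nucleus_centroids):
    """Assign each cytoplasm pixel to the index of its nearest nucleus centroid.

    cytoplasm_pixels  : iterable of (row, col) coordinates in one labeled region
    nucleus_centroids : list of (row, col) centroids of the nuclei in that region
    Returns a dict mapping each pixel to a centroid index. Euclidean distance is
    assumed; ties go to the centroid listed first.
    """
    labels = {}
    for (y, x) in cytoplasm_pixels:
        # Squared distance preserves the ordering, so no square root is needed.
        labels[(y, x)] = min(
            range(len(nucleus_centroids)),
            key=lambda k: (y - nucleus_centroids[k][0]) ** 2
                          + (x - nucleus_centroids[k][1]) ** 2,
        )
    return labels
```

Because every pixel is compared with every centroid in the region, the same routine handles regions with two, three, or more overlapping cells without any change.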

A. Data
Images used in this study are obtained from a collection of 20 Pap Smear photographs at CITO Laboratory, taken using a NIKON D100 microscope, with two types of preparation: 16

B. Execution Time
The proposed method requires an execution time of 12.37 ± 3.95 s. Table I shows the execution time of the two main steps of the proposed method, namely nuclei segmentation and cytoplasm segmentation. The execution time is reported to assess the efficiency of the proposed method. The methods were implemented in MATLAB on a machine with a 2.66 GHz Pentium processor and 4 GB of RAM.

C. The t-test
The t-test is used to identify and measure the difference between the area generated by the proposed method and the area generated by manual tracing by experts. The test is applied to both the nuclei area and the cytoplasm area. The test used in this study is the paired t-test, which directly compares two paired sets of data. The significance level α is 5%; in other words, the confidence level of the test is 95%. Table II shows the t-test result for the nuclei area. From Table II, the p-value of the t-test is 0.9070, which is greater than α = 0.05. It can be concluded that there is no significant difference between the manual tracing by experts and the proposed method.
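The statistic behind the paired t-test can be sketched as follows. The function name and the sample areas in the usage note are hypothetical, and the sketch stops at the t statistic and degrees of freedom; obtaining the p-value requires the Student t distribution, for which a statistics package would normally be used.

```python
import math

def paired_t_statistic(method_areas, manual_areas):
    """Return (t, degrees_of_freedom) for a paired t-test.

    Both inputs are equal-length lists of areas measured on the same objects.
    The statistic is mean(d) / (std(d) / sqrt(n)) with d the pairwise
    differences; it is undefined when all differences are identical.
    """
    diffs = [a - b for a, b in zip(method_areas, manual_areas)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t = mean / math.sqrt(variance / n)
    return t, n - 1
```

For instance, `paired_t_statistic([10, 12, 9, 11], [9, 10, 9, 10])` gives the statistic for four hypothetical paired areas with 3 degrees of freedom. A statistic close to zero, and hence a large p-value such as the 0.9070 reported above, indicates that the paired measurements do not differ significantly.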

D. Hausdorff Distance Test
The Hausdorff distance is a measure of the distance between two sets of points. It is used here to measure the level of similarity between the segmentation obtained by the proposed method and the segmentation traced manually by experts. The Hausdorff distance is defined by [17]:

$$H(A, B) = \max\big(h(A, B),\, h(B, A)\big) \qquad (9)$$

$$h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert \qquad (10)$$

where A is a set of points (x, y), and B is a set of points (x, y) of the segmented image produced by the proposed method. The average Hausdorff distance obtained for the nuclei area is 2.897 ± 3.632 (mean ± std), whereas the average Hausdorff distance obtained for the cytoplasm area is 8.278 ± 6.239 (mean ± std). Equations (9) and (10) indicate that the smaller the Hausdorff distance is, the higher the degree of similarity between two objects. In comparison, the Hausdorff distance in the cytoplasm area reported in Ref. [16] is smaller than that of the proposed method, namely 7.69 ± 1.89. However, the method proposed in Ref. [16] can separate only two overlapping cells, whereas our proposed method can separate more than two overlapping cells. Moreover, in the study of Ref. [16], there are parameters that must be set manually depending on the dataset used, whereas our proposed method runs automatically without setting specific parameters.
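The symmetric Hausdorff distance in its standard form can be sketched in a few lines of Python. This is a brute-force illustration that compares every pair of points and assumes the Euclidean norm; the function names are hypothetical, and for large contours a spatial index or a library routine would be preferable.

```python
import math

def directed_hausdorff(a, b):
    """h(A, B): the largest distance from a point in A to its nearest point in B."""
    return max(min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in b) for p in a)

def hausdorff(a, b):
    """H(A, B): the larger of the two directed distances, so it is symmetric."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

For the point sets `[(0, 0), (1, 0)]` and `[(0, 0), (4, 0)]`, the directed distances are 1 and 3, so the Hausdorff distance is 3: a single outlying boundary point is enough to raise the value, which is why the measure is sensitive to local segmentation errors.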

E. Comparison to Another Method
To assess its reliability, the proposed segmentation method is compared with two commonly used methods: Watershed Transformation [18] and Voronoi Diagram [19]. Figure 9 shows the comparison of cell segmentation results using Voronoi Diagrams, Watershed Transformation, and the proposed method. Voronoi Diagrams can identify the cytoplasm region very quickly, but the result is less precise, as shown in Fig. 9(a). In this case, Voronoi Diagrams cannot identify the overlapping area. Moreover, wrong identifications occur, as indicated by arrows in Fig. 9(a), where one cytoplasm is identified as two cytoplasms. This can occur because the Voronoi diagram uses a linear approach when dividing space. Meanwhile, in this case, Watershed Transformation does not segment the cytoplasm precisely, as shown in Fig. 9(b), and it also has variables that must be set manually. In contrast, Fig. 9(c) shows that the proposed method can segment both the cytoplasm and the overlapping cytoplasm precisely.
The proposed method has an advantage over the other studies in Refs. [16,20]. Those previous studies can only be applied to images that have only two overlapping cells.

IV. CONCLUSION
This study proposes a method based on a distance metric and morphological operations to perform overlapping cytoplasm segmentation. The results show that the proposed method can segment the overlapping cytoplasm and obtain cytoplasm boundaries automatically and precisely. In addition, the proposed method is able to separate the boundaries of the cytoplasm without depending on the color intensity of the cytoplasm. This is considered important because overlapping cytoplasm regions often have a homogeneous intensity, or their boundaries cannot be identified by the naked eye. The results of this study can be used in other research to improve the accuracy of detection of nuclei in Pap Smear images.
Cite this article as: R. Kurniawan, I. Muhimmah, A. Kurniawardhani, and Indrayanti, "Segmentation of Overlapping Cervical Cells in Normal Pap Smear Images Using Distance-Metric and Morphological Operation", CommIT (Communication & Information Technology) Journal 11(1), 25-31, 2017.