Cellular Image Segmentation using Morphological Operators and Extraction of Features for Quantitative Measurement

To address the blurriness, artifacts, overlapping cells, and uneven staining found in histopathology images of breast cancer cells, this paper proposes a computer-assisted image analysis and feature extraction method comprising pre-processing, enhancement, segmentation, and feature extraction. The proposed method is based on dysplastic features and computes features that differentiate benign from malignant cells. Morphological measures are used extensively to analyze these features. Morphological operators were chosen because they principally exploit the regularity and distribution of the structural features of cells. Analysis of cell morphology is an important factor that aids the complete evaluation of microscopic cells and the examination of cell behaviour. It also provides quantitative measures such as area, perimeter, intensity, and texture for large populations of cells. For the implementation of the proposed method, a publicly available data set of 58 images (26 malignant and 32 benign) has been used. It is observed that malignant cells have considerably greater magnitudes for the computed features than benign cells. Significant variation in feature values is also found in the case of malignant cells. In addition, an efficient approach to segmenting the cells present in histopathology images is shown, which will assist the pathologist in identifying malignant cells. The results reported here can be further used for the classification of cells into benign and malignant categories.

Breast cancer is one of the most common cancers in women worldwide. In 2015, an estimated 60,290 new cases of breast carcinoma in situ were diagnosed, 83% of which were ductal carcinoma in situ (DCIS) and 12% lobular carcinoma in situ (LCIS) [1][2]. Cancer begins when genes in a cell become abnormal and the cell starts to grow and divide out of control. Cancerous cells replicate much faster than normal healthy cells. They divide and multiply to form a tumor that may be benign (non-cancerous) or malignant (cancerous) [3].
Histopathological studies are still the most reliable and effective technique in cancer research. Until now, histopathology images have been analyzed manually: the pathologist observes dysplastic appearances such as minute structures, distribution, the presence of tubules and nuclei, and the regularity of cell shape and size across the tissue to decide whether it is benign or malignant. Distortion in the shape of cells and changes in the density of cell clusters are the signatures of malignancy in body tissue [4][5][6]. Pathologists face several problems while observing histopathological images because of overlapping, blurriness, artifacts, weak boundaries, and uneven staining. Moreover, the process is very time consuming and tedious, and it depends on the perception and level of expertise of the pathologist. For cancer detection, morphological feature extraction is the main tool for analyzing the cellular organization, abnormality, and changes in the physiological state of the cells [7-9]. Analysis of cells based on their morphological differences was applied to study the differentiation of benign and malignant cells. A computer-aided diagnosis system is proposed in this paper as a qualitative and quantitative tool for analysis and classification [10][11].
Several studies have analyzed histopathology images, relating image analysis of cell morphology to malignancy detection. A. Madabhushi observed the challenges in digital imaging that led to improvements in image analysis techniques, resulting in improved opportunities for the pathologist in the treatment of benign tissues [12]. A. D. Belsare et al. worked on tissue structure and presented the cell distribution in a tissue. They described irregularities in the shapes of cells to determine the level of malignancy or benignity in histopathology images [13]. Bhattacharjee et al. presented a review of computer-aided diagnosis systems that detect cancer from histopathology images using image processing methods [14]. Demir et al. presented automatic diagnosis of biopsy images at both the tissue level and the cellular level using image processing, feature extraction, and classification techniques [15]. S. Petushi obtained the intensities of the registered pixels and calculated the mean of the neighboring pixels [16]. Bergmeir et al. proposed a model to extract various texture features (contrast, correlation, energy, homogeneity, gray level, and HSV) using local histograms and the GLCM [17].
The aim of the present work is to investigate a robust and accurate image analysis algorithm for detecting cancer cells using morphological and texture (GLCM) features extracted from segmented histopathology images. In this work, diverse image processing techniques have been analyzed on histopathological images of breast cancer. Classification of benign and malignant cells has been done in three steps: pre-processing, segmentation, and feature extraction.
The organization of this paper is as follows. Section 2 discusses the methodology and the proposed algorithm. Section 3 describes the results and Section 4 presents the discussion. Finally, Section 5 draws the conclusions of the work presented in this paper.

Images collection
The histopathology breast cancer cell datasets used in the present work were taken from www.bioimage.ucsb.edu (Center for Bio-Image Informatics, University of California, Santa Barbara (UCSB)). A total of 58 microphotographs of breast cancer histopathology were used, of which 26 were malignant and 32 were benign. The histopathology images were cropped into single cells and groups of cells. A dataset of single cells consisting of 218 benign and 233 malignant cells and a dataset of groups of cells consisting of 72 benign and 73 malignant groups were formed. Thirty structure-, intensity-, and texture-based features were used to distinguish between benign and malignant cells. The images acquired from the UCSB breast cancer histopathology datasets were already stained to visualize cellular structures of the tissue such as cells, nuclei, and cytoplasm. Certain special stains bind selectively to particular components. The nuclei were stained blue with hematoxylin, while the cytoplasm and extracellular components appeared pink due to eosin staining.

Experimental set up
Experiments were implemented in MATLAB on a 3.40 GHz CPU with 4 GB RAM running the 64-bit Windows 7 operating system. Figure 1 shows the flow chart of the proposed system and the basic steps involved in the cell morphological analysis.

Image pre-processing using median filtering
The main purpose of the pre-processing stage was to reduce background noise and enhance the image to improve its quality. In this paper, median filtering was used to pre-process the images and eliminate graininess. In median filtering, every output pixel takes the median value of the 5-by-5 neighborhood around the corresponding pixel in the input image. The image was padded with zeros on the edges, so the median values for points within 3 pixels of the edges may appear distorted. After that, the contrast between the cytoplasm, nucleus, and extracellular components was enhanced using unsharp masking. The filter was applied by subtracting a Gaussian-filtered copy of the image, multiplied by a scaling factor, from the input image. A rotationally symmetric Gaussian low-pass filter with a standard deviation of 50 pixels and a total filter size of 15-by-15 pixels was used. The scaling factor was 0.35.
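The 5-by-5 median step with zero padding can be sketched as follows. This is a minimal pure-Python illustration, not the authors' MATLAB code; the function name `median_filter` and the toy 7-by-7 image are assumptions made for the example, and the unsharp masking step is not covered.

```python
from statistics import median

def median_filter(img, size=5):
    """Median filter over a size-by-size window, padding the image with zeros."""
    h, w = len(img), len(img[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < h and 0 <= xx < w
                    # Pixels outside the image count as 0 (zero padding).
                    window.append(img[yy][xx] if inside else 0)
            out[y][x] = median(window)
    return out

# Hypothetical test image: flat background with one noisy spike in the middle.
img = [[10] * 7 for _ in range(7)]
img[3][3] = 255
out = median_filter(img)
```

In this toy case the spike at the center is replaced by the background value, while the corner pixel drops to 0 because most of its window lies in the zero padding. This is the edge distortion mentioned in the text.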

Segmentation
Segmentation is the process of dividing an image into different regions on the basis of some similarity. The basic purpose of segmentation was to extract the important features of the image, from which information can easily be perceived. The morphological appearance of structures, such as size, shape, and color intensity, is an important factor in identifying cancer cells. To analyze these indicators, the images must first be segmented. In this paper, band thresholding was used to group the pixels lying in the cellular region. Basic morphological operations such as hole filling, opening, closing, dilation, and erosion were then applied to plot the boundaries of the cells [18-21]. This procedure lets the user see the individually outlined cells. An arbitrarily sized structuring element was used for dilation, and a disk-shaped structuring element was used for erosion. The region of interest (ROI) of the segmented cells was then considered for feature computation. The un-weighted centroid and the weighted centroid are marked in blue and red, respectively [22-25]. Both single cells and groups of cells were taken into account for segmentation and analysis. Figure 2 shows the results obtained by implementing the steps discussed above.
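Band thresholding followed by a morphological opening (erosion, then dilation) can be sketched as below. This is an illustrative pure-Python version under stated assumptions: the band limits (100 to 200), the radius-1 disk, the helper names, and the toy image are all hypothetical, and pixels outside the image are treated as background. The paper does not state the thresholds or structuring element sizes it used.

```python
def band_threshold(img, low, high):
    """Binary mask of pixels whose gray level lies inside the band [low, high]."""
    return [[1 if low <= v <= high else 0 for v in row] for row in img]

def disk(radius):
    """Offsets (dy, dx) of a disk-shaped structuring element."""
    span = range(-radius, radius + 1)
    return [(dy, dx) for dy in span for dx in span
            if dy * dy + dx * dx <= radius * radius]

def _probe(mask, offsets, combine):
    """Apply `combine` (all or any) over the structuring element at each pixel."""
    h, w = len(mask), len(mask[0])

    def at(y, x):
        # Outside the image is treated as background (assumption).
        return mask[y][x] if 0 <= y < h and 0 <= x < w else 0

    return [[1 if combine(at(y + dy, x + dx) for dy, dx in offsets) else 0
             for x in range(w)] for y in range(h)]

def erode(mask, offsets):
    return _probe(mask, offsets, all)

def dilate(mask, offsets):
    # The disk is symmetric, so no reflection of the element is needed.
    return _probe(mask, offsets, any)

def opening(mask, offsets):
    return dilate(erode(mask, offsets), offsets)

# Hypothetical image: a 5x5 "cell" at gray level 150 on a dark background,
# one isolated speck inside the band and one pixel brighter than the band.
img = [[20] * 7 for _ in range(7)]
for y in range(1, 6):
    for x in range(1, 6):
        img[y][x] = 150
img[0][6] = 150
img[6][6] = 250

mask = band_threshold(img, 100, 200)
opened = opening(mask, disk(1))
```

The opening removes the isolated speck while keeping the body of the cell, which is the usual reason for combining erosion and dilation after thresholding.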

Extraction of morphological features
The most significant part of this work is the computation of features. All features of a particular region of interest (ROI) are extracted to distinguish between cell types, namely benign and malignant. Structural, intensity, and texture features of single cells and groups of cells were computed from the segmented cell images, as shown in Table 1. In total, 30 features were computed for the cells present in the image.
The quantification of these features helps to differentiate malignant cells from benign cells. Moreover, the statistics computed on these properties are used to identify cancer in a tissue. The structure-based features used in this paper are area, convex area, perimeter, major axis length, minor axis length, circularity, eccentricity, and solidity, as explained in Table 3.
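A few of the structure-based features can be computed as in the sketch below. The area and solidity follow the definitions given in this paper. The circularity formula 4πA/P² is an assumption: it is a commonly used definition that equals 1 for a perfect circle, but the paper's own formula is not recoverable from the text. Two shape measures are shown for elongation: the plain axis ratio, and the ellipse eccentricity sqrt(1 - (b/a)²), which lies between 0 and 1. Which of these the authors actually computed is not certain. All function names are hypothetical.

```python
import math

def cell_area(mask):
    """Area as the number of non-zero pixels in the segmented region."""
    return sum(1 for row in mask for v in row if v)

def circularity(area, perimeter):
    """Assumed definition: 4*pi*A / P^2 (1.0 for a perfect circle)."""
    return 4 * math.pi * area / perimeter ** 2

def axis_ratio(major, minor):
    """Ratio of major to minor axis length (always >= 1)."""
    return major / minor

def ellipse_eccentricity(major, minor):
    """Eccentricity of the ellipse with the given axis lengths (0 = circle)."""
    return math.sqrt(1 - (minor / major) ** 2)

def solidity(area, convex_area):
    """Proportion of convex-hull pixels that also belong to the region."""
    return area / convex_area

# Hypothetical values: an ideal circle of radius 5 and an ellipse with axes 10 and 6.
r = 5.0
circle_score = circularity(math.pi * r * r, 2 * math.pi * r)
ellipse_score = ellipse_eccentricity(10.0, 6.0)
```

For the ideal circle the circularity evaluates to 1, and for the 10-by-6 ellipse the eccentricity is 0.8, so a more elongated cell gives a lower circularity and a higher eccentricity.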

Intensity features
Pixel-based features provide information about the intensity (gray-level or color) histogram of the pixels located in the cells. These features were extracted from the gray-level or color histogram of the image. They include the maximum intensity, minimum intensity, mean intensity, and standard deviation, as explained in Table 4. Features of this type do not provide any information about the spatial distribution of the pixels.
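The four intensity features can be computed from the gray levels inside a segmented region as in this minimal sketch. The function name, the toy image, and the use of the population standard deviation are assumptions; the paper does not say which normalization it used.

```python
from statistics import mean, pstdev

def intensity_features(gray, mask):
    """Max, min, mean and standard deviation of gray levels inside the mask."""
    pixels = [g for grow, mrow in zip(gray, mask)
              for g, m in zip(grow, mrow) if m]
    return {
        "max": max(pixels),
        "min": min(pixels),
        "mean": mean(pixels),
        "std": pstdev(pixels),  # population standard deviation (assumption)
    }

# Hypothetical 2x3 gray image; the last column is background and is masked out.
gray = [[10, 20, 99],
        [30, 40, 99]]
mask = [[1, 1, 0],
        [1, 1, 0]]
feats = intensity_features(gray, mask)
```

Because the features depend only on the set of gray levels, shuffling the pixels inside the mask leaves them unchanged, which illustrates why they carry no spatial information.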

Texture features
The texture features provide information about the variation in the intensity of a surface and quantify properties such as regularity, coarseness, and smoothness. A texture is a connected set of pixels that occurs repeatedly in an image. Texture analysis based on the gray level co-occurrence matrix (GLCM) is applied to the analysis of histopathological images. The GLCM is an estimate of image properties related to second-order statistics. It quantifies various textural features such as autocorrelation, contrast, correlation, cluster prominence, dissimilarity, energy, entropy, homogeneity, maximum probability, sum of squares, sum of average, sum of variance, sum of entropy, difference variance, difference entropy, information measure of correlation 2, inverse difference normalized (INN), and inverse difference moment normalized. Some of them are described in Table 5.

Structure features (Table 3)
F1. Cell area (A): The cell area is represented by the nucleus region and equals the total number of non-zero pixels in the region:
$A = \sum_{i=1}^{n} \sum_{j=1}^{m} B(i, j)$
where A is the cell area and B is the segmented image with n rows and m columns, indexed by i and j.
F2. Convex area: Scalar that specifies the number of pixels in the convex image.
F3. Cell perimeter (P): The cell perimeter is the distance between each adjoining pair of pixels around the border of the region.
F4. Major axis length: The length (in pixels) of the major axis of the ellipse that has the same normalized second central moments as the region:
$\text{Major axis length} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
where $(x_1, y_1)$ and $(x_2, y_2)$ are the end points of the major axis.
F5. Minor axis length: The length (in pixels) of the minor axis of the ellipse that has the same normalized second central moments as the region.
F6. Circularity: A dimensionless parameter calculated from the area and the perimeter.
F7. Eccentricity: The ratio of the major axis length to the minor axis length:
$\text{Eccentricity} = \dfrac{\text{Major axis length}}{\text{Minor axis length}}$
F8. Solidity: Scalar specifying the proportion of the pixels in the convex hull that are also in the region:
$\text{Solidity} = \dfrac{\text{Area}}{\text{Convex area}}$
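A gray level co-occurrence matrix and three of the texture features listed here can be sketched as follows. The choices below are assumptions made for illustration, since the paper does not state its GLCM settings: the offset is one pixel to the right, pairs are counted in one direction only, and contrast, energy, and homogeneity use commonly cited definitions. The function names and the 4-level toy image are hypothetical.

```python
def glcm(img, levels, dy=0, dx=1):
    """Normalized co-occurrence matrix for the pixel offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                counts[img[y][x]][img[yy][xx]] += 1
    total = sum(map(sum, counts))  # assumes at least one valid pixel pair
    return [[c / total for c in row] for row in counts]

def texture_features(p):
    """Contrast, energy and homogeneity of a normalized GLCM p."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }

# Hypothetical 4x4 image with gray levels 0..3.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
feats = texture_features(glcm(img, levels=4))

# A perfectly uniform image has no contrast and maximal energy and homogeneity.
flat = texture_features(glcm([[1, 1], [1, 1]], levels=2))
```

For the toy image the 12 horizontal pixel pairs give a contrast of 7/12, an energy of 1/6, and a homogeneity of 59/72. Homogeneity is highest when the co-occurrence counts concentrate on the GLCM diagonal.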

RESULTS
Features such as area, convex area, perimeter, major axis, minor axis, circularity, eccentricity, max intensity, mean intensity, solidity, autocorrelation, cluster prominence, sum of squares, sum of average, sum of variance, contrast, sum of entropy, and information measure of correlation 2 yielded significant differentiation between benign and malignant cells, both for single cells and for groups of cells. Groups of cells were analyzed in addition to single cells in order to produce more accurate results. The variations in the values of the various features used to differentiate benign from malignant cells, for single cells and groups of cells, are shown in Figures 3 to 6. These features show that malignant cells have a greater magnitude of shape-based features than benign cells, and that there is variation in the values of the other features. All the malignant cells, both single and grouped, showed increased size (area, convex area, perimeter, major axis, minor axis), elongated shape (circularity, eccentricity), and a greater magnitude of maximum and mean intensity. These size-, shape-, and intensity-based features were significant for differentiation in both single cells and groups of cells. Some features, such as standard deviation, minimum intensity, and correlation, show only an insignificant relation and a minor difference. These features are insignificant for differentiation in both single cells and groups of cells because they have almost the same values in the two classes, as presented in Table 6. Dissimilarity, energy, entropy, homogeneity, maximum probability, difference variance, difference entropy, inverse difference normalized (INN) and inverse difference moment
The results presented here are expressed as mean ± S.D. Statistical analysis was performed using GraphPad Prism software (version 5.1). Unpaired, two-tailed Student's t-tests were performed, and a p-value < 0.05 was considered significant.
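The unpaired Student's t-test used for the comparison can be illustrated with the sketch below, which computes the pooled-variance t statistic and its degrees of freedom for two groups of feature values. The authors used GraphPad Prism; this code is only an assumed equivalent of the statistic itself, the sample values are made up and are not data from the paper, and converting t to a two-tailed p-value would need a t-distribution routine from a statistics library.

```python
import math
from statistics import mean, variance

def student_t(a, b):
    """Pooled-variance (equal variance) t statistic for two independent samples."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled estimate of the common variance from the two sample variances.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df

# Made-up feature values for two groups (illustration only).
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t_stat, dof = student_t(group_a, group_b)
```

With these made-up samples the statistic is -1 with 8 degrees of freedom. The sign only reflects which group is listed first, and a two-tailed test uses the absolute value.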

DISCUSSION
In this work, morphological features of breast cancer cells have been computed for benign and malignant cells. The structure-based features of malignant cells, such as area, perimeter, major axis, and minor axis, show greater magnitude than those of benign cells, as shown in Figures 3 to 6. The main reason for this outcome is that benign cells grow and divide only when they receive signals from the surrounding cells and exhibit contact inhibition, whereas malignant cells do not exhibit contact inhibition, divide uncontrollably, and grow faster. Benign cells undergo ageing and senescence, and either repair their physiological and chromosomal abnormalities or undergo apoptosis, whereas malignant cells neither repair such abnormalities nor undergo apoptosis. Benign cells become specialized or mature so that they are able to carry out their function in the body, while malignant cells often reproduce very quickly and do not exhibit mature phenotypes.
Further, two shape-based features of the cell, circularity and eccentricity, have been taken into consideration. The shape factor circularity takes values between 0 and 1; when the value is 1, the object is a perfect circle. In our case, the circularity of benign cells is close to 1 (0.94), which shows that they are nearly circular in structure, whereas the circularity of malignant cells is lower (0.84), which shows that they are less circular. The eccentricity of an ellipse measures how squashed it is. If the eccentricity is 0, the ellipse is not squashed at all and remains a circle. If it is 1, the ellipse is completely squashed and looks like a line. The eccentricity of malignant cells is found to be about 0.7, which shows that they have an elongated structure, whereas for benign cells it is about 0.5, which indicates a more circular structure, as shown in Table 6. The main cause of these results is that the cells in malignant images generally have an elongated, distorted, or blebbed shape. Such cells become physiologically nonfunctional, but this type of shape helps malignant cells to exhibit random migration, i.e. metastasis. In normal tissues, the cells stay together and adhere to each other through specific microstructures that assist in governing cellular function.
Our findings are in agreement with the data reported by several authors. Kasmin et al. extracted features of microscopic biopsy images, including area, perimeter, convex area, solidity, major axis length, eccentricity, ratio of cell and nucleus area, circularity, and mean intensity of the cytoplasm [26]. Basavanhally et al. quantified the morphological features that characterize structure in a histopathological slide image, which leads to the discrimination of a cell into a particular class for the purpose of diagnosis [27]. Sinha et al. extracted features of histopathological images that include the area of cells, area ratio, eccentricity, compactness, average values of color components, energy, correlation, and entropy [28].
With regard to pixel-based features, the max intensity and mean intensity values were found to be higher and more diverse in malignant cells than in benign cells, where the max intensity values were almost identical to those of normal cells, as shown in Table 1. Some intensity-based features, such as standard deviation and minimum intensity, were found to be insignificant for both benign and malignant cells. A possible reason for this observation is the presence of a high amount of DNA (deoxyribonucleic acid) or an increased amount of nucleoprotein synthesis in the malignant cells. This results in a larger nucleolus and dark-staining nuclei, which is referred to as hyperchromatism. Thiran et al. and Zhao et al. also worked on the pixels of benign and malignant nuclei [29-30]. C. Demir et al. reported that an intensity-based approach is employed to calculate the intensity values of pixels to define the features in a histopathological slide image [14].
Texture-based features are also helpful in distinguishing benign and malignant cells [31-33]. Hamilton et al. worked on texture analysis to develop criteria for the automatic identification of focal areas of colorectal dysplasia against a background of histologically normal tissue [34]. Mouelhi et al. classified cancerous cells from histopathological images using Haralick's texture features, the histogram of oriented gradients (HOG), and color component based statistical moments (CCSM), together with feature selection and extraction approaches [35]. In our implementation, the computed texture features also depict the difference between benign and malignant cells (Figures 5 and 6).

CONCLUSION
In this paper, histopathological cellular images of suspected breast cancer have been analyzed using structure-, intensity-, and texture-based morphological features. The developed algorithm for the automated analysis and evaluation of histopathological images will assist pathologists and reduce human error. Such automated cancer diagnosis gives the pathologist a quantitative basis for judgment. Future work will include more features in the algorithm for efficient differentiation between benign and malignant cells, so that suitable classifiers may be designed.

Homogeneity (F20): Measures the closeness of the distribution of elements in the GLCM to the GLCM diagonal.

Figs. 3 to 5. Variations of values of various features for (a) single cells and (b) groups of cells for breast cancer

Fig. 6. Variations of values of various features for (a) single cells and (b) groups of cells. The following table presents the numerical values of the features of single cells and groups of cells

Table 2. Features of cells in benign and malignant cell boundaries, respectively. The standard deviation is then measured. After that, the image is converted to a gray-level image with one bounding box marked in yellow

Table 3. Structure features

Table 4. Intensity features

Table 5. Texture features

Table 6. Comparative parameters of single cells and groups of cells for benign and malignant cells in breast cancer images