Colour segmentation of multi-variant tuberculosis sputum images using a self-organizing map

In low- and middle-income countries, lung tuberculosis is still detected from Ziehl-Neelsen sputum smear images. Clinicians grade the disease by manually counting the tuberculosis bacilli. This is very tedious when there are many patients, and sputum staining is not standardized. Because of this lack of standardization, tuberculosis sputum images vary widely in colour, and the many colour variants are difficult to identify. To help clinicians, this research examined the Self Organizing Map (SOM) method for colour image segmentation of sputum images based on colour clustering. The method performed better than k-means clustering, which was also tested in this research. The SOM segmented the sputum images with good results and clustered the colours adaptively.


Introduction
Pulmonary tuberculosis is a deadly infectious disease occurring in many countries in Asia and Africa. In Indonesia, many people with tuberculosis are examined at community health centres. Tuberculosis is examined through sputum smears stained with the Ziehl-Neelsen method and viewed under a microscope. Ziehl-Neelsen staining colours the tuberculosis (TB) bacteria red and the image background blue. The examination first detects the presence of TB bacteria from their red colour, and then from the shape of the bacteria themselves.
Ziehl-Neelsen staining of sputum smears produces complex color images, so clinicians have difficulty examining the slides manually. To assist clinicians in reading sputum smear slides, this research applied a color segmentation method to digitized sputum images. The purpose was to extract the TB bacteria and eliminate the background. After Ziehl-Neelsen staining, the TB bacteria show different levels of red brightness, ranging from brown and dark red to pink. Correct color segmentation is necessary for accurate classification that distinguishes TB bacteria, non-TB objects, and background.
Examination of patients with pulmonary tuberculosis begins with an examination of sputum. Sputum examination is the main step, because it can identify the presence of TB bacteria so that the disease can be diagnosed as pulmonary tuberculosis. Indonesian community health centers commonly use light microscopy with Ziehl-Neelsen staining for sputum observation. This staining colours the TB bacteria red and the background of the sputum sample blue. Clinicians still examine the TB bacteria manually, counting the number of bacteria in each field of view of the microscope. To make the diagnosis of pulmonary TB easier and faster, this research aimed to diagnose pulmonary tuberculosis automatically. The first step was enhancement of the digital sputum images. Color segmentation was then applied to the sputum images to remove the background and leave the TB bacteria. Previous research on the segmentation of TB bacteria images includes [1], [2], and [3]. Research [1] performed color segmentation using adaptive color thresholding in RGB color space, research [2] segmented TB bacteria images using a pixel classifier method, and research [3] performed color segmentation of TB bacteria using adaptive color thresholding in HSV color space. In the researchers' opinion, color segmentation plays an important role in the success of TB sputum diagnosis. To improve on the accuracy of TB identification in [1], [2], and [3], this research performed color segmentation using a self-organizing map (SOM). This research first examined k-means clustering for color image segmentation, but it had shortcomings: choosing the best centroid values was difficult and required a long computation time. With the SOM method, the colors could be segmented faster and more accurately.
After the background is segmented, identification continues with shape identification. Research [4] identified TB bacteria in binary images using a neural network based on geometric features only, without using color as information for the classification process. Using SOM clustering to segment the colors of TB objects from the background could support more accurate identification of TB bacteria in sputum images.

Image acquisition
About 1000 images of Ziehl-Neelsen sputum smear slides, provided by the Bandung Health Department in West Java Province, Indonesia, were used. The sputum smear slides were captured with a web camera integrated with the ocular lens and mounted onto the ocular field, and the images were saved in JPEG format as 24-bit RGB. The total microscope magnification was 1000×. The image size was 640 × 360 pixels.

K-Means clustering
K-means clustering is a widely used method for grouping data. Its objective function is based on the proximity of the data points to the centroids of their clusters [8]. Let $X = \{x_1, \dots, x_n\}$ be the data and let $m_k = \frac{1}{n_k} \sum_{x \in C_k} x$ be the centroid of cluster $C_k$, $1 \le k \le K$, where $n_k$ is the number of data objects in cluster $C_k$ and $K$ is the number of clusters [3]. The objective function of K-means clustering is then formulated as the sum of squared errors [5]:

$$F = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - m_k \rVert^2$$

With $D(C_k) = \sum_{x_i \in C_k} \sum_{x_j \in C_k} \lVert x_i - x_j \rVert^2$, the sum of all pairwise distances of data objects within the $K$ clusters is [3]:

$$D = \sum_{k=1}^{K} D(C_k) = 2 \sum_{k=1}^{K} n_k \sum_{x \in C_k} \lVert x - m_k \rVert^2$$

Self Organizing Map
In a self-organizing map, each neuron in the input layer is connected to every neuron in the output layer. Thus, output neuron (cluster unit) $j$ is connected to each of the input neurons through weights, which are referred to as the $j$-th weight vector. In vector or matrix notation, the $j$-th weight vector is denoted by $w_j = (w_{1j}, w_{2j}, \dots, w_{Nj})^T$, where $N$ is the number of input neurons. It is given by the $j$-th column of the weight matrix. There is a total of $M$ such weight vectors, one for each cluster unit.
During the training process, each training vector is selected cyclically or randomly and presented to the network. Two vectors are considered closest if the squared Euclidean distance between them, $\lVert x - w_j \rVert^2$, is the smallest. The winning unit and its neighboring units (those located in its first and second neighborhoods) then update their weight vectors according to the Kohonen rule. This process continues until the weight vectors change by less than a preset amount. The squared Euclidean distance between the input vector $x$ and each weight vector $w_j$ is [6]:

$$D(j) = \lVert x - w_j \rVert^2 = \sum_{i=1}^{N} (x_i - w_{ij})^2$$

The index $j'$ is then found such that $D(j')$ is a minimum, and for all cluster units $j$ within the specified neighborhoods of $j'$, the weight vectors are updated by the Kohonen rule

$$w_j(\text{new}) = w_j(\text{old}) + \alpha \left[ x - w_j(\text{old}) \right],$$

where $\alpha$ is the learning rate.
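The training loop described above can be sketched in Python with NumPy. This is a minimal sketch, not the authors' implementation. It assumes the M cluster units lie on a one-dimensional line, so the neighborhood of the winner J is every unit with |j − J| ≤ radius. It also assumes the weights are initialized from randomly chosen training vectors, the learning rate decays geometrically, and the radius shrinks every 10 epochs. The function names `train_som` and `assign_clusters` and all default parameter values are hypothetical.

```python
import numpy as np


def train_som(X, M=4, alpha=0.5, radius=1, tol=1e-4, max_epochs=100, seed=0):
    """Train a 1-D Kohonen self-organizing map on the rows of X (n x N).

    Returns the weight matrix W of shape (N, M); column j is w_j.
    Assumption: the M cluster units lie on a line, and the neighborhood
    of the winning unit J is every unit j with |j - J| <= radius.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialize each weight vector with a randomly chosen training vector.
    W = X[rng.choice(len(X), size=M, replace=False)].T.copy()
    for epoch in range(max_epochs):
        W_old = W.copy()
        # Present the training vectors in random order.
        for idx in rng.permutation(len(X)):
            x = X[idx]
            # D(j) = sum_i (x_i - w_ij)^2 for every cluster unit j.
            D = ((W - x[:, None]) ** 2).sum(axis=0)
            J = int(np.argmin(D))  # index of the winning unit
            lo, hi = max(0, J - radius), min(M, J + radius + 1)
            # Kohonen rule for the winner and its neighbors (in place on W).
            W[:, lo:hi] += alpha * (x[:, None] - W[:, lo:hi])
        alpha *= 0.9  # assumed learning-rate decay
        if (epoch + 1) % 10 == 0 and radius > 0:
            radius -= 1  # assumed neighborhood-shrinking schedule
        # Stop when no weight changed by more than the preset amount.
        if np.abs(W - W_old).max() < tol:
            break
    return W


def assign_clusters(X, W):
    """Label each row of X with the index of its nearest weight vector."""
    X = np.asarray(X, dtype=float)
    D = ((X[:, :, None] - W[None, :, :]) ** 2).sum(axis=1)  # shape (n, M)
    return D.argmin(axis=1)
```

For color segmentation, `X` would hold the normalized RGB values of every pixel. A 640 × 360 image gives 230,400 rows of 3 values. `assign_clusters` then labels each pixel with its nearest weight vector, which yields one color cluster per unit.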

Result
Color segmentation was the first step of TB bacteria identification. Its purpose was to identify red objects as TB bacteria candidates. This research examined two clustering-based methods for color image segmentation: k-means clustering and SOM.
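The k-means side of this comparison can be sketched in NumPy, together with the selection of red candidate objects. This is a minimal sketch under stated assumptions, not the authors' implementation. Centroids are initialized from randomly chosen pixels, and the number of clusters k is a free parameter. The cluster to keep is picked with a hypothetical "redness" score, R − (G + B)/2, computed on each centroid color. `sse` computes the sum of squared errors F from the K-means section. All function names are illustrative.

```python
import numpy as np


def nearest_centroid(X, C):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    return d.argmin(axis=1)


def kmeans(X, k, max_iter=100, seed=0):
    """Lloyd's k-means on the rows of X; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initial centroids: k distinct data rows chosen at random (an assumption).
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iter):
        labels = nearest_centroid(X, C)
        # Move each centroid m_k to the mean of its members; keep empty clusters in place.
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return C, nearest_centroid(X, C)


def sse(X, C, labels):
    """Sum of squared errors F = sum_k sum_{x in C_k} ||x - m_k||^2."""
    return float(((X - C[labels]) ** 2).sum())


def red_candidate_mask(image, k=3, seed=0):
    """Boolean mask of the pixels in the 'reddest' color cluster of an H x W x 3 RGB image.

    The redness score R - (G + B) / 2 is a hypothetical criterion, not the paper's.
    """
    h, w, _ = image.shape
    X = image.reshape(-1, 3).astype(float) / 255.0  # one row per pixel, values in [0, 1]
    C, labels = kmeans(X, k, seed=seed)
    redness = C[:, 0] - 0.5 * (C[:, 1] + C[:, 2])
    return (labels == int(redness.argmax())).reshape(h, w)
```

The result depends on the random initial centroids. Running `kmeans` with several seeds and keeping the run with the lowest `sse` is one common way to reduce this sensitivity. The extra runs add to the computation time.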

Conclusion
The SOM technique segmented the highly variable Ziehl-Neelsen sputum smear images well, with an accuracy of 97.90%. The background was cleared sufficiently, leaving the TB bacilli candidate objects. This makes it easier to classify the red objects as TB bacilli or non-TB bacilli. The SOM technique could therefore support the classification step in achieving the best accuracy in pulmonary tuberculosis diagnosis. K-means clustering still has shortcomings in finding centroids. It relies on trial and error to choose the centroids, and it needs a long computation time to converge. The SOM method could be implemented in a real TB identification application in combination with a classification method.