K-MEANS CLUSTERING IN TEXTURED IMAGE: EXAMPLE OF LAMELLAR MICROSTRUCTURE IN TITANIUM ALLOYS

. This paper presents an implementation of the k-means clustering method, to segment cross sections of X-ray micro tomographic images of lamellar Titanium alloys. It proposes an approach for estimating the optimal number of clusters by analyzing the histogram of the local orientation map of the image and the choice of the cluster centroids used to initialize k-means. This is compared with the classical method considering random coordinates of the clusters.


Introduction
K-means clustering algorithm, is one of the most popular methods used in data mining, it aims at classifying a data set into predefined k clusters. Every cluster includes similar objects by comparing them with the other objects in different clusters [4]. There are many factors that affects on the k-means clustering results, one of them being the number of clusters. The selection of the number of clusters has a big effect on the final results and the speed of the algorithm. This letter will give a description of an automatic approach that determines the number of clusters by analysing the histogram of the orientation map of the image. The results are further compared with the results of the segmentation obtained by the using the directional filter bank of the Complementary of HourGlass (CHG) method [2,3].
The approach is tested on images of lamellar titanium alloys. The microstructure of this type of material shows groups of lamellar colonies (see Fig. 1 top-left). These groups represent parallel surfaces in different directions in 3D [3], while it corresponds to parallel lines in 2D (Fig. 1). These groups show a local orientation in 2D and 3D images. Texture segmentation based on the local orientations of the lamellar can be applied to extract the colonies. Note that this work is an alternative method to the ones that were already developed to estimate local orientations to segment lamellar colonies. One can mention for example the method based on the local estimation of orientation (Hessian matrix), that has been proposed by Jeulin and Moreaud [9] or the use of directional filter banks [2] to segment 3D texture images of lamellar microstructure.

Clustering algorithm
The main algorithm works in the 2D case as follows. First the local 2D gradient map considering an interrogation window W of size s 2 (in 2-dimensional space R 2 ) is used around each pixel (i,j) to estimate the local matrix of inertia Jij of input image I, as expressed below: Where denotes the averaging operation in the neighbourhood W. Following the method proposed in [1], each pixel in the 2D space is represented by a vector of length 3: that defines the vector elements of the matrix of inertia defined in a the following vector form: = a ij and so on. The diagonalization of J to get the eigenvalues and eigenvectors of the local matrix of inertia can be done sequentially (i.e. pixel by pixel). However, it is very computing time consuming. Instead, this can be done using Hadamart product (i.e. an element-wise multiplication) [7], as it has been presented in [1]. In that case, the maps of eigenvalues I 1 and I 2 are solutions of the set of quadratic equations given by (4) where denotes the Hadamart product. In the case of the studies image, the orientation map corresponds to the eigenvectors of the largest eigenvalue maps (i.e. arg max(I 1, I 2 )). Further, this orientation map is converted into a projection map representing the angle between the 2 vector components generated from the eigenvector component images. From this 2D projection map, a histogram can be easily calculated. This histogram is then used as an input for the k-means.
The basic algorithm of k-means is the following: 1. Determine the initial coordinates of the centroids. 2. Calculate the distance of each object from the centroids. 3. Group the objects based on minimum distance to the centroids. 4. Redo steps 1-3 until convergence.
K-means is one of the algorithms that require the number of clusters to be known before the clustering calculation is accomplished. Since we don't know the exact number of clusters the simplest solution consists in testing the response of the clustering steps for different number of clusters k (e.g. k = 4, 8, 12). There are a large number of strategies that propose solutions to determine the optimal number of clusters [5,8,10,11]. In our study we suggest a solution to determine the number of clusters by analysing the histogram of the local orientation map of the image.
In the present case, local maxima in the histogram are automatically detected and their number defines the number of clusters to be searched in the dataset using k-means. Alternatively, the position of the local maxima can serve for the identification of the cluster centroids at initialisation step (step 1 above), assuming that the number of iteration until convergence will be decreased. In the following, the tested data are 2D images of size 500500 pixels directly taken from a 3D X-ray tomography image lamellar titanium microstructure.

Results
By implementing the k-means clustering on the input 2D image, one can notice the strong effect of the number of clusters k (k = 4, 8, 12) on the segmentation (see Fig. 1). Our mission is to determine this number automatically.
There is a large number of strategies to determine the optimal number of clusters [5,8,10,11], all quite complex. In the present case, the histogram method mentioned above is used. Typical histograms corresponding to image1 in Fig. 3 for the averaging kernel of different half-sizes ksize used for the local mean gradient estimations are presented in Fig. 2, showing the presence of 4 peaks with different degrees of smoothing. The kernel of half-size 5 is used for the segmentation. Fig. 3 shows four images obtained from different orthoslices of the original 3D tomography image, together with their respective orientation map and k-means clustering and the segmentation results after using CHG filter.
The k-means clustering obtained after automatic analysis of the histogram of each orientation map, taking into consideration the number of the peaks, which refer to the clusters number. This reveals that the optimal number of clusters is k = 4 for image1, k = 5 for image2 and image4, and k = 6 for image3. The segmentation of the colonies reflects relatively well the reality, especially for image1 and image4 in Fig. 3. Local gradient fluctuations, which are not filtered prior to local orientation calculation, are both present in the orientation and segmentation maps inside colonies, creating the impression of noise in the orientation detection and corresponding segmentation. However, additional smoothing steps considered prior to local orientation estimation or after k-means clustering should remove most of these local fluctuations.
The results of the segmentation based on k-means clustering have also been compared with those obtained using the directional filter bank of the Complementary of HourGlass (CHG) filter presented in [2,3]. Initially thought for the 3D case, the method can be easily adapted for the 2D case, where the filtering directions are homogeneously distributed in the 2D polar space. An example of the structuring element (SE) using Epanechnikov kernel [6] is shown in Fig. 4a, while the set of conventional 8 normal directions of the SE obtained from the <1 0>, <1 1> and <1 2> normal direction families is shown in Fig. 4b.
The comparison between the two methods shown in Fig. 3 tends to support the use of the proposed segmentation method combining local orientation estimation (Fig. 3b) and k-means clustering in this case study. Indeed, when comparing Fig. 3c and Fig. 3d, one can see that the main boundaries are better delineated using the proposed method than in the CHG case (e.g. see results for images 1 and 4). CHG results do not present segmentation noise as for the proposed method, since a denoising step is integrated in the main algorithm. However, numerous colonies are segmented with more than one class (see image1 in Fig. 3d) or inversely two colonies are segmented within the same main class (e.g. image4 in Fig. 3d). The CHG method has been judged satisfying so as to correlate crack path with local colony orientation [3]. However, the current study does show that other segmentation alternatives, such as the one combining local gradient-based orientation with k-means clustering may result in better segmentation results. As far as the computation time of k-means is concerned, Table  1 reveals an improvement for image1 and image4, when the positions of the local maxima in the histogram are used to calculate the initial position of the cluster centroids. Indeed, it is important to reduce the number of iteration since k-means is known to be computationally demanding because of the number of iterations needed to update the cluster centroids until convergence. However, a detailed analysis needs to be performed why it is not generalized to all images, as it was initially guessed (e.g. using the standard random approach is faster for image2 and image3).

Conclusion
This paper has presented an implementation of k-means method on 2D X-ray microtomography images of lamellar Titanium alloys, and has proposed an approach based on histogram profile analysis of orientation map to estimate the optimal number of clusters. Segmentation results have been compared with the ones obtained using CHG filter. This study is still in the trial stage but has already shown that it outperforms the CHG method, at least on the tested images. Work is in progress to extend the approach to the 3D case.