University of Birmingham Morphological Classification of Odontogenic Keratocysts Using Bouligand-Minkowski Fractal Descriptors

(


Introduction
Cysts are pathological cavities containing fluid or semi-fluid content and lined by epithelial tissue. The diagnosis of jaw cysts is based on histopathology features, firstly to identify to which of the various cyst types a lesion corresponds to (with different origins, behaviour and prognosis), and secondly, to avoid misdiagnosis with other lesions (e.g. as intra osseous squamous cell carcinoma, unicystic ameloblastoma and other tumours) might present with similar radiographical features but require different treatments.
Among the cysts arising in the jaws, two important types are the 'radicular cyst' and the 'odontogenic keratocyst' (OKC). Radicular cysts are the most common type (55% of odontogenic cysts [10]), they are associated with the roots of teeth with non-vital pulps (e.g. due to advanced dental caries) and have slow growth. OKCs are less frequent (12% of odontogenic cysts [10]), they are not associated with dental disease and have certain characteristics common with neoplasms (e.g. active epithelial growth [21,14,15] and higher recurrence rates). Furthermore, there are two OKC sub-types; they can be solitary cysts (sporadic), or they can be multiple (synchronous or metachronous) as part of a rare autosomal dominant disease, the Gorlin-Goltz or Basal Cell Naevus Syndrome (BCNS) [11,20]. About 85% of syndromic and 30% of sporadic cases have mutations of the PTCH1 gene [13], (similarly to basal cell carcinomas of skin) indicating a potential common pathogenesis across OKC sub-types. However, there seem to be differences in the behaviour of syndromic and sporadic OKCs too, and therefore any morphological evidence that could help differentiating between the two sub-types is of diagnostic and predictive importance.
Woolgar et al. [22,23]  between syndromic and sporadic OKCs. While those features are relevant in histopathological terms to understand the organisation of the OKCs, they are difficult to characterise using automated or semiautomated approaches as they require contextual descriptions of cells and tissues that are not directly translated into quantitative features that machine learning algorithms can currently handle. Although an automated technique was proposed to classify other groups of oral cysts in [8], so far in the domain of machine diagnosis the only work addressing the diagnostic differences between syndromic and nonsyndromic OKC cysts is [11], where the differences in the morphology of algorithmically segmented cells and the architectural organisation of the epithelial lining of inflammation-free regions of OKCs sub-types were investigated using semi-automated image analysis. Such differences, however, were not sufficient to achieve OKC sub-type classification purposes (60% correct classification rate), although they allowed a good discrimination of OKCs from the most common type of jaw cyst (i.e. radicular cysts) [11]. For expert histopathologists, the latter discrimination should not be a particularly difficult task, but is currently far from becoming automated. Furthermore, the differences between OKC sub-types are more difficult to quantify. If it was possible to rationalise the differences between all these lesions systematically, they could become histolgical discriminators for cases of BCNS, specially in retrospective analyses, in cases exhibiting low penetrance, and to develop indicative markers of recurrence potential. Advances in digital pathology and virtual slides technology have made attractive the possibility of developing machine learning algorithms to help pathologists gather reproducible morphological evidence for diagnostic purposes, to allow analysis of large datasets with multiple tissue sections and to evaluate the results of treatment modalities. The purpose of this paper is to investigate a statistical approach to the classification of cyst lining images, based on a machine vision algorithm (Bouligand-Minkowski descriptors) that does not require high levels of segmentation of histologically relevant features.

Materials
The material used in this paper consists of an anonymised database of 150 images of haematoxylin and eosin (H & E) sections from formalin fixed and paraffin embedded specimens of developmental (solitary and syndromic OKCs) or inflammatory (radicular cysts) origin. Briefly, the database included 65 images of the lining from 13 cases of sporadic (k) OKCs, 40 images from 8 cases of syndromic (s) OKCs and 45 images from 9 cases of radicular (r) cysts (i.e. 5 nonoverlapping images per case and with no sectioning artefacts). The cyst types were previously determined by histopathologists using histological as well as clinical information from the respective patients.
The images were captured using an Olympus BX50 microscope (Olympus Optical Co. Tokyo, Japan) with a x40 objective and a colour camera JVC KY-55B (JVC, Tokyo, Japan) attached to a 24 bit RGB frame grabber (Imaging Technologies IT4PCI, Bedford, MA, USA). With this setup the image size was 768×572 pixels with an inter-pixel distance of 0.31 µm. The images were background corrected for illumination uniformity (by means of the transmittance ratio of the specimen with an empty bright field frame), for camera bias (by subtraction of a non illuminated frame) and for camera shot noise (each image was the average of 32 consecutive frames). The extent of the epithelial lining of the lesions was segmented (from the background and connective capsule) by optical intensity thresholding (described below), and further analysis was done exclusively on the pixel values of the epithelial lining. Fig. 1 shows one sample for each group (sporadic OKC, radicular and syndromic OKC).
Finally, all the stained images were converted to grey values, where the in the intensity G of each pixel is obtained from the R, G and B components using the following weighted sum:

Method
We investigated the classification performance of the Bouligand-Minkowski (B-M) descriptors developed in [2], to classify the images of cyst linings into their original diagnostic groups k (sporadic OKCs), s (syndromic OKCs) and r (radicular cysts). To our knowledge, this is the first proposal for using a statistical approach to this task, while previous works used morphological features that require more complicated pipelines to guarantee that the generated histologicallyrelevant features are segmented accurately. The method consists of three steps applied to each image: 1) segmenting the epithelial lining from the background (i.e. empty space of the cystic lumen and the surrounding cyst fibrous capsule), 2) computing the Bouligand-Minkowski descriptors of the lining image and 3) concatenating and submitting such descriptors to a classifier algorithm to predict the original type of cysts they belonged to. Section 3.1 briefly describes the steps involved in the segmentation, whereas Sections 3.2-4 provide the theoretical background on the B-M descriptors.

Segmentation
The segmentation of the epithelial lining was performed by a combination of colour deconvolution and morphological operations. First, colour deconvolution separated the main colour components of the hematoxylin and eosin stained regions into two channels. The epithelial lining typically showed more intense staining in the haematoxylin channel than the other parts of the sample. The histogram equalization and thresholding of this channel allowed highlighting the region containing the epithelial cells. Finally, morphological filtering (using binary reconstruction) was used to eliminate small features producing a clean image where the largest region closely corresponded to the epithelial lining of the cysts. Fig. 2 illustrates the procedure and each step involved. More details can be found in [11].

Fractal geometry
A fractal [16] is a set of points embedded in a topological space that exhibits self-similarity (i.e. the characteristic of being exactly or statistically similar independently of the scale of observation). Fractal geometry concerns the study of the properties that are scale-independent and the self-similarity of such objects. The most commonly used property to characterise such objects is the 'fractal dimension'. This describes the rate of space filling of the set with scale and it can be defined through the Hausdorff-Besicovitch dimension, a concept from Measure Theory, which requires some knowledge of the analytical rules that generated the fractal set.
Fractal scale independence implies infinite amount of morphological detail. Real-world natural objects cannot be infinitely complex, yet some degree of self-similarity and complexity is a common feature. Therefore, it is possible to model such objects by means of fractal geometry concepts (e.g. the fractal dimension) even when there are no well-defined analytical rules (as required by Measure Theory) responsible for generating the sets. An alternative approximation to the Hausdorff-Besicovitch dimension consists of measuring some physical property of the object over a range of scales, e.g. with a ruler with measuring unit length ϵ and counting the number N(ϵ) of units necessary to measure the length of the object. Such abstraction can be extended to any embedding space endowed with a topological dimension D T by enlarging the length ϵ and recomputing N(ϵ), to estimate the fractal dimension D: In practice, the limit is estimated from the slope of a straight line fit to the plot of log N(ϵ) vs. log 1/ϵ. Quantities other than N(ϵ) can be used to estimate the self-similarity of objects using, e.g., methods such as boxcounting, B-M dilation and mass-radius [6]. Here we concentrate on the B-M approach because it can be easily implemented and was shown to be useful in characterising a number of biological problems [19,2,7].

Bouligand-Minkowski dimension
The B-M fractal dimension, originally developed for the analysis of binary sets [6], was adapted here to analyse grey-level images. The procedure consists of two steps. First, the grey level image is mapped onto a cloud C of points within a discretised bounded region E of the three-dimensional Euclidean space. Each pixel with coordinates x y M N ( , ) ∈ ([1, ] × [1, ]) and in- Fig. 3 illustrates such procedure.
Second, the points are gradually dilated by spheres with increasing radii r, starting with r=1, and the total volume V(r) of the dilated cloud is measured. By replacing N(ϵ) by V(r) in Eq. (2) and considering that the power-law scales with r 2 in the B-M dilation, the dimension is estimated by

Bouligand-Minkowski descriptors
Here, instead of estimating the dimension D BM , the method uses the values of log V r ( ) within a pre-defined range of radii r as the B-M descriptors. When an image is homogeneous, the procedure results in a regular cloud of points with the plot of log V r ( ) vs. log r approaching a linear function. With heterogeneous images, at some scales the spheres start merging for various values of r and the log-log plot shows a different pattern. This curve therefore is a representation of the image homogeneity at various scales (radii) and this makes it appealing for characterisation of grey-level image complexity [2].
The set of volumes V(r) are obtained by summing the points pertaining to the union of spheres B p r ( , ) centred at each point p C ∈ with radius r, that is, where 1 is the indicator function and In discrete spaces (such as digital images), B p r ( , ) is the (finite) set of points at a distance at most r from p. Therefore the number of points in U can be efficiently computed by means of the Euclidean Distance Transform (EDT). In a three-dimensional space, each point p E ′ ∈ is transformed into: where p p dist( , ′) is the Euclidean distance between the vectors. Several methods exist to compute the EDT efficiently [5].
To compute the B-M descriptors D from the EDT, all the possible values of the transform within the region of interest in E are increasingly sorted into a vector o(k) and the descriptors are given by the logarithm of the cumulated sum of points p′ such that where # expresses for set cardinality. The variable k indexes all the non-negative values for the EDT within the considered space, so that e.g., if we take r ≤ 2 there are four possible values for k (1, 2 , 3 and 2), and therefore four B-M descriptors are produced. We use r ≤ 8, to provide 64 descriptors, although only the first 50 are used in our experiments, as no significant classification gains were achieved beyond that number.

Assessment
As described in Section 2, the database contains 150 images collected from 30 independent cases (5 images per case).
The B-M descriptors were used for image classification as input parameters of a Linear Discriminant Analysis (LDA) supervised classifier [4]. This method was chosen due to its ability of decorrelating features among the samples. For small values of k the difference between the growth of D k ( ) is subtle, so simpler classifiers (such as knearest-neighbours [4]) for instance, might not be capable of identifying patterns useful for the classification.
We used two approaches to define training and testing sets and assess the classification performance, to know, a random 10-fold to evaluate the performance of the classifier when the number of descriptors is varied and a nested 10-10-fold employed to obtain the best classifier setup for the cyst classification. In both cases, the B-M descriptors were subjected to principal component analysis [4] with the aim of decorrelating the image features, as the cumulative way that B-M descriptors are obtained tends to result in some degree of inter-descriptors dependence. If any random fold did not contain at least one sample from each group, that classification round was discarded and the procedure was repeated until satisfying such requirement.
In the first approach, the set of samples is randomly divided into 10 equal sub-sets and in each round of classification one sub-set is used for testing and all the other sub-sets are used for training. The success rate is given by the ratio of the number of images correctly classified and the total number of images in the database. The procedure is repeated 10 times (each time with a different arrangement of the random sub-sets) and the final success rate is given by the total average. The second approach differs from the first one in that instead of running over all possible numbers of features, an ideal number is chosen by a second 10-fold procedure performed only over the testing set. In this way, the classification involves two nested 10-folds, an external one responsible for the classification and an internal procedure where the optimum number of descriptors is determined.

Results
The results are reported below by groupings (k: sporadic OKCs, r: radicular cysts and s: syndromic OKCs) having diagnostic relevance (k×r×s, ks×r and k×s).
All the results are expressed in terms of classification accuracy Acc, which is defined by the following expression: where TP is the number of true positives, i.e., the number of samples/ cases assigned to a particular group and actually pertaining to that group and TN is the number of true negatives, i.e., the number of samples/cases not assigned to a particular group and that actually do not come from that group. In this way, TP TN + is the total number of samples and therefore Acc can also be interpreted as the ratio of images correctly classified with respect to the total number of samples in the database.
The first test evaluates the success rate of the classification (percentage of images correctly classified) assessed for a number of descriptors ranging between 1 and 50. In addition to the original unprocessed images, we also applied three different pre-processing operations: histogram equalization, Gaussian filter blurring (kernel of radius=5 pixels and standard deviation=1) and sub-sampling size reduction by half, to test the robustness of the classification subject to possible variations in image quality. Histogram equalization enhances the contrast of images, balancing excessively dark or clear images. Blurring and subsampling simulate practical situations where the images are collected with sub-optimal microscope focusing and at lower resolution, respectively. The objective of verifying the performance in such conditions is only to assess how much and in which case such operations affect the final result. For the purpose of discussion and comparison only the results for the original images were considered. Fig. 4 shows the plot of success rates for the discrimination k×r×s, Fig. 5 for ks×r and Fig. 6 for k×s. Table 1 lists the success rates for each discrimination task and for the different pre-processings applied. Here the nested 10-10-fold is employed and an ideal number of features is selected by the internal 10-fold procedure. Confirming what was expected from the biological problem, the discrimination between keratocysts is a more difficult challenge than the identification of radicular cysts. Tables 2-4 show the confusion matrices for k×r×s, ks×r and k×s classification, respectively. In this representation, the rows contain the actual (expected) groups and the columns, the predicted groups. We opted for normalising this matrix as the number of samples assigned to each group is an average of all the classification rounds. That results in fractional values (including a non-integer number of samples) which should not be misinterpreted in this context. It is worth noticing that the discrimination between syndromic (k) and sporadic (s) OKCs tends to be lower than between OKC cysts (of any type) and radicular (r) cysts.
At this point, we also tested the possibility of using only the H component of the image (Hematoxylin) as that dye is mostly uptaken by cell nuclei. The H component was extracted from the H E & image using a procedure called colour deconvolution. This involves an orthonormal transformation of the Red-Green-Blue pixel components, once the individual dye vectors (in this case H and E separately) have been determined. The procedure enables determining the contribution of each individual dye used in the staining. Futher details can be found in [18]. Tables 5-8 list the results obtained for the same types of preprocessed images. In general the average performance is similar to that achieved using the original image. Table 9 shows the success rates analysed case-wise (the database contains 30 individual cases, with 5 images from each). For each case, the percentage of samples correctly classified as k, r or s is taken into account. The classifier is said to identify the lesion type if more than 50% of a case are correctly classified. In general, all the success rates are now larger than in the sample-wise classification, even because the classification problem is easier in fact, given the reduced number of elements to be discriminated (30 against 150 in the sample-wise test). Tables 10-12 show the confusion matrices for the case-wise classification of k×r×s, ks×r and k×s, respectively.
Finally, Table 13 shows the contingency tables comparing the classification performance of the approach proposed in this work to that previously achieved in [11]. The same table shows the respective p-values of a McNemar test [3] for rejecting the null hypothesis that both classifiers are equivalent. In all cases it was demonstrated that the difference in the classification performance is statistically significant with an error within the confidence interval. . The test is also applied to pre-processed images. Table 1 Classification accuracy in the discrimination between all the types of cysts here considered (k×r×s), between radicular (r) and OKC (ks×r) and between both types of OKCs (k×s) (k: sporadic and s: syndromic OKSs).

Discussion
Odontogenic cysts arise from proliferation of epithelial cell rests commonly found in the jaw bones. These cells were originally part of the tooth-forming apparatus and subsequently remain inactive through life, but can occasionally proliferate spontaneously or in response to inflammatory processes occurring nearby.
Traditional histopathology is based on the interpretation of the morphological appearance of cells and tissues, and while this is the current diagnostic gold standard, the assessment of morphological features through visual observation has unavoidable elements of subjectivity. For instance, cell and tissues shapes and sizes and staining patterns are difficult to assess consistently across observers, making the reproducibility of diagnostic procedures less robust than desired. The arrival of digital imaging and digital pathology, however, has opened the opportunity to treat histopathological digital images as numerical data sets which can be processed by algorithms for image reconstruction, enhancement and classification. The rationale for the development of automated diagnosis therefore aims to more accurate and reproducible diagnosis and to analyse more tissue samples than is     currently possible. Within this context, automated diagnosis can be approached from two different perspectives. One is the interpretation of digital images in terms of histologically-relevant concepts. This type of intelligent imaging requires algorithms that associate pixel values into discrete regions with special histological meaning (such as cells and nuclei, see for example [17,1] to then compare and measure those against known spatial and temporal models of normality and disease [12]. Such model-interpreting computations aim to approach algorithmically the tasks pathologists perform. An alternative is a data-driven, 'machine vision' approach based on the analysis of the image data itself, without need of identifying histologically-relevant structures. Both approaches have their benefits. Ideally, intelligent imaging might provide the means to enable histopathological generalisations to be made. The data-analysis approach, on the other hand, can be simpler to implement by avoiding intermediate, higher level interpretation of the image data. However the solutions encountered might not be specific to the biological problem and prevent any generalisation (in pathology terms) of the solutions achieved.
Here we concentrated on an 'image data' analysis approach to the problem of classifying sporadic, syndromic OKC and radicular cyst images solely based on the staining patterns of the epithelial lining of     these lesions. This is a difficult task that, to date, has not been resolved satisfactorily by neither expert observers nor algorithmic approaches. Our analysis based on the B-M descriptors achieved better results classifying OKCs vs. radicular cysts than those previously reported in [11]/(98% (Table 1) against around 95%), or in [9].
When discriminating between the three types of cysts, our approach achieved a better success rate (72%, Table 1), compared to 66% in [11]. The previous reported success rate in the discrimination of solitary from syndrome OKCs was also outperformed by the approach here described (68% (Table 1) vs. 60%).
We have used a nested cross-validation procedure which is a more challenging than the standard cross-validation, ensuring a fair feature selection stage with no a priori knowledge about images in the training set and using a similar number of images for training and testing sets. The patient-wise analysis, had an average success rate of 76% for k×r×s (average between the two majority success rates in Table 9), 100% for ks×r and 71% for k×s).
The effects of image variation on the analysis were non-predictable, k×r×s classification was better (except with equalization, which impaired the result) and k×s and worse in ks×r discrimination. As the mapping of pixel values into a cloud is sensitive to pixel intensity (the z coordinate) different procedures are likely to change the image data in somewhat unexpected manner.

Conclusion
We applied the state-of-the-art Bouligand-Minkowski descriptors to classify the epithelial lining of the sub-types of OKCs and discriminate between OKCs and radicular cysts. Unlike previous attempts to resolve this problem, we used a statistical approach to analyse and quantify the complex patterns of pixel intensities in the image. This outperformed previous sample classification results and demonstrates that in addition to histologically-relevant structures (such as those in [11]) the distribution pixel intensities also carry useful diagnostic importance for image characterisation. This highlights the value of texture-based image analysis to describe complex histological images.
The results achieved in the classification of cyst types are of high relevance to histopathology as they can form the basis for automatic diagnosis with known levels of accuracy, making the diagnostic process more reproducible and less time-consuming. This also opens the possibility to automatic pre-screening of larger numbers of histological samples from the same case than it is currently possible, therefore making diagnosis more robust.