Active Contour-Based Segmentation of Head and Neck with Adaptive Atlas Selection

This paper presents automated segmentation of structures in the Head and Neck (H&N) region, using an active contour-based joint registration and segmentation model. A new atlas selection strategy is also used. Segmentation is performed based on the dense deformation ﬁeld computed from the registration of selected structures in the atlas image that have distinct boundaries, onto the patient’s image. This approach results in robust segmentation of the structures of interest, even in the presence of tumors, or anatomical differences between the atlas and the patient image. For each patient, an atlas image is selected from the available atlas-database, based on the similarity metric value, computed after performing an afﬁne registration between each image in the atlas-database and the patient’s image. Unlike many of the previous approaches in the literature, the similarity metric is not computed over the entire image region; rather, it is computed only in the regions of soft tissue structures to be segmented. Qualitative and quantitative evaluation of the results is presented.


Introduction
Automated, accurate and robust segmentation of anatomical structures in the Head and Neck (H&N) CT images, for radiotherapy planning, is still an open challenge [1]. From the segmentation point of view, structures can be classified into two categories; (i) Structures having distinct edges and/or region characteristics with respect to the surrounding structures: Examples include bony structures like mandible, vertebra and skull, (ii) Structures that do not posses any clearly distinct characteristics: Examples include soft tissue structures like lymph node regions (IA, IB, IIA, IIB etc.) [2] and brainstem. While it is relatively easy to segment the structures that have distinct characteristics, it is very hard to directly segment the soft tissue structures without using any prior knowledge about those structures. Even for the segmentation of the structures of the first category, if the images contain noise or artifacts, incorporation of prior knowledge (like intensity distributions and shape priors) becomes essential for improving the segmentation quality.
Atlas-based methods have become a standard paradigm in medical image segmentation for exploiting prior anatomical knowledge [3]. Atlas is a reference image in which structures of interest have been accurately segmented, usually by hand, or by semi-automatic methods. The general procedure for obtaining the segmentation of the structures of interest in the patient's image can be summarized in a two-step procedure: First, a dense deformation field that maps the atlas image onto the patient's image is computed using appropriate registration methods. Second, the deformation field computed above is applied to the already segmented structures of interest in the atlas image, and this results in the segmentation of the structures in the patient's image.
Accuracy of the atlas-based segmentation largely depends on the registration methods used for computing the dense deformation field. It is common to perform an affine registration as a preliminary step before performing the actual non-rigid registration, in order to correct for global differences like position and orientation. The important non-rigid registration methods that have been studied in the literature in the context of H&N segmentation are as follows: Parraga et al. [4] compared cubic B-Spline registration method [5] with the Morphons [6] and concluded that Morphons performed better. Commowick et al. [7] performed locally affine registration followed by non-rigid registration which is an extension of the block-matching registration. Han et al. [8] performed an object driven block matching registration followed by a shape constrained dense deformable registration. Duay et al. [9] proposed a generalized optical flow and active contour-based atlas registration method. Gorthi et al. [1] extended this method to the segmentation of the lymph nodes in the H&N region; the proposed algorithm is compared with the radial basis function registration method [10], and Demons registration model [11] and found to perform better. In this paper, the active contour-based registration method in [1] is used to segment two specific structures in the H&N region, namely, mandible and brainstem, representing a bony structure and a soft tissue organ respectively. More details of this algorithm are presented in Section 2. The details of the segmentation approach are presented in Section 3.
Another important component of the atlas-based segmentation is the criteria used for the selection/construction of the atlas. In majority of the works, a single segmented individual image is selected as the atlas for all the patients' images to be segmented [12,13,14]. But, such selection can cause undesirable bias due to the possibility of huge anatomical variations between the atlas and the patient's image, and thereby resulting in poor segmentation. A brief review of the atlas selection criteria used in the context of H&N segmentation is presented in Section 4, followed by the details of the proposed atlas selection strategy. As mentioned in the preceding Section, H&N structures are segmented in this paper using an Active Contourbased Atlas Registration (ACBAR) model. This Section presents a brief description of this model, and the readers are referred to [1] for further details.
The formulation of the active contour-based registration model has been intuitively deduced from the level set evolution Eq. 1, introduced by Osher and Sethian in [15] where ν is the velocity of the flow or speed function that contains the local segmentation and contour regularization constraints, and φ d : Ω → R (for the image domain Ω) is the signed distance function often used to represent implicitly the active contour (AC) by its zero level. In ACBAR model, the signed distance function is tracked with optical flow (OF) approach [16]. The OF technique assumes that the brightness of the moving image (atlas), here the level set function φ d , stays constant for small displacements and a short period of time: where du is the instantaneous deformation vector field and dφ d is the total derivative of φ d . By using the chain rule, this optical flow constraint can be rewritten as the evolution equation of a vector flow: where φ d,t , given by Eq. 1, represents the variation of the level set function according to the desired forces such as supervised segmentation, shape prior knowledge or contour regularization. Thus, introducing the evolution equation of the level set segmentation model (1) in (3) gives the following formula, merging the active contour segmentation framework with the image registration task: The level set function φ d does not evolve with the usual finite difference scheme. Its position at time t is given by the deformation field u(x,t) and the initial level set function φ d (x, 0) such that: where φ d (x, 0) is the initial active contour position. This ensures that the evolution of the level set function exactly corresponds to the current deformation. Introducing (5) in (4) yields: Note that u : R n → R n (n is the image dimension; n = 3 in this paper) is the deformation field vector computed at each point of the level set function. The level set function models the contours of the objects selected in the atlas to drive its registration.
The above signed distance function representation (φ d ) can be used with any type of forces derived from the active contour framework. However, this representation can model only two regions. To cope with this limitation, active contours that are selected in the atlas to drive its registration are represented by a label  function φ L . This label function permits to define an arbitrary number of regions as follows: where Ω k is the k th labeled region and n is the number of regions. In this representation, active contours are modeled by the discontinuities of φ L . The main advantage of the label function representation is to distinguish n regions by using only one function. However, this representation does not contain the polarity information (information indicating the inside (Ω in ) and the outside (Ω out ) of a modeled region) necessary to compute the region-based forces of the AC segmentation framework. Hence, a function S is introduced into the general formulation of the model (6) in order to generate the polarity information. The objective of this function is to adapt the orientation of the gradient φ L based on local label values such that it always gives the polarity of the current region, i.e. S(x)( φ L ) is always oriented from the inside to the outside of the region. Please refer to [1] for the detailed description of the function. With this label function representation, the general formulation of the registration model (6) becomes: This equation corresponds to the general formulation of the ACBAR model. Note that different types of forces coming from AC segmentation framework as well as from OF framework can be used with this model. For example, these forces include: regularization forces like mean curvature, pixel-based forces, boundary-based forces and region-based forces.

Segmentation
The whole process of segmentation of structures in the H&N region is illustrated in Fig. 1. ACBAR is used for registering the atlas onto the patient's image. The structures of interest selected in the atlas to drive the registration are represented by a labeled function. This label function for one of the images is shown in the last column of Fig. 2. It labels the external contour of the image, skull, mandible, vertebrae and trachea. These structures have been chosen for the following two reasons. First they are closely located to the structures to be segmented. Hence their registration will influence the location of the structures of interest. Then they have distinct boundaries and are consistent in all images. For registration, a region-based force inspired by the unsupervised region-based segmentation model proposed by Chan and Vese [17] is used. This force is derived from the following energy designed to be minimal when the mean of a region Ω  defined in the target image by the evolving level set function (or label function) is close to the mean of the corresponding region in the reference image: where Ω in is the image area inside the contour, and Ω out is the image area outside the contour, µ prior is the prior mean of a given region extracted from a reference image (the atlas) and I is the intensity function of the image to segment. In the current formulation, energy value in equation (8) is used as the speed function ν in equation (7). Equation (8) is computed at the end of each iteration. Note that µ prior does not evolve during the registration process. Hence it is computed once on the reference image in a pre-process step.
Once the dense deformation field matching the atlas to the patient image is computed, the segmentation process ends up as the classical atlas-based method. The transformation is applied to the manually segmented structures of interest of the atlas image for automatically obtaining their segmentation on the patient's image.

Atlas Selection Strategy
For a nice overview of the atlas selection strategies in general, the interested readers are referred to [18]. This is Section focuses particularly on H&N segmentation. A brief review of the important atlas selection/- Commowick et al. [7] constructed a mean symmetric statistical atlas from several atlas images, and performed the segmentation using the mean atlas. However, because of the high anatomical variability in the H&N region among different patients, the generated mean atlas is found to be very different from the patient, resulting in over-segmentation of certain structures; hence, Commowick et al. [19] preferred the selection of a single best atlas than the mean atlas. The most similar image in their case is defined as the one that will be less deformed when nonlinearly registered onto the patient. Han et al. [8] evaluated two atlas selection strategies: First, a single atlas image with the highest mutual information (MI) similarity to the patient's image after the global linear registration is selected for the final segmentation. Second, multiple segmentations are performed using different atlas images, and the segmentations are fused using the STA-PLE algorithm [20]. They concluded that multiple segmentation approach, in general, has performed well over the single-best-atlas strategy.
In this paper, a slightly different approach is used for the atlas selection. A single-best-atlas strategy is used with the following variations compared to the existing approaches: • The most similar image is defined as the one with the least mean square error metric. Since the images to be registered are of the same modality, and also have similar intensity range, Mean Square Error (MSE) metric is preferred over the other metrics like MI and Kullback-leibler distance. Note that in the current context, Normalized Correlation (NC) metric provides results similar to MSE, and thus NC can also be used instead of MSE metric. As usual, the metric is computed after performing an affine registration between the images.
• The similarity metric is not computed over the entire image; rather, it is computed only in the regions of the soft tissue structures to be segmented. Note that the soft tissue structure of interest in this paper is the brainstem. Thus, while performing the initial affine registration, the similarity metric is computed over the entire image region, and after the affine registration, while selecting the atlas, the similarity metric value is computed only in the brainstem region. The reason for computing the metric only in the regions of interest is, to avoid any undesirable bias from the other structures in the image. For instance, if there are artifacts in certain regions of the image that are not at all relevant to the current structures to be segmented, this selective metric computation approach will avoid the undesirable bias in the atlas selection due to such regions. The reason for limiting the metric computation to soft tissue structures alone is because, the hard (bony) structures of interest anyway have the advantage of having distinct boundaries compared to soft tissue structures. Thus, the selection of the most similar atlas is more crucial for the soft tissue structure regions than the bony structures. Hence, the metric is computed only in the regions of soft tissue structures of interest.

Datasets, Preprocessing and Parameter Setting
The datasets used in this paper are provided by Princess Margaret Hospital, Toronto, as a part of the H&N auto-segmentation challenge [21]. It currently consists of 18 CT datasets with the typical resolution of 1 mm × 1 mm × 2 mm, and represents a wide variety of patient body types. Ground ). Hence, 10 datasets with the provided expert segmentations are used as atlas database. However, the evaluation is performed both on the training data, and testing data; while evaluating on the training data, atlas for each image is selected using the leave-one-out strategy [1].
In order to speed up the registration, the datasets are cropped in the Z-direction; the datasets are cropped such a way that they include all the structures of interest (brainstem and mandible in this case) as well as 3 additional axial slices both at the top and bottom of the images. The size of each slice is 512 × 512 pixels, and the number of slices for the datasets after cropping are varying in the range 43 − 64. Then, labeled images containing the selected structures that drive the active contour-based registration process are created for all the 10 atlas datasets, using a semiautomatic segmentation followed by manual corrections. Note that the labels for the mandible are created by simply reusing the already existing manual segmentations by the clinical expert. Labeled image for one of the atlas datasets is shown in the last column of Fig. 2.
All the code for segmentation algorithm is implemented in ITK [22] environment. The initial affine registration is performed using a multi resolution approach for improving the speed, accuracy, and robustness; 3 levels of resolutions are used. Mean square error metric is used for registration. Optimization of the affine transform parameters is done using the regular step gradient descent method. In the first (coarse) level of resolution, maximum and minimum levels of step lengths used for the optimizer are 10 and 0.01 respectively. These values are decreased by a factor of 4 and 10 respectively with the increasing (fine) resolution level. A linear interpolator is used for the registration. Concerning the parameters setting for the final active contour-based registration: the metric, optimizer and interpolator are same that of the affine registration; 4 levels of resolutions, and σ = 1.5 for the Gaussian filtering performing the linear diffusion are used. Table 1 presents, for each patient, the details of the atlas image that appears to be the best match, and the atlas that appears to have maximum anatomical differences with respect to the patient's image to be segmented. The atlas number, and the mean square error (MSE) values associated with these two extreme atlases are specified in this Table. As mentioned earlier, MSE is computed in the brainstem region, after performing an affine registration. The differences in MSE values between the two extreme atlases clearly emphasize the importance of selecting an appropriate atlas.

Evaluation Results
This Subsection presents both qualitative and quantitative evaluation of the automated segmentation of two structures of interest in the H&N region; the structures considered for the evaluation are: (i) mandible, a bony structure, and (ii) brainstem, a soft tissue structure. Automated segmentations are quantitatively evaluated using different Hausdorff Distance (HD)-based and overlap-based metrics. The exact HD-based metrics used are: mean HD, median HD and the number of slices with HD more than 3 mm. Hausdorff distance evaluated slice-wise can be considered as an indicator whether a particular slice needs to be corrected. Since deviations less than 3 mm are often considered as acceptable by the clinicians, the evaluation criterion of "number of slices with HD exceeding 3 mm" is also used. The overlap-based metrics used are: average slice overlap, median slice overlap and total volume overlap. Please refer to [21] for a detailed description of these evaluation metrics. Note that dataset no. 13 is excluded from the brainstem evaluation due to truncated field of view, this happens rather often in the clinical practice. The readers are referred to [21] for more details on these evaluation metrics.
One of the patient's image to be segmented is shown in the first column of Fig. 2  same figure shows the atlas selected based on the strategy described in Section 4. Last column of this figure shows the labeled image associated with selected atlas, containing the structures that are used for driving the registration process. As mentioned earlier, as a first step of segmentation, active contour-based registration is performed, and a dense deformation field that maps the atlas onto the patient's image is computed. In the next step, the dense deformation field is applied to the manually delineated mandible and brainstem structures on the atlas, resulting in the automated segmentation of these structures on the patient's image. Fig. 3 shows the automated segmentation of the mandible along the axial slices, for the dataset 12. For a qualitative evaluation of these results, ground truth segmentations created by the clinical expert are superposed in green in the same slices. The segmentation results for the brainstem are also presented in a similar manner in Fig. 4, for the same dataset. HD-statistics for the mandible and the brainstem are presented in Table 2. Similarly, overlap-statistics for both these structures are presented in Table 3.

Discussion
In the preceding Section, automated segmentations of the mandible and the brainstem structures are quantitatively validated using different Hausdorff Distance (HD)-based and overlap-based measures. For the mandible, the average values of mean-HD and median-HD across the 18 datasets are 16   good for the segmentation of the bony structures (mandible) as well as the soft tissue structures (brainstem). From the HD point of view, brainstem segmentation appears more accurate than the mandible. However, it is easy to notice from Fig. 3, the reasons for the high values of HD for the mandible, despite having relatively distinct boundaries; first, the topology of the mandible, when it is seen axial slices wise, is split in some slices, and thus is geometrically spread over a relatively larger region; second, despite the mandible having distinct boundaries, presence of artifacts makes the segmentation a challenging task. To summarize, while the current results are promising, they also highlight the scope for further improvement.
It is possible to further improve the accuracy of segmentation within the current framework itself. For instance, at this point, only region-based forces are used for evolving the level sets of the contours. But, in addition, forces like shape-based forces can also be used in the ACBAR model, and thereby more prior knowledge can be incorporated into the segmentation process. Another possible extension is to identify and label additional significant structures to drive the atlas-based registration, and in turn improve the accuracy of segmentation. The current atlas database contains only 10 images; expanding the database to more images that cover a wide range of anatomical variations will further improve the accuracy of segmentation.

Conclusions
This paper has presented the segmentation of structures in the Head and Neck (H&N) region, using an Active Contour-Based Atlas Registration (ACBAR) model. The presented framework facilitates using any of the segmentation forces coming from the active contour segmentation framework, as well as the optical flow framework; these forces include: regularization forces, pixel-based forces, boundary-based forces, and  77.9 ± 10.0 80.8 ± 6.7 77.8 ± 7.4 77.5 ± 6.9 region-based forces. Another potential advantage of ACBAR model is, since the registration is driven by the selected structures in the labeled image, it can perform the segmentation accurately even in the presence of inconsistencies like tumors.
Because of the high anatomical variability in the H&N region among different datasets, atlas selection plays an important role in obtaining an accurate segmentation. Thus, another contribution of this paper is in efficient atlas selection. For each patient's image to be segmented, an affine registration is performed with all the images in the atlas database; then, an image with the least mean square error value in the regions of soft tissue structures to be segmented is selected as the atlas image for that patient's image. This approach avoids undesirable bias from the other structures in the image and thereby results in the selection of the atlas image that is the most similar one to the patient's image in the regions of soft tissue structures to be segmented.
Future directions for further improving the accuracy of H&N region segmentation include using additional forces like shape-based priors, including more structures for driving the registration, and expanding the atlas database. Latest