Discriminative Dictionary Learning for Abdominal Multi-Organ Segmentation

An automated segmentation method is presented for multi-organ segmentation in abdominal CT images. Dictionary learning and sparse coding techniques are used in the proposed method to generate target-specific priors for segmentation. The method simultaneously learns dictionaries which have reconstructive power and classifiers which have discriminative ability from a set of selected atlases. Based on the learnt dictionaries and classifiers, probabilistic atlases are then generated to provide priors for the segmentation of unseen target images. The final segmentation is obtained by applying a post-processing step based on a graph-cuts method. In addition, this paper proposes a voxel-wise local atlas selection strategy to deal with high inter-subject variation in abdominal CT images. The segmentation performance of the proposed method with different atlas selection strategies is also compared. Our proposed method has been evaluated on a database of 150 abdominal CT images and achieves a promising segmentation performance with Dice overlap values of 94.9%, 93.6%, 71.1%, and 92.5% for liver, kidneys, pancreas, and spleen, respectively.


Introduction
CT-based clinical assessment of abdominal organs relies on quantitative measures and comprehensive analysis of multiple organs in order to identify disorders (Linguraru et al., 2012). The segmentation of multiple abdominal organs enables quantitative analysis of different organs, providing invaluable input for computer aided diagnosis (CAD) systems. For instance, liver segmentation is helpful in the automatic detection and definition of focal lesions (Liu et al., 2004). The segmentation of the pancreas facilitates the diagnosis of dilated pancreatic ducts or inflamed pancreatic tissues (Shimizu et al., 2010). The measurement of the size of the kidney is useful in evaluating its condition. Other applications like radiotherapy planning as well as cancer detection and staging also require the accurate segmentation of abdominal organs. An automated segmentation approach can eliminate the need for manual labeling by trained observers, i.e. radiologists. Many segmentation approaches have been developed for abdominal computed tomography (CT) scans in recent years. Most of these approaches are based on statistical shape models (Heimann et al., 2006; Okada et al., 2008b; Spiegel et al., 2009; Wimmer et al., 2009; Cerrolaza et al., 2014) or multi-atlas segmentation (Park et al., 2003; Okada et al., 2008b; Shimizu et al., 2011; Wolz et al., 2013; Wang et al., 2014). In these methods, a shape model or a probabilistic atlas is calculated by averaging shape or location priors of multiple spatially aligned atlases. Such statistical shape models or probabilistic atlases can then provide prior knowledge for the segmentation of organs in the target image. A combination of statistical shape models and probabilistic atlases has also been proposed (Okada et al., 2008a; Wang et al., 2012; Okada et al., 2013) to incorporate both shape and location priors for segmentation tasks.
Since the introduction of statistical shape models in the early 1990s (Cootes et al., 1995), these models have proven very effective in various image segmentation applications. Gao et al. (1998) presented early work using statistical shape models for segmentation of abdominal organs. Later, Heimann et al. (2006) showed a successful application of an active shape model in the segmentation of the liver in CT scans. An automated segmentation method using statistical shape models was proposed by Shimizu et al. (2010) to successfully segment the pancreas. An enhanced shape model approach that integrates a hierarchical framework was also proposed in Bagci et al. (2012) for improving the segmentation accuracy. Another interesting study was presented in Cerrolaza et al. (2014), which introduces a generalized multiresolution hierarchical shape model to efficiently describe the shape variability of different organs and thereby improve the segmentation performance of statistical shape models.
Early work using probabilistic atlases was described in Park et al. (2003), where a statistical atlas of the liver and the kidneys was shown to be helpful for the segmentation of these organs. Recent work has incorporated spatial prior knowledge for different abdominal organs (Okada et al., 2008a; Linguraru et al., 2010; Shimizu et al., 2011; Oda et al., 2012; Linguraru et al., 2012; Wolz et al., 2013; Wang et al., 2014). Notably, Okada et al. (2008a) constructed a hierarchical multi-organ statistical atlas for improving segmentation performance. In order to generate more specific atlases, Oda et al. (2012) separated an atlas database into several clusters and generated multiple probabilistic atlases. Also recently, inter-organ spatial relations have been incorporated into the probabilistic atlases to perform multi-organ segmentation (Okada et al., 2013; Cerrolaza et al., 2014; Wang et al., 2014).
The aim in building models from a population in the form of statistical shape models or probabilistic atlases is that the constructed models can match the shape or appearance of the anatomical structure of interest in new images. However, the average models calculated from a given population describe the full variability in this specific dataset, potentially leading to a low specificity with respect to individual appearance. The generality of such average models may hamper the segmentation of a specific target image due to large inter-subject variability. For example, difficulties may arise in the segmentation of target images whose anatomical shapes or locations differ significantly from the average model. To address these shortcomings, more recent approaches are based on subject-specific shape models (Wang et al., 2010) or subject-specific probabilistic atlases (Shimizu et al., 2010; Wolz et al., 2013; Chu et al., 2013), generating subject-specific priors for unlabeled images instead of sharing the same average shape or location priors. The subject-specific models are generated by identifying a number of suitable atlases and then fusing their priors. In order to generate good subject-specific priors for segmentation, two steps are crucial in these methods: selecting atlases that are similar to the target image and performing accurate pairwise registrations.
Previous studies (Aljabar et al., 2009) show that the segmentation performance of multi-atlas based methods is highly dependent on the atlases selected for the target image. Most atlas selection methods define a global mask region to include multiple organs of interest. Then, global similarity measures are calculated in this predefined mask between the target and atlas images to select suitable atlases. More advanced methods (Wolz et al., 2010; Cao et al., 2011) transfer the global similarities into a manifold and perform the atlas selection in the learnt manifold. However, the global similarities represent the overall differences in the mask, which are dominated by the large organs. For example, in our application of abdominal segmentation, the atlas selection is likely to be dominated by the liver since the liver is much larger than the other organs. This means that the selected "similar" atlases may not be similar in some local regions such as the pancreas. A region-wise local atlas selection strategy (van Rikxoort et al., 2010; Shi et al., 2010; Wolz et al., 2013) has been utilized to overcome this shortcoming by selecting suitable atlases for each local region. However, these approaches require the separation of the whole image into different local regions, and non-rigid registrations are performed over these local regions for accurate label fusion. Since different anatomical patterns exist at different locations, a voxel-wise local comparison strategy may provide a better way to select similar atlases at each location.
The other drawback of traditional multi-atlas based methods is that accurate pairwise registrations are needed to acquire good segmentation results. This can be problematic in the case of high inter-subject variability. Another challenge that arises from using non-rigid registration is the high computational burden. Previous studies (Wolz et al., 2013; Chu et al., 2013) have demonstrated that the computational complexity is largely defined by the computational time required for the non-rigid registration step. Recently, the nonlocal patch-based segmentation (PBS) method (Coupé et al., 2011; Rousseau et al., 2011) has been proposed to avoid the need for accurate non-rigid registration and has been applied successfully to the segmentation of brain MR images. However, the patch-based segmentation method cannot be directly applied to the segmentation of abdominal organs, for two reasons: i) unlike the human brain, the anatomy in the abdominal region shows great variability. There is significant variation in the shapes, sizes and locations of the abdominal organs, especially the pancreas, making the overall image alignment particularly challenging (Wolz et al., 2013; Wang et al., 2014). This poses difficulties for segmentation methods that rely on image registration. Although only affine registration is required for the patch-based segmentation method, Rousseau et al. (2011) argued that more accurate registration is beneficial to the segmentation accuracy of patch-based segmentation methods; ii) the computational complexity becomes a significant problem for large abdominal organs.
To address the above problems, a novel patch-based segmentation framework is presented for abdominal multi-organ segmentation. In our previous work (Tong et al., 2013), a dictionary learning technique was introduced to improve the segmentation performance of patch-based methods. However, this approach was limited to binary segmentation and only evaluated on hippocampus labeling. In this paper, we extend our previous method (Tong et al., 2013) to the simultaneous segmentation of multiple structures. Furthermore, we evaluate the approach on abdominal multi-organ segmentation from CT images. Specifically, dictionaries and classifiers are learnt from the selected training atlases, which are then utilized to generate a subject-specific probabilistic atlas for each unlabeled target image. The final segmentation is obtained by applying a post-processing step based on graph cuts in combination with the generated subject-specific probabilistic atlases (Wolz et al., 2013; Chu et al., 2013). The main contributions of this work can be summarized as follows: (1) the extension of the discriminative dictionary learning for segmentation (DDLS) algorithm to the segmentation of multiple organs in CT images; (2) a local voxel-wise atlas selection in order to capture local information for segmentation and to tackle the high inter-subject variability; (3) a comparison between different atlas selection strategies; (4) a multi-resolution strategy for gaining computational efficiency. In the remainder of the paper, we will first introduce the datasets used in our work in Section 2.1. The methodology of DDLS for multi-organ segmentation is introduced in Section 2.4 and different atlas selection strategies are also presented. The performance of the proposed method is analyzed in Section 3. Finally, we discuss the strengths and weaknesses of the proposed method and conclude this paper.

Dataset
150 3-D abdominal CT scans acquired from 36 female and 114 male subjects were used for our experiments. All scans were acquired between 2004 and 2009 at Nagoya University Hospital by a TOSHIBA Aquilion 64 scanner and obtained under typical clinical protocols for the purpose of laparoscopic resection of the stomach and gallbladder glands or colon. Among the 150 CT scans, 141 subjects had early or advanced gastric cancer, one subject had cholecystitis cancer and eight subjects had colorectal cancer. All subjects were aged between 26 and 84 years with a mean age of 62.8±12.0. Scans have a resolution of 512×512 voxels in plane and contain between 238 and 1061 slices depending on the field-of-view and the slice thickness. Voxel sizes range from 0.55 to 0.82 mm and the slice spacing varies from 0.4 to 0.8 mm. The X-ray tube voltage is 120 kV and the X-ray tube current is 350-400 mAs. All of the images were acquired in portal venous phase (20-30 s delayed from starting point). The starting point of scanning was chosen according to the following rules: for patients who were younger than 60 years, the starting point was set as 25 s delayed from the injection point; for other patients, the scan started after 7 s when the intensity of the aorta is over 80 HU. Scanning control is performed by utilizing the Toshiba Real Prep System. Images were acquired under typical clinical conditions and therefore show typical contrast variations. Images start anterior at the lungs and are automatically cropped at 25 cm in the axial direction.
Reference segmentations are available for the liver, spleen, pancreas and the kidneys. The segmentations are used as atlases. All 150 subjects were segmented by one out of three trained raters. The reference segmentations are based on interactive region growing, where a spherical element is utilized to prevent excess segmentation of a target region, or graph-cut segmentation, where a set of foreground and background voxels are manually set as seed points. After the semi-automated segmentation, a manual correction step was performed on the axial, coronal, or sagittal slices.

Algorithm 1: The proposed discriminative dictionary learning for segmentation (DDLS) with local atlas selection strategy (L-DDLS). $\beta_1$ and $\beta_2$ are parameters in the dictionary learning and sparse coding process. $P_L$ is the training patch library extracted from the selected atlases. $\hat{D}$ and $\hat{W}$ represent the dictionary and the classifier learnt from $P_L$. $p_t$ is the target patch under study and $h_t$ is the estimated probabilistic label vector for $p_t$.

for each target voxel $v_t$ do
  Perform local atlas selection.
  Extract training patches in a constrained search neighborhood from the selected atlas images and form a training patch library $P_L$.
  Discriminative dictionary training: learn $\hat{D}$ and $\hat{W}$ from $P_L$ and the corresponding labels.
  Sparse coding for the target patch $p_t$: compute the coding coefficients $\hat{\alpha}_t$.
  Probabilistic label estimation: $h_t = \hat{W} \hat{\alpha}_t$.
end for
Obtain the final segmentation label maps $S_t$ using graph cuts.

Overview
There are three major steps in the proposed method.
Step 1: After all the training atlases are affinely aligned to the target space, atlas selection is performed. Similar atlases can be selected by calculating similarity measures over the whole image or in a local mask.
Step 2: Training patches are extracted from the selected atlas images within a search volume to form a training patch library $P_L$. Then, a dictionary $D$ with reconstructive power and a classifier $W$ with discriminative ability are learnt simultaneously from $P_L$ and the corresponding labels. A probabilistic label is then estimated for each target voxel. In the end, a subject-specific probabilistic atlas is generated for each target image.
Step 3: Based on the subject-specific probabilistic atlas, the final segmentation is obtained by using the graph-cuts method as proposed in Wolz et al. (2013).
Algorithm 1 outlines the major steps of the proposed DDLS approach with a local atlas selection strategy. Details of these three steps are described in the following sections.

Atlas selection
In traditional atlas selection (Aljabar et al., 2009), the whole image is treated as a single entity for calculating inter-subject pairwise similarity measures. As a result, the selected atlases are shared by all voxels in the target image. However, it may not be optimal to utilize the same atlases at different locations. Assume that a target voxel $v_t(x_i, y_i, z_i)$ is in the liver region of a target image $I_t$ as shown in Figure 1, and atlas $A_b$ is selected because $A_b$ has a similar liver and shows similar anatomical patterns at location $(x_i, y_i, z_i)$. If the target voxel $v_t(x_j, y_j, z_j)$ lies in the pancreas region of image $I_t$ instead, it is possible that a different atlas $A_c$ ($c \neq b$) is selected at location $(x_j, y_j, z_j)$ because atlas $A_c$ shares more similar anatomical patterns with the target image $I_t$ at this location than atlas $A_b$. Therefore, we propose to use a voxel-wise local atlas selection strategy to capture the important local information for segmentation. Figure 2 shows an example of the local atlas selection at different locations for a target image. As can be seen from this figure, the most similar atlas differs between locations. In addition, the same atlases are selected at neighboring voxels in homogeneous regions. The extent of the local mask influences the behaviour of the atlas selection: larger masks make the atlas selection more global, and smaller masks lead to more local behaviour (atlases are selected based on more local intensity patterns). If the mask is as large as the image, local atlas selection is equivalent to global atlas selection. In this case, the selected atlases are the same at all locations in the target image.
In this paper, we propose novel DDLS methods that allow either global or local atlas selection strategies, which are denoted as G-DDLS and L-DDLS respectively. In G-DDLS, a global mask (i.e. the whole image) is first defined. Then, a set of atlases is selected for each target image according to the similarities between the atlas images and the target image within this mask. In contrast to this, in L-DDLS, a voxel-wise atlas selection is carried out to select similar atlases locally at different locations in the target image. This means that different sets of atlases can be selected at different locations in the target image. For a target voxel $v_t$ at location $(x, y, z)$, a local neighborhood is defined as shown in Figure 2. Then, pairwise similarities at this location between atlas images and the target image are calculated within this local mask. Different similarity measures such as the sum of squared intensity differences (SSD), cross-correlation or mutual information (Pluim et al., 2003) can be used. Finally, $K$ atlases are selected at location $(x, y, z)$ for the target voxel $v_t$ according to the local similarity measures.
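The voxel-wise selection described above can be illustrated with a minimal NumPy sketch. It assumes that the atlas images are already affinely aligned to the target and stored as 3-D arrays of identical shape, and it uses SSD as the similarity measure. The function names `local_ssd` and `select_atlases_locally` and the `radius` parameter (half-width of the cubic mask) are hypothetical and chosen for illustration only.

```python
import numpy as np


def local_ssd(target, atlas, center, radius):
    """Sum of squared intensity differences inside a cubic mask around `center`."""
    # Clip the cubic neighbourhood to the image bounds.
    slices = tuple(
        slice(max(c - radius, 0), min(c + radius + 1, n))
        for c, n in zip(center, target.shape)
    )
    diff = target[slices].astype(np.float64) - atlas[slices].astype(np.float64)
    return float(np.sum(diff ** 2))


def select_atlases_locally(target, atlases, center, radius, k):
    """Return the indices of the k atlases most similar to the target around `center`."""
    scores = np.array([local_ssd(target, a, center, radius) for a in atlases])
    # A lower SSD means a higher similarity; a stable sort keeps ties in atlas order.
    return np.argsort(scores, kind="stable")[:k]
```

Calling `select_atlases_locally` with a `radius` that covers the whole image reduces this sketch to global selection, which mirrors the relation between L-DDLS and G-DDLS.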
In our previous work (Tong et al., 2013), DDLS with a fixed-atlas strategy was proposed, which we denote as F-DDLS. In F-DDLS, a subgroup of the whole dataset is randomly selected as the fixed training atlases. Discriminative dictionaries are then trained offline from these randomly selected training atlases. After that, the segmentation is performed online on the remaining test subjects. In contrast to G-DDLS and L-DDLS, which select subject-specific atlases for training, F-DDLS uses fixed atlases for training. The advantage of F-DDLS is that it can yield a significant speed-up in the segmentation process since the dictionaries are learnt offline and kept fixed.

DDLS for Multiple Structures
For labeling a target voxel $v_t$ in the target image $I_t$, the surrounding patch of $v_t$ is extracted and denoted as the target patch $p_t \in \mathbb{R}^{m \times 1}$. Here, the $m$ intensity values within the patch are arranged into an $m$-dimensional feature vector. A search volume is defined in each selected atlas image $A_i$. All template patches in the search volume across the $K$ selected similar atlases are extracted to form a training patch library $P_L$. Assuming that the patch library contains $n$ training patches, the patch library can then be represented as

$$P_L = [p_1, p_2, \ldots, p_n] \in \mathbb{R}^{m \times n}.$$

A reconstructive dictionary $D \in \mathbb{R}^{m \times d}$ with $d$ atoms can be learnt from the input patch library $P_L \in \mathbb{R}^{m \times n}$ by solving the following problem:

$$\langle \hat{D}, \hat{\alpha} \rangle = \arg\min_{D, \alpha} \|P_L - D\alpha\|_2^2 + \beta_2 \|\alpha\|_1, \tag{1}$$

where the first term is the reconstructive term and the second term adds the sparsity constraint over the coding coefficients $\alpha$, forcing each training patch in $P_L$ to be represented by a linear combination of a few atoms in $D$. This means that all the training patches in $P_L$ can be reconstructed using the learnt dictionary $D$. Equation (1) can be solved by using the K-SVD algorithm (Aharon et al., 2006) or via the online dictionary learning algorithm (Mairal et al., 2009). However, the learnt dictionary only has reconstructive power and lacks discriminative ability for our segmentation task. Since we know the segmentation labels of the training patches in $P_L$, we can use this prior information to learn a classifier that predicts labels for the target patch $p_t$. As in Tong et al. (2013), a linear classifier $f(\alpha, W) = W\alpha$ is added to the objective function:

$$\langle \hat{D}, \hat{W}, \hat{\alpha} \rangle = \arg\min_{D, W, \alpha} \|P_L - D\alpha\|_2^2 + \beta_1 \|H - W\alpha\|_2^2 + \beta_2 \|\alpha\|_1, \tag{2}$$

where a labeling error term $\|H - W\alpha\|_2^2$ is added to Equation (1). Each column of $H$ is a label vector corresponding to a training patch in $P_L$. Each label vector is defined as $h_i = [0, 0, \ldots, 1, \ldots, 0, 0]$, where the position of the non-zero entry indicates the label of the center voxel in the corresponding training patch $p_i$. $W$ denotes the linear classifier parameters and $\beta_1$ controls the trade-off between the reconstruction error term and the labeling error term. Here, we use the online dictionary learning algorithm as in Tong et al. (2013) to solve Equation (2). After this equation is solved, a dictionary $\hat{D}$ and a classifier $\hat{W}$ are learnt from $P_L$ and their labels $H$. The target patch $p_t$ can be represented by the learnt dictionary $\hat{D}$ as:

$$\hat{\alpha}_t = \arg\min_{\alpha_t} \|p_t - \hat{D}\alpha_t\|_2^2 + \beta_2 \|\alpha_t\|_1, \tag{3}$$

where $\hat{\alpha}_t$ are the coding coefficients of the target patch $p_t$. Probabilistic labels of the target voxel $v_t$ can then be estimated by the linear predictive classifier $\hat{W}$ and the coding coefficients $\hat{\alpha}_t$:

$$h_t = \hat{W} \hat{\alpha}_t. \tag{4}$$

Here $h_t$ is the estimated probabilistic label vector for the target voxel $v_t$. Values in $h_t$ represent the probability of the target voxel $v_t$ belonging to different organs and are normalized to sum to one. Ideally, $h_t$ will be $[0, 0, \ldots, 1, \ldots, 0, 0]$ with only one non-zero element, indicating the label of the structure. The final label at each voxel $v_t$ can directly be determined by finding the index of the largest element in the probabilistic label vector $h_t$.

Previous patch-based approaches including our DDLS algorithm were evaluated on the segmentation of small structures like the hippocampus (Coupé et al., 2011; Tong et al., 2013), which can be computed efficiently. However, when these methods are applied to the segmentation of large structures such as the whole brain or the abdominal organs, the computational complexity becomes extremely high. For example, it takes more than 42 hours for a whole brain segmentation by using the nonlocal patch-based segmentation as reported in Eskildsen et al. (2011). To overcome this problem, a multi-resolution framework (Eskildsen et al., 2011; Wang et al., 2013) can be used to gain computational efficiency. In order to make efficient multi-organ segmentation possible, we also integrated a multi-resolution dictionary learning framework into our proposed DDLS algorithm as shown in Figure 3.
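The sparse coding and label estimation steps of this section can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the implementation used in the paper: the dictionary `D` and classifier `W` are assumed to be already learnt, the sparse code is computed with a plain iterative shrinkage-thresholding (ISTA) loop rather than the online dictionary learning solver, the objective carries a factor of 0.5 on the reconstruction term (which only rescales the sparsity weight), and negative classifier scores are clipped before normalization. All function names are hypothetical.

```python
import numpy as np


def one_hot_labels(labels, n_classes):
    """Build the label matrix H: one column per training patch, one row per class."""
    labels = np.asarray(labels)
    H = np.zeros((n_classes, labels.size))
    H[labels, np.arange(labels.size)] = 1.0
    return H


def sparse_code(D, p, beta, n_iter=200):
    """ISTA for min_a 0.5 * ||p - D a||_2^2 + beta * ||a||_1 (assumed solver)."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth term's gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - p) / L  # gradient step on the reconstruction term
        a = np.sign(z) * np.maximum(np.abs(z) - beta / L, 0.0)  # soft-thresholding
    return a


def estimate_labels(W, alpha):
    """Map coding coefficients to a probabilistic label vector that sums to one."""
    h = np.maximum(W @ alpha, 0.0)  # assumption: negative scores are clipped to zero
    total = h.sum()
    # Fall back to a uniform vector if every score was clipped.
    return h / total if total > 0 else np.full(h.size, 1.0 / h.size)
```

The label of the target voxel is then the index of the largest entry, for example `int(np.argmax(estimate_labels(W, sparse_code(D, p_t, beta))))`.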

Speedup with Multi-resolution Framework
Multiple resolutions of the target image and all atlases are created offline by constructing Gaussian image pyramids. Using DDLS, a probabilistic atlas is obtained for the target image at the lowest resolution, which contains the initial probabilities of each voxel belonging to different organs. If the largest probability value at a location is lower than a defined confidence level $\gamma$, the probabilities at this location are recalculated at the next resolution; otherwise, the probabilities at the current location are retained. This enables propagation of the probabilistic atlas across resolutions by using the resulting probabilistic atlas at the current resolution to initialize the probabilistic atlas at the next resolution. In this manner, the segmentation mask at the next resolution is limited to the voxels with uncertain segmentations at the current resolution, forming a computationally efficient way to process images through increasing resolutions.
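The propagation rule can be sketched as follows. The sketch assumes that the probabilistic atlas is stored as an array of shape (classes, X, Y, Z), that the next resolution level has exactly twice the sampling density along each axis (as with the 4 mm and 2 mm levels used in the experiments), and that nearest-neighbour upsampling is sufficient for initialization. The function names and the upsampling choice are assumptions made for illustration.

```python
import numpy as np


def uncertain_mask(prob, gamma):
    """Voxels whose largest class probability falls below the confidence level gamma."""
    return prob.max(axis=0) < gamma


def upsample_nearest(volume, factor=2):
    """Nearest-neighbour upsampling along the three spatial (last) axes."""
    for axis in (-3, -2, -1):
        volume = np.repeat(volume, factor, axis=axis)
    return volume


def propagate(prob_coarse, gamma, factor=2):
    """Initialize the finer level and flag the voxels that must be recomputed there."""
    prob_fine = upsample_nearest(prob_coarse, factor)
    todo = upsample_nearest(uncertain_mask(prob_coarse, gamma), factor)
    return prob_fine, todo
```

Only the voxels flagged in `todo` would be passed to DDLS at the finer level; all other voxels keep the probabilities inherited from the coarser level.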

Refinement with Graph Cuts
The above DDLS algorithm generates a probabilistic segmentation that serves as a subject-specific probabilistic atlas. This, in turn, provides the spatial prior for obtaining the final segmentation. Previous studies (van der Lijn et al., 2008; Wolz et al., 2009, 2013) demonstrate that further improvements can be achieved by combining the target intensity information and the spatial prior. In the work of Wolz et al. (2009), the graph-cuts algorithm is used to obtain the final segmentation $S_t$ of the target image $I_t$ by minimizing an MRF-based energy function:

$$E(S_t) = \lambda \sum_{v_i \in I_t} D_{v_i} + \sum_{\{v_i, v_j\} \in N} E_{v_i, v_j},$$

where $v_i$ and $v_j$ are voxels in a neighborhood $N$ in the target image $I_t$. The data term $D_{v_i}$ measures the disagreement between a prior probabilistic model and the observed data, which is a combination of the target intensity information and the spatial prior. $E_{v_i, v_j}$ is a smoothness term penalizing discontinuities in a neighborhood $N$. A more detailed description of the energy function is given in Appendix A. The parameter $\lambda$ controls the relative influence of these two terms and was set to 1 in all experiments as in Wolz et al. (2013). The setting of $\lambda$ was not optimized for the current dataset. Since the graph-cuts algorithm is applied to each organ independently, a fusion step is applied to obtain the final segmentation. In this step, equivocal voxels are assigned the label that has the largest value in the probabilistic label vector $h_t$.
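The fusion step can be sketched as below. The text does not define "equivocal" precisely, so the sketch assumes that a voxel is equivocal when more than one organ-wise graph cut claims it, and that the tie is resolved among the claiming organs only. The array layout (background stored at index 0 of the probability array) and the function name `fuse_organ_masks` are likewise assumptions.

```python
import numpy as np


def fuse_organ_masks(masks, prob):
    """Fuse per-organ binary masks into one label map.

    masks: (n_organs, X, Y, Z) boolean array; masks[k] is the result for organ k + 1.
    prob:  (n_organs + 1, X, Y, Z) array; prob[0] is background, prob[k + 1] is organ k + 1.
    """
    n_claims = masks.sum(axis=0)
    labels = np.zeros(masks.shape[1:], dtype=np.int64)  # 0 denotes background

    # Voxels claimed by exactly one organ take that organ's label.
    single = n_claims == 1
    labels[single] = masks.argmax(axis=0)[single] + 1

    # Assumption: voxels claimed by several organs are the equivocal ones; they take
    # the claiming organ with the largest value in the probabilistic label vector.
    multi = n_claims > 1
    organ_prob = np.where(masks, prob[1:], -np.inf)
    labels[multi] = organ_prob.argmax(axis=0)[multi] + 1
    return labels
```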

Experiments and Results
The proposed methods were evaluated on 150 abdominal CT scans as described in Section 2.1. For the G-DDLS and L-DDLS methods, a leave-one-out procedure was utilized in the validation. Each scan was segmented by treating the remaining 149 subjects as atlases, and atlas selection was performed over this database of 149 atlases. Two resolution levels with isotropic voxel spacings of 4 mm and 2 mm, respectively, were utilized to speed up the proposed methods as shown in Figure 3. After the probabilistic atlases were generated in the native spaces, they were treated as the input of the graph-cuts algorithm to achieve the final segmentations. All parameters were empirically set (see Table 1) according to previous studies (Eskildsen et al., 2011; Tong et al., 2013). The influence of the mask size in the local atlas selection on the segmentation performance was evaluated in Section 3.3.
The Dice overlap was calculated between automated and manual segmentations for the evaluation of our proposed method. Paired (for the same group) or non-paired (for different groups) two-tailed t-tests were performed with the Dice overlaps to assess the statistical significance of different results. In order to compare with state-of-the-art methods, the Jaccard index (JI) as well as the Dice overlap were computed. Given the true positive (TP), false positive (FP) as well as false negative (FN) fractions, these two measures are defined as:

$$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN}, \qquad \mathrm{JI} = \frac{TP}{TP + FP + FN}.$$

Table 1 (column headers: Resolution (mm$^3$), Patch size (voxels), Search volume (voxels), Dictionary atoms, $\beta_1$). Two resolution levels ($4 \times 4 \times 4$ mm$^3$ and $2 \times 2 \times 2$ mm$^3$) are used. The patch size is set to $5 \times 5 \times 5$ voxels at different resolutions and for all experiments. The search volume is the defined neighborhood in the atlases for extracting training patches. The number of atoms in dictionaries is set to 256. $\beta_1$ and $\beta_2$ are parameters in the dictionary learning and sparse coding step. $\gamma$ is the defined confidence level for the propagation of the probabilistic atlas across resolutions in the multi-resolution framework.
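The two overlap measures can be computed from a pair of binary masks as in the following sketch. The function name `overlap_measures` is hypothetical, and returning 1.0 for two empty masks is a convention assumed here rather than one stated in the text.

```python
import numpy as np


def overlap_measures(auto, manual):
    """Return (Dice, Jaccard) between an automated and a manual binary segmentation."""
    auto = np.asarray(auto, dtype=bool)
    manual = np.asarray(manual, dtype=bool)
    tp = np.count_nonzero(auto & manual)   # voxels labelled by both
    fp = np.count_nonzero(auto & ~manual)  # voxels labelled only by the automated method
    fn = np.count_nonzero(~auto & manual)  # voxels labelled only by the rater
    if tp + fp + fn == 0:
        return 1.0, 1.0  # assumed convention: two empty masks agree perfectly
    dice = 2.0 * tp / (2.0 * tp + fp + fn)
    jaccard = tp / (tp + fp + fn)
    return dice, jaccard
```

The two measures are monotonically related through Dice = 2 JI / (1 + JI), so rankings of methods agree under either measure; reporting both mainly eases comparison with results published in one or the other.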

Advantage of discriminative dictionary learning
We first evaluated the segmentation performance of G-DDLS compared with majority voting (MV) labeling (Heckemann et al., 2006). Global atlas selection was performed by comparing pairwise similarities between atlases and the target images over the whole CT scan. After 20 similar atlases were selected globally, the G-DDLS and MV approaches were used to perform the labeling of the target images. Furthermore, the graph-cuts algorithm was utilized as a post-processing step of the G-DDLS and MV approaches, denoted as G-DDLS-GC and MV-GC respectively. Figure 4 shows a comparison of these four different methods. Since only affine registrations were used, the MV method cannot provide accurate segmentation results. Especially for the pancreas, the segmentation results are quite poor due to the significant registration errors resulting from the large variation in the shapes and locations of the pancreas. In comparison with MV, G-DDLS can generate more accurate results even though only affine registrations were used. By applying graph cuts as a post-processing step, both the G-DDLS and MV approaches gain further improvements. Therefore, we utilized the graph-cuts refinement in all the following experiments.
The segmentation performance of G-DDLS is also compared with the non-local patch-based segmentation (PBS) method as proposed in Coupé et al. (2011). The results are shown in Table 2. As can be seen from Table 2, G-DDLS achieves significant improvements over MV and PBS on all four organs. The great variability of abdominal organs results in large registration errors, which may degrade the segmentation performance of the PBS method.

Advantage of local atlas selection
The proposed method was also validated with different atlas selection strategies. Figure 5 compares the segmentation performances of G-DDLS and L-DDLS. It can be seen that L-DDLS achieves more accurate segmentation results than G-DDLS on the four structures when the same number of atlases is selected. Especially in the case of a small number of selected atlases (i.e. 5), the improvement of L-DDLS over G-DDLS is significant. It has been reported in Aljabar et al. (2009) that the segmentation accuracy of multi-atlas methods in terms of Dice overlap rises from a low value to a maximum and then gradually declines as the number of selected atlases increases. This is due to the fact that the population represented by a large atlas database is heterogeneous, for example in terms of age, morphology or pathology (Aljabar et al., 2009). Our proposed DDLS method follows this trend, but the segmentation accuracy of L-DDLS converges to the maximum much more quickly than that of G-DDLS, as suggested by the results in Figure 5. This is attractive because it is possible to achieve the best segmentation performance of the proposed DDLS method by using only a small number of atlases.
Table 3 shows the segmentation results using G-DDLS, L-DDLS and F-DDLS over the four organs. In order to perform the F-DDLS method, the 150 images were affinely transformed to a template space. Here, we chose the first image in our dataset as the template image. After that, 50 subjects were randomly selected as training atlases. Dictionaries and classifiers were then trained offline in the template space using the randomly selected 50 atlases. The segmentations of the remaining 100 images were carried out in the template space by using the learnt dictionaries and classifiers. Finally, the segmentation results were transformed back to the target spaces for calculating the Dice overlaps. This evaluation was repeated 10 times and the average Dice overlaps were calculated. As shown in Table 3, F-DDLS achieved the lowest Dice overlaps among the three methods because F-DDLS does not utilize an atlas selection step but learns an average model from the randomly selected subset of the database. In contrast with F-DDLS, G-DDLS and L-DDLS select similar atlases for each target image and generate a subject-specific probabilistic atlas for segmentation, which results in a significant improvement in the segmentation accuracy. In terms of Dice overlap, L-DDLS has an improvement of 3% over that of G-DDLS in the segmentation of the pancreas. However, the improvement on the segmentation of the liver is limited. This is due to the fact that both G-DDLS and L-DDLS can select similar atlases in the liver region since the liver is the largest organ, but only L-DDLS can select similar atlases in the pancreas region. It is observed that there is significant variation in the shapes and locations of the pancreas. The improvement of L-DDLS over G-DDLS in the segmentation of the pancreas suggests that the local atlas selection strategy can handle this high inter-subject variability to some extent.
atlas selection strategies.
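The F-DDLS evaluation protocol described above (50 randomly chosen training atlases, the remaining 100 subjects for testing, repeated 10 times) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `repeated_random_splits`, the seed, and the use of integer subject indices are all hypothetical.

```python
import random


def repeated_random_splits(n_subjects=150, n_train=50, n_repeats=10, seed=0):
    """Return a list of (train_indices, test_indices) pairs.

    Each repetition draws n_train subjects at random as training atlases
    and uses all remaining subjects as test images.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    subjects = list(range(n_subjects))
    splits = []
    for _ in range(n_repeats):
        train = sorted(rng.sample(subjects, n_train))
        train_set = set(train)
        test = [s for s in subjects if s not in train_set]
        splits.append((train, test))
    return splits


splits = repeated_random_splits()
```

Each split would then drive one round of offline training and testing, and the Dice overlaps would be averaged over the 10 rounds.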

Influence of mask size in L-DDLS
In L-DDLS, a local mask is defined at every voxel in the target image so that similar atlases are selected adaptively at different locations. The influence of the mask size on the segmentation accuracy is shown in Figure 6. G-DDLS can be seen as the extreme case of L-DDLS in which the mask size is increased to the image size. Due to the computational burden of the DDLS method, 5 atlases were selected in this evaluation. As the mask size increases from 7 × 7 × 7 voxels to 31 × 31 × 31 voxels, the segmentation accuracy of the liver remains roughly unchanged, but that of the pancreas gradually drops, indicating that the local atlas selection strategy has more influence on the segmentation of small organs with large inter-subject variability.
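The mask-based selection described above can be illustrated with a short sketch. It assumes that the atlases have already been affinely aligned to the target and share its shape, and it uses the sum of squared differences as the similarity measure purely for brevity. The function name `select_local_atlases` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def select_local_atlases(target, atlases, center, mask_size=11, k=5):
    """Return indices of the k atlases most similar to the target
    inside a cubic mask centred at `center` (lower SSD = more similar)."""
    half = mask_size // 2
    # Clip the cubic mask to the image bounds.
    region = tuple(
        slice(max(c - half, 0), min(c + half + 1, dim))
        for c, dim in zip(center, target.shape)
    )
    target_patch = target[region].astype(np.float64)
    ssd = np.array([
        np.sum((atlas[region].astype(np.float64) - target_patch) ** 2)
        for atlas in atlases
    ])
    # Stable sort so that ties are broken by atlas order.
    return np.argsort(ssd, kind="stable")[:k]
```

When the mask is at least as large as the image, the clipped region covers the whole volume and the ranking no longer depends on the voxel, which corresponds to global atlas selection.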

L-DDLS with different similarity measures
The L-DDLS method was also evaluated using different similarity measures. Sum of squared intensity differences (SSD), cross correlation (CC) and normalized mutual information (NMI) were used as similarity measures for local atlas selection; the results are reported in Table 4.
The results using SSD, CC, and NMI are not significantly different from each other on the segmentation of the liver, the spleen and the kidneys in a paired t-test. However, L-DDLS using CC or NMI as the similarity measure generates more accurate results on the segmentation of the pancreas than L-DDLS using SSD.
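For reference, the three similarity measures compared above can be written in a few lines. The sketch below uses common textbook definitions; the exact formulations used in the experiments are not given in this section, so the function names, the NMI form (H(A) + H(B)) / H(A, B), and the choice of `bins=32` are assumptions.

```python
import numpy as np


def ssd(a, b):
    """Sum of squared intensity differences (lower = more similar)."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(np.sum((a - b) ** 2))


def cross_correlation(a, b):
    """Normalized cross correlation in [-1, 1] (higher = more similar)."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0


def _entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def normalized_mutual_information(a, b, bins=32):
    """NMI = (H(A) + H(B)) / H(A, B), between 1 and 2 (higher = more similar)."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    hist, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = hist / hist.sum()
    h_joint = _entropy(pxy.ravel())
    if h_joint == 0:
        return 1.0  # convention chosen here for constant patches
    return (_entropy(pxy.sum(axis=1)) + _entropy(pxy.sum(axis=0))) / h_joint
```

One property worth noting is that CC is unchanged by a positive linear rescaling of the intensities of either patch, whereas SSD is not.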
An experiment was also performed to assess the performance on lower quality image data. The dataset was downsampled in the dorsoventral direction (slice spacings were set to 5 mm) while the in-plane voxel spacings were kept, simulating a typical low-resolution clinical protocol. The proposed L-DDLS method was then validated on this downsampled dataset. In a paired t-test, the results are not significantly different from those on the high resolution dataset for the liver, the spleen and the kidneys; the pancreas is the only exception. Since the pancreas is the smallest organ and has high shape variability, the interpolation artefacts introduced during downsampling may affect it more than the other organs.
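A downsampling step of this kind can be sketched with linear interpolation along the slice axis only. The interpolation scheme, the function name `resample_slices`, and the convention that axis 0 is the slice direction are assumptions made for illustration; the resampling tool actually used is not stated here.

```python
import numpy as np


def resample_slices(volume, spacing_z, new_spacing_z=5.0):
    """Linearly resample axis 0 to a new slice spacing (in mm),
    leaving the in-plane sampling untouched."""
    volume = np.asarray(volume, dtype=np.float64)
    n = volume.shape[0]
    extent = (n - 1) * spacing_z                      # physical length covered
    new_n = int(np.floor(extent / new_spacing_z)) + 1
    # Positions of the new slices, expressed in old slice-index units.
    pos = np.arange(new_n) * new_spacing_z / spacing_z
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    w = (pos - lo).reshape(-1, *([1] * (volume.ndim - 1)))
    return (1.0 - w) * volume[lo] + w * volume[hi]
```

For example, a volume with 11 slices at 2 mm spacing is reduced to 5 slices at 5 mm spacing.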

Comparison with state-of-the-art methods
It is always difficult to directly compare segmentation performance with that of state-of-the-art methods (Heimann et al., 2009; Shimizu et al., 2010; Chen et al., 2012; Linguraru et al., 2012; Bagci et al., 2012; Okada et al., 2013) because of different evaluation datasets, different qualities of manual segmentations, and differences in the evaluation metrics used. Here, Table 5 lists for comparison the results of three state-of-the-art methods (Chu et al., 2013; Wolz et al., 2013; Wang et al., 2014) that were evaluated on the same dataset (Wolz et al., 2013) or on a subset of our dataset (Chu et al., 2013; Wang et al., 2014), together with the results of four other methods that used different datasets. The results of L-DDLS were obtained in a leave-one-out procedure, with cross correlation as the similarity measure for local atlas selection. L-DDLS 5 and L-DDLS 20 denote L-DDLS with 5 and 20 selected atlases, respectively. The computational time is the runtime for segmenting one target image without parallelization (single core). Table 5 shows that our proposed method achieves competitive performance with these state-of-the-art methods. In addition, the proposed L-DDLS method can be implemented very efficiently, as shown in Table 5, which can be attractive in clinical practice.

Computational time
The runtimes of our proposed G-DDLS and L-DDLS methods increase approximately linearly with the number of atlases selected for training. All experiments were carried out on a machine with eight Intel Xeon cores clocked at 3 GHz and 32 GB RAM. It takes around half an hour to segment the four organs of an abdominal scan when 5 atlases are selected for training dictionaries. However, if the number of selected atlases increases to 20, the runtime increases to around 2.5 hours. For G-DDLS, the number of selected atlases has a strong effect on the segmentation accuracy: the Dice overlap values increase significantly from 5 to 20 selected atlases, as shown in Figure 5. L-DDLS does not have this problem, as its segmentation accuracy reaches the maximum much earlier than that of G-DDLS. Therefore, G-DDLS takes much more time to achieve its best segmentation results, as it needs more atlases than L-DDLS. Using F-DDLS, segmentations can be performed quite efficiently. It takes approximately 15 minutes per scan using F-DDLS, since the dictionaries and classifiers have been trained offline and only the sparse coding step is needed in the segmentation stage, which speeds up the process significantly.

Discussion and Conclusion
In this paper, we developed discriminative dictionary learning techniques for multi-organ abdominal segmentation in CT images. A large dataset of 150 abdominal CT images was used for evaluation. Experimental results show that the proposed DDLS method achieves significantly more accurate results than the traditional multi-atlas segmentation method based on MV label fusion (Heckemann et al., 2006) and the nonlocal patch based segmentation method (Coupé et al., 2011). It provides a segmentation accuracy comparable to those of state-of-the-art methods (Okada et al., 2012; Linguraru et al., 2012; Bagci et al., 2012; Chu et al., 2013; Wolz et al., 2013; Wang et al., 2014). In addition, our proposed DDLS method achieves promising segmentation results using only global affine registration. Since only affine registration is required, our method can be implemented efficiently, demonstrating its potential for real-time clinical applications and for challenging datasets where accurate registration is difficult to achieve.
Different atlas selection strategies were implemented and compared within the DDLS method. Among them, the F-DDLS method employs an average model, as in approaches based on statistical shape models. In statistical shape models, ideal mean shapes of different organs are constructed from a specific dataset. In F-DDLS, fixed dictionaries and classifiers are learnt from a given subset (i.e. 50 randomly selected atlases). The advantage of F-DDLS is that the segmentation can be performed quite efficiently, since the average model (the fixed dictionaries and classifiers in F-DDLS) has been learnt offline. However, approaches based on an average model from a specific dataset may be challenged by diverse testing datasets with high inter-subject variability. In comparison with F-DDLS, the G-DDLS and L-DDLS methods automatically select suitable atlases for an unlabeled target image and then learn target-specific priors for segmentation. This can result in a significant improvement in the segmentation performance, especially for the segmentation of the pancreas, as shown in Table 3.
The L-DDLS method takes full advantage of the whole dataset and adapts to each location in the target image individually. The atlases most suitable for the location under consideration are automatically selected. Atlases that have different local anatomical patterns at the current location are not taken into account there, but remain available for other locations in the target image. In comparison with G-DDLS, L-DDLS has three advantages: (1) Promising segmentation results can be achieved with fewer atlases by using the local atlas selection strategy rather than conventional global atlas selection. For example, the L-DDLS method can segment the liver, kidneys, pancreas, and spleen with Dice overlap values of 94.8%, 92.9%, 66.6%, and 92.4% respectively by selecting 5 atlases locally. Although only 5 atlases are selected, the most similar atlases have already been found at each location by L-DDLS, which can then provide reliable prior information for label estimation. In comparison, the Dice overlap values of G-DDLS using 20 atlases (as shown in Table 3) are still lower than those of L-DDLS with 5 atlases. (2) Since fewer training atlases are needed for labeling a target image in L-DDLS, the computational burden can also be significantly reduced. The runtime of DDLS is around 30 minutes when 5 atlases are selected, while it increases to approximately 2.5 hours when 20 atlases are used. (3) L-DDLS can handle the high inter-subject variability of small organs like the pancreas much better than G-DDLS. This is because G-DDLS selects atlases according to the global similarity between the atlases and the target image. This global similarity, however, is dominated by the similarity in large structures like the liver, weakening the influence of the similarity in small organs like the pancreas. By treating the similarity at each location equally, L-DDLS achieves an improvement of 3% in terms of Dice overlap over G-DDLS in the segmentation of the pancreas, which is the most challenging structure.
The number of selected atlases K is an important parameter in multi-atlas segmentation methods. In our work, K was predefined globally, which means that the same number of atlases is selected at each location in the target image. However, Aljabar et al. (2009) observed that the K required for the highest segmentation accuracy varies between structures. This could also be the case for different locations. A further improvement may be obtained by not only selecting similar atlases locally but also choosing the best number of atlases adaptively at each location. This can be done by modeling the segmentation error as a function of K, as proposed by Awate and Whitaker (2014). After the function is fitted, the best number of atlases can be estimated at each location. However, estimating the best K at each location may increase the computational complexity of our proposed method.
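The idea of fitting an error curve and reading off the best K can be illustrated as follows. The quadratic fit below is only a simple stand-in chosen for this sketch and is not the model of Awate and Whitaker (2014); the function name `best_k_from_errors` and the candidate range are likewise hypothetical.

```python
import numpy as np


def best_k_from_errors(ks, errors, k_min=1, k_max=None):
    """Fit error(K) with a quadratic and return the integer K in
    [k_min, k_max] that minimizes the fitted curve."""
    ks = np.asarray(ks, dtype=np.float64)
    errors = np.asarray(errors, dtype=np.float64)
    if k_max is None:
        k_max = int(ks.max())
    coeffs = np.polyfit(ks, errors, deg=2)   # stand-in error model
    candidates = np.arange(k_min, k_max + 1)
    fitted = np.polyval(coeffs, candidates)
    return int(candidates[np.argmin(fitted)])
```

Here `errors` would be segmentation errors (for example, 1 minus Dice) measured for a few values of K at one location or for one structure; running the fit per location is what would add to the computational cost.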
In terms of computational time, patch based segmentation methods (Coupé et al., 2011) gain some computational efficiency by avoiding the need for non-rigid registration. However, they still suffer from a high computational burden in the label fusion stage (Eskildsen et al., 2011; Wang et al., 2014), which becomes a significant problem for large abdominal organs in high resolution images. This is why a multi-resolution framework was adopted in our work to speed up the segmentation process. A very recent patch-based segmentation method using the patch match algorithm (Ta et al., 2014) allows speed-ups of around 200- to 1000-fold in the label fusion stage without losing segmentation accuracy, providing a potential new way to gain further computational efficiency in our work. Overall, a segmentation method that provides high accuracy and can be implemented efficiently is preferable.
Although the proposed method works well on the segmentation of abdominal organs in CT scans, it has several drawbacks. First, the proposed method still requires alignment between the atlas images and the target image with a global affine registration. This process can still be problematic in images with a high degree of anatomical variance (Wang et al., 2013). One direction for future work will be to investigate an extension of our proposed method that does not require registration. Second, atlas selection is an essential step in the proposed method for achieving good segmentation performance. A subset of similar atlases is selected globally or locally from all the training atlases for the segmentation of each target image. However, the remaining "dissimilar" atlases could potentially provide valuable information to aid the segmentation. For example, similar patches could still be present in dissimilar atlases and provide additional information for labeling the target patches. In future work, the potential to perform segmentation without atlas selection will be investigated in order to take full advantage of the whole atlas dataset. Third, the proposed method uses local patches for segmentation, which provide only local intensity patterns and neglect global anatomical patterns. Global anatomical information, however, can be helpful for segmentation. For instance, inter-organ relations have been shown to be helpful for segmentation (Okada et al., 2013; Cerrolaza et al., 2014; Wang et al., 2014), and they could also be integrated into the proposed method for a further improvement.

Figure 1: Demonstration of the voxel-wise local atlas selection strategy. At different locations in the target image $I_t$, different subsets of atlases are selected. Atlases $A_2$, $A_5$, $A_4$, $A_{23}$ and $A_{66}$ are selected at location $(x_i, y_i, z_i)$ since these atlases have local intensity patterns similar to that of the target image at this location. When the target voxel $v_t$ is at location $(x_j, y_j, z_j)$, atlases $A_1$, $A_4$, $A_5$, $A_{10}$ and $A_{78}$ are selected.

Figure 2: Example demonstrating the local atlas selection for different local mask sizes. The color maps show the most similar atlas selected at different locations in the target image. Different colors mean that different atlases are selected at different locations.

Figure 3: The multiresolution segmentation process. DDLS is performed to generate a probabilistic atlas for each organ, which propagates across resolutions. The final segmentation is achieved by using the graph-cuts algorithm in the native space.

Figure 4: Comparison of different approaches. The global atlas selection strategy was utilized and 20 atlases were selected for the segmentation of each target image.
Figure A.7  shows exemplar segmentations for the four organ of a subject by using different

Figure 5: Comparison of G-DDLS and L-DDLS on the segmentation of four structures using different numbers of selected atlases. L-DDLS 5 means that the local atlas selection strategy is used in DDLS and 5 similar atlases are selected for labeling one target voxel. The mask size was set to 11 × 11 × 11 voxels in L-DDLS at all resolutions. The other parameters in G-DDLS and L-DDLS were set as shown in Table 1.

Figure 6: Influence of the mask size on the segmentation accuracy of L-DDLS. The results were obtained by selecting the 5 most similar atlases in a leave-one-out procedure. G-DDLS is an extreme case of L-DDLS in which the mask size is increased to the image size.
Algorithm 1 DDLS with Local Atlas Selection
Input: A target image $I_t$; a set of training atlases: images $A = \{A_1, A_2, \cdots, A_N\}$ and labels $S = \{S_1, S_2, \cdots, S_N\}$; parameters: $\beta_1$ and $\beta_2$.
Output: A label map $S_t$.
1: Affinely align atlases $A$ to the target space.
2: for each target voxel in $I_t$ do
3:

Table 3: Comparison of Dice overlaps (MEAN ± STD (%) (p value)) using different atlas selection strategies. The results of F-DDLS were obtained by randomly selecting 50 atlases for training and using the remaining 100 subjects for testing, which was repeated 10 times. The other results were obtained by selecting the 20 most similar atlases in a leave-one-out procedure. The mask size was set to 11 × 11 × 11 voxels in L-DDLS at all resolutions. The other parameters in F-DDLS, G-DDLS and L-DDLS were set as shown in Table 1. † indicates a statistically significant difference from the results of L-DDLS with p < 0.0001.

Table 4: Influence of different similarity measures on the segmentation accuracy of L-DDLS. All the results were obtained by selecting 20 similar atlases in a leave-one-out procedure. The mask size was set to 11 × 11 × 11 voxels in L-DDLS at all resolutions. The other parameters were set as shown in Table 1. It should be mentioned that the overall Dice of the kidneys is not an average of the Dice of the left kidney and the Dice of the right kidney. All the Dice values were calculated according to Equation 6.
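The remark that the overall kidney Dice is not the average of the left and right kidney values can be made concrete with a toy example. The sketch assumes that Equation 6 is the standard Dice overlap, 2|A ∩ B| / (|A| + |B|), which is not reproduced in this section; the label arrays are synthetic and purely illustrative.

```python
import numpy as np


def dice(seg, ref):
    """Standard Dice overlap between two binary masks."""
    seg = np.asarray(seg, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = seg.sum() + ref.sum()
    return 2.0 * np.logical_and(seg, ref).sum() / denom if denom else 1.0


# Synthetic 1-D "images": a large, perfectly segmented left kidney and a
# small right kidney of which only half is overlapped by the segmentation.
n = 200
left_ref = np.zeros(n, dtype=bool)
left_ref[0:100] = True
left_seg = left_ref.copy()
right_ref = np.zeros(n, dtype=bool)
right_ref[150:160] = True
right_seg = np.zeros(n, dtype=bool)
right_seg[155:165] = True

average_dice = 0.5 * (dice(left_seg, left_ref) + dice(right_seg, right_ref))
pooled_dice = dice(left_seg | right_seg, left_ref | right_ref)
```

In this example the average of the two per-kidney values is 0.75, whereas the Dice computed on the pooled kidney label is 210/220 ≈ 0.955, because the pooled value weights each structure by its size.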

Table 5: Comparison with state-of-the-art methods (Top group: the proposed L-DDLS method with different numbers of selected atlases; Middle group: methods using the same dataset; Bottom group: methods using other datasets).