Multi-Atlas Segmentation Using Partially Annotated Data: Methods and Annotation Strategies

Multi-atlas segmentation is a widely used tool in medical image analysis, providing robust and accurate results by learning from annotated atlas datasets. However, the availability of fully annotated atlas images for training is limited due to the time required for the labelling task. Segmentation methods requiring only a proportion of each atlas image to be labelled could therefore reduce the workload on expert raters tasked with annotating atlas images. To address this issue, we first re-examine the labelling problem common in many existing approaches and formulate its solution in terms of a Markov Random Field energy minimisation problem on a graph connecting atlases and the target image. This provides a unifying framework for multi-atlas segmentation. We then show how modifications in the graph configuration of the proposed framework enable the use of partially annotated atlas images and investigate different partial annotation strategies. The proposed method was evaluated on two Magnetic Resonance Imaging (MRI) datasets for hippocampal and cardiac segmentation. Experiments were performed aimed at (1) recreating existing segmentation techniques with the proposed framework and (2) demonstrating the potential of employing sparsely annotated atlas data for multi-atlas segmentation.


INTRODUCTION
In recent years, major efforts have been undertaken towards building large medical image databases such as ADNI [1]. Segmenting anatomical structures in these images is often necessary to better understand physiological and pathological processes through quantitative analysis. As the wealth of data increases, manually annotating the images becomes prohibitive, especially for large 3D or 4D image datasets. Automated segmentation approaches may face challenges in large databases due to large variability in shape and appearance of the structures of interest, the presence of pathologies, or different imaging protocols used to acquire the images. In particular, it becomes increasingly desirable to develop robust and accurate segmentation techniques that rely on minimal manual input or weak supervision.
Multi-atlas segmentation [2], [3], [4] has proven to be a successful and robust tool and is widely used in the medical imaging community [5]. The approach generally relies on label propagation from multiple atlases (i.e. fully annotated training images) to a target image. Using multiple atlases offers the important advantage of capturing anatomical variability. Ideally, the atlases should match the population to be segmented [6]. However, suitable atlases are not always available for large image databases, especially if the images in the database exhibit large variability, e.g. due to the presence of disease or aging processes. This motivates the use of training data obtained with different annotation strategies, where atlas images are only partially annotated, drastically reducing the labelling effort per image and therefore allowing expert raters to (partially) annotate more training images in the same time. To employ partially annotated atlas data while building on the success of multi-atlas segmentation, we propose a generalisation of the labelling problem in existing multi-atlas segmentation methods. In the following paragraphs, we review relevant work in the field before identifying the main contributions of this paper.
Many multi-atlas segmentation techniques use non-linear registration to warp segmentations from multiple suitable atlases to a target image [2], [3], [4], [7], [8], [9]. The target segmentation can be formed by fusion of the propagated labels, for example by applying a majority vote rule [2], [8] or another combination strategy such as a weighted average based on global or local similarity measures between the target and atlas images [7], [10]. In [9], a probabilistic framework was presented where the above-mentioned vote rules are expressed with a generative label fusion model. This was extended in [10] to incorporate non-local label fusion and registration uncertainty, and in [11] to allow the use of atlases annotated with different labelling protocols. Other combination strategies include STAPLE [12], where label fusion weights are estimated with an expectation-maximisation algorithm, and Joint Label Fusion [13], where correlations among atlases are taken into account. To account for high local anatomical variability between images, and to relax the requirement for accurate registration, patch-based segmentation [14], [15] has been introduced. Using this approach, the label fusion step employs a non-local weighted average of voxel labels in a small neighbourhood of the atlas images, with weights based on the similarities of patches centred on the compared voxels.

Our contribution
In this paper, we propose methods and annotation strategies which enable the use of partially annotated data for multi-atlas segmentation, with the main goal of reducing the required manual labelling effort. As a first contribution, we propose a unifying framework for multi-atlas segmentation using a novel graphical representation of the labelling problem. In Sec. 2 we demonstrate how label fusion, spatial regularisation, and data models can be expressed simultaneously using this representation. To optimise the arising MRF energy function, we provide an efficient optimisation scheme based on continuous max-flow [33], [34].
We then show in Sec. 3 how the proposed framework can be used to go beyond the abilities of existing multi-atlas segmentation techniques: the proposed flexible graph structure allows a relaxation of the annotation requirements in atlas images. This means that our framework naturally allows the use of atlases that were only partially annotated, resulting in a reduced manual labelling effort for expert raters.
In Sec. 3 we examine different partial annotation strategies and investigate modifications in the graph configuration to optimally exploit partially annotated atlas data in the segmentation process. Experiments on hippocampal (Sec. 4.1 and 4.2) and cardiac segmentation (Sec. 4.3) highlight the performance of the proposed framework and shed light on some of the possibilities it offers for employing partial annotations such as missing slices or scribbles. A preliminary version of this work was presented in [37].

UNIFIED FRAMEWORK FOR MULTI-ATLAS SEGMENTATION
In this section, we first revisit the labelling problem in existing multi-atlas segmentation methods [2], [7], [8], [16], [17] and reformulate it as an MRF energy optimisation problem defined on a graph comprising multiple images (i.e. the target and atlases). In particular, we show how the proposed graphical approach can incorporate label fusion (Sec. 2.1), spatial regularisation (Sec. 2.2), as well as a data term and missing atlas labels (Sec. 2.3). Section 2.4 summarises the components of the proposed framework. To solve the optimisation problem, in Sec. 2.5 we propose an extension of CMF [34] which can efficiently minimise energy functions on graphs connecting multiple images.

Label Fusion
For multi-atlas segmentation (MAS) [2], [7] using $R$ atlas images, all atlas images $j \in \{1, \dots, R\}$ are registered to the target image $i$. For convenience we assume $i = R + 1$. The label maps $l_j$ associated with the atlas images $j$ are then propagated to the target. Figure 1a shows an example atlas set with corresponding label maps, and an unlabelled target image. Each voxel $x \in \Omega$ in the target image $i$ is labelled using some combination strategy, e.g. a weighted vote over the atlas labels $l_j(x)$:

$$\hat{l}_i(x) = \arg\max_{l} \sum_{j=1}^{R} \beta_{ij}(x)\, \delta\big(l_j(x) = l\big) \qquad (1)$$

Here $\delta(\cdot)$ is an indicator function. The weights $\beta_{ij}(x)$ can be uniform (which is equivalent to the majority vote rule as used in [2], [3], [8]) or based on global or local similarity measures between images $i$ and $j$ as in [7], [9], [10].

[Fig. 1, caption fragment: This can also be interpreted as an MRF optimisation problem, where atlas voxels are connected to the terminal nodes with infinitely weighted edges and inter-image edges $\beta_{ij}(x)$ encode label fusion.]
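As an illustration of the vote rule described above, the following is a minimal sketch, assuming that the atlas label maps have already been registered to the target and stacked into one array. The function name `weighted_vote_fusion` and the array layout are hypothetical choices made for this example, not part of the original method description.

```python
import numpy as np

def weighted_vote_fusion(atlas_labels, weights, num_labels):
    """Fuse propagated atlas label maps with a per-voxel weighted vote.

    atlas_labels: int array of shape (R, *spatial) with labels in [0, num_labels).
    weights:      float array of the same shape holding the weights beta_ij(x).
    Returns an int array of shape (*spatial) with the winning label per voxel.
    """
    atlas_labels = np.asarray(atlas_labels)
    weights = np.asarray(weights, dtype=float)
    votes = np.zeros((num_labels,) + atlas_labels.shape[1:])
    for label in range(num_labels):
        # The indicator delta(l_j(x) = label) selects the atlases voting for this label.
        votes[label] = np.sum(weights * (atlas_labels == label), axis=0)
    # Ties are resolved in favour of the lowest label index (an arbitrary choice).
    return np.argmax(votes, axis=0)

# Toy example: three atlases, three voxels, binary labels.
labels = np.array([[0, 1, 1],
                   [0, 1, 0],
                   [1, 0, 0]])
majority = weighted_vote_fusion(labels, np.ones(labels.shape), num_labels=2)
```

With uniform weights the function reduces to a majority vote; passing similarity-based weights instead lets atlases that resemble the target locally dominate the decision.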
As an alternative perspective, we can use a graphical representation to model the relationship of shared information between the atlases and the target using a Markov Random Field [32]. According to the above labelling scenario, this graph connects each voxel $x$ in the target image $i$ to the corresponding voxels in the atlas images $j$ with an edge weighted by $\beta_{ij}(x)$. The manual annotations in the atlases can be encoded by the unary potential function

$$E_{\text{data}}(l_j(x)) = \begin{cases} 0 & \text{if } l_j(x) = G_j(x) \\ \infty & \text{otherwise} \end{cases} \qquad (2)$$

where $G_j(x)$ is the ground truth label given by the expert rater, assigning infinite cost to the hypothetical scenario of assigning a different label to the atlas voxel. Figure 1b visualises this configuration, and these terminal graph connections are discussed in more detail in Sec. 2.3. To find a labelling on the graph, we can formulate a pairwise potential function that penalises conflicting labels in voxels connected by a high weight $\beta_{ij}(x)$, e.g.

$$E_{ij}(l_i(x), l_j(x)) = \beta_{ij}(x)\, \delta\big(l_i(x) \neq l_j(x)\big)$$
This assigns a high penalty when the target and atlas labels differ and the atlas $j$ is considered similar to the target $i$, as defined by the similarity measure $\beta_{ij}(x)$. In the case of a majority vote, the weights are uniform, e.g. $\beta_{ij}(x) = 1$.
The cost for labelling an individual voxel $x$ in image $i$ can then be calculated as follows:

$$E_{\text{propagation}}(l_i(x)) = \sum_{j=1}^{R} \beta_{ij}(x)\, \delta\big(l_i(x) \neq l_j(x)\big)$$

As we assume the graph satisfies Markov properties, voxels in the target image are conditionally independent given the atlas images, since spatially neighbouring voxels in the target image are not connected in the graph (in contrast to the setting for regularisation in many vision problems [32]). Since the atlas labels are fixed and assumed to be independent of each other (a common assumption in MAS), it follows that the target voxels are statistically independent, and the optimal label can be found by minimising $E_{\text{propagation}}(l_i(x))$ independently for all voxels:

$$\hat{l}_i(x) = \arg\min_{l_i(x)} E_{\text{propagation}}(l_i(x))$$

This leads to the same result as the vote rule in Eq. 1, demonstrating that multi-atlas segmentation can be expressed in terms of a graph optimisation problem.

[Fig. 2: Graph configuration representing patch-based segmentation. $\beta_{ij}(x, y)$ is determined by a patch similarity measure between a patch centred around voxel $x$ in image $i$ and voxel $y$ in image $j$. Not all connections are drawn for better visibility and to reflect the fact that in practice, dissimilar patches are omitted in the label fusion [14].]

It is important to note that patch-based segmentation (PBS) [14], [15] can also be expressed in this framework. In this case we use a slightly different graph structure, as the label fusion step in PBS takes into account multiple voxels in a neighbourhood of $x$ in each atlas instead of just one voxel at location $x$. By denoting the patch-based label fusion weights as $\beta_{ij}(x, y)$, $y \in N_x$, to reflect the non-local nature of these methods, a labelling can be found for this scenario as well. Here, multiple patches in the atlases are used at locations $y$ in a neighbourhood $N_x$ around location $x$. This scenario is visualised in Fig. 2. While the proposed formulation holds for these non-local techniques, the graph structure becomes more complex. In the scope of this paper, we limit ourselves to graphs on regular grids where voxels in different images are only connected if they are at corresponding locations, as this makes it possible to use the efficient optimisation scheme proposed in Sec. 2.5.

This novel perspective on label fusion for multi-atlas segmentation has two advantages: (1) it allows easy integration of additional components and therefore provides a unifying reformulation for existing multi-atlas segmentation methods, and (2) the graphical approach extends to segmentation using partially annotated atlases (Sec. 3).
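The equivalence between the per-voxel energy minimisation and the vote rule can be verified in one step. The derivation below assumes, for illustration, that the pairwise potential is a weighted label-disagreement penalty; other potentials are possible and would need their own argument. Here $l$ denotes a candidate label for the target voxel $x$ and $\delta(\cdot)$ is the indicator function.

```latex
\begin{align*}
E_{\text{propagation}}(l)
  &= \sum_{j=1}^{R} \beta_{ij}(x)\,\bigl(1 - \delta(l_j(x) = l)\bigr) \\
  &= \underbrace{\sum_{j=1}^{R} \beta_{ij}(x)}_{\text{independent of } l}
     \;-\; \sum_{j=1}^{R} \beta_{ij}(x)\,\delta(l_j(x) = l), \\[4pt]
\arg\min_{l}\, E_{\text{propagation}}(l)
  &= \arg\max_{l} \sum_{j=1}^{R} \beta_{ij}(x)\,\delta(l_j(x) = l).
\end{align*}
```

Since the first sum does not depend on the candidate label, minimising the propagation energy and maximising the weighted vote select the same label at every voxel.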

Spatial Regularisation
In the previous section, we proposed assigning pairwise potentials between target and atlas voxels for label propagation. In addition, we can incorporate spatial regularisation with pairwise potentials between adjacent voxels within an image. This simple modification of the graph structure is shown in Fig. 3a. Regularisation enforces spatial consistency by penalising different label assignments in adjacent voxels. If the regularisation weights are based on intensity gradients, consistent labels can be enforced in adjacent voxels that are similar in appearance, while allowing different labels across intensity boundaries. A graph configuration as shown in Fig. 3a models the scenario where regularisation is used to refine label fusion results, as for example in [16], [38], [39].
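To make the idea of gradient-based regularisation weights concrete, the following minimal sketch computes a per-voxel weight that is large in homogeneous regions and small across intensity boundaries. The Gaussian fall-off in the gradient magnitude, the function name `edge_aware_weights` and the default parameter values are assumptions chosen for illustration; any monotonically decreasing function of the gradient magnitude would serve the same purpose.

```python
import numpy as np

def edge_aware_weights(image, a=1.0, sigma=10.0, spacing=None):
    """Per-voxel regularisation weight that drops across intensity edges.

    ASSUMPTION: weight = a * exp(-|grad I|^2 / (2 * sigma^2)).
    `spacing` gives the voxel size along each axis (defaults to 1).
    """
    image = np.asarray(image, dtype=float)
    if spacing is None:
        spacing = (1.0,) * image.ndim
    grads = np.gradient(image, *spacing)
    if image.ndim == 1:
        # np.gradient returns a bare array (not a sequence) for 1D input.
        grads = [grads]
    grad_mag_sq = sum(g ** 2 for g in grads)
    return a * np.exp(-grad_mag_sq / (2.0 * sigma ** 2))

# Toy example: a vertical step edge between columns 2 and 3.
step = np.zeros((6, 6))
step[:, 3:] = 100.0
alpha = edge_aware_weights(step, a=2.0, sigma=10.0)
```

In the toy image, the weight equals `a` far from the edge and is close to zero in the two columns next to it, so a label change is cheap exactly where the intensity changes.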

Data Term and Missing Labels
In Eq. 2 we showed how manual annotations can be encoded as unary potentials, which are often referred to as a data term [28], [32]. The ground truth nature of these annotations is reflected in the graph structure by infinitely weighted terminal connections for each atlas voxel according to the manual label given. As can be seen in Fig. 1b or 3a, the voxels in the target image are not connected to the terminals as they are assumed to be unlabelled and no prior knowledge is available for them. It is important to note that a data term can be specified for the target image as well using prior probabilities, intensity models of the data, or a combination of both. This is a common technique when using MRFs in vision problems [16], [17], [32], [40] and can be incorporated by extending the graph structure as visualised in Fig. 3b.
[Fig. 4, caption fragment: ... for label propagation, data term and spatial regularisation, and corresponding inter-image flows $r_{ij}(x)$, source and sink flows $p^{s,t}_i(x)$ and spatial flows $p_i(x)$, respectively, at location $x$ in image $i$.]
Furthermore, missing labels can be easily accounted for by removing terminal connections (i.e. unary potentials) for voxels where annotations are not available, as shown in Fig. 3c. The important implications of this property will be discussed in detail in Sec. 3 in conjunction with partially annotated atlas data.
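The handling of labelled and unlabelled atlas voxels in the data term can be sketched as follows. This is a minimal illustration, assuming that missing annotations are marked with a sentinel value; the sentinel `MISSING = -1` and the function name `terminal_costs` are hypothetical and chosen only for this example.

```python
import numpy as np

MISSING = -1  # hypothetical sentinel for voxels without a manual label

def terminal_costs(annotation, num_labels, missing=MISSING):
    """Unary costs of shape (num_labels, *spatial) for one atlas image.

    Labelled voxels: zero cost for the annotated label, infinite cost otherwise.
    Unlabelled voxels: zero cost for every label (no terminal connection).
    """
    annotation = np.asarray(annotation)
    costs = np.zeros((num_labels,) + annotation.shape)
    labelled = annotation != missing
    for label in range(num_labels):
        # Forbid any label that contradicts the expert annotation.
        costs[label][labelled & (annotation != label)] = np.inf
    return costs

# Toy example: background, foreground, and one unlabelled voxel.
costs = terminal_costs(np.array([0, 1, MISSING]), num_labels=2)
```

For the unlabelled voxel both labels carry zero cost, so its final label is determined entirely by the pairwise terms connecting it to other voxels.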

Summary
We propose to interpret both the target image and the set of atlas images as a single graph structure (in which each voxel is a node) satisfying Markov properties. On this graph we can use unary potentials to define the data term $E_{\text{data}}$ to encode manual annotations or other prior knowledge, or to reflect missing labels. We then showed how pairwise potentials can be used to encode label fusion through inter-image connections and to build a propagation energy term $E_{\text{propagation}}$. Another pairwise potential term $E_{\text{regularisation}}$ encodes spatial regularisation through intra-image edges. The propagation, data, and regularisation terms can be combined to a comprehensive labelling energy function defined for the whole graph:

$$E(l) = E_{\text{data}}(l) + E_{\text{propagation}}(l) + E_{\text{regularisation}}(l)$$

As mentioned in the introduction, many existing multi-atlas segmentation methods (e.g. [16], [18]) use an MRF formulation to improve label propagation results with the benefits of regularisation and intensity data models. However, these approaches use probabilistic label propagation results as prior probabilities (i.e. unary potentials) in a subsequent refinement step, therefore adding the MRF optimisation as a separate post-processing step. The above comprehensive formulation treats label propagation as part of the optimisation process, and unifies all the components within a single framework. Furthermore, as we show in Sec. 3, the flexibility of the proposed graph structure lends itself naturally to exploiting partially annotated data.

Optimisation using Continuous Max-Flow (CMF)
It has been shown that MRF energy functions consisting of unary and pairwise terms can be minimised using min-cut/max-flow approaches if the pairwise terms are metric or semi-metric [21], yielding globally optimal results for binary labelling problems and approximately globally optimal results for multiple labels [21]. Recently, [33] proposed a continuous max-flow (CMF) algorithm in the 2D or 3D domain (i.e. a single image) which avoids metrication bias and is inherently parallelisable, in contrast to many discrete graph-based methods [33]. As the proposed energy function needs to be optimised for a large graph consisting of voxels in all images and their interactions, this approach was adopted and extended for graphs between multiple images. Analogous to discrete max-flow approaches, the energy function on the graph can be optimised by maximising a source flow $p_s$ through the network, subject to flow conservation and capacity constraints on the edges. In the original CMF algorithm [33], spatial flows $p = [p_x, p_y, p_z]^T$ exist between adjacent voxels in the image domain $\Omega$ (for regularisation) and source and sink flows $p_{s,t}$ between voxels and terminal nodes. The optimisation is performed with a variational approach by introducing a Lagrange multiplier $u(x)$ to incorporate the constraints [33]. It has been shown that the resulting $u(x)$ corresponds to the globally optimal labelling in the binary case [33].

Binary segmentation using CMF
In the following, we propose a generalisation of CMF from a single image to an arbitrary configuration of interconnected images to account for any user-defined choice of inter-image relationships $\beta_{ij}(x)$. Figure 4 shows the capacity constraints and introduces the notation for inter-image flows $r_{ij}(x)$ (for label propagation), spatial flows $p_i(x)$ (for regularisation) and terminal flows $p^{s,t}_i(x)$ (for the data term). The notation is similar to [36], where inter-image constraints were used in a different context. To satisfy flow conservation, the sum of all in- and outgoing flows $\rho_i(x)$ at each node must be zero, where the inter-image flows satisfy $r_{ij}(x) = -r_{ji}(x)$ and $n$ is the number of images in the graph.
We propose to adapt the definitions of the discrete gradient and divergence operators to account for anisotropic voxel dimensions $[s_x, s_y, s_z]$, which are often found in medical images. This leads to a Lagrangian function which can be maximised iteratively by optimising each variable $u$, $p_s$, $p_t$, $p$, $r$ separately [33], [36]. The spatial flows $p_i(x)$ are updated using the gradient projection approach proposed in [41]. The regularisation constraints $\alpha(x)$ determine the smoothness of the result. To enforce greater smoothness in homogeneous image regions than along intensity boundaries, $\alpha(x)$ can be defined based on the image gradient $\nabla I(x)$, with parameters $a$ and $\sigma_1$. This measure is the continuous equivalent of the regularisation term used in [16], one of the pioneering works combining regularisation and multi-atlas segmentation. The terminal flows $p^s_i(x)$, $p^t_i(x)$ can also be found by fixing all other variables [33].
The novel component compared to [33], [36] is the use of inter-image flows $r_{ij}(x)$ between any pair of images $i$, $j$ [37]. At iteration $k$, the flows $r_{ij}(x)$ are optimised while fixing all other variables, after which the multiplier $u_i(x)$, which serves as the labelling function, is updated. After convergence, a segmentation can be found by discretising the resulting solution for $u$, e.g. by thresholding at 50%.
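To illustrate how anisotropic voxel dimensions can enter the discrete operators, the sketch below implements a forward-difference gradient and the matching backward-difference divergence, each scaled by the voxel spacing along the respective axis. This is a common discretisation in continuous max-flow implementations and is shown here as an assumption; the exact operator definitions used in this work may differ. The function names are hypothetical.

```python
import numpy as np

def _axis_slice(ndim, axis, sl):
    """Index tuple selecting `sl` along `axis` and everything elsewhere."""
    index = [slice(None)] * ndim
    index[axis] = sl
    return tuple(index)

def gradient(u, spacing):
    """Forward differences divided by voxel spacing; zero at the far boundary."""
    u = np.asarray(u, dtype=float)
    grad = np.zeros((u.ndim,) + u.shape)
    for axis, s in enumerate(spacing):
        lo = _axis_slice(u.ndim, axis, slice(None, -1))
        hi = _axis_slice(u.ndim, axis, slice(1, None))
        grad[axis][lo] = (u[hi] - u[lo]) / s
    return grad

def divergence(p, spacing):
    """Backward differences divided by voxel spacing.

    Constructed as the negative adjoint of `gradient`, i.e.
    sum(gradient(u) * p) == -sum(u * divergence(p)).
    """
    p = np.asarray(p, dtype=float)
    ndim = p.ndim - 1
    div = np.zeros(p.shape[1:])
    for axis, s in enumerate(spacing):
        q = p[axis].copy()
        # The last slice never contributes because the gradient is zero there.
        q[_axis_slice(ndim, axis, -1)] = 0.0
        d = q.copy()
        d[_axis_slice(ndim, axis, slice(1, None))] -= q[_axis_slice(ndim, axis, slice(None, -1))]
        div += d / s
    return div
```

Keeping the two operators adjoint to each other matters in practice, because the flow update and the multiplier update rely on the same discrete integration-by-parts identity. With a through-plane spacing much larger than the in-plane spacing, differences between slices are down-weighted accordingly.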

Multi-label segmentation using the Potts Model
CMF has been extended to multi-label segmentation problems in [34] using a Potts model approach. To optimise for multiple labels, the graph structure is duplicated for every label. The data term is encoded in the sink constraints of each "sub-graph" while the source connections remain unconstrained. The same changes can be applied to the graph in our framework, as shown in Fig. 5. The Lagrangian function formulated for the binary case (Eq. 14) can be augmented to reflect this graph configuration. In the augmented function, $u_{i,l}$ is the labelling function for label $l \in \{1, \dots, L\}$ in image $i$ and $\rho_{i,l}$ is the new flow conservation constraint.

PARTIAL ANNOTATION STRATEGIES
Manually annotating medical images is very time consuming, placing a major burden on clinical experts tasked with labelling large datasets. However, using the proposed unified framework for multi-atlas segmentation, it is possible to open up a new field of applications, namely segmentation using partially annotated atlas data. We showed in Sec. 2.3 how the proposed graphical representation can easily accommodate missing labels through missing terminal connections in the graph structure. Applying our framework to any of the existing approaches discussed throughout Sec. 2 would lead to a segmentation that is inferred from the available labels only, ignoring missing information. Additionally, spatial consistency in the atlas images can be exploited to employ unlabelled atlas data as well. As neighbouring voxels are expected to share the same label, particularly if the voxels exhibit similar intensity patterns, we propose to use spatial regularisation within the atlas images as a form of intra-image label propagation. This way, labels may be shared between similar regions with labelled and unlabelled voxels in the atlases and propagated to the target image. This modification in the graph structure leads to a configuration as shown in Fig. 6a. Another possible configuration combines this with an additional inter-atlas propagation scheme which allows atlases to share information as well (shown in Fig. 6b). This serves to facilitate the propagation, especially when manual labels are very scarce at some locations $x$.
With this framework, it becomes interesting to pursue strategies which aim to efficiently build partially annotated datasets which may then be used as training data for segmentation tasks. In the remainder of this section, we propose two partial annotation strategies, which are evaluated in Sec. 4.2 and 4.3.

Strategy A: Slicewise Annotation
Medical volumetric images are often manually annotated slice-by-slice. Therefore, reducing the proportion of annotated slices while retaining robust and accurate segmentation is an important goal. To simulate partially annotated atlases, only annotations from a proportion of evenly spaced 2D slices are used, and the remaining labels are set to be "missing". As an example, Fig. 7a shows a cross-section of a 3D image where every fifth slice is annotated. It is important to note that in the selected slices, the structures of interest are delineated in detail, i.e. all voxels in that slice are labelled.
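The simulation of slicewise partial annotations can be sketched as follows. This is a minimal illustration, assuming that missing labels are marked with a sentinel value and that slices are selected along a chosen axis starting from a given offset; the names `slicewise_annotation`, `MISSING`, `axis` and `offset` are hypothetical and introduced only for this example.

```python
import numpy as np

MISSING = -1  # hypothetical sentinel for voxels without a manual label

def slicewise_annotation(label_map, q, axis=2, offset=0, missing=MISSING):
    """Keep the labels of a proportion q of evenly spaced slices.

    label_map: integer label volume.
    q:         proportion of slices to keep, 0 < q <= 1.
    offset:    index of the first kept slice, expected in [0, 1/q).
    All other slices are filled with the `missing` sentinel.
    """
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    label_map = np.asarray(label_map, dtype=int)
    num_slices = label_map.shape[axis]
    # Evenly spaced (possibly fractional) positions, rounded down to slice indices.
    positions = np.floor(np.arange(offset, num_slices, 1.0 / q)).astype(int)
    keep = np.unique(positions)
    index = [slice(None)] * label_map.ndim
    index[axis] = keep
    partial = np.full_like(label_map, missing)
    partial[tuple(index)] = label_map[tuple(index)]
    return partial

# Toy example: keep every fifth slice of a 4 x 4 x 10 volume, starting at slice 1.
full = np.ones((4, 4, 10), dtype=int)
partial = slicewise_annotation(full, q=0.2, offset=1)
```

In the toy example, slices 1 and 6 retain their labels and the remaining eight slices are marked as missing, so exactly one fifth of the voxels stay annotated.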

Strategy B: Scribbles
Scribbles are often used to annotate images in the context of interactive segmentation [26], [28]. This strategy typically involves placing brush strokes (i.e. "scribbles") on parts of the image considered within the structure of interest, or within the background. As scribbles do not delineate the structure boundary, this only requires a very short user interaction and could potentially require less expertise. These properties make "scribbling" an attractive annotation strategy if it can be shown that their use leads to competitive segmentation results. Figure 7b shows an example image with scribbles for both the structure of interest (i.e. the hippocampus) and the background. We propose to annotate the training dataset by placing scribbles covering large areas (without delineating boundaries), as this can be done quickly and is expected to make the segmentation task easier than very sparse, small scribbles.

EXPERIMENTS AND RESULTS
In the previous sections, we proposed a unified multi-atlas segmentation framework which can naturally accommodate partially annotated atlas data. We showed how the proposed graphical representation can implement a number of existing techniques through changes in the graph configuration. In the following experiments, we first employ the proposed framework to perform hippocampal segmentation using three existing multi-atlas segmentation techniques (Sec. 4.1). We then investigate how the framework can be used, with further modifications of the graph structure, to employ partially annotated atlases for segmentation. This is done using both the slicewise partial annotation strategy (Sec. 4.2) and scribbles (Sec. 4.3). The experiments were carried out on two datasets: (1) brain MR images from the ADNI database for hippocampal segmentation (a binary segmentation problem) and (2) cardiac MR images for segmentation of the right and left ventricular cavities and the left ventricle myocardium (i.e. segmentation with multiple labels).

Evaluation of Proposed Framework for Multi-Atlas Segmentation (MAS)
To explore the proposed unifying framework, a number of different configurations were compared which correspond to existing segmentation techniques. To acquire a labelling on a target image, selected atlas images were aligned with the target image using non-rigid registration [42] and a graph was constructed using each of the chosen configurations. The optimisation proposed in Sec. 2.5 was performed to achieve a segmentation result.
The most elementary configuration we studied was multi-atlas segmentation using the majority vote label fusion step (MAS-MV) [2], [3], [8]. For this, we assumed a graph structure as shown in Fig. 1b, and label propagation weights were uniformly set to $\beta_{ij}(x) = 1$. We compared MAS-MV to locally weighted label fusion (MAS-LW) as explored in [7], [9], [10]. To this end, we chose propagation weights $\beta_{ij}(x)$ based on a local similarity measure between the target and the atlases, evaluated over a patch $P(x)$ centred around voxel $x$ with patch size $|P|$ and scaled by a constant $K$. $K$ does not influence the label fusion result and was set to 1. By modifying the graph configuration to additionally incorporate intra-image edges in the target image, we added a regularisation term as described in Sec. 2.2 and shown in Fig. 3a. This configuration (further referred to as MASr-LW) implements simultaneous label fusion and regularisation similar to [16], [17]. It is important to note that these approaches incorporated an additional prior probability term based on intensity models of the data. However, in preliminary experiments, we achieved better results without this term.
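A locally weighted fusion weight of the kind used for MAS-LW can be sketched as follows. The specific kernel is an assumption made for illustration: a Gaussian of the mean squared intensity difference over a cubic patch, with a hypothetical width parameter `sigma`. The function name `local_similarity_weights` and the edge-replicating padding at the image border are likewise choices made for this example only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_similarity_weights(target, atlas, patch_radius=1, sigma=1.0, K=1.0):
    """Per-voxel fusion weight from a patch-wise intensity comparison.

    ASSUMPTION: weight = K * exp(-MSD / (2 * sigma^2)), where MSD is the mean
    squared difference between target and registered atlas over the patch.
    """
    target = np.asarray(target, dtype=float)
    atlas = np.asarray(atlas, dtype=float)
    diff_sq = (target - atlas) ** 2
    # Replicate border values so that every voxel has a full patch.
    padded = np.pad(diff_sq, patch_radius, mode="edge")
    window = (2 * patch_radius + 1,) * diff_sq.ndim
    patches = sliding_window_view(padded, window)
    patch_axes = tuple(range(diff_sq.ndim, 2 * diff_sq.ndim))
    msd = patches.mean(axis=patch_axes)
    return K * np.exp(-msd / (2.0 * sigma ** 2))

# Toy example: the atlas differs from the target in a single voxel.
target = np.arange(25, dtype=float).reshape(5, 5)
atlas = target.copy()
atlas[2, 2] += 10.0
weights = local_similarity_weights(target, atlas, patch_radius=1, sigma=5.0)
```

Voxels whose patch contains the mismatching voxel receive a weight below `K`, while voxels further away keep the full weight. Multiplying all weights by the same constant `K` leaves the fused labels unchanged, which is consistent with setting `K` to 1.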

Data and experiment setup
The proposed method was applied to 202 images from the ADNI database [1] for which reference segmentations of the hippocampus were made available through ADNI. In a pre-processing step, all images were affinely aligned to the MNI152 template space and intensity-normalised [43]. The data were split randomly into two equally sized sets, one for parameter training and one for evaluation. Optimal parameters were chosen for locally weighted label fusion (i.e. the propagation term) and for spatial regularisation. The tuning procedure and results are described in Sec. 4.4.1. The terminal connections encoding the data term simply consisted of infinite weights in voxels where manual annotations were available, and zero weight (i.e. missing link) in unlabelled voxels.

Results
For evaluation, a 10-fold cross-validation was performed within the evaluation set. For each fold, every test subject was segmented using the training data (i.e. the remaining folds), which served as the atlas population. This means that for each test subject, the $R$ most similar images from the remaining folds were used as atlases. Similarity was assessed with normalised mutual information. This was repeated for $R \in \{5, 10, 15, 20\}$ to measure the influence of the number of atlases on segmentation accuracy. Figure 8 shows the mean Dice coefficients of the pooled results. Segmentation accuracy generally increased with the number of atlases used. Majority vote (MAS-MV) was more robust than locally weighted fusion (MAS-LW) when using 5 or 10 atlases, but for larger atlas sets, MAS-LW achieved better results. With additional spatial regularisation, MASr-LW consistently outperformed both MAS-LW and MAS-MV.
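For reference, the Dice coefficient used to report segmentation accuracy measures the overlap between an automated segmentation and a reference segmentation. The sketch below is a minimal implementation for binary masks; the function name `dice_coefficient` and the convention of returning 1 for two empty masks are choices made for this example.

```python
import numpy as np

def dice_coefficient(segmentation, reference):
    """Dice overlap 2|A n B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(segmentation, dtype=bool)
    b = np.asarray(reference, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks are empty; treated as perfect agreement (a convention).
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total

# Toy example: two foreground voxels versus one, with one voxel in common.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```

For multi-label problems the same function can be applied per structure by comparing the masks `segmentation == label` and `reference == label`.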

Evaluation of Partial Annotation Strategy A: Slicewise (PA-SW)
This experiment aims to investigate the performance of our framework when using atlas data which were partially annotated through slice-by-slice annotation as proposed in Sec. 3.1.
As proposed in Sec. 3, we examined two graph configurations using different propagation schemes. In the first configuration (further referred to as PA-SW-CONF1), shown in Fig. 6a, the regularisation term included spatial regularisation in all images (i.e. target and atlases). The propagation term allowed label propagation from the atlases to the target. In the second configuration (further referred to as PA-SW-CONF2), label propagation between the atlases was additionally allowed by expanding the propagation term with inter-atlas connections as shown in Fig. 6b.
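The difference between the two propagation schemes can be made explicit by listing the inter-image connections of each configuration. The sketch below is illustrative only: it assumes zero-based image indices with the target placed last, and it assumes that the second configuration connects every pair of atlases. The function name `inter_image_edges` is hypothetical.

```python
def inter_image_edges(num_atlases, inter_atlas=False):
    """List the image pairs (i, j) linked by inter-image propagation edges.

    Atlases are indexed 0 .. num_atlases - 1 and the target is num_atlases.
    inter_atlas=False: atlas-to-target edges only (first configuration).
    inter_atlas=True:  additionally every atlas pair (ASSUMED for the second).
    """
    target = num_atlases
    edges = [(j, target) for j in range(num_atlases)]
    if inter_atlas:
        edges += [(j, k)
                  for j in range(num_atlases)
                  for k in range(j + 1, num_atlases)]
    return edges

# Toy example with three atlases.
conf1 = inter_image_edges(3)
conf2 = inter_image_edges(3, inter_atlas=True)
```

Under these assumptions, the number of inter-image connections per voxel location grows linearly with the number of atlases in the first configuration and quadratically in the second, which is one reason the second scheme is more demanding to optimise.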

Data and experiment setup
The same data were used as in the previous experiment (Sec. 4.1). To simulate partially annotated atlas data, manual labels of a proportion $q$ of evenly distributed slices in 20 atlas images were used for segmentation of the target image. To determine which slice positions were used, a random offset was determined for each atlas image. The partial annotations were then transformed to the target space using non-rigid registration [42]. The data term was built by establishing terminal connections at labelled voxels, while leaving unlabelled voxels unconnected, as explained in Sec. 2.3. The proportion of labelled atlas slices ranged from $q = 1$ (i.e. fully labelled) to $q = 0.1$ (i.e. every 10th slice) to investigate how strongly the atlas label maps could be sub-sampled while achieving robust segmentation results.
The parameters for the propagation term were chosen as in the previous experiment, and optimal choices for the regularisation coefficients $a$, $\sigma_1$ were obtained through parameter tuning as described in detail in Sec. 4.4.2.

Results
Results on the evaluation set were obtained using the same 10-fold cross-validation as described in Sec. 4.1.2. Figure 9 shows the mean Dice coefficients pooled from all folds for all tested proportions of labelled slices $q$. For $q = 1$ (i.e. the group on the left), all atlas slices were labelled. In this case, the proposed graph configurations PA-SW-CONF1 and PA-SW-CONF2 are equivalent to multi-atlas segmentation with regularisation refinement (MASr-LW). It can be seen that reducing the proportion of labelled atlas slices to $q = 0.4$ still yields comparable results for both tested configurations. When using fewer labelled slices, the performance decays rapidly for PA-SW-CONF1. For the second configuration, PA-SW-CONF2, accuracy decreases as well, but more steadily. However, it is important to remember that the reduced performance at e.g. $q = 0.1$ comes with only one tenth of the labelling effort.
Figure 10 shows example segmentation results for one subject at two different slice positions (top and bottom rows) for decreasing values of $q$ (left to right). For the slice in Fig. 10a, even using only every tenth atlas slice (i.e. $q = 0.1$ on the very right) did not influence the segmentation result. The slice in Fig. 10b was more challenging to segment due to the complex shape of the hippocampus. There, reducing the proportion of labelled atlas slices led to failure in detecting the folding of the structure. Incorporating constraints preventing holes in the segmentation could potentially help reduce this effect.

Evaluation of Partial Annotation Strategy B: Scribbles (PA-SC)
Finally, we examined the performance of our framework when using data annotated with scribbles as proposed in Sec. 3.2. In a first group of experiments, we investigated the scenario where the scribbles were available only on the atlas images. This partial annotation scenario will be referred to as PA-SC-A and was compared against MASr-LW with fully annotated atlases as a gold standard. We used the graph configuration CONF1 (as shown in Fig. 6a) since manual labels were available in roughly the same locations in all images (as opposed to the slicewise annotation strategy where entire slices remained unlabelled). Therefore, the complex propagation scheme CONF2 was not deemed necessary. In the second group of experiments, we examined scenarios which involve placing scribbles on a target image before automated segmentation, closely related to [28]. In the simplest configuration, scribbles were placed solely on the target image (PA-SC-T) [28], and no atlases were used. We then investigated whether, in addition, a "scribbled" atlas database would improve these results (PA-SC-A+T). Here, scribbles were available both in the atlas database and the target image. Lastly, we used fully annotated atlases in combination with a scribbled target image (PA-SC-AF+T) to obtain a target segmentation with the proposed framework.

Data and experiment setup
These experiments were performed for multi-label cardiac segmentation. The proposed method was tested on a short-axis cardiac MR (CMR) dataset of 28 subjects in the end-diastole (ED) phase. The CMR data were acquired on a 1.5 T Philips Achieva system (Best, The Netherlands) using a 32-channel coil and the balanced steady-state free precession (b-SSFP) sequence. Images in the left ventricular short-axis plane were acquired using the following parameters: 320 × 320 mm field-of-view; 3.0 ms repetition time (TR); 1.5 ms echo time (TE); 50 ms shot duration; 30 cardiac phases; 8 mm section thickness with a 2 mm gap. The reconstructed MR images are of dimension 288 × 288 × 12, with voxel spacing 1.23 × 1.23 × 10 mm. The LV cavity, LV myocardium, and the RV cavity were manually annotated by two experienced imaging scientists. Ten subjects were labelled by one observer, whereas the other 18 were labelled by the second observer. The annotation time for a complete image was approximately 30 min.
In addition, all images were partially annotated by a third observer. For this purpose, scribbles were placed on every slice for all structures (including the background). The task was set such that the observer should rapidly label large areas while not delineating the structure boundaries. This allowed the annotation time to be reduced to a mean time of 3.9 ± 0.6 min, i.e. a speed-up by a factor > 7 compared to a full annotation. All manual annotations were done using ITK-SNAP [44].
The propagation weights β_ij for label fusion were chosen as in [10], where the same cardiac dataset was used. There, an exponential kernel was proposed based on the sum of squared distances between two patches centred around corresponding voxels in the target and atlas image. The optimal kernel width was found to be 50 and the patch size 3 × 3 × 1 voxels. Suitable parameters for spatial regularisation a, σ_1 were found in a training step as described in Sec. 4.4.3.
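As an illustration of such a patch-based fusion weight, the sketch below computes an exponential kernel on the sum of squared differences between two patches. The form exp(-SSD / kernel_width) is an assumption made here for illustration; the exact normalisation used in [10] may differ. The function name and its arguments are hypothetical, and both images are assumed to be registered and of equal shape.

```python
import numpy as np


def patch_weight(target, atlas, centre, half_size=(1, 1, 0), kernel_width=50.0):
    """Fusion weight between the patches centred at `centre` in two images.

    Assumption: weight = exp(-SSD / kernel_width). A half size of
    (1, 1, 0) gives a 3 x 3 x 1 patch.
    """
    # Clamp the patch at the lower image border; NumPy clamps the upper one.
    window = tuple(
        slice(max(c - h, 0), c + h + 1) for c, h in zip(centre, half_size)
    )
    diff = target[window].astype(float) - atlas[window].astype(float)
    ssd = float(np.sum(diff ** 2))
    return float(np.exp(-ssd / kernel_width))
```

Identical patches receive a weight of 1, and the weight decays towards 0 as the patches become more dissimilar.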

Results
The proposed configurations were evaluated using each image not used during parameter training as a target image. In each case, the remaining images served as candidate atlases. For each target subject, the 15 most similar remaining images (measured with normalised mutual information) were used as atlases, as in [10].
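The atlas selection step can be sketched as follows. This is a minimal illustration that assumes the common definition NMI = (H(A) + H(B)) / H(A, B) estimated from a joint intensity histogram, and it assumes the images are already aligned and of equal shape. The bin count and function names are hypothetical choices, not taken from [10].

```python
import numpy as np


def normalised_mutual_information(a, b, bins=32):
    """NMI = (H(A) + H(B)) / H(A, B), estimated from a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1)
    p_b = p_ab.sum(axis=0)

    def entropy(p):
        p = p[p > 0]  # ignore empty bins
        return -np.sum(p * np.log(p))

    return (entropy(p_a) + entropy(p_b)) / entropy(p_ab.ravel())


def select_atlases(target, candidates, n_atlases=15):
    """Indices of the n_atlases candidates most similar to the target."""
    scores = [normalised_mutual_information(target, c) for c in candidates]
    order = np.argsort(scores)[::-1]  # highest NMI first
    return [int(i) for i in order[:n_atlases]]
```

Note that the sketch does not guard against constant images, for which the joint entropy is zero.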
Figure 12a shows mean Dice coefficients for the first group of experiments, where scribbles were placed on the atlases, and completely unlabelled target images were segmented using the proposed framework. It can be seen that using scribbled atlases (PA-SC-A) yielded results comparable to MASr-LW (where fully annotated atlases were used) for the right and left ventricle. For the myocardium, using scribbled atlases could not match the accuracy achieved when using fully annotated atlases. Figure 13 shows example segmentation results for one subject. It can be seen that the results of PA-SC-A and MASr-LW are similar. However, since there is no boundary delineation in the scribbled atlases, the resulting segmentation results for PA-SC-A were more intensity-driven, as can be seen for example in the myocardium in the mid-ventricular view.
The results for the second group of experiments are shown in Fig. 12b. Here, the target images to be segmented contained scribbles. In the simplest configuration PA-SC-T, a target segmentation is obtained from the scribbled target image only. Adding the scribbled atlases (PA-SC-A+T) yielded results very similar to PA-SC-T. However, placing scribbles in a target image to aid segmentation using fully annotated atlases (PA-SC-AF+T) yielded considerable improvements over both PA-SC-T (as seen in Fig. 12b) and MASr-LW (as seen in Fig. 12a). Visual results for these experiments are shown in Fig. 14 for the same subject as above. It can be seen that all three methods containing target scribbles were able to detect the myocardium in the apical slice, which was not possible using only atlas information (as seen in the middle row in Fig. 13). Furthermore, it can be seen that the segmentation obtained with fully annotated atlases and a scribbled target image (PA-SC-AF+T) is visually very similar to the ground truth segmentation, which is also reflected in the high Dice scores reported in Fig. 12b.
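For reference, the per-structure Dice coefficient used to report these results can be computed as in the following sketch. The function name and the integer label encoding are assumptions made for illustration.

```python
import numpy as np


def dice_per_label(segmentation, ground_truth, labels):
    """Dice coefficient 2|S ∩ G| / (|S| + |G|) for each label of interest."""
    scores = {}
    for label in labels:
        seg = segmentation == label
        ref = ground_truth == label
        denom = seg.sum() + ref.sum()
        if denom == 0:
            # Label absent from both images: Dice is undefined.
            scores[label] = float("nan")
        else:
            scores[label] = 2.0 * np.logical_and(seg, ref).sum() / denom
    return scores
```

Averaging these scores over all target subjects gives mean Dice values of the kind shown in Fig. 12.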

Parameter settings for the slicewise (SW) partial annotation strategy
For the experiments using slicewise partial annotations (Sec. 4.2), the spatial regularisation parameters a, σ_1 were trained on the same training dataset as above. The parameters were tuned separately for both examined graph configurations CONF1 and CONF2. Figure 17 shows optimal parameter choices for both PA-SW-CONF1 (Fig. 17a) and PA-SW-CONF2 (Fig. 17b) when using different proportions q of annotated atlas slices. The parameters with the highest mean Dice score for each configuration and each q were used during the evaluation.

Parameter settings for the scribbles (SC) partial annotation strategy
Here, parameter training is discussed for the final experiment (Sec. 4.3), where scribbles are used for cardiac segmentation. To find parameter settings for spatial regularisation, 10 random subjects were selected as target images. For each target subject, the 15 most similar images from the remaining population were used as atlases, as in [10]. The parameter space was explored on the selected target subjects and the best-performing set was used for the remaining population. The spatial regularisation parameters a, σ_1 were explored in a range of {0, 0.001, 0.01, 0.1, 1} and {1, 10, 50, 100, 300}, respectively. Figure 18 shows the training results for all experiment configurations, with optimal parameter choices marked with a white cross.
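The exhaustive search over this parameter grid can be sketched as below. The callable `evaluate` is a hypothetical placeholder that would run the segmentation on the training targets for a given pair (a, σ_1) and return the mean Dice score; it is not part of the original method description.

```python
from itertools import product

# Parameter ranges as stated in the text.
A_VALUES = (0, 0.001, 0.01, 0.1, 1)
SIGMA1_VALUES = (1, 10, 50, 100, 300)


def grid_search(evaluate, a_values=A_VALUES, sigma1_values=SIGMA1_VALUES):
    """Return the (a, sigma1) pair with the highest score and that score.

    `evaluate(a, sigma1)` is assumed to return a scalar accuracy measure,
    e.g. the mean Dice coefficient over the training targets.
    """
    best_params, best_score = None, float("-inf")
    for a, sigma1 in product(a_values, sigma1_values):
        score = evaluate(a, sigma1)
        if score > best_score:
            best_params, best_score = (a, sigma1), score
    return best_params, best_score
```

The selected pair would then be held fixed when segmenting the remaining population.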

DISCUSSION
In the experiments section, we first demonstrated how our framework can be used to express state-of-the-art techniques through modifications in the graphical representation of the labelling problem (Sec. 4.1). In particular, label fusion using the majority vote rule [2], [8] and the locally weighted vote rule [7], [9], [10] were compared against locally weighted label fusion with added regularisation for spatial coherence. As expected, using more atlases generally improved segmentation accuracy [2]. The parameters for locally weighted label fusion were only trained using 20 atlases, which may explain the drop in performance of MAS-LW compared to MAS-MV when using fewer (i.e. 5 or 10) atlases. More elaborate parameter training should remove this effect, as locally weighted fusion has been shown to outperform majority vote in similar settings [9]. Regularisation in the target image (MASr-LW) performed consistently better than MAS-LW. However, improvements became smaller for larger datasets where label fusion from many atlases caused inherent smoothness, yielding decreased benefit from additional spatial regularisation.
By re-interpreting label fusion (i.e. label propagation) as a pairwise component of a Markov Random Field energy function, it is possible to go beyond the scope of existing applications for multi-atlas segmentation. An important point is that the modular graph structure, where pairwise terms can be used for label propagation (between images) or spatial regularisation (within images) and where a unary term can be used to encode manual annotations, allows a relaxation of the annotation requirements for atlases. Therefore, the proposed framework can employ partially annotated images and represent unlabelled voxels simply by removing terminal links in the graph structure. Furthermore, the label propagation and regularisation schemes can be configured in different ways to facilitate information propagation in the graph. In Sec. 4.2, two configurations were used for hippocampal segmentation using partially labelled atlases where only a proportion of slices in each image were annotated. The results showed that with both configurations, it was possible to achieve robust results when using as little as 40% of the annotations. Using the configuration where labels were propagated between atlases as well as to the target image (PA-SW-CONF2), it was possible to reduce the number of labelled slices even further while still obtaining mean Dice coefficients of 0.83 ± 0.08 for q = 0.1. In that case, for example, only every tenth slice was labelled in the atlases. Depending on the application, this performance trade-off could be acceptable, and this would mean that partially annotated atlas databases could be built in 10% of the time required to create a fully labelled dataset.
When allowing propagation only between each atlas and the target image (PA-SW-CONF1), the performance decayed as the proportion of labelled atlas slices was reduced. This can be explained by the increased distance between labelled slices, making it more difficult for intra-image regularisation to interpolate labels. In contrast to CONF2, in CONF1 each voxel in the atlases is connected only to its spatial neighbours and the target image. Therefore, there may be large distances (on the graph) between unlabelled and labelled nodes. CONF2 addresses this problem by facilitating propagation between atlases as well, thereby reducing the distances of unlabelled nodes to nodes with strong data terms.
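The idea of encoding manual annotations in the unary term, and of representing unlabelled voxels by removing terminal links, can be illustrated with the following sketch. It assumes a cost-minimisation convention in which an annotated voxel receives zero cost for its own label and an infinite cost for all others, while an unlabelled voxel (marked -1 here) receives zero cost for every label. The function name and this encoding are assumptions for illustration; a practical solver would typically replace the infinite cost with a large finite constant.

```python
import numpy as np


def unary_costs(partial_labels, n_labels, unlabelled=-1, hard_cost=np.inf):
    """Per-voxel, per-label data costs derived from a partial annotation.

    Output shape is partial_labels.shape + (n_labels,). Labels are assumed
    to be integers in [0, n_labels).
    """
    costs = np.zeros(partial_labels.shape + (n_labels,), dtype=float)
    annotated = partial_labels != unlabelled
    # Annotated voxels: every label is penalised ...
    costs[annotated] = hard_cost
    # ... except the annotated label itself.
    idx = np.nonzero(annotated)
    costs[idx + (partial_labels[annotated],)] = 0.0
    # Unlabelled voxels keep zero cost for all labels (no terminal links).
    return costs
```

Under this encoding, the label of an unlabelled voxel is determined entirely by the pairwise terms, i.e. by propagation from other images and by spatial regularisation.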
In the slicewise annotation strategy discussed above, the selected slices were completely annotated with detailed delineations of structures of interest. In contrast, scribbles were proposed as an alternative partial annotation strategy in Sec. 3.2, with the aim of saving time by not requiring the observer to delineate the structure boundaries. We chose to design the task such that the scribbled areas were as large as possible without sacrificing speed on annotating details (as shown in Fig. 11c). Placing smaller scribbles could further increase speed, but likely at the expense of segmentation accuracy. The results presented in Fig. 12a show that using scribbled atlases yielded comparable performance to MASr-LW, albeit with slightly worse accuracy in the myocardium. The final set of experiments assumed that the infrastructure for placing manual scribbles is available at segmentation time, as for example in interactive segmentation [28]. Results (Fig. 12b) showed that in this case, the additional help of scribbled atlases did not greatly influence segmentation results, indicating that scribbles placed directly in the target are sufficient for obtaining an accurate segmentation with the proposed framework. However, it can be seen that in combination with a scribbled target image, a fully annotated atlas set can improve segmentation results considerably in the myocardium, which is the most challenging structure to segment accurately.

Future Work
In the scope of this paper, the data term was used exclusively to encode manual annotations. However, as briefly described in Sec. 2.3, more complex models could be applied to the data term, such as intensity models for the structures of interest. Furthermore, it would be of great interest to extend the data term to incorporate weak annotations such as bounding boxes or image tags. Another extension to the proposed framework could move from a voxel-wise representation of the images to a supervoxel representation. This change in the graphical representation could enhance the scalability of the proposed method to larger databases.

CONCLUSION
In this paper, we proposed a unifying formulation for label propagation and regularisation based on a novel graphical representation of the labelling problem which is flexible and easily extendable. Small modifications in its configuration allow the use of partially annotated atlas data for segmentation. Experiments on two datasets demonstrated the usefulness of the proposed framework for segmentation using different partial annotation strategies. Pursuing these annotation strategies can save time and make annotating large databases feasible, while leading to robust segmentation results when combined with existing concepts in multi-atlas segmentation.

Fig. 1: (a) A toy dataset with an unlabelled target image on the left, atlas images and corresponding manual annotations (blue and red depict different labels) on the right. (b) In MAS, each voxel x in target image i is labelled by label propagation from atlases j ∈ {1, . . ., R} with fusion weights β_ij(x). This can also be interpreted as an MRF optimisation problem, where atlas voxels are connected to the terminal nodes with infinitely weighted edges and inter-image edges β_ij(x) encode label fusion.

Fig. 3: This figure shows different graph configurations representing (a) multi-atlas segmentation with spatial regularisation in the target image, (b) an additional data term in the target image, i.e. encoding intensity models for the data, (c) multi-atlas segmentation with missing atlas labels. Missing labels are reflected in the graph structure by missing terminal connections.

Fig. 5: Schematic showing the graph configuration for multi-label CMF using the Potts model. The graph (in this figure only one image i is shown) is replicated for each label l. The data term is encoded in the sink constraints for every label.

Fig. 6: This figure shows two graph configurations used when employing partially annotated atlas data (blue and red depict different labels), based on the example dataset of Fig. 1a. Voxels with missing labels (white) are disconnected from terminal nodes. In contrast to Fig. 3c, spatial regularisation is enabled in all images. (a) Voxels at each location x in the target image are connected to voxels in atlases j. (b) Additionally, atlas voxels are connected to voxels in other atlases.

Fig. 9: This figure shows mean Dice coefficients for slicewise partial annotation (PA-SW) for different proportions q of labelled atlas slices. PA-SW-CONF1 and PA-SW-CONF2 describe the graph configurations and the error bars depict the standard error.

Fig. 10: An example segmentation for PA-SW-CONF2 is shown in red; yellow denotes the ground truth segmentation. The same subject is shown at two different slice positions in (a) and (b). From left to right, the proportion of labelled atlas slices q was 1, 0.8, 0.6, 0.4, 0.2, 0.1.

Fig. 12: Mean Dice coefficients are shown for all experiments employing scribbles. (a) compares the performance of configurations using scribbled atlas data to fully annotated atlas data and in (b), results are shown for all configurations where the target itself contains scribbles as well.

Parameter settings for multi-atlas segmentation
In this section, we describe the parameter training procedure for the experiments performed in Sec. 4.1. First, we determined parameter values {σ_2, |P|} for MAS-LW as introduced in Eq. 23. To do this, 10 target subjects were randomly drawn from the parameter training data. For each target image, the 20 most similar images in the remaining training images were

Fig. 13: Visual results for a mid-ventricular (top), an apical (middle) and a basal slice (bottom) for one subject. The example image, ground truth segmentation, the segmentation obtained with PA-SC-A and MASr-LW are shown from left to right.

Fig. 14: Visual results for a mid-ventricular (top), an apical (middle) and a basal slice (bottom) for one subject. The example image, ground truth segmentation, the segmentation obtained with PA-SC-A+T, PA-SC-T, and PA-SC-AF+T are shown from left to right.

Fig. 17: This figure shows mean Dice coefficients for a grid search of the parameter choices using a proportion of q = {1, 0.8, 0.6, 0.4, 0.2, 0.1} labelled slices in the atlases (left to right). The white cross marks the optimal parameter choice for each q. The colours encode the Dice coefficient (see colour bar on the right). The top (a) and bottom (b) rows show results for CONF1 and CONF2, respectively.

Fig. 18: This figure shows the results of parameter training for all experiments investigating the use of scribbles. The colour encodes a measure of combined segmentation accuracy in all structures of interest.
This figure shows mean Dice coefficients for a grid search of the parameter choices for MASr-LW using R = {5, 10, 15, 20} atlases (left to right).The white cross marks the optimal parameter choice for each experiment.