Whole-body bone segmentation from MRI for PET/MRI attenuation correction using shape-based averaging

Habib Zaidia) Division of Nuclear Medicine and Molecular Imaging, Department of Medical Imaging, Geneva University Hospital, Geneva 4 CH-1211, Switzerland; Geneva Neuroscience Center, Geneva University, Geneva CH-1205, Switzerland; Department of Nuclear Medicine and Molecular Imaging, University of Groningen, University Medical Center Groningen, Groningen 9700 RB, Netherlands; and Department of Nuclear Medicine, University of Southern Denmark, Odense DK-500, Denmark


INTRODUCTION
Image segmentation and anatomical regions delineation, aiming at classifying a subject image into volumes of interest and background, play a major role in medical image processing.In many situations, such as bone segmentation from magnetic resonance images (MRI), nonideal image quality, or lack of contrast challenges the segmentation task.Due to the high demand for (semi-) automated segmentation techniques, multiatlas based framework has attracted considerable attention because of its promising performance, robustness, and flexibility. 1Technically, multiatlas refers to a set of images, preferably from the same body region of the test image together with approved target segmentation information.Multiatlas based segmentation typically entails the following steps: (i) individual registration of atlas images to the test image followed by propagation (using the obtained deformation fields) of the corresponding label information to the test subject coordinates, and (ii) final decision on labeling of test image through atlas fusion schemes.Inclusion of multiple atlases is considered to be an effective procedure, which has been found to be more accurate than any of the individual atlases. 1,2he atlas fusion task is typically carried out using voxelwise decision merging techniques, such as majority voting (MV) 3 or assigning likelihoods to each of the outcome possibilities via Bayesian classifiers. 4Moreover, the information provided by each of the raters can be preferably assessed and weighted according to performance and relevance to enhance the effectiveness of the atlas fusion task. 5To achieve the best possible accuracy, this task is commonly carried out locally or at the voxel level, which might lead to undesirable fragmentation of the structures owing to local noise and spatial uncertainty. 6o address this issue, shape-based averaging (SBA), which exploits the natural distance relationship between the voxels to combine different raters decisions based on the signed distance maps of the atlas label maps, was introduced. 7Basically, the SBA technique relies on the Euclidean distance transform, which assigns the distance from the nearest boundary of the target object to each voxel of the label map.The decision about the final segmentation is made by computing the average of all distance maps calculated on rater label maps.
Atlas fusion SBA performed as well as or even slightly better than MV in terms of segmentation accuracy and object identification on human brain MRI data. 7Besides, SBA resulted in less specious fragmentation, which led to significantly more natural and contiguous structures compared to the popular majority voting approach. 6Moreover, SBA exhibited more robustness to poor resolution and small number of input atlases. 7Robitaille and Duchesne 8 compared a number of effective atlas fusion schemes, namely SBA, simultaneous truth and performance level estimation (STAPLE) 9 and MV for brain MRI segmentation using datasets from the International Consortium for Brain Mapping (ICBM). 10In almost all simulations and studies involving segmentation of 2D and 3D neuroanatomical structures, the SBA atlas fusion scheme outperformed the other methods in terms of recognition rate.
The outstanding merit of SBA, in addition to its robustness to noise and homogeneity of output labeling, is thought to be the good performance on a small number of input atlases.In this regard, we chose the SBA method for bone volume identification from whole-body MR images with the aim to incorporate bone in the derivation of attenuation maps for attenuation correction in hybrid PET/MRI.Correcting the PET emission data for attenuation is still a major challenge in PET/MRI owing to the lack of direct correlation between MRI signal and attenuation properties of biological tissues. 11The complexity and variability of the human anatomy combined with the high level of noise and partial volume effect make the segmentation of bone from MRI a challenging task.In addition, application of dedicated MR sequences enabling to delineate bony structures, such as ultra-short echo time (UTE) 12 or zero time echo (ZTE), 13 in whole-body imaging is not yet clinically feasible owing to the long acquisition time and susceptibility to artifacts when using a large field-of-view.][20] Despite promising potential reported in the literature, we observed that the SBA technique tends to underestimate bone volume identification in a comparative assessment of common atlas-based bone segmentation techniques from MRI using a set of MR/CT image pairs. 21Although SBA exhibited poor performance compared to other techniques, its best performance was achieved on a relatively small number of input atlases.
In this work, we aim to propose the SBA method for wholebody bone segmentation from MRI motivated by its remarkable merits, in particular robustness to a small number of input atlases and minimal factitious fragmentation of contiguous structures.These features could be appealing to PET/MRI attenuation map generation, thus allowing the derivation of artifact free attenuation maps using gender/body/posturespecific atlas dataset, which may contain only few atlas images.The main contribution of this work is the combination of the promising SBA atlas fusion scheme with state-ofthe-art statistical atlas selection/weighting methods, such as STAPLE (Ref.9) and selective and iterative method for performance level estimation (SIMPLE), 22 to adapt the SBA technique and enhance its performance for the purpose of whole-body bone segmentation from MRI.In addition, a simple atlas weighting scheme based on shape similarity with a competitive performance was proposed for incorporation within the SBA technique.The performance of different atlas fusion schemes was evaluated in terms of whole-body bone extraction and PET attenuation correction accuracy.

2.A. Data acquisition
The study population comprised 21 patients who underwent whole-body 18 F-FDG PET/CT and MRI scanning for staging of head and neck malignancies.The study protocol was approved by the institutional ethics committee and the patients signed written informed consent.PET/CT imaging was performed on a Biograph 64 True Point scanner (Siemens Healthcare, Erlangen, Germany).CT images were acquired for attenuation correction using an unenhanced low dose CT scan (120 kVp, 60 mAs, 24 × 1.5 collimation, pitch of 1.2 and 1 s/rotation) after a localization scout scan.PET data acquisition started at 146.2 ± 20 min post-injection of 18 F-FDG (371 ±23 MBq) with 3 min per bed position for a total of 4-5 beds, resulting in a total acquisition time of 15-18 min.MRI examinations were performed on the Ingenuity TF PET/MR (Philips Healthcare, Cleveland, USA) 23 using a whole-body MRI Dixon volumetric interpolated T1-weighted sequence 24 using the following parameters: flip angle 10˚, TE 1 1.1 ms, TE 2 2.0 ms, TR 3.2 ms, 450 × 354 mm 2 transverse FOV, 0.85 × 0.85 × 3 mm 3 voxel size, and a total acquisition time of 2-3 min.The slight time difference is due to different axial coverages depending on the patients' length.Some patients underwent one or two more bed-positions acquisition.The inphase image generated by the MRI Dixon sequence was used for the segmentation process.Due to temporal shift between MRI and CT acquisitions, the in-phase MR image of each patient was nonrigidly registered to the corresponding CT image to ensure spatial alignment of MR and CT image pairs.The segmented bone from CT images (produced by applying an intensity threshold of 140 HU) can be considered as the ground truth bone label map for corresponding MR images.The MR and CT image pairs after registration had 512 × 512 matrix resolution in 207 axial slices.
6][27] To remove or minimize these effects, in-phase MR images of all patients underwent the following corrections: (i) gradient anisotropic diffusion filtering 28 for statistical noise removal using a conductance = 4, number of iterations = 10, and time step = 0.01, (ii) N4 bias field correction 29 for canceling out intrasubject intensity inhomogeneity using B-spline grid resolution = 400, iteration number = 200 (at each grid resolution), convergence threshold = 0.001, B-spline order = 3, spline distance = 400, number of histogram bins = 256, and shrink factor = 3, (iii) histogram matching 30 to minimize intersubject intensity nonuniformity using a histogram level = 512 and match points = 64.

2.B.1. Shape-based averaging
In principle, the combination of multiple atlas information or segmentations would result in a more accurate outcome than any of the individual segmentations.Unlike other segmentation fusion schemes, SBA exploits a natural distance relationship between the voxels of an image to achieve the segmentation fusion task.Usually, this distance relationship is expressed in terms of signed Euclidean distance maps of the bone label in each of the input atlas images.Let us suppose that L j (x i ) stands for the label map of atlas ( j) at voxel (x i ), for instance with a value of 0 for the background (BKG) and 1 for bone, and O j (x i ) denotes its corresponding signed Euclidean distance of voxels (x i ) from the nearest voxel with the bone label in L j .O j (x i ) is positive inside bone volume, negative outside, and equal to zero if and only if the voxel resides on the surface of bony structures.In other words, each voxel in the bone label map is weighted based on its distance from the surface of the bony structures.The voxels deep inside the bony structures receive higher positive weights and voxels further outside the bony structures receive higher negative weights [O j (x i )].
In the first step, the distance maps of all structures in all input atlases/segmentations are computed [O j (x i )] and then, the average distance (D ave ) of voxel (x i ) from nearest bone label is defined as where A indicates the number of input atlas images.Thereafter, the segmentation outcome for each voxel T(x i ) is determined through minimization of the average distance from combined label maps.Equation (2) tends to assign labels (bone or BKG) to each voxel, such that it minimizes the average distance map (D ave ) to the assigned labels, This minimization problem can be iteratively solved by applying Algorithm I below.
Compared to other atlas fusion schemes, SBA produces a smoother, more regular and natural looking output segmentations as well as comparable recognition rate. 7,8However, the SBA method resulted in a dramatic underestimation of bone recognition in our previous study. 21This stems from the complexity of structures and the relatively large number of fragmentations of whole-body bone tissue.Figure 1 depicts an example of the SBA fusion scheme, in which three atlas images contributed to the segmentation task while atlas No. 3 exhibited poor performance.Since in the SBA scheme, the fusion task is performed based on distance maps, the poor performance of atlas No. 3 is also magnified in the corresponding distance map at the indicated point, consequently leading to underestimation of the segmentation despite the fact that the majority of the atlases voted differently.Due to the large anatomical variability, different body postures, nonrigid motion and far from perfect accuracy of atlas registration in whole-body imaging (compared, for instance, to brain imaging), this issue is very likely to occur in multiatlas whole-body segmentation when applying the SBA method.
Motivated by the remarkable advantages of the SBA technique, we aimed to improve its performance in the context of whole-body bone segmentation through incorporation of SIMPLE and STAPLE procedures.To this end, the SBA scheme was combined with SIMPLE and STAPLE methods at both local and global levels.Moreover, a simple but effective atlas selection procedure based on shape comparison was A I. Shape-based averaging algorithm. 7

For all x
Initialization: loop over all voxel T (x i ) = unknown Unknown label for target label map Auxiliary distance map End

For all labels
Main loop: over all labels For all x D ave ( proposed.In the following, we elaborate on the mathematical foundations and algorithmic implementation of the proposed methods.

2.B.2. Global SIMPLE (G-SIMPLE)
SIMPLE employs an atlas selection scheme after registration through iterative performance estimation and uses this performance as a weight in the label fusion process. 22oorly performing atlases act as noise in the SBA scheme as illustrated in Fig. 1.At each iteration, SIMPLE discards badly performing atlases such that they no longer contribute to the final segmentation.In the first step, all registered atlases are combined (here using the SBA scheme) to create a rough estimation of the ground truth segmentation (T 0 ).
In the next step, the performance of each atlas where Sl k indicates the selected atlases at the kth iteration, TPR (true positive rate) and TNR (true negative rate) indicate sensitivity and specificity, respectively, and SP indicates a binary overlap measure composed of the average of sensitivity and specificity (see Eq. (3) below).The estimated performance of the atlases is used to identify and discard poor performing ones in the next iteration to generate (T k+1 ), which in turn enhances the accuracy of the new estimated segmentation.This process is repeated iteratively to optimize the selection of atlases (Algorithm II).In Algorithm II, we propose to combine the SIMPLE atlas selection/weighting method with the SBA atlas fusion scheme.
To ensure that an atlas image is not excluded prematurely in the first few iterations (here we used three), all atlas images contribute to the segmentation.Technically, the atlas selection threshold (θ) plays a key role as it determines both the convergence rate and performance of the algorithm.To find the optimal value of θ, we employed the procedure described in Ref. 22 where the atlas selection threshold is determined as a function of the mean and standard deviation σ k of the performance of all atlases j ∈ Sl k : θ k = Ø k − cσ k .The free parameter (c) produced the best results when defined as an increasing function of the number of selected atlases c/Num(Sl k ) because in early iterations a larger number of atlases should be discarded and as the algorithm proceeds, the selection threshold should decrease to discard atlases less quickly.At the end of each iteration, the estimation of the ground truth segmentation is achieved by applying the SBA fusion scheme on the selected atlas images.The binary overlap measures (such as Dice and combination of sensitivity and specificity introduced earlier) between atlas images and the estimated ground truth segmentation (T k ) are used to discard poor atlases.In our experiments, the sensitivity performed slightly better than Dice.Therefore, we selected it A II.Global SIMPLE algorithm (G-SIMPLE).
1. Segmentation (T k ) using the SBA scheme weighted by estimated performances Ø j = f L j , T k −1 using the selected atlases as the similarity measure for threshold selection.The binary overlap measure used to estimate the performance of atlases is estimated globally (whole image).We call this approach G-SIMPLE.

2.B.3. Global STAPLE (G-STAPLE)
STAPLE considers a set of label maps, provided by atlas registration, and computes a probabilistic estimate of the ground truth segmentation as well as a criterion of the performance level achieved by each of the atlases. 9The task of probabilistic estimate of the ground truth segmentation is carried out through combining an optimal subset of the atlases (or label maps) weighted by the estimated performance level.
Here, we consider Y (x i ) as the target MR image with unknown label map T(x i ) containing N voxels.Given A atlas images or label maps (L(x i )) registered to the target image (Y ), a D i j matrix of size N × A can be formed where the rows (D j ) represent L j .Assuming an estimate of the ground truth segmentation (T) is available, the sensitivity and specificity of each atlas can be calculated using Sensitivity (true positive ratio): TPR = TP /(TP + FN), Specificity (true netative ratio): TNR = TN /(TN + FP), where TP and FN are the true positive and false negative rates, respectively.Then, the sensitivity P = (p 1 , p 2 , ... ,p A ) T and specificity Q = (q 1 , q 2 , . . ., q A ) T matrices can be generated where the elements p and q indicate the sensitivity and specificity of one of the atlas images, respectively.Given a matrix D containing all the atlas label maps and hidden true target label map T, the probability mass function of the complete data would be f (D,T |P,Q ), where the goal is to estimate the performance level parameters of the atlases characterized by P or Q, which maximize the log likelihood function of the complete data, Based on the assumption that atlas label maps are all conditionally independent, given the estimated ground truth segmentation and performance level parameters, a version of the expectation-maximization (EM) algorithm was employed to estimate the solution of the maximization problem.A detailed description of the employed EM algorithm is provided in Refs.9 and 31.In the expectation-step an estimation of the unknown target label map is derived through computation of the conditional probability at each voxel where w i indicates the probability of the true target segmentation at voxel (i) subject to To merge the SBA atlas fusion scheme and STAPLE atlas selection method, the signed Euclidean distance map of all atlas labels were computed (O i j ) considering positive values inside bony structures and negative values outside.Then, the calculated distance map was incorporated into the conditional probability [Eq.( 5)] as a weighting factor for each voxel Here, f (T (x i ) = 1)/ f (T (x i ) = 0) serves as prior knowledge for the presence/absence of bone in the target image, where a constant value of 1 is considered for all voxels.Equation ( 7) differs from the original STAPLE method in the sense that it is weighted with signed distance map of bone label.
In the Maximization-step, given the estimated weight variables (w i ), performance level parameters that maximize the conditional expectation of the complete data log likelihood function can be computed considering Regarding Eq. ( 8), since the sensitivity and specificity of each atlas are estimated using all voxels of the image (as opposed to local estimation) we call this method G-STAPLE.

2.B.4. Local STAPLE (L-STAPLE)
So far, the described approaches (G-SIMPLE and G-STAPLE) combined with the SBA scheme were designed to assign weights to each atlas based on the global measurement of performance level.In this section, the same concept is employed for local assignment of weights based on the atlas performance measured on a subset or small patch of images.The procedure (expectation-and maximization-steps) is quite similar as the one described in Sec.2.B.3 but the sensitivity and specificity are computed locally and as such, a spatially varying (voxelwise) sensitivity-specificity matrix can be formed as described in Ref. 32, where γ i j (p i j ,q i j ) indicates the sensitivity and specificity pair measured for the atlas j at the voxel i.Given a matrix γ, the aim is to estimate the voxelwise performance level parameters of the atlases which maximize the log likelihood function of the complete data, In a way similar to the one described previously, the expectation-step deals with an estimation of the unknown target label map derived from the computation of the conditional probability at each voxel The SBA atlas fusion scheme is then combined with spatially varying STAPLE approach via incorporation of the signed Euclidean distance map (O i j ) into the conditional probability [Eq.(11)] as a weighting factor These equations differ from the original spatially varying STAPLE algorithm since they are weighted with the signed distance map of bone label.The Maximization-step computes the performance level parameters that maximize the conditional expectation of the complete data log likelihood function By employing the spatially varying or L-STAPLE approach, a new parameter is introduced (R), which represents the window size centered at the target voxel (i) for local measurement of the atlas performance.

2.B.5. Local shape comparison (L-Shp)
The methods described so far tend to improve the quality of the segmentation outcome through iterative refinement of the atlas selection task.The iterative nature of these methods makes them computationally demanding, particularly when implemented locally (voxelwise).To address this issue, a robust and computationally efficient similarity measure is proposed based on local shape comparison (L-Shp).In this approach, the average distance map of the SBA scheme is computed through voxelwise weighting of the atlases according to their similarity to the others To calculate the weighting factor (aw j i ) −β , the distance map of the bone label should be created for all atlas images.Thereafter, the weight for atlas ( j) at voxel (i) is obtained from where H(:) denotes the Heaviside step function. 33In a similar manner to L-STAPLE, a local window centered at the target voxel (patch i ) is defined to calculate the local weights.This procedure implicitly defines the weights based on MV since the value of weights not only depends on the shape similarity but also on the number of similar atlases in the dataset.

2.C. Parameter optimization
The methods described in Secs.2.B.2-2.B.5 contain free parameters that need to be optimized to reach optimal performance.The atlas selection threshold in the G-SIMPLE method plays a crucial role since a conservative threshold will discard fewer atlases at each iteration, which results in slow convergence rate.On the other hand, fast convergence can be achieved by choosing a strict threshold at the risk of unjustly excluding atlases.As mentioned earlier, the atlas selection threshold (θ k ) was determined as a function of the mean and standard deviation of the performance of all atlases θ k = Ø k − cσ k .The best results were achieved when the free parameter (c) was set as an increasing function of the number of selected atlases (ranging from 0.5 to 1.8) and updated after every second iteration.On average, 12 iterations were used for each data set while in the 3 first iterations no thresholding was applied.
For methods tending to refine atlas selection locally, the most influential parameter is the similarity window size (R for L-STAPLE and patch for L-Shp).These free parameters were optimized through a leave-one-out cross-validation (LOOCV) scheme.In the first step, for each individual patient, all the remaining (N − 1) MR images were deformably registered to the target image.Therefore, for each subject, the remaining 20 patients act as atlas data set.The registration process was performed between in-phase MR images of the target and atlas data set.Thereafter, the obtained transformation matrices were used to transform the corresponding atlas CT images to the spatial coordinate of the target MR image.The registration process was carried out through a combination of affine and nonrigid alignment based on the advanced Mattes mutual information 34 implemented in the Elastix package 35 as described in previous studies. 36,37The following parameters were adopted: interpolate: B-spline, optimizer: standard gradient descent, image pyramid schedule: ( 16 in terms of Dice similarity coefficient.This procedure was repeated for the whole dataset and the optimal sizes of R and patch windows were chosen for the rest of the study (Fig. 2).

2.D. Quantitative evaluation
Bone segmentation was carried out for the 21 clinical studies on a PC equipped with Intel Xeon CPU (2.3 GHz) running  (The MathWorks, Inc., Natick, MA).For each patient, the segmentation was repeated five times using the different methods, namely SBA, G-STAPLE, G-SIMPLE, L-STAPLE, L-Shp.Moreover, MV was considered as a conventional atlas fusion scheme.The assessment of the accuracy of the extracted bones was performed through comparison with a bone segmented from the corresponding reference CT images.Bone segmentation from CT images was performed by applying an intensity threshold of 140 HUs.The validation of bone segmentation was reported using seven volume/distancebased metrics: Dice similarity, 38 relative volume difference (RVD), 39 Jaccard similarity (JC), 40 sensitivity (S), 41 mean absolute surface distance (MASD), 42 Hausdorff distance (HD), 43 and distance error (DE), 44 Dice(RefB,T)  region (RefB p ) to the entire set of points of the target region 45 averaged across the T p boundary points.The Hausdorff distance measures the maximum distance one would need to move the boundaries of the source region (RefB) to completely cover the target region (T).
To assess the sensitivity of the different methods to the number of input atlases, the accuracy of bone segmentation and computation time were also evaluated for varying number of input atlases.In addition to bone extraction accuracy, fragmentation of bony structures was evaluated by computing the number of connected regions in the bone label maps.To this end, face-edge-vertex connectivity (two pixels are considered connected provided they share at least one vertex in the pixel grid) was employed to calculate the number of fragments in the target label maps compared to the reference obtained from CT images.
We further extended our validation through quantitative analysis of PET images corrected for attenuation using the bones segmented from MRI (PET-MRAC).To this end, the segmented bone from the target MRI was superimposed onto the MRI-guided 3-class attenuation map implemented on the Philips Ingenuity TF PET/MR scanner.The latter is obtained by segmenting MR images into the surrounding air, lung, and soft-tissue followed by assigning a predefined attenuation coefficient to each tissue class (air: 0 cm −1 , lung: 0.022 cm −1 , soft-tissue: 0.098 cm −1 ). 23,36,46The 3-class attenuation maps with inserted bony structures from reference CT images were taken as the reference in PET attenuation correction analysis (PET-CT).The described segmentation methods generate target bone label maps in binary format.However, a bone label map with continuous linear attenuation coefficients is desired for PET attenuation correction.To this end, we generated continuous valued bone map for each method.The described methods generate the final bone label map through weighted average of the atlas' votes.The obtained weights were multiplied by the values of the original atlas CTs in HUs (instead of binary votes) to recalculate continuous valued bone maps.
PET image reconstruction was performed by means of the e7 tool (Siemens Healthcare, Knoxville, TN) 47 using the ordinary Poisson ordered subset-expectation maximization (OP-OSEM) iterative reconstruction algorithm and default parameters (four iterations, eight subsets, and a postprocessing Gaussian kernel with a FWHM of 5 mm).The differences between bone segmentation techniques were quantified in terms of change in the standardized uptake value (SUV).The SUVs were calculated by dividing the activity concentration by the injected activity divided by body weight.The impact of whole-body bone segmentation on PET attenuation correction accuracy was assessed through calculation of the relative mean SUV error [Eq.( 17)] and relative mean absolute SUV error [Eq.( 18 Relative absolute error In addition to the calculated SUV bias, linear regression analysis was performed on joint histograms of PET-MRAC and PET-CT images in bony structures for each segmentation method.

RESULTS
Figure 3 depicts a representative sagittal slice of the segmented whole-body bones from Dixon in-phase MRI together with reference CT image and bone label map.Visual inspection revealed superior bone identification when using locally regularized methods (L-Shp and L-STAPLE).
Table I summarizes the quantitative analysis of bone identification accuracy using seven standard volume/distancebased metrics.In agreement with Fig. 3, considerable bone detection enhancement was achieved using L-Shp and L-STAPLE methods compared to original SBA and MV methods based on the results achieved using 21 patients.Since the implemented methods exhibited different convergence rates in terms of number of input atlases, the representative segmented bones (Fig. 3) and the quantitative evaluation (Table I) show the best outcomes for each method.Figure 4 depicts the convergence trend of the different methods based on the Dice metric as the number of input atlases varies between 3 and 20.
In addition to the evaluation of the convergence rate, the required computation time for performing whole-body bone segmentation was assessed for different methods as a function of the number of input atlases (varying between 3 and 20). Figure 5 (top) shows the processing time (min) for the SBA and globally regularized methods where most of the computation time is taken by the calculation of the signed Euclidean distance maps on bone labels.Similarly, Fig. 5 (bottom) presents the processing time for the locally regularized methods where most of the processing time is taken by local iterative performance estimation (for L-STAPLE) and local similarity measurement (for L-Shp).
The fragmentation of contiguous structures is one of the major performance measures used in this study to evaluate the efficacy of different methods.The number of connected regions in the segmented images is plotted in Fig. 6, where those obtained from the target CTs are taken as reference.The number of connected regions achieved by MV is substantially higher than the other methods while application of local regularization makes the number of connected regions follow nicely the reference values.
Table II (top) summarizes the relative SUV mean errors measured on PET images corrected for attenuation using the six different whole-body bone segmentation methods (PET-MRAC).The errors were measured voxelwise for all voxels in bony structures as well as in ROIs within high uptake regions in and near bone tissue.Similarly, the SUV mean relative absolute errors are shown in Table II (bottom) averaged over 21 patients.
Figure 7 depicts the joint histogram graphs obtained by plotting the correlation between PET images corrected for attenuation using the different bone segmentation methods (PET-MRAC) and the reference PET-CT image.The joint histograms are plotted by taking only voxels corresponding to bony structures into account.The L-STAPLE and L-Shp schemes yielded the best correlation for bone tissue y = 1.01x + 0.05 with R 2 = 0.99 and y = 1.02x + 0.08 with R 2 = 0.99, respectively, while the SBA method deviated from the identity line ( y = 0.87x + 0.10 with R 2 = 0.97).Figure 8 shows the error maps (relative bias) between reference PET-CT images and PET-MRAC images corrected for attenuation using the six different segmentation methods for one representative clinical study together with attenuation bias maps.

DISCUSSION
We assessed the performance of the SBA atlas fusion scheme for whole-body bone segmentation from Dixon MR images.Despite the promising performance reported in the literature, 6,8 the SBA method significantly underestimated bone identification in the context of whole-body imaging, even compared to MV atlas fusion scheme (Table I and Fig. 3).Due to the complexity of anatomical structures, large number of bone fragments, considerable anatomical variations across T II.Mean relative and absolute errors (bias) between SUV mean estimated using the different PET-MRAC methods and PET-CT images in bony structures and lesions located in and near bone tissue.patients, and greater likelihood of gross registration errors in whole-body imaging (compared to prostate segmentation), the performance of the SBA technique in whole-body bone identification is challenged.The bone identification accuracy did not exceed 0.63 in terms of Dice metric when applying either the SBA or MV methods.SBA tended to overlook small fractions of bone tissues whereas the MV scheme led to dramatically increased factitious fragmentation (artificially disconnected regions in the resulting bone segmentation) (Figs. 3 and 6).

Relative error
The MV technique operates solely on binary label maps and reaches a conclusion based on the consensus of the atlas' votes, while SBA implicitly ignores the consensus of the raters by incorporating the labels' distance map (Fig. 1).As such, in the SBA method, one atlas might have greater impact than a combination of several other atlases, which can result in underestimation/missing of a small fraction of the target segmentation.Fragmentation of the continuous structures plays a minor role in the task of PET attenuation correction considering the processing steps applied to generate attenuation maps (downsampling, Gaussian smoothing).However, this performance measure is an important factor in other applications requiring bone segmentation, such as vertebral fracture and morphometric analysis of the osteochondral elements.Despite suboptimal performance, the SBA method exhibited better convergence rate in terms of the number of input atlases, particularly compared to the MV method.
We proposed to combine the SBA scheme with an atlas selection/refinement technique to enhance the performance of the SBA method and at the same time sustain its superior convergence rate.To this end, the STAPLE method was chosen owing to its proven efficiency. 25Moreover, the SIMPLE method exhibited similar performance when applied globally (G-SIMPLE) with relatively simpler implementation and lower computational complexity.However, the STAPLE method performed better when applied locally (L-STAPLE) compared to local SIMPLE, thus justifying to report only the results achieved by L-STAPLE.In addition, the proposed local shape-based similarity measure (L-Shp) substantially reduced the computation time by up to five times (Fig. 5) while enhancing the performance of the SBA method.The L-Shp method employed local atlas refinement scheme based on the measurement of the structure/shape similarity across the atlas label maps.In the L-STAPLE method, the atlas performance should be updated after each iteration and the whole algorithm takes on average 11 iterations to converge (in this study), while the L-Shp method computes the interatlas similarity matrix only once to generate weights to be inserted into the SBA atlas fusion scheme.In addition to the improved recognition rate when applying local regularization, the number of connected regions in the resulting label maps follows the reference true values, independent of the number of input atlases.Considering Fig. 4, combining local regularization methods (L-STAPLE and L-Shp) with the SBA scheme sustained the convergence rate of the original SBA to some extent.
The quantitative evaluation of the SUV bias demonstrated a similar trend to that of bone segmentation accuracy while PET images showed less sensitivity to errors in bone identification owing to lower spatial resolution and higher level of noise.Both L-STAPLE and L-Shp methods resulted in less than 3% SUV mean relative and less than 6% mean absolute errors in bony structures due to superior bone identification.The joint histogram plot and regression analysis also support the voxelwise and ROI analysis of activity recovery.In this work, we used the 3-class MRI segmentation-based attenuation map for all methods followed by superimposition of only the bone tissue obtained using the different methods.The same 3-class attenuation map with bone tissue derived from reference CT was taken as reference attenuation map.However, a small SUV bias was observed in soft-tissue regions (even far from bony structures).This may be partially due to differences in scatter correction and the inherent smoothing during the generation of attenuation maps.
The results presented in this work demonstrate that the developed methods are promising and can potentially be used in the clinic.However, atlas-based segmentation techniques consist of some key steps where each step plays a critical role in the quality of the ultimate outcome.The first step is the creation of the atlas data set that should cover a reasonably wide variability of the human anatomy, with a sufficient number of samples for each category based for instance on gender, pathology, body weight and length, body mass index, sitting height/stature ratio,. . .etc.The richness of the atlas data set greatly influences the quality of outcomes since it increases the likelihood of finding an atlas similar to the target subject.The registration algorithm and optimization of their associated free parameters are also important determining factors.The final step is the atlas fusion task, which determines the segmentation output while trying to minimize the nonsystematic errors arising from the previous steps.In this light, atlas-based methods entail many steps and free parameters that need to be optimized prior to potential use in the clinic.These issues question the robustness of the atlas-based techniques to be utilized as the method of choice.Therefore, despite the superior performance of atlas-based methods, as reported in this study and numerous previous works, the shift toward clinical implementation of these techniques is only gradual.
It is the intention of the authors to make the algorithms presented in this work available to the research community under the terms of the GNU Lesser General Public License version.This also applies to the CT/MRI atlases provided approval from our institution/IRB is secured.

CONCLUSION
In this work, we evaluated the performance of the SBA atlas fusion scheme in the context of whole-body bone segmentation from MR images.The SBA technique exhibited suboptimal performance compared to the conventional MV atlas fusion scheme.To enhance the performance of the SBA method, the technique was combined with STAPLE and SIMPLE atlas selection (refinement) methods.Local (voxelwise) application of the STAPLE method led to a dramatic improvement of bone fragmentation and recognition accuracy despite the high computation time.Moreover, a shape comparison based atlas weighting scheme (L-Shp) was proposed, which achieved similar results to L-STAPLE while considerably diminishing the computational burden.The quantitative SUV analysis revealed good correlation between PET images corrected for attenuation using both L-STAPLE and L-Shp methods and the corresponding reference CT images.The technique is promising for potential application in MRI-guided attenuation correction in whole-body PET/MRI.
End Medical Physics, Vol.43, No. 11, November 2016 F. 1. Illustration of SBA atlas fusion framework.The top left panel (A) depicts three binary label images generated by three different atlases and the middle row shows the corresponding distance maps.The lower row presents the combined distance map together with the final labeling result.The right panel depicts the profiles of distance maps corresponding to the lines indicated on the atlases (B) and resulting (C) distance maps.The poor performance of one of the raters (here atlas 3) led to signification underestimation of the segmentation output.

8 4 2 2 )
, grid spacing schedule (32.0 16.0 8.0 4.0 2.0), maximum number of iterations (4096 4096 2048 1024 512), number of histogram bins: 32.The sizes of R and patch windows were optimized by varying the window size from 4 to 30 mm.The resulting bone segmentation was then evaluated F. 2. Optimization of R window size for local atlas performance assessment in L-STAPLE method (top) and patch window size for local shape similarity assessment in the L-Shp method (bottom).
RefB,T) = d ave (S RefB ,S T ) + d ave (S T ,S RefB ) p ,T p ), HD(RefB,T) = max RefB {min T {d (RefB,T)}}, (16) where RefB is the bone segmented from the reference CT image and T denotes the extracted bone from the target MRI.d ave (S Ref B ,S T ) is the average direct surface distance from all points on the reference bone surface S Ref B to the segmented bone surface S T .The distance error is equal to the minimum distance from each boundary point of the source )] in bony structures (voxelwise evaluation) F. 4. Performance evaluation of the different segmentation methods based on the Dice metric as a function of the number of input atlases.Medical Physics, Vol.43, No. 11, November 2016 F. 5. Average computation time for performing whole-body bone segmentation (top) for SBA, G-STAPLE, and G-SIMPLE methods (minute).(Bottom) for L-STAPLE and L-Shp methods (hour) measured at varying number of input atlases.and lesions in/near bony structures (region of interest-based evaluation).The relative errors of SUV estimates from PET-MRAC images are calculated relative to the SUV measured on PET-CT images used as reference.The relative absolute error [Eq.(18)] gives information about reconstruction errors and deviations from the expected values, whereas the relative error [Eq.(17)] reflects the inherent bias in the methodology.The regions of interest (ROIs) were manually drawn on malignant lesions located inside bony structures or in close proximity such that their activity recovery can be affected by bones, F. 6. Performance of the different segmentation methods in terms of number of connected regions vs the number of input atlases.Ref indicates the number of connected regions obtained from reference CT images.Medical Physics, Vol.43, No. 11, November 2016 Relative error(%) = MRAC(SUV) − Reference(SUV) Reference(SUV) × 100%,

F. 8 .
SUV bias maps between PET-CT and PET-MRAC images (top row) and attenuation error maps (bottom row) between reference CT images and attenuation maps generated by (A) SBA, (B) MV, (C) G-STAPLE, (D) G-SIMPLE, (E) L-STAPLE, and (F) L-Shp methods.
T I. Comparison of bone segmentation accuracy (mean ± SD) between the different approaches using the selected evaluation metrics, including the Dice, relative volume distance (RVD), Jaccard similarity (JC), sensitivity (S), mean absolute surface distance (MASD), Hausdorff distance (HD), distance error (DE).