Automated Segmentation of Fractured Distal Radii by 3D Geodesic Active Contouring of in vivo HR-pQCT Images

Radius fractures are among the most common fracture types; however, there is limited consensus on the standard of care. A better understanding of the fracture healing process could help to shape future treatment protocols and thus improve functional outcomes of patients. High-resolution peripheral quantitative computed tomography (HR-pQCT) allows monitoring and evaluation of the radius on the micro-structural level, which is crucial to our understanding of fracture healing. However, current radius fracture studies using HR-pQCT are limited by the lack of automated contouring routines, hence only including small number of patients due to the prohibitively time-consuming task of manually contouring HR-pQCT images. In the present study, a new method to automatically contour images of distal radius fractures based on 3D morphological geodesic active contours (3D-GAC) is presented. Contours of 60 HR-pQCT images of fractured and conservatively treated radii spanning the healing process up to one year post-fracture are compared to the current gold standard, hand-drawn 2D contours, to assess the accuracy of the algorithm. Furthermore, robustness was established by applying the algorithm to HR-pQCT images of intact radii of 73 patients and comparing the resulting morphometric indices to the gold standard patient evaluation including a threshold- and dilation-based contouring approach. Reproducibility was evaluated using repeat scans of intact radii of 19 patients. The new 3D-GAC approach offers contours within inter-operator variability for images of fractured distal radii (mean Dice score of 0.992 ± 0.004 versus median operator Dice score of 0.993 ± 0.006). The generated contours for images of intact radii yielded morphometric indices within the in vivo reproducibility limits compared to the current gold standard. Additionally, the 3D-GAC approach shows an improved robustness against failure (n = 4) when dealing with cortical interruptions, fracture fragments, etc. compared with the automatic, default manufacturer pipeline (n = 40). Using the 3D-GAC approach assures consistent results, while reducing the need for time-consuming hand-contouring.


Introduction
In our ageing population, bone fractures are increasingly common and have become a major socioeconomic burden [1]. Specifically, fractures of the distal radius are among the most common fracture types and indicative of reduced bone quality [2]. However, there is currently limited consensus on the optimal treatment protocol [3]. The conservative treatment of many radius fractures provides the opportunity to study the fracture healing process in humans. A better understanding of this process could help to shape future treatment protocols of distal radius fractures, ultimately resulting in better functional outcomes for patients.
With the introduction of high-resolution peripheral quantitative computed tomography (HR-pQCT), longitudinal changes in the microstructure of the radius can be monitored non-invasively [4][5][6][7][8]. Use of this technology has revealed critical changes in bone microstructure as a result of aging as well as pharmaceutical and surgical interventions. For example, an increase in cortical porosity, which is highly associated with a decrease in bone strength, was demonstrated in a longitudinal study of patients who had undergone kidney transplantation [8]. Preliminary evidence suggests that HR-pQCT might also be a viable technique to monitor bone fractures in vivo [9]. The microstructure in such studies is of fundamental importance, as techniques such as micro-finite-element analysis can be utilized to estimate the capability of a bone scanned with HR-pQCT to withstand mechanical load, a key parameter to assess successful fracture healing.
with HR-pQCT are small (on the sub-mm length scale) and that the cortex forms a well-defined, highcontrast edge in the HR-pQCT image. However, these assumptions break down when processing images of fractured radii where the cortex is often fragmented, and the presence of cartilaginous callus may blur the boundary between bone and soft tissue. Furthermore, radius fractures often occur in the ultra-distal region, contributing to the complexity of the image segmentation, as the cortex is both thinner and less mineralized. To our knowledge, approaches to generate automatic outer contours for HR-pQCT images of distal radius fractures have not been published yet.
Active contouring [14] is a promising technique that has already been shown to successfully segment HR-pQCT images of healthy distal radii [15,16]. The advantage of the active contouring algorithm is the inherent ability to pass over larger gaps in the bone. This is possible because the requirements for a contour to be both close to the object edges and smooth are balanced. Recently, threedimensional morphological geodesic active contours (3D-GAC) have been implemented in Python, allowing for efficient determination of active contours on 3D images [17,18]. Hence, 3D-GAC appears to be a promising approach to create outer contours of HR-pQCT images of distal radius fractures.
The goal of present study was to investigate the use of fully automatic 3D-GAC to generate outer contours on HR-pQCT images of fractured radii. We validated the accuracy of these contours against hand-drawn contours using HR-pQCT images acquired throughout the first year of the healing process from 10 patients with fractured distal radii. We further assessed the robustness of the algorithm by comparing computed morphometric indices from intact distal radii HR-pQCT images of 73 subjects using contours generated from (i) the 3D-GAC and (ii) the scanner manufacturer's default approach. Lastly, the reproducibility of the 3D-GAC based algorithm was assessed based on contours of 19 human subjects imaged six times using HR-pQCT.

Fractured Radii
Images of radius fractures for 10 out of the 75 patients that completed the study were selected from the database based on the highest available visual grading scores (VGS) [19] and well aligned stacks resulting in 60 HR-pQCT fractured radius images. For each image, the most distal slice before the appearance of the lunate fossa was identified, and all evaluations on these images were performed on slices proximal to that slice.

Intact Radii
A total of 73 out of the 75 patients were selected, which had at least one HR-pQCT image of the contralateral, intact radius with a VGS of three or better, resulting in 438 images. Of these 73 patients, 19 had a VGS of three or better across all time points and were used for the reproducibility study.

Image Pre-Processing
Time-lapsed HR-pQCT images of each patient were registered using rigid image registration [20] to allow direct comparison of generated contours across all time points per patient.

Contours: Current Gold Standards
For ten randomly selected slices per fractured radius HR-pQCT image, hand-drawn 2D contours were generated by three researchers experienced in the processing of medical images (DCT, PRA, CJC), respectively, using the software of the scanner manufacturer. For this procedure, the 2D slice is visualised in the scanner manufacturer's software, and using a computer mouse, the contour is drawn onto the image. The software provides zooming functionality and local edge detection to assist the operator in defining the outer contour. These hand-drawn 2D contours were used as the gold standard to validate the results of the 3D-GAC algorithm for fractured radii and to quantify the inherent inter-observer variability in hand-drawn contours of fractured radii.
For intact radii, the default manufacturer pipeline based on a 3D algorithm by Buie et al. [10], written in the Image Processing Language (IPL) of the scanner manufacturer (Scanco Medical AG, Switzerland), was used as the gold standard. All contours generated using the manufacturer's software were exported as binary 3D images to h5 files [21], a hierarchical data format, for further processing using Python [22].

Novel Contouring Algorithm
The proposed algorithm uses 3D morphological geodesic active contours (3D-GAC) [17] implemented in the Python library Scikit-Image [23] using morphological operators [18]. The guiding principle for geodesic active contours is to minimize an energy term consisting of both internal and image energy.
The internal energy penalizes non-smooth contours, while the image energy penalizes contours away from the voxels of interest. The image energy landscape, in this approach, is generated using an inverse Gaussian gradient, where the Gaussian blur removes local minima and the gradient is an operation to detect object edges.
The proposed contouring algorithm can be separated into four distinct steps. First, the image is segmented into sections that only contain one bone (here, radius or ulna). In a second step, an initial guess of the radius contour is generated. Third, the image is pre-processed and converted to an energy landscape. Finally, the 3D-GAC algorithm is applied in an iterative loop to isolate the surface of the radius.

Separation of Radius and Ulna
3D-GAC are drawn to all intensity-based edges in an image and do not differentiate between objects of interest and other objects. Therefore, it is necessary to remove extraneous objects. Here, the ulna must be removed, as it is in close proximity to the radius. The watershed algorithm implemented in Scikit-Image [23] is used to separate these two bones (Figure 1), based on seed voxels of the different objects.
Since the radius and ulna are typically only a few voxels apart in ultra-distal scans, separating them using a global threshold has the risk of identifying both bones as a single object. Therefore, the radius and ulna are identified in a proximal slice (here, 35-39 slices from the most proximal slice to avoid end effects that may have been introduced as a result of image registration) using a threshold of 250 mg HA / cm 3 and component labelling implemented in the Python library NumPy [24]. The largest component identified is the radius and the second largest one the ulna. Finally, a seeds image is generated by copying the labels of the radius and ulna in the proximal slice into the entire proximal half of a zeroed image ( Figure 1). The distal part was left empty to avoid incorrect assignments of seeds in the distal region, where radius and ulna are not well separated.
The energy landscape for the watershed algorithm is generated from the HR-pQCT image by setting all negative values to zero and inverting the sign of all remaining elements. The algorithm is applied to the entire image such that every voxel -background or bone -is assigned to exactly either the radius or ulna (Figure 1).

Initial Guess
To generate the initial guess contour for the 3D-GAC algorithm, the HR-pQCT image is first normalized to the range of zero to one. Afterwards, the image is equalized in the two orthogonal longitudinal planes independently using a contrast limited adaptive histogram equalization implemented in Scikit-Image [23], and the resulting two images are averaged. A threshold of 0.5 is applied and the ulna portion of the image identified during pre-processing is removed from the thresholded image. Finally, the largest connected component is extracted via component labelling, the structure is closed with a closing distance of 50 voxels, the remaining interior holes are filled in an additional component-labelling step and the structure is dilated five iterations to obtain the initial guess ( Figure 1).

Pre-Processing of Image
To prepare the image from the scanner for input to the 3D-GAC algorithm, the image is first cropped to the bounds of the initial guess contour to reduce the image size, which reduces computational cost. The image is then normalized to the range of zero to one and equalized in the same way as for the creation of the initial guess. The catchment basins from the pre-processing watershed algorithm are used to set those voxels not belonging to the radius section to zero. Finally, the voxels outside the initial guess are set to the mean value of the initial guess surface voxels, which are those voxels that make up the perimeter of the initial guess. The final pre-processed image ( Figure 1) is then contoured using the 3D-GAC algorithm.

3D Geodesic Active Contours (3D-GAC)
The 3D-GAC algorithm takes an initial guess, an energy landscape and outputs an optimized contour.
Since the algorithm is easily trapped in local minima, i.e. a contour that does not match the surface of the radius, an iterative application of 3D-GAC to energy landscapes containing increasing levels of detail is part of our approach. For each iteration, the pre-processed image is used to create an energy landscape ( Figure 2). For this, the image is first padded in the longitudinal direction by 20 voxels at each end using the NumPy [24] edge mirroring setting. Gaussian blurring is then applied to smooth the energy landscape ( Figure 2). Finally, the padding voxels are removed. Since finer details of the radius can get lost when Gaussian blurring with a large sigma, three iterations of 3D-GAC are run using decreasing sigmas (14.0, 3.0, 1.5 voxels) to allow for an iterative approximation of the true radius contour ( Figure 2). Note that for the first iteration, image resolutions are halved after blurring to speed up computations. Each 3D-GAC application is then run using the default parameters of SciPy [25] with the number of iterations set to five ( Figure 2). For the first iteration of 3D-GAC the initial guess ( Figure 1) is used while successive iterations then use the output of the previous iteration of the 3D GAC algorithm as the new initial guess ( Figure 2). As a final step, component labelling [24] followed by a selection of everything not connected to the background is performed to remove holes in the contour that can appear at the distal image boundary. . The optimized final contour can then be used to segment the radius from the HR-pQCT image background for further analysis.

Morphometric Indices
The Scanco system provides the standard patient evaluation script for XtremeCT II devices, which is the gold standard approach of generating cortical and trabecular masks [12] and computing bone  [10]. However, the standard patient evaluation pipeline also accepts alternative outer contours as input. Outer contours generated from our novel 3D-GAC method were used as input to the standard patient evaluations and the resultant morphometrics were compared to those of the manufacturer's default pipeline.

Study Design
To assess the quality of the generated contours, we compared them to the respective gold standard for two different datasets (fractured and intact distal radii).

Fractured Radii: Comparison of the 3D-GAC Approach to Hand-Drawn Contours
The 3D-GAC method was run for all fracture images. Contours were visually categorised as acceptable, mistakenly including parts of the ulna, having obvious missing parts, or having both of these issues. Only images categorised as acceptable were used for additional quantitative analysis steps. Agreement between hand-drawn contour operators was assessed by computing the Dice score between their respective 2D contours for each image. The Dice score is able to quantify differences in automatically generated contours against reference contours yielding values in the range from zero to one with zero indicating no overlap while one indicates a perfect match. The Dice score is close to percent agreement when comparing contours that are similar to each other. Additionally, the distance transform of the area between the edges of 2D contours between operators was computed. Accuracy of the 3D-GAC approach was determined by taking the median Dice score between the automatic and the three hand-drawn 2D contours for each of the slices per image.
Furthermore, all hand-drawn 2D contours created by the three operators were combined to yield a smallest and largest contour. The voxels between these two contours describes an area of uncertainty for the hand-drawn contours. Voxels in the smallest hand-drawn 2D contour but not in the automatic one and voxels in the 3D-GAC contour but not in the largest hand-drawn 2D contour were extracted and a distance transform was applied as a measure for deviations of the 3D-GAC contours from the hand-drawn 2D ones. The automated default manufacturer pipeline was also run on all fracture images to generate outer contours and categorisation as for the 3D-GAC contours was performed.

Intact Radii: Comparison of the 3D-GAC and Manufacturer's Approach
The 3D-GAC approach and the default manufacturer pipeline were run on all intact radii. Contours of both approaches were visually assessed and categorized as described previously for the fractured radii. For each patient, one image for which both approaches were successful was randomly chosen for further analysis. Accuracy of the 3D-GAC method was assessed by computing the Dice score between each contour of the 3D-GAC method and the respective contour of the manufacturer's method. Additionally, the distance transform of the volume between the surfaces of both approaches was computed as another measure of how much the two methods deviated from each other. Reproducibility was determined by computing the Dice score of each follow-up image with the baseline image for each patient. Morphometrics were computed to assess how strongly the computation of standard morphometric indices is influenced by the choice of outer contour.

Statistics
Bland-Altman plots were created in Python to assess systematic differences between hand-drawn and 3D-GAC contours. Additionally, Bland-Altman plots were generated to assess systematic deviations between morphometric indices generated from the default manufacturer and 3D-GAC contours. To allow visual comparison of the magnitudes of deviations between contour area and different morphometric parameters, all Bland-Altman plots had their y-axis scaled by 13% of the maximum x-value throughout this paper, which was the maximum value not leading to clipping of data. Robust linear models were fitted for Bland-Altman plots with Huber's T for M estimation using the statsmodels package in Python [26]. This package also provides two-sided Student's t-tests for the linear models with zero slope and intercept of each Bland-Altman plot being the null-hypotheses, respectively. Here, the default significance level of p < 0.05 was implemented.
Statistical significance for the comparison of morphometrics was performed using a paired Student's t-test. Significance level was set to p<0.05.

Fractured Radii: Comparison of the 3D-GAC Approach to Hand-Drawn Contours
One of the main aims of the present study was to develop a fully automatic outer contouring approach for HR-PQCT images of distal radius fractures. An image processing pipeline using a combination of adaptive histogram equalization, watershed segmentation, iterative Gaussian blurred energy landscapes, and 3D-GAC was established such that 56 out of 60 images were automatically contoured successfully. In our hand-contoured reference dataset, we observed clear inter-operator variability, which is a known issue of hand-contouring bone [27]. On average, the automatically generated contours by the 3D-GAC algorithm agreed well with operators. Interestingly, both handdrawn and 3d-GAC contours had slightly larger inter-quartile ranges for appointments 2 and 3, likely as a result of the formation of a callus in this time period and the resulting low mineralization of background voxels visible in the HR-pQCT images. This makes it harder for both operators and the algorithm to decide on the correct contour. Given that no systematic deviations were detected between operators and the 3D-GAC approach and that deviations from individual operators were within the range of inter-operator variability ( Figure 5), the 3D-GAC contouring approach developed in this study provides accurate contours for HR-pQCT images of fractured distal radii compared to the gold standard.
As a point of reference, the default manufacturer's pipeline was used to generate outer contours of the fractured images. The default manufacturer pipeline only reliably contoured HR-pQCT images taken one-year post fracture ( Figure 3b). As this pipeline was not designed to work with images of impaired cortices, this might indicate structural weaknesses in the cortex of the healing radii even 6 months post fracture. This observation is in agreement with recent findings suggesting that fracture repair is a long-term process which might take as long as two years [28].

Intact Radii: Comparison of the 3D-GAC and Manufacturer's Approach
The second aim of the present study was to evaluate the robustness of the 3D-GAC contours against the default manufacturer pipeline by comparing the resultant morphometrics. Notably, the failure rate of the automated manufacturer default pipeline was much higher than that of the 3D-GAC approach for the HR-pQCT images of the intact radii (94 vs 12 failed contours, respectively). This may be a result of the patient cohort used in the present study since fracture patients have been shown to have inherently poorer bone quality than standard populations [29]. One of the major reasons for failure of the default manufacturer pipeline was the inclusion of the ulna into the outer contour of the radius, which is a known issue of this approach that is implementation dependent [10]. The second reason for failure was having obvious missing parts, which appeared to happen when the cortex had very thin or low-density structures (Figure 7). The authors would like to emphasise that these failures of the default manufacturer pipeline can be, and often are, overcome by generating hand-drawn contours using a built-in tool of the manufacturer's software. However, as noted previously, the cost in person-hours of such a solution is prohibitive in large patient cohort studies.
An automated solution for this issue was proposed by Buie and co-workers [10]; here, they increased the number of dilation steps of the manufacturer's pipeline but noted that this often lead to unwanted smoothing of the contour [10]. Moreover, defining per-image numbers of dilation steps would reduce the degree of automation drastically. Both approaches were affected by image quality, as expected. However, the 3D-GAC approach only failed for one image with sufficient image quality to be considered for inclusion in HR-pQCT studies [8,27,30]. Trabecular morphometric indices generated using the successful 3d-GAC contours agreed well with those generated using the default manufacturer pipeline with deviations well below the precision error of these indices [31].
Interestingly, for the cortical parameters, accuracy was slightly worse for Ct.Th generated using the 3D-GAC approach, showing deviations of 1-2%; however, these differences were still less than 1 voxel in thickness and thus at the accuracy limit of the device ( Figure 6). Furthermore, differences in cortical thickness were at the reproducibility limit of the standard patient evaluation (1.3-3.9% dependent on subject age) [32] and were still below differences observed between patient groups from the literature, e.g. healthy and osteopenic women (12.8%) [33]. Given the improved success rate and accurate determination of morphometric indices, the 3D-GAC approach showed an overall improvement in robustness compared to the current gold standard.
Lastly, the present study aimed to determine the reproducibility of the 3D-GAC approach using sequential image scans of the intact radii taken over the course of a year. Here, agreement with the baseline contour was found to be high for all time points (Dice scores > 0.994). Again, deviations in Ct.Th were highest, but well within the reproducibility limit of the standard patient evaluation.
Given the improved robustness and sufficient reproducibility, the 3D-GAC approach appears to be a viable option for clinical studies of both fractured and non-fractured radii.

Limitations
This study is not without limitations. The repeat scans of the contralateral site were spaced over a year, and thus may include variations in bone due to natural remodelling. Additional reproducibility scans could not be taken for these patients due to radiation concerns. Future studies should include follow-up scans at protracted intervals to assess how sensitive the 3D-GAC approach is to local and global remodelling. Due to the large structural changes happening during fracture repair, the images of the fracture site could not be regarded as repeat scans. Therefore, no direct assessment of reproducibility was possible for images of fractured radii. However, the high accuracy observed between the hand-drawn and 3D-GAC contours of the fractured images indicates a high level of reproducibility. Another limitation is the low number of hand-drawn contour slices (270) generated to use as reference. While time was a limiting factor (operators reporting to take roughly 1-2 minutes per slice) for creating more hand-drawn contours, we compensated by including images of the intact contralateral site to cover a large variety of bone morphologies in our assessment of the 3D-GAC robustness and repeatability. These images of the intact radius provided a means to benchmark the 3D-GAC approach against existing automatic approaches for intact radii. Note that the 3D-GAC contours were generated only proximal to the lunate fossa, excluding the articulating surface.
Although this limits the range over which the 3D-GAC approach can be applied, the morphometric analysis of the standard patient evaluation has only been developed for that region; as such, this region remains the most relevant for clinical studies. Lastly, no hand-corrected contours were generated for intact radii to estimate the bias introduced by the automated manufacturer and 3D-GAC protocols. However, recent studies have already done this for the gold standard [34] and, given the good agreement of the 3D-GAC approach with the gold standard, the requirement for minor manual corrections should be on a similar level.

Conclusion
Our proposed 3D-GAC algorithm provides a unified pipeline for generating outer contours from HR-pQCT images of distal radii, including both fractured and intact bones. Importantly, our method automatically contours radii both accurately and reproducibly, while showing robustness when dealing with a wide variety of cortex structures. This will facilitate future clinical studies using large patient cohorts to support the development of improved treatment protocols of radius fractures.
The authors acknowledge the Swiss National Science Foundation (320030L_170205), the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska -Curie grant (agreement 841316), and the ETH Postdoctoral Fellowship for financial support. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Dr. Michael Blauth consults for the Clinical Medical Department of DePuy Sythes in Zuchwil, Switzerland. The remaining authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.