Estimation of bias and variance of measurements made from tomography scans

Tomographic imaging modalities are being increasingly used to quantify internal characteristics of objects for a wide range of applications, from medical imaging to materials science research. However, such measurements are typically presented without an assessment being made of their associated variance or confidence interval. In particular, noise in raw scan data places a fundamental lower limit on the variance and bias of measurements made on the reconstructed 3D volumes. In this paper, the simulation-extrapolation technique, which was originally developed for statistical regression, is adapted to estimate the bias and variance for measurements made from a single scan. The application to x-ray tomography is considered in detail and it is demonstrated that the technique can also allow the robustness of automatic segmentation strategies to be compared.


Introduction
Tomography refers to the imaging of the interior of a specimen in sections or slices, from which a 3D representation of the specimen can be constructed. A key application of tomography is the measurement of characteristics of internal features. For example, x-ray computed tomography (XCT) is being increasingly used as the basis of non-destructive measurement schemes across a wide range of disciplines. Measurements include, for example, tumour size in medical CT scans [1,2] and porosity, defect size and crack growth in materials science and engineering applications [3][4][5][6]. However, such measurements are typically presented without an assessment being made of their associated variance or confidence interval. In particular, noise in scan radiographs places a fundamental lower limit on the variance (precision) and bias (accuracy) of measurements made on the resulting volumetric data sets. The complete process from scan acquisition to quantification should therefore be viewed as an estimation process.
Measurement uncertainty in XCT has recently received much attention for industrial dimensional metrology [7,8]. In this application, a representation of the surface of a sample is generated from XCT data and then compared to a computer-aided design (CAD) model or reference workpiece to determine if the sample falls within a given tolerance. The overall measurement uncertainty has contributions from [7] (1) the geometric calibration of the tomography system, (2) the measurement process itself (from acquisition to quantification) and (3) variations in the object, arising from surface roughness and thermal expansion for example. There are a number of approaches to estimate (1), as reviewed in [9], which can be carried out before scanning the object of interest. Furthermore, (3) can be minimised by maintaining a suitably controlled environment around the sample. However, the measurement process uncertainty is the most difficult to determine. Kruth et al [7] reviewed methods for assessing this uncertainty which are applicable to dimensional metrology and suggested that approaches based on simulations or repeated measurements were the most promising. The simulation approach relies on a representative virtual sample being specified in terms of its geometry and its spatially varying complex refractive index. Ray tracing and Monte Carlo methods are then used to generate synthetic CT data, for which an accurate model of the tomography system is required, including probability distributions of all random variables. Such a scanner model will be complex and difficult to specify accurately [10,11], and the simulation process is time consuming. A simplified bootstrap approach was proposed by Hiller et al [12] to estimate certain aspects of the measurement uncertainty.
The alternative, experimental, approach is to take repeated measurements of a calibrated test specimen, which is of similar geometry and material as the actual samples [13]. The external dimensions of the test specimen are measured by a 'gold standard' technique (such as a coordinate measurement machine), enabling both the bias and repeatability of the CT measurements to be estimated. A correction for systematic bias can then be applied in subsequent measurements. Simulations can be combined with experimental results in order to reduce the experimental effort [14].
For general tomography applications, taking repeated measurements is not practical for a number of reasons: (1) scan times can be prohibitively long for high resolution data (e.g. sub-micron resolution XCT scans can take many hours, particularly for low contrast specimens [15,16]) or for clinical scanning; (2) it may be important to minimise dose (e.g. for dose sensitive materials or for clinical and preclinical CT); and (3) specimens may change over time (e.g. in situ x-ray and neutron tomography scans of dynamic processes [6,17,18]). Furthermore, a 'gold standard' calibration measurement is not available for bias assessment except in specific cases, such as for porosity measurements by mercury porosimetry [19]. The simulation approach is similarly not generally applicable, as neither the geometry and composition of the sample nor the scanner model will be known to high accuracy.
The effects of random errors in data have been widely studied in the context of statistical inference. Errors-in-variables or 'measurement error' models are regression models that correct for random errors in the explanatory variables (covariates) [20,21]. These errors, if uncorrected, lead to bias in the parameter estimates, loss of power in detecting signals and the masking of features of the data [21]. The simulation-extrapolation (SIMEX) technique is a general purpose measurement error correction technique developed in the 1990s [22][23][24][25]. The SIMEX method involves two stages; the first consists of simulations to add random noise (with known variance and probability distribution) to the error-prone covariates. For different noise variances, the mean values of the resulting parameter estimates are calculated over a large number of simulation repeats. The second stage involves using the variation of the parameter estimates with noise level to extrapolate back to the values when there is no measurement error. Cook and Stefanski [22] have shown that the SIMEX method produces an approximately consistent estimator of the true values, where the approximation relates to the accuracy of the extrapolation step. SIMEX is applicable when the bias is a smooth monotonic function of noise level, and when the noise level and distribution are well understood. A key advantage of the SIMEX method is that it is general purpose: it makes no assumptions about the distribution of the unobserved true covariate (only the error distribution must be known).
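The two SIMEX stages can be illustrated with a minimal sketch. The example below is hypothetical (a single noisy observation and the toy nonlinear "measurement" f(q) = q², whose bias is exactly σ²), not the paper's tomographic setting: extra noise is added at several λ, the mean measurement is computed at each λ, and a straight line is extrapolated back to λ = −1.

```python
import random

random.seed(0)

P, sigma = 2.0, 0.5                 # true value and known noise std (illustrative)
Q = P + random.gauss(0, sigma)      # a single noisy observation

def f(q):
    # nonlinear "measurement"; E[f(P + noise)] = P^2 + sigma^2, i.e. biased
    return q * q

naive = f(Q)

# Stage 1 (simulation): add extra noise scaled by sqrt(lambda), average over B
lambdas = [0.5, 1.0, 1.5, 2.0]
B = 20000
means = []
for lam in lambdas:
    vals = [f(Q + (lam ** 0.5) * sigma * random.gauss(0, 1)) for _ in range(B)]
    means.append(sum(vals) / B)

# Stage 2 (extrapolation): fit E[M'(lambda)] linearly, evaluate at lambda = -1
n = len(lambdas)
xbar = sum(lambdas) / n
ybar = sum(means) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(lambdas, means))
         / sum((x - xbar) ** 2 for x in lambdas))
m_simex = ybar + slope * (-1.0 - xbar)

print(naive, m_simex)
```

Because E[f(Q + √λσZ)] = Q² + λσ² is exactly linear in λ here, linear extrapolation removes the σ² bias up to Monte Carlo error; in general the extrapolant must be chosen to match how the bias varies with noise level.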
In this paper, a framework is presented for applying the SIMEX method to investigate the variance and bias in measurements made from volumetric data, as arising from noise in the tomography acquisition. The adaptation of the original SIMEX framework involves replacing the solving of the estimating equation in stage 1 with a 'tomographic measurement process', which comprises tomographic acquisition followed by a 'volumetric measurement process' for making measurements from the reconstructed data. Detailed consideration is given to the specific application to XCT, through a simulated test case and two case studies, and it is demonstrated that the method can be used to estimate the bias and uncertainty of measurements made from a single scan. Consideration is also given to minimising systematic sources of bias.

Volumetric and tomographic measurement processes
A 'volumetric measurement process' takes as its input a 3D volume of scalar data, and outputs one or more values which measure certain characteristics of the data. The process will generally involve a number of steps, such as filtering (noise reduction) and segmentation of the volumetric data into regions (or materials), followed by a calculation made on the binary (or labelled) data. The process can be described by a function f_V, such that a scalar measurement M of the volumetric data V(r), for spatial coordinate r, is given by M = f_V(V(r)). f_V is taken to be deterministic, such that the only sources of variation are from the volumetric data. Manual segmentation is therefore not considered. SIMEX could be applied directly to the volumetric data if the noise characteristics of each voxel are known (see section 2.2). However, the tomography process typically results in the noise between nearby voxels being correlated and the noise power spectrum varying with position r [33]. This can occur due to the reconstruction step as well as due to detector blurring. In the case of transmission tomography, analytical expressions for the noise power spectrum in parallel beam, fan beam, and cone beam reconstructions have been derived for Fourier-based reconstruction algorithms [34][35][36]. However, in practice it may be difficult to obtain a good estimate of the noise characteristics, particularly when image processing steps are applied before reconstruction. Furthermore, experimental determination of the 3D noise characteristics is highly involved and time consuming [33].
In this paper, a more general approach is taken which is applicable to any particular tomographic imaging technique (which features additive noise), geometry, reconstruction algorithm and pre-processing steps. The general approach involves combining the tomographic acquisition with the volumetric measurement into a 'tomographic measurement process', as shown in figure 1. The process provides a mapping, f_T, from the ordered set of pixel intensities, {Q_k}, which comprise the sequence of scan radiographs or raw data, to the output measurement M = f_T({Q_k}). Again, f_T is taken to be deterministic, such that any variation is due to noise in {Q_k}. In figure 1, examples are given of the types of algorithms which are typically applied at each step of the process. In XCT, corrections are often made to the raw projection data to reduce beam hardening artefacts, ring artefacts and scatter. However, the scheme is equally applicable to destructive tomography techniques, such as serial sectioning in an SEM, in which the reconstruction process would comprise alignment of the image stack and pre-processing steps may include correction for non-uniform illumination and perspective foreshortening [37]. Many of these processing steps are non-linear, so that noise in the acquisition will lead to bias in the measurement. In related work, Rajbhandary and Pelc [38] showed that the spectral range of polychromatic x-rays and the log transformation step taken during tomographic reconstruction introduce bias in material decomposition analysis, with the bias increasing with noise level.

SIMEX applied to a tomographic measurement process
For simplicity, consider the case in which the set of measured intensities, {Q_k}, of the N pixels in the detector are subject to additive, mutually independent noise ε_k with mean zero and standard deviation σ_k, so that Q_k = P_k + ε_k, where P_k are the 'true' pixel values, which are taken to be constant for a given specimen and imaging setup (including specimen stage position and rotation). The naïve approach (using terminology from the statistics community) would be to apply the tomographic measurement process directly to the measured pixel intensities, M_NAIVE = f_T({Q_k}). However, the naïve value will be biased away from the true value, M_TRUE = f_T({P_k}), with the magnitude varying with σ_k, as shown by the second order Taylor series expansion:

E(M_NAIVE) ≈ M_TRUE + (1/2) Σ_k σ_k² (∂²f_T/∂Q_k²).    (1)

The SIMEX method involves adding noise to the pixel intensities with the same form as ε_k, such that a new set of intensities is calculated as:

Q_k′(b, λ) = Q_k + √λ σ_k Z_{b,k},    (2)

where Z_{b,k} is an independent random variable having the same distribution (PDF) as ε_k but with variance of 1. The expected variance of Q_k′ over a large number of simulation repeats b = 1, 2, …, B is then σ_k²(1 + λ). The expectation of the corresponding measurement M′ over the B simulation repeats is then:

E(M′(λ)) ≈ M_TRUE + (1/2)(1 + λ) Σ_k σ_k² (∂²f_T/∂Q_k²).    (3)

Provided that E(M′) is a smooth and continuous function of λ, the true value of the measurement can be estimated by calculating E(M′) for a range of values of λ (typically between 0 and 2) and extrapolating to λ = −1, where the expected variance of Q_k′ tends to zero. That is:

M_SIMEX = lim_{λ→−1} E(M′(λ)).    (4)

SIMEX therefore provides an approximately consistent estimator of the true measurement. The accuracy of the estimator, M_SIMEX, is limited by the extrapolation step. Further details can be found in references [21,22,24,25]. SIMEX can therefore be used for bias estimation, and hence bias reduction, provided that the σ_k are known or well estimated.
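The key property of the resampled intensities, namely that their total noise variance relative to the true pixel value is σ_k²(1 + λ), can be checked numerically. The pixel value and noise level below are illustrative, not from the paper:

```python
import random, statistics

random.seed(1)

sigma_k = 3.0        # known noise standard deviation of pixel k (illustrative)
Q_k = 100.0          # measured intensity, already containing noise of variance sigma_k^2
lam = 0.5
B = 50000

# resampled intensities Q'_k(b) = Q_k + sqrt(lam) * sigma_k * Z_b
resampled = [Q_k + (lam ** 0.5) * sigma_k * random.gauss(0, 1) for _ in range(B)]

added_var = statistics.pvariance(resampled)     # variance of the added noise alone
total_var = sigma_k ** 2 + added_var            # total variance about the true P_k
print(added_var, total_var)
```

The added variance converges to λσ_k², so the total noise variance about P_k is σ_k²(1 + λ), which is what makes extrapolation to λ = −1 (zero total variance) meaningful.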
The method can be extended to account for any correlations in noise between pixels, by generating noise from the appropriate multivariate distribution using the known variance-covariance matrix. A noise model to estimate σ k for XCT is developed in section 2.4, and applied to two laboratory XCT systems featuring CCD based detectors in section 4.1. Consideration is given to correlations between pixels for the two systems.
A limitation of SIMEX is that it is not straightforward to calculate the standard error of M_SIMEX. Analytical methods to produce 'asymptotic' standard errors have been derived for homogeneous measurement errors in regression applications [23]. Otherwise, the variance can be estimated in a similar way to M_SIMEX, namely by calculating the variance of {M_b′(λ)} for several λ and extrapolating to λ = −1. The extrapolated value, multiplied by −1, yields an approximate estimate of the variance, var(M_SIMEX) [24]. The variance var{M_b′(λ)} = 0 when λ = 0, so the procedure can be simplified in the case of linear extrapolation by only using the variance calculated for the minimum λ > 0:

var(M_SIMEX) ≈ var{M_b′(λ_min)}/λ_min.    (5)
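The simplification follows directly from fitting a straight line through the origin: a sketch, with illustrative variance values.

```python
def simex_variance(var_at_lambda_min, lambda_min):
    # Linear extrapolation of the simulation variance through the origin:
    # v(lam) = (v_min / lambda_min) * lam.  The fitted line evaluated at
    # lam = -1, multiplied by -1, is simply v_min / lambda_min.
    slope = var_at_lambda_min / lambda_min
    extrapolated = slope * -1.0
    return -extrapolated

print(simex_variance(0.8, 0.25))   # illustrative numbers
```

With var{M_b′(0.25)} = 0.8 this gives 3.2; when λ_min = 1 the estimate is just the simulation variance itself, which connects to the naïve-variance estimate discussed next.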

Extension to estimate the naïve measurement variance
It is useful to estimate the variance of the naïve measurement, var(M_NAIVE), particularly when the estimated bias is low or negligible. A parametric bootstrap method can be used for this, in a variation of the approach taken to estimate the voxel variances in positron emission tomography (PET). In the sinogram-based bootstrap approach for PET [39,40], new sinograms are generated by first drawing samples from a Poisson distribution with parameter equal to the corresponding bin values of the original sinogram. The voxel variances are calculated from the reconstructions of the resampled sinograms. A similar procedure can be applied for our purposes by generating resampled projections (or raw data) by adding noise to the 'noise-free' estimates of {P_k}. The complete tomographic measurement process is then applied to the resampled pixels, and the measurement process variance estimated. If the measured Q_k are taken, on average, to be equal to the true (noise-free) values, P_k, then this bootstrap approach is equivalent to the simulation step of SIMEX for λ = 1. It can be anticipated, therefore, that the variance of M′ for λ = 1 will provide a good estimate of the variance of M_NAIVE when the bias in M_NAIVE is 'low'. It is reasonable to expect that the tomographic measurement process will show some robustness, which can be seen theoretically by expanding M_b′ to second order and comparing its expected variance to that of M_NAIVE calculated over s = 1, …, S repeat scans. In the case of the pixel noise being independent for simplicity, and setting λ = 1:

var(M_NAIVE) ≈ var_b{M_b′(λ = 1)}.    (6)

Therefore, the variance over SIMEX repeats of M′ for λ = 1 is expected to provide a good estimate of the variance of M_NAIVE, provided that f_T is a smooth function and that the bias is low and slowly varying.
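The parametric bootstrap idea can be sketched with a toy measurement process (a hypothetical stand-in for f_T: the fraction of pixels segmented above a threshold) and illustrative pixel values; here the noise-free estimate is taken to be exact, so the bootstrap variance should match the variance over genuinely repeated scans.

```python
import random, statistics

random.seed(2)

P = [80.0] * 50 + [120.0] * 50      # 'true' pixel values: background + object
sigma = 15.0                        # per-pixel noise std (illustrative)
thr = 100.0

def f_T(pixels):
    # toy measurement process: fraction of pixels segmented as object
    return sum(1 for v in pixels if v > thr) / len(pixels)

def scan(p):
    # one noisy acquisition of the pixel values p
    return [v + random.gauss(0, sigma) for v in p]

# variance over genuinely repeated scans (normally unavailable in practice)
true_var = statistics.pvariance([f_T(scan(P)) for _ in range(2000)])

# parametric bootstrap from a single scan: resample around the noise-free
# estimates of P (here taken to be exact); this is the SIMEX lambda = 1 step
P_hat = list(P)
boot_var = statistics.pvariance([f_T(scan(P_hat)) for _ in range(2000)])
print(true_var, boot_var)
```

In practice P_hat would be obtained by denoising the measured projections, so the agreement degrades as the bias in the measurement grows.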

A noise model for XCT
Noise in radiographs will contain contributions from detector noise (including thermal, readout and quantization noise) as well as from counting statistics. A complete model of the detector noise characteristics can be developed via a cascaded systems approach, such as that presented in [41]. In this section, a simpler model is developed that is better suited for practical implementation. The signal in the kth pixel can be modelled as follows:

Q_k = γ_k + ε_k^det + C_k^dark,    (7)

where γ_k is the contribution to the signal from the detected photons, ε_k^det is the total detector noise (with variance σ_k,det²) and C_k^dark is the dark current (with variance σ_k,dark²). The form of γ_k depends on the type of detector [42]: if the detector is photon counting, γ_k will be proportional to the weighted sum over photon energy of the number of detected quanta and will follow Poisson statistics; if the detector is energy integrating (representative of most detectors) then the signal will be proportional to the total imparted energy and will follow compound Poisson statistics. The detector is assumed to have a linear response, and in either case the statistical noise (shot noise) variance of γ_k is proportional to the mean value γ_k, with constant of proportionality α [42,43]. This has been reported to be the case for a range of detectors [42,44]. The dark current is typically measured before the start of the CT scan, by averaging a series of dark images, and is subtracted from both the projection images and the reference (flat-field) image, which is itself averaged over n_ref frames with mean photon signal γ_k^ref; the transmission is then T_k = γ_k/γ_k^ref. In the case of the pixel noise being independent, the variance of T_k is then given as follows, as shown in the appendix:

σ_k² ≈ (αT_k + ψ_k^det/γ_k^ref)/γ_k^ref + T_k²(α + ψ_k^det/γ_k^ref)/(n_ref γ_k^ref),    (8)

where ψ_k^det is the net detector noise variance (combining the detector and dark current terms). Therefore, resampled transmission images in the 1st stage of the SIMEX method can be generated by adding heteroscedastic noise of the above form. The practical application of the noise model for two microCT systems is considered in section 4.1.
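A sketch of generating heteroscedastic transmission noise is given below. The variance expression coded here is an assumed error-propagation form (shot noise on the object image plus shot and net detector noise on a reference image averaged over n_ref frames); it is an illustrative assumption and not necessarily identical to the paper's equation (8), and the γ_ref and T values are hypothetical.

```python
import random, statistics

random.seed(3)

def noisy_transmission(T, gamma_ref, alpha=1.0, psi_det=0.0, n_ref=21):
    # Assumed variance model (error propagation through flat-field correction):
    # shot noise on the object image, plus shot + net detector noise on the
    # reference image averaged over n_ref frames.  Illustrative assumption,
    # not the paper's derived expression.
    var = ((alpha * T + psi_det / gamma_ref) / gamma_ref
           + T * T * (alpha + psi_det / gamma_ref) / (n_ref * gamma_ref))
    return T + random.gauss(0.0, var ** 0.5)

# shot-noise-only case (detector noise and dark current set to zero)
samples = [noisy_transmission(0.5, gamma_ref=1000.0) for _ in range(20000)]
print(sum(samples) / len(samples), statistics.pvariance(samples))
```

Note that the noise added is zero-mean, so the resampled transmissions remain unbiased estimates of T; the heteroscedasticity enters through the dependence of the variance on T and on the reference intensity.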

Simulation test case
The feasibility of applying SIMEX to XCT data was assessed using a simulated parallel beam scan of a uniform cylinder over a wide range of noise levels. Two measurements were considered, namely the cylinder cross-sectional area and radius.

Description of the simulation
The cylinder had a radius of 10 voxels and the simulated scan was carried out using a 1D line detector of 201 pixels, with 316 radiographs being taken over 180°. The cylinder was taken to be non-scattering and the transmission through the centre of the cylinder, T_c, using monochromatic illumination, was varied from 86% to 2.75% to cover a wide range of materials from soft tissues to highly absorbing superalloys. Gaussian noise was added to the simulated projections using equation (8), with α set to 1 and the noise level (γ_ref^(−1/2), which was set to be the same for all pixels, corresponding to uniform illumination of the field of view) being varied over a wide range from 0 to 0.25 (detector counts as low as 16), while n_ref was set to 21. The detector noise and dark current were set to zero so that the effects of shot noise could be considered in isolation. Filtered back-projection was used to reconstruct the volumetric data and the cylinder was segmented from the background by applying a threshold set halfway between the grey levels of the object and background for the no-noise case. The largest connected set of pixels was then selected as being the cylinder and any internal holes were filled. Examples of reconstructed slices and corresponding binary segmentations are shown in figure 2. Two measurements were considered, namely the cylinder cross-sectional area (with theoretical value of 314 pixels²) and radius (nominal value of 10 pixels) as determined by the maximum of the distance transform within the segmented region. The process was repeated 200 times for each noise level. SIMEX was applied to each simulated scan, using equations (2) and (8), with λ taking the values 0, 0.25, 0.5, 1, 1.5, 2. For each λ > 0, 75 simulation repeats were made. The complete process was repeated 50 times for each combination of scan contrast and noise level.
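The qualitative effect of noise on a thresholded area measurement can be reproduced with a much-simplified sketch that skips projection and reconstruction entirely: noise is added directly to a 2D image of a disk, which is then thresholded at half the object contrast. All dimensions and noise levels are illustrative, not the paper's simulation.

```python
import random

random.seed(4)

R, N = 10, 41            # disk radius and image size in pixels (illustrative)
cx = cy = N // 2

def measure_area(noise_std):
    # threshold a noisy image of a unit-contrast disk at 0.5 and count pixels
    count = 0
    for y in range(N):
        for x in range(N):
            inside = (x - cx) ** 2 + (y - cy) ** 2 <= R * R
            value = (1.0 if inside else 0.0) + random.gauss(0, noise_std)
            if value > 0.5:
                count += 1
    return count

low = sum(measure_area(0.05) for _ in range(20)) / 20    # low noise
high = sum(measure_area(0.60) for _ in range(20)) / 20   # high noise
print(low, high)
```

At low noise the measured area equals the pixelated disk area, while at high noise the measured area is biased upwards because many more background pixels flip above the threshold than disk pixels flip below it, mirroring the behaviour seen in the full tomographic simulation.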

Variation of measurements with noise
The measurements of area and radius are shown in figure 3 as a function of transmission and noise level. Area generally increases with noise level as noisy background voxels become included in the segmentation (see figure 2(c)), while radius decreases as holes in the segmentation appear at the cylinder edges. Both measurements vary smoothly and show robustness to noise, as the gradients tend to zero at low noise. In the 'low bias' regime, the bias is close to zero (e.g. <1 pixel² for area and 0.05 pixels for radius) and slowly varying, but the standard deviation increases more rapidly with noise level. More generally, the variation can be complex and not strictly monotonic, depending on both the contrast and noise level.
When the contrast of the object is relatively low (T_c ~ 60% or greater), the variation for both measurements can be approximated well by polynomial functions up to high biases, as shown in figures 3(c) and (f). The gradients are close to zero in the low bias regime and are approximately constant at high bias. When the behaviour is more complex (figures 3(a) and (b)), simple extrapolants would still be expected to yield accurate predictions, but over a restricted noise range (i.e. close to the low bias regime). In practice, scans are unlikely to be taken with noise levels greater than ~0.1 (see section 4), particularly if manufacturers' recommendations are followed (e.g. the Zeiss Xradia Versa user manual recommends a minimum detector count of 2000, which corresponds to a noise level of ~0.016; see section 4.1).

Bias reduction
SIMEX was applied using the quadratic and cubic extrapolants that capture the variation of bias with noise level over a reasonable range (see figure 3). To ensure that the extrapolants were well behaved, the gradients were set to zero at λ = −1. For radius, linear extrapolation was also considered, and a restricted range of λ, namely {0, 0.25, 0.5}, was found to provide more accurate estimates as this was less sensitive to the decrease in gradient at high biases (see figure 3). Figure 4 shows that SIMEX on average can achieve a useful amount of bias reduction over a wide range of contrasts and noise levels, as well as confirming when the bias is low or negligible. For example, the majority of the bias (over 50%) in both area and radius was eliminated at a noise level of γ_ref^(−1/2) = 0.1 for T_c = 86.1%. In general, the cubic function provided the better estimates for area, while the linear function was more accurate for radius. The trends in the estimates largely reflect how well the local fit of the function (over λ) extrapolates to capture the lower noise variation. For example, the performance was not as good when the gradient of bias with noise level decreases, as for T_c = 13.5% at high noise levels. The performance of SIMEX as a function of the number of simulation repeats, B, was investigated for T_c = 86.1% and a noise level of γ_ref^(−1/2) = 0.125, as shown in figure 5. The SIMEX bias estimates show little variation with the number of simulation repeats, B. However, the standard deviation does increase rapidly as B is reduced to below ~17. Practical application of SIMEX will therefore be a compromise between computation time and variability of the SIMEX estimate. These results indicate that a reasonable trade-off can be achieved with B as low as ~11.
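Constraining the extrapolant gradient to zero at λ = −1 can be done by reparameterising: a quadratic of the form y = a + c(λ + 1)² automatically has zero slope at λ = −1, and a is then the extrapolated SIMEX estimate. The sketch below fits this two-parameter model by ordinary least squares; the synthetic data are illustrative.

```python
def fit_constrained_quadratic(lams, ys):
    # least-squares fit of y = a + c*(lam + 1)^2, which by construction has
    # zero gradient at lam = -1; 'a' is then the extrapolated value M(-1)
    u = [(l + 1.0) ** 2 for l in lams]
    n = len(u)
    su, sy = sum(u), sum(ys)
    suu = sum(v * v for v in u)
    suy = sum(v * y for v, y in zip(u, ys))
    c = (n * suy - su * sy) / (n * suu - su * su)
    a = (sy - c * su) / n
    return a, c

# synthetic check: data generated with a = 5, c = 0.3 should be recovered
lams = [0.0, 0.25, 0.5, 1.0, 1.5, 2.0]
ys = [5.0 + 0.3 * (l + 1.0) ** 2 for l in lams]
a, c = fit_constrained_quadratic(lams, ys)
print(a, c)
```

The same reparameterisation extends to a constrained cubic by adding a d(λ + 1)³ term, at the cost of one more column in the least-squares system.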

Estimation of the naïve measurement variance
The standard deviation of the naïve measurement, σ_M, was estimated using equation (6) over a wide range of noise levels for T_c = 60.7%, as shown in figure 6.
The predicted standard deviations for cross-sectional area were in good agreement with the actual values for noise levels up to 0.1 (measured σ_M being 3.0 ± 0.5 pixels²), where the bias in the naïve measurement is low. Above this noise level, as the bias in the naïve measurement increases, the predicted standard deviations become an overestimate. The method is less accurate for the radius as this shows a more complex variation with noise level. However, the method can still provide a reasonable approximation for noise levels up to 0.1 (corresponding to the measured σ_M being 0.20 ± 0.02 pixels). At higher noise levels, the method can still provide a good order of magnitude estimate, with the predicted standard deviations being within a factor of 2 of the actual values even at the highest noise level considered here.

Estimation of the SIMEX measurement variance
The variance of the SIMEX estimate, var(M_SIMEX), is shown in figure 7 as a function of increasing noise for T_c = 60.6%. The variation is predominantly linear for both area and radius except at low noise levels. Estimates of the variance using equation (5) with λ_min = {0.25, 1} are shown for comparison. Overall, better estimates are produced with λ_min = 1, which indicates that the majority of the variance is associated with that of the naïve measurement (see equation (6)). It is likely that the values for λ_min = 0.25 are affected by the decrease in the gradient at low noise levels (or equivalently when λ is small), as indicated by the curves in figure 7.

Case studies
SIMEX was applied to measurements made on an aluminium cylinder and a mouse femur, which were scanned using two different cone-beam micro XCT systems, but of similar design, namely a Zeiss Xradia microXCT-400 and a Zeiss Xradia versa-520. The scan settings are shown in table 1 and a range of noise levels were considered by varying the exposure time per projection.
Repeat scans were taken to enable the variance of the measurements at each noise level to be calculated. However it was found that systematic changes in the scanners resulted in large variations in the measurements, above that attributable to noise in the scan radiographs. Therefore a segmentation strategy was developed to reduce these systematic effects, which is discussed in section 4.2. The application of the noise model to the two scanners is detailed in section 4.1. SIMEX was applied using code developed in Matlab 2013a (Mathworks, Natick, MA, USA) in which the FDK reconstructions were run automatically by calling the Zeiss Xradia XMreconstructor software from the command line. Code is available from www.github.com/rsbradley/tomotools. The results for the two samples are detailed in sections 4.3 and 4.4.

Application of the XCT noise model
The noise characteristics of the imaging system can be determined from the noise model given by equation (8), by measuring the noise variance, σ_k², as a function of pixel intensity for {T_k} = 1 (i.e. no object being imaged). However, there are likely to be systematic changes in pixel intensity over repeat radiographs (see section 4.2). Instead, the noise characteristics can be measured from the variance of a small region of a projection image, where Q_k^ref is approximately constant, provided the net detector noise, ψ_k^det, is constant or slowly varying over the detector and the correlations between pixels are small. This can be shown mathematically by making use of the fact that for a collection of random variables X_i (i = 1, 2, …, n) with finite means E(X_i) = μ_i and finite variances σ_i², the expectation of the sample variance (S²) of the X_i can be written as [45]:

E(S²) = (1/n) Σ_i σ_i² + (1/(n − 1)) Σ_i (μ_i − μ̄)² − (1/(n(n − 1))) Σ_{i≠j} ρ_ij σ_i σ_j,

where the last term equals the average value of the off-diagonal elements of the covariance matrix, with ρ_ij being the correlation coefficient between X_i and X_j. Zhang et al [44] found that for a Varian flat panel detector there were only non-zero correlations between the nearest neighbour pixels. 75 repeat images were taken for a range of exposure times for the two microCT systems, operating under the same conditions as detailed in table 1. A central 7 × 7 image patch was taken for calculation of the pixel variances and the correlation coefficients. Figure 8 shows that for the versa-520 system there were essentially no correlations between pixels, and for the microXCT-400 system there were only substantial correlations between the 8 nearest neighbours, with correlation coefficients in the range 0.16-0.52. Therefore, the underestimation in the variance from not accounting for the correlations between pixels is expected to be less than 2% (assuming the σ_i are equal) for the CT systems used in this study.
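The expectation of the sample variance for correlated variables can be checked numerically with a small example (three Gaussian variables, only the first pair correlated; all means, variances and the correlation coefficient are illustrative):

```python
import random

random.seed(5)

mu = [0.0, 1.0, 2.0]        # means
sigma = [1.0, 1.0, 1.0]     # standard deviations
rho12 = 0.5                 # only X1 and X2 are correlated

# formula: E(S^2) = mean(sigma_i^2) + sum((mu_i - mubar)^2)/(n - 1)
#                   - average off-diagonal covariance
n = 3
mubar = sum(mu) / n
expected = (sum(s * s for s in sigma) / n
            + sum((m - mubar) ** 2 for m in mu) / (n - 1)
            - (2 * rho12 * sigma[0] * sigma[1]) / (n * (n - 1)))

trials = 100000
acc = 0.0
for _ in range(trials):
    z1, z2, z3 = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    x1 = mu[0] + sigma[0] * z1
    x2 = mu[1] + sigma[1] * (rho12 * z1 + (1 - rho12 ** 2) ** 0.5 * z2)
    x3 = mu[2] + sigma[2] * z3
    m = (x1 + x2 + x3) / 3.0
    acc += ((x1 - m) ** 2 + (x2 - m) ** 2 + (x3 - m) ** 2) / 2.0  # sample S^2
empirical = acc / trials
print(expected, empirical)
```

The simulation also makes the two correction terms visible: unequal means inflate E(S²) above the mean variance, while positive correlations deflate it, which is why a patch with nearly constant intensity and weak pixel correlations is needed for the variance estimate.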
Figure 9 shows that equation (8) can accurately model the pixel standard deviations for both tomography systems used in this work, provided that T k can be estimated. In the following work, 'noise-free' estimates of T k were generated by filtering the transmission images with a median filter of size 5 × 5 pixels.

Systematic variation and segmentation strategy
Systematic changes in the x-ray source and detector over time can lead to changes, between scans, in the grey-scale histograms of the reconstructed volumes. Segmentation based on a constant threshold will therefore be particularly sensitive to these systematic changes. Characteristics of x-ray sources, such as x-ray flux, spectrum and focal spot position, can vary over a range of timescales. Modern x-ray tubes can achieve a relatively stable output flux, after an initial warm-up period, over the typical length of a ~1 micron resolution scan (i.e. <1% variation over 8 h or so). By contrast, the focal spot position can take a few hours to stabilise [11,46]. Over longer times, Fukuda et al [47] found that the half-value layer (the thickness of material required to reduce the x-ray flux by half, which depends on the x-ray spectrum) varied for a particular x-ray source by ±2% over a 103 week period, while for another source the entrance air kerma increased by ~11%. Similarly, the output of x-ray detectors can vary over a range of timescales [48]. For example, flat panel detectors can show significant lag [49,50]. Over longer times, for both direct and indirect CMOS flat panels, Han et al [51] found that the dark pixel signal increased approximately quadratically with dose, reducing the dynamic range of the detector.
In this work, an automatic strategy for binary segmentation is developed which is less affected by these systematic variations. This 'volumetric thresholding' strategy is based on setting the threshold value such that the segmented volume of object in a given ROI is kept constant across data sets. The ROI must contain object as well as background. This is appropriate to use if one or more regions of the object do not change between scans. If the object does change, or if different objects are scanned, then a 'volumetric standard' could instead be placed in the scanner field of view. The volumetric standard should have approximately the same composition as the objects, and its segmented volume should be the same across all data sets. The strategy is applied by using a threshold to segment the volumetric standard in the 1st data set. The threshold for subsequent data sets is then adjusted until the segmented volume equals that obtained for the 1st. The optimisation of the threshold was carried out using the Nelder-Mead simplex method [52] in Matlab.
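For discrete voxel data the same idea can be sketched without an iterative optimiser: sorting the voxel values and placing the threshold midway between the last included and first excluded value matches the target volume exactly. This is a simplified stand-in for the Nelder-Mead search used on the real volumes, and the voxel values below are hypothetical.

```python
def volumetric_threshold(voxels, target_volume):
    # choose the grey-level threshold so that exactly target_volume voxels
    # are segmented as foreground (value > threshold); for discrete data this
    # sorted-midpoint rule replaces an iterative threshold search
    ordered = sorted(voxels, reverse=True)
    return 0.5 * (ordered[target_volume - 1] + ordered[target_volume])

# reference data set: a fixed threshold of 0.5 segments 4 voxels
ref = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
target = sum(1 for v in ref if v > 0.5)

# a later data set whose grey-scale histogram has drifted upwards
shifted = [v + 0.15 for v in ref]
t = volumetric_threshold(shifted, target)
print(t, sum(1 for v in shifted if v > t))
```

Even though the histogram of the second data set has shifted, the adapted threshold recovers the same segmented volume, which is the property that suppresses the slow scanner drifts discussed above.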
The robustness of the volumetric thresholding strategy was determined by taking repeat scans with identical scanner settings, as shown in figure 10 for the mouse femur and aluminium samples (see sections 4.3 and 4.4 for further details and scan settings). Care was taken to ensure that the same region of the sample was analysed for all repeat scans, as assessed visually to pixel accuracy. Two other common strategies are shown for comparison, namely: (1) using a constant threshold value for all scans, and (2) calculating the threshold for each data set as 0.5 × (mean grey level of object + mean grey level of surrounding material) from user-specified regions-of-interest (ROIs). Strategy 2 will be referred to as ROI thresholding.
For both samples, which were scanned on two different systems, there is evidence of systematic changes in the measurements with repeat scan, and hence time. For the constant and ROI thresholding strategies, there are low frequency trends in the measurements (which are almost cyclic in nature for the aluminium sample), which are largely removed by the volumetric strategy.
(Figure caption: fitting equation (8) to the data gave α = 0.242 ± 0.004, ψ_det = 14 ± 27 for the microXCT-400, and α = 0.53 ± 0.01, ψ_det = 10 ± 35 for the versa-520.)
The likely contribution from mechanical drift was assessed by retaking the measurements for the 1st scan after shifting the data by ±1 pixel in the vertical direction. Both samples extended outside of the analysis ROI in the vertical direction only, making the measurements more sensitive to shifts in this direction. For the aluminium sample, the volume varied similarly for the 3 segmentation strategies and was in the range 2-4 × 10⁻⁵ mm³, well below the variation evident in figure 10(a), which corresponds to the lowest noise data considered in section 4.3. For the mouse femur sample, the variation with pixel shift was in the range 0.3-0.9 μm. Consequently, any notable mechanical drift (not corrected for by the manual assessment to pixel accuracy) would be evident as a trend in figure 10(b) for the volumetric strategy. It can be concluded therefore that mechanical drift is likely to have had only a minimal effect in both case studies presented in the following sections.

Aluminium cylinder

Methods.
A 2 mm diameter aluminium cylinder was scanned using a Zeiss Xradia microXCT-400 system with settings given in table 1. Scans were taken at a range of noise levels by varying the exposure time per projection from 0.15 s to 1.5 s, corresponding to contrast-to-noise ratios between aluminium (at the centre of the cylinder) and air in the range 8.4-2.7. Reconstructed slices are shown in figure 11. The volume occupied by the aluminium cylinder in 51 slices was measured by first applying a grey-level threshold to segment the cylinder. The threshold was set using the 'volumetric thresholding' strategy described above, by matching the volume measured over the uppermost 5 slices to that determined for a scan taken with an exposure time of 1.5 s. Small isolated regions in the segmentation (islands) were removed if they were less than 1000 voxels in volume and holes were filled if they were less than 51 voxels.
Care was taken to ensure that the same region of the sample was analysed for all repeat scans (to pixel accuracy). Volume renderings of the segmented volumes are presented in figure 11 (middle row). At high noise, segmented regions of the background connected to the cylinder are evident, increasing the measured volume; this is counteracted to some degree by holes within the cylinder.
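The segmentation clean-up described above (grey-level thresholding, removal of small islands, filling of small holes) can be sketched with scipy.ndimage. This is an illustrative reimplementation, not the authors' code; the function name `clean_segmentation` is hypothetical, and the size cut-offs are parameters so that the 1000-voxel and 51-voxel values quoted above can be supplied.

```python
import numpy as np
from scipy import ndimage

def clean_segmentation(grey, threshold, min_island=1000, max_hole=51):
    """Threshold a grey-level volume, then drop connected foreground regions
    ('islands') smaller than min_island voxels and fill enclosed background
    regions ('holes') smaller than max_hole voxels."""
    seg = grey >= threshold
    # Remove small islands: label foreground components, keep only large ones.
    labels, n = ndimage.label(seg)
    sizes = ndimage.sum(seg, labels, index=np.arange(1, n + 1))
    seg = np.isin(labels, 1 + np.flatnonzero(sizes >= min_island))
    # Fill small holes: label background components and absorb the small ones
    # (the outer background forms one large component, so it is never filled).
    labels, n = ndimage.label(~seg)
    sizes = ndimage.sum(~seg, labels, index=np.arange(1, n + 1))
    seg |= np.isin(labels, 1 + np.flatnonzero(sizes < max_hole))
    return seg
```

The same code applies unchanged to 3D volumes, since `ndimage.label` operates on n-dimensional arrays.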
SIMEX was applied to 3 scans at each exposure time with B = 14 repeats. Two ranges for λ were considered, namely the 'standard range' {0, 0.5, 1, 1.5, 2} and the 'restricted range' {0, 0.5, 1}. Extrapolation was carried out using the quadratic function for the restricted λ range and either a cubic or exponential function for the standard range, with the extrapolant giving the best fit chosen based on the maximum R² value. Figure 11 (bottom row) shows the variation of volume with λ for a scan at each exposure time. The curves show that no bias is evident at the 1.5 s exposure time, but at shorter exposure times a positive bias becomes apparent which increases with noise level. The predicted volumes given by SIMEX are shown in figure 12(a), together with the naïve values. A reduction in bias is achieved for all exposure times less than 1.5 s (for which no bias is evident). For example, the average bias for the naïve approach (relative to the average naïve value at 1.5 s exposure) is 0.12 ± 0.06% at 0.3 s exposure time, compared with 0.009 ± 0.013% for SIMEX with the restricted λ range. Cubic extrapolation with the full λ range was most accurate for the noisiest data (0.15 s exposure time), with the mean bias being only −0.3 ± 0.3%, compared with 1.4 ± 0.4% for quadratic extrapolation with the restricted λ range and 4.2 ± 0.3% for the naïve approach.
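The simulation and extrapolation steps can be sketched as follows. This is a minimal illustration only: it assumes additive Gaussian noise of known standard deviation `sigma` added directly to the data (in the full method the noise is added to the raw scan data before reconstruction), `measure` stands in for the entire measurement pipeline, and `simex_estimate` is a hypothetical helper name.

```python
import numpy as np

def simex_estimate(data, measure, sigma, lambdas=(0.0, 0.5, 1.0), B=14, deg=2, seed=0):
    """SIMEX sketch: for each lambda, add extra noise of variance
    lambda * sigma**2, average the measurement over B repeats, fit a
    polynomial extrapolant to the means, and evaluate it at lambda = -1
    (the notionally noise-free measurement)."""
    rng = np.random.default_rng(seed)
    means = []
    for lam in lambdas:
        extra_sd = np.sqrt(lam) * sigma  # zero added noise at lambda = 0
        vals = [measure(data + rng.normal(0.0, extra_sd, data.shape))
                for _ in range(B)]
        means.append(np.mean(vals))
    coeffs = np.polyfit(lambdas, means, deg)  # e.g. quadratic for the restricted range
    return np.polyval(coeffs, -1.0)
```

In this toy setting the input data are themselves noise-free, so the sketch only demonstrates the mechanics; in practice the λ = 0 measurement is the naïve value from the real, noisy scan.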

Results.
The estimated naïve measurement standard deviations (from SIMEX with λ = 1) are in good agreement with the experimentally determined values over 18-19 repeat scans when the bias is low (see figure 12(b)). As the bias increases above ~0.1%, the SIMEX values are an overestimate. However, even for the largest bias, the SIMEX technique can still provide a reliable order of magnitude estimate. Similarly, the standard deviation of the SIMEX estimate was found to be reasonably well predicted by equation (5) for both λ min = 0.5 and 1 (as shown in figure 13), with all predicted values being within a factor of 2.2 of the measured values. The measured values were calculated from 3 repeat scans, while the estimated values were calculated from SIMEX repeats with λ min = 0.5 or 1 using equation (8); equality between the 2 sets of values is indicated by the line in figure 13. A linear function was used in the extrapolation step of SIMEX.

Methods.
The femur was scanned using a Zeiss Xradia versa-520 system with the settings given in table 1. A range of noise levels was considered by varying the exposure time between 1.25 s and 0.26 s per radiograph, with corresponding contrast-to-noise ratios between bone and air varying from ~13.5 to ~6.3. Reconstructed slices are presented in figure 14. 201 slices were selected for analysis of bone thickness, and care was taken to ensure that the same region of the sample was analysed for all repeat scans (as determined visually to pixel accuracy). The volumetric thresholding strategy was applied using the upper 10 slices as the volumetric standard, so that the total bone volume (of the largest connected region) in these slices was kept constant across data sets. Small isolated regions in the segmentation (islands) were removed if they were less than 1000 voxels in volume, and the effects of hole-filling were considered by removing holes less than 50 voxels in volume.
The segmentation was down-sampled by a factor of 2 in each dimension for calculation of the mean trabecular thickness via the local thickness technique [53], implemented in the ImageJ plugin [54]. SIMEX was applied to 3 repeat scans taken at exposure times of 1 s, 0.45 s and 0.26 s, with B = 14 simulation repeats and λ = {0, 0.5, 1, 1.5, 2}. Linear extrapolation was applied using λ = {0, 0.5, 1}. Figure 14 (bottom row) clearly shows the effects of noise on the segmentation, leading to high spatial frequency surface texture and the appearance of holes within the structure. The noise leads to a general reduction in the thickness measured throughout the bone ROI, as indicated by the histograms in figure 15(a). The naïve mean thickness is shown in figure 15(b) and decreases essentially linearly with 1/t, and hence with noise level. This indicates that negative bias is evident even at the 1.25 s exposure time. Extrapolating to t = ∞ gives an estimate of the bias-free mean thickness of 63.24 ± 0.09 μm without hole-filling and 64.38 ± 0.05 μm when hole-filling is applied. These values may be overestimates, as it can be anticipated that the gradient of the variation should tend to zero as t → ∞ (as found in section 3). However, they do suggest that hole-filling introduces a small positive bias (of the order of 2%) by removing fine features, although hole-filling also introduces greater robustness to noise, as demonstrated by the lower gradient in figure 15(b). A similar approximately linear relationship is found for SIMEX between mean thickness and λ, as shown in figure 16. Again, the simulation step confirms that hole-filling introduces some robustness to added noise.
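The direct extrapolation of the naïve mean thickness against 1/t to t → ∞ is a straight-line fit; a minimal sketch with illustrative thickness values (not the measured data from the study):

```python
import numpy as np

# Hypothetical naive mean thicknesses (um) at exposure times t (s);
# illustrative numbers only, chosen to mimic the roughly linear decrease
# of thickness with 1/t described in the text.
t = np.array([1.25, 1.0, 0.45, 0.26])
thickness = np.array([62.8, 62.6, 61.3, 59.9])

# Fit thickness = slope * (1/t) + intercept; the intercept is the value at
# 1/t = 0, i.e. t -> infinity, taken as the bias-free thickness estimate.
slope, intercept = np.polyfit(1.0 / t, thickness, 1)
print(f"bias-free estimate: {intercept:.2f} um (slope {slope:.2f} um s)")
```

The negative slope quantifies the sensitivity of the measurement to noise; as noted above, a shallower slope (as obtained with hole-filling) indicates greater robustness.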

Results.
The SIMEX estimates of noise-free mean thickness for linear and quadratic extrapolation are shown in figure 17. Both extrapolation methods lead to more consistent results with decreasing exposure time, in comparison to the naïve measurements, indicating a clear reduction in relative bias. The values derived from linear extrapolation are also in good agreement with the corresponding values obtained from the direct linear extrapolation of the naïve measurements presented in figure 15(b). However, since the naïve (measured) value for the longest exposure time of 1.25 s is likely to contain bias, the absolute accuracy of the SIMEX extrapolation cannot be verified. Alternatively, SIMEX can be used to estimate the mean thickness for the 1.25 s exposure time by extrapolating to λ* = t/1.25 − 1 for t ⩽ 1.25 s (see equation (8), taking ψ det to be negligible). The estimated and measured bias relative to the naïve value at 1.25 s are shown in figure 18. Overall, SIMEX leads to a large reduction in bias, over 85% in almost all cases.
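Extrapolating the SIMEX fit to λ* rather than to λ = −1 only changes the evaluation point of the fitted extrapolant. A sketch of this mapping, with hypothetical fitted coefficients (the units and numerical values are illustrative, not those from the paper):

```python
import numpy as np

def lambda_star(t, t_ref=1.25):
    """Evaluation point mapping a scan at exposure t to the noise level of a
    reference scan at t_ref, assuming noise variance proportional to 1/t
    (i.e. detector noise psi_det negligible)."""
    return t / t_ref - 1.0

# Hypothetical linear SIMEX fit M'(lam) = c1 * lam + c0 from a 0.45 s scan
# (illustrative coefficients only).
c1, c0 = -1.8, 61.4
lam = lambda_star(0.45)            # negative, since 0.45 s < 1.25 s
pred = np.polyval([c1, c0], lam)   # predicted mean thickness at 1.25 s noise level
print(f"lambda* = {lam:+.3f}, predicted thickness = {pred:.2f} um")
```

Evaluating the same fit at λ = −1 instead recovers the notionally noise-free estimate.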
The standard deviations of the naïve measurements were determined experimentally from 27 to 28 scans at exposure times of 1.5, 0.45 and 0.26 s. The predicted values from SIMEX (with λ = 1) were calculated for 3 scans at each exposure time, and a comparison between the predicted and experimentally determined values is given in figure 19. Overall, the SIMEX technique produces reasonable estimates (to within a factor of 1.6) for all exposure times considered, although the agreement is better for the lowest noise data, as would be expected. Both the predicted and measured values show that hole-filling leads to reduced variation as well as reduced bias. However, the predicted values are typically underestimates, which may indicate that there is an additional source of variation in the measured data. For example, movement of the sample by 1 pixel was found in section 4.2 to cause a change in the mean thickness of 0.15-0.45 μm. While no sample drift was evident, random sub-pixel movement may increase the measured standard deviation noticeably.
The standard deviation of the SIMEX estimate was found to be reasonably well predicted by equation (5) for λ min = 0.5. With quadratic extrapolation and no hole-filling, for instance, the measured standard deviations (over the 3 repeat scans) were in the range 0.15-0.63 μm for exposure times between 1.5 s and 0.26 s. The corresponding predicted values were in the range 0.21-0.32 μm. Similar values were also obtained with λ min = 1, in the range 0.20-0.24 μm. Again, the larger measured values may be attributed to sub-pixel movement of the sample between scans. The measured standard deviations were greater when linear extrapolation was used, by on average a factor of ~1.75. This may be due to greater variation when fitting the linear extrapolant using fewer λ values, which could be counteracted by increasing the number of simulation repeats.

Discussion and conclusions
SIMEX has been successfully adapted to tomographic measurement processes for the estimation of bias and variance arising from noise in the acquisition. SIMEX is flexible in that any particular processing steps can be accommodated, including for example different reconstruction and automatic segmentation strategies. The technique relies on the distribution of the noise being known, and on the variation of the measurement with increasing noise level being monotonic and well approximated by simple functions (e.g. low order polynomials). The latter requirement was considered in detail for application to x-ray tomography scans and was found to hold for volume and thickness measurements over a wide range of contrasts and noise levels in a simulation study, and also in practice for typical scan settings. However, for some combinations of contrast and noise level, the simulation study showed that the variation can be more complex. All measurements showed some robustness to noise, as denoted by the 'low bias' regime in which the variance increases much more rapidly than the bias. The SIMEX based estimates of both bias and variance were most accurate in or close to this regime. The key findings from the case studies can be summarised as follows:
• Bias estimation: considerable reduction in bias is achievable, typically over 80% for the case studies.
• Estimation of the naïve measurement variance: equation (6) can provide an order of magnitude or better estimate of the standard deviation of the original naïve measurement. The estimates were typically within a factor of 2 of the measured standard deviations.
• Estimation of the SIMEX estimate variance: equation (5) can provide an order of magnitude or better estimate of the standard deviation of the SIMEX estimate. The estimates were typically within a factor of 2 of the measured standard deviations.
Together these estimates facilitate the statistical comparison of measurements made on different specimens or on the same sample over time. Other key applications include:
• Confirmation of low bias. The plot of M′(λ) versus λ can reveal when the bias in the naïve estimate is low or well within the measurement variance.
• Comparison of the robustness of reconstruction and segmentation strategies. The plot of M′(λ) versus λ can be used to compare the robustness introduced by differing processing steps. For instance, hole-filling was clearly found to increase robustness in the femur case study. Furthermore, the robustness to systematic variations can be investigated by comparing the variance of the naïve measurement over repeat scans to the SIMEX estimate.
• Faster and lower dose scanning. The approach can enable scan times to be minimised while still achieving a given accuracy or precision. The accuracy of the extrapolation can be verified by taking a scan with low noise. A key application would be time-lapse XCT studies, where a low noise scan is typically taken first before monitoring changes within a sample (e.g. crack growth or compression/tension experiments) using shorter scans.
The wider applicability of the approach to different measurements and/or tomographic imaging modalities would need to be established on a case by case basis, for example by simulation studies; however, it seems reasonable to assume that a simple extrapolant would generally be suitable, at least over a limited range of noise levels. For higher noise, lower dose scanning, more appropriate extrapolants could be determined for specific applications from simulation studies or from measurements made on a suitable 'calibration' sample. The main disadvantage of SIMEX is that it is highly computationally expensive. However, both the simulation and the case studies showed that a relatively low number of repeats (e.g. B ⩽ 17) was required to achieve good estimates, and for some measurements (e.g. thickness) only 2 values of λ were required. Furthermore, tomographic reconstruction times are falling with the use of computers with multiple GPUs; for example, reconstruction times in this study were ~10 min for a 2000³ voxel volume. The process could also be speeded up by applying SIMEX to a limited (but representative) ROI within the scan volume.