Use of the temporal median and trimmed mean mitigates effects of respiratory motion in multiple-acquisition abdominal diffusion imaging

Respiratory motion commonly confounds abdominal diffusion-weighted magnetic resonance imaging, where averaging of successive samples at different parts of the respiratory cycle, performed in the scanner, manifests the motion as blurring of tissue boundaries and structural features and can introduce bias into calculated diffusion metrics. Storing multiple averages separately allows processing using metrics other than the mean; in this prospective volunteer study, median and trimmed mean values of signal intensity for each voxel over repeated averages and diffusion-weighting directions are shown to give images with sharper tissue boundaries and structural features for moving tissues, while not compromising non-moving structures. Expert visual scoring of derived diffusion maps is significantly higher for the median than for the mean, with modest improvement from the trimmed mean. Diffusion metrics derived from mono- and bi-exponential diffusion models are comparable for non-moving structures, demonstrating a lack of introduced bias from using the median. The use of the median is a simple and computationally inexpensive alternative to complex and expensive registration algorithms, requiring only additional data storage (and no additional scanning time) while returning visually superior images that will facilitate the appropriate placement of regions-of-interest when analysing abdominal diffusion-weighted magnetic resonance images, for assessment of disease characteristics and treatment response.


Introduction
Diffusion-weighted magnetic resonance imaging (DWI) provides contrast relating to diffusion of water molecules, and can thus inform on tissue microstructure, which aids disease detection and characterization and the evaluation of treatment response , Koh and Collins 2007, Padhani et al 2009. With DWI now widely utilized in functional imaging protocols, the diffusion-weighted images and parameter maps derived need to be of sufficient quality; in the abdomen, respiratory motion is a significant confounding factor (Dietrich et al 2010). Strategies to ameliorate the effects of motion may be prospective, such as breathholding and respiratory triggering (Kartalis et al 2012, Jerome et al 2014, Seif et al 2014, or retrospective image registration (Saeed 1998, Crum et al 2004, Mazaheri et al 2012. While such strategies are often successful in improving image quality, the improvements are gained with trade-offs: multiple breath-holding may be patient-dependent and uncomfortable; triggering can prolong acquisition times; 3D image registration is challenging using 2D image datasets, and is computationally expensive while requiring images to be removed from the MR scanner workflow. On commercial MR systems, extra-cranial diffusion-weighted (DW) images acquired using multiple signal averages and different diffusion directions are usually combined to form signal-averaged trace-weighted images. This is achieved by averaging the images acquired using different diffusion-encoding directions (usually three) to derive the trace of the diffusion tensor, and multiple signal averaging is employed to both improve signal-to-noise and average over motion.
Using echo-planar imaging (EPI), the most widely used readout for body DWI, each image is read out fast enough (~ < 80 ms) to freeze respiratory motion in a 'snapshot'. It is the averaging of repeat images and diffusion directions acquired in this way that leads to image blurring, locking in the respiratory motion; the established methods of image processing and storage are not able to address this problem. The ability to access and store all images acquired at each signal average or diffusion direction would allow further image manipulation to remove motion and therefore reduce blurring without increasing examination time (Chang et al 2005). This merely requires a larger amount of data storage, which allows one to consider the images as samples of a distribution of signal intensities linked to various tissue properties (principally T 2 relaxation and tissue diffusion characteristics), but also modulated by respiratory motion. Conducting such studies in free-breathing also provides the most efficient data acquisition, and maximizes patient comfort through shorter scan time (Jerome et al 2014).
In this prospective study, we collect and store each acquired image separately-each b-value, diffusion gradient direction, and signal average-in order to interrogate individually acquired signal intensities. We show that replacement of the conventional mean voxel intensity averaged over multiple acquisitions at a given b-value with either the median or trimmed mean voxel intensity leads to visually superior images, with improved boundary sharpness and retained anatomical features, despite the fact that image registration is not employed. The median is a robust estimator and is widely used in image processing as a temporal filter to separate background from moving objects (Piccardi 2004). Motion in medical images presents as a similar temporal phenomenon, although the effective temporal sampling rate between acquisitions in the current context is much slower than the speed of motion. Trimming the acquired signal distribution of a pre-determined number of the largest and smallest samples and then taking the mean combines the outlier-resistance of the median with the statistical power of the mean, and can be tuned to an expectation of the quality of the data obtained. Independently taking the median value per voxel over multiple acquisitions is applicable across entire images, and is not limited to addressing motion in local regions such as is achieved by point-based registration.

Materials and methods
This was a prospective study approved by the institutional review board and informed consent was obtained from all volunteers.

Study population and image acquisition
Ten healthy volunteers (5 female, 5 male, mean age 30 years, range 24-46 years) were recruited. All imaging was performed using a diffusion imaging work-in-progress package on a 1.5T MAGNETOM Avanto clinical scanner (Siemens AG, Healthcare, Erlangen, Germany), using a standard body array and spine coils centered on the kidneys. Coronal imaging of the abdomen was performed to include the kidneys, liver and spleen. Acquisition parameters for DWI were: prototype 2D EPI sequence, allowing separate storage of all images, with optimized bipolar (twice refocused spin echo) diffusion weighting scheme (Feiweier 2011), coronal acquisition, TR 4000 ms, TE 72 ms, pixel size 1.5 mm 2 (interpolated) in-plane, bandwidth 1628 Hz/pixel, slice thickness 5 mm, 16 contiguous slices, 128 × 128 matrix with parallel imaging factor 2. Nine b-values were acquired (0, 20, 40, 60, 80, 100, 250, 500, 750 s mm −2 ) for three orthogonal diffusion directions, with the image from each direction being stored separately. Multiple signal averages were also stored as separate images, with the total protocol having a whole number of signal averages to fit within approximately 10 min, a realistic constraint for inclusion of DWI into a clinical protocol.

Data analysis
Assessment of the extent of movement apparent in the mean image is made possible by examination of the separately acquired images; the average range of the displacement of the upper and lower boundaries of a single kidney throughout the acquisition was measured for each volunteer. Diffusion modelling was performed on a voxel basis to provide initial signal intensity (S 0 ) and a perfusion-insensitive Apparent Diffusion Coefficient (ADC) from a monoexponential diffusion model (equation (1)), using images with b-values ≥100 s mm −2 , from separate acquisitions and diffusion directions and using a Levenberg-Marquardt algorithm. The bi-exponential intravoxel incoherent motion (IVIM) model (Le Bihan et al 1988) was also fitted (equation (2)), for all b-values using a Markov Chain Monte Carlo (MCMC) method as a robust least-squares estimator to return perfusion fraction, diffusion coefficient, and pseudo diffusion coefficient (f, D, D*). The compound parameter fD* was also calculated. All fitting was performed using proprietary software (ADEPT, Institute of Cancer Research, UK).
Images were created for each acquisition b-value using: (i) the mean voxel signal intensity at each voxel for all images for each b-value ('Mean'), (ii) the median voxel signal intensity at each voxel for all images for each b-value ('Median'), and (iii) the trimmed mean where the central 60% of voxel signal intensities at each voxel for all images for each b-value were retained and averaged ('Trimmed'), in every case independent of diffusion gradient direction. Fitting was performed for the two diffusion models as described, and the resulting maps of S 0 and ADC from the mono-exponential fitting were matched for brightness and contrast, their order was randomized, and the images scored on a 5-point scale (5 indicating a superior image) by an expert radiologist and consultant scientist, blinded to the processing used to create each image, in the following categories: overall image quality of S 0 map, degree of blurring in S 0 map, clarity of physiological structures in S 0 map, and overall image quality of ADC map.
Regions of interest (ROIs) of approximately 100 voxels were drawn in the renal cortex and liver for comparison with literature values, and an ROI was also drawn in the paraspinal muscle to give comparative data for a non-moving tissue. Statistics and diffusion parameter values are presented as median ± standard deviation for the ROI; statistical testing was done using a t-test with significance defined at p < 0.05.

Results
The number of signal averages acquired in 10 min was either 5 or 6, dependent on operator, giving either 15 or 18 separate images for each b-value, where equivalently the standard vendor acquisition would return only 1 average trace-weighted image. The raw images, averaged together per b-value, showed a large amount of blurring due to respiratory motion as expected. A typical averaged b = 250 s mm −2 image is shown in figure 1(a), producing poorly defined tissue boundaries which is commonly seen in standard abdominal DWI protocols. Scores indicating (lack of) blurring were 2.9 ± 0.8 and 3.4 ± 0.8 from each of the two observers, out of a maximum 5. Examining the range of motion for each volunteer, the kidney boundaries are seen to move over a range of 10.0 ± 4.2 mm, or 6.7 ± 2.8 voxels.
Marked visual improvement is observed when moving to either the median or trimmed mean, reflecting the exclusion of images where the position in the respiratory cycle can be considered an outlier against the overall sample distribution. Figures 1(b) and (c) show corresponding median and trimmed mean b = 250 s mm −2 images; the boundaries of the kidneys, liver, and spleen are all visually sharper, and physiological features within tissue are retained with decreased blurring. Figure 1(d) is the difference image between the mean and median, and shows that while there are differences over moving structures, there is no difference in non-moving structures such as the lower spine and paraspinal muscles. A 1D projection across the liver and kidney, figure 1(e), clearly shows the sharper boundaries in the median image (black arrows), as well as better delineation of structural features such as hepatic vein and renal pelvis (white arrows) and reflects the clear visual improvement of the images. In general, the profile for the mean in general appears 'smoothed', on the order of a few voxels and as such is consistent with the expected effects of motion blurring. The visual improvements for median and trimmed mean persist in the calculated S 0 and ADC maps, shown in figure 2, as well as for the f and D maps derived from the IVIM model. The black voxels in figure 2 correspond to areas where low signal failed to meet an arbitrary (conservative) Figure 3. Fitting the ADC model for a voxel near the upper liver boundary, use of the median is more likely to represent a single tissue, where the mean gives a weighted average of all tissues that occupy that voxel through the acquisition, manifesting as blurring in the ADC map. For this voxel, location shown in the inset image, ADC (mean) = 117.5, ADC (median) = 105.9 mm 2 s −1 .
N15 threshold for fitting, and can be seen to correspond to regions of low intrinsic signal, such as the lungs, or high flow in larger vascular structures giving accelerated dephasing. The full data for a single voxel towards the top boundary of the liver is shown in figure 3. In this case the distribution is built from sampling two tissues that temporally overlap due to motion, and so the median at each b-value better represents one of the tissues, whereas the mean is an average of the two that does not represent either tissue. Also, in such regions the IVIM model fails to accurately to capture pseudo-diffusion effects, attempting to fit the perfusion component to changes in signal arising from varying contrast between tissues or significant Where the voxel remains within the movement of homogeneous tissue ( figure 4(b)), the bi-exponential curves returned from IVIM modeling on mean and median images are much more closely related.
Results from visual scoring of S 0 and ADC image values are given in table 1, where the mean score ± standard deviation in each of the assessment criteria is shown for both observers. There was a systematic difference in the absolute scores between the two expert readers (p < 0.02 in each category), but both reported a clear improvement in image quality across all categories when considering the median parameter maps of S 0 and ADC. One observer also reported a significant improvement in quality when using the trimmed mean for most criteria (all p < 0.05, except lack of blurring with p = 0.089). Of the 10 volunteers, the images acquired from one were of sufficiently low quality that the use of the median image, while recorded as superior to the mean, did not reduce the apparent motion by a notable amount.
The ADC values calculated for ROIs placed in the liver and kidney are shown in table 2; there is a systematic decrease in the ADC value reported using the median compared to the mean, though both values are consistent with previously reported values of ADC for these tissues (Taouli 2011). The observed larger standard deviations for the median and trimmed mean may indicate a reduced homogenization by blurring of two distinct tissue types within the ROI. The ADC for paraspinal muscle is unchanged from the mean to median and trimmed mean, and shows that there is no introduction of bias for a non-moving tissue. Similarly, the parameters returned from the bi-exponential IVIM fitting (table 3) show no systematic changes except for the pseudodiffusion constant D* for trimmed mean of the liver, which further suggests that there is no significant degradation of the information returned from ROI aggregate statistics from diffusion modelling. S0 overall 2.7 ± 0.7 3.7 ± 0.9 3.7 ± 1.0 3.2 ± 0.6 4.4 ± 0.7 3.6 ± 1.1 S0 blurring 2.9 ± 0.8 3.5 ± 1.0 3.3 ± 0.9 3.4 ± 0.8 4.2 ± 0.9 3.5 ± 1.1 S0 structures 3.0 ± 0.8 3.7 ± 0.9 3.6 ± 1.0 3.5 ± 0.7 4.3 ± 0.9 3.8 ± 0.9 ADC overall 2.7 ± 0.7 3.7 ± 0.8 3.4 ± 0.9 3.3 ± 0.8 4.1 ± 0.7 3.6 ± 1.3

Discussion
Respiratory motion blurring in abdominal DWI may obscure physiological features or result in errors when drawing regions of interest. Conventional averaging of images effectively locks the motion into the image, and provides no opportunity to examine the sources and extent of blurring. Hence, storing of images acquired as separate signal averages or diffusion directions is necessary for more advanced post-processing methods to be applied, such as the identification of outliers, registration schemes that operate at the individual 'snapshot' level rather than the (already averaged) trace image level, and statistics other than the mean. In this study, the use of the median image calculated at each b-value is shown to yield diffusion parameter maps that are visually superior to those derived from the mean image for both ADC and IVIM models, with the median ADC parameter maps scoring higher for a number of relevant criteria. The trimmed mean is also shown to be visually superior, though less so than the median for these data. Region-of-interest diffusion parameters derived from the ADC model are shown to be equivalent to those derived from the standard mean image for nonmoving structures, and the differences observed in ADC for liver and kidney may represent a removal of bias introduced by motion. Parameters from the IVIM model in the same regions are unaffected for ROI statistics when using the median, which may reflect the ability of the more complex model to fit the data, or the greater inherent instability in perfusion-related parameters.
Whilst direct registration is in principle the best approach, this is currently an unsolved problem; difficulties associated with registration of abdominal structures coupled with the fact that clinical diffusion measurements are not acquired as 3D volumes renders this task very challenging. This is made more difficult by the varying contrast at increasing diffusion b-value and the inherent low signal-to-noise at higher b-values. It is therefore desirable to have a simple strategy to reduce blurring without relying on complex registration techniques, and without negatively impacting on patient experience. Separate storage of images acquired in free-breathing allows efficient sampling during acquisition, and provides access to the samples in such a way that the distribution of signal intensities can be evaluated using statistics other than the mean, and opens the possibility for weighted least-squares approaches and bootstrap methods to estimate parameters. These data support a recommendation that MR scanner vendors include the facility to separately store signal averages for such functional imaging sequences, allowing retrospective application of any advanced processing strategies including but not limited to those considered in this study. With separate storage of images, identification and culling of outliers becomes possible, as does efforts to register data suffering from displacement or distortion owing to respiration. The computational effort intrinsic to such involved techniques, as well as operator input, would invariably require removal of Table 3. IVIM parameter values. Units are f (%), D (10 −5 mm 2 s −1 ), D* (10 −2 mm 2 s −1 ), and fD* (10 −4 mm 2 s −1 ). Bold indicates significant (p < 0.05) against mean (paired t-test; median to trim was p > 0.05 in all cases). the images from the scanner workflow, however, and so it remains desirable to have a simple method that can effectively tackle respiratory motion blurring. It is worth noting that given separate image storage, such methods may be equally applied to respiratory-triggered acquisitions that, while often successful is reducing apparent motion, may still benefit from additional processing such as use of median images. It is well-known that the median is more robust to outliers than the mean, with the trimmed mean offering a tunable combination of both forms of averaging. Noise in MR images typically has a Gaussian distribution (in k-space), and with sufficient signal-to-noise, this extends to magnitude images. For a Gaussian distribution both the mean and median are unbiased estimators, where the standard deviation of the sample median is larger than the sample mean. The ratio of these standard deviations depends on the sample size: for very large samples the increase is 25%, for the number of samples used here (15-18) it is around 30% and for three samples (minimum for use of the median) it is 45%. Thus in regions not affected by motion one would expect a modest increase in signal noise, which would propagate through to the estimated parameters, whereas in regions affected by motion the Gaussian assumption is violated and the robustness of the median to outliers becomes advantageous. The improvement conferred by the median in the presence of motion depends in a complex manner on a number of effects, including the nature of the motion and the image features in the affected area. Although breathing is quasi-periodic, the proportion of time spent at inspiration and expiration is typically biased towards expiration; in the limit of a very large number of repeat acquisitions, the complete distribution of signal intensities in any voxel can be thought of as a sum of component distributions, each arising from a tissue appearing in that voxel location over the course of the breathing cycle. This distribution therefore has a dominating component corresponding to the tissue occupying that voxel at expiration. In the extreme example of a voxel close to the boundary between two tissues with very different signal intensities, the total signal intensity distribution for this voxel will have two peaks located at the signal intensities of the two tissues, but with a larger peak for the tissue corresponding to expiration. In this case the optimal signal value to report would be the location of the higher peak, and for a bi-modal distribution of this kind, the median will be closer to the optimum than the mean. Conversely, in regions where the contrast between image features is low, or the spatial extent of such features is larger than the range of motion, the effects of motion will automatically be minimal, in which case the smaller errors of the mean would be preferred. Variation in signal intensity will have several origins, but where signal variation is dominated by motion causing tissues of different contrast to overlap the median will act to remove the bias. These effects are illustrated in figure 3, where the diffusion decay curves derived from mean and median images are shown against the signal intensities for a voxel on the boundary region between the kidney and liver; the median curve better reflects the kidney, and the boundary appears sharper in the resulting ADC map. It is the balance of these and other effects that contribute towards the overall preference of the median over the mean, as measured by the visual scoring that includes sharpness of features such as tissue boundaries. The behavior of diffusion model parameters at tissue boundaries, and within homogeneous regions of tissue, are clearly shown by the distributions of signal intensities of individual voxels. The use of a trimmed mean, presented here for completeness but not optimized for our scanner/volunteer characteristics, allows a balance between mean and median, and might be considered suitable where expectation of the number of outliers is known and the degree of trimming can be tailored to the scanner and/or patient concerned. Where such knowledge is unavailable, and the requirement for operator review/input is undesirable, the effects of outliers from motion are effectively removed by use of the median, with only modest cost of increased noise in regions unaffected by motion.
Limitations of using the median as an estimator are the requirement that the acquisition returns sufficient samples, and the assumption that the measured diffusion is isotropic. Within the limits of a clinical multiple b-value DWI protocol lasting 10 min, this first requirement is fulfilled by acquiring 3 diffusion directions and 5 signal averages, although there will be an inevitable decrease in signal-to-noise ratio when moving from the mean to median, as a function of number of samples taken. For applications in oncology where neoplasms are highly disordered, the assumption that diffusion is isotropic is likely to be valid, in comparison to ordered structures like the kidney where diffusion anisotropy information would be discarded by taking the median sample. Although the median sample will arise from an individual image and thus reflect diffusion encoding in one direction only, different directions may well be represented in the medians across b-values, and though the mean values at each b-value are necessarily influenced by anisotropy, conventional calculation of a 3-scan trace-weighted ADC in standard protocols does not provide information on either degree or direction of anisotropy.
In conclusion, the use of the median image for calculation of diffusion-weighted imaging parameters significantly improves the visual appearance of the resulting parameter maps, retaining physiological features and returning sharper tissue boundaries, and does not compromise the quality of derived statistics. This technique is simple and computationally efficientboth separate storage and median processing can be implemented with relative ease-and offers an alternative to expensive registration algorithms; an increased confidence in the ability to appropriately position ROIs may assist analysis of DWI images for use in assessment of disease characteristics and treatment response.