Accuracy and precision of ultrasound shear wave elasticity measurements according to target elasticity and acquisition depth: A phantom study

Objective To investigate the accuracy and precision of ultrasound shear wave elasticity measurements as a function of target elasticity and acquisition depth. Materials and methods Using five ultrasound systems (VTQ, VTIQ, EPIQ 5, Aixplorer, and Aplio 500), two operators independently measured shear wave elasticities in two phantoms containing five different target elasticities (8±3, 14±4, 25±6, 45±8, and 80±12 kPa) at depths of 15, 30, 35, and 60 mm. Accuracy was assessed by evaluating measurement errors and the proportions of outliers, while factors affecting accuracy were assessed using logistic regression analysis. Measurement errors were defined as differences between the measured values and 1) the margins of the target elasticity, and 2) the median values of the target elasticity. Outliers were defined as measured values outside the margins of the target elasticity. Precision was assessed by calculating the reproducibility of measurements using the within-subject coefficient of variation (wCV). Results Mean measurement errors and the proportions of outliers were higher for high than for low target elasticities (p<0.001), but did not differ in relation to acquisition depth, either within an elastography system or across the different systems. Logistic regression analysis showed that target elasticity (p<0.001) significantly affected accuracy, whereas acquisition depth (p>0.05) did not. The wCV for the 80±12 kPa target (31.33%) was significantly higher than that for lower elasticity targets (6.96–10.43%; p<0.001). The wCV did not differ across acquisition depths. The individual elastography systems showed consistent results. Conclusions Targets with high elasticity showed lower accuracy and lower precision than targets with low elasticity, while acquisition depth did not show consistent patterns in either accuracy or precision.

Introduction
Ultrasound (US) elastography is a noninvasive imaging modality for assessment of tissue stiffness. US elastography techniques can be classified as shear wave or strain imaging [1,2], and three modalities are available for shear wave imaging: transient elastography, point shear wave speed measurement, and shear wave speed imaging. Shear waves are generated by controlled external vibrations in transient elastography (FibroScan, Echosens), and by acoustic radiation force impulses in point shear wave speed measurement (Virtual Touch Quantification [VTQ]; EPIQ 5) and shear wave speed imaging (Virtual Touch Image Quantification [VTIQ]; Aixplorer; Aplio) [1].
New US elastography systems are currently being developed, and US shear wave elastography has been approved for clinical use by the United States Food and Drug Administration (FDA). To validate shear wave elastography as a quantitative imaging biomarker, its accuracy and precision should be guaranteed. Accuracy and precision are essential and fundamental for quantitative imaging modalities [3].
As shear wave elastography is now used in clinical practice to measure tissue stiffness, especially that of the liver, thyroid, breast, and prostate [4-10], determining its accuracy and precision is crucial. A previous study assessing precision reported higher variability for the higher elasticity phantom [11]. Additionally, tissue attenuation may dampen ultrasound signals as a function of acquisition depth, limiting the accurate measurement of deeper tissue or organs [12]. However, to our knowledge, no phantom studies have evaluated the influence of target elasticity and acquisition depth on the accuracy of shear wave elastography. This study therefore investigated the effects of target elasticity and acquisition depth on the accuracy and precision of US shear wave elasticity measurements.

Phantoms
Model 049 and 049A QA phantoms were obtained from Computerized Imaging Reference Systems (CIRS; Norfolk, Virginia, USA). These phantoms are manufactured using Zerdine, a solid-elastic polymer with elasticity properties that can be controlled independently of its acoustic properties [13]. The model 049 QA phantom contains sets of spherical mapping targets of 10 mm (depth, 15 mm) and 20 mm diameter (depth, 35 mm). The model 049A QA phantom contains sets of stepped cylinders that vary in diameter from 1.6 to 16.7 mm. The targets of 6.5 mm, 10.4 mm, and 16.7 mm diameters were used in this study. The stepped cylinders in each set are located at depths from 30 to 60 mm, with these depths referring to the centers of the spherical and cylindrical targets. Both phantoms contain materials of five different elasticities, consisting of four types of simulated lesions with elasticities of 8 ± 3, 14 ± 4, 45 ± 8, and 80 ± 12 kPa, and background material of elasticity 25 ± 6 kPa. This study performed measurements on five targets, of elasticities 8 ± 3, 14 ± 4, 25 ± 6, 45 ± 8 kPa, and 80 ± 12 kPa.
Measurements were performed by two operators: operator 1 was a neuroradiologist with 1 year of experience with US elastography, and operator 2 was a pediatric radiologist with 5 years of experience with US elastography. Each operator acquired a series of ten consecutive shear wave elasticity measurements on each system, and the mean of these ten measurements was calculated for each system, depth, and transducer. Each operator performed a total of 14 series of ten measurements per target. Of the eight series acquired with a linear transducer, four were performed at 15 mm and four at 35 mm. Of the six series acquired with a curved transducer, three were performed at 30 mm and three at 60 mm.
For the 049 phantom, shear wave elasticity was measured on five different elasticity targets (8 ± 3, 14 ± 4, 25 ± 6, 45 ± 8 kPa, and 80 ± 12 kPa) at two depths of 15 (target size: 10 mm) and 35 mm (target size: 20 mm) using a linear transducer. For the 049A phantom, shear wave velocity was measured on the same five elasticity targets with target size of 16.7 mm at depths of 30 and 60 mm using a curved transducer. Regions of interest (ROIs) were placed onto homogeneous target regions of the phantom. For the Aixplorer and Aplio 500 systems, ROIs with respective maximum diameters of 10 and 9 mm were used, whereas, for the VTQ, VTIQ, and EPIQ 5 systems, the ROI sizes were fixed (5 × 6, 1.5 × 1.5, and 10 × 12 mm, respectively). ROIs were placed onto B-mode images, in the center of the targets at the predetermined acquisition depths. For the phantom background (25 kPa), elasticity was measured in the same scanned region using all techniques. In the shear wave speed imaging (VTIQ, Aixplorer, and Aplio 500), images were fully filled, in color, for all measurements. Additionally, we measured small (10.4 mm and 6.5 mm) targets in the 049A phantom using curved Aixplorer and Aplio 500 probes, which can decrease the size of the ROIs. We used an ROI diameter of 8 mm for the target size of 10.4 mm, and an ROI diameter of 5 mm for the target size of 6.5 mm.

Statistical analysis
The shear wave elasticity of each phantom was measured ten consecutive times by each operator for each imaging depth and each elasticity target, with the results being summarized as the mean value and standard deviation.
To assess the accuracy of measurements, measurement errors and the proportions of outliers were calculated and compared among target elasticities and acquisition depths. Measurement errors were defined as the differences between the measured values and the margins of the target elasticity values, and outliers were defined as measured values outside of the margins of the target elasticity values. Results were compared by repeated measures ANOVA and McNemar's test. Factors potentially affecting accuracy were assessed by logistic regression analysis according to the target elasticities and acquisition depths.
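The definitions of measurement error and outlier given above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function names `margin_error`, `is_outlier`, and `summarize` are hypothetical, the readings are invented, and the sketch assumes that a value lying inside the nominal margins has an error of zero while a value outside is scored by its distance to the nearest margin.

```python
from statistics import mean


def margin_error(measured_kpa, nominal_kpa, tolerance_kpa):
    """Distance (kPa) from a measurement to the nearest margin of the target.

    Assumption: values inside [nominal - tolerance, nominal + tolerance]
    are assigned an error of zero.
    """
    lower = nominal_kpa - tolerance_kpa
    upper = nominal_kpa + tolerance_kpa
    if measured_kpa < lower:
        return lower - measured_kpa
    if measured_kpa > upper:
        return measured_kpa - upper
    return 0.0


def is_outlier(measured_kpa, nominal_kpa, tolerance_kpa):
    """True when the measurement falls outside the target margins."""
    return margin_error(measured_kpa, nominal_kpa, tolerance_kpa) > 0.0


def summarize(measurements, nominal_kpa, tolerance_kpa):
    """Return (mean margin error in kPa, proportion of outliers)."""
    errors = [margin_error(m, nominal_kpa, tolerance_kpa) for m in measurements]
    flags = [is_outlier(m, nominal_kpa, tolerance_kpa) for m in measurements]
    return mean(errors), sum(flags) / len(flags)


# Invented readings for an 80 +/- 12 kPa target (margins 68 to 92 kPa).
readings = [70.0, 95.0, 60.0, 100.0]
mean_error, outlier_proportion = summarize(readings, 80.0, 12.0)
print(mean_error, outlier_proportion)  # 4.75 0.75
```

In this invented example, three of the four readings fall outside the 68 to 92 kPa margins, so the outlier proportion is 0.75 and the mean margin error is 4.75 kPa.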
To assess precision, repeatability was calculated by determining the within-subject coefficient of variation (wCV). This coefficient is indicative of the within-subject variability of parameters and is expressed as a percentage; it was obtained by dividing the within-subject standard deviation by the group mean. A wCV > 50% was regarded as indicating unreliability for clinical implementation [17]. The wCVs were obtained according to the target elasticities and acquisition depths, and were compared among them. The equality of the wCVs was assessed using an asymptotic test [18]. In addition, to investigate the degree of variation across multiple measurements of a single target with an individual elastography system, the coefficients of variation for each measurement were calculated using the following equation: coefficient of variation = standard deviation / mean value × 100% [14]. As the coefficients of variation became larger, the reliability of any single measurement decreased.
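The two variability measures described above can be illustrated with a short Python sketch. It is not the study's R code: the function names are hypothetical, the data are invented, and the within-subject standard deviation is assumed here to be the square root of the mean of the per-series sample variances, which is one common way to compute it; the asymptotic test for equality of wCVs [18] is not reproduced.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Threshold taken from the text: a wCV above 50% is regarded as unreliable.
UNRELIABLE_WCV_PERCENT = 50.0


def coefficient_of_variation(values):
    """CV (%) of one series: sample standard deviation / mean x 100."""
    return stdev(values) / mean(values) * 100.0


def within_subject_cv(series_list):
    """wCV (%): within-subject SD divided by the group mean, x 100.

    Assumption: within-subject SD = sqrt(mean of per-series sample variances).
    """
    within_sd = sqrt(mean([variance(series) for series in series_list]))
    group_mean = mean([value for series in series_list for value in series])
    return within_sd / group_mean * 100.0


# Invented example: two short series of repeated readings (kPa).
series_list = [[8.0, 10.0, 12.0], [9.0, 10.0, 11.0]]
wcv = within_subject_cv(series_list)
print(round(coefficient_of_variation(series_list[0]), 2))  # 20.0
print(round(wcv, 2), wcv > UNRELIABLE_WCV_PERCENT)  # 15.81 False
```

The per-series coefficient of variation describes the spread of one series of repeated readings, whereas the wCV pools the within-series spread across all series and relates it to the overall mean.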
Accuracy and precision were also evaluated separately for linear and curved transducers and for small targets (10.4 mm and 6.5 mm). Interobserver agreement was evaluated by Bland-Altman analysis, and the differences between the measurements of different observers are reported as mean differences and the 95% limits of agreement in elasticity for each shear wave imaging technique and transducer [19]. All statistical analyses were performed using R version 3.4.1 (The R Foundation for Statistical Computing) with the "EntropyExplorer" package [20], and MedCalc software (version 18.6). A p value < 0.05 was regarded as statistically significant.
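As an illustration of the interobserver analysis, the following Python sketch computes a Bland-Altman mean difference and 95% limits of agreement. It assumes the conventional formulation (mean difference ± 1.96 × standard deviation of the paired differences); the function name `bland_altman` is hypothetical and the paired readings are invented, so this is a sketch of the technique rather than the MedCalc or R procedure actually used.

```python
from statistics import mean, stdev


def bland_altman(observer1, observer2):
    """Return (mean difference, lower limit, upper limit) for paired readings.

    Assumption: 95% limits of agreement = mean difference +/- 1.96 * SD
    of the paired differences.
    """
    if len(observer1) != len(observer2):
        raise ValueError("paired measurements must have equal length")
    differences = [a - b for a, b in zip(observer1, observer2)]
    bias = mean(differences)
    half_width = 1.96 * stdev(differences)
    return bias, bias - half_width, bias + half_width


# Invented paired mean elasticities (kPa) from two operators.
operator1 = [10.0, 12.0, 14.0, 16.0]
operator2 = [9.0, 13.0, 13.0, 17.0]
bias, lower, upper = bland_altman(operator1, operator2)
print(bias, round(lower, 3), round(upper, 3))
```

A mean difference near zero with narrow limits indicates that the two operators obtain interchangeable readings for the same target.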
Results
Logistic regression analysis was performed to determine the effects of target elasticity and acquisition depth on accuracy (Table 5); this showed that target elasticity significantly affected accuracy (p < 0.001), whereas acquisition depth did not (p > 0.05).
Results from the individual elastography systems showed that the coefficients of variation (S2 Table and S3 Table) did not significantly differ across different target elasticities. Furthermore, the wCVs did not significantly differ across acquisition depths (S1 Table); wCV was not compromised by increases in acquisition depth up to 60 mm.

Discussion
This study evaluated the accuracy and precision of US shear wave elasticity measurements for targets of different elasticities and at different acquisition depths. We found that targets with elasticities of 45 ± 8 kPa and 80 ± 12 kPa showed a significantly higher proportion of outliers (82.1%, each) and higher measurement errors (5.99 and 21.01 kPa, respectively) than targets with elasticities of 8 ± 3, 14 ± 4, and 25 ± 6 kPa. Logistic regression analysis showed that target elasticity significantly affected accuracy, whereas acquisition depth did not. The wCV for the 80 ± 12 kPa target (31.33%) was significantly higher than that for the targets of 8 ± 3, 14 ± 4, 25 ± 6, and 45 ± 8 kPa (6.96-10.43%; p < 0.001). The wCVs did not significantly differ across acquisition depths, with individual elastography systems showing consistent results. Taken together, targets with high elasticity showed lower accuracy and lower precision than targets with low elasticity, while acquisition depth did not show consistent patterns in either accuracy or precision.
In this study, the high elasticity targets (45 ± 8 kPa and 80 ± 12 kPa) yielded lower accuracy than the targets with lower elasticity. These high elasticity targets showed significantly higher measurement errors and a higher proportion of outliers than targets of lower elasticity. Logistic regression analysis showed that target elasticity significantly affected accuracy. Moreover, the 80 ± 12 kPa target yielded lower precision than lower elasticity targets. Previous studies have reported high variability for high elasticity targets for shear wave elastography [11,21]. This phenomenon is probably due to the higher shear wave attenuation in high elasticity conditions [12]. Our results suggest the need for caution when measuring elasticity in lesions with high elasticity, which would include malignant lymph nodes.
A previous meta-analysis reported that the cutoff values for differentiating malignant cervical lymph nodes from benign lymph nodes on shear wave elastography ranged from 19.4 to 57 kPa [5].
If we focus on target elasticities between 8 ± 3 and 25 ± 6 kPa, the current study results revealed measurement errors of only 0.10-0.51 kPa. In addition, data from the five individual elastography systems also demonstrated low measurement error (0.10-1.26 kPa). Moreover, our results demonstrated low wCVs (6.96-8.47%) for target elasticities between 8 ± 3 and 25 ± 6 kPa. These findings suggest the presence of high accuracy and reproducibility across five different elastography systems for target elasticities between 8 ± 3 and 25 ± 6 kPa, target elasticity values that are commonly encountered in daily clinical practice.
A previous study using the VTQ, VTIQ, and Aixplorer systems reported no significant trends between the coefficients of variation and acquisition depths of 10, 25, and 40 mm [22]. The present study also found that the wCV and coefficient of variation did not differ across acquisition depths. Despite increases in acquisition depth of up to 60 mm, the precision of the US elastography systems was not compromised. With regard to accuracy, the current study also revealed that measurement errors and the proportions of outliers were independent of acquisition depth, and the logistic regression analysis also showed that acquisition depth did not affect accuracy. Until now, no study had evaluated the accuracy of US shear wave elasticity measurements in relation to acquisition depth. There are two plausible explanations for our results. First, depth is not an explicit physical quantity in itself; rather, its effect is the indirect result of various confounding factors, such as US and shear wave attenuation, focusing acquisition depth, pulse energy, and other parameters. In vivo, US shear waves degrade and distort as they travel through heterogeneous media with variable elasticities before reaching the target tissue, whereas such attenuation is less likely to occur in the uniform media of in vitro phantoms. In vivo heterogeneity may result in unpredictable and inconsistent acquisition depth results [23]. Therefore, phantom studies may be optimal for evaluating the performance of US elastography in a setting in which the most important parameters, such as elasticity, lesion size, and acquisition depth, are known [21]. Second, the acquisition depths may not have been deep enough to reveal significant differences, unlike the elasticities. Although acquisition depth could affect shear wave measurements, results obtained with in vitro phantoms have not shown any consistent patterns in our study or in previous studies.
In our study, we measured targets at acquisition depths of up to 60 mm, as did previous studies [11,22,24]. US elastography measurements of the liver are usually performed at acquisition depths within 60 mm [4]. Therefore, our results might have clinical implications for elasticity measurement.
This information may be clinically meaningful, especially in the absence of a particular target lesion, such as in liver parenchyma, where it may be hard to achieve the same acquisition depth between operators. Therefore, this study demonstrates that US elastography has high reproducibility and accuracy, regardless of the acquisition depth, a finding that is important for daily clinical practice.
This study revealed that the wCVs of the lower elasticity targets (8 ± 3, 14 ± 4, 25 ± 6, and 45 ± 8 kPa) were low (6.96-10.43%), and previous reports also showed low coefficients of variation (0% to 9%) [11,22,24]. In addition, interobserver agreement according to target elasticity and acquisition depth showed only very slight mean differences, a finding that was also in agreement with previous studies that showed high interobserver reliability for shear wave methods (intraclass correlation coefficients of 0.99-1.00) [11,24].
This study has several limitations. First, because this was a phantom study, we could not evaluate clinical conditions that could affect the results of the shear wave elasticity measurements. The US elastography phantoms did not have viscoelastic components like live soft tissues [23]. Without animal or human data, technical limitations may confound our conclusions. Our results must be verified by further studies using animal or human subjects. Second, as our hospital uses the ACUSON S2000 only for evaluation of the superficial neck, a curved transducer was not available, and a linear transducer had not been developed for the EPIQ 5 at the time of this study. Third, although we tried to evaluate various target elasticities and acquisition depths, only five target elasticities and four acquisition depths were evaluated, and these were investigated using only two phantoms. Therefore, further studies using a variety of phantoms and larger sample sizes are needed to evaluate elasticities and acquisition depths, both with the US elastography systems used in this study and with others. Fourth, we could not add an attenuating medium with a known attenuation coefficient on top of the phantoms. Further research using an attenuating medium will be needed.

Conclusions
Targets with high elasticity showed lower accuracy and lower precision than targets with low elasticity, while acquisition depth did not show consistent patterns in either accuracy or precision.
Supporting information
S1 Table. Within-subject coefficients of variation (wCVs) for ultrasound shear wave elasticity measurements according to target elasticity and acquisition depth. (DOCX)
S2 Table. Shear wave elasticity measurements for four different elasticity targets at two different depths, with measurements obtained using a linear transducer. (DOCX)
S3 Table. Shear wave elasticity measurements for four different elasticity targets at two different depths, with measurements obtained using a curved transducer.