Accuracy, Reproducibility and User Experience With Standardized Instructions for Measurement of Total Kidney Volume in Autosomal Dominant Polycystic Kidney Disease

Purpose: Total kidney volume (TKV) measurement is integral to the clinical management of autosomal dominant polycystic kidney disease (ADPKD), but the gold standard of measurement via stereology/manual planimetry is time-consuming and not readily available to clinicians. This study assessed whether standardized measurement instructions based on an ellipsoid equation enhanced TKV assessment on computed tomographic (CT) images of the kidneys, as determined by accuracy, reproducibility, efficiency and/or user acceptability.
Methods: Participating radiologists were randomized to perform TKV measurements with or without standardized instructions. All participants measured the same 3 non-contrast, low-dose CT scans. Accuracy was assessed as variation from TKV measurements obtained by planimetry. Intraclass correlation coefficients (ICCs) and time to complete the measurements were also assessed. Surveys assessed prior experience with TKV measurement and user acceptability of the instructions.
Results: Forty-nine radiologists participated. There was no difference in accuracy or measurement time between instructed and non-instructed participants. There was a trend towards greater reproducibility with standardized instructions (ICC .8 vs .6). Of the respondents, 92% indicated that the instructions were easy to use, 86% agreed that the instructions would enhance their comfort with TKV measurement and 75% agreed that they would recommend these instructions to colleagues.
Conclusions: Instructed and non-instructed participants demonstrated similar accuracy and time required for TKV measurement, but instructed participants had a trend towards greater reproducibility. There was high acceptability, including enhanced user confidence with the instructions. Standardized instructions may be of value for radiologists seeking to improve their confidence in providing clinicians with the TKV measurements necessary to appropriately manage this patient population.

Introduction
Autosomal Dominant Polycystic Kidney Disease (ADPKD) is the most common inherited renal disease and the fourth leading cause of end-stage renal disease (ESRD). 1-3 ADPKD is progressive, but the rate of progression is highly variable; 4 in light of this variability, a readily available tool to determine an individual patient's rate of progression is valuable in ADPKD management. 5,6 In recent years, measurements of renal size, and particularly Total Kidney Volume (TKV), have emerged as a sensitive marker of disease status, progression and renal prognosis. 6-9 Furthermore, classification of patients based on TKV is one of the key factors in determining which patients are candidates for disease-modifying treatment with vasopressin antagonists, 5,6,10,11 and thus strongly influences the treatment decisions of clinicians caring for these patients.
Measurement of TKV can be highly accurate and reproducible, but the reference standard method of manual planimetry is time-consuming, 12-14 which limits its translation into clinical practice. Alternative methods have been developed that calculate TKV from a more rapidly obtained series of linear measurements, but diverse methods have been reported, 14-16 and uptake of and familiarity with these methods are limited in many clinical settings. 17,18 More recently, there is evidence supporting partially or fully automated measurement of TKV, 19,20 but this has not yet translated from research into a widely available tool in clinical practice settings.
In the interim, given the utility of TKV measurement and its importance in guiding clinical decision making, ADPKD care providers have identified a need for reliable TKV measurements that are readily available at the point of clinical care. 17,18 A previous study by our group evaluated both imaging acquisition methods and TKV measurement methods and found that one of the ellipsoid measurement methods provides accurate and reproducible TKV values on either MR- or CT-acquired images. 21 It is unknown, however, whether radiologists across diverse practice settings and with variable exposure to ADPKD imaging are familiar with these methods, and whether their performance with these methods is similar to that of radiologists who are more experienced with TKV assessment or who perform these measurements in research settings. Thus, the current study was conducted to assess the current level of knowledge and performance of general radiologists with TKV measurement, and to determine whether that performance, as determined by accuracy, reproducibility, efficiency and/or user experience, is enhanced by the provision of standardized measurement instructions.

ADPKD Patient and Scan Selection
The images used for this study were obtained from our previous study examining the performance of varying image acquisition and TKV measurement protocols. 21 From this collection, the study team chose a sample of 3 CT scans with different volumes that were representative of what would be encountered in clinical practice; images of these scans are included in Supplement 2. All participating radiologists performed measurements on the same 3 scans; this number was chosen so that the task would not be overly onerous, to encourage participation of radiologists in varying clinical settings.
The CT examinations used in this study were performed on a GE HD750 64-slice scanner (GE Healthcare, Wisconsin, USA) without contrast, at 120 kVp and 100-130 mA, and reconstructed with our department's standard reconstruction (a blend of 60% filtered back projection [FBP] and 40% adaptive statistical iterative reconstruction [ASIR]). The reference standard TKV for these 3 scans was determined by planimetry, performed manually but assisted by ImageJ version 1.51d software. 21 The software was provided courtesy of the UBC and St Paul's Hospital Centre for Heart Lung Innovation.
All relevant ethics approvals were obtained during the original study; 21 because this study was a secondary use of the same scans, repeat ethics approval was not required as per our institutional guidelines.

Participants
Radiologists from across British Columbia (BC) were invited to participate, with recruitment deliberately targeting a combination of academic and community hospital sites, as well as large urban and more remote hospital settings. The study team contacted department heads at each of these hospitals, and all interested radiologists were invited to participate.

Intervention
Standardized TKV measurement instructions were developed, including detailed descriptions of how to perform the discrete measurements required to calculate total kidney volume (Supplement 1). These instructions use a modified ellipsoid volume calculation that has been shown to provide rapid, accurate and reproducible TKV measurements compared with the reference standard of manual planimetry on both CT and MR acquisitions. 15,21 Participating radiologists were then randomized either to receive or not to receive these standardized instructions (the 'instructed' and 'non-instructed' groups). The non-instructed group was asked to perform a TKV assessment using whatever methods they would use if they received a request for TKV measurement in a clinical setting. Block randomization to these two groups was performed in blocks of 2 or 4 to ensure an equal number of radiologists in each group.
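The allocation procedure is described only briefly (blocks of 2 or 4). As a rough illustration, a permuted-block scheme with randomly varying block size might look like the Python sketch below. The function name `block_randomize`, the use of Python's `random` module and the rule for choosing block sizes near the end of the list are all assumptions for illustration, not the study's actual allocation software.

```python
import random


def block_randomize(n, arms=("instructed", "non-instructed"), block_sizes=(2, 4), seed=None):
    """Hypothetical permuted-block allocation with randomly varying block size.

    Each block contains every arm equally often in shuffled order, so the
    arms are balanced after every completed block.
    """
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n:
        remaining = n - len(assignments)
        # Assumption: near the end, only use block sizes that still fit, so the
        # final block is not truncated (exact balance whenever n is even).
        fitting = [s for s in block_sizes if s <= remaining] or [min(block_sizes)]
        size = rng.choice(fitting)
        block = list(arms) * (size // len(arms))
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n]


allocation = block_randomize(60, seed=2024)
print(allocation.count("instructed"), allocation.count("non-instructed"))  # prints: 30 30
```

With 60 recruits every block completes, so each arm receives exactly 30 participants, matching the allocation reported in the Results. Varying the block size between 2 and 4 makes the next assignment harder to predict than fixed blocks of 2 while keeping the arms balanced.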

Performance of Study Measurements
To provide generalizable data, participants performed measurements using their institution's picture archiving and communication system (PACS) and/or off-line software packages, emulating what they would do in a real-world clinical setting.
Participating radiologists were asked to record their time throughout the study. In addition to the time required to measure each scan, 'preparation time' was determined by asking participants to record the time they spent before performing any measurements. In the instructed group, this was the time required to familiarize themselves with the standardized instructions, while in the non-instructed group, it was the time required for any pre-reading or research they felt necessary to prepare to perform a TKV measurement.
Additionally, participating radiologists completed two short surveys. The first, completed before any TKV measurements were performed, assessed their current familiarity with TKV measurement and their existing preferred method(s) for assessing TKV. The second, completed after the TKV measurements, assessed the radiologists' experience with, and acceptance of, the standardized instructions. The non-instructed group was given a copy of the measurement instructions after completing their study measurements, so that they could also provide feedback on the instructions without contaminating their study results.

Statistical Analysis
Percentage difference from the total kidney volume reference standard was analyzed as an absolute rather than a directional value. The absolute percentage difference was dichotomized at 10% and 15%, with any observation differing from the reference standard by more than the cutoff considered a large deviation. Categorical data were presented as number and proportion. For continuous data, median and interquartile range, as well as mean and standard deviation, were reported. Intraclass correlation coefficients based on a two-way random-effects model were used to assess agreement among the radiologists. P values less than .05 indicated statistical significance. Statistical analyses were performed using SAS software (version 9.4; SAS Institute, Cary, NC).
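The outcome definitions above can be sketched in Python. The study's analyses were run in SAS 9.4; this is only an illustrative re-expression using the standard library, and the function names are hypothetical. Two assumptions are flagged: the ICC form is taken to be the Shrout and Fleiss single-rater, absolute-agreement variant ICC(2,1), since the text specifies only a two-way random-effects model, and the ratings matrix is assumed complete (every radiologist measured every scan).

```python
from statistics import mean


def abs_pct_diff(measured_ml, reference_ml):
    """Absolute (non-directional) percentage difference from the reference TKV."""
    return abs(measured_ml - reference_ml) * 100 / reference_ml


def is_large_deviation(measured_ml, reference_ml, cutoff_pct):
    """Dichotomize: True only when the deviation is MORE than the cutoff (e.g. 10 or 15)."""
    return abs_pct_diff(measured_ml, reference_ml) > cutoff_pct


def icc_2_1(ratings):
    """Assumed ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the TKV of scan i measured by rater j; no missing cells.
    """
    n = len(ratings)     # targets (scans)
    k = len(ratings[0])  # raters (radiologists)
    values = [x for row in ratings for x in row]
    grand = mean(values)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: targets (rows), raters (columns), residual
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Worked check: 6 targets x 4 judges from the Shrout and Fleiss (1979) example
sf = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
      [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(sf), 2))  # prints: 0.29
# A hypothetical 529 mL reading of a 461 mL kidney pair deviates by about 14.8%
print(is_large_deviation(529, 461, 10), is_large_deviation(529, 461, 15))  # prints: True False
```

Using a strict `>` comparison mirrors the text's "more than the cutoff": a reading exactly 10% away is not counted as a large deviation at the 10% threshold.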

Results
A total of 60 radiologists from centres across the province of British Columbia were recruited, with 30 participants randomized to each of the 'instructed' and 'non-instructed' groups. Ten participants were excluded due to incomplete responses, and 1 from the instructed group was excluded due to extreme outlier responses consistent with a data entry error (Figure 1). The remaining 23 instructed and 26 non-instructed participants who completed all study measurements were included in the final analysis. Demographic characteristics of the participants are displayed in Table 1.
Before performing measurements, participants were also asked about their exposure to, and comfort with, renal imaging in ADPKD. Overall, 27% of participants reported encountering ADPKD 'often' in their practice, 51% 'occasionally' and 22% 'almost never'; 13% reported being 'very comfortable' with TKV assessment, 61% 'somewhat comfortable' and 25% 'uncomfortable' (Table 1). If asked to evaluate TKV in routine practice, 76% of respondents stated that they would use an estimation equation, 5% would use built-in estimation software and 16% had no preferred method. All respondents who chose an estimation equation reported that they would use some variation of the ellipsoid equation, kidney length × width × depth × π/6.
The 3 selected scans had reference-standard TKVs, determined by software-assisted manual planimetry, of 461 mL, 580 mL and 929 mL (Supplement 2). The mean difference between the reference standard and participant measurements was 116 mL (SD 87 mL) in the instructed group and 130 mL (SD 116 mL) in the non-instructed group, corresponding to a mean absolute variation of 18% (SD 14%) and 20% (SD 17%), respectively; none of these between-group differences were statistically significant (Figure 2). In the instructed group, 48 measurements deviated from the reference standard by more than 10%, compared with 54 in the non-instructed group; 33 and 45 measurements, respectively, deviated by more than 15%. These trends were not statistically significant (Figure 3). The intraclass correlation coefficient was .8 for the instructed group compared with .6 for the non-instructed group.
Figure 2. Variation from reference standard for instructed and non-instructed participants. Circles represent data points that fall outside 1.5 × IQR.
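Most respondents described the same geometric model: each kidney approximated as an ellipsoid with volume π/6 × length × width × depth, and TKV as the sum of both kidneys. A minimal Python sketch of that calculation, and of the comparison against a planimetry reference, follows. It implements the plain textbook ellipsoid formula, not the modified equation in the study's instructions (Supplement 1), whose details are not reproduced here. The kidney dimensions are invented for illustration, and dimensions are assumed to be in centimetres so that the result is in mL (1 cm³ = 1 mL).

```python
import math


def ellipsoid_volume_ml(length_cm, width_cm, depth_cm):
    """Ellipsoid approximation of one kidney: V = pi/6 * L * W * D (cm^3 == mL)."""
    return math.pi / 6 * length_cm * width_cm * depth_cm


def total_kidney_volume_ml(right_dims_cm, left_dims_cm):
    """TKV as the sum of the right and left kidney ellipsoid volumes."""
    return ellipsoid_volume_ml(*right_dims_cm) + ellipsoid_volume_ml(*left_dims_cm)


# Hypothetical measurements (length, width, depth in cm); not taken from a study scan
right = (13.0, 6.5, 6.0)
left = (13.5, 6.8, 6.2)
tkv = total_kidney_volume_ml(right, left)

reference_ml = 580  # an example planimetry reference value, for illustration only
variation_pct = abs(tkv - reference_ml) * 100 / reference_ml  # absolute, non-directional
print(f"Ellipsoid TKV: {tkv:.0f} mL; absolute variation: {variation_pct:.1f}%")
```

The arithmetic itself is trivial; the reader-dependent part is how length, width and depth are taken (imaging plane, slice selection, margins), which is what the study's standardized instructions describe in detail.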
The median time to perform measurements on all 3 scans was 20 min in the instructed group and 15 min in the non-instructed group (P = .13); this includes a median reported preparation time of 5 min in both groups. In the non-instructed group, 38% of participants reported using additional resources before performing measurements, compared with 9% in the instructed group. Among those who used additional resources, the most common were an internet search (50%) and asking a more experienced colleague (21%).
After performing the TKV measurements, all participants were asked to provide feedback on their experience with, and the acceptability of, the instructions. Overall, 92% of respondents agreed or strongly agreed that the instructions were clear and easy to use, 86% agreed or strongly agreed that the instructions would enhance their comfort with TKV measurement, and 75% agreed or strongly agreed that they would recommend the measurement instructions to a colleague. The distribution of responses for the instructed and non-instructed groups is displayed in Figure 4.

Discussion
Although the gold standard of kidney size assessment in ADPKD remains TKV obtained by stereology, 12-14 this is difficult to obtain in everyday clinical care, where assessment of TKV is integral to treatment decisions. Canadian nephrologists have expressed a desire for TKV assessments that are readily available in their usual practice settings and sufficiently accurate to enable clinical decision making. 17,18 Given that neither the time-consuming gold standard of manual planimetry nor automated TKV measurement and reporting is routinely available at this time, we examined whether standardized instructions for measurement using time-saving volume equations might be an acceptable alternative to planimetry for routine clinical use. We specifically examined the accuracy, speed, ease of use and user acceptability of these standardized instructions. The participants in this study were general radiologists in diverse clinical practice settings without formal training in ADPKD, including a large proportion who reported only occasional exposure and varying levels of comfort with kidney volume measurement in ADPKD.
The results demonstrate similar accuracy for instructed and non-instructed radiologists, but the instructed group demonstrated greater inter-rater reliability and a trend towards fewer measurements with large deviations from the reference standard. The participant survey showed that a high proportion of radiologists were already using methods similar to those given in our instructions, which may explain why there was no overall difference in accuracy between the 2 groups. That said, the good inter-rater reliability with the standardized instructions is relevant and may be especially reassuring for clinicians requesting these tests from centres with less exposure to TKV measurement: they can be confident that the result they receive would be comparable to one obtained at a centre with greater experience in TKV assessment. It is worth noting that the range of variation observed in this study is similar to that of prior studies; for example, a comprehensive evaluation of ellipsoid methods observed similarly high correlation between planimetry and ellipsoid measurements, with standard deviations of 13.8%-20.1% of TKV, 22 and analyses of the widely used Mayo Clinic imaging classification 9 reported TKV standard deviations ranging from 5.5% to 10.1%. Although this variation may seem large, in the analysis of the Mayo Classification this range of variation was associated with a low rate of misclassification under this most commonly used TKV classification system. 9 In addition to accuracy, we assessed the user experience with these standardized instructions. There was a trend towards slightly longer measurement time with the instructions, but a substantial number of participants in the non-instructed group consulted external resources to prepare for the TKV measurements.
Our methods relied on self-reported preparation time, and almost all participants reported 5 min; a limitation of this self-reporting process is that some participants may have estimated, rather than comprehensively recorded, the time spent searching for resources. Regardless, most participants reported that the instructions were easy to use, that they enhanced their comfort with providing TKV results and that they would recommend them to a colleague. Taken together, these findings suggest that standardized measurement instructions may be of value for enhancing radiologists' confidence in TKV assessment, particularly among those who are uncertain which measurement method to use or who perform TKV assessments infrequently.
There are some limitations to our study. Both the number of participants and the number of scans were limiting factors; the latter was a deliberate compromise to allow busy clinical radiologists to participate, and including more scans would have been preferable and might have altered our results. Similarly, the instructions may have performed differently if scans from patients with extremely large and/or distorted kidneys had been included, as these may be encountered less frequently by general radiologists in routine practice; for the same practical reasons, we elected to use 3 scans felt to represent more commonly encountered ADPKD morphology. Future dedicated evaluation of TKV measurement instructions in patients with such kidneys may be of value. We intentionally chose low-dose CT scans for measurement rather than MRI. In our prior experience, CT scans can be more difficult to interpret for the purposes of TKV measurement, 21 but access to CT is generally less limited than access to MRI. Thus, we felt that if TKV measurement on CT scans was supported by this study, the conclusions would be reasonably transferable to MRI, the less challenging modality.
In conclusion, while stereology remains the gold standard for TKV measurement, there remains a need for TKV assessments that are sufficient for clinical decision making and readily available in diverse practice settings. The standardized TKV measurement instructions in this study yielded accuracy similar to that of non-instructed measurement, with greater inter-rater reliability and high user acceptability, including enhanced user confidence in providing TKV measurements. While awaiting access to more automated methods for TKV measurement, the use of standardized measurement instructions for this somewhat infrequently encountered task may be of value for radiologists seeking to improve their confidence in providing renal clinicians with the accurate and reproducible TKV assessments they desire.