The reproducibility of 109Cd-based X-ray fluorescence measurements of bone lead.

We assessed the reproducibility of X-ray fluorescence-based lead measurements from multiple measurements made on a low-concentration plaster of paris phantom and in five subjects measured five times on two occasions. Over a 6-month period, 220 measurements of the same phantom were obtained and showed a standard deviation of 1.29 micrograms Pb (g plaster of paris)-1. The two sets of in vivo measurements were made 10 months apart and revealed a mean standard deviation of 3.4 micrograms Pb (g bone mineral)-1 and 5.1 micrograms Pb (g bone mineral)-1 for males and females, respectively. Our measured standard deviation exceeded by 20-30% the calculated standard deviation associated with a single measurement both in the phantom and in subjects. This indicates that some variance is introduced during the measurement process. Operator learning and consistency significantly minimized this increased variability. Measured lead concentrations of the left and right tibia in 14 subjects showed no significant differences between legs. As a result, either tibia can be sampled and compared over time. The levels of reproducibility we report here mean that X-ray fluorescence-based determinations of bone lead concentrations are reliable both over the short and long term. Thus, reasonably sized confidence intervals can be placed on detected changes in concentration and should permit acquisition of longitudinal data within a reasonable length of time.

Measurements of skeletal lead content by in vivo X-ray fluorescence have yielded cross-sectional data from which several important conclusions have been drawn about bone lead burdens in humans. First, several cross-sectional surveys of nonoccupationally exposed populations have shown that in vivo tibia lead measurements represent an index of cumulative exposure to environmental (1,2) and industrial (3,4) lead levels. Second, when used as an index of cumulative exposure, in vivo bone lead measurements have proven valuable in discriminating between occupationally and nonoccupationally exposed persons (5). Third, and most importantly, it has been established that a strong relationship exists between elevated blood lead and bone lead in retired lead workers, highlighting the importance of an endogenous lead exposure (6,7).
With the knowledge that the body's lead stores can be mobilized back into the circulatory system, research questions are now addressing the subclinical toxicity of lead (8) released from bone. The release of this endogenous lead store may be a direct result of changes in bone mineral status such as that experienced by women with the onset of menopause (9). The outcomes of such studies will depend strongly on determining changes in bone lead over a reasonable length of time. It is therefore important that in vivo bone lead measurements be reproducible.
Because published data on the reproducibility of bone lead measurements are limited (10-12), we set out to define the short- and long-term reproducibility of bone lead concentrations determined by our measurement system both in phantoms and in human subjects. We also present preliminary data that examine the differences in tibia lead between the right and left legs within individuals.

Materials and Methods
We determined the lead concentration in phantoms and in subjects with an improved 109Cd K X-ray fluorescence system. Details of the instrumentation of this upgraded system have been described previously (1).
A bare cylindrical plaster of paris phantom with a nominal concentration of 23 µg Pb (g plaster of paris)-1 was selected to define in vitro reproducibility. The phantom was measured over a 6-month period, during which 220 measurements were recorded. Each measurement lasted 1800 sec (clock time), and care was taken to reproduce the position of the phantom to eliminate the effects of any concentration inhomogeneity along or around the phantom.
Five subjects (three male, two female) participated in the reproducibility trials. They were selected on the basis that their leg sizes represented a wide range of measurement geometries. We assessed leg size by the circumference at the midpoint between the medial malleolus and the tuberosity of the tibia. Two sets of five measurements were performed at the midpoint of the anteromedial aspect of the left tibia of each subject. For each individual, a set of measurements was acquired mostly within a 5-day period. We made the first set of measurements in September 1992 and the second set in July 1993. Before the second set of measurements was made, however, two changes occurred in our measurement system. First, the detector resolution, as assessed by the full width at half maximum of the coherent peak, increased from 650 eV to 700 eV. Second, we purchased a second 109Cd source (1.1 GBq). In all five subjects the first set of measurements was performed with the older 109Cd source (0.50 GBq). In two of the three male subjects, the second set of measurements was made with the new source. For the remaining one male and two female subjects, the second set of measurements was made with the original source, which had been reduced in activity (0.30 GBq) by decay over the 10 months separating the two sets. Consequently, we were able to examine short-term reproducibility at three different source strengths.
In 14 subjects (7 male, 7 female), we made a single measurement on each of the left and right tibia. Again, the midpoint of the anteromedial aspect of each tibia was used, and measurement times ranged between 1800 and 2000 sec.
Peak information from all spectra was extracted using a nonlinear least-squares fit based on the Marquardt algorithm (13). We used chi-square analysis to assess the variance associated with the serial phantom measurements. We define measurement reproducibility as the standard deviation associated with the mean of a series of concentration measurements. The Student's t-test was used to determine the statistical significance of differences between mean concentrations derived from repeat measurements performed in the in vivo part of the study. The t-distribution was also used to derive confidence intervals associated with a measured change in bone lead concentration.

Results
Figure 1 shows the results of measuring the plaster of paris phantom over a 6-month period. The mean concentration is 23.32 µg Pb (g plaster of paris)-1, with an associated standard deviation of 1.29 µg Pb (g plaster of paris)-1. Twice this deviation is also indicated with the data set. As expected for 220 measurements, 11 (5%) lie outside the 2σ range. The calculated uncertainty on a single measurement of this phantom is approximately 1 µg Pb (g plaster of paris)-1. Therefore, there is a 30% difference between the predicted and measured uncertainties. To examine this difference, the ratio of observed to expected variance was evaluated using the reduced chi-square. If the observed variance were equal to the predicted variance, then the reduced chi-square would be equal to 1. For 219 degrees of freedom, the reduced chi-square equals 1.66 and has an associated p-value <0.02, indicating a significant increase in variance associated with making repeated measurements over a 6-month period.
The results of the in vivo reproducibility trial are summarized in Table 1. The means and standard deviations of the two sets of measurements are given, along with the average calculated uncertainty associated with a single measurement. As indicated, among the five subjects the mean concentration was significantly different in one (p=0.03). The reproducibility ranged from 2.1 to 5.3 µg Pb (g bone mineral)-1 in males and from 2.4 to 9.1 µg Pb (g bone mineral)-1 in females. Reproducibility could not be related to source activity or leg size.
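The comparison of observed and predicted variance for the phantom series can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study: the function names are hypothetical, and the shortcut in `variance_ratio` assumes that every measurement carries the same predicted uncertainty (about 1 µg Pb per g plaster of paris, as quoted in the text).

```python
def reduced_chi_square(values, sigmas):
    """Reduced chi-square of repeated measurements about their unweighted mean.

    values: measured concentrations (hypothetical input).
    sigmas: calculated 1-sigma uncertainty of each measurement, same units.
    """
    n = len(values)
    mean = sum(values) / n
    chi2 = sum(((v - mean) / s) ** 2 for v, s in zip(values, sigmas))
    return chi2 / (n - 1)  # n - 1 degrees of freedom


def variance_ratio(observed_sd, predicted_sd):
    """Observed-to-expected variance ratio.

    Equals the reduced chi-square when all measurements share the same
    predicted uncertainty and observed_sd is the sample standard deviation.
    """
    return (observed_sd / predicted_sd) ** 2


# Phantom series quoted in the text: observed SD 1.29, predicted SD about 1.
phantom_ratio = variance_ratio(1.29, 1.0)
print(round(phantom_ratio, 2))  # 1.66
```

A value of 1 would indicate that the calculated single-measurement uncertainty fully accounts for the scatter; values above 1 indicate additional variance from the measurement process.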
The six estimates of reproducibility in males and four estimates in females can be averaged to obtain typical values because we selected our subjects to be representative of a wide range of in vivo variations. These average results are 3.4 µg Pb (g bone mineral)-1 for males and 5.1 µg Pb (g bone mineral)-1 for females. With these estimates of variability in typical subjects, confidence intervals associated with a difference in two sets of bone lead concentration measurements can be derived from
$$\mathrm{CI} = \pm\, t_{[2(n-1),\,1-0.5\alpha]}\;\sigma\sqrt{2/n}$$
where t[2(n-1), 1-0.5α] is the 100(1-0.5α) percentage point of a t-distribution with 2(n-1) degrees of freedom, σ is the estimated standard deviation, and n is the number of measurements in each set (14). Based on this formula, the 80, 90, and 95% confidence intervals for both males and females were calculated and are given in Table 2. Differences in lead concentration between the left and right tibia are addressed in Table 3. The difference in lead concentration is presented as a mean [...]

Discussion
In vitro concentrations are highly reproducible over the short and long term. We previously reported the short-term reproducibility for this low-concentration phantom as being 1.20 µg Pb (g plaster of paris)-1 (15). This estimate was derived from a set of 29 consecutive measurements acquired without repositioning. By extending these phantom measurements over a longer term, we have produced an estimate of reproducibility that is less than 10% higher. This suggests that factors such as small changes in source activity, detector and electronics performance, analytical procedure, and repositioning errors do not greatly contribute to the observed variance associated with an estimation of lead concentration in vitro.
The uncertainty assigned to a lead concentration was derived from propagating the errors associated with dividing a measured lead-to-coherent ratio by the slope of the calibration line and accounting for the uncertainties in the calibration line. A full demonstration of the calculation of the uncertainty on a single measurement is given in the appendix. Although our serial phantom measurements were obtained in a carefully controlled laboratory environment, we found that the calculated uncertainty on a single measurement underestimates the observed standard deviation by approximately 30%. A 30% difference is significant and meaningful. This suggests that our formulation of the calculated standard deviation may be incomplete, thereby underestimating the true standard deviation. Another possible contributor to this difference, although unlikely, is nonuniformity of lead distributed in our phantom. This phantom was one of the first that was made to calibrate our system and may contain pockets of high lead concentration. If this is true, small errors in repositioning (as judged by source-to-phantom distance and phantom rotation) will lead to different concentration determinations and increased variance. It has been shown that the overall lead detection efficiency over a typical tibia cross-section (4-5 cm) can vary by more than an order of magnitude (16).
We have defined in vivo reproducibility (precision) as the standard deviation from the mean of a series of measurements and have suggested typical values of 3.4 µg Pb (g bone mineral)-1 and 5.1 µg Pb (g bone mineral)-1 in males and females, respectively. We recently reported a typical measurement error of 2.8 µg Pb (g bone mineral)-1 in males (17). This estimate was derived as the median value from a distribution of 30 individual uncertainties in normal males recorded by our system. This estimated median value is 18% less than the measured value we have suggested here. This does not in itself achieve a high degree of statistical significance (0.3>p>0.1). However, taken together with the reproducibility of phantom measurements, it is reasonable to conclude that serial measurements of bone lead are reliable, but again there is some increased variance introduced during in vivo measurements. This increase, if it is real, could be accounted for by subject-dependent parameters such as bone size, bone mass, subject movement during measurement, and the thickness of soft tissue overlying the tibia. The effect of these subject-dependent parameters is most apparent in the larger measurement variability we detected in our two female subjects.
In four out of the five subjects we measured, reproducibility improved on the second visit. Although the reproducibility of bone lead measurements will vary between subjects, the variability within a subject is expected to remain constant over a reasonable length of time (< 1 year) given that no drastic changes in bone mineral or bone lead content have occurred. It is quite conceivable that the improved reproducibility we detected is due to an increase in the operator's knowledge of the measurement system. The effect of operator learning may be even greater given that the second series of measurements was recorded with our detector operating with decreased resolution and, in some cases, at a reduced source activity. If the improvement we have noted is due to operator learning, small errors in repositioning may be highlighting small variations in concentration within an individual's tibia.
By repeating some of our in vivo measurements at different source activities, we were able to assess the effects of count rate on reproducibility. Our findings suggest that subject- and operator-dependent factors more strongly influence measurement reproducibility than source activity. This can be seen by examining the standard deviations given in Table 1. All deviations under visit 1 were recorded with a 1.5-year-old source (0.5 GBq). Consequently, the variability in the standard deviations is subject dependent. For visit 2, the first two subjects were measured with a new source (1.1 GBq), but their results are no less variable than those of the remaining three, who were remeasured with an old source (0.3 GBq). However, a decrease in source activity and resolution accounts for the increase in the average calculated uncertainty for all second visits.
It was surprising that in one subject a significant change in lead concentration was detected. This change is not readily explained because the expected annual increase in tibia concentration is very small [<0.5 µg Pb (g bone mineral)-1] (1,2). Even if the tibia was not measured at the same position, a significant difference still should not have been detected at the p<0.05 level. Atomic absorption measurements of lead concentration have shown that the concentration of lead sampled anywhere along the length of the tibia will result in a value within 1 SD of the mean value of the entire tibial shaft (18). The apparently significant change may be due to our detecting the 1 in 20 false positive difference which is expected at the p<0.05 level.
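The false-positive argument can be made concrete with a short calculation. The sketch below assumes that the five subject comparisons behave as independent tests at the 0.05 level, which is an assumption made here for illustration and is not stated in the study; the function name is hypothetical.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent tests is significant
    at level alpha when no true difference exists."""
    return 1.0 - (1.0 - alpha) ** n_tests


p_any = prob_at_least_one_false_positive(5)
print(round(p_any, 3))  # 0.226
```

Under that assumption, one apparently significant change among five subjects is not in itself remarkable.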
The confidence intervals given in Table 2 can be used to establish levels of certainty on a detected change in tibia lead concentration. For example, based on a set of three tibia measurements recorded at the same site in a typical male, the 95% confidence interval on the difference in concentration between the two measurement sets will be ±14.6 µg Pb (g bone mineral)-1. This means that a change in bone lead concentration of at least 14.6 µg Pb (g bone mineral)-1 must be measured to be 95% confident that the observed difference is due to a variation in the subject measured and not due to measurement uncertainty. As illustrated by Table 2, estimates of change can be made more precise by taking additional measurements. This can be done without being limited by the magnitude of the accumulated dose. For example, the radiation dose received from five consecutive tibia lead measurements will be approximately 0.2 µSv (19). This value is still far less than the annual dose received from natural background radiation (2500 µSv).
In deriving the confidence intervals reported in Table 2, we have assumed that our measured standard deviation (σ) is only an estimate of the true reproducibility. If one had knowledge of the true reproducibility and if one assumes that this true value does not change from person to person or over time, then it would have been more appropriate to derive confidence intervals based on a normal z-distribution rather than a t-distribution. Intervals based on a normal distribution are smaller than those derived from a t-distribution. Consequently, the values given in Table 2 are perhaps an overestimation.
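The difference between t-based and z-based intervals can be illustrated with a small standard-library sketch. It assumes the interval on the difference between two sets of n measurements has half-width q·σ·√(2/n), where q is the relevant percentage point with 2(n-1) degrees of freedom in the t case; this form is our reading of the text, not a confirmed transcription of the published formula. The helper names are hypothetical, and the t quantile is obtained by numerical integration and bisection rather than from a statistics library.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus a Simpson's-rule integral of the density on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for 0.5 < p < 1 by bisection on the CDF (search range 0 to 100)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ci_half_width(sigma, n, confidence=0.95, known_sigma=False):
    """Half-width of the interval on the difference between two sets of n measurements."""
    upper = 1 - (1 - confidence) / 2           # 1 - 0.5 * alpha
    if known_sigma:
        q = NormalDist().inv_cdf(upper)        # z-based: sigma treated as known
    else:
        q = t_quantile(upper, 2 * (n - 1))     # t-based: sigma estimated from data
    return q * sigma * math.sqrt(2 / n)


# Typical male reproducibility from the text (3.4); n values are illustrative.
for n in (2, 3, 5):
    print(n, round(ci_half_width(3.4, n), 1),
          round(ci_half_width(3.4, n, known_sigma=True), 1))
```

For every n the z-based half-width is the smaller of the two, and the gap narrows as the number of measurements per set grows.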
With our small sample size, we found no evidence for a difference in lead concentration between the left and right tibia within a group of individuals. This confirms the findings of other investigators. For example, no differences in concentration were noted in the right and left tibia taken from the archaeological skeletons of 12 colonial American adults as assessed by atomic absorption spectroscopy (18).
Others have demonstrated strong correlations in the lead content at sites of similar bone type such as the trabecular-dominated sternum and calcaneus (6). Therefore, this lack of a detectable difference between two similar weight-bearing bones with similar blood supplies means that either tibia could be sampled in a series of measurements made over time to assess concentration changes. This may prove useful in providing an alternative measurement site in subjects who have broken their legs or for amputee cases.
Environmental Health Perspectives
In conclusion, it is clear that serial measurements of bone lead concentration are reliable enough for a dedicated 109Cd-based K X-ray fluorescence system to be the instrument of choice in studies addressing the effects of lead mobilization from bone. For applications involving longitudinal studies such as pharmaceutical interventions, further improvements in reproducibility will allow shorter study times and fewer subjects without sacrificing statistical power.

Appendix
A data set comprised a plaster of paris phantom, for which there were a total of 34 measurements, plus an occupationally exposed male subject and a nonoccupationally exposed female subject. In each spectrum, estimates, together with 1σ uncertainties, were obtained for the amplitudes of four lead X-ray peaks (α1, α2, β1, β3) and the amplitude of the coherent scatter peak (coh). Four calibration lines were determined, in which the ratios (Ri) of X-ray peak amplitude (xi) to coherent peak amplitude (coh) were regressed against added lead concentration. For each calibration line, the slope (mi) and intercept (ci), together with their variances [...] and the factor 1.46 is the ratio of coherent scattering cross-sections of bone mineral to hydrated plaster of paris at 88 keV and 160°. Also the variance, σ²Ri, of the ratio is [...]
However, this has not taken account of the uncertainties in the calibration line. A more complete calculation of the variance in a lead estimate derived from a single X-ray is given by [...]
The fact that each of the terms (xi/coh) has a mutual dependence on coh, the coherent scatter amplitude, can be taken into account by adding another term to Equation 9, giving an overall estimate of variance which allows for mutual dependence on coh: [...]
Of the three estimates of variance, the first accounts only for an individual's measurement error. The second additionally accounts for calibration line uncertainties, while the third also includes the mutual dependence of each of the four lead estimates on the same coherent scatter peak amplitude. To illustrate numerically the differences between these three estimates of variance, the data set of 34 phantom measurements and 2 subject measurements was used. Table A1 shows the slopes, intercepts, variances, and covariances for the four calibration lines. Table A2 shows the X-ray and coherent peak amplitudes for two subjects.
Table A3 shows the results for the two subjects shown in Table A2, using the calibration line data of Table A1. The nonoccupationally exposed female subject showed a larger effect of the calibration lines because her results were considerably less than the mean values for the calibration. Her results also showed an extremely small effect of mutual dependence on coherent peak amplitude because the additional term in Equation 10 depends on the product of X-ray peak amplitudes, which were all small in her case. Even for the occupationally exposed male subject, whose lead concentration was substantial (although not extreme), the effect of mutual dependence on coherent peak amplitude was very minor. This demonstrates that this effect can normally be safely ignored.
The effect of uncertainties in the calibration lines is to add 2-3% to the errors estimated in the crudest fashion. This should not be ignored because to do so would produce a systematic underestimate of the measurement error; however, the size of the discrepancy remains small compared to the variation in error between individuals.
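The first two levels of variance described in this appendix can be sketched with generic first-order error propagation. This is not a transcription of the appendix equations. It assumes that the lead estimate from one peak is (Ri - ci)/mi in calibration units, that the X-ray and coherent amplitudes are independent, and that the four estimates are combined by inverse-variance weighting. The 1.46 conversion factor and the mutual-dependence term are deliberately left out, and all function names and numbers are hypothetical.

```python
def ratio_and_variance(x, sx, coh, scoh):
    """Ratio R = x / coh and its first-order variance, treating the two
    fitted amplitudes as independent (an assumption of this sketch)."""
    r = x / coh
    var_r = (sx ** 2 + (r * scoh) ** 2) / coh ** 2
    return r, var_r


def lead_estimate(x, sx, coh, scoh, m, c, var_m=0.0, var_c=0.0, cov_mc=0.0):
    """Concentration (R - c) / m from one X-ray peak, in calibration units.

    Returns (concentration,
             variance from measurement error only,
             variance including calibration-line uncertainties).
    """
    r, var_r = ratio_and_variance(x, sx, coh, scoh)
    conc = (r - c) / m
    var_meas = var_r / m ** 2
    # Partial derivatives: dC/dc = -1/m and dC/dm = -C/m.
    var_cal = (var_c + conc ** 2 * var_m + 2 * conc * cov_mc) / m ** 2
    return conc, var_meas, var_meas + var_cal


def combine(estimates):
    """Inverse-variance weighted mean of (concentration, variance) pairs."""
    weights = [1.0 / v for _, v in estimates]
    mean = sum(w * c for w, (c, _) in zip(weights, estimates)) / sum(weights)
    return mean, 1.0 / sum(weights)


# Hypothetical amplitudes and calibration values, for illustration only.
conc, var_meas, var_full = lead_estimate(
    x=100.0, sx=10.0, coh=1000.0, scoh=10.0,
    m=0.01, c=0.0, var_m=1e-8, var_c=1e-6, cov_mc=-5e-8)
print(conc, var_meas ** 0.5, var_full ** 0.5)
```

Comparing the second and third returned values for a given peak shows how much the calibration line adds to the measurement-only error.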