Validating National Kriging Exposure Estimation

Pathogens tend to be amplified in animals raised in CAFOs and, thus, are more difficult to eliminate in meat packing processes.

In their article  argued that it is feasible to use national scale daily kriging to estimate ambient air pollution exposure, even in locations where monitoring data are limited. In addition, they argued that national scale kriging is preferable to regional kriging and that automated variogram estimation is preferable to manual. The advocated methodology seems appealing when compared with the more standard approach of estimating ambient exposure separately in individual metropolitan cities Miller et al. 2007;Pope et al. 1995) because it simplifies exposure assessment for multicity studies and allows inclusion of subjects far from monitoring sites.  also suggested estimating daily variograms without accounting for day-today relationships and variations in data availability. This is a convenient simplification if it produces reliable results, but the evidence is not convincing.
The primary focus of the article by  is on kriging daily ambient PM 10 (particulate matter with aerodynamic diameter ≤ 10 µm) based on the U.S. Environmental Protection Agency Air Quality System (AQS) measurements. Three cross-validation statistics were reported, namely prediction error (PE), standardized prediction error (SPE), and root mean square standardized (RMSS). PE is the difference between predicted and measured concentrations at each site; SPE is the PE divided by the estimated SE; and RMSS is the SD of the SPEs across sites. PE and SPE can be regarded as measures of bias, and RMSS is a measure of the accuracy of the SE estimates (RMSS should be near 1, with RMSS > 1 indicating that the estimated SEs are too small). Cross-validation SE statistics were not reported, but the SE at Women's Health Initiative (WHI) subject addresses is reported for some models.
The goal of kriging is accurate predictions at locations without measurements. This could be verified by a cross-validation mean square error (MSE) or similar summary of unsigned prediction error. In lieu of this, a reasonable alternative is to examine SE and RMSS together. If the RMSS is near 1 (< 1) then it is reasonable to regard the mean estimated SE as a valid estimate (upper bound) for the MSE. However,  did not always report both the RMSS and SE, and some of their conclusions are erroneously supported by only one of these. We also note that limiting cross-validation to AQS sites may not be representative of performance at subject addresses.
The primary claim of  is that the "data support the overall validity of kriging-based estimation approaches to estimate location-specific PM concentrations across the contiguous United States." The authors argued that the average crossvalidation PE and RMSS statistics are "acceptable" for 95% of days. However, PE and RMSS alone do not provide a reliable estimate of the prediction accuracy. An RMSS near 1 suggests that the SE is a good estimate of prediction accuracy, but the cross-validation SE was not reported. In another section of the article, the daily mean SE of predicted PM 10 at WHI subject locations was reported to be 27.35 µg/m 3 , which is high compared with the overall mean concentration of 26.29 µg/m 3 .  claimed that national kriging is preferable to regional kriging, and they compareed their national model to one in which the continental United States is divided into five regions. They reasoned that the two models perform equally well under cross-validation based on comparisons of SPE and RMSS, so other issues such as missing data and locations near regional boundaries argue for a national approach (we note that the boundary issue is easily addressed by overlapping regions). However, because they did not report SE for the regional model, it is impossible to verify their claim that the national model performs equally well.
Finally,  claimed that automated variogram estimation is preferable to manual. Based on 6 days of data, the authors argued that the manually fit model is worse because it produces somewhat larger SEs. However, the RMSS values on these days for the automatically fit model were all > 4, whereas they were near 1 for the manually fit model. Thus, comparing SEs to assess model accuracy is not valid because the SEs for the automatically fit model are unreliable. In fact, because the SEs were fairly similar and the RMSSs were significantly larger for the automatic fit, we would be inclined to favor the manual fit.
In summary, the methodology proposed by     do not adequately support using national-scale, log-normal ordinary kriging to estimate daily mean concentrations of PM 10 (particulate matter with aerodynamic diameter ≤ 10 µm) at unmonitored locations in the contiguous United States. They posit that the absence of the cross-validation SE prevents evaluating the validity of kriging estimation, as we implemented in this context, and the comparability of both regionalversus national-scale kriging and manually modified versus semiautomated, defaultcalculated semivariograms.
Little literature is available on the use of kriging methods to estimate daily air pollution data for large population-based multicenter epidemiologic studies. The four studies cited by Szpiro et al. Miller et al. 2007;Pope et al. 1995) all used cohort analyses for which only long-term average exposures are required, and only one of those ) actually involved interpolation methods at all, although the study was restricted to a single city. In contrast, our objective was to create an interpolated daily pollutant concentration database for a multisite population-based epidemiologic study.
The cross-validation mean square error (MSE) mentioned by Szpiro et al. is also termed the "root-mean-square (RMS) prediction error," which is the empirical SE based on the mean square of the predictions, as opposed to SE, the mathematical formula for the RMS prediction error. RMS and SE, both are available from ArcView (ESRI Inc., Redlands, CA), are often considered jointly as an alternative measure (to RMSS) of the validity of spatial analysis.
The average RMS and SE from 366 daily PM 10 spherical model cross-validations based on year 2000 PM 10 data were 19.48 and 16.19 µg/m 3 , respectively, from the log-normal regular kriging, and 26.43 and 25.60 µg/m 3 , respectively from a ordinary kriging. The validity of the model is supported by RMSS alone (≈ 1), by the similarity of RMS and SE, and by SPE (≈ 0). Additionally, the average daily SD of PM 10 measured at the monitor locations was 27.20 µg/m 3 . Comparing SD with the kriging-RMS provides a measure of the reduction in error due to interpolation. If RMS is less than the SD, then the kriging approach has some benefit, compared with using long-run averages. From both ordinary and log-normal kriging, especially for the latter, we see a notable reduction in RMS compared with SD. Meanwhile, substantial variability remains, suggesting that kriging error should be taken into account when using the kriged values.
Szpiro et al. also implicitly criticize our use of daily kriging when the objective was to interpolate daily data. Spatial-temporal models have potentially greater power than a 1-day-at-a-time spatial analysis but are not easy to apply in practice, with large datasets and many missing values.
Regional kriging could be superior to national kriging if the spatial dependence parameters (range, sill, and nugget) vary substantially from region to region, in which case a national kriging model could result in misspecified covariances. However, regional kriging also uses fewer data points to estimate those parameters and could result in greater errors. We would welcome theoretical or empirical studies that could cast further light on this trade-off. However, as far as our article ) is concerned, our main purpose was to note that the national kriging method appears to be competitive when assessed by overall RMS error. We compared the results of regional-and national-scale kriging on a small set (17%) of days when the largest number of monitors (≥ 400) were reporting data-a scenario heavily favoring regional spatial interpolation strategies. On the remaining days when only 120-400 monitors were reporting data, regional kriging was inherently problematic given the restricted availability of monitors within regions. Szpiro et al. suggest that the problems of interpolation near the boundary could be solved by "overlapping," but this is only one of the issues encountered using regional-kriging: One would still need to decide how to consistently define the regions, considering the number of available data points that change substantially from day to day, to achieve a meaningful reduction in RMS error.
Based on the 12 "optimal" days in 2000, the average RMS and SE were 12.68 and 12.82 µg/m 3 , respectively, from the national scale kriging, compared with 12.22 and 12.49 µg/m 3 , respectively, from regional-scale kriging ). These results, together with RMSS and SPE we reported, support our conclusion that national kriging performs comparably to regional kriging even when restricted to optimal days. Szpiro et al. correctly note that it is possible to improve the RMSS values by manual adjustment. However, typically we found that when one of the validity measures (RMSS, PE, or SPE) was improved by manual adjustment, other measures became worse. It is difficult to manually adjust models to improve all cross-validation parameters simultaneously. Manually adjusting daily semivariogrms is not feasible when kriging over 10 years. Moreover, the predicted SE at unmeasured locations was uniformly lower in automatically fit models.
Szpiro et al. are correct that cross-validation may not be representative of the performance at participant address locations, although it is unclear what alternative methods they would like us to use. The ability to do semi-automatic cross-validations was a major attraction of ArcView and, despite limitations, is the best tool we know for validating spatial predictions.
The semiautomated kriging approach presents considerable advantages in estimating daily residential-level pollutant concentrations in large cohorts over long periods. Our proposed method  used log-normal kriging based on a spherical model to interpolate daily data on a national scale, and the weighted least squares method of parameter estimation without manual adjustment. We believe that the cross-validation statistics, presented in our article and amplified here, provide adequate support for these recommendations against reasonable alternatives that we considered. doi: 10.1289/ehp.9927 Arsenic in drinking water is a worldwide problem, and studies in southwest Taiwan (Chen et al. 1985(Chen et al. , 1988Tseng 1977;Tseng et al. 1968;Wu et al. 1989) have been the bases of many cancer risk assessments. In a recent reanalysis of the data, Lamm et al. (2006) found "township" a confounder. Specifically, of six townships, only three (2, 4, and 6) showed positive dose-response relationships with arsenic exposure, and three others (0, 3, and 5) demonstrated cancer risks independent of arsenic exposure. The authors speculated that the confounding was related to blackfoot disease (BFD), a peripheral vascular disease associated with arsenic ingestion (Ch'i and Blackwell 1968;Tseng 1977), but they did not know the identities of the individual townships.

Cancer Risks Associated with Arsenic in Drinking Water
On the basis of data collected in previous studies (Brown et al. 1997(Brown et al. , 2000Kuo 1968), townships 0, 2, 3, 4, 5, and 6 (Lamm et al. 2006;National Research Council 1999) can be identified as I-chu, Pu-tai, Hsieh-chia, Yen-shui, Pei-men, and Hsiaying. With the highest prevalence of BFD in the country, I-chu, Pu-tai, Hsieh-chia, and Environmental Health Perspectives • VOLUME 115 | NUMBER 7 | July 2007 Pei-men are generally referred as the "BFDendemic area." Therefore, the "township factor" might be indeed related to BFD, because all three of the townships affected by this factor were in the endemic area. Pu-tai was the only BFD-endemic township not affected by the factor, and two of the five villages included in the analysis are in the northern part of the township, where the prevalence was low; this might be why it appeared to be a nonendemic township. Lamm et al. (2006) further speculated that the township factor was a reflection of a selection bias because the water sampling was focused on villages with high BFD prevalence. Although the sampling was related to BFD cases, the chance of a bias occurring in the selection of villages by Wu et al. (1989) was small because they included almost all of the villages covered in the survey by Kuo (1968) in the six townships. Lamm et al. (2006) claimed that their finding of a threshold-like model indicating no increase of bladder cancer with exposure levels < 150 µg/L was consistent with results from toxicologic studies and other epidemiologic data from the United States, Argentina, and northeastern Taiwan. Actually, it is also consistent with studies on bladder cancers covering the whole of Taiwan , southwest Taiwan only (Guo 1999;Guo and Tseng 2000), and another reanalysis of the same data (Morales et al. 2000;Stöhrer 2001). Furthermore, their results are consistent with studies on lung (Guo 2004) and skin (Guo et al. 1998(Guo et al. , 2001 cancers. Whereas Lamm et al. (2006) stated that low-dose villages showed a negative dose-response curve for bladder and lung cancers, they reported a positive slope (1.275) in their Figure 2; this is likely to be an error. In previous studies, exposures between the detection limit (0.001 ppm) and 0.01 ppm had a significant negative effect on transitional cell cancer of the kidney and on skin cancer in both sexes; the negative effect on bladder cancer was also significant in women, but not in men (Guo et al. 1994(Guo et al. , 1998. Even if the dose-response relationship fits a threshold-like model, using the median arsenic level in each village as the exposure indicator might not generate accurate risk estimates because villages with similar median arsenic levels can have very different distributions of exposures. For example, villages 0-G and 3-5 had very close median arsenic levels, 0.030 and 0.032 ppm, respectively (Lamm et al. 2006); if there was a threshold at 0.150 ppm, as proposed by Lamm et al., one would expected an increased risk in village 0-G, where the residents had exposure levels up to 0.770 ppm, but not in village 3-5, where all residents were assigned the exposure level of 0.032 ppm. In fact, previous studies on urinary cancers in Taiwan suggested an inflection point > 0.32 ppm (Guo 1999;Guo et al. 1994Guo et al. , 1997. If 0.32 ppm is adopted as the cutoff, villages 0-G, 0-E, 0-I, and 3-Q would be placed in the "low-dose villages" group, although they should have increased risks because some of the residents had exposures above the threshold. These four villages were in the township group "0, 3, and 5," which Lamm et al. regarded as being affected by the township factor, but no villages in the other township group (townships 2, 4, and 6) had this problem. Therefore, misclassifications might also contribute to the township factor; a different choice of exposure indicator may help clarify the uncertainties.
The author declares he has no competing financial interests. We thank Guo for his comments and additional information. As indicated by Guo in his letter, three of the six southwest Taiwan townships in in the internal cancer study of Wu et al. (1989) (townships 2, 4, and 6) show significant dose-response relationships for bladder and lung cancer standardized mortality ratios (SMRs) with respect to the median village well arsenic levels, whereas the other three townships (townships 0, 3, and 5) show high background rates for these cancers and no significant dose-response relationship with village arsenic level. Figure 1 demonstrates that the township-specific inverse linear regression lines for townships 2, 4, and 6 all meet the no increased risk level of SMR = 100

How-Ran Guo
A 340 VOLUME 115 | NUMBER 7 | July 2007 • Environmental Health Perspectives (inflection point) at arsenic exposure levels of approximately 125-150 µg/L, which is consistent with a threshold model. That is the same inflection point range seen for skin cancer prevalence in southwest Taiwan (Byrd et al. 1996) and in Inner Mongolia (Lamm et al., in press). In contrast, the data for townships 0, 3, and 5 are indicative of high background bladder and lung cancer rates (SMRs > 250 at low arsenic levels) that are independent of the arsenic level. We inferred from these analyses the presence of a second (nonarsenic) carcinogenic factor and speculated that it might be related to the nonarsenic etiological factors for blackfoot disease, a condition uniquely reported for this area. On the basis of ongoing analyses, we are currently less inclined to believe that the "township" factor is related to blackfoot disease.
Guo inquires whether exposure heterogeneity within the villages has affected the accuracy of the risk estimates based on the village medians and suggests using alternative exposure indicators. We have examined this. The analytic fits to the models demonstrated in Figure 1 are quite similar whether the median or the mean is used as the summary exposure indicator for the villages; Table 1 shows the robustness of the arsenic concentration of the inflection point with the use of a variety of exposure indicators. The table demonstrates that the inflection point for this group of townships and its 95% confidence interval (CI) for these townships is also robust, based on 20 villages. The lower confidence limit of the inflection point is 40 µg/L arsenic.
In spite of the uncertainties in the exposure assessments, the analytic findings are quite robust. They best fit a nonlinear or threshold carcinogenic risk model for arsenic with an inflection point at 150 µg/L (Taiwan) with the presence of at least one additional confounding risk factor. Further analysis will follow the deciphering of the village code. However, interpretation should be cautious because the Wu et al. (1989) study contained data for only about onethird of the villages in the six-township area.  Li et al. (2006) measured paraoxonase 1 (PON1) in workers and found an inverse association between lead exposure and enzyme activity. This observation compliments some epidemiology and related experiments with animals, because low paraoxonase activity is associated with diabetes mellitus, familial hypercholesterolemia, ischemic heart disease, and metabolic syndrome (Klevay 2004). Paraoxonase, although studied most extensively because of its ability to detoxify organophosphate insecticides (James 2006;van Himbergen et al. 2006), has drawn increasing attention because it hydrolyzes homocysteine thiolactone, a vascular toxin that inhibits copper enzymes (Klevay 2006). Lead intoxication has many manifestations (Fischbein 1998), lesser-known of which is induction of copper deficiency (Klauder and Petering 1977). Rats deficient in copper have an approximately 28% decrease in paraoxonase activity (Klevay 2004). These observations are consonant with the decrease in superoxide dismutase (SOD) associated with occupational exposure to lead  because this enzyme also depends on adequate copper nutriture for activity (Linder and Goode 1980;Owen 1981). Thus, SOD is an index of copper nutriture in humans (Uauy et al. 1985). Li et al. (2006) stated that "the mechanism by which heavy metals inhibit serum PON1 activity is still not clear." It seems likely that lead interferes with copper utilization in the workers, leading to low copper nutritional status (Li et al. 2006). Low copper status has been related to a large variety of adverse cardiovascular phenomena in both animals and people; in this context, the most important are hypercholesterolemia, hypertension, and impaired oxidative defense (Klevay 2000(Klevay , 2002.

S.H.L. is a consultant in medical epidemiology and has clients (plaintiff and defendant) involved in arsenic issues before regulatory agencies and in
Are there unpublished copper data on the workers, or can they be reexamined to test this copper hypothesis? Plasma copper and ceruloplasmin are not likely to be useful because they are increased by inflammation (Pepys 1996) and may be falsely high. Extracellular SOD may be helpful because it is sensitive to low copper status (Johnson et al. 2005), and low values have been associated with atherosclerosis in humans (Landmesser et al. 2000;Wang et al. 1998).
The author declares he has no competing financial interests.

University of North Dakota School of Medicine and Health Sciences
Grand Forks, North Dakota E-mail: leslie_klevay@und.nodak.edu We appreciate the opportunity to discuss the issue raised by Klevay in his letter. Klevay suggested that the decrease of PON1 activity in lead workers, as we reported in our article (Li et al. 2006), might be due to copper deficiency induced by lead intoxication. Because the scope of our study did not include the interaction between lead and copper, we did not measure plasma copper level, ceruloplasmin, or superoxide dismutase (SOD) activity in our cohort. Therefore, it is not possible to test Klevay's hypothesis at this point.
Although animal studies have shown that lead ingestion caused a decrease in blood copper level in cattle (Doyle and Younger 1984) and weanling rats (Mylroie et al. 1986), the relationship between lead exposure and copper deficiency in humans is less evident. Although  and Patil et al. (2006) showed a decrease of SOD activity in lead workers, results of other studies did not agree with such findings (el-Gazzar and Hamid 1998;Oktem et al. 2004). Whether long-term lead exposure induces copper deficiency in humans is still in question.
To our knowledge, there are very few studies dealing with the effects of copper on PON1 activity. Debord et al. (2003) showed that copper ion had weak inhibitory effects on PON1 activity in vitro, whereas Klevay (2004) claimed that a copper deficiency diet caused PON1 activity reduction in rats. With little information available, it is difficult to conclude whether copper has any effects on PON1 activity in humans.
However, because of the link between copper deficiency and cardiovascular disease, we agree with Klevay that in the future, the association between lead exposure, copper deficiency, and serum PON1 activity should be examined.
negative impacts on the environment (U.S. Environmental Protection Agency 2002). Indeed, there was no mention of the relative rarity of lagoon failure, leaving the impression that this is a common event.
In some cases, where the authors attempted to assign risk to CAFOs rather than to animals, they presented biased opinion rather than fact. For example,  stated that Pathogens tend to be amplified in animals raised in CAFOs and, thus, are more difficult to eliminate in meat packing processes. This blanket statement does not specify a type of pathogen, but studies indicate thatamong the more important pathogensthis is an inaccurate statement. For example, pathogens such as Trichinella spiralis, formerly one of the most prominent pork-associated pathogens, have largely disappeared with the movement of pigs to indoor production (Roy 2003). Furthermore, pork carcass contamination with bacterial pathogens such as Salmonella is consistently lowest in the large packing plants, which because of the large volume of production are most likely to acquire animals from large producers (U.S. Department of Agriculture, Food Safety and Inspection Service 2006). This clearly invalidates the argument that these pathogens are more difficult to eliminate in the packing process.  also referred to Denmark as a country that has experienced a successful transition to antibiotic-free production. This is incorrect, and indeed the latest DANMAP report (Danish Integrated Antimicrobial Resistance Monitoring and Research Program 2006) indicates that therapeutic antibiotic use in agriculture now exceeds the amount of antibiotics that were used to promote feed conversion before a ban on antibiotic growth promotion. Additionally, pig mortality in Denmark has risen 25% in the last 10 years (Agence France-Presse 2005).  suggest that industry leaders should take a leadership role in antibiotic use, but they apparently were unaware or intentionally overlooked the fact that the American Veterinary Medical Association, the American Association of Swine Veterinarians (2004), and the National Pork Board (2000) have had prudent use guidelines for 7 years, and that in 2005 the pork industry launched the Take Care -Use Antibiotics Responsibly program (National Pork Board 2005). More than 50 million pigs are marketed each year by producers who recognize the importance of protecting public health and animal health and welfare through the responsible use of antibiotics by pledging their support for the Take Care program.
The topics covered by EHP are important and worthy of public discussion and scientific investigation. However, we do the public, producers, and the research community a disservice when that discussion is driven by misinformation and subjective opinion.
I am pleased that the National Pork Board has read the Mini-Monograph on Environmental Health Impacts of Concentrated Animal Feeding Operations (CAFOs) ); I hope the board has given consideration to the 26 recommended priority research needs and 16 recommendations for translating science to policy. These were carefully developed by the 31 authors in the five workgroups on Airborne Exposures, Monitoring and Modeling, Water Quality, Infectious Disease Epidemics and Antibiotic Resistance, and Community Health.
Wagstrom states her belief that the process of developing the reports was flawed; because she was not involved in the process, she likely does not know what the process was. In 2002, I convened a group of seven professors, each with decades of experience with this issue and representing environmental health, exposure analysis, occupational medicine, veterinary medicine, environmental policy, water quality, and infectious diseases. We considered existing environmental health problems associated with the industrialization of agriculture and developed goals and plans for the conference. In 2003 we submitted a grant proposal to the National Institutes of Health and, based on extramural peer-review, received a grant to host the conference. We conferred widely with experts in the field to identify those scientists actively engaged in research relevant to the conference topics. We then invited recognized scientists with peer-reviewed publications on the subjects of the workgroups to speak at the plenary session and participate in the ensuing workgroups. In addition, three state regulators provided expertise in exposure modeling. The workgroups considered the problems and formulated a set of research and policy needs that were presented to the full workshop. Based on feedback, these presentations were further developed by the workgroups into articles that were peerreviewed by EHP before publication in the Environmental Health Perspectives • VOLUME 115 | NUMBER 7 | July 2007

ERRATUM
In the May Focus article, "Perfluoroalkyl Acids: What Is the Evidence Telling Us?" [Environ Health Perspect 115:A250-A256 (2007)], the caption on page A254 should have read "Once PFOAexposed mice reach adulthood, however, they are more likely to become obese (above)"; PFOS (perfluorooctanyl sulfonate) exposure in utero has not been linked to later obesity in laboratory animals. In addition, the caption on page A255 may be interpreted as implying causality between prenatal exposure to PFOS and PFOA (perfluorooctanoic acid) and altered body weight and head circumference in human infants. This was not EHP's intention. In fact, although an association has been shown, causality has not been established. On the same page, the journal identified as Archives of Occupational and Environmental Health is actually International Archives of Occupational and Environmental Health. EHP regrets the errors.