Utilization of data from human population studies for setting air quality standards: evaluation of important issues.

Epidemiological studies of community populations are highly relevant to the process of setting national ambient air quality primary standards, as the criterion for those standards is the protection of human populations against adverse effects on health. Nevertheless, because of the difficulties of performing community population studies of a quality commensurate with the needs of standard setting, the use of data derived from such studies is problematic. This paper addresses the important issues of appropriate exposure assessment and health assessment, and discusses the problems of multiplex variables and collinearity, as they are critical in assessments of exposure-effect relationships. It is concluded that a major problem in the use of data from such studies for standard setting is not necessarily one of scientific reliability or validity, but arises from the attempt to translate adequate science into policy decisions.

A major stumbling block in the acceptance of data from human population studies for determining scientifically reasonable air quality standards is that good studies of this type are very difficult to conduct. Publications by Holland et al. (1) are representative of the doubts, disagreements, and confusion associated with such studies. Furthermore, the question has become so politicized that Congressional committee reports, such as the Brown Committee report on the EPA Community Health and Environmental Surveillance System (CHESS) studies (2), have major impacts on how one views the data from such studies. Nevertheless, the task of clarifying how such studies can be used is important and worth pursuing, as these data are the most pertinent for setting standards based on adverse effects on health in human populations.
Several attempts have been made to establish criteria by which epidemiological studies can be evaluated (3-5), and several major reviews have tried to face the issue of what criteria to use and how to select adequate and appropriate studies (6-15). One federal interagency group has tried to publish guidelines for use in judging epidemiological studies (16), but this has met with disagreement and opposition in the epidemiological community. What are the problems of such studies that have led to this potential impasse, and how does one resolve these problems? Addressing these questions is the plan of this paper.
*University of Arizona Health Sciences Center, Westend Laboratories, Tucson, AZ 85724.
Basically, there are three methodological problems: those related to the measures of exposure, the measures of effect, and the use of covariables and confounding variables. These are all important in the attempt to obtain estimates of exposure-response relationships. These problems will differ in geographical (spatial), temporal, and temporal-spatial studies. They will differ in studies of episodic and nonepisodic acute effects and chronic effects. They will differ in retrospective (that is, outcome-to-exposure) and prospective (that is, exposure-to-outcome) studies.
There is substantial agreement as to problems related to exposure measurements and their accuracy and relevance for the individuals studied. Measurements of exposure to pollutants may not be sufficient or accurate in population studies. It is even possible that the correct pollutants are not being measured. The number of studies needed to investigate interactions between pollutants increases rapidly with the number of pollutants of concern. Personal exposure variables and meteorological conditions may cause a given subject to experience pollution levels very different from those measured at a nearby fixed monitoring station. These problems are greater in long-term studies because the nature and quality of aerometric data are variable over time and because individuals change jobs and residences, and thus their exposures, over time (2).
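One way to read the statement that the number of studies grows rapidly with the number of pollutants is to count the distinct combinations of pollutants that could interact. The following minimal sketch does only that counting; the function name and the idea of treating each combination as one candidate interaction are illustrative assumptions, not a method described in the studies cited.

```python
from math import comb


def interaction_counts(n_pollutants):
    """Return (pairwise, all_orders) counts of pollutant combinations.

    pairwise   -- number of distinct pairs of pollutants
    all_orders -- number of distinct combinations of two or more pollutants
    Illustrative assumption: each combination is one candidate interaction.
    """
    if n_pollutants < 0:
        raise ValueError("n_pollutants must be non-negative")
    pairwise = comb(n_pollutants, 2)
    # All subsets (2**n) minus the empty set (1) and the single pollutants (n).
    all_orders = 2 ** n_pollutants - n_pollutants - 1
    return pairwise, all_orders


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, interaction_counts(n))
```

With five pollutants there are 10 pairs and 26 combinations of two or more; with ten pollutants there are 45 pairs and 1013 combinations.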
There is less agreement on health end points. Many health variables in epidemiological studies are qualitative or "soft" answers to questionnaires. Responses may be biased by the way in which questions are asked, as well as by the setting. For example, an air pollution alert may increase positive responses to a direct question. The quantitative measurements, such as pulmonary function, may be affected by other conditions at the time of testing, and by the presence of acute disease (17). Quantitative results are likely to be questioned by some as unimportant outcomes, especially in acute studies. This may occur in spite of their legitimacy in medicine as indicators of disease. In fact, there is great disagreement as to the importance of physiological, biochemical, immunological, and other such changes in individuals. What, then, is their biological meaning? In the long run, one must rely either on current biomedical scientific judgments or on majority agreement among environmental epidemiologists (18).
Problems in techniques certainly plague all studies, not only epidemiological ones. However, covariates and confounding are very special problems in epidemiological studies. Thus, for instance, the Investigative Report (2) had this to say about epidemiological studies in general: "Whether the health measurement is subjective or objective, the response is often affected by factors (covariates) associated with the subject studied and unrelated to pollutant exposure. Whether the individual smokes or is subjected to cigarette smoke at home or work is a covariate of dominant importance in pollution studies. Educational attainment may affect responses to questions about phlegm or pneumonia. Occupational, age, sex, race, immunity to influenza, allergy, access to air-conditioning and countless other covariates complicate the interpretation of epidemiologic data. Epidemiologists treat covariates in two ways. They try to choose study populations which have similar covariate characteristics so that health differences between such populations can be ascribed to pollution effects. Alternatively, they make mathematical adjustments to nullify the effects of covariate imbalances. Both strategies have weaknesses, and neither works if the investigator is unaware of an important covariate or has failed to measure it."
Some covariables can often be as important as the major aerometric variables themselves in affecting human health. In addition to other exposures such as smoking and occupational exposures, meteorological variables, such as wind speed, temperature, sudden temperature changes, and humidity levels, are very important as predisposing and precipitating factors, which, along with air pollutants, might affect health in a deleterious manner (19-29). Speciation of particulates and particulate size may be critical as well, but without adequate exposure data, epidemiological studies may be of little use in studying such refined issues.

Review of Importance of Major Covariables
Studies of the acute effects of pollutants have often considered meteorological variables, age, and sex as important possible covariables. However, many such studies did not measure other variables. In chronic studies in adults, smoking and pollutant levels are examined separately to determine any additive effects. Also, study groups that have very similar smoking habits but different pollutant exposures have been compared. In longitudinal studies, it is necessary to measure changes in smoking habits, as many longitudinal changes in health may be associated with such changes (30-33). In children, smoking is considered a less likely confounding variable.
Social class (SES) may affect not only reports of health but also the actual health outcomes themselves. Some investigators have studied only one sex within a specific occupational group in order to minimize occupational and social class differences (20,34-39). This approach is often considered to be an acceptable way to control for occupational and social class differences. It may not always be sufficient, however, in that urban/rural differences, economic differences, or activity differences may still have affected health. Moreover, specific occupational exposure conditions are almost never considered in such studies, despite their frequent importance (40). Some studies have used education or income to control for socioeconomic factors, because such variables are highly correlated with related factors, such as smoking, migration, and various household characteristics (e.g., the number in a family and crowding).
Exposure to passive smoking and other sources of indoor pollution may be critical, as those exposures may have deleterious health effects (41). Indoor exposures to NAAQS pollutants may be less than outdoor levels for some gases (i.e., SO2, O3) and may be greater than outdoor levels for other gases (i.e., NOx, CO) and total particulates, as seen in Table 1 and Figure 1. As seen, indoor values may have greater temporal peaks, with concomitant effects on health (41). Because 80-90% of an individual's time is spent indoors, analyses based on ambient levels would seem to underestimate the effects of SO2 and O3, especially chronic effects, and may incorrectly estimate the effects of NOx, CO, and particulates. Indoor particulates appear to be quite different from outdoor particulates, complicating the issue further. Therefore, though ambient levels might still show an exposure-effect relationship, the exposure side of the equation (ambient concentrations) must be considered a useful marker or index only and used as such.
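The reasoning above can be made concrete with a simple time-weighted average of indoor and outdoor concentrations. The sketch below is illustrative only: the function name, the indoor/outdoor ratios, and the ambient concentration of 100 units are hypothetical values chosen for the arithmetic, not measurements from Table 1 or Figure 1.

```python
def time_weighted_exposure(outdoor, indoor_outdoor_ratio, fraction_indoors):
    """Time-weighted average concentration experienced by one person.

    outdoor              -- ambient (outdoor) concentration, any unit
    indoor_outdoor_ratio -- indoor concentration divided by outdoor
                            (hypothetical; below 1 for gases such as SO2 or O3)
    fraction_indoors     -- fraction of time spent indoors, between 0 and 1
    """
    if not 0.0 <= fraction_indoors <= 1.0:
        raise ValueError("fraction_indoors must lie between 0 and 1")
    indoor = outdoor * indoor_outdoor_ratio
    return fraction_indoors * indoor + (1.0 - fraction_indoors) * outdoor


if __name__ == "__main__":
    ambient = 100.0  # hypothetical ambient concentration
    # Gas with lower indoor levels, 85% of time indoors (hypothetical ratio 0.5).
    print(time_weighted_exposure(ambient, 0.5, 0.85))
    # Pollutant with higher indoor levels (hypothetical ratio 1.5).
    print(time_weighted_exposure(ambient, 1.5, 0.85))
```

Under these assumed ratios, the person's average exposure is well below the ambient value in the first case and well above it in the second, which is why the ambient concentration is better treated as an index of exposure than as the exposure itself.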
Furthermore, occupational epidemiological studies may be useful for considering mechanisms of the effects of various pollutants (12-15). However, they are not as useful in estimating dose-response effects, because the working population is generally healthier and is self-selected. Furthermore, the mixes of pollutants in industry are different.
As ethnic group differences are related to physiologic differences, such as in pulmonary function, it has usually been easier either to exclude all but one ethnic group/race from a given study or to analyze results from the ethnic groups separately (43-47).
As previously mentioned, few have studied interactions of the pollutants present, such as oxidants with sulfur oxides, in producing observed health effects. Such differentiation of specific pollutant effects, or elucidation of synergistic effects, by means of epidemiological studies is difficult. Only a few such studies (25,26,44-46,48-51) appear to have attempted this differentiation; they have shown significant interactions.
Some studies exist which indicate that possible confounding variables are not always as important as they were thought to be. For example, follow-up studies on a cohort started by Douglas, Waller, and colleagues (52) did not confirm original social class differences to be of much significance in accounting for health findings later in life. Furthermore, Manfreda (53) did not find "urban" characteristics to be relevant in explaining his results. Other studies have shown that household/familial factors are not important in all cases (49-51). Likewise, geographical studies have shown positive relationships of adverse health effects with pollutant concentrations despite potential selective migration (38,44,46,54-56). Thus, one should not overemphasize the relative importance of potential confounding or covarying factors when these have not been specifically ruled out as alternative explanations for specific results.

Criteria for Evaluating Studies and Their Results
As has been stated frequently, no single study alone, no matter how well designed or conducted, completely establishes a "scientific fact." Rather, excellence in the design and conduct of a given study, internal consistency, biological plausibility of results, their consistency with other results (such as from animal toxicological and controlled human exposure studies), and specificity of results help to heighten confidence in the likely existence of the relationship obtained. Even greater certainty is attributed to the probable existence of such relationships if further independent studies, regardless of particular individual flaws, yield results consistent with such relationships. Thus, consistency in the overall pattern of results indicative of particular relationships, or the overall "weight of the evidence" from more than one study, is crucial in determining the degree of certainty ascribed to a given relationship (4,18,57,58).
With these observations in mind, some criteria can be stated by which to evaluate epidemiological studies. The study design and population (and its size) have to be reasonable, the health measurements reliable, covariables and confounding variables considered, the aerometric data sufficient, and the analysis sufficient. The results should have internal and external consistency and be biologically reasonable, and some replication is often necessary to ensure plausibility. With these criteria, certain studies can be used as examples to see if they can provide data for standard setting.

Examples and Illustrative Concepts
A geographic comparison study by Lambert and Reid (59) was the first to demonstrate adequately that the effects of smoke shade and SO2 levels were additive to the effects of smoking on persistent productive cough and pulmonary function. Their study involved 10,000 residents aged 36-69, classified by area of residence. However, social class was not controlled. Some have indicated its importance (1), while others find it less important (52), even in the same country. Nevertheless, the results are biologically reasonable. The aerometric data might be spotty, but the estimates are sufficient to determine ranked differences between areas (60). Thus, this study is worth using as relevant to criteria documentation of exposure-effect relationships.
On the other hand, a similar geographical study of women in Buffalo, NY, by Winkelstein and Kantor (37) controlled for social class, smoking, and occupation. However, because it used an "unacceptable" measurement method for SO2 (sulfation rate), it was not considered acceptable for use as part of the criteria for standard setting. Nevertheless, even this method can properly distinguish census tracts of differing sulfur oxide concentrations, and the study should be relevant for the scientific evaluation of exposure-effect relations, along with a parallel mortality study (54,55).
A less persuasive example of a geographical study is provided by Lave and colleagues. They attempted to obtain the relationship between bronchitis mortality and sulfates in England and Wales and in the U.S. using geographically derived analyses. Unfortunately, this method is easily biased by the use of geographical data and by the lack of important covariables (e.g., smoking) in the analyses. They also showed that the pollution variables were so highly correlated with one another that each had a similar relationship to mortality, and that the variability of the estimates was too great for good predictions (56). Even the use of this statistical method for this purpose has been questioned seriously (63). Temporal analyses of the type performed by Schimmel et al. (64,65) have many of the same problems. Thus, though of some academic interest, these types of studies do not help in standard setting.

FIGURE 2. Spring 1964: Common cold incidence and prevalence rates, air pollutants (particulate matter (CoH), CO, SO2) and temperature, as daily averages by week (CFIS). Horizontal axis: weeks, March through June.
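The difficulty caused by highly correlated pollution variables can be illustrated with the variance inflation factor (VIF), a standard regression diagnostic that is not mentioned in the studies cited and is brought in here only as an illustration. For two predictors with correlation r, the variance of each estimated coefficient is inflated by 1/(1 - r^2). The pollutant readings below are hypothetical numbers chosen to be nearly proportional to one another.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x, y):
    """Variance inflation factor when x and y are the only two predictors."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


if __name__ == "__main__":
    # Hypothetical area-level readings of two pollutants that rise together.
    so2 = [10, 20, 30, 40, 50]
    smoke = [12, 19, 33, 38, 52]
    print(pearson_r(so2, smoke), vif_two_predictors(so2, smoke))
```

A VIF near 1 means the two variables carry separate information; values far above 10, as with the assumed readings here, mean the separate coefficient estimates are too variable to assign an effect to one pollutant rather than the other.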
A temporal study of morbidity may be illustrated by an outbreak of acute respiratory illness (ARI) occurring in a Manhattan study population (66). It occurred one spring (1964), and was related to a preceding increase in pollutants and a decrease in temperature, as shown in Figure 2 (67). Primary and secondary attack rates in families were as expected for a bacterial or viral outbreak and occurred in all age, sex, SES, and race subgroups. Indeed, Adenovirus 5 was shown to be present by increases in paired sera in this population and in a similar nearby population (68); other agents did not have increased prevalence (68,69). School absenteeism curves followed this population's incidence (especially primary attack) rates. Cigarette smoking did not appear to be a factor. However, note that subsequent increases in SO2 and CoH did not produce further increases in ARIs; CO was down and temperature had gone up.
One can therefore ask whether every peak in pollution has to have an ARI response. If the population is no longer susceptible or there is no agent in sufficient presence to produce ARIs, then one might not expect further ARI increases so soon after such an outbreak. However, temporal studies of mortality in New York City (21,27,28) indicate that many pollution-weather stimuli do not have mortality responses and vice versa. If there is a consistent response, over time, in more than one place, then a stimulus-response relationship may exist (29,48,70). The results must be biologically plausible and internally and externally consistent. Nevertheless, if the results are too complicated or do not provide separate concentrations for each pollutant, they may not be useful in standard setting.
Another question posed by the ARI outbreak and similar studies of the infectious model is why animal studies of this model require substantially higher pollutant concentrations to produce similar events. Lower concentrations can produce cellular effects, but total organ or systemic effects are more difficult to produce in the laboratory. There are several possible (hypothetical) reasons for the difference in dosage necessary to promote pathogenic infection. Laboratory animals are generally healthy and kept healthy, while humans have some baseline morbidity and are attacked continuously by biological and environmental insults. These biological insults are neither attenuated nor dose-regulated, as are microorganisms used in laboratory experiments. Furthermore, contact spread and exposure time are continuous among humans, but are minimized in the laboratory setting. The environmental insults to community populations are complex mixes of pollutants and of meteorological conditions (multiplex situations), whereas the exposure is extremely controlled in the laboratory setting.
Do acute morbidity effects lead to chronic effects? Those with chronic airway obstructive diseases have a history of significantly more frequent and severe ARIs (40) and a significant history of childhood respiratory problems (71). A study of acute pulmonary function changes in healthy children in a smelter town (72) indicated significant acute reversible changes. A further study of children in that town, another smelter town, and a control town (73) indicated that pulmonary function values were lower overall in the smelter towns (despite potential selective migration). Thus, there are grounds for a possible relationship between acute and chronic pulmonary function changes. Nevertheless, it is sometimes difficult to separate the acute (peak) exposure effects from the chronic exposure effects (74).
Epidemiologists are asked frequently to assess effects in sensitive individuals. Cohen et al. (26) studied attack rates in 20 responsive asthmatics, derived from all physician-confirmed asthma in Cumberland, West Virginia. Over a period of 7 months, they showed significant correlations between reported and confirmed attack rates and 24-hr mean air pollution levels after the effect of temperature had been removed from the analysis. Physician visits were used to validate the attacks. Significant above-average increases in attacks were seen with 24-hr concentrations of the pollutants. Suspended sulfates showed the strongest relationships, although suspended nitrates, SO2, TSP, and soiling index (CoH, Coefficient of Haze) each individually explained a significant portion of the residual after the effect of temperature had been removed and season controlled. This effect in asthmatics is confirmed by controlled human exposure studies (75). Since this appears to be a reasonable and biologically plausible attempt at exposure-effect estimation, it should be used in criteria for standard setting.
Another attempt at an estimation of an exposure-effect relationship has been made by Leaderer et al. (76). They combined CHESS chronic bronchitis studies (Rocky Mountains, Salt Lake Basin, New York) with similar studies of the Yale Lung Center (Connecticut, South Carolina) to form a dose-response relationship with sulfates, SO2, and TSP. They accounted for sex and smoking. Using least squares, they found that every 2.0 µg/m3 of sulfates adds 1.24% to the chronic bronchitis rates for both sexes. The use of step functions suggested a level of 5.8 µg/m3 sulfates as the point at which chronic bronchitis starts increasing rapidly. Having developed a "threshold" equation for estimating excess chronic bronchitis, they attempted to estimate total excess cases using sulfate sampling data for the U.S. along with population estimates for the U.S. According to their results, approximately 150 million people were exposed to annual sulfate concentrations above 5.8 µg/m3 (their threshold) in 1972. They then plotted estimated excess cases against estimated increases or decreases in concentrations (Fig. 3). They found that a decrease of 70% from 1972 ambient sulfate concentrations would "essentially eliminate any excess cases of chronic bronchitis related to ambient sulfate exposure." A 50% decrease in excess cases would be produced by a 30% decrease in sulfates. A 50% increase in concentrations would supposedly lead to a doubling of the 1972 number of excess cases. This information is important and appropriate to issues of criteria for standard setting.
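To show how a threshold relationship of this general kind behaves, the sketch below combines the two figures quoted above (1.24% per 2.0 µg/m3 and a threshold of 5.8 µg/m3) into a simple "hockey-stick" function. Joining the least-squares slope and the step-function threshold in this way is an assumption made for illustration; it is not the equation of Leaderer et al. (76), and the baseline concentration of 11.6 µg/m3 is hypothetical. Their national estimates were built from sampling data across many areas, so a single-concentration example cannot be expected to reproduce them.

```python
SLOPE_PCT_PER_UG = 1.24 / 2.0  # percent added per ug/m3 of sulfate (from the text)
THRESHOLD_UG_M3 = 5.8          # sulfate level at which rates start to rise (from the text)


def excess_bronchitis_pct(sulfate_ug_m3):
    """Excess chronic bronchitis rate (percentage points) above the threshold.

    Assumed form: zero below the threshold, linear above it.
    """
    return max(0.0, sulfate_ug_m3 - THRESHOLD_UG_M3) * SLOPE_PCT_PER_UG


def relative_excess(baseline_ug_m3, change_fraction):
    """Excess after a fractional change in concentration, relative to baseline."""
    base = excess_bronchitis_pct(baseline_ug_m3)
    if base == 0.0:
        raise ValueError("baseline is at or below the threshold")
    changed = baseline_ug_m3 * (1.0 + change_fraction)
    return excess_bronchitis_pct(changed) / base


if __name__ == "__main__":
    baseline = 11.6  # hypothetical annual sulfate concentration, ug/m3
    for change in (-0.7, -0.3, 0.5):
        print(change, relative_excess(baseline, change))
```

At this assumed baseline, a 50% increase in concentration doubles the excess and a 70% decrease removes it entirely, because the reduced concentration falls below the threshold.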

Discussion
Other decisions relevant to the acceptance or rejection of epidemiological data are based on criteria not strictly related to scientific merit. For instance, the Brown Committee (2) reviewed and essentially rejected the EPA CHESS studies, in spite of the conclusion that many epidemiological studies shared similar problems. Its criticism was often of epidemiological studies in general, yet its decision to disqualify the relevance of the CHESS studies was not based on scientific rationale per se. After all, it was not a committee of peer scientists. In fact, the CHESS studies' major problem was one of exposure assessment, as is true of most studies. Accurate and precise estimates of dose are not obtained, but estimates of exposure can be, and have been, obtained from epidemiological studies (18), including CHESS. In terms of design and assessment of effect, CHESS studies represented state-of-the-art techniques. If CHESS had problems with a few confounding variables or with follow-up in panel studies, they were problems common to other studies and should be addressed with the same scientific criteria. Political criteria may be relevant when administrators decide on actual standards, but not when evaluating scientific merit (18,75).
It sometimes happens that scientific criteria of useful data for standard setting become too strict, or that scientific committees become overzealous in rejecting large bodies of biologically plausible and consistent data. For example, Holland et al. (1), in a review financed by the U.S. Iron and Steel Institute, found only eight quotable investigators suitable for drawing conclusions about levels at which adverse health effects occur due to TSP. Even in a restricted set of studies on one aspect of the relationship, in which there were 23 studies (Table 2), only two were considered useful by Holland et al. (1). If one considered only those studies listed in Table 2 that were performed in the U.S. as appropriate for U.S. standard setting, there would be none. Fortunately, other studies have been performed since then, which were consistent with those listed in finding health effects even after controlling for appropriate confounding variables (73,91,100-102).
Examples of the formulation of exposure-effect relationships derived from data furnished by epidemiological studies are illustrated by the assessments of the National Research Council (NRC/NAS) and the World Health Organization (WHO) (6-9, 12-15, 18, 103) (Tables 3-5). It is important to note that the formulations, and the concentrations derived from the same studies, can differ in such attempts, depending on the specific committee performing the task, when the formulation was made, and for whom. The assessments by WHO and NRC/NAS also reflected scientific differences of opinion in the evaluation of animal toxicological and controlled human exposure experiments, again in part because of the differences in setting. Nevertheless, the NRC/NAS and WHO committees were in substantial accord. This appears to indicate that sufficient scientific agreement does exist, and that data from epidemiological studies are sufficient to provide the basis on which standards can be set.

Conclusions
Epidemiological studies are difficult to perform adequately, and this is reflected in the difficulty of determining which studies are adequate and which data can be used in providing a scientific base for standard setting procedures. Good estimates of exposure in population studies have been the most troublesome aspect. Measures of health effects have been somewhat troublesome, but studies are more often rejected for inadequate attention to covariables and confounding variables. There have been disagreement and contradictory results concerning these variables, as demonstrated in various examples. Conceptual aspects of these problems continue to require further clarification. It is quite apparent that different groups of scientists may come to different conclusions depending on the time, and especially on the circumstances, in which conclusions are drawn. Nevertheless, expert committees of both the National Research Council/National Academy of Sciences and the World Health Organization, neither one committed to telling the U.S. government what standards should be set, have been able to use epidemiological data to recommend levels at which adverse effects on health are likely to occur. Other individuals and groups, such as the American Thoracic Society (11), have been able to provide such estimates as well. Thus, scientific agreement is possible and epidemiological data are useful. This is important, as standards are set for human populations. The final decision concerning standards, and only the final decision, should and will involve policy decisions incorporating social and political factors.