The ecologic method in the study of environmental health. II. Methodologic issues and feasibility.

This paper reviews some methodological aspects of ecologic studies of human health, with emphasis on investigations of environmental quality. A recent census of Canadian and U.S. data sets potentially suitable for this type of study is summarized. It is concluded that despite the considerable utility of the ecologic design for this purpose, substantial practical difficulties are common in its implementation. Particular problems are the relative scarcity of relevant environmental data and complications associated with rendering them compatible with health data.


Introduction
This paper discusses some of the key methodologic issues in the use of the ecologic design for epidemiologic studies. Several of these issues affect both ecologic and other types of design, but the ecologic design is also affected by the so-called ecologic fallacy. Some statistical issues in the analysis of ecologic data are also outlined. Finally, the practical feasibility of using the ecologic design is assessed using a recent census of data sets on water quality and human health in the Great Lakes area.

Methodologic Issues
Establishment of Appropriate Numerators
A major requirement of ecologic studies is that all relevant health events (which constitute the numerators for health event rates) are accurately and completely recorded. Accuracy and completeness may vary according to the type of health event in question; completeness is likely to be good for mortality data, but the accuracy may be less than perfect. Both accuracy and completeness may be poorer for less serious morbidity and symptomatology.
As an example, consider cancer registration. Swerdlow (1) describes numerous problems, including the possibility of missing data on variables such as occupation or birthplace; boundary changes that may occur over time in the region covered by the registry; persons who may appear in more than one registry if they are treated at separate health facilities; registration determined according to the place of residence or the location of the treatment facility; administrative delays causing late registration, and deletions or alterations to registry figures once they have been assembled; and the need for record linkage to eliminate duplicate registrations of the same patients and to deal with patients with multiple cancer primaries. All of these problems potentially affect the accuracy of the numerators that would go into an ecologic analysis.
*Department of Clinical Epidemiology and Biostatistics, McMaster University Medical Centre, Hamilton, Ontario, Canada, L8N 3Z5.
An additional problem to be faced in time-trend ecologic studies is that the taxonomy of health events may have changed over time. An example is the periodic revision of the International Classification of Diseases (ICD). Because aggregations, disaggregations, and redefinitions occur at each revision, artifactual changes in disease rates occur, even though the actual disease rate may have remained constant. Similar artifactual changes in time series information might occur in the use of hospitalization data. For example, the construction of a new hospital might increase the number of hospitalizations because of increased numbers of clinical referrals to the new facility from outside the region.
Finally, population migration may affect the disease numerators. Crump and Guess (2) note that migration occurs in some patients after their diagnosis of cancer or other serious disease, leading to a discrepancy between the residence on the death certificate and the residence at diagnosis. Migration may also be associated with environmental exposure variables. In their study of health and water quality in Southern California, Mah et al. (3) noted that very high rates of migration might account for discrepancies in their results between analyses using mortality and incidence data.

Establishment of Appropriate Denominators
A valid ecologic study requires the identification of population denominators that correspond to the health event numerators. In other words, the denominators should consist of the numbers of individuals at risk of experiencing the health event in question. Most usually, populations are estimated from census data, and they can be constructed for geographic units such as states, counties, municipalities, or population subgroups such as inner cities with high numbers of ethnic minorities (1).
The population denominator estimated from the census is supposed to provide the population at risk of disease. However, there are still some problems for ecologic studies even at this level. Some individuals in the population maintain temporary addresses, for instance, students and members of the armed forces. Such temporary addresses might be the ones used for the purpose of cancer registration or death certificates, rather than the more permanent address. Also, if disease registration is made using the location of the treatment facility, persons who are treated in a different geographic region than the one in which they reside will be associated with the wrong denominator.
Unknown addresses can also be problematic. In the Ontario Cancer Registry, for instance, the number of patients with unknown county of residence is less than 5%. However, this low rate is achieved by careful linkage of various data sources concerning each patient, at least one of which usually provides the address. In other data bases, the rate of missing residence may be higher if such linkage is not possible.
Censuses have undergone some changes in their operational procedures. For instance, in some British censuses, individuals were counted even if they were temporary residents, whereas permanent residents away on holiday were not counted; in other censuses these rules have not applied. Such rule changes might create artifacts in time-trend analyses.
There are some health events for which the census denominator is not appropriate, most notably, perinatal and neonatal mortality. The appropriate ecologic denominator here would be the total number of births in various geographical regions. Registration of births is virtually 100% complete in developed countries, but some of the same problems of assignment to the correct geographic subunit may still apply.
Administrative geographic boundary changes may occur, a problem affecting both census and birth denominators. Although infrequent, boundary changes do sometimes affect the composition of municipalities and/or counties, and adjustment would be needed to establish comparable figures for time-trend analyses.

Reliability of Exposure Data
The crucial assumption made in ecologic analyses is that the exposure level assigned to a geographic subunit applies to all members of that subunit. An ideal ecologic study would be one in which homogeneous subgroups of the population were identified and where a single measurement of exposure (e.g., a water quality variable) could validly be assumed to apply to all persons. For instance, one would like to assume that water quality as evaluated at a treatment plant would correspond to water quality in all the homes it serves. Changes in quality as the water moves through the distribution system should be negligible.
A further assumption of the ideal study is that residence in a particular location implies exposure of the individual at the assumed ecologic level. This will often be an invalid assumption. For instance, in studies of water quality, alternative sources and modifications to the water supply might be used (e.g., bottled water and water softeners). In addition, even if some consumption of the domestic water supply occurs, other supplies may also be used; for instance, persons who spend a high percentage of their working day away from the home may actually consume most of their water at work. Finally, the ecologic design cannot take into account variation in individual consumption; even if the other assumptions are met, such variation would negate the assumption of equivalent exposures to all members of the ecologic unit of analysis.
The extent of this type of problem will likely vary between locations and between individuals. For instance, Hogan et al. (4) noted that only a small fraction of each county in their study had received its water from the facility used for water quality assessment. Also, Mah et al. (3) found that 18 to 25% of southern Californians drink bottled water. This high percentage increased the difficulty of doing ecologic studies based on the quality of water in the public distribution system.
One must also be concerned that the measurement of exposure is consistent over geographic regions and over time. If several laboratories are involved in the testing of levels of environmental contaminants in various parts of the geographic region, interlaboratory reliability should be assessed. Similarly, if laboratory techniques have changed over time, cross-validation and calibration would be required.
Crump and Guess (2) comment on the possibility of indirect measures being used for environmental exposures, for instance, the use of chlorination instead of direct measurements of organic contaminant concentration. Wilkins et al. (5) note that use of surrogate variables for exposure (such as surface versus groundwater) makes an assumption that surface water supplies are higher in organic contaminants and perhaps more likely to be chlorinated than groundwater sources. While surface water will generally be higher in contamination levels than groundwater, there may be considerable overlap between the distributions. Failure to measure detailed contaminant levels may reduce precision in the statistical analysis. Finally, certain contaminants vary seasonally and may not be distributed uniformly in the water supply system (5).

Consideration of Latency in Disease Development
For many health outcomes, the exposure of interest is not the current environmental quality, but rather the exposure levels as they existed several years previously. For instance, to investigate the effect of many carcinogens, it might be necessary to establish historical data from 10 or more years previously.
In their ecologic study of the association of water chloroform levels and cancer, Hogan et al. (4) noted the possible inappropriateness of their chloroform exposure data, which were collected in 1975, whereas their cancer data related to the period 1950 to 1969. An assumption was required that the same chloroform readings would have been obtained had they been available in an earlier time period; the most relevant time period would have been 1925 to 1959, if a latency interval of 10 to 25 years is presumed.
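The arithmetic behind this window can be made explicit. The following is a minimal sketch rather than part of the original analysis; the function name and its arguments are illustrative assumptions. The earliest relevant exposure year is the first outcome year minus the longest latency, and the latest is the last outcome year minus the shortest latency.

```python
def exposure_window(outcome_start, outcome_end, min_latency, max_latency):
    """Return the (earliest, latest) calendar years of relevant exposure.

    Hypothetical helper: assumes latency is expressed in whole years and
    that min_latency <= max_latency.
    """
    if min_latency > max_latency:
        raise ValueError("min_latency must not exceed max_latency")
    earliest = outcome_start - max_latency  # longest latency reaches furthest back
    latest = outcome_end - min_latency      # shortest latency gives the latest year
    return earliest, latest


# Cancer data for 1950 to 1969 with a presumed latency of 10 to 25 years
print(exposure_window(1950, 1969, 10, 25))  # (1925, 1959)
```

Under these assumptions, exposure data collected in 1975 fall well outside the relevant window, which is the difficulty the authors acknowledged.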
The study by Tuthill and Moore (6) is a rare example of an analysis using historical data. This was termed an "ecologic time lag study," using 1949 water quality data in Massachusetts and relating them to cancer mortality in 1969 to 1973.

Population Migration Effects
An additional problem in using the ecologic approach, if latency is to be taken into account, is that some individuals migrate and hence are exposed to various environments over time. If one were studying individuals, then in principle one could construct an exposure history based on the environmental quality in the various locations at which each individual had resided. (Even this would be very difficult in practice.) However, it appears virtually impossible to adjust for migration to establish comparable exposure histories on an ecologic basis.
Migration may also be important as a response to or outcome of health problems; it is conceivable that persons in relatively good health are more likely to be migrants. For instance, people with exceptionally good health may be more able to migrate to take advantage of evolving economic opportunities. The converse is also possible; for instance, asthmatics may move to places with a preferable climate (e.g., Arizona); aged or ill persons may move to locations with better medical facilities. If either of these phenomena exists, it would considerably complicate the interpretation of ecologic data.
A partial solution might be to restrict the ecologic assessment of latency to stable communities where the rates of in- and out-migration are relatively low. The use of stable populations might then permit a more valid ecologic assessment of exposure of individuals based on residence. However, stable communities might differ from unstable communities in various ways; the social structure of long-established stable communities is likely to be quite different with respect to lifestyle and socioeconomic variables, which may in turn be related to both environmental quality and health outcomes. Study generalizability may therefore be limited.

Ecologic Fallacy
The so-called ecologic fallacy is the most important methodologic problem afflicting ecologic studies. The key issue is that the degree of association between an exposure and disease may differ in ecologic data, as compared to the same association measured using data from individual people. The fallacy comes about because the overall association between exposure and disease is made up of two components, one representing the covariance within ecologic subgroups and the other representing the covariance between ecologic subgroups. Depending on the relative importance of these two components, an ecologically measured association can be either stronger or weaker than the same association evaluated with individual data. Figure 1 shows two hypothetical scenarios to illustrate this point. In scenario A, there is a strong covariance between exposure and disease within groups but only a weak covariance between groups. If this association were evaluated ecologically with only one average exposure level and one overall disease risk being measured for each subgroup, the association would appear very weak. However, if the group means were taken into account in individual data, the strong association within groups would become evident.
In scenario B of Figure 1 the reverse situation applies. Here there is only a weak association within each group but a strong association between groups. Hence an ecologic analysis would show a very strong association, while only a weak association would be found in individual data after appropriate adjustment for group effects.
Morgenstern (7) has described how the ecologic association is affected by two forms of bias, aggregation bias and specification bias. Aggregation bias occurs when data are aggregated or "collapsed," ignoring the subgroups of data from which individual observations came. Specification bias is effectively a confounding effect of "group." Specification bias can occur if a third or extraneous risk factor is differentially distributed by group, or if there is some property of the ecologic subgroup that is correlated with the disease rate. The combination of aggregation and specification biases, termed cross-level bias (7), can make an ecologic association stronger or weaker relative to the individual data, but it is usually the ecologic association that is stronger. No bias exists if and only if the mean exposure level for a group has no effect on the disease rate given an individual person's exposure value.
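The two-component structure described above for scenarios A and B can be illustrated numerically. The sketch below is an illustration only: the function name and the toy data are assumptions, not values from Figure 1. It splits the total covariance of exposure and outcome into a within-group part and a between-group part, the latter being all that an ecologic analysis of group means can see.

```python
from statistics import fmean


def covariance_components(groups):
    """Split the total covariance of (exposure, outcome) pairs into
    within-group and between-group parts (population form, divisor n).

    `groups` is a list of groups, each a list of (x, y) tuples.
    Returns (within, between, total), where within + between == total.
    """
    pairs = [p for g in groups for p in g]
    n = len(pairs)
    grand_x = fmean(x for x, _ in pairs)
    grand_y = fmean(y for _, y in pairs)
    within = 0.0
    between = 0.0
    for g in groups:
        mean_x = fmean(x for x, _ in g)
        mean_y = fmean(y for _, y in g)
        # Covariation of individuals around their own group means
        within += sum((x - mean_x) * (y - mean_y) for x, y in g)
        # Covariation of the group means around the grand means
        between += len(g) * (mean_x - grand_x) * (mean_y - grand_y)
    total = sum((x - grand_x) * (y - grand_y) for x, y in pairs)
    return within / n, between / n, total / n


# Scenario A (toy data): strong association within groups, none between groups
scenario_a = [[(0, 0), (1, 1), (2, 2)],
              [(2, 0), (3, 1), (4, 2)],
              [(4, 0), (5, 1), (6, 2)]]

# Scenario B (toy data): no association within groups, strong between groups
scenario_b = [[(0, 0), (1, 0), (2, 0)],
              [(2, 2), (3, 2), (4, 2)],
              [(4, 4), (5, 4), (6, 4)]]

for name, data in (("A", scenario_a), ("B", scenario_b)):
    w, b, t = covariance_components(data)
    print(f"scenario {name}: within={w:.3f} between={b:.3f} total={t:.3f}")
```

In the toy scenario A the between-group component is zero, so an analysis of group means would detect nothing; in toy scenario B the entire association lies between groups.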
Individually collected data are also potentially subject to confounding from extraneous risk factors, as is well known in prospective and retrospective epidemiologic studies of many kinds. However, individual data are not subject to aggregation bias. It turns out that an extraneous variable that is a confounder at the individual level may not be a confounder at the ecologic level.
The example given by Morgenstern is that of sex, which is likely to be similarly distributed across geographic regions. Hence sex may be a confounding variable for case-control or cohort studies, but it is unlikely to be a confounder in an ecologic study.
Effect modification (or interaction) by a covariate can also confound an ecologic association even in situations where individual level confounding would not occur (8). It is also possible that a variable that acts as a confounder at the ecologic level may not confound at the individual level. This possibility is most likely if the grouping into geographic subunits is made on the basis of the disease rate. In such a situation, confounding will emerge with any variable correlated with disease rate, even if it is not associated with exposure to the risk factor at the individual level.

Multicollinearity
A problem of analysis that applies to many types of observational study is that several risk factors may be mutually correlated. It is thus more difficult statistically to estimate the contribution of each factor, adjusting for possible effects of the other factors. The problem becomes more acute as the level of intercorrelation between exposure variables increases. In an extreme case, where two risk factors are perfectly correlated, it is impossible to distinguish their possibly differential relationships to health outcomes.
This problem of multicollinearity is likely to apply quite strongly to ecologic studies of the environment, for instance, of water quality. Water of high quality will tend to have low levels of contamination by most pollutants, whereas poor quality water may be contaminated by several toxins.
Multicollinearity is usually stronger at the ecologic level than at the individual level. This is because in the ecologic analysis, each geographic subunit is assigned a single value for each exposure variable, ignoring the variation within the ecologic subgroup; however, the correlation between exposures within subgroups is typically not 100%. If it were possible (through use of individual observations) to account for the within-subgroup variation in exposure, collinearity would be reduced. The important practical implication of collinearity induced by ecologic aggregations is that it is more difficult to separate the contributory effects of different exposure variables with ecologic data than with individual data. In particular, the problem of ecologic multicollinearity is likely to be more severe in data where there are groups containing large populations or if the number of subgroups in the data is small.
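A small numerical illustration of this point follows. It is a sketch under assumed toy data, and the function names are hypothetical. Two exposures are uncorrelated within each subgroup, yet once each subgroup is reduced to its means the two exposures become perfectly collinear.

```python
from math import sqrt
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def individual_and_ecologic_r(groups):
    """Correlate two exposures using individual values, then group means.

    `groups` is a list of groups, each a list of (exposure1, exposure2).
    """
    x1 = [a for g in groups for a, _ in g]
    x2 = [b for g in groups for _, b in g]
    m1 = [fmean(a for a, _ in g) for g in groups]
    m2 = [fmean(b for _, b in g) for g in groups]
    return pearson(x1, x2), pearson(m1, m2)


# Toy data: within each group the two exposures are uncorrelated,
# but the group means all lie on one line.
base = [(0, 1), (2, 1), (1, 0), (1, 2)]
groups = [[(a + shift, b + shift) for a, b in base] for shift in (0, 3, 6)]

r_individual, r_ecologic = individual_and_ecologic_r(groups)
print(f"individual r = {r_individual:.3f}, ecologic r = {r_ecologic:.3f}")
```

With individual observations the correlation stays below 1, so the two exposures can in principle be separated; with the three group means alone they cannot.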

Strategies in Statistical Analysis to Avoid Ecologic Bias
Many ecologic analyses use regression techniques to assess the association between exposure variables and health outcomes. This is entirely appropriate because if the ecologic subgroups are homogeneous with respect to exposure, regression will yield unbiased estimates of risk coefficients. In order to construct homogeneous subgroups, it may be necessary to use quite small geographic areas as the units of analysis. This raises the question of feasibility, in particular, whether suitable data would be available on a small-area basis. In addition, use of small areas will increase potential problems associated with migration of the population; the probability that an individual migrates in or out of a small geographic area is large compared to the corresponding probabilities for a larger geographic unit. For instance, migration between census tracts is quite likely relative to migration rates between larger geographic units such as states. An additional problem associated with using small areas is that each area will involve a smaller sample size, with corresponding imprecision of the estimated disease rates.
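The loss of precision in small areas can be sketched as follows. This illustration assumes that case counts follow a Poisson distribution, an assumption introduced here and not stated in the text; the function names are hypothetical. Under that assumption the standard error of a rate is the square root of the case count divided by the population, so the relative error depends only on the number of cases.

```python
import math


def rate_with_se(cases, population, per=100_000):
    """Crude rate and its approximate standard error, both per `per` persons.

    Assumes a Poisson count, so Var(cases) is estimated by `cases`.
    """
    rate = cases * per / population
    se = math.sqrt(cases) * per / population
    return rate, se


def relative_se(cases):
    """Standard error as a fraction of the rate: 1 / sqrt(cases)."""
    if cases <= 0:
        raise ValueError("relative error is undefined with no cases")
    return 1 / math.sqrt(cases)


# Same underlying rate (25 per 100,000) in a large and a small area
print(rate_with_se(2500, 10_000_000), relative_se(2500))
print(rate_with_se(25, 100_000), relative_se(25))
```

Both areas have the same rate, but the relative standard error is ten times larger in the smaller area.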
In the situation where unstable rates can occur or where the rates in different ecologic subgroups have different precision, it may be desirable to use weighted regression. Pocock (9) has argued that ordinary, unweighted regression is inappropriate if the ecologic groups vary substantially in size. On the other hand, weighting the observations according to the inverses of their variances (a common weighting technique) may be too extreme, giving too much emphasis to large towns or population groups. An intermediate solution uses maximum likelihood techniques and takes into account the variation in rates which would be expected by chance (9).
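The contrast between unweighted and weighted regression can be sketched with a one-variable least squares fit. This is not the maximum likelihood method of Pocock (9); it shows only the two extremes discussed above. The function name and the toy data are assumptions, and the weights are taken as proportional to population size, a stand-in for inverse-variance weights when the variance of a rate is roughly inversely proportional to population.

```python
def weighted_line_fit(x, y, w=None):
    """Fit y = intercept + slope * x by (weighted) least squares.

    With w=None every group counts equally (ordinary least squares).
    Returns (slope, intercept).
    """
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Toy data: four areas, the last one a large town with an outlying rate
exposure = [0, 1, 2, 3]
rate = [1, 3, 5, 11]
population_weight = [1, 1, 1, 100]  # hypothetical relative population sizes

ols_slope, _ = weighted_line_fit(exposure, rate)
wls_slope, _ = weighted_line_fit(exposure, rate, population_weight)
print(f"unweighted slope = {ols_slope:.3f}, weighted slope = {wls_slope:.3f}")
```

In this toy example the single large town pulls the weighted slope well away from the unweighted one, which illustrates the concern that inverse-variance weighting may give too much emphasis to large population groups.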
Pocock (9) used the maximum likelihood method in an analysis of stomach cancer mortality in 25 London boroughs as related to the degree of water reuse in tap water supplies. Hogan et al. (4) investigated the association between cancer rates and water quality in all counties of the 48 contiguous United States. There were substantial differences between the results depending on whether weighted or unweighted regression was used. They speculate that these discrepancies were due to interactions between the effect of chloroform (the main water quality variable under investigation) and size of population in the ecologic subunits. Wilkins et al.
Many authors analyzing ecologic data have used correlational techniques. However, Morgenstern (7) has pointed out that ecologic grouping of data may lead to bias in the estimated correlation coefficient, even if the ecologic groups are homogeneous with respect to exposure. In addition, one may frequently see high correlations between disease and exposures in ecologic data if the grouping into geographic units has been made on the basis of exposure level. High correlations do not necessarily mean that the exposure variables are important predictors of the health outcomes, but simply that other potential confounders are likely to have been well controlled through the ecologic grouping. Overall, there seems to be a strong case for using regression as opposed to correlation in the analysis of ecologic data. If covariates are present, the usual tactic of standardization (which can eliminate the effects of confounding in individual data) is not adequate to produce unbiased effect estimates in ecologic data. Consequently, situations where one or more confounders are suspected must be interpreted with great caution in ecologic data (8).

Other Issues Concerning the Interpretation of Ecologic Studies
It is unlikely that all of the scientific information on an environmental hazard will come from ecologic studies alone. Even for those situations where ecologic data provide the bulk of the evidence, it is important to compare the results of ecologic studies to other types of information in order to enhance the scientific credibility of the results. One should consider the data from other epidemiologic studies such as case-control investigations and from animal research. Consistency of evidence between studies of different design should add to the overall plausibility of health hazards suggested by ecologic data. This said, we should note that other study designs may suffer from some of the same methodologic shortcomings as some ecologic studies. For instance, many case-control studies would use the same indirect measures of water quality, might also rely on imperfect death certificate data, and may also lack important information on confounders.
Although not a major focus of these papers, one might also consider the use of the ecologic design for intervention studies. For instance, if fluoridation of public water supplies is introduced as a preventive strategy, ecologic assessment of outcomes such as the prevalence of dental caries is entirely appropriate. If individuals in the target population actually consume their drinking water from other sources (for instance, bottled water) to any great extent, then the fluoridation will have lower effectiveness. Similarly, interventions made at the level of the water treatment facility may not be fully reflected in the water quality at residences. Considerable variation exists between the quality of delivered water at different residences from the same treatment plant, and there may be an overall difference of domestic water quality relative to water leaving the water treatment facility (5). Nevertheless, an ecologic evaluation is quite suitable to estimate effectiveness of the intervention for the entire population.

Review of U.S. and Canadian Data Sets For Suitability in Ecologic Studies
The foregoing review of the methodology of ecologic studies was written initially in preparation for a workshop sponsored by the International Joint Commission, Committee on the Assessment of Human Health Effects of Great Lakes Water Quality, to investigate the possibility of using ecologic epidemiologic studies to study the association of water quality with human health in the Great Lakes area. In addition to considering the methodologic principles, attendees of the workshop had also been commissioned to carry out censuses of the available data in the U.S. and Canada. The author of this paper was then asked to comment on the potential usefulness of each data set for epidemiologic studies of this kind. The following is a brief summary of that assessment. It serves as an illustrative example of the likely feasibility of using the ecologic design in addressing a specific environmental question.
The census of U.S. data sets was carried out by J.R. Wilkins and C. Reider of Ohio State University. One hundred sixty-two survey forms were distributed to Federal, State, county, and local agencies, bureaus, and institutions. Several independent, nongovernment organizations were also surveyed. Ninety replies were received, for a response rate of 56%.
The corresponding census of Canadian data sets was compiled by T. Arbuckle. Two hundred thirty-nine questionnaires were sent to agencies, institutions, and government departments. There were 185 responses (77% response rate), of which 93 had relevant databases. Table 1 shows the total number of data sets identified in each country under headings of ambient water quality, drinking water quality, fish data sets, and data on human disease.
Table 1. Number of Canadian and U.S. data sets identified for potential use in ecologic studies of water quality and human health in the Great Lakes region.
The data on fish mostly concerned the levels of toxins and pollutants found in freshwater fish tissue. Because of their nature, these data sets were ruled out of scope for consideration. While useful research might be carried out to assess the health effects of eating contaminated fish, such studies would be unlikely to use the ecologic design. The main difficulty would be in identifying exposed subgroups of the population. Fish products are distributed widely geographically by commercial and private fishermen, so it would be unwise to presume exposure to contaminated fish among residents of nearby communities. Furthermore, consumption of fish varies widely between individuals. Even in populations residing near a source of contaminated fish, there will be consumption of fish products from other locations. Therefore, the linkage between disease and exposure at the individual level will be very poorly represented by ecologic data.
There are similar problems with the data on ambient water quality. It may be difficult or impossible to identify the population exposed to risk through use of contaminated beaches. The individuals who use recreational facilities such as a swimming beach are unlikely to all come from the same municipality. Specifically, one could not assume that residence near a contaminated supply of ambient water constituted exposure. Even if one knew the identities and residences of users of a recreational water facility, they would vary greatly in their degree of water exposure, depending on whether they swam, degree of immersion, and dates of usage. Thus, ecologic and individual assessments of exposure would differ substantially.
Much the same argument applies to other types of ambient water, such as industrially contaminated areas and snow-melt and runoff around landfills. Ecologically exposed subgroups, constructed on the basis of proximal residence to such areas, would likely have highly diverse, real exposures. For these reasons, the ambient water data sets are not reviewed further here.
Table 2 shows the total number of drinking water and health data sets judged according to their potential suitability for ecologic studies. The more serious difficulties appear to exist with the water data sets, of which only 2 out of 15 had definite applicability. The main reasons for unsuitability included limited numbers of sampling stations with few ecologic areas that could be compared; variable laboratory methodology; uncertain precision in data; irregular observations over time, often with substantial gaps; and sampling only during times of actual or perceived water contamination. Additionally, in almost all situations, it is very difficult to construct an appropriate exposed population because of mixing of multiple sources of water through the supply system. Finally, one has to rely on the water quality variables already recorded, which may not be directly relevant to the health question being investigated. Perhaps one of the better data sets with potential for ecologic work is the Ontario Drinking Water Surveillance program. This began in 1986, with 35 municipal water supply locations and 140 variables covering microbiological, organic, inorganic, and process characteristics. Each site is measured approximately eight times per year.
Table 2. Suitability for ecologic studies of Canadian and U.S. data sets on drinking water quality and human health in the Great Lakes region.
The situation with the health data sets is somewhat more hopeful. Table 3 gives the distribution of the types of data set identified in the censuses. There are many good-quality data sets dealing with disease incidence and mortality in well-defined population subgroups. Vital statistics and state and provincial cancer registries are common sources of these data. Frequently the data are available for very small areas, and one is limited only by the paucity of disease cases occurring in each.
There are several other types of data with potential for ecologic work. First, there are national or large area surveys, from which more local estimates may be derived. Examples include the National Health Interview Survey (U.S.), the Canada Fitness Survey, and the Canada Health and Disability Survey. These typically consist of very large samples, with the objective of obtaining national estimates. Often, multistage sampling is used, so there may be many areas for which no individuals are sampled. This may limit the potential for ecologic work requiring detailed coverage of a more limited area such as a state.
Second, some areas maintain centralized records of health care events such as hospital discharges. These data may be used ecologically, but some work may be needed to determine appropriate populations at risk corresponding to hospital catchment areas. Another limitation is that multiple events for the same person are often not linked, so that the rate numerator consists of a number of events rather than the number of people with any event.
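The distinction between counting events and counting persons can be shown with a minimal sketch. The record layout, the `person_id` field, and the toy records are hypothetical; they assume that a linkage key is available, which, as noted above, is often not the case.

```python
def events_and_persons(discharges):
    """Return (number of events, number of distinct persons with any event).

    `discharges` is a list of records, each a dict with a 'person_id' key
    (a hypothetical linkage identifier).
    """
    events = len(discharges)
    persons = len({record["person_id"] for record in discharges})
    return events, persons


# Toy discharge file: person A is admitted three times
discharges = [
    {"person_id": "A", "year": 1986},
    {"person_id": "A", "year": 1986},
    {"person_id": "B", "year": 1986},
    {"person_id": "A", "year": 1987},
    {"person_id": "C", "year": 1987},
]

events, persons = events_and_persons(discharges)
print(events, persons)  # 5 3
```

Without the linkage key only the first figure is available, and a rate built on it overstates the number of people affected whenever readmissions occur.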
A third type of data was identified from the memberships of several large worker organizations, such as labor unions and employees of corporations. Typical examples are a registry of deaths among union members, maintained for reasons associated with pensions, and files of medical absences from work. While the rate numerator information from such files may be excellent, there may be some difficulties with the denominator. For instance, there may be no direct linkage of mortality data with the files of living employees. Also, the occupational and residential history information may be very limited or nonexistent, thus inhibiting use of the data in ecologic studies.

Table 3. Types of data sets on human health identified in census of Canadian and U.S. sources.

Type                                                                  Number
National ongoing databases from which regional data can be derived      10
National surveys from which regional data can be derived                 8
Regional ongoing databases                                              17
Regional surveys                                                         4
Employee membership lists                                                3

Health data identified in the surveys that were not useful for ecologic studies were so classified because of limited or uncertain geographic coverage, unclear completeness or quality, or lack of relevant variables. Typical examples in this group were a data set dealing with residential care in nursing homes and data on hospital costs.

Conclusions
The ecologic study method has been used to study a wide variety of health problems. Epidemiologists have probably chosen this approach in many situations because of its practicality. By its nature, ecologic methodology allows the study of large populations in ways that might not be feasible with any other design. Ecologic studies have several advantages in being quick to execute, not requiring contact with individuals in the population, and often being able to use extant data sets. For these reasons, ecologic studies tend to be very cost efficient.
On the other hand, we have seen that there are also several disadvantages. There is particular concern that the ecologic fallacy will lead to invalid associations of disease with risk factors. Information on potential confounders may be unavailable at the ecologic level, again leading to potential bias. Exposure to risk factors may be inadequately characterized by residence in a particular area; this will fail to take account of exposures occurring elsewhere, for instance, at work. Indeed, legal residence does not necessarily imply any local exposure experience. Historical exposures may be important for diseases with long latent periods but difficult to ascertain with ecologic data. Population migration is also hard to account for with ecologic analyses. Finally, considerable effort may be needed to establish health and environmental data sets with comparable population subgroups. Health data are usually available for administrative units such as counties or municipalities, but environmental exposures transect their boundaries. For instance, drinking water systems may supply several areas; atmospheric quality data available from discrete and irregularly spaced sampling stations must somehow be converted to provide estimates of exposure for populations in the ecologic units of analysis.
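One simple way in which readings from irregularly spaced stations might be converted into an estimate for an ecologic unit is inverse-distance weighting at the unit's centroid. This paper does not prescribe any particular method; the sketch below is an assumption made purely for illustration, with hypothetical coordinates and readings, and it ignores complications such as wind direction, terrain, and the distribution of the population within the unit.

```python
import math


def idw_estimate(target, stations, power=2.0):
    """Inverse-distance-weighted estimate at `target` (x, y).

    `stations` is a list of (x, y, value) tuples. The function name, the
    default power of 2, and the planar coordinates are illustrative
    assumptions, not taken from the paper.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a station: use its reading
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical stations (x, y, pollutant reading) and a unit centroid.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
centroid = (0.5, 0.0)
print(idw_estimate(centroid, stations))
```

Nearer stations dominate the estimate, so the result is sensitive to station placement; a unit with no nearby station receives an estimate that is largely extrapolation.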
Our review of Canadian and U.S. data sources for the Great Lakes region revealed that much of the environmental data is either irrelevant or difficult to use in ecologic studies. The data on ambient water and fish seemed largely inappropriate for ecologic work. Relatively few data sets on drinking water quality would be usable, mainly because of limited coverage and relevance or lack of compatibility with health data sets. Even the best data sets on drinking water covered only selected parts of the population and only a small number of years. Other types of environmental data are likely to involve similar problems.
The situation with the health data is somewhat better. Such data can typically be aggregated at various levels, from relatively large units such as counties or municipalities to much smaller units such as census tracts. Hence there may be some flexibility in the choice between large units with stable risk estimates but heterogeneity of exposure, or small units with less frequent events and relatively homogeneous exposure.
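The stability side of this trade-off can be made concrete with a rough calculation. The sketch below assumes that event counts follow a Poisson distribution, so that the relative standard error of an observed rate is approximately one over the square root of the expected number of events; this distributional assumption, the underlying rate, and the population sizes are all hypothetical and are not taken from the data sets reviewed here.

```python
import math


def expected_events(rate_per_100k, population):
    """Expected number of events for a given rate and population size."""
    return rate_per_100k * population / 100_000


def relative_se(expected):
    """Approximate relative standard error of a rate under a Poisson
    assumption: sqrt(expected) / expected = 1 / sqrt(expected)."""
    return 1.0 / math.sqrt(expected)


# Hypothetical rate of 50 events per 100,000 in units of different sizes.
rate = 50
for label, population in [("county", 200_000), ("census tract", 4_000)]:
    n = expected_events(rate, population)
    print(f"{label}: expected events = {n:.1f}, relative SE = {relative_se(n):.2f}")
```

Under these assumed figures the smaller unit expects only a handful of events, so its observed rate would fluctuate widely from year to year even if the underlying risk were constant.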
Many variables relevant to the study of environmental health are readily available from sources such as the census. These might include ecologic descriptors such as socioeconomic status, urbanization, industrialization, and lifestyle and demographic variables such as fertility. However, these sources will typically not contain direct information on levels of contaminants in air and water, which are the types of exposure of greatest concern to environmental health scientists. Also, if good data were available, these would be precisely the effects that might be most appropriate for study with the ecologic method.
There is therefore a certain irony in the fact that the environmental exposures of interest which might best be studied ecologically are precisely those with the most limited data available in ecologic format. Improvement of environmental data collection methods to render them usable with human health data is perhaps the most pressing need and, at the same time, the most significant challenge. Ecologic studies of human health using existing data still hold some promise, but great caution is required in their execution and interpretation.
An earlier version of this paper was written under contract to the Committee on the Assessment of the Human Health Effects of Great Lakes Water Quality, International Joint Commission.