Methodologic frontiers in environmental epidemiology.

Environmental epidemiology comprises the epidemiologic study of those environmental factors that are outside the immediate control of the individual. Exposures of interest to environmental epidemiologists include air pollution, water pollution, and occupational exposure to physical and chemical agents, as well as psychosocial elements of environmental concern. The main methodologic problem in environmental epidemiology is exposure assessment, a problem that extends through all of epidemiologic research but looms as a towering obstacle in environmental epidemiology. One of the most promising developments in improving exposure assessment in environmental epidemiology is the search for exposure biomarkers, which could serve as built-in dosimeters that reflect the biologic footprint left behind by environmental exposures. Beyond exposure assessment, epidemiologists studying environmental exposures face the difficulty of studying small effects that may be distorted by confounding that eludes easy control. This challenge may prompt reliance on new study designs, such as two-stage designs in which exposure and disease information are collected in the first stage, and covariate information is collected on a subset of subjects in stage two. While the analytic methods already available for environmental epidemiology are powerful, analytic methods for ecologic studies need further development. This workshop outlines the range of methodologic issues that environmental epidemiologists must address so that their work meets the goals set by scientists and society at large.


Introduction
The environment, for most epidemiologists, comprises everything that is not genetic; so diet, smoking, and even exercise are considered environmental factors. Environmental epidemiology, however, has a more restricted connotation, referring to those environmental factors that are outside the immediate control of the individual. Smoking, therefore, would not be a factor included in environmental epidemiology, but the effects of tobacco smoke put into the air by others would be. Other exposures of interest to environmental epidemiologists include air pollution, water pollution, and occupational exposure to physical and chemical agents.
The spread of infectious agents through water, foods, or other environmental media could be seen as part of environmental epidemiology, but this area has long been claimed by infectious disease epidemiologists and does not suffer from most of the methodologic problems facing environmental epidemiologists. Although there are areas of overlap between infectious disease and environmental epidemiology, such as the suspension of exotic pathogens in indoor air or the possibility of environmentally spread oncogenic viruses, environmental epidemiologists usually do not concern themselves with infectious agents.
Environmental epidemiology comprises the study of more than just physical and chemical agents, however. Rising health consciousness is a social phenomenon, and concern about the health of the environment itself, as well as its effect on us and other species, is a growing preoccupation among scientists and nonscientists alike. Psychosocial factors are increasingly important concerns in environmental epidemiology research: Studies of populations living near electric power lines or nuclear generating power plants can be neither conducted nor interpreted properly without a clear assessment of the role of the public's perception of environmental health risks. In some instances the psychologic reaction of the public may be a major component of the effect of an environmental influence; in others, the ability to conduct a study at all, and the way in which it should be conducted, are influenced profoundly by publicity and public response.
Why make a distinction between environmental exposures that can be controlled by the individual and those that are beyond his or her control? Those exposures that are beyond individual control are typically exposures that affect many individuals simultaneously and for which individual exposure may be difficult to measure. These conditions frequently lend themselves to what some epidemiologists call ecologic research, using aggregate rather than individual data. Those environmental studies that do have individual people as subjects often have distinctive methodologic features that derive from the nature of the exposure. It is as much these methodologic distinctions as the subject matter itself that warrant the use of a special term for environmental epidemiology. Furthermore, the most important research gaps in the area of environmental epidemiology may be methodologic problems.

Exposure Assessment
Atop the list of methodologic problems is the problem of exposure assessment, a problem that extends through all of epidemiologic research but is a towering obstacle in environmental epidemiology. Routine practice has been to use crude measures that are only tenuously related to the actual exposure experienced. Working in a plant, for example, has often been used as an indicator for occupational exposures that are varied in kind and intensity within the plant. Community-based sampling of air or water has been used commonly to approximate individual exposure in many studies. Indeed, in ecologic research, data may be aggregated over geographic units as large as continents. Any externally derived information used as a proxy for individual exposure introduces measurement error that will affect the analysis. For exposures such as electromagnetic fields, which vary strikingly over short distances, measuring an individual's exposure by proxy measures is bound to result in substantial errors. For many exposures, a crucial part of the assessment includes the personal history. Such information is formidably difficult to obtain after the fact and can be obtained prospectively only with gargantuan effort. These problems in exposure assessment are compounded by the problems of low prevalence of putative high-risk exposures to the environmental agents and the low frequency of many of the outcomes of interest.
The long induction time likely to intervene between the presumed causal action of many environmental agents and the resulting appearance of disease aggravates the difficulties of exposure assessment. With a long time interval between exposure and disease, the investigator must either conduct a long, expensive prospective study or rely on retrospective measurement of the exposure information. Retrospective measurement is feasible for certain types of exposure, such as occupational exposures for which adequate employment records and industrial hygiene evaluations exist, or smoking, for which the memory of the smoker usually contains a reasonable enough record of the exposure. For some exposures, such as ionizing radiation, medical records and employment information may give partial information on the amount and timing of exposure; but assessing the amount of exposure may involve considerable guesswork, making retrospective evaluations less informative. For certain unrecorded and imperceptible exposures, such as electromagnetic fields, retrospective evaluation can at best be highly indirect.
Better methods of assessing environmental exposures are a high priority for the future. One hope has been to find exposure biomarkers, which ideally might serve as built-in biologic dosimeters, to measure the biologic record of past exposure on the individual. An attraction of biomarkers is the theoretical concept that if a chronic exposure can affect disease risk, there must be a biological footprint somewhere in the organism that intermediates the causal action. The use of biomarkers can overcome measurement error that stems from an individual's incorrect recall or lack of awareness of an exposure. The use of biomarkers also can bypass exposure assessment errors arising from variation in individual absorption or metabolism of exposures by focusing on a later step in the causal chain. Chromosomal abnormalities among long-lived lymphocytes have been used in this way to assess the health effects of radiation in the studies of the Hiroshima and Nagasaki cohorts. Another example of this use of biomarkers is the possibility of using measurement of DNA adducts to assess the effects of tobacco smoke in target tissues, a method that may prove to be much more accurate than asking subjects about their smoking habits.
An additional approach to refining exposure measurement is to use multiple measures of exposure routinely until we find exposure measures that reflect the exposure as completely as the research problem demands. Replicate measures of exposure also can curb measurement uncertainty. The effect of residual uncertainty can be quantified by sensitivity analyses that explore the implications of errors in exposure assessment.
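The benefit of replicate measures can be illustrated with a minimal sketch. It assumes the classical measurement error model (independent, additive error with constant variance), which is an assumption introduced here for illustration and is not specified in the text. Under that model, a regression slope estimated from the mean of k replicates is attenuated by the factor var_true / (var_true + var_error / k). The function and parameter names are hypothetical.

```python
def attenuation_factor(var_true: float, var_error: float, n_replicates: int = 1) -> float:
    """Expected attenuation of a regression slope under classical measurement error.

    Assumes (illustrative assumption) additive, independent error with constant
    variance, and that the exposure used in the analysis is the mean of
    n_replicates measurements per subject.
    """
    if var_true <= 0 or var_error < 0 or n_replicates < 1:
        raise ValueError("need var_true > 0, var_error >= 0, n_replicates >= 1")
    # Averaging n replicates divides the error variance by n.
    return var_true / (var_true + var_error / n_replicates)


if __name__ == "__main__":
    # Sensitivity analysis: how the expected bias changes with the number of replicates.
    for k in (1, 2, 4, 8):
        print(k, round(attenuation_factor(1.0, 1.0, k), 3))
```

With equal true and error variances, a single measurement halves the expected slope (factor 0.5), whereas the mean of four replicates recovers 80% of it (factor 0.8).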
What are the priority areas for improving methods of exposure assessment in environmental epidemiology? The following areas are those that should command the highest attention [These recommendations are discussed in greater detail in the paper by Hatch and Thomas (1)]: a) development of dosimetric models using a combination of direct measurement, biological markers, and questionnaire data, and the development of new strategies for historical dose reconstruction of environmental exposures; b) development of sensitivity analysis and other approaches to estimating dose uncertainty, including methodology for validation substudies; and c) development of methods to measure covariates more accurately.

Study Design
The range of epidemiologic study designs comprises true experiments with randomized assignment of study subjects to intervention groups, as well as nonexperimental studies in which randomization cannot be relied upon to equalize the distorting effect of confounding factors related to both the exposure and the outcome. Randomized assignment of individuals into groups with different environmental exposures generally is impractical, if not unethical; community intervention trials for environmental exposures have been conducted, although seldom (if ever) with random assignment. Furthermore, the benefits of randomization are heavily diluted when the number of randomly assigned units is small, as when communities rather than individuals are randomized. Thus, environmental epidemiology consists nearly exclusively of nonexperimental epidemiology. Ideally, such studies use individuals as the unit of measurement; but often environmental data are available only for groups of individuals, and investigators turn to so-called ecologic studies to learn what they can.
The most basic epidemiologic study design, which includes experimental studies, is the cohort study. In a cohort study, a population is characterized as to its exposure to an agent of interest, and this population is then followed to measure the rate of occurrence of one or more types of disease events within variously defined exposure cohorts. Cohort studies may be entirely prospective, in which case they are expensive and usually last a long time, or they may be partially or completely retrospective, in which case they are shorter and cheaper but typically have to rely on data collected before the research plan was concocted. Case-control studies, although they have been described as backward cohort studies involving a comparison of exposure distributions in cases and controls, may be better conceptualized as streamlined cohort studies: They involve sampling the base population, or some facsimile of it, to learn the distribution of exposure within it, enabling the investigator to estimate the relative rate of disease occurrence within each exposure cohort. The sampling is usually a big cost-saver. It comes at a reasonable price: only relative rates of disease occurrence are calculable, unless the sampling fractions are known. If the sampling fractions are known, the case-control study can provide estimates of the absolute disease rates. Like cohort studies, case-control studies can be retrospective or prospective.
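The role of the sampling fractions can be shown with a short sketch. It assumes, for illustration only, that controls are sampled from the person-time of the base population at a known rate (controls per unit of person-time) and that a known fraction of all cases is included; the function names and the example counts are hypothetical.

```python
from typing import Tuple


def relative_rate(cases_exp: float, cases_unexp: float,
                  controls_exp: float, controls_unexp: float) -> float:
    """Rate ratio estimated from case-control counts.

    The control counts stand in for the exposed and unexposed person-time of
    the base population, so only their ratio is needed.
    """
    return (cases_exp / controls_exp) / (cases_unexp / controls_unexp)


def absolute_rates(cases_exp: float, cases_unexp: float,
                   controls_exp: float, controls_unexp: float,
                   control_fraction: float,
                   case_fraction: float = 1.0) -> Tuple[float, float]:
    """Absolute rates (exposed, unexposed) when sampling fractions are known.

    control_fraction: controls sampled per unit of person-time (assumed known).
    case_fraction: proportion of all cases that entered the study.
    """
    if control_fraction <= 0 or not 0 < case_fraction <= 1:
        raise ValueError("sampling fractions must be positive (case fraction at most 1)")
    person_time_exp = controls_exp / control_fraction
    person_time_unexp = controls_unexp / control_fraction
    rate_exp = (cases_exp / case_fraction) / person_time_exp
    rate_unexp = (cases_unexp / case_fraction) / person_time_unexp
    return rate_exp, rate_unexp


if __name__ == "__main__":
    print(relative_rate(40, 60, 100, 300))
    print(absolute_rates(40, 60, 100, 300, control_fraction=0.001))
```

In this hypothetical example the relative rate is 2.0 whether or not the sampling fractions are known; the absolute rates (about 4 and 2 per 10,000 person-years) become available only once the control sampling fraction is supplied.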
Ecologic studies differ from the basic cohort study in that individual exposure levels are not measured, or such exposure information, if it is measured, is not linked to disease occurrence at the individual level. The unit of statistical analysis is typically a geographic area, such as census tract, county, or state. For each group or region, we can estimate the distribution of individual exposures or at least the average exposure level, and we can estimate overall disease rates, but we do not have measurements of both exposure level and disease status that would allow us to estimate directly the joint distribution of the two variables. Therefore, it is impossible to get direct estimates of the rate of disease in exposed and unexposed populations from ecologic data; indirect estimates must be obtained. The indirect estimation of effects in ecologic studies and fundamental methodologic concerns, such as the control of confounding, are replete with methodologic complications that make ecologic studies a highly specialized methodologic area in epidemiology. The need to conduct such studies emanates primarily from the basic difficulty of obtaining high-quality data on environmental exposures and covariates.
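One common form of indirect estimation can be sketched as a linear ecologic regression. The sketch rests on strong assumptions that are introduced here and are not stated in the text: the rates among the exposed and the unexposed are the same in every group, and there is no confounding or effect modification across groups. Under those assumptions each group's rate equals rate_unexposed + (rate_exposed - rate_unexposed) × (proportion exposed), so a least-squares line through the group data yields both rates. The function name and the data are hypothetical.

```python
from typing import Sequence, Tuple


def ecologic_rate_ratio(prop_exposed: Sequence[float],
                        rates: Sequence[float]) -> Tuple[float, float, float]:
    """Indirect estimate of (rate_unexposed, rate_exposed, rate_ratio).

    Fits rate = intercept + slope * proportion_exposed by ordinary least
    squares. The intercept is the predicted rate at 0% exposed and
    intercept + slope is the predicted rate at 100% exposed.
    """
    n = len(prop_exposed)
    if n != len(rates) or n < 2:
        raise ValueError("need at least two groups with matching lengths")
    mean_p = sum(prop_exposed) / n
    mean_r = sum(rates) / n
    sxx = sum((p - mean_p) ** 2 for p in prop_exposed)
    if sxx == 0:
        raise ValueError("proportion exposed must vary across groups")
    sxy = sum((p - mean_p) * (r - mean_r) for p, r in zip(prop_exposed, rates))
    slope = sxy / sxx
    rate_unexposed = mean_r - slope * mean_p
    rate_exposed = rate_unexposed + slope
    if rate_unexposed <= 0:
        raise ValueError("fitted unexposed rate is not positive; linear model is untenable")
    return rate_unexposed, rate_exposed, rate_exposed / rate_unexposed


if __name__ == "__main__":
    # Three hypothetical regions: proportion exposed and overall disease rate.
    print(ecologic_rate_ratio([0.1, 0.3, 0.5], [0.0011, 0.0013, 0.0015]))
```

The estimate extrapolates to groups that are entirely unexposed or entirely exposed, which is one reason it is so sensitive to departures from the assumed model.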
The challenge posed by environmental epidemiology cannot be answered simply by conducting larger and more expensive studies; the special problems inherent in this area of research may call for new types of study designs intended to address these problems. One example is the idea of conducting a two-stage study in which exposure and disease information are collected in the first stage, and covariate information is collected on a subset of subjects in the second stage. This study design should be useful when covariate information is expensive relative to information on exposure and disease. The results from stage one estimate a crude effect, and the information in stage two is used to estimate the effect adjusted for covariates. Covariate information is collected most efficiently in case-control studies, and therefore, we can look forward to seeing more two-stage studies in which the second stage of the investigation is a case-control study.
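The logic of a two-stage analysis can be illustrated with a deliberately simplified sketch; it is not the estimator used in any particular two-stage method. It assumes a single binary covariate, random subsampling within each disease-by-exposure cell of stage one, and it ignores the additional variance that subsampling introduces. Stage-two covariate proportions are scaled up to the stage-one cell totals, and a Mantel-Haenszel odds ratio is then computed across covariate strata. All names and counts are hypothetical.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]          # (disease, exposure), each coded 0 or 1
Stratum = Tuple[int, int, int]  # (disease, exposure, covariate)


def crude_odds_ratio(stage1: Dict[Cell, float]) -> float:
    """Crude odds ratio from the stage-one disease-by-exposure counts."""
    return (stage1[(1, 1)] * stage1[(0, 0)]) / (stage1[(1, 0)] * stage1[(0, 1)])


def expand_stage_two(stage1: Dict[Cell, float],
                     stage2: Dict[Cell, Tuple[int, int]]) -> Dict[Stratum, float]:
    """Scale stage-two covariate splits up to the stage-one cell totals.

    stage2[cell] = (subsample count with covariate = 1, count with covariate = 0).
    """
    expanded: Dict[Stratum, float] = {}
    for (d, e), total in stage1.items():
        with_cov, without_cov = stage2[(d, e)]
        n = with_cov + without_cov
        if n == 0:
            raise ValueError(f"no stage-two subjects in cell {(d, e)}")
        expanded[(d, e, 1)] = total * with_cov / n
        expanded[(d, e, 0)] = total * without_cov / n
    return expanded


def mantel_haenszel_or(counts: Dict[Stratum, float]) -> float:
    """Mantel-Haenszel odds ratio across the two covariate strata."""
    numerator = denominator = 0.0
    for c in (0, 1):
        exp_cases, unexp_cases = counts[(1, 1, c)], counts[(1, 0, c)]
        exp_noncases, unexp_noncases = counts[(0, 1, c)], counts[(0, 0, c)]
        total = exp_cases + unexp_cases + exp_noncases + unexp_noncases
        if total == 0:
            continue
        numerator += exp_cases * unexp_noncases / total
        denominator += unexp_cases * exp_noncases / total
    return numerator / denominator


if __name__ == "__main__":
    stage1 = {(1, 1): 90, (1, 0): 60, (0, 1): 60, (0, 0): 90}
    stage2 = {(1, 1): (8, 1), (1, 0): (2, 4), (0, 1): (4, 2), (0, 0): (1, 8)}
    print(crude_odds_ratio(stage1))
    print(mantel_haenszel_or(expand_stage_two(stage1, stage2)))
```

In this constructed example the crude odds ratio from stage one is 2.25, while the covariate-adjusted estimate is approximately 1.0, showing how a small second-stage subsample can reveal that a crude association is attributable to confounding.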
Another type of study that merits attention is one that focuses on intermediate steps in the causal path to disease. Such studies could give information about the relation between acute and chronic effects and provide some results much earlier than more traditional studies. Surveillance systems may be worthwhile so that selection and reporting biases can be avoided. As mentioned above, clearer understanding of the use and conduct of validation substudies is another important priority in study design. Theoretical work is needed on the validity of estimates from ecologic analyses to understand the relative importance of various assumptions and how departures from these assumptions affect the estimates. Understanding of the interaction of genes and environment will have to grow rapidly to keep pace with the information explosion about the genome. All these areas are fertile ground for more theoretical work on epidemiologic study designs.

Data Analysis
For studies on individuals with information on important confounders and little measurement error for the confounders, exposure, and outcome variables, the analytic methodology to assess exposure effects while controlling for confounding is reasonably well developed. Methods exist to control for confounding and to assess the exposure effect even when the exposures and confounding factors have complicated variations over time. Where analytic problems exist in environmental epidemiology research, they are usually the result of lack of information on confounding variables or measurement errors in confounders, exposure, or outcome variables. Such problems are the major sources of bias in environmental epidemiology research, although bias also arises from the same sources that affect all nonexperimental epidemiology, such as selection biases and information biases.
Biases can arise in any study from the use of inappropriate mathematical models in an analysis; but this is a particularly important problem in ecologic studies, because they rely on aggregate data. The often-assumed linear relation between exposure and disease risk may not correspond to the biologic relation between exposure and disease. Ecologic studies also suffer from biases that distort the estimation of exposure effects because of heterogeneity of exposure status within population aggregates.
Measurement error usually has been taken into account by assuming a value for misclassification probabilities and recalculating effect estimates based on the assumed value, thus allowing a type of sensitivity analysis. Usually the misclassification probabilities are known from estimates based on limited data. A methodologic priority for data analysis is the development of methods to take account of uncertainty in the assumed values for misclassification probabilities, thus progressing from a sensitivity analysis to a more direct, corrected estimation of exposure effects that incorporates measurement error and the attached uncertainty.
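The recalculation described above can be sketched for the simplest case: a binary exposure in a case-control study, with misclassification assumed to be nondifferential (the same sensitivity and specificity in cases and controls). That assumption, the function names, and the example counts are illustrative. The correction inverts the relation observed_exposed = Se × true_exposed + (1 - Sp) × (total - true_exposed).

```python
def correct_exposed_count(observed_exposed: float, total: float,
                          sensitivity: float, specificity: float) -> float:
    """Back-calculate the true number exposed from the observed number.

    Solves observed = Se * true + (1 - Sp) * (total - true) for 'true'.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("requires sensitivity + specificity > 1")
    corrected = (observed_exposed - (1.0 - specificity) * total) / denom
    if not 0 < corrected < total:
        raise ValueError("assumed probabilities are incompatible with the observed counts")
    return corrected


def corrected_odds_ratio(case_exposed: float, case_total: float,
                         control_exposed: float, control_total: float,
                         sensitivity: float, specificity: float) -> float:
    """Odds ratio after correcting cases and controls with the same Se and Sp."""
    a = correct_exposed_count(case_exposed, case_total, sensitivity, specificity)
    c = correct_exposed_count(control_exposed, control_total, sensitivity, specificity)
    b = case_total - a
    d = control_total - c
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Sensitivity analysis: repeat the correction over a range of assumed values.
    for se in (0.8, 0.9, 1.0):
        for sp in (0.9, 1.0):
            print(se, sp, round(corrected_odds_ratio(45, 100, 24, 100, se, sp), 2))
```

With 45 of 100 cases and 24 of 100 controls classified as exposed, the observed odds ratio is about 2.6; assuming a sensitivity of 0.8 and a specificity of 0.9, the corrected odds ratio is about 4.0. The sketch treats the assumed probabilities as fixed, so it remains a sensitivity analysis and does not carry their uncertainty into the estimate.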
Another important need is improved methods for the analysis of ecologic studies, especially with regard to controlling confounding. It would be useful to develop methods to control confounding in aggregate-data studies using information from surveys on individuals. Such approaches would call for corresponding innovation in data analysis.
Studies of multiple exposures face the formidable task of separating effects of interactions from variations in the induction periods and dose-response curves of different exposures. There is a need for analysis methods that simultaneously account for interactions, induction periods, and dose-response in a parsimonious fashion.
The difficulty and expense of epidemiologic research on environmental problems forces attention toward methods for aggregating results over a set of studies when appropriate. While many critics of meta-analysis rightly object to the pooling of inherently noncomparable work, no one argues that literature reviews are undesirable. It seems reasonable to review published work as objectively and quantitatively as possible. Meta-analysis should be thought of simply as a "quantitative literature review," as Greenland has called it (2). Meta-analyses should rely on the principle that the primary comparisons from which effect estimates are derived should be made within each study proper and then given appropriate statistical treatment, in terms of adjustment and weighting, to combine results across studies.
Better methods are needed for adjusting the individual study-specific results to reduce bias before combining with other results, especially to take account of errors in exposure assessment that differ across studies.
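The weighting step can be sketched with inverse-variance (fixed-effect) pooling of study-specific log rate ratios. This is one standard weighting scheme chosen here for illustration; the text does not prescribe a particular one. The sketch assumes that each study contributes an already adjusted within-study estimate and that the studies estimate a common effect. The function name and numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pool_fixed_effect(log_estimates: Sequence[float],
                      standard_errors: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted average of study-specific log effect estimates.

    Returns (pooled log estimate, standard error of the pooled estimate).
    """
    if len(log_estimates) != len(standard_errors) or not log_estimates:
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_estimates)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)


if __name__ == "__main__":
    pooled, se = pool_fixed_effect([0.2, 0.4], [0.1, 0.2])
    lower, upper = pooled - 1.96 * se, pooled + 1.96 * se
    print(math.exp(pooled), math.exp(lower), math.exp(upper))
```

The more precise study dominates: with log rate ratios of 0.2 (standard error 0.1) and 0.4 (standard error 0.2), the pooled log rate ratio is about 0.24. Any bias adjustment of the individual results, such as correction for exposure measurement error, would be applied to each study's estimate before this step.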

Risk Assessment
Some people believe that we now live in a chemical soup that implacably erodes our health, while others believe that we have engineered an environment that protects us from most of the important health risks that otherwise would have been our fate. In either case, however, it is clear that assessing the risks of our technological world is becoming more complex.
The complexity is compounded by the intricacy of the public policy issues relating to environmental epidemiology, involving economic, political, and social concerns that must be taken into account along with the health consequences of environmental exposure. Perhaps the broadest and most important methodologic problem in environmental epidemiology is the problem of how environmental epidemiology should be used in relation to other sources of information to address these public policy issues. How many studies, and of what type, are needed before policy should be promulgated? What are the implications of publication bias (resulting from a failure to publish studies that do not show a relation between environmental exposures and health problems)? How should animal studies be weighed in relation to epidemiologic studies? What role should the public take in the conduct of research and risk assessment? The answers to these questions are important to us as citizens, but they are usually seen to be outside the scope of our work as scientists. This set of questions should be another priority for methodologic research.