Design and analysis of multilevel analytic studies with applications to a study of air pollution.

We discuss a hybrid epidemiologic design that aims to combine two approaches to studying exposure-disease associations. The analytic approach is based on comparisons between individuals, e.g., case-control and cohort studies, and the ecologic approach is based on comparisons between groups. The analytic approach generally provides a stronger basis for inference, in part because of freedom from between-group confounding and better quality data, but the ecologic approach is less susceptible to attenuation bias from measurement error and may provide greater variability in exposure. The design we propose entails selection of a number of groups and enrollment of individuals within each group. Exposures, outcomes, confounders, and modifiers would be assessed on each individual; but additional exposure data might be available on the groups. The analysis would then combine the individual-level and the group-level comparisons, with appropriate adjustments for exposure measurement errors, and would test for compatibility between the two levels of analysis, e.g., to determine whether the associations at the individual level can account for the differences in disease rates between groups. Trade-offs between numbers of groups, numbers of individuals, and the extent of the individual and group measurement protocols are discussed in terms of design efficiency. These issues are illustrated in the context of an ongoing study of the health effects of air pollution in southern California, in which 12 communities with different levels and types of pollution have been selected and 3500 school children are being enrolled in a ten-year cohort study.


Introduction
Epidemiologists recognize two basic strategies for looking at the association between an exposure and a disease: ecologic studies, in which disease rates in groups of individuals are related to the average exposure rates in these groups, and analytic studies, in which individuals' disease outcomes are related to their own exposure values. Cohort studies and case-control studies are examples of the latter type. The epidemiologic literature is full of examples of discrepancies between the conclusions of the two types of studies. In a classic example, Durkheim found suicide rates in provinces of western Europe to be highly correlated with the proportion of Protestants. Regression analyses of these rates produced an estimate of the rate ratio for Protestants relative to Catholics of 7.5, compared with a value of 2 estimated on an individual basis (1). Similarly, numerous associations between cancer rates and mean consumption of various dietary factors have been found in ecologic correlation studies, but establishing such associations at an individual level has proven more elusive (2).
The resolution of such paradoxes usually turns on three issues: between-group confounding, measurement error, and restricted variability. Between-group confounding refers to a characteristic of groups that is not accounted for in the model but is the real risk factor. In the suicide example, such a factor might be the alienation felt by Catholics in predominantly Protestant provinces. This is the essential explanation of the "ecologic fallacy," in which spurious ecologic associations may be caused by a tendency for the individuals in the higher exposure groups who get the disease not to have been exposed themselves but rather to have gotten the disease as a result of some other group characteristic. Exposure measurement error has different effects on the two types of studies, generally biasing associations at the individual level toward the null, but not at the aggregate level. Finally, studies conducted within a single group may have a restricted range of variation in exposure, and hence limited power. Thus, in the diet example the positive associations at the ecologic level might be explained by some confounding variable, such as race, that is not accounted for in the analysis, whereas the lack of association at the individual level might be due to dilution of a real effect by measurement error or by restricted variability in diet within racial groups.
Each of these designs has advantages and disadvantages. The main advantage of the ecologic design is cost, but its relative freedom from measurement error bias and greater variation in exposure between groups are other advantages. On the other hand, it typically suffers from between-group confounding (partly because groups will be more heterogeneous with respect to confounders than members of groups and partly because data on confounders are unavailable) and the exposure data are usually of poor quality (e.g., food disappearance rates rather than mean intake rates). Analytic studies are more readily controlled for confounding factors and have better quality data, but may suffer from the effects of measurement error and restricted variability.
To overcome these problems, we consider a hybrid design involving aspects of both approaches, which we shall call the "multilevel analytic design." Key to this design is an analysis that will exploit both levels of comparison. Exposure and confounder data will be assembled on individuals, to provide the best quality possible. Individual-level analyses within groups will be adjusted for measurement error. The resulting exposure-response relations then can be tested for compatibility with the between-group differences in rates; and if compatible, the two analyses can be pooled for greater power. In particular, this allows one to assess how much of the differences in disease rates between groups can be explained by differences in the distribution of risk factors.
In the next section, we provide some details about the basic design and its analysis. In the following section, we describe how the effects of measurement error may be incorporated. We then address the issue of design optimization, and provide an example with a simulation study. Finally, we describe an application to the design of the University of Southern California (USC) study of the health effects of air pollution.

Multilevel Analytic Design and Its Analysis
The new design begins with a selection of a number of groups g = 1,..., G, which might be defined by geographic areas (as in a study of air pollution), ethnicity (as in a study of diet), or any other factor for which group identifying data are readily available. Within each group, individuals i = 1,..., Ig are selected. (For notational simplicity, we set Ig = I.) Data on outcomes ygi, exposures xgi, and confounders vgi are collected on each individual; in addition, certain characteristics of the group Xg may also be collected. For example, in an air pollution study, individual exposure information might comprise personal exposure estimates (e.g., ozone badges), microenvironment sampling (e.g., in homes, schools, cars, outdoors), or individual exposure modifying factors such as proportion of time spent outdoors or characteristics of the subjects' homes (air conditioning, presence of a smoker, heating and cooking sources, etc.). Group exposure characteristics might include estimates of the ambient levels from area monitoring. The specifics of the outcomes (continuous or binary, cross-sectional or longitudinal) and the sampling plan for individuals (survey, cohort, or case-control) will vary from study to study, but are not germane to the issues discussed here.
For conceptual and notational simplicity, we will assume that the outcome, exposure, and confounder are all univariate and continuous, and that the individuals in each group are chosen by simple random sampling. We also assume that the quantities of interest are linearly related, that is,

ygi = ag + βxgi + γvgi + εgi [1]

where ag is the baseline outcome for group g and the εgi are independent random variables with E(εgi) = 0, Var(εgi) = σ². Interest centers on the estimation of β, the exposure effect. The baseline effects ag may be considered fixed or random. Considering them random may be appropriate when the groups on which data are collected are randomly chosen from a larger population of groups. The true exposures xgi and the confounders vgi may also be considered either fixed or random. If the groups are randomly chosen or the subjects are randomly chosen within groups, it may be appropriate to consider them random. In what follows we will consider ag, xgi, and vgi to be random, and we make the following assumptions: First, the random variables a1,..., aG are independent and identically distributed (e.g., the groups are selected by simple random sampling). Second, the group baseline effects ag are independent of both xgi and vgi.
In general, the true exposures xgi will be unknown, and will be estimated by measured values, as discussed in the following section. For the remainder of this section, we will ignore the effect of measurement error, effectively assuming the true exposures to be known. We will also assume that the true values of the confounder vgi are known, although measurement error in vgi can bias the estimator of β (3). Equation 1 can be used to estimate β and is appropriate when the ag are considered fixed. When the ag are independent random variables with E(ag) = α and var(ag) = τ², an estimator with smaller variance is obtained using the equation

ygi = α + βxgi + γvgi + ηgi [2]

The error ηgi is equal to ag − α + εgi. The covariance matrix of η can be described as follows: Let ρ = τ²/(σ² + τ²). Define Σ = (1 − ρ)I + ρ11ᵀ, where I is the identity matrix and 1 is an I-dimensional column of 1s. Define ΣBIG to be the GI × GI block-diagonal matrix consisting of G identical blocks of the matrix Σ. Then the covariance matrix of η is equal to (σ² + τ²)ΣBIG.
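As a quick numerical check on this covariance structure, the matrices Σ and ΣBIG can be constructed directly. The following is a minimal sketch in Python with NumPy; the function names are ours, not part of the paper:

```python
import numpy as np

def exchangeable_cov(I, sigma2, tau2):
    """Within-group covariance of the errors eta:
    (sigma2 + tau2) * [(1 - rho) I + rho 11^T], with rho = tau2 / (sigma2 + tau2)."""
    rho = tau2 / (sigma2 + tau2)
    return (sigma2 + tau2) * ((1.0 - rho) * np.eye(I) + rho * np.ones((I, I)))

def block_cov(G, I, sigma2, tau2):
    """GI x GI block-diagonal covariance: G identical copies of the within-group block."""
    return np.kron(np.eye(G), exchangeable_cov(I, sigma2, tau2))
```

As expected from the definition of η, the diagonal entries equal σ² + τ², off-diagonal entries within a group equal τ² (the shared baseline variance), and entries across groups are zero.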
If ρ is known, the parameters α, β, and γ can be estimated by weighted least squares. If σ² and τ² are unknown, the parameters can be estimated by a two-stage procedure. In the first stage, only within-groups differences are used. This is accomplished by using Equation 1 to estimate the parameters a1,..., aG, β, γ by ordinary least squares. Denote by β̂₁ the estimate of β obtained from this first-stage regression, and by σ̂² the usual mean square residual estimate of the error variance. The second-stage regression involves only the between-groups differences. The regression equation is obtained from Equation 2 by averaging over i:

ȳg = α + βx̄g + γv̄g + η̄g [3]

The variables η̄g are independent with mean 0 and variance τ² + σ²/I. Denote by β̂₂ the ordinary least squares estimate of β from Equation 3. The two estimates may then be pooled by inverse-variance weighting:

β̂pooled = [β̂₁/Var(β̂₁) + β̂₂/Var(β̂₂)] / [1/Var(β̂₁) + 1/Var(β̂₂)] [4]

The relationship between weighted least squares and the two-stage procedure is given by the following:

Theorem: Let σ̂², τ̂² be the estimators of σ², τ² from the two-stage procedure. Then β̂pooled is the weighted least squares estimate of β when ρ = τ̂²/(σ̂² + τ̂²).

Corollary: If the errors ηgi are normally distributed, then the MLE of β satisfies Equation 4, with β̂MLE substituted for β̂pooled and Var(β̂₁) and Var(β̂₂) evaluated at the MLEs of σ² and τ².
Proofs of these claims are provided in the Appendix. The corollary suggests that Equation 4, if iterated, will converge to the MLE.
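To make the two-stage procedure concrete, the sketch below simulates data from the random-baseline model, computes the within-group estimate β̂₁, the ecologic estimate β̂₂, and the inverse-variance pooled combination. It is an illustration under assumed parameter values (our choices, not the study's), not the paper's analysis code:

```python
import numpy as np

rng = np.random.default_rng(0)
G, I = 20, 50                       # groups and subjects per group (hypothetical)
alpha, beta, tau, sigma = 1.0, 0.5, 0.8, 1.0

a = alpha + tau * rng.standard_normal(G)                        # random baselines a_g
x = rng.standard_normal((G, I)) + rng.standard_normal((G, 1))   # within + between variation
y = a[:, None] + beta * x + sigma * rng.standard_normal((G, I))

# Stage 1: within-group regression; group-mean centering absorbs the a_g
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
b1 = (xc * yc).sum() / (xc ** 2).sum()
s2 = ((yc - b1 * xc) ** 2).sum() / (G * (I - 1) - 1)   # within-group error variance
v1 = s2 / (xc ** 2).sum()

# Stage 2: ecologic regression of group means on group mean exposures
xm, ym = x.mean(axis=1), y.mean(axis=1)
xmc, ymc = xm - xm.mean(), ym - ym.mean()
b2 = (xmc * ymc).sum() / (xmc ** 2).sum()
t2 = ((ymc - b2 * xmc) ** 2).sum() / (G - 2)           # estimates tau^2 + sigma^2 / I
v2 = t2 / (xmc ** 2).sum()

# pool the two estimates by inverse-variance weighting
b_pooled = (b1 / v1 + b2 / v2) / (1 / v1 + 1 / v2)
```

With these parameter values both stages recover β ≈ 0.5, and the pooled estimate is dominated by the (more precise) within-group stage.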

Environmental Health Perspectives
Allowance for Exposure Measurement Error

In many circumstances, it may not be feasible to obtain complete and error-free data on all individuals, and hence some variables will only be available for some (randomly selected) subset of individuals. For example, in a dietary study, one might wish to validate the use of a food frequency questionnaire in the entire group by repeated 7-day records. In an air pollution study, it might be feasible to obtain personal monitoring or microenvironment sampling data on only a sample, but questionnaire data on individual modifying factors might be available on the entire group. Optimization of the design typically would entail trade-offs between the number of groups and the number of individuals in the main study and in validation substudies, and the extent of the measurement protocols, subject to constraints on the total costs. These design issues will be discussed further below. In this section, we will focus on the effect of exposure measurement error. To simplify matters, we will ignore confounding.
We make a distinction in our analysis between two types of measurement error. The first type, known as the "Berkson" error model (4), applies when individuals are assigned their group average exposures. The second type, known as the "classical" error model, applies when the assigned exposure is a random variable whose expected value is the true exposure.
Let xgi denote the unobservable true exposure for individual i in group g, and let zgi denote the measured value (e.g., from personal monitoring). The classical error model assumes that the measured values are randomly distributed around the true value, with the property that E(zgi | xgi) = xgi.
As is well known [reviewed recently by Thomas et al. (5)], the classical error model produces a bias toward the null, essentially because the measured exposures are overdispersed (Var(zgi) = Var(xgi) + Var(zgi | xgi) > Var(xgi)). Thus if Var(xgi) = φg² and Var(zgi | xgi) = ω², the regression on zgi produces a slope estimate that has expectation cg = φg²/(φg² + ω²) times the expectation of the slope of the regression on the xgi. This suggests a simple correction for measurement error if these variances are equal across groups and known: first fit the naive regression on zgi and then correct the estimated slope coefficient by dividing it by c (6). For more complex situations, for example if the variances differ between groups, a useful strategy is to replace the zgi by x̂gi = E(xgi | zgi) = cgzgi + (1 − cg)E(xgi) and then use these x̂gi as if they were the true exposures in the regression. The Berkson error model assumes instead that the true exposures xgi of individuals are distributed around their group estimates Xg with the property that E(xgi | Xg) = Xg. Thus, in an air pollution study with no personal monitoring, we might assume that individuals' exposures are randomly distributed around the ambient levels for their communities. A consequence of this assumption is that, at least for linear dose-response models, the regression on the measured values provides unbiased estimates of the true slope: if ygi = ag + βxgi + εgi, then E(ygi | Xg) = ag + βE(xgi | Xg) + E(εgi | Xg) = ag + βXg.
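The attenuation factor and its correction are easy to demonstrate numerically. The sketch below (our illustration, with arbitrary parameter values) simulates classical error with φ² = ω², so the naive slope is attenuated by c = 0.5, and shows that both dividing by c and regression calibration on E(x | z) recover the true slope:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 200_000, 1.0
phi2, omega2 = 1.0, 1.0                      # Var(x) and Var(z | x); here c = 0.5
x = rng.normal(0.0, np.sqrt(phi2), n)
z = x + rng.normal(0.0, np.sqrt(omega2), n)  # classical error: E(z | x) = x
y = beta * x + rng.standard_normal(n)

c = phi2 / (phi2 + omega2)                   # attenuation factor
b_naive = np.cov(z, y)[0, 1] / np.var(z)     # biased toward the null, near beta * c
b_corrected = b_naive / c                    # divide out the attenuation

# regression-calibration alternative: impute E(x | z) and regress on it
x_hat = c * z + (1 - c) * x.mean()
b_rc = np.cov(x_hat, y)[0, 1] / np.var(x_hat)
```

Because x̂ is an affine function of z, the regression-calibration slope equals the corrected naive slope here; the imputation form matters once cg varies across groups.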
Thus, Berkson error produces no bias toward the null for linear models. Typically, it would not be feasible to obtain true exposure data on any individuals. Rather, a surrogate variable w would be obtained on everybody and higher quality measurements z only on a sample. The measurements are assumed to be unbiased in the classical error sense and might be replicated T times. In this case, it will not be possible to use the z's directly in modeling y because they are available on too few subjects; but they could be used to build a model for the relationship between z and w, which could then be used for imputing x values in the first-stage regression. The surrogate variable w might be a simpler measure of x (such as a food frequency questionnaire) or it might be a personal modifier of a group exposure characteristic X (for example, percent time spent outdoors in an air pollution study could modify the ambient pollution level).
To give a concrete example of this imputation procedure, assume that at times t = 1, 2,..., T we have measurements of a group exposure characteristic Xgt for each group, and for a subset of individuals we have an exposure modifying variable wgit and an exposure measurement zgit. We assume that X and w are assessed without error, and z has a classical error structure in relation to true exposure x. We assume the following relationships:

xgit ~ N(Xgt + δ₀ + δ₁wgit, φ²) [5]

zgit ~ N(xgit, ω²) [6]

We assume that ω² is known from other studies or from another set of replicate measurements, but that δ₀, δ₁, and φ² are unknown. Combining Equations 5 and 6 yields

zgit ~ N(Xgt + δ₀ + δ₁wgit, φ² + ω²) [7]

from which we can obtain unbiased estimates of δ₀, δ₁, and φ² (since ω² is known). We then estimate xgi as x̂gi = Xg + δ̂₀ + δ̂₁wgi, which is an unbiased estimator of E(xgi | Xg, wgi) since δ̂₀ and δ̂₁ are unbiased.
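A small numerical sketch of this imputation step (Python with NumPy; all parameter values are hypothetical and variable names are ours) fits the regression implied by Equation 7 and recovers the unknowns:

```python
import numpy as np

rng = np.random.default_rng(2)
G, S = 12, 30                                  # groups; substudy subjects per group (T = 1)
d0, d1, phi2, omega2 = 0.2, 0.5, 0.3, 0.1      # true delta0, delta1, phi^2; omega^2 "known"
Xg = rng.normal(50.0, 10.0, G)                 # ambient levels, assumed error-free

w = rng.normal(0.0, 1.0, (G, S))               # personal modifying variable
x = Xg[:, None] + d0 + d1 * w + rng.normal(0.0, np.sqrt(phi2), (G, S))  # Equation 5
z = x + rng.normal(0.0, np.sqrt(omega2), (G, S))                        # Equation 6

# Equation 7: regress z - X_g on w to recover delta0, delta1, and phi^2
u = (z - Xg[:, None]).ravel()
A = np.column_stack([np.ones(G * S), w.ravel()])
coef, rss = np.linalg.lstsq(A, u, rcond=None)[:2]
d0_hat, d1_hat = coef
phi2_hat = rss[0] / (G * S - 2) - omega2       # residual variance minus known omega^2

# imputed exposure for any subject with known X_g and w_gi
x_hat = Xg[:, None] + d0_hat + d1_hat * w
```

The imputed x̂gi would then stand in for the true exposures in the first-stage regression, as described next.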
Allowing for measurement error complicates the two-stage procedure for estimating the parameter β as follows. We assume

ag ~ N(α, τ²) [8]

ygi ~ N(ag + βxgi, σ²) [9]

where α, β, τ², and σ² are unknown. Since the xgi are also unknown, we replace them with their estimates x̂gi when fitting the model. The first-stage model is thus

ygi = ag + βx̂gi + ε*gi [10]

where the ε*gi are independent normal random variables. The IRLS procedure is then conducted as follows. Set σ̂² and β̂ to arbitrary initial values, then fit the model (Equation 11) over values of ag, g = 1,..., G, β, δ₀, δ₁, and σ². Let âg, β̂₁, and σ̂² be the estimates of ag, β, and σ² obtained from the first-stage model (Equation 10). Let W be the diagonal matrix whose diagonal elements are σ̂² + β̂₁²V̂gi, where V̂gi is the estimated variance of x̂gi. Let X be the design matrix corresponding to the first-stage model (Equation 10). The usual estimate of the covariance matrix of (âg, β̂₁) is (XᵀW⁻¹X)⁻¹, which does not seem to be accurate when the exposure is poorly estimated (see simulation results below). An alternative that may prove more satisfactory is the "sandwich estimator" (7), which may be less sensitive to misspecification or variation in the weight matrix W. Other alternatives include computing the MLE of β using Equation 11 and using the estimated Fisher information, or estimating the variance with a bootstrap procedure. All the estimators just described provide estimates of the conditional variance of β̂₁ given Xg and wgi. They will serve as estimates of the unconditional variance with no additional bias if the conditional expectation of β̂₁ given Xg and wgi is constant across values of Xg and wgi.
The second-stage model is obtained from Equation 10 by averaging over i and by replacing ag with its mean α to obtain

ȳg = α + βx̂g + η̄g [12]

where x̂g denotes the group mean of the imputed exposures x̂gi. For many applications in environmental epidemiology, it is more appropriate to assume that true exposures are nonnegative and lognormally distributed and that measurement errors are lognormally distributed and multiplicative. Furthermore, the individual exposure modifiers wgi might also be assumed to act multiplicatively on the group means Xg. All this can be accomplished without any new theory simply by redefining x, X, and z to be the logarithms of their respective quantities. For chronic exposures, however, it may be more appropriate to relate the outcome y to the time-weighted average (arithmetic mean) or cumulative exposure than to the geometric mean or integral of the log exposures. This leads to additional complexities involving means and variances of lognormal distributions.

Second-stage Models
If the assumption that group baseline effects are independent of group mean exposure levels (assumption 2) is violated, the second-stage model (Equation 12) may produce a biased estimate of β. This is the case because the error term η̄g in Equation 12 is correlated with the group baseline effect ag. This result is precisely the ecologic fallacy, wherein effects due to variation in ag appear to be explained by variation in x̄g. It is therefore wise to test for bias in β̂₂ before pooling it with β̂₁. One way to do this is to test whether the difference β̂₁ − β̂₂ is significantly different from 0. In the absence of measurement error, the estimators β̂₁ and β̂₂ will be nearly uncorrelated even if ag and x̄g are dependent, so the variance of the difference can be estimated by adding the variance estimates from the first- and second-stage regressions. In the presence of measurement error, it is not yet clear how to estimate this variance accurately.
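Under the no-measurement-error approximation, this compatibility check is an ordinary two-sample z-test on the two slope estimates. A minimal helper (function name ours), assuming approximately normal and uncorrelated estimators:

```python
import math

def compare_levels(b1, v1, b2, v2):
    """Two-sided z-test of H0: beta1 = beta2, treating the two estimates as
    uncorrelated so that Var(b1 - b2) is estimated by v1 + v2."""
    z = (b1 - b2) / math.sqrt(v1 + v2)
    # two-sided p-value from the standard normal CDF
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p
```

A nonsignificant difference would support pooling the two estimates; a significant one would point to between-group confounding (or uncorrected measurement error).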

Design Optimization
At the design stage, the epidemiologist needs to consider the trade-off between the number of groups and the number of subjects per group, the selection of the specific groups to be included, the number of subjects in the main study versus the number in the validation sample, and the number and complexity of measurements to be made on each sample. These are important issues that have been given only limited attention in the context of analytic studies and none in the context of ecologic or hybrid designs. For analytic studies, Greenland (8) and Spiegelman and Gray (9) have considered the trade-offs between numbers of subjects in the main and validation studies and provided explicit formulas for determining the optimal design where it is planned to use measurement error adjustment methods in the analysis like those described above. Rosner and Willett (10) considered the trade-off between numbers of subjects and numbers of replicate measurements in a validation study.
For linear models with a continuous normally distributed outcome, ignoring confounding at the individual and group levels and ignoring measurement error, and assuming that exposure is assessed only at the group level, the power of the study can be computed as a function of four quantities: the number of groups G, the number of subjects I sampled in each group, the true R² between group mean exposure and group mean outcome, and the ratio VR = VW/VB, where VW and VB are the outcome variances within and between groups, respectively. Given these quantities, we compute R*² = IVBR²/(VW + IVB), the squared correlation between group mean exposure and the average outcome among the individuals sampled from the group. The quantity R*² is less than R², because the sample mean outcome rather than the true group mean outcome is used.
The power to detect a nonzero R² is calculated by using Fisher's transformation of R. Table 1 illustrates the results for a variety of choices of the model and design parameters. It is clear that the power is much more strongly influenced by the number of groups than by the number of subjects. For a logistic model for binary outcomes, the power also depends on the overall disease frequency, but the same basic result emerges: the power of the aggregate analysis depends much more strongly on the number of groups than on the number of subjects per group. Table 2 provides similar power calculations for testing a partial R² for the individual regression after removing group effects, again using Fisher's transformation. The power of these analyses depends only on the total number of individuals, and it is clear that with sample sizes in the thousands, there is adequate power for detecting very small correlations. However, it is important to note that these are the correlations with the measured exposures, which could be severely attenuated by measurement error.
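The group-level power calculation can be sketched as follows (our implementation; the Fisher-z variance 1/(G − 3) is the standard large-sample approximation, and the parameter values below are illustrative only):

```python
import math
from statistics import NormalDist

def ecologic_power(G, I, R2, VW, VB, alpha=0.05):
    """Approximate power to detect a nonzero group-level R^2 with G groups and
    I subjects per group, via Fisher's z transformation (variance 1/(G - 3))."""
    nd = NormalDist()
    R2_star = I * VB * R2 / (VW + I * VB)   # squared correlation attenuated by sampling
    z = math.atanh(math.sqrt(R2_star))      # Fisher transform of the correlation
    return nd.cdf(z * math.sqrt(G - 3) - nd.inv_cdf(1 - alpha / 2))

# groups matter much more than subjects: doubling G beats doubling I
p_more_groups = ecologic_power(G=24, I=100, R2=0.3, VW=1.0, VB=1.0)
p_more_subjects = ecologic_power(G=12, I=200, R2=0.3, VW=1.0, VB=1.0)
```

Because R*² is already close to R² once I is moderately large, further subjects add little, while each extra group shrinks the Fisher-z standard error; this reproduces the pattern described for Table 1.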
To provide further guidance for the design of the USC air pollution study, we undertook a limited simulation study. For this purpose, we varied the number of groups G, the number of subjects per group in the main study I, and the number of subjects per group in the exposure substudy S. Relationships among the variables were as given in Equations 5 to 9, with the ambient levels Xg, Xgt and individual modifiers wgi, wgit being normally distributed. For each choice of design parameters, 1000 replicate data sets were simulated and analyzed using the methods described above. We tabulated the bias and variance of the parameter estimates from the individual level regressions (with and without adjustment for measurement error), the ecologic regression, and the proposed pooled combination of the two regressions.
The design parameters were chosen to approximate those being considered for the USC air pollution study, and the model parameters were then adjusted to illustrate a hypothetical situation in which the two approaches to estimation would be roughly equally informative. Table 3 illustrates the effect of modifying the design parameters under the constraint that the total number of measurements G(I + ST) be fixed at 3000. (A more realistic simulation would allow for differences in costs between the different types of measurements.) Under the assumptions of the simulation, measurement error is minimized when one measurement is taken per individual in the substudy; therefore we set T = 1. Within the first block of Table 3, performance depends only on the total number of subjects. Comparing the two blocks illustrates the trade-off between the number of subjects in the main study and the validation substudy. The second-stage estimator is relatively insensitive to this parameter, while the first-stage estimator is improved by having a larger proportion in the validation study, although we do not have enough information to determine the optimal allocation. Perhaps more important, when too few subjects are assigned to the substudy, the nominal SE of β̂₁ is far too optimistic, since it fails to take into account the error in misspecifying the weights. Since β̂pooled is based on the nominal SEs, it is no longer the optimal linear combination of β̂₁ and β̂₂, and in some cases is less efficient than β̂₂.

Example: The USC Air Pollution Study
In January 1992, the California Air Resources Board (ARB) awarded a contract to the University of Southern California to initiate a 10-year cohort study of the health effects of air pollution in southern California. The study will enroll a cohort of about 3500 school children from 12 communities selected to represent the variety of types and levels of air pollution found in the basin. The primary focus of the study is on the effects of chronic exposure to 1-hr peak ozone (O3), but particulates (PM10), nitrogen dioxide (NO2), acids (H+), and other pollutants are also being measured. Health outcomes to be measured annually will include various lung function tests, symptoms reported by questionnaire, and absences abstracted from school records.

Community Selection
Some preliminary power calculations based on assumed values for true effects indicated that for studying a single pollutant, it would be necessary to have at least ten groups for power to be adequate. We carried out further calculations along similar lines to assess the prospects for doing multivariate analyses of two or more pollutants and concluded that it would be possible, provided groups could be selected in such a way that the correlations in pollutant levels across groups were not too large. Thus, the optimal choice would have to take account of the actual levels of exposure to each of the pollutants we wished to assess. Fortunately, extensive data were available on the four highest priority pollutants from the ARB's monitoring program. Year-round average levels for the period 1986 to 1990 were obtained from 86 monitoring stations scattered across southern California. (For some pollutants, notably acids, the values had to be interpolated from other stations on an inverse-distance weighted basis.) Our initial selection of sites was based on the intuitive notions that we wished to maximize the dispersion of each of the pollutants and to represent as many combinations of high and low levels of each pollutant as possible. These notions are appropriate when the response surface is linear.
For each pollutant, we calculated the mean level over the 86 communities; then, for each community, we converted the pollution levels to standard units. Each community was assigned a "profile" by recording it as either above (+) or below (-) the mean level for each pollutant. For a design based on all four pollutants, there were thus 2^4 = 16 possible profiles, of which demographically suitable examples could be found for seven. Within each profile, we then selected from one to three communities whose sum of squared standardized pollution levels was large. Table 4 describes the characteristics of the communities that we judged to be the most suitable on this basis, under the constraint that we could afford to study no more than 12. This selection process differs from the one described above in that the groups were not randomly chosen; thus the group effects must be considered fixed rather than random.
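The mechanics of this profile-based selection are easy to sketch. In the code below the pollutant levels are randomly generated stand-ins for the 86 monitoring-station summaries (the real data are not reproduced here), so only the procedure, not the output, is meaningful:

```python
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(3)
levels = rng.normal(size=(86, 4))   # hypothetical site-by-pollutant mean levels

# standardize each pollutant across the candidate communities
z = (levels - levels.mean(axis=0)) / levels.std(axis=0)

# profile: above (+) or below (-) the mean for each of the four pollutants
profiles = [tuple('+' if v > 0 else '-' for v in row) for row in z]

# within each profile, favor communities with large sums of squared standardized levels
by_profile = defaultdict(list)
for idx, row in enumerate(z):
    by_profile[profiles[idx]].append(((row ** 2).sum(), idx))

# take the most extreme community per profile, then the 12 most extreme overall
best_per_profile = [max(entries) for entries in by_profile.values()]
chosen = [idx for _, idx in sorted(best_per_profile, reverse=True)[:12]]
```

In the actual study the final list was further screened for demographic suitability and budget, constraints this sketch does not model.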
To compare alternative designs based on different selections of priority pollutants, we then carried out a further simulation study, based purely on the second-stage ecologic regression but allowing the actual pollutant levels to differ from the measured values subject to a covariance structure estimated from the observed data [detailed in Peters (11)]. Table 5 summarizes the results of this simulation, which led us to the conclusion that, if all four pollutants had health effects, then the optimal design would need to be based on all four. This design appears to have adequate power for detecting differences in mean forced expiratory volume in one second (FEV1) of about 3 to 5% between the high and low communities for each of the four pollutants in multivariate analysis, assuming that one-third of the variance in FEV1 is explained by variation in the pollutants, and that O3 and PM10 each contribute twice as much to the health effect as do NO2 and H+. Alternative designs that ignore one or more of these pollutants (with the same total number of communities) may slightly increase the variability of the pollutants of primary interest, which normally would be expected to yield an improvement in power. However, they also substantially weaken the ability to control the confounding effects of the omitted pollutants and therefore in most instances reduce the power for the effects of interest in a multivariate analysis.
To determine whether we could significantly improve our selection of communities under the four-pollutant design, we conducted a final simulation along similar lines, starting with the choice given in Table 4 and, in a stepwise fashion, considering replacement of each of the 12 communities by each of the remaining candidates. This led to the conclusion that, under an optimality criterion that maximized the sum of the powers for the four pollutants, it was theoretically possible to improve the design further by changing 5 of the 12 sites. This alternative choice attained better overall power by substantially reducing the correlations among the exposure variables. However, it did so at the expense of substantially reducing the variance of each exposure. Since we were unsure of the validity of the correlation estimates, because many of the entries were based on interpolation, and since the overall improvement in power was modest, we decided to retain our original selection. Essentially, we judged that the primary objective of the study was to maximize the overall power to detect any air pollution effect and that the separation of the effects of particular pollutants was of only secondary importance, after having demonstrated an overall effect. We therefore felt that it was more important to maximize the variance in exposures than to minimize their covariances.

Exposure Modeling
The measurement protocol entails a combination of ambient monitoring, personal monitoring, microenvironment sampling, and questionnaire assessment of personal modifying factors. Ambient data are routinely collected by the ARB for each of the communities, and will provide long-term average levels throughout the study as well as historically. The questionnaire will be administered to all subjects and will include items on residence history, usual indoor and outdoor times and activities, and household characteristics (smoking by family members, air conditioning and heating, air exchange, sources of indoor pollution, etc.). Personal monitoring will be possible only for ozone and only on a sample of subjects. These subjects will also maintain a daily diary of their activities during the times when the monitoring badge is worn. Microenvironment sampling will be done on all pollutants at a variety of indoor and outdoor locations in each community.
The goal of the analysis will be to combine these various data sources in such a way as to provide estimates of individual and group mean exposures for the first- and second-stage regressions described above, including estimates of measurement error distributions for adjustment purposes. The actual form of the models to be used is still under development, and will incorporate the extensive body of literature on the determinants of personal exposure. To illustrate the general approach, we make some simplifying assumptions that will be remedied in our final analyses.
First, we assume that the relevant exposure variable is the long-term arithmetic mean (i.e., the "time-weighted average," TWA). We also assume that ambient levels, true personal exposures, and measurement errors are lognormally distributed. Finally, we assume that the ratio of personal exposures to ambient levels is described by a multiplicative factor that depends loglinearly on the personal modifying factors. The basic relationships are thus as described in Equations 5 to 10, except for the additional complexities introduced by the lognormal assumptions. Using the estimates from this model, we can compute for each subject in the main study the TWA, E(exp(xgi) | Xg, wgi), for use in the first-stage regression, together with the average over all subjects of these TWAs for use in the second-stage regression. Whether it will be possible to assess exposure effects at an individual level will depend primarily on the variability between individuals in their modifying factors and on the ability of the exposure model to accurately predict personal exposures. Even if it is not possible to assess dose-response relations at an individual level, however, the use of average TWAs rather than Xg in the second stage should lead to more reliable estimates, because communities with different exposure patterns are likely to differ substantially in modifying factors such as use of air conditioning and proportion of time spent outdoors, given the major differences in climate across southern California.
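The identity that drives the lognormal TWA calculation is that if the log exposure x is N(μ, s²), then E[exp(x)] = exp(μ + s²/2), so the arithmetic mean exceeds the geometric mean exp(μ) by the factor exp(s²/2). A one-line helper (name ours):

```python
import math

def lognormal_twa(mu, s2):
    """Arithmetic-mean (time-weighted average) exposure when the log exposure
    is normal with mean mu and variance s2: E[exp(x)] = exp(mu + s2 / 2)."""
    return math.exp(mu + s2 / 2.0)
```

This is why higher within-person variability in log exposure raises the TWA even when the geometric mean is unchanged.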

Conclusions
It is reasonable to expect that the proposed two-stage analysis of the multilevel analytic design will provide unbiased and efficient estimation of effects in a complex model involving unmeasured between-group differences, measurement error, and a complex measurement model combining individual and aggregate exposure data. In particular, in cases where the within-groups exposure variance is less than the between-groups variance, estimates obtained through pooling should be more efficient than estimates based on either individual-level or aggregate-level analyses alone. Simulation techniques can be used to optimize the various trade-offs between the design parameters if reasonable estimates of the model parameters are available. We believe this design and its associated analysis offer considerable promise for resolving some of the difficulties of between-group confounding, measurement error, and restricted variability that have historically plagued environmental epidemiology.

Appendix

Proof of Theorem: We prove the theorem for a more general case with an arbitrary number of confounders. The model is

ygi = α + βxgi + Vgiγ + ηgi

where V is a matrix of confounder variables. Let σ̂² and τ̂² be estimates of σ² and τ², and let Σ̂BIG be the corresponding estimate of ΣBIG. Assume without loss of generality that Σ̂BIG^(-1/2)x is orthogonal to Σ̂BIG^(-1/2)V. Otherwise, replace x with x − V(VᵀΣ̂BIG⁻¹V)⁻¹VᵀΣ̂BIG⁻¹x and reparameterize the confounder effects γ. The weighted least squares estimates of α and β are the values minimizing

(y − α1 − βx)ᵀΣ̂BIG⁻¹(y − α1 − βx) [14]

which is equal to the sum over g = 1,..., G of (yg − α1 − βxg)ᵀΣ̂⁻¹(yg − α1 − βxg).
Multiplying the numerator and denominator of Equation 4 by Var(β̂₁)Var(β̂₂) shows that