Hierarchical regression for epidemiologic analyses of multiple exposures.

Many epidemiologic investigations are designed to study the effects of multiple exposures. Most of these studies are analyzed either by fitting a risk-regression model with all exposures forced in the model, or by using a preliminary-testing algorithm, such as stepwise regression, to produce a smaller model. Research indicates that hierarchical modeling methods can outperform these conventional approaches. These methods are reviewed and compared to two hierarchical methods, empirical-Bayes regression and a variant here called "semi-Bayes" regression, to full-model maximum likelihood and to model reduction by preliminary testing. The performance of the methods in a problem of predicting neonatal-mortality rates are compared. Based on the literature to date, it is suggested that hierarchical methods should become part of the standard approaches to multiple-exposure studies.


Introduction
Many epidemiologic studies, especially in occupational and environmental health, have as their objective the evaluation of multiple exposures for potentially harmful effects. While the statistics literature ostensibly deals with these problems under such topics as stepwise regression and multiple comparisons, prominent epidemiologic methodologists have been adamant in their rejection of such methods (1)(2)(3). These authors criticize conventional methods for (among other things) failure to take account of prior knowledge, the irrelevance of the inferential objectives on which the methods are based, and the tendency of the methods to misrepresent continuous estimation problems as discrete decision problems. I have been largely sympathetic with these criticisms; nevertheless, I have not found these alternative points of view to be entirely satisfactory with regard to the representation and solutions they offer for the multiple-inference problem (4).
Developments over the past few decades offer fresh approaches to the problem. Thanks to dramatic increases in computing power, hierarchical methods (such as empirical-Bayes regression) can be explored as practical alternatives or supplements to the regression analyses common in epidemiology. The hierarchical perspective is oriented specifically toward multiple-inference problems. In contrast to conventional multiple-inference methods, the Bayesian format of hierarchical methods allows straightforward accommodation of prior information and interpretation of results. At the same time, hierarchical methods can enjoy superb repeated-sampling properties (5,6).
Several authors have discussed the application of parametric empirical-Bayes methods to the estimation of parameters in risk-regression models (7-10); a related, Bayesian log-linear approach to risk modeling was given by Cornfield (11). Here, I provide an overview and comparison of two of the more common modeling strategies employed by epidemiologists (maximum likelihood [ML] and preliminary testing) to two hierarchical methods, parametric empirical-Bayes regression (6) and a variant I call "semi-Bayes" regression (9,10). After reviewing the basic strategies, I illustrate these methods and another hierarchical method (Bayes empirical-Bayes) in an application to a neonatal mortality study. The results illustrate how hierarchical methods can offer considerable advantages over more common approaches to epidemiologic studies of multiple exposures.

Background
Consider the following problem: An investigator gathers data on an outcome (dependent) variable y, an n-row vector of study variables x ("exposures"), and an m-row vector of nuisance variables w ("potential confounders"), with the intention of estimating the n-column vector 3 of exposure coefficients in a generalized linear model for the expectation ofy conditional on x and w, g[E(y x, w)] = a+ xf+ wy where g is a known, strictly increasing link function and y is assumed to be randomly sampled from its distribution conditional on x, w. In this multiple-estimation setting, the investigator may be concerned to maximize the "accuracy" in estimating /B. In the epidemiologic literature, the concept of accuracy is rarely formalized; when it is, accuracy of point estimation is sometimes equated with mean-squared estimation error (12), and accuracy of interval estimation is usually equated with nominal or conservative coverage coupled with short length (12). The multiplicity inherent in the interval-estimation task is almost always disregarded on the ground that componentwise coverage, not overall coverage, is scientifically relevant in exploratory studies (1)(2)(3). The latter view finds acceptable the high probability that at least one component of ,B will not be covered by the componentwise 95% intervals if ,B has, say, 20 components.
Other multiple-inference problems arise if one is chiefly interested in estimating the expectation E(y x, w) or future values ofy (prediction). In these problems, the coefficients are of only intermediate interest. Nonetheless, accuracy in coefficient estimation reasonably could be expected to correlate with prediction accuracy, and coefficient estimation methods can be compared for their effectiveness in producing accurate prediction.
In epidemiology, two common strategies for estimating study-variable coefficients are the following: Use estimates from a "full" model, one that contains all the study variables (if such a model can be fit), or as many variables as can be fit. Kleinbaum et al. (12), Miettinen (1), and Rothman (2) make recommendations along these lines. (Note that full stratification on all variables is equivalent to such an approach, but is rarely practical because of the sparsity of the stratification.) * Use estimates from a "reduced" model, including only components of ,B or y that survive some preliminary-testing algorithm (such as forward selection, backward elimination, or univariate testing). Note that in the second strategy, coefficients deleted from the model are, in effect, assigned an estimated value of zero.
Either of the above strategies may involve algorithms for reduction of the nuisance parameter y, e.g., by backward elimination of components of w. In fact, virtually all the epidemiologic literature on variable selection has focused on methods for reduction of the nuisance parameter, rather than P (12,13). This focus may explain in part why published analyses often treat the components of ,B one at a time: for each component of ,B, a model is selected based on some method for reducing the number of remaining components; in each model, one component is treated as the sole study variable, while the remaining study variables are treated as nuisance variables. For example, PI would be estimated by forcing xl into the model, then applying some selection or reduction algorithm to the vector of remaining variables (X2.Xn-WI) ... Wm)* In essence, this approach treats an n-parameter inference problem as n one-parameter problems. If no reduction of the nuisance vector is applied, this approach is equivalent to simply fitting the full model and basing inference on this fit; otherwise, it may be viewed as an ad hoc compromise between the strategies specified above. Empirical-Bayes estimation may be viewed as a more formal compromise: as in the first strategy, coefficients will not be dropped; but, as in the second strategy, large unstable estimates from the full model may be replaced by much smaller estimates.

Hierarchical Methods
In addition to modeling the outcomes as random variables whose distributions are a function of the target parameters /, hierarchical methods model these target parameters as random variables whose joint distribution is a function of hyperparameters and prior covariates z that are thought to determine the magnitudes of the target parameters. For example, in a model in which 31_./36 represented the carcinogenic effects of six polychlorinated biphenyls x1,...,x6, the magnitude of each of these effects may be determined by the degree of chlorination of the particular compound. Thus, one would assign to each P3i (i= 1,...,6) a covariate zi that measures the chlorination of chemical x.. The impact of these and other such properties of exposures on carcinogenic activity could then be analyzed using a second-stage model for the first-stage (disease) coefficients pi. Consider, for example, the regression model pi = Zi 7r+ ai j=l,....,.n, i.e., / = Z7r + = ,u + d, [2] where zi is a row-vector of p known prior covariates, r is a column vector ofp possibly unknown prior coefficients, the 3i are independent Gaussian (normal) random variables with mean zero and possibly unknown variance T2, and Z is the n-by-p matrix with rows zi. Equation  To model effect modification, one may add product terms among the exposures and confounders to x, and then include their coefficients in the second-stage model.
Similarly, to more flexibly model dose response, one may add multiple terms (e.g., linear and quadratic) for a single exposure and then include their coefficients in the second-stage model. Building such models involve a number of complexities beyond the scope of the present discussion, however; Cornfield (11) presents an example involving interaction in cross-classifications.
The nuisance parameter y can be included with /3 in the second-stage model, or dealt with separately. Options for ywill be further described below. Bayesian Methods The classical Bayesian approach to inference on /3 requires that one completely specify the prior distribution for /3. Thus, to use Equation 2, one would have to treat the hyperparameters r and T2 as known quantities. Bayesian analysis would then proceed by merging the prior distribution for /3 with the likelihood function for /3 to obtain a posterior distribution for /3 (14, ch. 10). For simplicity, the present exposition will concentrate on the Gaussian approximation to this analysis.
Suppose that the prior follows Equation 2 and the likelihood for /3 is well approximated by a multivariate normal density with mean A and covariance matrix VI where , is the maximum-likelihood estimate (MLE) and Ais the inverse of the observed information matrix evaluated at /3; such an approximation is adequate in large-sample logistic and Poisson regression (15). The posterior distribution for /3 will then be approximately Gaussian with mean Bu + (I-B) A and covariance matrix VV(I-B), where B= (V+721I) V and I is the n-by-n identity matrix (6). Note that the posterior mean is a weighted average of the prior mean p and the MLE Ai.

Parametric Empirical Bayes Methods
The "naive" parametric empirical-Bayes approach treats r and T2 as unknown parameters, estimates them from the data (via one of several available methods), plugs these estimates into the prior, and then uses this estimated prior in a standard Bayesian analysis to obtain an empirical-Bayes posterior distribution. This "naive" empirical-Bayes approach is unsatisfactory because it fails to account for the uncertainty in the estimated prior parameters kand T. The resulting estimates can, however, be corrected to account for this uncertainty (6). With correction, the approximate mean of the empirical-Bayes posterior distribution for P under Equation where A=2B*e(B*e)'I(n-p) is an adjustment term that accounts for estimation of the prior variance (6). /* and C* can be used to obtain empirical-Bayes confidence regions for /; as illustrated below, these regions can be superior to the analogous conventional regions based on / and V(6). As discussed by Morris (6) and illustrated in simulations (10), the confidence intervals based on C* can be extremely conservative in small samples when the prior variance is small relative to the first-stage variance. Improvement in small-sample behavior can be obtained by componentwise variance corrections (6); with the Morris (6, Eq. 5.6-5.9) corrections, the estimated variance of component i of the EB estimator is Vi= Vii -(1-H1)( VB*)ii + (V, +X2I)W.7Aii, [5] where H*=Z(Z'W*Z)' Z'W*, V*=W*V/ EiW., and subscript ij on a matrix expression indicates the ijth element of the expression. Greater improvements may be obtained via further approximations (16) or via Monte-Carlo-based methods (17).
Approximate procedures become more complex if ir (as well as 3) is regarded as random (18), but such extensions are easily handled by Monte-Carlo methods (19). In particular, Monte-Carlo methods render practical a fully Bayesian empirical Bayes analysis, in which r and T2 (the unknown hyperparameters of the prior distribution for /3) are themselves assigned "hyperprior" distributions (20). Such distributions represent a third stage in the modeling process.

Semi-Bayes Methods
Consider Equation 2. It often is possible to specify the prior standard deviation r on a priori grounds, or at least to assign r a plausible range of values. In logistic regression with binary covariates, /3 represents a vector of log odds ratios, and it often is possible to set upper and lower prior bounds on these ratios. Typical environmental and occupational surveillance studies involve exposures whose effects are expected to produce small positive log odds ratios. If Z is a vector of ones, setting 'r = 0.5 would correspond in this case to being 95% certain that any given ratio is within an exp[2(1.96) (0.5)] = 7-fold range of about the prior geometric mean odds ratio; this would be appropriate if (for example) 19 out of 20 ratios fell between 1 and 7. Fixing r at some prior value in the empirical-Bayes analysis shifts both the philosophy and the mechanics of the analysis back toward the classical Bayesian approach; thus, when r is specified in advance, I refer to the empirical-Bayes analysis as "semi-Bayes." This approach is computationally simpler than standard empirical-Bayes; for example, under Equation 2, 7r will require iterative estimation using standard empirical-Bayes methods, but has closed form using the semi-Bayes approach (9). As shown in simulations (10), semi-Bayes confidence regions can be superior to standard empirical-Bayes regions when r equals or exceeds the true standard deviation of the components of /3, although this superiority is purchased by a risk of subnominal coverage if r is set lower than this.
Under Equation 2, the approximate mean and covariance of the semi-Bayes posterior distribution for ,B arẽ~~~~~~1 B= BF+ (I-B)fp [6] and and B = WV (as in the classical Bayesian analysis). Note that, as r is increased, the semi-Bayes estimate /3 approaches the maximum-likelihood estimate /,, and C approaches V. The variance estimate for component i of each SB estimator is i3=V.-(1-I_H)(VB )ii [8] where H=Z(Z'WZ)-1Z'W.

Preiminary-Test Estimates Revisited
For comparison to Bayesian methods, a preliminary-test estimator (such as a stepwise-regression result) may be viewed as an ad hoc, discontinuous shrinkage estimator (21): for any given data set, the estimate is either shrunk all the way to zero (if it gets deleted from the equation), or else it is replaced by an estimate from an equation with fewer variables than the full equation (if it is retained but other variables are deleted). This shrinkage rule is somewhat perverse relative to the usual empirical-Bayes rationale. Like empirical-Bayes, unstable estimates are most subject to change, but outliers are poorly handled. Unlike empirical-Bayes, small estimates are much more subject to change than large ones. It would seem natural, then, to expect preliminary-test estimators to have larger mean-squared error than empirical-Bayes estimators, and in fact this has been demonstrated in the multivariate normal model (22).
Another problem with preliminary-test estimators is the lack of a valid standard error to attach to the estimate, especially when the estimate is set to zero. A preliminary-test confidence interval may be constructed by using the information matrix for the full model evaluated at the reduced parameter-vector estimate. 'While this ad hoc device performs surprisingly well in small samples, simulation results (10,23) indicate that, in terms of coverage and average width, preliminary-test interval estimators still do not perform as well as other intervals.
Nuisance Parameters A common practice in occupational and environmental epidemiology is to eliminate the nuisance parameters y from the model by "preadjusting" the outcome variable for w (possibly after some preliminary reduction of w as well). For example, y may be the log of standardized mortality ratio (SMR), the latter being the observed number of cases of disease divided by the number expected given the age and sex (w) composition of the particular exposure group (x-level) under examination. Breslow and Day (24) and Checkoway et al. (25) present detailed discussions of this method (which implicitly assumes that joint exposure and confounder effects on rates are multiplicative). This approach allows elimination of explicit consideration of y in the problem of inference on /3; concern now focuses on inference on P in an equation g[E(ywl x)] = aw + x/3, where the subscript w indicates that some appropriate preadjustment for w has been made, as distinct from simply dropping w from Equation 1 (no preadjustment of x is needed if the exposure groups are homogeneous with respect to x).
Volume 102, Supplement 8, November 1994 One may instead fit the full first-stage model (Equation 1) directly, and then model both ,B and y in a second-stage model analogous to Equation 2. If y is allowed to have a different variance parameter from /, then, as this variance parameter becomes arbitrarily large, the hierarchical estimates of ywill approach the maximum-likelihood estimates. Also, the hierarchical estimates of ,B will approach those obtained by modeling ,B alone (as in Equation 2) when A is obtained from the full first-stage model (Equation 1). The resulting hierarchical estimates are modelbased analogues of those obtained using preadjustment.
An alternative approach would be to apply some type of preliminary-test model reduction to ywhen fitting Equation 1, followed by second-stage modeling of the coefficients that remain. Such an approach, however, would be subject to the same objections as other preliminary-test estimators, such as poor handling of outlying estimates.

An Example
Neonatal mortality (death among liveborn infants within the first 28 days after birth) underwent a notable decline in industrialized countries during the middle decades of this century. Neonatal mortality rates in the United States underwent a particularly important decline during the 1960s and 1970s, with the advent of a number of medical innovations, including electronic fetal monitoring and methods for management of prenatal and perinatal risk factors. An interesting question is the relative impact of interventions (such as monitoring), modifiable or potentially treatable factors (such as duration of pregnancy), and fixed patient characteristics (such as maternal age). How much did the decline in neonatal mortality arise from changes in these risk-factor distributions, especially for treatable versus fixed factors?
The Beth Israel Hospital study (26) provides limited data bearing on this issue. This cohort study of neonatal death included over 14,000 live births but only about 60 neonatal deaths from 1970 to 1975, after exclusions. The longitudinal nature of the study, and the documented trends over the period, provide means for evaluating the regression methods studied here (albeit indirectly) by comparing the success of the methods in predicting the observed trend. Table 1 displays the estimates of the coefficients in a logistic regression of neonatal death rates on 14 risk factors recorded in the Beth Israel data set, using only the 2992 subjects in 1970. (Certain other variables were excluded from the original study on a priori grounds; for example, Apgar score was excluded because it was considered an intermediate indicator of risk.) The preliminary-test estimates were derived using a 0.10 significance level, as was done in some early (pre-1978) presentations of these data (one additional variable, malpresentation, was significant at the 0.10 level in the full model but was not significant at this level after the other nonsignificant variables were deleted). The semi-Bayes and empirical-Bayes estimates are both based on a unidimensional prior and method of moments, as used in the simulation study (10). Risk factor i was assigned a prior covariate value of zi= 1 if the factor was expected to have a positive coefficient, zi= -1 if the factor was expected to have a negative coefficient, and Zi=0 if the factor was expected to have a near-zero coefficient. The prior variance T2 for the semi-Bayes estimates was set to 0.25, 0.5 and 1.0, round figures that also happen to be reasonable on prior grounds: For example, T2= 0.5 corresponds to a prior that 95% of the odds ratios for the factor effects, the exp(f3i), are within a exp(2(1.96'I0.5)=16-fold span, such as 0.75 to 12. (To save space, results for r2=0.25 and 1.0 are omitted from the table.) Also shown are the posterior means from a Bayesian empirical-Bayes (BEB) analysis based on extending the hierarchical model with third-stage hyperpriors for 7r (the coefficient of z) and r2. The hyperprior for it was Gaussian with mean 0.4 and standard deviation 0.2, while the hyperprior for '2 was 5.5 times an inverse x2 distribution on 13 degrees of freedom (the residual degrees of freedom in the second-stage regression). The latter hyperprior was chosen because it has mean 0.5 (the fixed '2 value in the semi-Bayes analysis) and roughly 90% of the distribution falls between 0.25 and 1 (a prior standard deviation r of 0.5 to 1.0). The hyperprior for it represents a geometric mean relative risk (per unit change in the expected harmful direction) of exp(0.4)=1.5, with a 95% hyperprior interval of exp[0.4 ± 1.96(0.2)] = 1.0, 2.2. This analysis was done via Gibbs sampling (19,27) averaging over the final 100 draws from 100 parallel Markov chains of 200 iterations each; using the measure of Gelman and Rubin (28), further iterations would probably not have changed any estimate by more than 0.5%.
For clarity and compactness, Table 1 gives standard errors only for the ML estimates. The preliminary-test (PT) estimates had 5 to 10% smaller estimated standard errors than the ML estimates; EB and BEB estimates typically had 20 to 40% smaller estimated standard errors, while the SB estimates typically had 25 to 50% smaller estimated standard errors. In the remaining discussion, I will refer to the EB, SB, and BEB estimates as the hierarchical estimates.
Environmental Health Perspectives hydramnios). are heavily shifted toward their estimated prior mean, while stable coefficients (such as for duration of pregnancy) are not much changed. It is interesting that three of the nonsignificant ML coefficients appear to be in the wrong direction on subject-matter grounds: dysfunctional labor, public ward, and prolonged rupture of membranes (PROM). All three are shifted in the a priori correct direction by the other methods, and two get the a priori correct sign from the hierarchical methods.
There is no precise prior information to judge the remaining coefficients, however. Figure 1 shows the results of using the five regressions in Table 1 to predict the neonatal death rates in Beth Israel Hospital in 1971 to 1975, the remainder of the study period. (Not shown is the "null" regression, which would be a flat line at 5.68 per 1000.) Although only 13, 10, 7, 3, and 7 deaths occurred in 1971 to 1975, and the predicted rates from each model are not significantly different from one another, their relative relationship to the observed rate bears an interesting resemblance to small-sample simulation results (10): the hierarchical estimates are closer to the true parameters (the observed rates) than are the ML and PT predictions. On average, the hierarchical curves are not closer to the observed curve than the null line, but they do correctly predict the changes between all years but 1971 to 1972. The hierarchical curves are not sensitive to reasonable choices for the hyperparameters.
For example, similar patterns are seen for all 0.2<T2<1 in the semi-Bayes analysis. Note that the ordinary empirical-Bayes and Bayes empirical-Bayes curves are indistinguishable; similar results were obtained using other hyperpriors for 7r and z2.
The discrepancy between the ML/PT curves and the hierarchical curves is almost entirely attributable to the failure of the former to correctly predict the trend over 1970 to 1971. Further analysis (not shown) revealed that hydramnios prevalence jumped in 1970 to 1971 from 3.3 to 9.4 per 1000, and that this jump resulted in the upward shift of the ML and PT curves, due to the large hydramnios coefficients in the ML and PT regressions. In contrast, the hierarchical curves were not so affected by the hydramnios jump because they used severely shrunken hydramnios coefficients. Most of the remaining fluctuations in the observed curve are captured by all four predicted curves, and are largely accounted for by changes in prevalence in the three strongest factors (duration of pregnancy, hydramnios, and multiple birth).
All four predicted curves move away from the observed curve to the same extent over 1971 to 1972. This suggests that most of the discrepancy between the fitted and observed curves was produced by a rapid and lasting decline in some important unmodeled risk factor (such as a medical management factor) around 1971 to 1972. Unlike the conventional analyses, the hierarchical analyses pinpoint this change as Deaths per 1000 The lesser difference among the predictions than seen in the simulated coefficient estimates could have been anticipated: Dempster et al. (21) noted lesser accuracy gains of ridge and Stein estimators relative to ordinary least squares when evaluated in terms of prediction error rather than coefficient-estimation error. In one sense, the PT regression does well, in that it captures the predictive power of the full ML regression while using only 3 of the original 14 covariates. On the other hand, the results conform to theoretical work (29) indicating that preliminary-test estimates will often do no better than least-squares estimates and tend to do worse than empirical-Bayes estimates in prediction problems.

Discussion
In this article, I have chosen to focus on empirical-Bayes estimation rather than the more general subject of "shrinkage" estimation, which subsumes Stein estimation and ridge regression (29)(30)(31). While it is possible to carry out analyses similar to empirical-Bayes and semi-Bayes analyses entirely within the non-Bayesian context of logistic ridge regression (30), hierarchical methods have an advantage in the interpretability of the tuning parameter that controls the degree of coefficient shrinkage: in empirical-Bayes and semi-Bayes regression, the tuning parameter is the prior variance, which has a direct subject-matter interpretation, and the inferences are more easily seen to be approximately Bayesian.
While the large-sample standard errors may be questionable in the above example (with 14 predictors for 17 deaths), simulation studies (10) indicate that their relative magnitudes do indeed reflect the relative precision of the estimators, even in small samples. Considering mean squared error, such studies also indicate that the variance reduction of the hierarchical estimates (relative to ML) more than compensates for the fact that hierarchical estimates are biased towards their prior means. Furthermore, the ML estimates are only unbiased in the large-sample sense, and so are not guaranteed to have any small-sample bias advantage over hierarchical estimates.
Simulations provide a quantitative estimate of the gains one may expect in employing hierarchical methods in epidemiologic studies of multiple parameters. The following simulation results (10) seem especially relevant to formulating recommendations for applications: * In small samples with few exposures and subjects, ordinary empirical Bayes did not provide dramatic benefits over ordinary maximum likelihood. * In contrast, the semi-Bayes variant of empirical Bayes appeared superior to the other simulated methods in very small studies, in that it provided narrower intervals with coverage robust to variance misspecification in such studies. * In large samples with many covariates (i.e., in large studies), there was little difference in performance between empirical Bayes and semi-Bayes with correct specification. Semi-Bayes had the disadvantage of sensitivity to misspecification; empiricalBayes was clearly superior to ML and preliminary testing. * Semi-Bayes estimation appeared robust to overspecification of the prior standard deviation (setting T too large), in that overspecification did not reduce confidence-interval coverage and did not significantly decrease small-sample precision. * Preliminary testing, perhaps the most common approach to handling many exposures, did not appear to have the precision or validity of ordinary empirical-Bayes when the number of covariates was large; it also lacked large-sample validity. Its validity could be improved by raising the significance level, but then its precision could drop below that of ordinary maximum likelihood. Based on these observations, one might recommend semi-Bayes methods for very small studies, empirical Bayes for very large studies, and either method for the spectrum in between, with a caution to overspecify the semi-Bayes prior variance if either the sample size or number of parameters is not small. Both methods, however, can be subsumed under and replaced by Bayesian empirical Bayes (BEB) methods: ordinary empirical Bayes arises as the limiting case of BEB as the hyperpriors for 'r and z2 become diffuse; semi-Bayes arises as the limiting case of BEB as the prior for Xt becomes diffuse, with a one-point hyperprior for zr.
If used with a proper hyperprior (as in the above example), Bayesian empirical Bayes may offer the best compromise between the small-study precision of semi-Bayes and the large-sample robustness of ordinary empirical Bayes. Furthermore, due to recent developments in Monte-Carlo and approximation methods (16,19,27,28,(32)(33)(34)(35), Bayesian empirical-Bayes analyses are now computationally practical. Nevertheless, because of their theoretical and computational sophistication, I suspect such methods will remain far removed from routine application in the near future, despite an abundance of applications in which they could be used. In the hopes of encouraging more applications of hierarchical methods, the present article has focused on the computationally simpler ordinary empirical Bayes and semi-Bayes methods, which can be easily programmed with a matrix language (such as GAUSS, SAS IML, S-Plus, or SC), run rapidly on a personal computer, and appear to provide good approximations to fully Bayesian results in some common situations.
Given that hierarchical methods can be recommended for multiple regression analysis, there remains one major problem in implementation: design of the structure of the prior (in particular, specification of the prior design matrix Z). The problem of prior specification is a familiar one in Bayesian analysis, and has remained a major obstacle to wide use of classical Bayesian methods. Nevertheless, the problem may be less intractable in hierarchical analysis: The major specification demand is that the investigator identify subsets of parameters within which the parameters may be regarded as "exchangeable" (possibly after location and scale transforms of the parameters, which must also be specified). Here, exchangeability means that the parameters within a subset may be regarded as draws from a common prior distribution, in much the same way as effects are modeled in random-effect and mixed-model analysis of variance (8). This ANOVA parallel is helpful in orienting the problem to a more widely taught and used context, and in pointing out the major limitations of hierarchical methods: if subsets of exchangeable parameters cannot be identified, the methods cannot be applied. For example, in simple descriptive epidemiology, one often begins by examining the dependence of disease occurrence on basic demographic variables such as age, sex, and race; the coefficients of these variables could hardly be imagined as draws from a common distribution, even after transformations. More generally, one must be able to specify a second-stage regression model and error-covariance structure that reflects dependencies among the first-stage parameters. Such specification demands even greater subject-matter familiarity than ordinary regression analysis.
The semi-Bayes method requires additional specification of the prior standard deviation, 'r. This step may be viewed as an elicitation problem similar to those encountered in classical Bayesian analysis. The objective is to specify a value that is an upper bound on the range of parameter values that would correspond to expert opinions. In practice, I have found that the potential range for this "minimal conservative" t is very narrow, and that its elicitation is simple: even in the most vociferous epidemiologic controversies, the span of relative-risk values posited for dichotomous causal factors rarely exceeds 25-fold and is usually within 10-fold. Allowing for a 5% chance that the entire span of expert opinion is too narrow leads one to specify r= ln(25)12(1.96) 0.8 in the former case and r= ln(10)/2(1.96) 0.6 in the latter. Furthermore, r may be allowed to vary with the prior covariates.
The prior specification problem is further mitigated by the fact that absolute stringency in specification of exchangeability or other aspects of the prior does not seem necessary to realize benefits from hierarchical methods. The neonatal-death example illustrates this point: the prior covariate used here is clearly naive; at the very least, an obstetrician would want to distinguish a priori strong risk factors (such as hydramnios) from a priori weak factors (such as ward). Yet, despite the naive prior, hierarchical methods produced better predictions of observable quantities than the usual methods. This result is not an isolated case: other applications of simple hierarchical methods have produced similar results in a variety of different contexts. Morris (6) presents an example and provides references to other examples. There is apparently much robustness in hierarchical methods, at least within the realm of applications considered to date. The chief caution seems to be that it is better to err on the vague side than the stringent side when specifying prior distributions.
Hierarchical methods have demonstrable advantages over conventional methods and are no longer seriously limited by computer hardware. It thus seems timely to introduce such methods into epidemiologic teaching and software, as was done with risk-regression methods during the 1970s and 1980s.
Environmental Health Perspectives