Empirical Bayes methods for smoothing data and for simultaneous estimation of many parameters.

A recent successful development is a series of innovative statistical methods for smoothing data based on the empirical Bayes method. This paper emphasizes their practical usefulness in the medical sciences and their close theoretical relationship with the problem of simultaneously estimating parameters that depend on strata. The paper also presents two examples of analyzing epidemiological data obtained in Japan with the smoothing methods to illustrate their favorable performance.


Introduction
One of the most promising and rapidly developing branches of statistics is the use of smoothing methods based on the empirical Bayes approach. These methods are well known in econometrics and engineering, but in the medical sciences their use appears sparse in spite of their potential. The smoothing methods were developed separately from standard statistical theory; the moving average method, for example, was introduced in a heuristic way, though it is intuitively appealing. The aim of this paper is to review recent developments of smoothing methods in relation to standard statistical theory. Our emphasis is placed on their usefulness and on the need for further research to extend the methods so that they are useful in analyzing epidemiological data.
The smoothing problem is regarded as simultaneous estimation in a model with many strata under the assumptions that the strata are linearly ordered and that neighboring strata have density functions close to each other. This view permits us to formulate the model by describing the smoothness in terms of a prior distribution on the hyperpopulation and to embed the smoothing methods in the standard theory. We can then construct estimators and test statistics by applying likelihood inference, such as the maximum likelihood estimator and the likelihood ratio test.
We begin with the formulation of methods in a general form, followed by explicit descriptions of the standard methods, including the Stein problem, and of useful smoothing methods. Our formulation is a direct extension of well-known ones, but it is not seen in the literature. Historical notes and relations with other procedures are added. The review of smoothing methods extends to more advanced ones. Finally, our experiences in analyzing epidemiological data sets with the smoothing methods are given.

Methods in a General Form
Consider a model with K strata, the kth stratum having density (probability) function p(x; θ, μ_k), k = 1, ..., K, where the parameter μ_k depends on the stratum and θ is common across strata. Let x_ki, i = 1, ..., n_k, be a sample of size n_k from the kth stratum. Write μ = (μ_1, ..., μ_K)' and x_k = (x_k1, ..., x_kn_k)'. Then our problems are the following: a) estimate the parameter μ, b) estimate the parameter θ, and c) test the null hypothesis μ ∈ M_0.
Keep in mind that our interest is placed on all the parameters in the model. We assume μ is an outcome from a hyperpopulation having density function g(μ; δ), δ ∈ D, which is a prior distribution in the Bayesian context. The parameter space D has a limiting point δ_0 such that g(μ; δ) tends to a degenerate measure; write it g(μ; δ_0) for convenience. The null hypothesis M_0 in the test problem above is then expressed as δ = δ_0.
The overall likelihood is written as L(x; θ, μ) = ∏_{k=1}^{K} ∏_{i=1}^{n_k} p(x_ki; θ, μ_k), and the marginal likelihood is ML(x; θ, δ) = ∫ L(x; θ, μ) g(μ; δ) dμ. The rejection region of the test for μ ∈ M_0 with level α is T = 2 log{ML(x; θ̂, δ̂)/ML(x; θ̃, δ_0)} > c_α, where (θ̂, δ̂) maximizes ML(x; θ, δ) and θ̃ maximizes ML(x; θ, δ_0).
Some extensions look straightforward. Difficulties can arise in calculating the marginal likelihood, in numerically maximizing the likelihood, and in obtaining the critical value c_α. The use of a conjugate prior distribution, if acceptable, sharply reduces the computational load.
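As a concrete illustration of this recipe, the sketch below computes the marginal likelihood by numerical integration for an assumed normal-normal special case: p(x; θ, μ_k) is taken as N(μ_k, 1) and g(μ; δ) as N(0, δ). These are illustrative choices of ours, not the paper's general setting; they have the advantage that the integral can be checked against the known closed form N(0, 1 + δ).

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Assumed special case: x_ki ~ N(mu_k, 1), mu_k ~ N(0, delta).
# The marginal likelihood integrates the within-stratum likelihood
# against the prior g(mu; delta), stratum by stratum.
def stratum_marginal(xk, delta):
    integrand = lambda mu: np.prod(norm.pdf(xk, mu, 1.0)) * norm.pdf(mu, 0.0, np.sqrt(delta))
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

def marginal_likelihood(strata, delta):
    return np.prod([stratum_marginal(xk, delta) for xk in strata])

strata = [np.array([0.3]), np.array([-1.2]), np.array([0.7])]
ml = marginal_likelihood(strata, delta=2.0)
# With one observation per stratum, x_k is marginally N(0, 1 + delta) = N(0, 3).
closed = np.prod([norm.pdf(xk[0], 0.0, np.sqrt(3.0)) for xk in strata])
print(abs(ml - closed) < 1e-6)   # -> True
```

Maximizing this function over δ (and θ, when present) gives the empirical Bayes estimates; the test statistic T compares the maximized value with that at δ_0.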

Applicable Models
Selecting the density functions p(x; θ, μ) and g(μ; δ) suitably, we can derive a variety of methods.

Example 1 (Stein Problem)
Let x_k be a sample of size 1 from a normal population N(μ_k, 1) (1). Suppose μ is a sample vector of size K from a normal hyperpopulation N(0, δ). In this example the common parameter θ does not appear, and the value δ_0 is 0. It then follows that the estimate is μ̂_k = [‖x‖² − K]_+ x_k/‖x‖², with [z]_+ = max(z, 0). The test statistic T takes the value 0 for ‖x‖² < K and ‖x‖² − K log(‖x‖²/K) − K otherwise. Therefore the rejection region of the test for μ_1 = ... = μ_K = 0 with a standard level α, say 0.05, is ‖x‖² > χ²_{K;1−α}.
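Example 1 is simple enough to code directly. The sketch below implements the shrinkage estimate μ̂_k and the statistic T from the formulas above (function and variable names are our own):

```python
import numpy as np

# Shrinkage estimate [||x||^2 - K]_+ x_k / ||x||^2 and the statistic T.
def stein_estimate(x):
    s = float(np.sum(x ** 2))          # assumes x is not the zero vector
    return max(s - len(x), 0.0) * x / s

def lr_statistic(x):
    s, k = float(np.sum(x ** 2)), len(x)
    return 0.0 if s < k else s - k * np.log(s / k) - k

x = np.array([2.0, -1.0, 0.5, 3.0])
print(stein_estimate(x))     # each component shrunk toward 0
print(lr_statistic(x) > 0)   # ||x||^2 > K here, so T is positive: True
```

When ‖x‖² < K the estimate collapses entirely to 0 and T vanishes, exactly as in the text.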

Example 2 (One-Way Design)
Let x_k be a sample vector of size n from a normal population N(μ_k, σ²). Suppose μ is a sample vector of size K from a normal hyperpopulation N(τ, ν). Then σ² and (τ, ν) correspond to θ and δ, respectively. Some algebra yields μ̂_k = x̄ + [R − 1]_+ (x̄_k − x̄)/R, where x̄ and x̄_k are the sample means of x and x_k, and R = S_b²/S_w², with S_b² and S_w² being the between-strata and within-strata variances. The rejection region of the test for the homogeneity of the μ_k's with a standard level is expressed as R > F_{K−1,(n−1)K;1−α}, which is equivalent to the conventional F test. The estimator σ̂² is given by S_w² if S_w² < S_b², and by {(K − 1)S_b² + (n − 1)K S_w²}/(nK − 1) otherwise.
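A minimal sketch of Example 2 follows; the shrinkage factor [R − 1]_+/R and the variance components are coded from the formulas above, with variable names of our choosing:

```python
import numpy as np

# One-way design: shrink each stratum mean toward the grand mean by the
# factor [R - 1]_+ / R, where R is the between/within mean-square ratio.
def oneway_shrinkage(X):
    K, n = X.shape                                 # K strata, n observations each
    xbar_k = X.mean(axis=1)                        # stratum means
    xbar = X.mean()                                # grand mean
    S2_b = n * np.sum((xbar_k - xbar) ** 2) / (K - 1)
    S2_w = np.sum((X - xbar_k[:, None]) ** 2) / ((n - 1) * K)
    R = S2_b / S2_w
    shrink = max(R - 1.0, 0.0) / R
    return xbar + shrink * (xbar_k - xbar), R

X = np.array([[1.0, 1.2, 0.8],
              [3.0, 3.1, 2.9],
              [5.0, 5.2, 4.8]])
mu_hat, R = oneway_shrinkage(X)
print(R > 1.0)   # strata are clearly separated, so shrinkage is slight: True
```

When R ≤ 1 every stratum mean collapses to the grand mean, mirroring the pooled variance estimator σ̂² in the text.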
The two simple examples just discussed show that the estimators and tests obtained are appealing. Methods based on other models can be derived in a parallel way, especially when a conjugate prior distribution can be assumed. More useful methods, however, pertain to smoothing data. A series of attractive, useful smoothing methods exists, and our attention will later focus on the smoothing problem.

Example 3 (Smoothing Based on Differences of the Second Order)
In the standard smoothing problem the strata are linearly ordered in k. Let x_k be a sample of size 1 from a normal population N(μ_k, σ²). To describe our confidence in a gradual change of μ_k, we assume μ is an outcome from a multivariate normal hyperpopulation N(a e_1 + b e_2, δ D⁻), where e_1 and e_2 are the normalized orthogonal vectors spanning (1, ..., 1)' and (1, 2, ..., K)', and D⁻ is the Moore-Penrose g-inverse of the matrix D such that x'Dx = Σ_{k=1}^{K−2} (x_{k+2} − 2x_{k+1} + x_k)². It therefore holds that De_1 = De_2 = 0. The null hypothesis M_0 = {μ = a e_1 + b e_2} is expressed as δ = 0 and, consequently, γ_0 = ∞. It follows after the partial likelihood treatment that the marginal likelihood reduces to a function ML(γ) of the single parameter γ = σ²/δ. Let γ̂ be the estimator maximizing ML(γ). Then the estimators are μ̂ = (I + γ̂D)⁻¹x and σ̂² = x'(I − (I + γ̂D)⁻¹)x/(K − 2), with I the K × K identity matrix. The rejection region of the test for linearity of μ is given by T = 2 log{ML(γ̂)/ML(∞)} > c_α. The critical value c_α depends on K and is given by the simulation study of Yanagimoto and Yanagimoto (2).
The extension of the smoothing problem based on differences of the general dth order is straightforward except for obtaining c_α. The simulation studies show that critical values for α = 0.05 are approximated by a(d)(K + d + 1)/K for d = 1, 2, 3, and 4, where a(1) = 2.0, a(2) = 1.85, a(3) = 1.75, and a(4) = 1.7.
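The estimator μ̂ = (I + γD)⁻¹x of Example 3 can be sketched as follows; here γ is supplied by hand rather than estimated by maximizing ML(γ), so the block shows only the smoothing step itself:

```python
import numpy as np

# D = B'B, where B takes second differences, so x'Dx is the sum of
# squared second differences (x_{k+2} - 2 x_{k+1} + x_k)^2.
def second_diff_matrix(K):
    B = np.zeros((K - 2, K))
    for k in range(K - 2):
        B[k, k:k + 3] = [1.0, -2.0, 1.0]
    return B.T @ B

def smooth(x, gamma):
    K = len(x)
    return np.linalg.solve(np.eye(K) + gamma * second_diff_matrix(K), x)

# Because De1 = De2 = 0, a straight line is reproduced exactly.
line = np.arange(10.0)
print(np.allclose(smooth(line, 5.0), line))   # -> True
```

For noisy data the smoothed series always has a smaller sum of squared second differences than the raw series, which is what the penalty term enforces.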

Historical Reviews
As far as we know, the empirical Bayesian approach to smoothing data was started by Whittaker (3) and Whittaker and Robinson (4), where the word "graduation" was used in place of "smoothing." Shiller (5) proposed the use of the smoothness prior distribution. These works gave mathematically elegant formulations of the penalized least squares method. However, in these papers the estimation of the ratio parameter γ = σ²/δ was not given explicitly. In the Bayesian context the prior distribution is assumed to be known, but this assumption looks too restrictive in practice. Wahba and her associates (6,7) developed mathematical aspects of the smoothing problem and recommended the use of generalized cross-validation. The conceptual progress of likelihood inference in the Bayesian (including empirical Bayesian) model is attributed to Good (8). Akaike (9) advocated the use of the type II likelihood, that is, the marginal likelihood. He also extended the smoothing problem so as to cover seasonal adjustment.
The empirical Bayesian formulation described here is associated with various other statistical methods. Henderson (10) discussed the estimation problem of the component effect in random effects modeling; the procedure described above provides an explicit solution. A formal application of the EM algorithm (11) yields the same estimate of the parameter γ = σ²/δ. Kalman filtering (and smoothing) is computationally efficient (12), though it is not easy to identify the distribution of the initial state. The practical importance of a test for homogeneity was stressed by Yanagimoto and Yanagimoto (2). Morris (13) recommended the nomenclature "parametric empirical Bayes method." However, it seems to the authors that the rather general term presented by Cassella (14) is preferable.
The smoothing model has a wide range of extensions and modifications. Later we review the seasonal adjustment model and the smoothing model in two-dimensional space. These two models look promising for analyzing epidemiological data. Other applicable models will be found in non-Gaussian modeling. In the simultaneous estimation of many parameters, as in Examples 1 and 2, the conjugate prior distribution is useful. It is difficult, however, to develop a conjugate prior distribution in smoothing except for the normal case, since there is no flexible, tractable, multivariate nonnormal distribution (15). Recent research on non-Gaussian modeling is succeeding in innovating the analysis in this area. [For example, see West, Harrison, and Migon (16) and Kitagawa (17).] An attempt to apply the model to data on asthma attacks is seen in the report by Kamakura and Yanagimoto (18).

Further Smoothing Methods
An advantage of the empirical Bayes smoothing method is its versatility. Actual data often have their own characteristics usable for analysis. In turn, our purpose in analyzing the data is often associated with those characteristics, for example, monthly data consisting of the incidences of a disease. (An epidemiologist may suspect a significant seasonal effect and hope to obtain the estimated trend.) Thus we can recommend formulating the potential seasonal effect in terms of a suitable prior distribution. Such advanced methods are still under investigation.

Example 4 (Seasonal Adjustment)
The observation at time k is decomposed into a trend T_k, a seasonal component S_k, and an error term, and the requirements on T_k are the same as those on μ_k shown in Example 3. Note that we add 13 hyperparameters to the previous model. Since all the distributions appearing in this model are normal, there is no need for numerical integration in calculating the marginal likelihood. Numerical optimization is, however, still elaborate. This model was originally developed by Akaike (9).
The seasonal adjustment method is widely employed in econometrics and is known as a typical problem with a difficulty in identifiability. Various methods, such as X-11, have been proposed. We again emphasize that the above approach is based on clear analytical assumptions and on explicit procedures for inference about the parameters contained in the model. These advantages are highly desirable in the natural sciences.
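A hedged sketch of a seasonal decomposition in the spirit of Example 4 follows. The exact prior of the paper is not reproduced here; as an illustrative assumption of ours, we penalize second differences of the trend T and the sums of 12 consecutive seasonal components S (so that the seasonal pattern nearly repeats yearly), and solve the resulting penalized least squares problem exactly.

```python
import numpy as np

# Minimize ||x - T - S||^2 + g_trend ||B T||^2 + g_seas ||A S||^2,
# where B takes second differences of the trend and A sums 12 consecutive
# seasonal components. The stacked normal equations are solved directly.
def seasonal_decompose(x, g_trend=100.0, g_seas=100.0):
    n = len(x)
    B = np.zeros((n - 2, n))
    for k in range(n - 2):
        B[k, k:k + 3] = [1.0, -2.0, 1.0]
    A = np.zeros((n - 11, n))
    for k in range(n - 11):
        A[k, k:k + 12] = 1.0
    I = np.eye(n)
    top = np.hstack([I + g_trend * B.T @ B, I])
    bottom = np.hstack([I, I + g_seas * A.T @ A])
    z = np.linalg.solve(np.vstack([top, bottom]), np.concatenate([x, x]))
    return z[:n], z[n:]          # trend, seasonal

rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 48)
s = np.tile(np.sin(2 * np.pi * np.arange(12) / 12), 4)
x = t + s + 0.1 * rng.normal(size=48)
trend, seasonal = seasonal_decompose(x)
```

In the paper the two weights correspond to hyperparameters estimated from the marginal likelihood; here they are fixed by hand to keep the sketch short.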

Example 5 (Smoothing of Spatial Data)
In this example let μ_kh be the variate at the (k, h)th site of a two-dimensional rectangular lattice. Whittle (19) proposed a simultaneous autoregressive model, μ_kh = Σ a_khij μ_ij + ε_kh, k = 1, ..., K, and applied it to data on the yield of wheat. A conditional autoregressive model was also proposed (20). The unknown parameters in both models are estimated by the maximum likelihood method. Kashiwagi (21) gave an empirical Bayesian formulation of a smoothing method for spatial data; he pointed out that the likelihood function defined in both the simultaneous and conditional autoregressive models is equivalent to the marginal likelihood function in the empirical Bayes method. In the context of smoothing splines, Wahba (22) studied the use of thin-plate splines for smoothing noisy multidimensional data.
It seems that this method is applicable to the analysis of meshed geographical data on morbidity and mortality. It enables us to obtain smoothed estimates of all the μ_kh using our knowledge of gradual change. Descriptive methods, such as the grid square method (23), rely on skilled subjective judgments.
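The lattice case can be sketched as below. The penalty is an illustrative choice of our own, not Whittle's or Kashiwagi's exact model: second differences are penalized along both rows and columns of the grid, and the smoothed surface solves (I + γD)μ = x on the flattened lattice, in direct analogy with Example 3.

```python
import numpy as np

# Build D = B'B from second differences along rows and columns of a K x H grid.
def lattice_penalty(K, H):
    rows = []
    for k in range(K):                      # horizontal second differences
        for h in range(H - 2):
            r = np.zeros(K * H)
            i = k * H + h
            r[i], r[i + 1], r[i + 2] = 1.0, -2.0, 1.0
            rows.append(r)
    for h in range(H):                      # vertical second differences
        for k in range(K - 2):
            r = np.zeros(K * H)
            i = k * H + h
            r[i], r[i + H], r[i + 2 * H] = 1.0, -2.0, 1.0
            rows.append(r)
    B = np.array(rows)
    return B.T @ B

def smooth_grid(x, gamma):
    K, H = x.shape
    mu = np.linalg.solve(np.eye(K * H) + gamma * lattice_penalty(K, H), x.ravel())
    return mu.reshape(K, H)

# A plane has zero second differences in both directions, so it passes through.
k_idx, h_idx = np.indices((5, 6))
plane = 2.0 * k_idx + 3.0 * h_idx + 1.0
print(np.allclose(smooth_grid(plane, 4.0), plane))   # -> True
```

For realistic mesh sizes a sparse or iterative solver would replace the dense solve used here.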

Applications
Two examples of applying the smoothing methods to actual data obtained in Japan follow.

Cancer Mortality in Japan
Stomach cancer is still the largest cause of cancer death. We analyzed yearly data, cited from Japanese vital statistics, on the crude number of cancer deaths among males during the period from 1965 to 1986. Figure 1 shows the result in the case of stomach cancer in males. We observe that, even on the crude number basis, the annual mortality has been decreasing in recent years, though it is widely accepted that the adjusted mortality is decreasing. The fit of the simple linear regression is apparently bad. This is supported by the fact that the (marginal) likelihood ratio test statistic T takes the value 11.34, which is much greater than 1.85 × 25/22 (≈ 2.1). For comparison with an existing method, the same data were also analyzed using the familiar statistical software S; the result is given in Figure 2. The general trends are similar, but the estimated line in Figure 2 looks overfitted; ours appears more appealing. A clearer difference between the two analyses is that ours is closely related to the simple linear regression. The simple linear regression is powerful and often our primary choice.

SMON Patient Incidence
According to leading Japanese epidemiologists, subacute myelo-optico neuropathy (SMON) was a large-scale side effect of the drug clioquinol. At the time when the etiology of SMON was under study, it was suspected that a relatively high incidence of SMON cases occurred in the summer. To illustrate the usefulness of the seasonal adjustment method, we analyzed the data on the monthly incidence of SMON cases, cited from Table 7.1 of the Research Report (24), from November 1966 to August 1970. The estimated line with the estimated trend and seasonal effects is given in Figure 4. The smoothing model disregarding the seasonal effects was also fitted and is given in Figure 5. Both estimated lines appear acceptable. More precisely, very short-term fluctuations are observed in the seasonal adjustment method; on the other hand, the upper and lower peaks cannot be interpreted well by the smoothing method. The likelihood ratio test statistic takes the value 50.32. Since the difference in the numbers of parameters in the two models is 13, the test for the existence of a seasonal effect is obviously highly significant, though we do not have explicit results on the critical value. The estimated seasonal effect shows a gradual increase of SMON incidence from winter to summer, with the highest peak in September, followed by a sharp decrease.
The assumption of the Poisson distribution may be more familiar than that of the normal distribution. In this case we must apply the non-Gaussian theory, and its actual implementation, including the use of computer programs, requires further investigation.