Confidence intervals for effect parameters common in cancer epidemiology.

This paper reviews approximate confidence intervals for some effect parameters common in cancer epidemiology. These methods are computationally feasible and give nearly nominal coverage rates. In the analysis of crude data, the simplest type of epidemiologic analysis, the parameters of interest are the odds ratio in case-control studies and the rate ratio and rate difference in cohort studies. These parameters can estimate the ratio and difference of instantaneous incidence rates, which are the most meaningful effect measures in cancer epidemiology. Approximate confidence intervals for these parameters, including Cornfield's classical method, are mainly based on efficient scores. When confounding factors are present, stratified analysis and summary measures of the effect parameters are needed. Because the Mantel-Haenszel estimators are widely used by epidemiologists as summary measures, confidence intervals based on them are described. The paper also discusses recent developments in these methods.


Introduction
In the epidemiology of cancer and other chronic diseases, the most frequently used measure of disease occurrence is the instantaneous incidence rate, the number of new cases per unit of person-time at risk (also called the incidence density or hazard rate). As measures of exposure-disease association, attention centers on the ratio and the difference between the instantaneous incidence rates in the exposed and the unexposed groups.
Both parameters can be estimated directly in cohort studies, whereas in case-control studies only the rate ratio can be estimated, through the odds ratio. For a long time statisticians held that the odds ratio estimates the risk ratio, that is, the ratio of two cumulative incidence rates, provided the disease under study is rare (1). It has since been explained, however, that the odds ratio estimates the rate ratio, and that the rate ratio approximates the risk ratio when the disease is rare (2).
Many procedures have been proposed for calculating approximate confidence intervals for the parameters of interest in cancer epidemiology. The best known is Cornfield's (3) method for the odds ratio. In the analysis of crude (i.e., unstratified) data, approximate large-sample confidence intervals based on unconditional efficient scores, including Cornfield's method, can perform well.
Although crude analysis is cogent, stratified or matched analysis is often needed to remove confounding. The approximate methods based on efficient scores extend in a straightforward manner to common effect parameters when the number of strata remains fixed but the sample sizes become large (large strata). However, the unconditional score methods fail when fine stratification or matching has been used (sparse data). Because the well-known Mantel-Haenszel estimators for the common odds ratio (4), the common rate ratio (5), and the common rate difference (6) are consistent under both the large-strata and the sparse-data large-sample theories (7,8), approximate confidence intervals based on these estimators have been developed over the past 10 years. Both first-order Taylor series intervals and Fieller-like intervals based on the Mantel-Haenszel approach perform well, and the latter are related to tests of null association.
*Department of Epidemiology, School of Health Sciences, Faculty of Medicine, University of Tokyo, Hongo 7-3-1, Bunkyo, Tokyo 113, Japan.

Approximate Methods for Crude Data
Odds Ratio from Case-Control Studies
Consider a pair of independent binomial observations $(X, Y)$ with denominators $(n, m)$ and success probabilities $(p_1, p_0)$. In case-control studies, $X$ and $Y$ denote the numbers of exposed persons among $n$ cancer cases and $m$ controls. We wish to find approximate confidence intervals for the odds ratio $\psi = p_1(1 - p_0)/[p_0(1 - p_1)]$ based on efficient scores.
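The unconditional log-likelihood $L(p_1, p_0)$ is the sum of the logarithms of two binomial probabilities with parameters $(p_1, p_0)$. Reparametrizing by $p_1 = \psi p_0/(\psi p_0 + 1 - p_0)$, the score statistics are
$$S_\psi(\psi, p_0) = \partial L(\psi, p_0)/\partial\psi = (X - n p_1)/\psi$$
and
$$S_{p_0}(\psi, p_0) = \partial L(\psi, p_0)/\partial p_0 = (t - n p_1 - m p_0)/[p_0(1 - p_0)],$$
where $t = X + Y$. For fixed $\psi$, the maximum likelihood estimator (MLE) $\hat p_0$ of the nuisance parameter $p_0$ is the solution of $S_{p_0}(\psi, \hat p_0) = 0$. Using the conventional notation $E_A = E_A(X; \psi) = n\tilde p_1$, where $\tilde p_1 = \psi\hat p_0/[\psi\hat p_0 + (1 - \hat p_0)]$, $E_A$ is determined as the root, in the range $\max(0, t - m) < E_A < \min(n, t)$, of the quadratic equation
$$E_A(m - t + E_A) = \psi(n - E_A)(t - E_A).$$
The score method is based on $S_\psi(\psi, \hat p_0) = (X - E_A)/\psi$; the approximate $1 - \alpha$ limits are the two values of $\psi$ satisfying
$$\left(|X - E_A| - c'\right)^2\left[\frac{1}{E_A} + \frac{1}{n - E_A} + \frac{1}{t - E_A} + \frac{1}{m - t + E_A}\right] = z_{\alpha/2}^2, \quad (1)$$
where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$ percentile of the standard normal distribution, and $c' = 1/2$ when a correction for continuity is needed, or $c' = 0$ otherwise. This equation is identical to that proposed by Cornfield (3), and algorithms to solve it iteratively are given by Gart (9) and Fleiss (10). Gart and Thomas (11,12) showed that Cornfield's method performs well with the continuity correction in the conditional sample space and without it in the unconditional sample space.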
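Cornfield's limits are easy to compute numerically. The following Python sketch solves the quadratic for $E_A$ by bisection inside its admissible range and then finds the two roots of Eq. (1) by bisection on $\ln\psi$. The function names `cornfield_mean` and `cornfield_interval` are hypothetical, bisection is our own choice rather than the iterative algorithms of Gart (9) or Fleiss (10), and we assume all four cells are positive and that both limits lie within a factor $e^{10}$ of the crude estimate.

```python
import math
from statistics import NormalDist


def cornfield_mean(psi, n, m, t):
    """E_A(psi): root of E(m - t + E) = psi (n - E)(t - E) in (max(0, t - m), min(n, t))."""
    lo, hi = max(0.0, t - m), float(min(n, t))
    # f(E) = E(m - t + E) - psi (n - E)(t - E) increases on the interval,
    # is negative at lo and non-negative at hi, so bisection finds the root.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * (m - t + mid) - psi * (n - mid) * (t - mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cornfield_interval(x, n, y, m, alpha=0.05, correction=False):
    """Approximate 1 - alpha limits for psi from Eq. (1); assumes all cells > 0."""
    t = x + y
    z = NormalDist().inv_cdf(1 - alpha / 2)
    c = 0.5 if correction else 0.0

    def g(log_psi, sign):
        e = cornfield_mean(math.exp(log_psi), n, m, t)
        v = 1.0 / (1 / e + 1 / (n - e) + 1 / (t - e) + 1 / (m - t + e))
        # sign = +1: lower limit solves (x - c) - E_A = z sqrt(V)
        # sign = -1: upper limit solves (x + c) - E_A = -z sqrt(V)
        return (x - sign * c - e) - sign * z * math.sqrt(v)

    def bisect(sign, a, b):
        # bisection on log(psi); g(a) and g(b) have opposite signs
        ga = g(a, sign)
        for _ in range(100):
            mid = 0.5 * (a + b)
            gm = g(mid, sign)
            if (gm > 0) == (ga > 0):
                a, ga = mid, gm
            else:
                b = mid
        return math.exp(0.5 * (a + b))

    log_hat = math.log(x * (m - y) / (y * (n - x)))  # log of the crude MLE
    return bisect(+1, log_hat - 10, log_hat), bisect(-1, log_hat, log_hat + 10)


# Ille-et-Vilaine crude data: 96 of 200 cases and 109 of 775 controls exposed
print(cornfield_interval(96, 200, 109, 775))
print(cornfield_interval(96, 200, 109, 775, correction=True))
```

Because $E_A(\hat\psi) = X$ at the crude MLE, the estimate always sits between the two limits, which gives a natural bracket for the root search.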
As an alternative to Cornfield's method based on unconditional scores, more accurate approximations to the mean and variance of $X$ conditional on $t$ may be used in Eq. (1). The conditional distribution of the data is the noncentral hypergeometric distribution with noncentrality parameter $\psi$. Harkness (13) showed an exact relation between the mean and variance, $E[X(m - t + X) \mid t] = \psi E[(n - X)(t - X) \mid t]$.
To illustrate these methods, we use crude data from the Ille-et-Vilaine study of esophageal cancer (14,15). Among 200 male cases diagnosed with esophageal cancer, 96 were exposed to high daily alcohol consumption, compared with 109 of 775 controls. The (unconditional) maximum likelihood estimate of the odds ratio is $\hat\psi = 5.640$. The approximate 95% confidence intervals $(\psi_L, \psi_U)$ are: Cornfield

Rate Ratio and Difference from Cohort Studies
In follow-up studies of dynamic populations, $X$ and $Y$ denote the numbers of persons contracting the disease over fixed person-time denominators $n$ (exposed) and $m$ (unexposed). Thus $X$ and $Y$ are modeled as a pair of independent Poisson observations with means $(n r_1, m r_0)$, where $r_1$ and $r_0$ are the instantaneous incidence rates of the exposed and the unexposed. The parameters of interest are the rate ratio $\omega = r_1/r_0$ and the rate difference $\varepsilon = r_1 - r_0$.
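For this crude Poisson model, score-type and usual intervals can be sketched in Python as follows. We assume that the score interval for $\omega$ [Eq. (5)] is the score (Wilson) interval for the binomial proportion $\pi = n\omega/(n\omega + m)$ of $X$ given $t$, mapped back to $\omega$; that the score interval for $\varepsilon$ [Eq. (8)] solves $(\hat\varepsilon - \varepsilon)^2 = z_{\alpha/2}^2(\tilde r_1/n + \tilde r_0/m)$ with $\tilde r_0$ the constrained MLE; and that the usual interval [Eq. (9)] is $\hat\varepsilon \pm z_{\alpha/2}(X/n^2 + Y/m^2)^{1/2}$. In a hand check with the fluoroscopy data described below ($X = 41$, $n = 28{,}010$, $Y = 15$, $m = 19{,}017$), these assumed forms agree with the reported (1.036, 3.325), (4.320, 129.0) and (7.493, 127.5) per 100,000 person-years. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def rate_ratio_score_ci(x, n, y, m, alpha=0.05):
    """Assumed Eq. (5): score (Wilson) limits for pi = n*w/(n*w + m), mapped to w."""
    t = x + y
    z = NormalDist().inv_cdf(1 - alpha / 2)
    center = (x + z * z / 2) / (t + z * z)
    half = z / (t + z * z) * math.sqrt(x * (t - x) / t + z * z / 4)

    def to_omega(p):
        return (m / n) * p / (1 - p)  # invert pi = n*w / (n*w + m)

    return to_omega(center - half), to_omega(center + half)


def _bisect(f, a, b, iters=200):
    """Root of f between a and b, assuming f(a) and f(b) differ in sign."""
    fa = f(a)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        fm = f(mid)
        if (fm > 0) == (fa > 0):
            a, fa = mid, fm
        else:
            b = mid
    return 0.5 * (a + b)


def rate_difference_cis(x, n, y, m, alpha=0.05):
    """Assumed Eq. (8) score limits and Eq. (9) usual limits for eps = r1 - r0."""
    N, t = n + m, x + y
    z = NormalDist().inv_cdf(1 - alpha / 2)
    eps_hat = x / n - y / m

    def score_gap(eps):
        # constrained MLE of r0 for fixed eps: larger root of
        # N r0^2 + (N eps - t) r0 - Y eps = 0
        b = N * eps - t
        r0 = (-b + math.sqrt(b * b + 4 * N * y * eps)) / (2 * N)
        r1 = r0 + eps
        return abs(eps_hat - eps) - z * math.sqrt(r1 / n + r0 / m)

    half = z * math.sqrt(x / n**2 + y / m**2)  # usual half-width, Eq. (9)
    limits = []
    for direction in (-1, 1):
        s = half
        while score_gap(eps_hat + direction * s) <= 0:
            s *= 2  # widen the bracket until the limit is enclosed
        limits.append(_bisect(score_gap, eps_hat + direction * s, eps_hat))
    return tuple(limits), (eps_hat - half, eps_hat + half)


x, n, y, m = 41, 28010, 15, 19017  # fluoroscopy data: cases and person-years
print(rate_ratio_score_ci(x, n, y, m))
score, wald = rate_difference_cis(x, n, y, m)
print([round(v * 1e5, 3) for v in score + wald])  # per 100,000 person-years
```

Taking the larger root for $\tilde r_0$ guarantees $\tilde r_1 = \tilde r_0 + \varepsilon > 0$ whenever $X > 0$, so the score statistic stays defined while the bracket is widened.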
In making inferences about the rate difference, it is necessary to employ the unconditional distribution of $(X, Y)$. Letting $r_1 = r_0 + \varepsilon$, we have the score $S_\varepsilon(\varepsilon, r_0) = (X - n r_1)/r_1$. For fixed $\varepsilon$, the MLE $\tilde r_0$ of the nuisance parameter is the larger root of $N r_0^2 + (N\varepsilon - t) r_0 - Y\varepsilon = 0$, where $N = n + m$ and $t = X + Y$, and $\tilde r_1 = \tilde r_0 + \varepsilon$. The score limits for $\varepsilon$ are the two solutions of
$$(\hat\varepsilon - \varepsilon)^2 = z_{\alpha/2}^2\left(\tilde r_1/n + \tilde r_0/m\right), \quad (8)$$
where $\hat\varepsilon = X/n - Y/m$. Sato (17) gave a Newton-Raphson procedure to solve Eq. (8) iteratively. Alternatively, the usual first-order approximation gives the interval
$$\hat\varepsilon \pm z_{\alpha/2}\left(X/n^2 + Y/m^2\right)^{1/2}. \quad (9)$$
To illustrate these confidence interval methods for the rate ratio and difference, we use follow-up data on breast cancer in women with tuberculosis who were repeatedly exposed to multiple X-ray fluoroscopies and in women with tuberculosis not so exposed (20). In the exposed group, 41 women developed breast cancer over 28,010 person-years, while 15 women developed breast cancer over 19,017 person-years in the unexposed group. The estimated rate ratio is $\hat\omega = 1.856$. The approximate 95% confidence interval based on the score method [Eq. (5)] is (1.036, 3.325), while that obtained by Eq. (6) with Eq. (7) is (1.006, 3.611). The lower limits are very close, but the upper limit from the score method is smaller than that from the approximate conditional method, because the normal approximation underlying the score method will be inadequate unless $t$ is large. When the rate difference is of interest, its estimate is $\hat\varepsilon = 67.50$ per 100,000 person-years. The approximate 95% interval based on the score method [Eq. (8)] is (4.320, 129.0) per 100,000 person-years, while that given by the usual method [Eq. (9)] is (7.493, 127.5) per 100,000 person-years. Sato (17) showed that the score method
The first formula for an asymptotic variance of $\hat\psi_{MH}$ was given by Hauck (21) on the basis of a large-strata limiting model, in which the number of strata $K$ remains fixed but each $N_k$ tends to infinity. Breslow (7) proposed a conditional variance, based on the noncentral hypergeometric distribution, using a sparse-data limiting model in which $K$ tends to infinity but only a finite number of different configurations of $(n_k, m_k)$ occur.
A well-known example of this limiting model is the (1, M) matched case-control design. Under both limiting models $\hat\psi_{MH}$ is consistent for $\psi$ and asymptotically normal. Although $\hat\psi_{MH}$ is not asymptotically fully efficient (22) unless $\psi$ is unity, it maintains high efficiency relative to the efficient estimators of $\psi$ in both the large-strata and the sparse-data cases (23,24).
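The ln-method for $\hat\psi_{MH}$ can be sketched in Python as below. It uses the standard Mantel-Haenszel quantities $R_k = X_k(m_k - Y_k)/N_k$ and $S_k = Y_k(n_k - X_k)/N_k$ with $N_k = n_k + m_k$, and the Robins-Breslow-Greenland variance $V_{RBG} = \sum P_kR_k/[2(\sum R_k)^2] + \sum(P_kS_k + Q_kR_k)/[2\sum R_k\sum S_k] + \sum Q_kS_k/[2(\sum S_k)^2]$ of $\ln\hat\psi_{MH}$, with $P_k = (X_k + m_k - Y_k)/N_k$ and $Q_k = (Y_k + n_k - X_k)/N_k$. The function name and the per-stratum tuple layout `(x, n, y, m)` are our own assumptions.

```python
import math
from statistics import NormalDist


def mh_odds_ratio_ln_ci(strata, alpha=0.05):
    """psi_MH, V_RBG and the ln-method interval psi_MH * exp(+/- z sqrt(V_RBG)).

    strata: iterable of (x, n, y, m) = (exposed cases, cases,
    exposed controls, controls) for each stratum k.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    s_r = s_s = s_pr = s_ps_qr = s_qs = 0.0
    for x, n, y, m in strata:
        big_n = n + m
        r = x * (m - y) / big_n
        s = y * (n - x) / big_n
        p = (x + m - y) / big_n
        q = (y + n - x) / big_n
        s_r += r
        s_s += s
        s_pr += p * r
        s_ps_qr += p * s + q * r
        s_qs += q * s
    psi_mh = s_r / s_s
    v_rbg = (s_pr / (2 * s_r**2) + s_ps_qr / (2 * s_r * s_s)
             + s_qs / (2 * s_s**2))
    half = z * math.sqrt(v_rbg)
    return psi_mh, v_rbg, (psi_mh * math.exp(-half), psi_mh * math.exp(half))


# crude esophageal data treated as a single stratum
print(mh_odds_ratio_ln_ci([(96, 200, 109, 775)]))
```

Two quick checks follow from the formula: with a single stratum $V_{RBG}$ reduces to Woolf's $1/a + 1/b + 1/c + 1/d$, and interchanging the case and control labels inverts $\hat\psi_{MH}$ while leaving $V_{RBG}$ unchanged.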
Because the distribution of $\hat\psi_{MH}$ is skewed, the natural log transformation is usually used to construct confidence intervals for $\psi$ (25). Robins et al. obtained, from a first-order Taylor series expansion, the variance estimator
$$V_{RBG} = \frac{\sum_{k=1}^K P_k R_k}{2\left(\sum_{k=1}^K R_k\right)^2} + \frac{\sum_{k=1}^K (P_k S_k + Q_k R_k)}{2\sum_{k=1}^K R_k \sum_{k=1}^K S_k} + \frac{\sum_{k=1}^K Q_k S_k}{2\left(\sum_{k=1}^K S_k\right)^2} \quad (11)$$
for $\ln\hat\psi_{MH}$, where $R_k = X_k(m_k - Y_k)/N_k$, $S_k = Y_k(n_k - X_k)/N_k$, $P_k = (X_k + m_k - Y_k)/N_k$, and $Q_k = (Y_k + n_k - X_k)/N_k$. Flanders (28) proposed a consistent estimator similar to $V_{RBG}$. Unfortunately, his estimator is not invariant under interchange of the labels, i.e., of the cases and the controls, or of the exposed and the unexposed. Under such an interchange only the sign of $\ln\hat\psi_{MH}$ changes, so the true variance cannot change; $V_{RBG}$ clearly has this invariance property. An invariant version of Flanders' variance is the arithmetic average of his original estimator and the estimator recomputed after interchange. The second term on the right-hand side of the equation for this invariant version is equal to zero when the $N_k$ are constant across strata, and tends to zero as the $N_k$ increase, as in the large-strata case. As a result, Flanders' estimator may be essentially the same as $V_{RBG}$. The resulting $1 - \alpha$ Taylor series confidence interval for $\psi$ (ln-method) is
$$\hat\psi_{MH}\exp\left(\pm z_{\alpha/2}\sqrt{V_{RBG}}\right). \quad (12)$$
Sato (29) proposed a Fieller-like interval [Eq. (14)], in which $c = (1 + \psi)/4$ when the continuity correction is needed, or $c = 0$ otherwise. The correction value $(1 + \psi)/4$ is chosen to preserve invariance under interchange of the labels. In the matched-pairs case, Eq. (14) with the continuity correction reduces to the equation based on the normal approximation to the conditional distribution given by Breslow and Day (15). The Fieller-like interval [Eq. (14)] is closely related to the Cochran-Mantel-Haenszel (4,30) test of null association (29). Consider testing $\psi = 1$. For this null value simplifications occur; for example, $R_k - S_k = X_k - n_k t_k/N_k = X_k - E(X_k \mid t_k, \psi = 1)$, where $t_k = X_k + Y_k$.
For a numerical comparison of the ln- and Fieller-like methods, we use two data sets that exemplify the sparse-data and the large-strata cases. Table 1 gives the (1, 4)-matched case-control data from the Los Angeles study of the effect of exogenous estrogens on the risk of endometrial cancer (15). The Mantel-Haenszel odds ratio is $\hat\psi_{MH} = 8.462$. The approximate 95% intervals $(\psi_L, \psi_U)$ are (3.412, 20.99) for the ln-method [Eq. (12)], and (3.535, 20.25) and (3.294, 23.53) for the Fieller-like interval [Eq. (14)] without and with the continuity correction, respectively. These methods give very close intervals. Table 2 gives the stratified data of the Ille-et-Vilaine study (15) referred to previously. The Mantel-Haenszel odds ratio is $\hat\psi_{MH} = 5.158$. The approximate 95% interval based on the ln-method is (3.562, 7.468), while those obtained by the Fieller-like method without and with the continuity correction are (3.580, 7.431) and (3.498, 7.656). Again these intervals are reasonably close.

Stratified Cohort Analysis
Consider now a series of $K$ pairs of independent Poisson observations $(X_k, Y_k)$ with fixed person-time denominators $(n_k, m_k)$ and means $(n_k r_{1k}, m_k r_{0k})$, where $r_{1k}$ and $r_{0k}$ are the instantaneous incidence rates of the exposed and the unexposed. First suppose that the rate ratio $\omega = r_{1k}/r_{0k}$ remains constant across strata. Let $\hat r_{1k} = X_k/n_k$, $\hat r_{0k} = Y_k/m_k$, $N_k = n_k + m_k$, and $t_k = X_k + Y_k$. As in the odds ratio case, the Mantel-Haenszel estimating function for $\omega$ is $W(\omega) = \sum_k (R_k - \omega S_k)$, where $R_k = m_k X_k/N_k$ and $S_k = n_k Y_k/N_k$. Hence the Mantel-Haenszel rate ratio (5) is the solution of $W(\omega) = 0$, namely $\hat\omega_{MH} = \sum_k R_k/\sum_k S_k$. Under both the sparse-data and the large-strata models, $\hat\omega_{MH}$ is also consistent but inefficient; it nevertheless maintains relatively high efficiency (8,33). Because the distribution of $\hat\omega_{MH}$ is skewed, the log scale may be used to set confidence intervals for $\omega$. The asymptotic variance formula for $\ln\hat\omega_{MH}$ is similar to that for $\ln\hat\psi_{MH}$. Noting that $\mathrm{Var}(R_k - \omega S_k \mid t_k) = \omega n_k m_k t_k/N_k^2$, we obtain the corresponding variance estimator for $\ln\hat\omega_{MH}$. The Fieller-like method also extends in a straightforward manner to the common rate ratio (29). Using arguments that parallel those given in the odds ratio case, the approximate interval $(\omega_L, \omega_U)$ is obtained as the two solutions of the quadratic equation (18).
When the rate difference $\varepsilon = r_{1k} - r_{0k}$ is instead assumed constant across strata, the $1 - \alpha$ Taylor series interval for $\varepsilon$ is $\hat\varepsilon_{MH} \pm z_{\alpha/2}\{\hat V(\hat\varepsilon_{MH})\}^{1/2}$, (19) where $\hat\varepsilon_{MH}$ is the Mantel-Haenszel rate difference (6) and $\hat V(\hat\varepsilon_{MH})$ its estimated variance. As an alternative to the Taylor series interval, we propose a Fieller-like method similar to Eqs. (14) and (18): the Fieller-like interval for $\varepsilon$ is given by the two solutions of the quadratic equation (20), where $c' = 1/2$ with the continuity correction, or zero otherwise.
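A minimal Python sketch of the Taylor series interval of Eq. (19) follows. Because the variance estimator used in the paper is not reproduced here, we assume the Mantel-Haenszel rate difference $\hat\varepsilon_{MH} = \sum_k w_k(\hat r_{1k} - \hat r_{0k})/\sum_k w_k$ with weights $w_k = n_k m_k/N_k$, and the simple variance $\sum_k w_k^2(X_k/n_k^2 + Y_k/m_k^2)/(\sum_k w_k)^2$; this choice may differ from the paper's estimator, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def mh_rate_difference_taylor_ci(strata, alpha=0.05):
    """Taylor-series interval eps_MH +/- z sqrt(V); V is an assumed estimator.

    strata: iterable of (x, n, y, m) = (exposed cases, exposed person-time,
    unexposed cases, unexposed person-time).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    sum_w = num = var_num = 0.0
    for x, n, y, m in strata:
        w = n * m / (n + m)                        # Mantel-Haenszel weight
        num += w * (x / n - y / m)
        var_num += w * w * (x / n**2 + y / m**2)   # unbiased for Var(numerator) under Poisson
        sum_w += w
    eps_mh = num / sum_w
    half = z * math.sqrt(var_num) / sum_w
    return eps_mh, (eps_mh - half, eps_mh + half)


# single stratum: coincides with the usual crude interval, Eq. (9)
eps, (lo, hi) = mh_rate_difference_taylor_ci([(41, 28010, 15, 19017)])
print(round(eps * 1e5, 2), round(lo * 1e5, 3), round(hi * 1e5, 3))
```

With one stratum the weight cancels, so the sketch reduces to the crude estimate and the usual interval; this is a convenient sanity check rather than evidence about sparse-data performance.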
It is noteworthy that the Fieller-like methods of Eqs. (18) and (20) are closely related to tests of null exposure-disease association. When testing the null hypothesis of no association ($\omega = 1$ and $\varepsilon = 0$), both $T(\omega)$ and $T(\varepsilon)$ reduce to a statistic that is identical to the efficient score test of null association given by Shore et al. (34).
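The Fieller-like interval for the common rate ratio can be sketched in Python under an explicit assumption: that Eq. (18) without continuity correction is $(\sum_k R_k - \omega\sum_k S_k)^2 = z_{\alpha/2}^2\,\omega\sum_k n_k m_k t_k/N_k^2$, built from the conditional variance of $R_k - \omega S_k$. Setting $\omega = 1$ then gives $[\sum_k(X_k - n_k t_k/N_k)]^2/\sum_k n_k m_k t_k/N_k^2$, a null score statistic of the kind described above, and with a single stratum the sketch reproduces the crude score interval (1.036, 3.325) of the fluoroscopy example. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def mh_rate_ratio_fieller(strata, alpha=0.05):
    """omega_MH, assumed Eq. (18) limits without correction, and the omega = 1 statistic.

    strata: iterable of (x, n, y, m) = (exposed cases, exposed person-time,
    unexposed cases, unexposed person-time).
    """
    z2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    sum_r = sum_s = sum_v = 0.0
    for x, n, y, m in strata:
        big_n, t = n + m, x + y
        sum_r += m * x / big_n             # R_k
        sum_s += n * y / big_n             # S_k
        sum_v += n * m * t / big_n**2      # Var(R_k - w S_k | t_k) / w
    # (sum_r - w sum_s)^2 = z^2 w sum_v is a quadratic in w:
    # sum_s^2 w^2 - (2 sum_r sum_s + z^2 sum_v) w + sum_r^2 = 0
    b = 2 * sum_r * sum_s + z2 * sum_v
    root = math.sqrt(b * b - 4 * (sum_r * sum_s) ** 2)
    limits = ((b - root) / (2 * sum_s**2), (b + root) / (2 * sum_s**2))
    null_stat = (sum_r - sum_s) ** 2 / sum_v  # value at w = 1
    return sum_r / sum_s, limits, null_stat


print(mh_rate_ratio_fieller([(41, 28010, 15, 19017)]))
```

The discriminant equals $z^2 C(4AB + z^2 C)$ with $A = \sum R_k$, $B = \sum S_k$, $C = \sum n_k m_k t_k/N_k^2$, so two real, positive limits always exist when all sums are positive.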
To illustrate the confidence interval methods for the rate ratio, we consider the Montana study of arsenic exposure and respiratory cancer (35). Table 3 gives observed deaths and person-years of the Montana study stratified by age, class and calendar period. We find a summary rate ratio estimate of $\hat\omega_{MH} = 3.138$. The conditional and unconditional variance estimates of $\ln\hat\omega_{MH}$, calculated from Eq. (15) and Eq. (16), are $V_B = 0.0501$ and $V_{GR} = 0.0520$, respectively. The approximate 95% confidence intervals based on the ln-method [Eq. (17)] are (2.023, 4.867) with the conditional variance and (2.007, 4.907) with the unconditional one, while those obtained by solving Eq. (18) are (2.014, 4.889) without the correction and (1.948, 5.102) with it. As in the odds ratio case, these intervals are quite close.
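The ln-method of Eq. (17) needs only a variance estimate for $\ln\hat\omega_{MH}$. As a sketch, we assume the Greenland-Robins form $\sum_k n_k m_k t_k/N_k^2\,/\,(\sum_k R_k\sum_k S_k)$; whether it coincides with Eq. (15) or Eq. (16) is not confirmed here, so the Montana values $V_B$ and $V_{GR}$ should not be expected from it without checking. The function name and the two-stratum data below are hypothetical, for illustration only.

```python
import math
from statistics import NormalDist


def mh_rate_ratio_ln_ci(strata, alpha=0.05):
    """omega_MH with an assumed Greenland-Robins variance of ln(omega_MH).

    strata: iterable of (x, n, y, m) = (exposed cases, exposed person-time,
    unexposed cases, unexposed person-time).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    sum_r = sum_s = sum_v = 0.0
    for x, n, y, m in strata:
        big_n, t = n + m, x + y
        sum_r += m * x / big_n             # R_k
        sum_s += n * y / big_n             # S_k
        sum_v += n * m * t / big_n**2
    omega_mh = sum_r / sum_s
    var_log = sum_v / (sum_r * sum_s)
    half = z * math.sqrt(var_log)
    return omega_mh, var_log, (omega_mh * math.exp(-half), omega_mh * math.exp(half))


# hypothetical two-stratum data, for illustration only
print(mh_rate_ratio_ln_ci([(12, 5000, 6, 6000), (20, 4000, 9, 7000)]))
```

With a single stratum this variance reduces to $1/X + 1/Y$, the familiar variance of a crude log rate ratio.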
Although the rate ratio and the rate difference cannot both remain constant in one data set, in order to illustrate the
Table 3. Deaths from respiratory cancer among Montana smelter workers.