Corrected likelihood for proportional hazards measurement error model and its application.

Consider the case where the exact values of covariates in the proportional hazards model may not be observed but instead, only surrogates for them involving measurement errors are available. The maximum likelihood estimate based on the partial likelihood with the true covariate replaced by the observed surrogate is even asymptotically biased and may cause seriously misleading results in covariance analysis based on the partial likelihood. These facts are illustrated by Monte Carlo simulation. A correction to partial likelihood proposed by the first author is studied to gain insight into its merits and limitations in practical applications. The results indicate that when the "effective magnitude of the measurement error" as defined in this article is small, which is indeed the case for most applications, the method will be useful. Some other correction methods for the measurement error in censored survival models are also reviewed and discussed.


Introduction
The proportional hazards model in censored survival analysis, generally referred to as the Cox-regression model, has been used extensively in medical applications. The model allows quantitative assessment of the effects of covariates on survival times without assuming any specific form for the distribution of survival times. It is commonly used in cancer clinical trials to test the equality of treatments adjusting for observed imbalance of prognostic factors between treatment groups, and in epidemiology to measure the degree of association, adjusting for confounding factors, between the level of a certain risk factor and failure times due to several distinct causes under the so-called "competing risk" situation.
The ordinary partial likelihood proposed by Cox (1) assumes no measurement error in covariates; the maximum likelihood estimates based on the partial likelihood will yield asymptotically unbiased estimates of the regression coefficients of the covariates when the values of the covariates are exact. This paper was presented at the 4th Japan-US Biostatistics Conference on the Study of Human Cancer held 9-1 1 November 1992 in Tokyo, Japan.
Much of this work was done while the first author was at the National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina. We thank David Hoel for his support and appreciate the important suggestions from statisticians at the institute. We are grateful to the Scientific Data Center of A-Bomb Disasters, Nagasaki University, for providing the survival data. The work was also supported by the  This article illustrates certain effects of measurement error on the estimation and inference of parameters in the proportional hazards model. We will deal first with the measurement error in a risk variable to observe the so-called "attenuation" as described by Fuller (2) with normal linear regression models. Then we will study the effect of measurement errors in unbalanced confounding variables on the estimation of a treatment effect as illustrated by Carroll (3) with covariance analysis in normal linear regression models.
We will review the correction proposed by Nakamura (4) to the partial likelihood (1), adjusting for measurement error, and study the performance of the correction with Monte Carlo simulations. Finally, the correction will be applied to the atomic bomb survivor follow-up data with inaccurate dose estimation

Proportional Hazards Measurement Error Model
The proportional hazards model assumes the hazard of an individual with a covariate z to be X(tlz) = XO(t)exp(pTz) where t denotes elapsed time, ko(t) an unspecified baseline hazard at t, and I an unknown coefficient vector for z. Let D denote the set of persons who died, ti the time of the ith death of D, whose covariate will be denoted by Z(i), and Ri the risk set at ti. Let Z stand for the set of z's and Y denote the set of the observed values of the dependent variables D, t, and R. Define Si(f,Z) = I exp(TZj}) jERi Then the partial likelihood is written by Fep(TYZ(i)) ieDD S'i(f3Z) [ and Pz which maximizes the partial likelihood, if attainable, is the maximum partial likelihood estimate.
When the z's are not directly observed but, instead, estimates x's are obtained such that xk=zk+Sk with 6s being independent of z's and normally distributed with mean 0 and a known variance matrix A. Let X denote a set of x's. If x's are used in Equation 1 instead of z's, then we have L(01 Y,X), which is termed a naive likelihood and x which maximizes it, will be called a naive estimate. Nakamura (4) proposes an approximately corrected partial likelihood adjusting for the measurement error [exp P X(j) L IYX)= iDlS3X exPGA ){ [2] where Gi = 2T AJ(I -Ci(, X)) [3] with C.(P,X) = Si(2p,X)ISi(p,X)2 [4] Environmental Health Perspectives The P* that maximizes L*(PI YX) with J*(p*, Y,X) = _a2logL*(J3l Y,X)IJ2la= 1, corrected observed information, being positive definite is a corrected estimate. When Cj is taken to be simply TAp/2, the resulting L*(PI Y,X) will be called a first-order corrected partial likelihood and the corresponding * a first-order corrected estimate. The first-order correction does not depend on the observed and true values of the covariates. Nakamura (4) proposes an estimate for the asymptotic variance of [* based on MI estimate theory (5).

Effect of Measurement Error in Risk Variable
Fuller (2) and others dealing with the covariate measurement error in the normal linear regression models describe the phenomenon called "attenuation" of the least squares estimate for regression coefficients; that is, the estimate ignoring the measurement error tends to be small in absolute value. Prentice (6) observes the attenuation caused by the measurement error in the proportional hazards model through simulation. Hereafter, the estimates obtained ignoring the measurement error will be referred to as "naive" estimates.
In the simulation of this section, Z is a set of 100, 200, or 300 uniform random numbers in (0,121/2) so that SD(Z) = 1.
We fixed l =1 and measurement errors 8's independently follow the normal distribution with mean 0 and the standard deviation a=0.5, 0.6, 0.7, 0.8, 0.9, or 1.0. Y was generated following the proportional hazards model based on the method as described by Akazawa et al. (7), where 20% of the cases were assumed to survive to the time of the last death. A merit of the method is that it does not require any distributional assumption for ko(t). Then £'s were generated to obtain X, a set of observed covariate values, and the naive and corrected estimates were obtained from the Yand X For each combination of (0,n), 400 independent (YX)'s were generated to obtain 400 independent estimates. The results are shown in Table 1 for n= 100, Table 2 for n= 200 and Table 3 for n = 300. Some of the iterations for the corrected estimates terminated before completion due to the encounter of negative values of I*(O, Y,X) caused by large trial values of P appearing in the course of Newton-Raphson algorithm; in such cases the pair (1 §X) was deleted and the number of repetitions became less than 400. The biases of the naive estimates are substantial and independent of the number of cases. In the normal linear regression model, the expected value of the naive estimate equals the true value multiplied by Var(Z)/{Var(Z)+ 12', which indicates the degree of attenuation and is termed "reliability ratio" by Fuller (2). The degree of attenuation observed in the tables is close to Var(Z)/{Var(Z)+(a+0. 1)2}, indicating the bias is more serious in the proportional hazards model than the normal linear regression model; Tosteson (personal communication) observed the degree of attenuation in probit regression models was also close to Var(Z)/{Var(Z)+(aY+0.1)21 . Nakamura (4) argues that l1al should be small for most applications, and therefore we performed additional simulations with smaller values of a to gain further insight into the problem; the results are shown in Table 4. For smaller f, the degree of attenuation is nearly equal to the reliability ratio in the normal linear regression model.
Effects of the corrections are remarkable; however, the performance depends on the sample size: the larger the sample size, the smaller the bias. The number of failures to find corrected estimates, denoted by fi increases as ay increases. The simulations for n=100 with a.0.9 and n=200 with A referee wonders if the corrected estimates exist in theoiy, but cannot always be calculated in practice with the algorithm used, so that the corrected estimates obtained may not be representative. In the probit regression model, corrected estimates do not exist in theory when measurement errors are large (8). Thus, further studies are required to have clear interpretation of the results for n and a with largef values.
The biases of the first-order estimates are slightly larger than those of the corrected estimates, and falso tends to be larger; but the difference is negligible for small a.
According to the considerations by Nakamura (4), lo3Yl is invariant under linear transformations in z's and the perfor-

Measurement Error in Unbalanced Confounding Variable
The proportional hazards model dealt with in this section is k(tlA,z) = Xo(t)exp(PAA + I3z) [5] where A equals 0 or 1, depending on whether an individual belongs to a control group or a treatment group, and z is a confounding variable. Equation 5 is a model commonly applied to estimate the treatment effect P. adjusting for the confounding effect with z. We will study by simulations the effect of measurement error in z on the estimation of P. in terms of the partial likelihood method. In the following simulations, x= z+£ for £ being normally distributed 2 with mean 0 and variance a .
We will fix P= -0.3 and Pz = 1. The z's for the control are sampled from a triangular distribution with support (0,121/2) that 01/2 = has the maximum density 2/12 at z= 0 and the minimum density 0 at z= 121/2; on the other hand, the z's for a treatment group are sampled from a triangular distribution with support (0,12 /2) that has the J/2 1/2 maximum density 2/12 at z= 12 and the minimum density 0 at z= 0; the sample size is 150 each. Figure 1 illustrates the triangular distributions. SD(Z)=1 for the combined samples. Although the support of the two distributions are the same, z's are considerably unbalanced between the two groups. The a Figure 1. Illustration for the difference in the distribution of a confounding variable z between the controls and the treated cases. generation of Y, X, and the estimates are performed in the same manner as in the previous section. The number of repetitions for each ay is 100. The results are presented in Table 5.
It is striking that the naive estimate for PA indicates almost no effect of the treatment when ( = 0.5 or 0.6, and indicates even reverse effect of the treatment when the measurement error is larger. The bias in P_ is more serious than the corresponding result with the same a in the previous section. On the other hand, the corrected estimates appear approximately unbiased for the treatment effect PA when a<0.8 and are rather attenuated when the measurement error is larger.

Application
Pierce et al. (10)(11)(12), Gilbert (13), Prentice (6), and some recent Radiation Effects Research Foundation (RERF) technical reports deal with measurement error in estimated doses for atomic bomb survivors followed by RERF since 1945. The cohort data have been the main source of information concerning the biologic effects of radiation exposure on human survival. Determining the shape of dose-response curves for the data is especially important when estimating low-dose risks, because the procedure involves extrapolation to low doses from results with those exposed to high to intermediate doses (12). Since estimated doses may be subject to substantial "True values of PA and ,z are -0.3 and 1, respectively.
Measurement errors exist only in the confounding variable whose standard deviation is denoted by a. Estimates are the means of 100 independent estimates. error, which may lead to an incorrect model or biased estimates, the effect of the measurement error on the dose-response analysis should not be ignored. Studying the nature and the magnitude of the error in dose estimation based on the T65D system, Jablon (14) concluded that, for an estimate x of a true dose z, log(x) follows a normal distribution with mean log(z) and a standard deviation of about 0.3. Independent of the RERF studies, Nagasaki University has conducted a follow-up study of atomic bomb survivors registered in Nagasaki City Hall since 1970 (15). The corrected likelihood method is applied to the follow-up data between 1970 and 1987 for the survivors whose estimated doses were positive and less than or equal to 600, as has been done by Okajima, Mine, and Nakamura (15). The covariates are dose = log(x); age = age as of April 1, 1970; and sex= male or female. The underlying time variate is the followup time. The variance matrix for the measurement error is 3x3 with 0.16=0.42 at the (1,1) component and 0 otherwise. Table 6 shows the results for major cancer sites. The averages of dose are between 4.0 and 4.5; SDs are 1.0 and 1.1. Emigration and deaths due to other causes are treated as censored following the standard treatment under the competing risks situation  (16). The corrected estimates are roughly 20% larger than the naive estimates. Table 7 breaks the result for all cancers into those for age and sex categories, cited from Nakamura (4). The ratio of each estimate to its estimated standard deviation is nearly equivalent for the naive and corrected estimates; this phenomenon is verified analytically by Stefanski and Carroll (17) for generalized linear models. The first-order correction presents practically the same results, which was expected because the effective magnitude of error lIryl appears quite small. Some of the recent RERF reports applied the correction method proposed by Pierce et al. (10)(11)(12) to the atomic bomb survivors follow-up data. Their method is aimed primarily at grouped survival data and uses the conditional distribution of the true dose given an estimated dose. Basically, it assumes that certain properties of the ordinary generalized linear models with exact covariate values will approximately hold when the covariates are subject to measurement errors. The method, however, may have practical value for the analysis of the atomic bomb survivors follow-up data because the method seems robust and the impact of the interpretation of the data on human society is possibly serious. The efficiency of the method is yet unclear, and calculation of the conditional moments of the true dose requires tedious computation and additional assumptions. As compared to the methods of Pierce et al. (10)(11)(12) and Prentice (6), the method described in this article is computationally simple and does not need any assumptions in addition to the measurement error distribution. For grouped survival data, the Poisson regression model is usually applied as usually performed by RERF. The exact corrected likelihood for the Poisson model with log-linear risk function has been obtained (18). For more complex risk functions, approximation would be possible (19) if the exact correction does not exist or is difficult to find.