The effect of measurement error on the dose-response curve.

In epidemiological studies for environmental risk assessment, doses are often observed with error. However, such measurement errors have received little attention in data analysis. This paper studies the effect of measurement errors on the observed dose-response curve. Under the assumptions of a monotone likelihood ratio for the errors and a monotone increasing dose-response curve, it is verified that the slope of the observed dose-response curve is likely to be gentler than the true one. The observed variance of the responses is not as homogeneous as would be expected under models without errors. The estimation of parameters in a hockey-stick type dose-response curve with a threshold is considered along the lines of the maximum likelihood method for a functional relationship model. Numerical examples adapted to the data from a 1986 study of the effect of air pollution conducted in Japan are also presented. The proposed model is shown to fit the data in the example cited in this paper.


Introduction
In order to assess the risk of a chemical substance in the environment we have to estimate the dose-response relationship between the dose level of the substance and the prevalence rate of a set of symptoms that might be caused by it. The estimation is often performed based on the data in epidemiological studies.
In epidemiological studies, the raw data are often highly dispersed, as shown in Figure 1, which is a part of the data published by the Japan Environment Agency and interpreted by Yoshimura (2). When a significant correlation is found in such data, they are usually arranged in a reduced form (Fig. 2) by taking averages within classes defined on the dose. In such reduced figures we can easily confirm a monotone dose-response relationship.
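The reduction of raw data to class averages can be sketched as follows. This is only a minimal illustration, not the procedure actually used for Figure 2; the function name `class_means`, the class boundaries, and the toy values are all assumptions made for the example.

```python
import numpy as np

def class_means(dose, response, edges):
    """Average dose and response within dose classes [edges[k-1], edges[k])."""
    dose = np.asarray(dose, dtype=float)
    response = np.asarray(response, dtype=float)
    # np.digitize returns k such that edges[k-1] <= dose < edges[k]
    idx = np.digitize(dose, edges)
    out = []
    for k in range(1, len(edges)):
        mask = idx == k
        if mask.any():  # skip empty classes
            out.append((dose[mask].mean(), response[mask].mean(), int(mask.sum())))
    return out

# Hypothetical toy data: observed dose and prevalence rate for six areas
dose = [5.0, 8.0, 12.0, 18.0, 22.0, 27.0]
rate = [0.02, 0.04, 0.03, 0.05, 0.06, 0.08]
summary = class_means(dose, rate, edges=[0.0, 10.0, 20.0, 30.0])
for mean_dose, mean_rate, count in summary:
    print(f"mean dose {mean_dose:.1f}, mean rate {mean_rate:.3f}, areas {count}")
```

Each tuple holds the mean observed dose, the mean response, and the number of areas in the class. Averaging in this way hides the dispersion of the raw data.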
However, one point should be noted about this line of data processing. If the true dose-response relationship is similar to the one observed in Figure 2, the dispersion of the raw data must be similar to that shown in Figure 3. In Figure 3 the middle solid line represents a dose-response curve, and the upper and lower lines mark 1.5 times the standard deviation under the Poisson assumption stated later. The observed dispersion, shown as open circles in Figure 3, is inhomogeneous in contrast with the expected dispersion shown by the curves. This paper gives a reasonable explanation of this inconsistency between the data and the fitted model by introducing a measurement error on doses, and studies the misleading effect of the measurement errors.
*School of Engineering, Nagoya University, Nagoya 464-01, Japan.
independently distributed random variables. In real situations the distributions may be discrete, but, for convenience of expression, we regard them as having probability densities $f(x; \xi_i, \sigma)$ and $g(y; \xi_i, \beta)$, respectively. This does not affect the following arguments.
When the areas are chosen purposively, $\xi_i$ is regarded as an unknown parameter representing the true dose in the area $A_i$. On the contrary, when the areas are chosen randomly from a population of areas, the $\xi_i$ values are regarded as independently and identically distributed random variables with a prior probability density $h(\xi; \theta)$. Assume that the parameters $\beta$, $\sigma$, and $\theta$ are independent of areas. We drop the subscript $i$ in the following unless the area needs to be specified. When the true dose $\xi$ is given, the expectation of $Y$,
$\eta = E(Y \mid \xi) = \int y \, g(y; \xi, \beta) \, dy$ (1)
is regarded as the true mean response in the area. As a function of $\xi$ and $\beta$, $\eta = r(\xi; \beta)$ represents the true dose-response curve. In general, one of the principal purposes of such a survey is to know this true dose-response curve.
In the first case we assume the $\xi$ values are fixed unknown constants, which implies a functional relationship model. When we ignore measurement errors on the dose, we usually take the average of the observed values of $Y$ for the observed values of $X$ as an estimate of the true dose-response curve. An example is shown in Figure 2. The dose-response curve thus obtained is regarded as an observation of a weighted average of the $E(Y_i)$, denoted $r_0(x; \beta, \sigma)$ and called the apparent dose-response curve. In many cases, conclusions derived from epidemiological studies are based on the apparent dose-response curve. However, if measurement errors exist, the apparent dose-response curve is distorted from the true one and can cause a misunderstanding about the effect of the substance in question, as shown in the following material.
Consider the following assumptions on the distributions and parameters:
Assumption 1: $\xi_1 \le \xi_2 \le \dots \le \xi_a$ and at least one inequality is strict.
the parameter $\xi$ has a monotone likelihood ratio in $x$, that is, for any $x_1$, $x_2$, $\xi^{1}$, and $\xi^{2}$ such that $x_1 <$
Under these assumptions the following theorem holds:
Theorem 1.
If Assumptions 1 through 4 hold, then $r_0(x; \beta, \sigma)$ is a strictly monotone increasing function of $x$ for $b_1 < x < b_2$, and constants $c_1$ and $c_2$ exist such that $\xi_1 < c_1 \le c_2 < \xi_a$ and
$r_0(x; \beta, \sigma) > r(x; \beta)$ for $b_1 < x < c_1$, (5)
$r_0(x; \beta, \sigma) < r(x; \beta)$ for $c_2 < x < b_2$.

This theorem implies that the apparent response tends to be greater than the true one at low doses and less than the true one at high doses. In most actual situations $c_1 = c_2$, as shown in the numerical examples in the next section, and then the average slope of the apparent dose-response curve is gentler than the true one in the central region of the observed dose. Hence, it is defective as an estimate of the true dose-response curve and tends to lead to an incorrect conclusion from the viewpoint of risk assessment.
When both $X$ and $Y$ are normal variables and the dose-response curve is linear, this fact is well known in the context of the functional relationship model; however, in the situations we face in epidemiological studies, normality and linearity are usually violated. I think this is the reason why the bias contained in the apparent dose-response curve has been neither noticed nor examined.
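The flattening of the apparent curve can be illustrated with a small simulation. This is a sketch under assumed values, not a reproduction of the paper's numerical example: the linear true curve, the parameter values, the number of areas, and the use of an ordinary least-squares line as a summary of the apparent curve are all choices made for illustration. The error model (ln X normal around ln ξ, Poisson counts) follows the setting of the Numerical Example section, with σ interpreted as a standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative values (not taken from the paper)
beta0, beta1 = 0.02, 0.001           # true curve r(xi) = beta0 + beta1 * xi
sigma = 0.4                          # sd of the error in ln(X) (assumption)
n_persons = 1000                     # persons sampled per area
xi = np.linspace(10.0, 60.0, 2000)   # fixed true doses, one per area

eta = beta0 + beta1 * xi                                   # true mean response
x_obs = xi * np.exp(sigma * rng.standard_normal(xi.size))  # ln X ~ N(ln xi, sigma)
y_obs = rng.poisson(n_persons * eta) / n_persons           # observed prevalence rate

slope_true_dose = np.polyfit(xi, y_obs, 1)[0]     # regress on the true dose
slope_observed = np.polyfit(x_obs, y_obs, 1)[0]   # regress on the error-prone dose
print(f"true {beta1:.5f}, fit on xi {slope_true_dose:.5f}, fit on X {slope_observed:.5f}")
```

The slope fitted against the true dose stays close to `beta1`, while the slope fitted against the observed dose is clearly smaller, which is the direction of bias stated by the theorem.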
In the above theorem, strict monotonicity of the dose-response curve is assumed. If there is a threshold below which the response does not increase, the monotonicity is not strict. Assumption 3 must then be replaced by Assumption 3' below:
Assumption 3': $r(\xi; \beta)$ is constant for $\xi \le d$ and is a strictly monotone increasing function of $\xi$ for $d < \xi$, where $d$ is a given constant such that $\xi_1 < d < \xi_a$.
Even when Assumption 3 is replaced by Assumption 3' the above proof of the theorem is valid, so that the following corollary holds: Corollary. If Assumptions 1, 2, 3', and 4 hold, then the conclusion of the theorem holds. This corollary is important in actual situations, because even when there is a threshold it cannot be observed in data if there are errors in observed dose variables.
If the true variance is constant over the whole range of the dose, the apparent variance exceeds it by the second term of Eq. (12). The excess variation in the apparent variance is, in general, most pronounced in the central part of the distribution of the $\xi$ values, as shown in the numerical example in the next section.
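One standard way to see where such an extra term comes from is the law of total variance. The following is a sketch that assumes the true dose can be treated as random given the observed dose and that $Y$ is independent of $X$ once $\xi$ is given; it is not claimed to be the paper's Eq. (12).

```latex
\operatorname{Var}(Y \mid X = x)
  = \mathrm{E}\{\operatorname{Var}(Y \mid \xi) \mid X = x\}
  + \operatorname{Var}\{r(\xi; \beta) \mid X = x\}
```

The second term is nonnegative. It vanishes when the dose is measured without error, and it is small where $r(\xi; \beta)$ is nearly flat over the true doses compatible with $x$.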

Structural Relationship Model
In the second model we assume the $\xi$ values are independently and identically distributed random variables with the prior probability density $h(\xi; \theta)$, which implies a structural relationship model. In this case, we modify the apparent dose-response curve as follows: $r_0(x; \beta, \sigma, \theta) = E(Y \mid X = x)$.
independent of $\xi$ and $b_1 < a_1 < a_2 < b_2$.
The proof is entirely the same as that of Theorem 1 and is hence omitted. Likewise, when Assumption 3 is replaced with Assumption 3', the conclusion of the theorem holds.
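The conditional expectation $E(Y \mid X = x)$ of the structural model can be evaluated numerically. The sketch below is only an illustration under assumptions not taken from the paper: a normal prior on $\ln \xi$ with hypothetical mean `mu` and standard deviation `tau`, a normal error on $\ln X$ with standard deviation `sigma`, a hockey-stick true curve, and arbitrary parameter values.

```python
import numpy as np

def hockey_stick(xi, b0, b1, b2):
    """True curve: b0 + b1 * max(xi - b2, 0)."""
    return b0 + b1 * np.maximum(xi - b2, 0.0)

def apparent_curve(x, b0, b1, b2, sigma, mu, tau, m=4001):
    """E(Y | X = x) by quadrature over u = ln(xi) on a uniform grid."""
    u = np.linspace(mu - 8 * tau, mu + 8 * tau, m)
    prior = np.exp(-0.5 * ((u - mu) / tau) ** 2)         # prior on ln(xi), up to a constant
    err = np.exp(-0.5 * ((np.log(x) - u) / sigma) ** 2)  # error density in xi, up to a constant
    w = prior * err                                      # unnormalized weight of each xi given x
    return float(np.sum(w * hockey_stick(np.exp(u), b0, b1, b2)) / np.sum(w))

# Assumed illustrative parameters
params = dict(b0=0.02, b1=0.001, b2=20.0, sigma=0.3, mu=np.log(30.0), tau=0.5)
for x in (10.0, 30.0, 100.0):
    true_val = float(hockey_stick(x, 0.02, 0.001, 20.0))
    print(x, true_val, apparent_curve(x, **params))
```

Normalizing constants and the grid spacing cancel in the ratio, so unnormalized densities suffice. With these values the apparent curve lies above the true curve at the low dose and below it at the high dose, and the threshold is no longer visible as a kink.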

Numerical Example
In this section, we show some numerical results adapted to the example shown in Figure 1. As for the dose, let $X$ be such that $\ln(X)$ is distributed as $N(\ln(\xi), \sigma)$. As for the response, let $Y$ be such that $nY$ is distributed as Poisson($n\eta$), where $n$ is the number of persons sampled in the area and $\eta$ is the true prevalence rate in the area in the case of the example. Though $Y$ is discrete in this model, this does not violate the validity of the argument in the preceding section.
Let the true dose-response curve be as follows:
$\eta = r(\xi; \beta) = \beta_0 + \beta_1 (\xi - \beta_2)_+$, (19)
where $(\xi - \beta_2)_+ = \max\{0, \xi - \beta_2\}$. This model is the hockey-stick regression. In Eq. (19), $\beta_0$ is, in a sense, the spontaneous prevalence rate, $\beta_1$ is the risk factor, and $\beta_2$ is the threshold value. Under the functional relationship model, the likelihood function $L$ can be written in terms of $\eta_i = r(\xi_i; \beta) = \beta_0 + \beta_1 (\xi_i - \beta_2)_+$. If $\sigma$ is known, the maximum likelihood estimates can be obtained numerically through an iterative method with suitable initial values of the parameters. The estimates adapted to the data shown in Figure 1 are obtained for some values of $\sigma$, as shown in Table 1.
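A likelihood of this kind can be sketched in code. The following is an assumed form, not the paper's displayed likelihood: it multiplies, over areas, a lognormal density for the observed dose (with σ read as the standard deviation of ln X) and a Poisson probability for the number of cases nY. The function and argument names are hypothetical.

```python
import numpy as np
from math import lgamma

def hockey_stick(xi, b0, b1, b2):
    """eta = b0 + b1 * (xi - b2)_+"""
    return b0 + b1 * np.maximum(xi - b2, 0.0)

def neg_log_lik(xi, beta, sigma, x, counts, n):
    """Negative log-likelihood with lognormal dose error and Poisson counts.

    xi: true doses (unknown constants in the functional model),
    beta: (b0, b1, b2), x: observed doses,
    counts: n * Y (number of cases), n: persons sampled per area.
    """
    xi, x, counts, n = (np.asarray(a, dtype=float) for a in (xi, x, counts, n))
    eta = hockey_stick(xi, *beta)
    # log density of X given xi, assuming ln X ~ N(ln xi, sigma) with sd sigma
    log_f = (-np.log(x) - np.log(sigma) - 0.5 * np.log(2 * np.pi)
             - 0.5 * ((np.log(x) - np.log(xi)) / sigma) ** 2)
    # log probability of a Poisson count with mean n * eta
    log_fact = np.array([lgamma(k + 1.0) for k in counts])
    log_g = counts * np.log(n * eta) - n * eta - log_fact
    return float(-(log_f.sum() + log_g.sum()))

# Hypothetical single-area check with sigma held fixed
value = neg_log_lik([20.0], (0.02, 0.001, 20.0), 1.0, [20.0], [0.0], [100.0])
print(value)
```

With σ fixed, this function could be passed to a general-purpose numerical optimizer over the $\xi_i$ and $\beta$ jointly, starting from suitable initial values. The kink at $\beta_2$ makes the surface non-smooth, so the result may depend on the starting point.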
As far as the data shown in Figure 1 is concerned, the measurement error models with the parameters set in Case 1 and Case 2 are well fitted, compared with the

Conclusion and Discussion
It has been said that it is difficult to fit a simple dose-response curve to data such as those shown in Figure 1. However, this paper shows that a simple dose-response curve can be fitted to such data by assuming the existence of a measurement error on the dose. It is to be noted that if we ignore the measurement error in spite of its actual existence, we are likely to estimate the true dose-response curve with a bias.
Further investigations should seek effective methods for estimating parameters that fit real data well under the measurement error model. However, it is anticipated that knowledge of the dispersion or the standard deviations is necessary to estimate the parameters, because even when normality of the errors and linearity of the curve are assumed, such knowledge is needed to obtain a satisfactory result, as noted in Fuller (4) and Singh and Kanji (5).