Assessing the Lognormal Distribution Assumption For the Crude Odds Ratio: Implications For Point and Interval Estimation

The assumption that the sampling distribution of the crude odds ratio (ORcrude) is a lognormal distribution with parameters mu and sigma leads to the incorrect conclusion that the expectation of the log of ORcrude is equal to the parameter mu. In fact, exp(mu) is the median of the lognormal distribution, not the mean. If a different parameter is obtained as the expected value of the lognormal distribution, then this quantity can be used to obtain a new estimate of the true odds ratio (ORtrue). Here, simulations are conducted based on a simple randomized clinical trial study design. The simulations demonstrate that the new estimate of ORtrue (based on the expectation of the lognormal distribution function) yields interval estimates that are more statistically valid than those of the standard method. These interval estimates are obtained by both a parametric bootstrap method and a calculated percentile method. The statistical conclusion validity of the estimated confidence intervals is judged by the intended coverage probability (i.e., the probability that the confidence interval contains ORtrue). Additionally, when the intervention is protective, an interval-based hypothesis test built on the improved confidence interval estimate has higher power than the standard hypothesis test to reject the null hypothesis that ORtrue is equal to one when the alternative hypothesis is true (i.e., ORtrue is not equal to one).


Background:
Relative effect measures have been utilized to interpret and report the magnitude of benefits or risks to human health of diverse exposures since the 1950s. These measures are estimated from data from both observational and experimental studies. Additionally, confidence intervals are typically calculated utilizing these point estimates along with estimates of their standard errors. Hypothesis tests address the null hypothesis that the true value of the effect is equal to one (i.e., there is no effect of exposure on outcome). These tests are conducted utilizing the same point estimates and the same estimates of their standard errors as are used to derive the confidence intervals. Mathematically, the typical assumption for deriving interval estimates for the true value of the effect, as well as hypothesis tests, is that the sampling distributions of the estimates are lognormal probability distributions. This report will concentrate on the theory of the standard method of point and interval estimation for the odds ratio, although the theory may be applied to all commonly used relative effect measure estimates (i.e., the incidence density ratio and the cumulative risk ratio). Monte Carlo simulations of a simple clinical experiment are used to assess the statistical conclusion validity of the confidence intervals for the standard and three alternative methods, and of the associated tests of the null hypothesis that the true odds ratio is equal to one for each of the four methods. The statistical conclusion validity of the interval estimates is assessed by the simulated empirical coverage probability (for a given $\alpha$). The validity of the hypothesis tests is assessed by examining the simulated empirical power for rejecting the null hypothesis that the true odds ratio is equal to one when the alternative hypothesis is true (i.e., the true odds ratio is not equal to one).
Comparisons are made between the standard method of estimation and the three alternative estimation approaches which may yield less biased point estimates along with better interval estimates and hypothesis tests.
The odds ratio is a commonly employed relative effect measure in epidemiologic research. Its widespread use is mainly a result of its application to multiple logistic regression modelling as well as its use in the analysis of data from case control studies. The crude odds ratio ($\widehat{OR}$, or unadjusted sample OR) may be calculated from the standard two by two table of disease versus exposure by the familiar "cross product" formula [1]. This crude measure often serves as an estimate for the unknown true odds ratio ($OR_{true}$).
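The cross product calculation can be sketched in a few lines of Python. This is a minimal illustration, not code from the study: the function name and the cell layout (a = exposed and diseased, b = exposed and not diseased, c = unexposed and diseased, d = unexposed and not diseased) are assumptions made for the example, and the counts are invented.

```python
def crude_odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table.

    Assumed cell layout (illustrative, not taken from the source):
        a = exposed & diseased,    b = exposed & not diseased
        c = unexposed & diseased,  d = unexposed & not diseased
    """
    if b == 0 or c == 0:
        # The ratio (a*d)/(b*c) is undefined when the denominator is zero.
        raise ValueError("cross-product ratio undefined when b or c is zero")
    return (a * d) / (b * c)


# Invented counts: 10 of 100 exposed and 30 of 100 unexposed develop disease.
print(crude_odds_ratio(10, 90, 30, 70))
```

A value below one, as in this invented table, corresponds to what the text calls a "protective" exposure.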
Common practice in both hypothesis testing and interval estimation for the odds ratio is to log transform the crude odds ratio $\widehat{OR}$ and then obtain an estimate for the standard deviation of $\ln(\widehat{OR})$ utilizing the assumption that $\ln(\widehat{OR})$ is normally distributed. This assumption is mathematically equivalent to assuming that $\widehat{OR}$ is lognormally distributed. The assumption that $\ln(\widehat{OR})$ is normally distributed is usually justified by an asymptotic result known as the Delta Method [2]. This theorem asserts that if a real valued function of a random vector $X$, $f(X)$, is asymptotically normally distributed with mean $u$ and standard deviation $\sigma_n$ (where $\lim_{n \to \infty} \sigma_n = 0$), then for any real valued function $g$ which has a continuous first derivative at $u$, $g(f(X))$ is asymptotically normally distributed with mean $g(u)$ and standard deviation $|g'(u)|\,\sigma_n$. For the odds ratio the Delta Method yields an estimate for $\sigma$ which is derived from a Taylor Series expansion of $g(f(X)) = \ln(\widehat{OR})$. In practical applications, only the constant and linear terms of the expansion are evaluated. This is how the formula for the common estimate for the standard deviation of $\ln(\widehat{OR})$ is derived.
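As a worked illustration of the first-order expansion, the following is the standard textbook derivation of the variance of the log odds ratio; it is not reproduced from the source. It assumes two independent binomial samples, and the symbols $p_1$, $p_0$ (disease probabilities among exposed and unexposed) and $n_1$, $n_0$ (group sizes) are introduced here only for the example, with $a = n_1\hat p_1$, $b = n_1(1-\hat p_1)$, $c = n_0\hat p_0$ and $d = n_0(1-\hat p_0)$.

```latex
\begin{align*}
\ln(\widehat{OR}) &= \operatorname{logit}(\hat p_1) - \operatorname{logit}(\hat p_0),
  \qquad \operatorname{logit}(p) = \ln\frac{p}{1-p}, \\
\frac{d}{dp}\operatorname{logit}(p) &= \frac{1}{p(1-p)}, \\
\operatorname{Var}\bigl(\operatorname{logit}(\hat p_i)\bigr)
  &\approx \left[\frac{1}{p_i(1-p_i)}\right]^2 \frac{p_i(1-p_i)}{n_i}
   = \frac{1}{n_i p_i} + \frac{1}{n_i(1-p_i)}, \qquad i = 0, 1, \\
\widehat{\operatorname{Var}}\bigl(\ln\widehat{OR}\bigr)
  &\approx \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}.
\end{align*}
```

The last line follows by adding the two independent variance terms and replacing each $p_i$ with its sample estimate.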
The standard method of calculation of the confidence interval for the true odds ratio equates $\ln(\widehat{OR})$ with an estimate of the parameter $\mu$ from a normal distribution (the assumed sampling distribution of $\ln(\widehat{OR})$). A problem arises because the above assumptions imply that the expectation of $\ln(\widehat{OR})$ is not equal to the parameter $\mu$ (i.e., $E(\ln(\widehat{OR})) \neq \mu$). This is a result of the relationship between the parameters when the parametrization for the lognormal distribution arises from an exponentiated normally distributed random variable with the same parameters $(\mu, \sigma)$ [3]. The above implies that point and interval estimates for $OR_{true}$ will be biased when the standard method of estimation is used [4]. Barendregt attempts to correct for this bias in the point estimate by using the value of the expectation of the crude odds ratio when it is assumed that the crude odds ratio follows a lognormal distribution.
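Under the assumption that the crude odds ratio $\widehat{OR}$ follows a lognormal distribution with parameters $(\mu, \sigma)$, the formula for the expectation of a lognormally distributed random variable can be utilized: $E(\widehat{OR}) = \exp(\mu + \sigma^2/2)$. Thus, we may obtain a better estimate for $OR_{true}$ by: $\widehat{OR}^* = \exp(\ln(\widehat{OR}) - \hat{\sigma}^2/2)$. The estimate of the parameter $\sigma$ may be obtained by the Delta method: $\hat{\sigma} = \sqrt{1/a + 1/b + 1/c + 1/d}$ (where a, b, c and d are the cell values from the 2x2 table).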
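A minimal Python sketch of this corrected point estimate follows. It assumes the correction takes the form exp(ln(OR) - sigma^2/2), with sigma estimated by the usual Delta method formula sqrt(1/a + 1/b + 1/c + 1/d). The function names and the counts are illustrative and do not come from the study.

```python
import math


def log_or_sd(a, b, c, d):
    """Delta-method estimate of the standard deviation of ln(OR)."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def corrected_odds_ratio(a, b, c, d):
    """Corrected estimate exp(ln(OR) - sigma_hat^2 / 2).

    Assumption: this is the form of the correction discussed in the text;
    the cell layout a, b, c, d follows the usual 2x2 table convention.
    """
    or_crude = (a * d) / (b * c)
    sigma_hat = log_or_sd(a, b, c, d)
    return math.exp(math.log(or_crude) - sigma_hat ** 2 / 2)


# Invented counts for illustration only.
a, b, c, d = 10, 90, 30, 70
print("crude OR:    ", (a * d) / (b * c))
print("sigma_hat:   ", log_or_sd(a, b, c, d))
print("corrected OR:", corrected_odds_ratio(a, b, c, d))
```

Because the factor exp(-sigma_hat^2 / 2) is always below one, the corrected estimate is always smaller than the crude odds ratio, and the gap shrinks as the cell counts grow.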
If we define $\widehat{OR}^* = \exp(\ln(\widehat{OR}) - \hat{\sigma}^2/2)$, then as a means to find the sampling distribution for $\widehat{OR}^*$ consider the following: Let: $\widehat{OR} \sim LN(\mu, \sigma)$. This implies that $\ln(\widehat{OR}) \sim N(\mu, \sigma)$. Next, define: Assuming that $\widehat{OR}$ is an unbiased estimate of $OR_{true}$, (1) implies that: Where: Equation (2) follows from the fact that for a fixed finite sample size, samples of independently and identically normally distributed random variables will yield sampling distributions of the sample means that are stochastically independent from the sample variances.
(2) implies that: It should be noted that the above method of derivation for the point estimate of $OR_{true}$, $\widehat{OR}^*$, yields a lognormal sampling distribution for this statistic, namely $\widehat{OR}^* \sim LN(\mu^*, \sigma)$, for which the parameter $\sigma$ has the same value as the corresponding parameter from the $N(\mu, \sigma)$ distribution, which is the assumed sampling distribution of $\ln(\widehat{OR})$. This is in contrast with Barendregt. From (1), (2) and (3), and from the fact that for a random variable distributed $N(\mu, \sigma)$ the standard estimates of $\mu$ and $\sigma$ are stochastically independent, the above recalculations appear to be unnecessary.
The methods used to obtain interval estimates based on these two point estimates of $OR_{true}$ are as follows: The standard method utilizes the formula: $CI_{\alpha} = \exp(\ln(\widehat{OR}) \pm z_{1-\alpha/2}\,\hat{\sigma})$. This confidence interval is intended to have coverage probability equal to $1 - \alpha$. In other words: $P(OR_{true} \in CI_{\alpha}) = 1 - \alpha$. Another confidence interval may be obtained from (1) by: The intended coverage probability for this CI is given by: The mathematical justification for $CI_{\alpha}^{*}$ comes from the equations (1), (2) and (3) and the Percentile Lemma [5]. The percentile lemma implies that a confidence interval which is obtained using the parametric bootstrap method is mathematically equivalent to $CI_{\alpha}^{*}$. This method entails generating #PBS bootstrap estimates of $\widehat{OR}^*$ as realizations of a lognormal stochastic process with parameters $\mu^*$ and $\sigma$.
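The three interval constructions can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's code: the calculated percentile interval is taken to be exp(ln(OR) - sigma^2/2 +/- z * sigma), which is inferred from the lognormal sampling distribution stated for the corrected estimate, and the function names, seed, number of bootstrap draws and cell counts are all hypothetical.

```python
import math
import random
from statistics import NormalDist


def lognormal_interval(log_center, sigma_hat, alpha=0.05):
    """Interval exp(log_center +/- z * sigma_hat) from normal quantiles."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (math.exp(log_center - z * sigma_hat),
            math.exp(log_center + z * sigma_hat))


def bootstrap_interval(log_center, sigma_hat, n_boot=20000, alpha=0.05, seed=1):
    """Parametric bootstrap: empirical percentiles of lognormal draws."""
    rng = random.Random(seed)
    draws = sorted(rng.lognormvariate(log_center, sigma_hat)
                   for _ in range(n_boot))
    lo_idx = round(n_boot * alpha / 2)
    return draws[lo_idx], draws[n_boot - lo_idx - 1]


# Invented 2x2 counts for illustration only.
a, b, c, d = 10, 90, 30, 70
sigma_hat = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
log_or = math.log((a * d) / (b * c))
log_or_star = log_or - sigma_hat ** 2 / 2  # assumed corrected log estimate

standard = lognormal_interval(log_or, sigma_hat)
percentile = lognormal_interval(log_or_star, sigma_hat)
boot = bootstrap_interval(log_or_star, sigma_hat)

print("standard:  ", standard)
print("percentile:", percentile)
print("bootstrap: ", boot)
```

With enough bootstrap draws the bootstrap bounds settle close to the calculated percentile bounds, which is the equivalence the percentile lemma is cited for.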
Additionally, the point estimate and associated confidence interval recommended by Barendregt will be included for comparison. The statistical conclusion validity of the four associated interval estimates will be assessed from the simulated empirical coverage probability [6] and the confidence interval lower and upper bounds. The proportion of times that either the lower bound of the interval is greater than $OR_{true}$, or the upper bound of the interval is less than $OR_{true}$, yields one minus the simulated empirical coverage probability of the CI. The proportion of times, out of the total number of MC simulations, that either the lower bound of the interval is greater than 1 or the upper bound of the interval is less than 1 when $OR_{true}$ is not equal to 1 will be the simulated empirical power of the two tailed hypothesis test to reject the null hypothesis that $OR_{true}$ is equal to 1 (i.e., no treatment effect), when in fact $OR_{true} \neq 1$ [6]. The theoretical power of this hypothesis test will be calculated utilizing the standard formula [7]: This theoretical power will be compared to the empirical power obtained from the MC simulations.
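The coverage and power summaries described above reduce to simple proportions over the simulated intervals. The sketch below assumes each simulation yields a (lower, upper) pair; the function name, dictionary keys and toy intervals are hypothetical and are not taken from the study.

```python
def summarize_intervals(intervals, or_true):
    """Empirical coverage and interval-based power from simulated CIs.

    intervals: list of (lower, upper) bounds, one pair per simulation.
    or_true:   the true odds ratio used to generate the data.
    """
    n = len(intervals)
    # CI lies entirely below the true value.
    upper_below = sum(1 for lo, hi in intervals if hi < or_true) / n
    # CI lies entirely above the true value.
    lower_above = sum(1 for lo, hi in intervals if lo > or_true) / n
    # CI excludes 1; this is power only when or_true != 1.
    rejects = sum(1 for lo, hi in intervals if lo > 1 or hi < 1) / n
    return {
        "coverage": 1 - upper_below - lower_above,
        "upper_below_true": upper_below,
        "lower_above_true": lower_above,
        "power": rejects,
    }


# Toy intervals, invented for illustration.
toy = [(0.1, 0.5), (0.2, 0.9), (0.3, 1.2), (0.05, 0.2)]
print(summarize_intervals(toy, or_true=0.279))
```

The two miss proportions are reported separately because an interval can fail on either side of the true value, and the split shows whether the misses are balanced.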

Monte Carlo Simulation Methods:
Simulations were conducted using R [8]. The stochastic process that was utilized to generate #MC samples of $\widehat{OR}$ was based on a prospective study design. The exposure status and disease outcome status were generated for a total of n=200 subjects for each simulation. Two sets of simulations were conducted, one for $OR_{true}$ = .279 (exposure "protective") and one for $OR_{true}$ = 2.365 (exposure "harmful"). For both examples, the exposure status E=0 (unexposed) or E=1 (exposed) was randomly assigned by a Bernoulli process with the probability of exposure set to P(E) = .5. Next, the probability of disease, conditional on exposure status, yielded the event probabilities for two more Bernoulli random variables whose outcomes resulted in the subject's classification in one of the

Monte Carlo Simulation Results:
The simulation results are presented in Table 1 and Table 2. The leftmost column (column 1) gives the point and interval estimation method used, along with the value of the point estimate of $OR_{true}$. Column 2 gives one minus the coverage probability of each 95% CI as well as the mean lower and upper bounds and the CI widths. Columns 3 and 4 give the probability that the upper bound of the CI is less than $OR_{true}$ and the probability that the lower bound of the CI is greater than $OR_{true}$, respectively (these two numbers add up to one minus the coverage probability). The rightmost column (column 5) gives the simulated empirical power of the interval based hypothesis test to reject the null hypothesis that $OR_{true}$ = 1 based on the rejection criterion defined by the 95% CI not including 1.
In both Table 1 (exposure protective) and Table 2 (exposure harmful) the mean point estimates of $OR_{true}$ for the parametric bootstrap and calculated percentile methods are almost identical to $OR_{true}$. These two estimation methods also exhibit the highest level of statistical conclusion validity for their respective confidence intervals, as indicated by one minus the coverage probability being close to .05. Overall, as demonstrated by the MC simulations, these two methods exhibit the least biased point estimates, the best coverage probability (as defined by its closeness to 1-.05) and the narrowest confidence intervals of the four estimation approaches. In Table 1 (exposure protective), the two percentile based methods yielded simulation derived empirical power to reject the null hypothesis that $OR_{true}$ = 1 that was higher than the theoretical power; in Table 2 (exposure harmful), their empirical power was lower than the theoretical power. The simulated empirical power to reject the null hypothesis is closest to the theoretical value for the standard method of interval estimation in both cases (exposure protective or exposure harmful). The method recommended by Barendregt gives the most biased point estimates and poorest CI coverage probability in both tables. This method yielded power for the interval based hypothesis test that was midway between the standard method and the two percentile based methods for Table 1 but lower than the other three estimation methods for Table 2.

Discussion:
By utilizing Monte Carlo simulations, it is possible to evaluate the properties of estimators from data that are generated based on a specific stochastic process. This process can be simulated based on the statistical model from which point and interval estimates as well as hypothesis tests are formulated. Therefore, classical statistical inference, which includes both estimation and hypothesis testing, now has a means of checking if these inferential procedures are in fact valid. This type of validity has been referred to as statistical conclusion validity.
In the case of the odds ratio, there currently exist numerous alternatives for both interval estimation and hypothesis testing. This report focused on the standard method (which is still widely used in both epidemiology and clinical trials) and three alternatives. The simple chi-square test for independence of E and D does not yield an estimate of the magnitude of the "effect" of exposure on outcome. On the other hand, the confidence interval for the odds ratio (and other relative effect measures) gives the investigator a sense of how "protective" or "harmful" an exposure is. The confidence interval around the point estimate also yields a hypothesis test at the alpha level of the null hypothesis that the odds ratio equals 1 (i.e., the exposure has no effect on outcome).
In this report, four different methods of point and interval estimation were examined via Monte Carlo simulations. Coverage probability was treated as the primary measure of the statistical conclusion validity of each of the confidence interval estimation methods. By this standard, the two percentile-based methods (parametric bootstrap and calculated percentile method) yielded the highest level of validity. These intervals were also the narrowest, thus indicating a smaller amount of variability around the point estimate.

Conclusions:
The concordance between the theoretical power of the confidence interval based test of the null hypothesis that the true OR was equal to one and the simulation derived power was highest for the standard CI. However, for the example given of a true odds ratio that was less than one, the two percentile methods yielded the highest simulated empirical power. This result should be examined more closely, possibly with additional simulations, since, if valid, the higher statistical power for a protective exposure with a relatively small sample size might have particular value for researchers involved in clinical trials.