Questionable utility of the relative risk in clinical research: A call for change to practice

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Introduction
In clinical trials, the relative risk or risk ratio (RR) is a mainstay for reporting the effect magnitude of an intervention 1. The RR is the ratio of the probability of an outcome in an intervention group to the probability of the outcome in a control group, and thus expresses the relative increase or decrease in the likelihood of an event under an intervention. Despite some concerns [2][3][4], the RR has been widely used because it is considered a measure with 'portability' across varying outcome prevalence, especially when the outcome is rare 3. The RR is probably the oldest and most used of the measures of association that have been developed 5 to express in quantitative terms the relationship between an exposure (e.g. a risk factor or intervention) and a binary outcome, but its first use in clinical research is more recent and is attributed to Cornfield [6][7][8].
The RR has also been approximated by the odds ratio (OR), an approximation that was described much earlier in clinical research 9,10. Using the OR to estimate the RR has been thought appropriate for studies of rare outcomes; however, the RR and OR are not numerically equivalent when the outcome is common 11, and in that situation the RR may not even have a monotone relationship with the OR 12. It has also been thought that the OR overestimates the RR and may therefore inappropriately affect clinical decision-making or policy development, leading to unintentional errors in the analysis of treatments 13.
By and large, the RR remains more popular today than the OR, especially when reporting clinical trials and meta-analyses of clinical trial data. This is, in part, because of the notion that researchers should preferably choose more interpretable measures of association based on a ratio or difference in risks to minimize confusion and misrepresentation of research 2,[14][15][16][17]. However, what has been ignored is the fact that the RR is asymmetric 18, and this has led to some uncertainty over its constancy across baseline risks 2,3 and over the interpretation and applicability of its results 19. Today, the debate continues over the merits of the OR versus the RR and their interpretation 12,20. In this paper, we will show that the RR is not a measure of the magnitude of the intervention-outcome association alone, because it has a stronger relationship with prevalence and therefore is not generalisable beyond the baseline risk of the population in which it is computed. More importantly, we will also demonstrate that the OR accurately reflects the increase or decrease in odds attributed to an exposure/intervention over the baseline odds in a control group; it thus accurately estimates the effect magnitude, and there is no overestimation of effect magnitude by the OR. We should point out that we use outcome prevalence and cumulative incidence (in the whole study) synonymously, as the outcome prevalence equals the cumulative incidence proportion when we define (as we must) the outcome prevalence at the end of the follow-up period.

Derivation from first principles
Bayes described the relationship between the posterior probability of an outcome conditional on an exposure, the unconditional prior probability and an evidence factor. This can be restated in terms of the posterior odds of an outcome ($D$) conditional on an exposure ($E$), based on the population-level prior odds and an evidence factor (likelihood ratio) 21,22. The latter can be given by the following expression:

$$\frac{P(D \mid E)}{P(\bar{D} \mid E)} = \frac{P(D)}{P(\bar{D})} \times \frac{P(E \mid D)}{P(E \mid \bar{D})}$$

where $\bar{E}$ denotes non-exposure and $\bar{D}$ denotes the absence of the outcome.
The evidence factor or likelihood ratio is a ratio of two exposure probabilities: the outcome group exposure probability ($\varphi_1 = P(E \mid D)$) and the no-outcome group exposure probability ($\varphi_0 = P(E \mid \bar{D})$) respectively. The ratio $\varphi_1/\varphi_0$ is called the exposure likelihood ratio but also equals the exposure-conditional odds ratio ($OR_E$); it represents the ratio of the exposure-conditional odds of outcome to the unconditional odds of outcome in the general population and is given by

$$OR_E = \frac{\varphi_1}{\varphi_0} = \frac{P(D \mid E)/P(\bar{D} \mid E)}{P(D)/P(\bar{D})}$$

Similarly, the no-exposure likelihood ratio represents the conditional odds ratio ($OR_{\bar{E}}$) in the non-exposed population. This ratio is thus that of the non-exposure conditional odds of an outcome to the unconditional odds of the same outcome in the general population and is given by

$$OR_{\bar{E}} = \frac{1-\varphi_1}{1-\varphi_0} = \frac{P(D \mid \bar{E})/P(\bar{D} \mid \bar{E})}{P(D)/P(\bar{D})}$$

These two likelihood-based ORs are different from that commonly computed in a case-control study, where the ratio of odds is that of the retrospective outcome-conditional odds of exposure to the retrospective no-outcome-conditional odds of exposure, and this would be exactly the same (mathematically) as the ratio of the retrospective exposure-conditional odds of outcome to the retrospective no-exposure-conditional odds of outcome. This classical odds ratio ($OR_C$) is therefore given by:

$$OR_C = \frac{\varphi_1/(1-\varphi_1)}{\varphi_0/(1-\varphi_0)} = \frac{P(D \mid E)/P(\bar{D} \mid E)}{P(D \mid \bar{E})/P(\bar{D} \mid \bar{E})}$$

and is equivalent to the diagnostic OR in diagnostic studies 23. [...] outcome odds (exposure-conditional vs unconditional) and so they only differ in the denominator odds.
The $OR_C$ is therefore the ratio of the two likelihoods (or odds ratios) and represents the fold increase in odds from the no-exposure state to the exposure state. This is easy to see from the following expression:

$$OR_C = \frac{OR_E}{OR_{\bar{E}}} = \frac{\varphi_1/\varphi_0}{(1-\varphi_1)/(1-\varphi_0)}$$

In prospective cohort studies we can compute prospective probabilities of outcome (known as risks) in the exposure group ($r_1$) and in the non-exposure group ($r_0$). The ratio of these risks is the RR popularly used in epidemiology today. However, the retrospective likelihood ratio and the prospective risk ratio are not equal, and Cornfield 6 has shown that the retrospective proportions can be used to compute risks if the prevalence of disease ($\delta$) in the general population is taken into consideration as follows:

$$r_1 = \frac{OR_E\,\delta}{1 - \delta + OR_E\,\delta} \qquad [5]$$

$$r_0 = \frac{OR_{\bar{E}}\,\delta}{1 - \delta + OR_{\bar{E}}\,\delta} \qquad [6]$$

It is quite easy to compute prospective conditional outcome risks at any value of the population-level prevalence using the $OR_E$ or the $OR_{\bar{E}}$ as depicted above, and it follows that the ratio of expressions [5] and [6] is the RR given by

$$RR = \frac{r_1}{r_0} = OR_C \times \frac{1 - \delta + OR_{\bar{E}}\,\delta}{1 - \delta + OR_E\,\delta} \qquad [7]$$

It is clear therefore from expression [7] that the RR is dependent on the prevalence of an outcome; it is therefore not 'portable' across prevalence levels and only partially reflects the exposure-outcome association. In addition, the RR only approximates the $OR_C$ when the outcome is rare ($\delta \to 0$) or when there is no association ($OR_C = 1$). In all other cases the numerical value of the $OR_C$ will always differ from the RR. This can be easily demonstrated by computing the RR and OR at varying prevalence levels. As shown in Figure 1, when the OR is held constant and the prevalence of outcome increases, the RR shifts progressively towards the null. If we accept the OR as an effect magnitude measure that quantifies the fold increase in odds of an outcome from the non-exposure state to the exposure state, then as depicted in Figure 1, the RR does not reflect the exposure-outcome association if outcome prevalence is ignored.
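As a minimal numerical sketch of expressions [5] to [7] (not the authors' code), the Python snippet below holds the two likelihood-ratio ORs fixed and varies the outcome prevalence. The exposure probabilities 0.77 and 0.58 are borrowed from the smoking example quoted later in this section; the function names and the prevalence grid are illustrative assumptions.

```python
# Sketch of expressions [5]-[7]: risks and RR as functions of prevalence.
# Function names and the prevalence grid are illustrative assumptions.

def likelihood_ratio_ors(phi1, phi0):
    """Return (OR_E, OR_notE, OR_C) from the exposure probabilities
    phi1 = P(E | D) and phi0 = P(E | not D)."""
    or_e = phi1 / phi0
    or_not_e = (1 - phi1) / (1 - phi0)
    or_c = or_e / or_not_e
    return or_e, or_not_e, or_c


def risk_from_prevalence(odds_ratio, delta):
    """Posterior risk for prior prevalence delta and a likelihood-ratio OR
    (expression [5] with OR_E, expression [6] with OR_notE)."""
    return odds_ratio * delta / (1 - delta + odds_ratio * delta)


def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


or_e, or_not_e, or_c = likelihood_ratio_ors(phi1=0.77, phi0=0.58)

rows = []
for delta in (0.1, 0.3, 0.5, 0.7, 0.9):
    r1 = risk_from_prevalence(or_e, delta)
    r0 = risk_from_prevalence(or_not_e, delta)
    rr = r1 / r0                      # expression [7]
    or_back = odds(r1) / odds(r0)     # OR recomputed from the two risks
    rows.append((delta, r1, r0, rr, or_back))
    print(f"delta={delta:.1f}  r1={r1:.3f}  r0={r0:.3f}  RR={rr:.3f}  OR={or_back:.3f}")
```

Under these assumptions the recomputed OR stays at about 2.42 at every prevalence, while the RR falls from about 2.24 at a prevalence of 0.1 to about 1.11 at a prevalence of 0.9.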
It is quite easy to compute the prospective risk in those exposed at any value of the no-exposure risk ($r_0$), since we would simply need to replace $OR_E$ with $OR_C$ and $\delta$ with $r_0$ in expression [5] so that

$$r_1 = \frac{OR_C\,r_0}{1 - r_0 + OR_C\,r_0} \qquad [8]$$

and dividing expression [8] by $r_0$ again gives the risk ratio as follows:

$$RR = \frac{OR_C}{1 - r_0 + OR_C\,r_0} \qquad [9]$$

This is exactly the expression suggested by Zhang and Yu 24 in 1998, though the use of this expression to generate confidence limits for the RR based on the limits of the OR by substitution has been criticized on the grounds that the proposed confidence interval for the RR would be too narrow, presumably because of its failure to account for variability in the baseline risk 25. This may, however, be a reflection of the RR dependence on prevalence 26, and it is no longer an issue given that this derivation advocates using the OR and its limits to determine posterior probabilities for a fixed baseline risk of interest, from which clinically useful limits for the absolute effect sizes 27,28 are derived. These limits are necessarily a function only of the estimated variability in the OR.
Subtracting expression [8] from $r_0$ gives the risk difference 27,28 (RD) for any baseline risk given by

$$RD = r_0 - \frac{OR_C\,r_0}{1 - r_0 + OR_C\,r_0} \qquad [10]$$

and finally the reciprocal of expression [10] gives the number needed to treat 27,28 (NNT) given by

$$NNT = \frac{1}{RD} = \left( r_0 - \frac{OR_C\,r_0}{1 - r_0 + OR_C\,r_0} \right)^{-1} \qquad [11]$$

Box 1. This box depicts the main relationships that emerge from expressions [9] to [11], where $r_0$ is baseline risk and $OR_C$ is the classical odds ratio. The confidence limits are determined from those of the classical odds ratio and the key expressions of interest are:

$$RR = \frac{OR_C}{1 - r_0 + OR_C\,r_0}, \qquad RD = r_0 - \frac{OR_C\,r_0}{1 - r_0 + OR_C\,r_0}, \qquad NNT = \frac{1}{RD}$$

[...] the premise that 77% of lung cancer subjects smoked that much while only 58% of a group of non-lung cancer subjects smoked that much. As the baseline population prevalence of cancer varies from 10% to 90% and the various likelihoods ($OR_E$ and $OR_{\bar{E}}$) are held constant, the two risks and their ratio (RR) vary.
The reason why this happens is clearly depicted in Table 1 and Figure 2 where the NNH and RR are highly dependent on prevalence of the outcome as mathematically demonstrated above. The OR does not have this issue because it also is a likelihood ratio and therefore is invariant with the prevalence of disease/outcome. The fact that it is a likelihood ratio also explains the observation by Hoppe et al 30 that the OR can also be interpreted in terms of a conditional RR given discordant pairs from unmatched data collected pair-wise. On the contrary, expressing both posterior probabilities as a ratio (i.e. the conventional RR) is meaningless. The absolute change from non-exposure to exposure is what is meaningful and can be expressed as a NNH/NNT and can be computed from the OR for any specific baseline non-exposure risk through the relationships depicted in Box 1.
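The Box 1 relationships can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the odds ratio with its limits, and the baseline risk are all hypothetical. Confidence limits for the absolute measures are obtained simply by passing the OR limits through the same functions at a fixed baseline risk.

```python
# Sketch of the Box 1 relationships (expressions [8]-[11]).
# All numeric inputs below are hypothetical, chosen only for illustration.

def risk_exposed(or_c, r0):
    """Expression [8]: risk under exposure for baseline risk r0."""
    return or_c * r0 / (1 - r0 + or_c * r0)


def risk_ratio(or_c, r0):
    """Expression [9]: RR implied by OR_C at baseline risk r0."""
    return or_c / (1 - r0 + or_c * r0)


def risk_difference(or_c, r0):
    """Expression [10]: baseline risk minus risk under exposure."""
    return r0 - risk_exposed(or_c, r0)


def number_needed(or_c, r0):
    """Expression [11]: reciprocal of the RD (positive for benefit,
    negative for harm; undefined when OR_C equals 1)."""
    return 1 / risk_difference(or_c, r0)


# Hypothetical OR with hypothetical 95% limits, and a baseline risk of interest.
or_point, or_lower, or_upper = 0.50, 0.40, 0.625
r0 = 0.20

for label, or_c in (("point", or_point), ("lower", or_lower), ("upper", or_upper)):
    print(label,
          round(risk_ratio(or_c, r0), 3),
          round(risk_difference(or_c, r0), 4),
          round(number_needed(or_c, r0), 2))
```

With the hypothetical point estimate (OR of 0.5 at a baseline risk of 0.2) the implied RR is 0.5/0.9, the RD is 0.8/9 and the NNT is 11.25.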

Analysis of 140,620 trials scraped from Cochrane
Data from individual studies of meta-analyses published in the Cochrane Database of Systematic Reviews from 2003 (issue 1) to 2018 (issue 5) were extracted using the R package "RCurl". The R code has been provided in a previous study which used this dataset to compare tests for publication bias 31. A total of 181,278 trials used in 18,562 meta-analyses were scraped from Cochrane. We dropped meta-analyses containing trials with zero events in both arms. We also dropped studies with baseline or intervention risks exactly equal to zero or one and dropped meta-analyses with fewer than 5 studies (in this sequence). There were 140,620 trials left for analysis across 14,960 meta-analyses. The RR, OR, outcome prevalence and baseline risk in the control group were computed for each of these trials (regardless of the meta-analysis they belonged to). The ORs were found to have no association with prevalence, and all the expressions derived above (and in Box 1) were confirmed to be numerically equivalent to the directly computed RR, RD or NNT/NNH. The relationship between the RR and prevalence described mathematically above was confirmed and is depicted in Figure 3. Since the RR is a ratio of two conditional probabilities, it loses meaning given that the two conditional probabilities increase with prevalence; when the association between exposure and outcome is fixed (by generating percentiles of the logOR and analyzing by percentile), the RR is just a linear function of prevalence and nothing more (Figure 3, panel C). A similar relationship is seen when prevalence is fixed by percentile, as depicted in supplementary figure S1.
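The per-trial quantities described above can be sketched as follows. This is not the R code used for the analysis; it is a small Python illustration with hypothetical trial counts and illustrative field names, showing the exclusion rule for risks of exactly zero or one and the numerical agreement between the directly computed RR and expression [9].

```python
# Sketch: per-trial effect measures from 2x2 counts, plus a check of expression [9].
# The trial counts are hypothetical; the field names are illustrative assumptions.

def trial_measures(events_trt, n_trt, events_ctl, n_ctl):
    """Return RR, OR, outcome prevalence and baseline risk for one trial,
    or None when either arm's risk is exactly 0 or 1 (such trials were dropped)."""
    r1 = events_trt / n_trt          # intervention-arm risk
    r0 = events_ctl / n_ctl          # control-arm (baseline) risk
    if r1 in (0.0, 1.0) or r0 in (0.0, 1.0):
        return None
    return {
        "rr": r1 / r0,
        "or": (r1 / (1 - r1)) / (r0 / (1 - r0)),
        "prevalence": (events_trt + events_ctl) / (n_trt + n_ctl),
        "baseline_risk": r0,
    }


def rr_from_or(odds_ratio, r0):
    """Expression [9]: RR implied by the OR at baseline risk r0."""
    return odds_ratio / (1 - r0 + odds_ratio * r0)


m = trial_measures(events_trt=15, n_trt=100, events_ctl=30, n_ctl=100)
print(m)
print(rr_from_or(m["or"], m["baseline_risk"]))  # agrees with m["rr"] up to rounding
```

For this hypothetical trial the directly computed RR is 0.5, and substituting the trial's OR and control-group risk into expression [9] returns the same value.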
Given this finding, pooled effects from meta-analyses using the RR effect measure will tend to be increasingly misleading as prevalence of the outcome increases or varies across studies (in meta-analyses) and will also be expected to have decreasing heterogeneity with increasing prevalence. This was demonstrated by computing these meta-analyses using the fixed effect model in Stata; the results show much less heterogeneity (between-study variance, tau2) when pooling using the RR than when pooling using the OR, and this was expected given the range limitation of the RR as prevalence increased (Figure 4). In keeping with our observation, the use of the RR has recently been flagged as a source of bias in RR-based meta-analyses 32.

Examples from the literature
In a meta-analysis of prostate cancer-specific mortality associated with androgen deprivation therapy (ADT) among patients with prostate cancer 33 [...] NNT would then be 58 vs 80 using the OR or the RR respectively. Of interest here is that Fagan's nomogram can also be used to go from control to post-intervention probability using the OR as the likelihood ratio.
In another meta-analysis of stroke or systemic embolism after high dose non-vitamin K oral anticoagulants (NOAC) vs vitamin K antagonists (VKA) 34, the authors report the RR for stroke or systemic embolism to be 0.79 (Figure 6). The meta-analysis using the OR returned numerically similar results because outcome incidence was similar across studies (4%-6%). However, the interpretation of the two measures differs. Assuming varying baseline incidence of stroke or systemic embolism in the VKA group, the RD derived from the RR differs from that derived from the OR (Figure 6, panel C). The RR, therefore, gives inaccurate risk information regardless of whether or not it is numerically similar to the OR.
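The divergence in absolute effects can be sketched numerically. The snippet below takes the reported RR of 0.79 and, purely as an assumption for illustration, reuses the same value as the OR (the text states only that the two were numerically similar); the grid of baseline risks and the function names are likewise hypothetical. It does not reproduce Figure 6.

```python
# Sketch: risk difference across baseline risks under a constant RR versus a constant OR.
# 0.79 is the RR reported in the text; reusing it as the OR is an assumption
# made only for illustration, as is the grid of baseline risks.

def rd_from_rr(rr, r0):
    """RD if the RR is assumed constant across baseline risks."""
    return r0 - rr * r0


def rd_from_or(odds_ratio, r0):
    """RD if the OR is assumed constant across baseline risks."""
    r1 = odds_ratio * r0 / (1 - r0 + odds_ratio * r0)
    return r0 - r1


effect = 0.79
gaps = []
for r0 in (0.05, 0.10, 0.20, 0.40, 0.60):
    via_rr = rd_from_rr(effect, r0)
    via_or = rd_from_or(effect, r0)
    gaps.append(via_rr - via_or)
    print(f"r0={r0:.2f}  RD via RR={via_rr:.4f}  RD via OR={via_or:.4f}")
```

Under these assumptions the two RDs are close at a 5% baseline risk and drift apart as the baseline risk rises, which is why numerical similarity of the RR and OR at low incidence does not carry over to other baseline risks.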

Conclusion
In this paper we discuss only two relative measures (OR & RR) and attempt to explain why one of them seems problematic and thus by default this leaves us with only one moving forwards. While many effect measures exist, the differences in objectives for the different measures are not raised in this paper as we focus solely on the debate between OR & RR as we show that all information previously derived from the RR can just as easily be derived from the OR. Of greater concern is that the RR may not be a valid measure of effect. The basic concepts underpinning this study are all well established but what this paper adds is that what we think is being measured is not actually what is being measured and this lends itself to conclusions that are not true.
A main thesis of this paper is that the relative risk depends on prevalence, more so than on the strength of the exposure-outcome association that it is supposed to reflect. Thus, if the association is unchanged and we modify prevalence, then the relative risk changes. If the strength of association is measured independently of the RR and kept constant, then we can show that the RR is a linear function of prevalence. We can no longer accept the commonly argued view that the relative risk is easier to understand. Once we realize that the RR depends more on prevalence than on the exposure-outcome association, its interpretation becomes much more difficult than that of the odds ratio.
It is well known that, for common events, large values of the risk ratio are impossible, and this should have rung alarm bells much earlier regarding whether the RR is more a measure of prevalence than a measure of effect. However, this was not the main focus of the derivation outlined previously, which was aimed at demonstrating why the OR is a true measure of effect against which the RR can be compared. The derivations in this paper relate primarily to the odds ratio and are then linked to the relative risk through the derivation of risk. The derivations do not require the assumption that the probability of exposure given outcome or the probability of exposure given no outcome are constants, but rather that they are independent of outcome prevalence. This is logical since they are derived from the outcome or no-outcome groups separately. While test principles are applied in the derivation, whether a test is a consequence of the outcome or an exposure is a cause of the outcome is not really important in this context, because the probabilities are assessed at one point in time; in non-diagnostic studies this would be at the end of follow-up.
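The remark that large RRs are impossible for common events follows from the fact that the risk under exposure cannot exceed 1, so the RR cannot exceed the reciprocal of the baseline risk. A brief Python sketch, using a hypothetical baseline risk and hypothetical OR values, shows the RR implied by expression [9] approaching but never passing that ceiling:

```python
# Sketch: the RR is capped at 1 / r0, whereas the OR has no upper bound.
# The baseline risk and the OR values are hypothetical.

def rr_from_or(odds_ratio, r0):
    """RR implied by an OR at baseline risk r0 (expression [9])."""
    return odds_ratio / (1 - r0 + odds_ratio * r0)


r0 = 0.5                 # a common outcome in the control group
ceiling = 1 / r0         # r1 cannot exceed 1, so RR = r1 / r0 cannot exceed 1 / r0

odds_ratios = (2, 10, 100, 1000)
implied = [rr_from_or(or_c, r0) for or_c in odds_ratios]
for or_c, rr in zip(odds_ratios, implied):
    print(f"OR={or_c:>5}  RR={rr:.3f}  ceiling={ceiling:.1f}")
```

At a baseline risk of 0.5 the implied RR stays below 2 however large the OR becomes.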
Clinical research has a substantial need for absolute measures, and either the OR or the RR can be used to compute these, as we show in this paper. However, the choice of binary effect measure in epidemiologic studies precedes the computation of such absolute effects. Creating a measure of impact is usually done by combining a relative measure with baseline risk information. However, this can only be done when the relative measure has portability across different risk groups. This paper demonstrates this lack of portability with the RR and so both the direct interpretation and the translation to absolute measures of impact are flawed.
It is a common perception that consistency of an effect measure in meta-analysis means that it is the effect measure of choice. This again is a misconception, because if the RR does not measure effect magnitude and simply reflects prevalence of the outcome in the study, consistency is no longer meaningful. Consistency of the RR increases as prevalence increases simply because the RRs become more equal (as well as losing meaning as a measure of effect magnitude). It may be acceptable to meta-analyse studies with similar prevalence (though if the prevalence is high, interpretation is still questionable), but if prevalence differs across studies (see the example) it makes little sense to meta-analyse such studies, as the RRs cannot be interpreted as similar even if they are fortuitously consistent.
We reiterate that the RR does not fully reflect the magnitude of the effect in clinical trials. Rather, it reflects more of the baseline prevalence of an outcome, shifting towards 1 as prevalence increases. It is also a ratio of two conditional probabilities, making its interpretation questionable. On the other hand, the OR is a likelihood ratio whose magnitude reflects the fold increase in odds from that in the baseline group to that in the intervention group and reflects the magnitude of the intervention effect independent of prevalence. Interpretation of the OR is similar to that of the likelihood ratio in diagnostic studies. The OR is expected to be specific to any intervention and thus should be the binary effect measure in randomized trials and meta-analyses of such trials, and the RD and NNT should preferably be derived from the OR and not the RR. To this end, we provide a conversion Excel sheet in the supplementary material (S2) for use by clinicians to move from baseline risk to post-intervention risk, just as is done for diagnostic tests, with the likelihood ratio replaced by the OR from a clinical trial. What this paper raises about the RR dependence on prevalence should also not be conflated with the well-known fact in epidemiology that the OR approximates the RR when prevalence is small. The latter is about numerical equivalence, thus rendering the OR interpretable as an RR. This has never previously been viewed as a problem inherent to the RR, which is what this study aims to explain. Therefore, while the RR and OR coincide numerically over a narrow range below baseline risks of 20-30%, this is no justification to continue to use this ratio. The risk relativism as it is currently viewed is an illusion.
Finally, we should point out that the basics used in this paper reflect well known material. What we are arguing for is a major change in the status quo regarding interpretation and usage based on these very same basic principles. While we appreciate the fact that some debates are put out there to stimulate thought and advance understanding, we are also seriously concerned about this issue because we believe the evidence we provide in support of change is compelling.

Funding statement
This work was made possible by Program Grant #NPRP10-0129-170274 to SAD, LT and LFK from the Qatar National Research Fund (a member of Qatar Foundation). The findings herein reflect the work, and are solely the responsibility, of the authors. All authors had full access to all the data in the study and the corresponding author had final responsibility for the decision to submit for publication. LFK is also supported by an Australian National Health and Medical Research Council Fellowship (APP1158469).