
Original research
Simulation study on the validity of the average risk approach in estimating population attributable fractions for continuous exposures
Yibing Ruan¹, Stephen D Walter², Priyanka Gogna³, Christine M Friedenreich¹,⁴, Darren R Brenner¹,⁴
  1. Cancer Epidemiology and Prevention Research, Alberta Health Services, Calgary, Alberta, Canada
  2. Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
  3. Department of Public Health Sciences, Queen's University, Kingston, Ontario, Canada
  4. Departments of Oncology and Community Health Sciences, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
Correspondence to Yibing Ruan; yibing.ruan@albertahealthservices.ca

Abstract

Background The population attributable fraction (PAF) is an important metric for estimating disease burden associated with causal risk factors. In an International Agency for Research on Cancer working group report, an approach was introduced to estimate the PAF using the average of a continuous exposure and the incremental relative risk (RR) per unit. This ‘average risk’ approach has subsequently been applied in several studies conducted worldwide. However, the validity of this method has not been investigated.

Objective To examine the validity and the potential magnitude of bias of the average risk approach.

Methods We established analytically that the direction of the bias is determined by the shape of the RR function. We then ran simulations using a variety of exposure distributions and a range of RR per unit. We estimated the unbiased PAF by integrating over the exposure distribution and the RR function, and compared it with the PAF from the average risk approach. We defined the absolute and relative bias as the difference and the relative difference, respectively, between the PAFs estimated by the two approaches. We also examined the bias of the average risk approach using real-world data from the Canadian Population Attributable Risk of Cancer study.

Results The average risk approach is biased: it underestimates the PAF when the RR function is convex and overestimates it when the RR function is concave (ie, when risk increases more or less rapidly, respectively, at higher levels of exposure). The magnitude of the bias depends on the exposure distribution as well as on the value of the RR. The approach is approximately valid when the RR per unit is small or the RR function is approximately linear. Both the absolute and relative bias can be large when the RR is not small and the exposure distribution is skewed.

Conclusions We recommend that caution be taken when using the average risk approach to estimate PAF.

  • epidemiology
  • statistics & research methods
  • public health

Data availability statement

Data are available on reasonable request. Extra data, including the R code for simulation and the exposure datasets from the ComPARe study, are available by emailing yibing.ruan@albertahealthservices.ca.


This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.



Strengths and limitations of this study

  • This study examined the assumptions and validity of the average risk approach to estimate the population attributable fraction, which has not been explored previously.

  • We used both simulated and real-world data to demonstrate the factors associated with the bias of the average risk approach.

  • We established the direction of bias of this approach analytically, but, as a simulation study, we could examine the magnitude of bias only for a limited number of exposure distributions and relative risk functions.

Introduction

Population attributable fraction (PAF) is an important measure for estimating the burden of disease in a population that is causally attributable to an exposure. Since its introduction, PAF has received substantial attention in epidemiology.1 Many advances have been made in approaches to calculating the PAF of single and multiple risk factors,2–6 in estimating its variance7 8 and in its interpretation.9–11 There have also been many comprehensive projects, either national or global, estimating PAFs for the burden of disease associated with various risk factors.12–22 The International Agency for Research on Cancer (IARC) specialises in providing cancer surveillance data and estimates of the cancer burden from around the world. An IARC Working Group23 introduced an approach to estimating PAF when prevalence data on a continuous exposure in the population under study are only available as a population average. This approach, referred to here as the ‘average risk approach’, estimates the relative risk (RR) at the average exposure of the whole population using the RR of disease per unit increase in exposure and the average level of exposure in the whole population. No proof of its validity was provided when the method was proposed. Hence, the purpose of this paper is to examine the underlying assumptions and validity of the average risk approach when estimating PAF for disease burden in a population. Specifically, we examined how the shape of the RR function and the exposure distribution affect the validity of this approach.

Methods

Description of average risk approach

The average risk approach estimates the RR at an average exposure of the whole population using the RR of disease per unit increase in exposure along with the average level of exposure of the whole population as follows:

$$\mathrm{Risk} = \mathrm{RR}_{\mathrm{unit}}^{\,\bar{x}} \tag{1}$$

where Risk is the RR at the population average exposure, $\mathrm{RR}_{\mathrm{unit}}$ is the RR associated with a one-unit increase in exposure, and $\bar{x}$ is the weighted average level of exposure. An underlying assumption of this method is that a log-linear relationship exists between the exposure and the risk of cancer. The average risk approach then estimates PAF as:

$$\mathrm{PAF}_{\mathrm{avg}} = \frac{\mathrm{Risk} - 1}{\mathrm{Risk}} \tag{2}$$

where it is assumed that ‘each individual has experienced a similar average exposure’ (IARC 2007, p 5). Under this assumption, that the whole population under study is exposed at the population average level, formula (2) is a simplification of Levin’s formula when the prevalence (P) is 100%:

$$\mathrm{PAF} = \frac{P(\mathrm{RR} - 1)}{P(\mathrm{RR} - 1) + 1} \tag{3}$$
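As a quick numerical check of formulas (1) to (3), the Python sketch below computes the average risk and its PAF and confirms that formula (2) coincides with Levin’s formula (3) when P = 100%. The function names and the inputs (an RR per unit of 1.01 and an average exposure of 30 units) are hypothetical illustrations, not values from the IARC report or from this study.

```python
def average_risk(rr_unit: float, x_bar: float) -> float:
    """Formula (1): RR at the population average exposure (log-linear RR assumed)."""
    return rr_unit ** x_bar


def paf_average_risk(risk: float) -> float:
    """Formula (2): PAF if everyone were exposed at the population average level."""
    return (risk - 1.0) / risk


def paf_levin(prevalence: float, rr: float) -> float:
    """Formula (3): Levin's formula for an exposure with the given prevalence."""
    excess = prevalence * (rr - 1.0)
    return excess / (excess + 1.0)


# Hypothetical inputs, for illustration only.
risk = average_risk(rr_unit=1.01, x_bar=30.0)
print(f"Risk = {risk:.4f}")
print(f"PAF, formula (2): {paf_average_risk(risk):.4f}")
print(f"PAF, formula (3) with P = 100%: {paf_levin(1.0, risk):.4f}")
```

Because the two PAF functions agree at P = 100%, the average risk approach inherits Levin’s formula, with the whole population treated as exposed at $\bar{x}$.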

The IARC Working Group stated that ‘This formula is valid when the risk of cancer per unit of exposure was estimated in a model using log transformation. This is the case for logistic regression or Poisson regression, which are models widely used in case–control and cohort studies respectively’ (IARC 2007, p 5). No proof was shown for this statement, although the authors went on to acknowledge that ‘the dose–effect relationship is, in fact, rarely linear (or log-linear) over the whole range of exposures, but this method is considered to be the best approximation available in this respect’. Thus, the validity of the average risk approach has not been fully assessed, particularly concerning its sensitivity to departures from the assumed dose–response relationship or the impact of the exposure distribution.

When the distribution of a continuous exposure is known and no confounding is assumed, a valid method to estimate PAF involves integrating across all levels of exposure:

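$$\mathrm{PAF}_{\mathrm{int}} = \frac{\int_0^m P(x)\,[\mathrm{RR}(x) - 1]\,dx}{\int_0^m P(x)\,[\mathrm{RR}(x) - 1]\,dx + 1} \tag{4}$$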

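where RR(x) is the RR at exposure x; P(x) is the population distribution of exposure, with $\int_0^m P(x)\,dx = 1$; and m is the maximum exposure level. Because P(x) integrates to 1, the denominator of (4) equals $\int_0^m P(x)\,\mathrm{RR}(x)\,dx$, so both (2) and (4) have the form (y − 1)/y, which increases with y. Hence, if there were to be no bias in the average risk approach, the following equation would have to hold: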

$$\int_0^m P(x)\,\mathrm{RR}(x)\,dx = \mathrm{RR}(\bar{x}) = \mathrm{Risk} \tag{5}$$

Under the log-linear risk assumption, the left-hand side of equation (5) becomes:

$$\int_0^m P(x)\,\mathrm{RR}_{\mathrm{unit}}^{\,x}\,dx \tag{6}$$

Define $g(x) = \mathrm{RR}_{\mathrm{unit}}^{\,x}$, in which x is a random variable with distribution P(x). Then (6) is E[g(x)] and the right-hand side of (5) is g[E(x)]. Because g(x) is strictly convex (ie, a line segment connecting any two points on the graph of the function lies above the graph) when $\mathrm{RR}_{\mathrm{unit}}$ is greater than 1, Jensen’s inequality24 gives:

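$$E[g(x)] = \int_0^m P(x)\,\mathrm{RR}_{\mathrm{unit}}^{\,x}\,dx \;\geq\; \mathrm{RR}_{\mathrm{unit}}^{\,\bar{x}} = g[E(x)] \tag{7}$$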

According to (7), the average RR in the population is at least the RR at the average exposure; because both PAF formulas have the form (y − 1)/y, which increases with y, the average risk approach will not overestimate PAF. The magnitude of the bias is determined by the extent of the convexity of g(x) over the effective range of x. When $\mathrm{RR}_{\mathrm{unit}}$ is small (ie, close to 1.00), g(x) is approximately linear and there is little bias. However, how the choice of the exposure distribution P(x) affects the validity of this approach has not been explored. Specifically, it is unknown whether the average risk approach still provides a good approximation to the actual PAF when the exposure distribution in a population is strongly skewed or bimodal. Therefore, we studied the validity of the average risk approach under the loglinear RR function and a variety of exposure distributions.
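Inequality (7) can be checked numerically on a discretised exposure scale. The Python sketch below approximates the integral in (6) by a sum over standardised exposure levels 0 to 100 and uses a hypothetical right-skewed exposure distribution (weights proportional to exp(−x/15)); neither the distribution nor the code is taken from the study’s own R simulation.

```python
import math

LEVELS = range(101)  # standardised exposure levels 0..100

# Hypothetical right-skewed exposure distribution, normalised to sum to 1.
weights = [math.exp(-x / 15) for x in LEVELS]
total = sum(weights)
PROBS = [w / total for w in weights]
MEAN_X = sum(x * p for x, p in zip(LEVELS, PROBS))


def jensen_sides(rr_unit):
    """Return (E[g(x)], g[E(x)]) for g(x) = rr_unit ** x, ie both sides of (7)."""
    e_g = sum(p * rr_unit ** x for x, p in zip(LEVELS, PROBS))
    return e_g, rr_unit ** MEAN_X


def paf(mean_rr):
    """(y - 1) / y, the common form of the two PAF formulas."""
    return (mean_rr - 1.0) / mean_rr


for rr_unit in (1.001, 1.01, 1.03):
    e_g, g_e = jensen_sides(rr_unit)
    print(f"RR per unit {rr_unit}: E[g] = {e_g:.4f} >= g[E] = {g_e:.4f}; "
          f"PAF_avg - PAF_int = {paf(g_e) - paf(e_g):+.4f}")
```

The printed difference is never positive, and it grows as the RR per unit moves away from 1, in line with the convexity argument.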

More generally, when a loglinear RR function is not assumed, the average risk approach can still be written as equation (2), in which ‘Risk’ is the RR at the population average exposure level, $\mathrm{RR}(\bar{x})$. The curvature of the RR function then determines the direction and magnitude of the bias. When RR is a linear function of the exposure (ie, $\mathrm{RR}(x) = 1 + \beta x$), there is no bias, because the average RR used in the integral PAF ($\int_0^m P(x)(1 + \beta x)\,dx = 1 + \beta\bar{x}$) and the RR used in the average risk PAF ($\mathrm{RR}(\bar{x}) = 1 + \beta\bar{x}$) are equal. When the RR function is convex, indicating a risk profile that increases more rapidly at higher levels of exposure, this approach underestimates PAF. In contrast, it overestimates PAF when the RR function is concave, indicating a risk profile that increases less rapidly at higher levels of exposure. To illustrate the latter point, we included two examples of simulated concave RR functions and calculated the bias of the average risk approach.
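The argument can be illustrated with three hypothetical RR functions: a linear one, a convex (loglinear) one and a concave one (1 + 0.2√x, chosen only for illustration). The Python sketch below evaluates each against the same hypothetical right-skewed exposure distribution on standardised levels 0 to 100; the sign of PAF_avg − PAF_int should be zero, negative and positive, respectively.

```python
import math

LEVELS = range(101)  # standardised exposure levels 0..100

# Hypothetical right-skewed exposure distribution, normalised to sum to 1.
weights = [math.exp(-x / 15) for x in LEVELS]
total = sum(weights)
PROBS = [w / total for w in weights]
MEAN_X = sum(x * p for x, p in zip(LEVELS, PROBS))


def paf(mean_rr):
    return (mean_rr - 1.0) / mean_rr


def bias(rr_fn):
    """PAF_avg - PAF_int for an arbitrary RR function rr_fn(x)."""
    paf_int = paf(sum(p * rr_fn(x) for x, p in zip(LEVELS, PROBS)))
    paf_avg = paf(rr_fn(MEAN_X))
    return paf_avg - paf_int


shapes = {
    "linear": lambda x: 1 + 0.02 * x,              # RR(x) = 1 + beta * x
    "convex": lambda x: 1.02 ** x,                 # loglinear
    "concave": lambda x: 1 + 0.2 * math.sqrt(x),   # hypothetical concave shape
}
for name, fn in shapes.items():
    print(f"{name:8s} bias = {bias(fn):+.5f}")
```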

Investigation of validity of average risk approach

To investigate whether the validity of the average risk approach is affected by the exposure distribution, we simulated several exposure distributions in which the exposure is continuous, with standardised values from 0 to 100, where 0 indicates no exposure and 100 indicates the maximal level of exposure in the population (figure 1). The prevalence distributions were scaled so that the prevalence over all exposure levels summed to 100%. The details of the distributions are summarised in table 1. We calculated PAF both using the average risk approach ($\mathrm{PAF}_{\mathrm{avg}}$) and by integrating across all exposure levels ($\mathrm{PAF}_{\mathrm{int}}$). We calculated the absolute bias ($\mathrm{PAF}_{\mathrm{avg}} - \mathrm{PAF}_{\mathrm{int}}$) and the relative bias ($(\mathrm{PAF}_{\mathrm{avg}} - \mathrm{PAF}_{\mathrm{int}})/\mathrm{PAF}_{\mathrm{int}}$). Because PAF is often expressed as a percentage, both biases are reported in percentage units; however, the absolute bias is a difference in percentage points, whereas the relative bias is an actual percentage. For example, a $\mathrm{PAF}_{\mathrm{avg}}$ of 15% and a $\mathrm{PAF}_{\mathrm{int}}$ of 20% give an absolute bias of −5% and a relative bias of −25%.
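The scaling and bias definitions above can be written as three small functions. This Python sketch (function names are hypothetical) reproduces the worked example in the text and shows why the two biases carry different meanings even though both are in percentage units.

```python
def normalise(weights):
    """Scale prevalence weights so that they sum to 1 (ie, 100%)."""
    total = sum(weights)
    return [w / total for w in weights]


def absolute_bias(paf_avg_pct, paf_int_pct):
    """PAF_avg - PAF_int, in percentage points."""
    return paf_avg_pct - paf_int_pct


def relative_bias(paf_avg_pct, paf_int_pct):
    """(PAF_avg - PAF_int) / PAF_int, expressed as a percentage."""
    return 100.0 * (paf_avg_pct - paf_int_pct) / paf_int_pct


probs = normalise([1, 1, 2])       # hypothetical prevalence weights
print(sum(probs))                  # 1.0
print(absolute_bias(15.0, 20.0))   # -5.0 (percentage points)
print(relative_bias(15.0, 20.0))   # -25.0 (%)
```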

Figure 1

Probability density curves of selected distributions in this study.

Table 1

Description of the exposure distributions used in this study

To examine whether the magnitude of risk affects the validity of the average risk approach, we tested a range of values for the RR per standardised unit, from 1.001 to 1.04. Using a standardised unit removes the dependence on the original unit of measurement. For example, for a disease associated with obesity, the RR per standardised unit of body mass index (BMI) is the same whether the original RR is expressed per 1 kg/m² or per 5 kg/m², as long as it pertains to a single population. In this study, we refer to the RR per standardised unit as ‘RR per unit’, unless otherwise stated. We considered RR per unit values above 1.04 to be implausible: for example, under the log-linear assumption, an RR per unit of 1.05 would give an RR of about 132 at the maximal exposure level.
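The two numerical claims in this paragraph (the RR at the maximal exposure and the unit invariance of the standardised RR) can be checked with a few lines of Python. The RR of 1.02 per kg/m² and the standardised unit of 0.3 kg/m² are hypothetical values chosen only for illustration.

```python
import math


def rr_at_max(rr_per_unit, max_units=100):
    """RR at the maximal standardised exposure under the log-linear assumption."""
    return rr_per_unit ** max_units


for r in (1.01, 1.03, 1.04, 1.05):
    print(f"RR per unit {r}: RR at 100 units = {rr_at_max(r):.1f}")

# Unit invariance, using a hypothetical RR of 1.02 per 1 kg/m2 of BMI and a
# hypothetical standardised unit of 0.3 kg/m2.
rr_per_1 = 1.02
rr_per_5 = rr_per_1 ** 5          # the same association expressed per 5 kg/m2
std_unit = 0.3
rr_std_from_1 = rr_per_1 ** (std_unit / 1)
rr_std_from_5 = rr_per_5 ** (std_unit / 5)
print(math.isclose(rr_std_from_1, rr_std_from_5))  # True
```

Whichever original unit the RR is reported in, raising it to the ratio (standardised unit / original unit) gives the same RR per standardised unit.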

In addition, we illustrated the bias of the average risk approach when the RR function is neither linear nor loglinear. In particular, we used two simulated examples of concave RR functions, a quadratic and a cubic spline (online supplemental figure S1). The quadratic RR function has the form $\mathrm{RR}(x) = 1 + bx + ax^2$, in which $a = -4/k^2$ and $b = 8/k$. This quadratic form has RR = 1 when x = 0 and a maximum RR of 5 when x = k. In the illustrated example, we used k = 75, that is, 75% of the maximal exposure. The cubic spline RR function is based on simulated data, with the function being approximately quadratic in the lower exposure range and approximately linear at higher exposures.
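The quadratic RR function can be sketched in Python as follows. The coefficients are those implied by the stated constraints (RR = 1 at x = 0 and a maximum of 5 at x = k = 75); the right-skewed exposure distribution is hypothetical, so the printed PAFs do not reproduce table 3, but the sign of the bias should be positive because the function is concave.

```python
import math

K = 75.0           # vertex at 75% of the maximal exposure
A = -4.0 / K ** 2  # quadratic coefficient
B = 8.0 / K        # linear coefficient


def rr_quadratic(x):
    """Concave quadratic RR: RR(0) = 1, maximum RR of 5 at x = K."""
    return 1.0 + B * x + A * x * x


# Hypothetical right-skewed exposure distribution on levels 0..100.
levels = range(101)
weights = [math.exp(-x / 15) for x in levels]
total = sum(weights)
probs = [w / total for w in weights]
mean_x = sum(x * p for x, p in zip(levels, probs))

# Integral approach: average RR over the distribution.
mean_rr = sum(p * rr_quadratic(x) for x, p in zip(levels, probs))
paf_int = (mean_rr - 1) / mean_rr

# Average risk approach: RR evaluated at the mean exposure.
risk = rr_quadratic(mean_x)
paf_avg = (risk - 1) / risk
print(f"PAF_int = {paf_int:.4f}, PAF_avg = {paf_avg:.4f}, bias = {paf_avg - paf_int:+.4f}")
```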

Finally, we used real-world data on the distributions of air pollution (PM2.5) and residential radon exposures, which were investigated in the Canadian Population Attributable Risk of Cancer (ComPARe) study. The ComPARe study collected nationally representative, population-weighted exposure data on PM2.5 and residential radon and used the integral approach to estimate the PAF of lung cancer in Canada for 2015.25 26 We compared this PAF with that obtained using the average risk approach to illustrate the validity of this approach. We also estimated approximate 95% CIs for the PAFs and the bias, assuming, for simplicity, a fixed prevalence distribution and a lognormal distribution of the RR. We resampled 10 000 RRs from this distribution, calculated the PAFs and the bias for each, and used the 2.5% and 97.5% quantiles as the approximate 95% CI.
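The resampling procedure can be sketched in Python as below. Two assumptions are ours rather than the paper’s: the standard error of log(RR) is back-calculated from the reported 95% CI, and the exposure distribution is a hypothetical right-skewed stand-in, because the ComPARe exposure data are not reproduced here. The RR per unit for radon, 1.011 (95% CI 1.005 to 1.016), is taken from the Results.

```python
import math
import random
import statistics

# Hypothetical right-skewed exposure distribution on standardised levels 0..100
# (a stand-in for the ComPARe data, which are not reproduced here).
LEVELS = range(101)
_weights = [math.exp(-x / 20) for x in LEVELS]
_total = sum(_weights)
PROBS = [w / _total for w in _weights]
MEAN_X = sum(x * p for x, p in zip(LEVELS, PROBS))


def paf_pair(rr_unit):
    """Return (PAF_int, PAF_avg, bias) under a log-linear RR and a fixed distribution."""
    mean_rr = sum(p * rr_unit ** x for x, p in zip(LEVELS, PROBS))
    risk = rr_unit ** MEAN_X
    paf_int = (mean_rr - 1) / mean_rr
    paf_avg = (risk - 1) / risk
    return paf_int, paf_avg, paf_avg - paf_int


def approximate_ci(rr_point, rr_low, rr_high, n_draws=10_000, seed=1):
    """2.5% and 97.5% quantiles of (PAF_int, PAF_avg, bias) over resampled RRs."""
    # Assumption: log(RR) is normal, with SE back-calculated from the 95% CI.
    log_mean = math.log(rr_point)
    log_se = (math.log(rr_high) - math.log(rr_low)) / (2 * 1.96)
    rng = random.Random(seed)
    draws = [paf_pair(math.exp(rng.gauss(log_mean, log_se))) for _ in range(n_draws)]
    intervals = []
    for column in zip(*draws):
        cuts = statistics.quantiles(column, n=40, method="inclusive")
        intervals.append((cuts[0], cuts[-1]))  # cut points at 2.5% and 97.5%
    return intervals


ci = approximate_ci(1.011, 1.005, 1.016)
for name, (lo, hi) in zip(("PAF_int", "PAF_avg", "bias"), ci):
    print(f"{name}: {lo:.4f} to {hi:.4f}")
```

`statistics.quantiles` with `n=40` returns 39 cut points at multiples of 2.5%, so the first and last are the 2.5% and 97.5% quantiles.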

Patient or public involvement

Patients or the public were not involved in this study.

Results

First, we examined the bias of the average risk approach under the loglinear RR function with the exposure distributions listed in table 1. Results for RR per unit of 1.001, 1.01 and 1.03 are shown in table 2, and results for RR per unit from 1.001 to 1.04 are shown in figure 2. At an RR of 1.001, the absolute and relative biases were very small, and the average risk approach could be regarded as unbiased. At an RR of 1.01, the absolute bias remained small for all tested distributions, although the relative bias began to increase substantially for the power distribution and for the Poisson distribution with an extreme tail (table 2). At an RR of 1.03, large absolute and relative biases were observed in several distributions. The normal and hypergeometric distributions were more robust to increasing RR than the Poisson with extreme tail and power distributions (table 2, figure 2). For some distributions (uniform, beta(0.5, 0.5), beta(8, 2) and bimodal), the largest absolute and relative bias occurred at an intermediate value of RR (figure 2); as RR increases further, the bias becomes smaller because both PAF estimates approach 100%. Regardless of the exposure distribution and the magnitude of RR, the average risk approach underestimated the PAF under the loglinear RR function.
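The pattern of a bias that first grows and then shrinks as both PAFs approach 100% can be reproduced for a discrete uniform exposure on 0 to 100 with the Python sketch below. This is a discretised approximation written for illustration, so its numbers need not match table 2 or figure 2 exactly.

```python
def bias_uniform(rr_unit, max_units=100):
    """Absolute and relative bias (as fractions) for a discrete uniform exposure on 0..max_units."""
    levels = range(max_units + 1)
    mean_rr = sum(rr_unit ** x for x in levels) / len(levels)
    risk = rr_unit ** (max_units / 2)  # RR at the mean exposure of 50 units
    paf_int = (mean_rr - 1) / mean_rr
    paf_avg = (risk - 1) / risk
    return paf_avg - paf_int, (paf_avg - paf_int) / paf_int


for rr_unit in (1.001, 1.005, 1.01, 1.02, 1.03, 1.04, 1.1, 1.2):
    absolute, relative = bias_uniform(rr_unit)
    print(f"RR per unit {rr_unit:<6}: absolute {100 * absolute:+7.2f}, "
          f"relative {100 * relative:+7.2f}%")
```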

Figure 2

The absolute and relative bias of the average risk approach under the selected distributions and a range of RR per unit. Both are presented as percentages: the absolute bias is the difference between the two PAFs in percentage points, and the relative bias is this difference divided by the PAF from the integration approach, expressed as a percentage. PAF, population attributable fraction; RR, relative risk.

Table 2

Absolute and relative bias in PAF between the average risk approach and the integration approach in selected exposure distributions when RR per unit is 1.001, 1.01 or 1.03 for the loglinear function

We then illustrated the direction of the bias when the RR function is concave. Table 3 shows the bias for the two RR functions in online supplemental figure S1, using the exposure distributions reported in table 2. With concave RR functions, the average risk approach overestimated the PAF. As with the loglinear RR function, we observed little bias in the normal, hypergeometric and beta(8, 2) distributions, whereas substantial bias was observed in the power, Poisson with extreme tail and beta(0.5, 0.5) distributions.

Table 3

Absolute and relative bias in PAF between the average risk approach and the integration approach in two illustrated examples of concave RR functions

Finally, we explored the bias of the average risk approach using real-world data for air pollution (PM2.5) and residential radon. Epidemiologic studies support a loglinear RR function between exposure to residential radon and lung cancer.27 28 Evidence for a loglinear dose–response relationship between PM2.5 and lung cancer risk is less consistent: the loglinear relationship was supported by several studies,29–32 whereas two studies reported some deviation from it.33 34 The 2019 Global Burden of Disease Study of 87 risk factors suggested that PM2.5 has a loglinear relation with lung cancer in the low exposure range (0–50 µg/m³) and a linear relation in the high exposure range (>50 µg/m³).16 We assumed a loglinear relation for PM2.5 because levels in Canada are typically below 20 µg/m³. Both exposures had skewed distributions (online supplemental figure S2): the PM2.5 distribution had a long left tail, whereas the residential radon distribution had a long right tail. We standardised the exposure levels so that one unit corresponded to 0.14 µg/m³ of PM2.5 and 7.4 Bq/m³ of radon, making the maximal exposure level 100 units. The RR per unit of PM2.5 associated with lung cancer was 1.0012 (95% CI 1.0008 to 1.0016). The PAFs of PM2.5 using the integral and the average risk approach were 6.89% (95% CI 4.71% to 8.98%) and 6.87% (95% CI 4.70% to 8.95%), respectively, indicating very small bias in the average risk approach (−0.02%, 95% CI −0.03% to −0.01%). The RR per unit of radon associated with lung cancer was 1.011 (95% CI 1.005 to 1.016). The PAFs of radon using the integral and average risk approach were 6.87% (95% CI 3.33% to 10.52%) and 6.37% (95% CI 3.21% to 9.37%), respectively. The bias was larger than that for PM2.5: the absolute bias was −0.5% (95% CI −1.2% to −0.1%) and the relative bias was −7.3% (95% CI −11.0% to −3.5%), indicating slight to moderate bias.
These observations were consistent with the simulations: small RRs yielded little bias (PM2.5), whereas moderate to large RRs could produce bias with some skewed exposure distributions (radon).
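The qualitative contrast between the two exposures can be mimicked with the Python sketch below, which applies the two reported RR per unit values (1.0012 and 1.011) to a single hypothetical right-skewed exposure distribution. Because the distribution is not the ComPARe data, the printed relative biases are illustrative only and do not reproduce the −0.02% or −7.3% reported above; the point is that the smaller RR yields a much smaller relative bias.

```python
import math

# Hypothetical right-skewed exposure (long right tail) on standardised levels 0..100.
levels = range(101)
weights = [math.exp(-x / 20) for x in levels]
total = sum(weights)
probs = [w / total for w in weights]
mean_x = sum(x * p for x, p in zip(levels, probs))


def relative_bias(rr_unit):
    """(PAF_avg - PAF_int) / PAF_int under a log-linear RR."""
    mean_rr = sum(p * rr_unit ** x for x, p in zip(levels, probs))
    risk = rr_unit ** mean_x
    paf_int = (mean_rr - 1) / mean_rr
    paf_avg = (risk - 1) / risk
    return (paf_avg - paf_int) / paf_int


for rr_unit in (1.0012, 1.011):  # RR per unit reported for PM2.5 and for radon
    print(f"RR per unit {rr_unit}: relative bias {100 * relative_bias(rr_unit):+.1f}%")
```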

Discussion

Since being introduced by the IARC Working Group in 2007, the average risk approach has been used in several PAF estimation projects,12–15 35 including the cancer burden study in France,15 the ComPARe study in Canada,35 a study of attributable causes in China12 and two studies in Brazil.13 14 We showed that the direction of bias of the average risk approach is determined by whether the RR function is convex or concave, while the magnitude of bias is affected by the degree of convexity or concavity, as well as by the exposure distribution. When the RR per unit is small under a loglinear RR function, the bias is also small and the average risk approach is approximately valid. With larger RR and greater convexity, the validity of the average risk approach also depends on the exposure distribution. We demonstrated that under some circumstances (eg, Poisson distribution with extreme tail, power distribution), the approach can lead to moderate to severe bias.

The average risk approach implicitly assumes that the minimal risk exposure value is 0. When the minimal risk exposure value is not 0, this approach generates invalid estimates. As a simplified example, overweight and obesity, defined as BMI ≥25.0 kg/m², are associated with postmenopausal breast cancer, so the minimal risk exposure value of BMI is 25.0 kg/m². Assume a log-linear relationship between BMI above 25.0 kg/m² and the risk of breast cancer, and a postmenopausal female population whose BMI is normally distributed with a mean of 25.0 kg/m² and an SD of 5.0 kg/m². The average risk approach yields a PAF of 0 in this population, because the population average exposure is 25.0 kg/m², which has an RR of 1.0, even though about half of the population has a BMI above 25.0 kg/m². Although it is possible to recode the exposure so that the minimal risk exposure is zero, a new average of the recoded exposure must then be estimated, which requires information on the exposure distribution. However, the average risk approach is intended for situations in which the exposure information is only available as a population average. In practice, many natural or physiological exposures have a non-zero minimal risk exposure value, and the estimation of PAF for such exposures requires additional considerations.36 Therefore, this implicit assumption is a substantial limitation of the approach. For the same reason, the average risk approach cannot be applied in the framework of the generalised impact fraction, in which the impact of a partial reduction of exposure is considered.
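To make the BMI example concrete, the Python sketch below contrasts the two approaches under the stated normal distribution (mean 25.0 kg/m², SD 5.0 kg/m²). The RR of 1.05 per kg/m² above 25.0 kg/m² is a hypothetical value chosen only for illustration, and the integral is approximated on a grid spanning four SDs either side of the mean.

```python
import math

RR_PER_UNIT = 1.05   # hypothetical RR per 1 kg/m2 of BMI above 25.0 kg/m2
MIN_RISK_BMI = 25.0  # minimal risk exposure value
MEAN, SD = 25.0, 5.0


def rr(bmi):
    """Log-linear RR above the minimal risk exposure value; RR = 1 at or below it."""
    return RR_PER_UNIT ** max(0.0, bmi - MIN_RISK_BMI)


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


# Integral approach, approximated on a grid from 5 to 45 kg/m2 (mean +/- 4 SD).
grid = [5.0 + 0.1 * i for i in range(401)]
weights = [normal_pdf(x, MEAN, SD) for x in grid]
total = sum(weights)
mean_rr = sum(w * rr(x) for w, x in zip(weights, grid)) / total
paf_int = (mean_rr - 1) / mean_rr

# Average risk approach: RR evaluated at the population mean BMI.
risk = rr(MEAN)
paf_avg = (risk - 1) / risk

print(f"PAF_int = {paf_int:.3f}, PAF_avg = {paf_avg:.3f}")
```

The average risk PAF is exactly zero because the RR at the mean is 1, whereas the integral PAF is clearly positive, driven by the half of the population above 25.0 kg/m².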

Our study has some limitations. First, this study is an empirical examination of the validity of the average risk approach. We demonstrated the direction of the bias mathematically, but discussed the magnitude of the bias associated with the RR function and the exposure distribution only qualitatively, illustrating it through several RR functions and exposure distributions. This pragmatic approach could not cover all possible RR functions and distributions. Second, we compared the average risk approach with the integral approach under the assumption of no confounding. The integral approach is an extension of Levin’s formula, which is biased in the presence of confounding.1 11 Ideally, the validity of the average risk approach should be tested against the integral form of Miettinen’s formula, which is based on the prevalence of exposure among the cases and is valid in the presence of confounding.6 However, because the average risk approach was developed within the framework of Levin’s formula, we considered that comparing the two approaches within the same framework would be more appropriate. Nevertheless, like Levin’s formula, the average risk approach is also susceptible to bias in the presence of confounding.

In conclusion, we have shown that the average risk approach has some utility but carries a risk of bias. This approach should not be used when the minimal risk exposure level is not zero. We recommend using approaches with a smaller risk of bias, such as the integral approach, to estimate PAF when information on the RR function and the exposure distribution is available.


Ethics statements

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Contributors YR participated in study conceptualisation, statistical analyses, drafted the initial manuscript and approved the final version of the manuscript. SW participated in study conceptualisation, supervision and critically reviewed and edited the manuscript. PG provided resources (ComPARe datasets), critically reviewed and edited the manuscript, and approved the final version of the manuscript. CMF participated in funding acquisition, supervision, critically reviewed and edited the manuscript, and approved the final version of the manuscript. DB participated in funding acquisition, supervision and approved the final version of the manuscript.

  • Funding This study was supported by the Canadian Cancer Society Partner Prevention Research Grant (grant #703106).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.