Design issues in studies of radon and lung cancer: implications of the joint effect of smoking and radon.

Many case-control studies have been undertaken to assess whether and to what extent residential radon exposure is a risk factor for lung cancer. Nearly all these studies have been conducted in populations including smokers and nonsmokers. In this paper, we show that, depending on the nature of the joint effect of radon and tobacco on lung cancer risk, it may be very difficult to detect a main effect due to radon in mixed smoking and nonsmoking populations. If the joint effect is closer to additive than multiplicative, the most cost-effective way to achieve adequate statistical power may be to conduct a study among never-smokers. Because the underlying joint effect is unknown, and because many studies have been carried out among mixed smoker and nonsmoker populations, it would be desirable to conduct some studies with adequate power among never-smokers only.

Domestic radon exposure is widespread, and estimates suggest that it is a major contributor to the national burden of lung cancer (1). Currently, the best available data upon which to base a risk assessment of residential radon exposure are derived from mining studies (2-5). Such studies have clearly established radon as a lung carcinogen, with consistent relative risk levels demonstrating a clear dose-dependent relationship between exposure and lung cancer. Nonetheless, adequate risk estimates derived directly from the residential environment are not yet available. Consequently, many studies have been undertaken to estimate the effect of domestic radon exposure directly. As demonstrated by Lubin et al. (6), detection of lung cancer attributable to radon will require large sample sizes and accurate exposure estimates, as well as sound research design. However, many of the studies published to date have been inadequate in one respect or another. Some have been ecologic in design rather than case-control (7-10); some have suffered from small sample sizes (11-13), inadequate exposure assessment (14,15), or relatively low and homogeneous exposure levels (16,17); and most have included smokers, which may have obscured the effect of radon. Critical summaries of published studies have been reported (18,19). There are approximately 20 current studies of domestic radon and lung cancer risk, which together include over 12,000 lung cancer cases and 19,000 referents (20). While these studies may have corrected many of the deficiencies of the earlier studies, there remains a paucity of subjects who are not current or former smokers. Thus, results from these studies will primarily reflect the relationship between lung cancer and radon among smokers. However, depending on the underlying joint effect between radon and smoking, the efficiency of these investigations may be compromised.
In fact, the ability of these studies to detect the effect of radon at risk levels with significant public health consequences could be obscured by the presence of the more potent carcinogen.
If the joint effect of cigarette smoking and radon exposure is multiplicative, then the relative risk of lung cancer due to radon will be the same among smokers and nonsmokers. Under those conditions, an epidemiologic study would have equivalent power or require equal sample sizes whether carried out among smokers, nonsmokers, or both. However, if the relationship is not multiplicative, then differences in power would result that could have a profound effect on the underlying levels of risk that could be detected with a case-control study.
Although the notion that the model of interaction between risk factors has an important impact on study design and statistical power is not novel (21,22), it is frequently not addressed in the design of investigations. In this paper, this notion will be illustrated with the example of radon and smoking, using a computer model to determine sample-size requirements. Sample size is a major limiting factor in the ability of an investigation to detect the carcinogenic effect of residential radon. Correspondingly, this analysis will demonstrate that if the true underlying joint effect is less than multiplicative, then studies among never-smokers may be more cost-effective than those among mixed smoking and nonsmoking populations.
Such studies could be undertaken at individual institutions with the ability to identify adequate numbers of never-smokers over the course of several years, or could be undertaken at multiple institutions.

Joint Effect and Relative Risk
Currently, the true nature of the joint carcinogenic effect of radon and smoking is unknown; it could conceivably range from supramultiplicative to subadditive. A recent analysis of 11 underground mining studies (3) provides the largest database for estimating the joint effect of radon and smoking. While the joint effect varies among the six cohorts for which smoking data were available, the average joint effect appears to be much closer to additive than multiplicative. Nonetheless, the existing scientific evidence is inadequate to either establish or rule out any particular joint-effect model. Although a model of constant levels of joint effect at various levels of smoking would be more parsimonious, a nonlinear model could also be considered, as well as models under which the joint effect varies with different temporal relationships between smoking and radon exposure.
There may be theoretical support for a model in which the interactive effects of the two risk factors depend on level of smoking. For example, if bronchitis developing in the heaviest smokers results in a mucous layer covering the respiratory epithelium, this may protect the basal cell layer from alpha-radiation.
If residential radon is an important risk factor for lung cancer, then the ability to detect this main effect critically depends on the unknown joint effect of radon and smoking. The following hypothetical example illustrates this phenomenon. Suppose the incidence of lung cancer among radon-unexposed never-smokers is 10/100,000. Further suppose that the incidence among unexposed smokers is 100/100,000 (smoking relative risk, $R_{0s}$ = 10), and that among radon-exposed never-smokers is 20/100,000 (radon relative risk, $R_{r0}$ = 2.0). If the true underlying joint effect between radon and smoking is multiplicative, then, as shown in Table 1, the incidence among exposed smokers would be 200/100,000 and the relative risk of lung cancer due to radon would be the same as for never-smokers ($R_{rs}/R_{0s}$ = 200/100 = 2.0).
On the other hand, under an additive model, the incidence among exposed smokers would be 110/100,000, resulting in an $R_{rs}/R_{0s}$ of only 1.1 compared with the radon relative risk of 2.0 among never-smokers. Figure 1 illustrates the relationship between the relative risk of lung cancer due to radon among smokers compared with that among never-smokers under the additive and multiplicative models.
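The arithmetic of this hypothetical example can be reproduced with a short sketch. It uses only the incidences stated above; the function and constant names are illustrative and do not come from the paper.

```python
# Worked version of the hypothetical example (incidences per 100,000).
BASELINE = 10.0     # radon-unexposed never-smokers
SMOKING_RR = 10.0   # smoking relative risk among the radon-unexposed
RADON_RR = 2.0      # radon relative risk among never-smokers


def exposed_smoker_incidence(model: str) -> float:
    """Incidence among radon-exposed smokers under the named joint-effect model."""
    if model == "multiplicative":
        # Relative risks multiply.
        return BASELINE * SMOKING_RR * RADON_RR
    if model == "additive":
        # Excess risks add: baseline + smoking excess + radon excess.
        return BASELINE * (SMOKING_RR + RADON_RR - 1.0)
    raise ValueError(f"unknown model: {model}")


def radon_rr_among_smokers(model: str) -> float:
    """Exposed-smoker incidence divided by unexposed-smoker incidence."""
    return exposed_smoker_incidence(model) / (BASELINE * SMOKING_RR)


for model in ("multiplicative", "additive"):
    print(model, exposed_smoker_incidence(model), radon_rr_among_smokers(model))
```

The multiplicative case gives an incidence of 200 and a radon relative risk of 2.0 among smokers; the additive case gives 110 and 1.1, matching the figures in the text.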
As the calculations presented below demonstrate, this effect could have dramatic implications for the sample-size requirement of a case-control study. This masking of a weak effect by a stronger one may be considered an example of the phenomenon underlying the advice of Rothman and Poole (21) to restrict studies to lower-risk populations in order to "strengthen weaker associations."

Methods
Two case-control design options were explored and are compared here: 1) carrying out a study in a mixed population of smokers and nonsmokers, made up of approximately 40.4% current smokers, 31.1% former smokers, and 28.5% never-smokers, and 2) carrying out a study among never-smokers only. In the latter case, it would of course be necessary to include a screening stage in the fieldwork plan to determine the smoking status of potential subjects.
Figure 1. Relative risk (RR) of radon exposure as a cause of lung cancer among smokers compared with never-smokers given additive and multiplicative models of joint effect between radon and smoking. Axis label: risk of lung cancer due to radon among never-smokers ($R_{r0}$).

A series of computations was performed, given the two design options, to explore the implications of the aforementioned phenomenon on sample-size requirements and costs. A number of assumptions were made, as follows: 1) The U.S. Environmental Protection Agency National Residential Radon Survey (23) was used to create a hypothetical distribution of radon exposures grouped into 43 exposure level bins. 2) There are 146,000 annual U.S. lung cancer deaths (LCDs) (24), and a varying proportion of these is due to radon. The number of LCDs attributed to radon is a direct function of relative risk in the overall population. For analyses under an additive model only, the number of LCDs due to radon was allowed to vary between 500 and 13,500. For analyses under varying joint-effect models, the number of LCDs due to radon was fixed at 5,000.
3) The prevalences of never, current, and former smoking were taken to be 28.5%, 40.4%, and 31.1%, respectively, based primarily on data available from the Surgeon General's report (25). 4) $R_{0s}$ and $R_{0f}$, the relative risks of lung cancer due to smoking among current and former smokers, are assumed to be 19.22 and 8.52, respectively (25). 5) Exposures to smoking and radon are independent. 6) The relationship between radon exposure and risk was assumed to be linear (consistent with current concepts of dose effects in radiation). 7) The model makes no attempt to adjust for the effects of population mobility and error in exposure assessment.
When estimating sample size, it is necessary to fix some level of excess risk as the detectable level of the effect to be examined. It is customary to use the relative risk as the parameter to be fixed. However, for the purpose of this analysis, it was considered more appropriate to use the attributable risk because the relative risk of a high-prevalence exposure in the general population may not be as relevant as the overall public health impact of the exposure. For an exposure as prevalent as radon, even a very low relative risk can result in a high attributable risk. For example, at a radon relative risk ($R_r$) of 1.1 among those exposed to the top quartile of radon levels, there would be almost 7,000 annual U.S. lung cancer deaths. While the failure to detect a relative risk as low as 1.1 might seem acceptable for many occupational or environmental exposures, this would result in a significant undetected hazard in the case of radon. The following formula expresses the joint effect of smoking and radon:

$$R_{rs} = [R_{r0} \times R_{0s}]^{q} \times [R_{r0} + R_{0s} - 1]^{1-q}$$

where q = 0 represents an additive joint effect, and q = 1 represents a multiplicative joint effect (6) (the analogous formula with "f" substituted for "s" was used for the joint effect of radon and former smoking). Because the value of q is unknown, it was allowed to vary between 0 and 1.2 to cover the range between additive and supramultiplicative.
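As a minimal sketch of the joint-effect parameter q, the code below assumes the mixture form $R_{rs} = [R_{r0} \times R_{0s}]^{q} \times [R_{r0} + R_{0s} - 1]^{1-q}$, which reduces to the additive model at q = 0 and the multiplicative model at q = 1. The function names are illustrative, and the input values are those of the earlier hypothetical example rather than estimates from any study.

```python
def joint_rr(radon_rr: float, smoking_rr: float, q: float) -> float:
    """Joint relative risk for radon-exposed smokers (assumed mixture form).

    q = 0 gives the additive model, q = 1 the multiplicative model,
    and q > 1 a supramultiplicative joint effect.
    """
    multiplicative = radon_rr * smoking_rr
    additive = radon_rr + smoking_rr - 1.0
    return multiplicative ** q * additive ** (1.0 - q)


def radon_rr_among_smokers(radon_rr: float, smoking_rr: float, q: float) -> float:
    """Radon relative risk within smokers: joint risk over smoking-only risk."""
    return joint_rr(radon_rr, smoking_rr, q) / smoking_rr


# Hypothetical example values: radon RR 2.0 in never-smokers, smoking RR 10.
for q in (0.0, 0.19, 0.5, 1.0, 1.2):
    print(f"q = {q:4.2f}  radon RR among smokers = "
          f"{radon_rr_among_smokers(2.0, 10.0, q):.3f}")
```

At q = 0 the radon relative risk among smokers is 1.1 and at q = 1 it is 2.0; intermediate values of q fall between these two, which is why the detectable effect in a smoker-dominated sample shrinks as the joint effect moves toward additive.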
Sample size and power estimates were calculated using the continuous exposure model method of Lubin et al. (26). Given each set of the quantities specified above, expected distributions of radon exposures for cases and controls were generated and used to calculate sample-size requirements to detect a radon effect, assuming 80% power, a 1:1 case-control ratio, and a two-tailed α = 0.05. These calculations are described in greater detail in the appendix. The computations were implemented using Microsoft Excel version 5.0 (Microsoft Corporation, Redmond, Washington).

Volume 103, Number 1, January 1995
Once the sample sizes were estimated, additional computations were performed to investigate the anticipated relative costs of various case-control studies, roughly based on estimates obtained from a pilot case-control study of never-smokers in Michigan. For the purpose of this paper, the cost estimates include the labor and materials associated with data collection (interviewing, field work, coordination); they do not include the costs of data analysis (principal and co-investigators, consultants) or institutional indirect costs. It was assumed that cases would be identified through the abstracting of hospital and clinic records, using an existing tumor registry system (e.g., a Surveillance, Epidemiology, and End Results Program database), and that these subjects would be subsequently interviewed to screen for smoking status. The costs for case ascertainment include the labor and materials to abstract information from hospital charts for all lung cancer cases, mail letters to physicians prior to patient contact, and coordinate the project. For never-smoker cases, costs would be much higher because approximately 20 times as many lung cancer patients must be identified to yield an equal number of never-smoker cases. Further, a brief telephone screening interview must be performed to identify smoking status. While cases drawn from the general population of lung cancer patients might cost only $30 each to ascertain, the never-smoker cases would cost approximately $400 each. The cost ratio (for ascertaining never-smoker cases versus all cases) is lower than 20:1 because of economies of scale and other operational factors that reduce the cost per registry patient of enrolling subjects in the never-smoker cohort. Control ascertainment would be through random-digit dialing methods and may be of similar cost in either study because smoking status would need to be matched in either case.
Exposure assessment would be similar regardless of subject status as a case, control, smoker, or never-smoker. This would require a lengthy interview of each subject, the identification and tracking of each of their prior residences (2.7 homes per subject), gaining entry to each home, deployment of an alpha-track measurement device, administration of a home survey, repeated telephone contacts during the year of measurement, collection of detectors from each home, along with an updated home survey, data entry, and project coordination. Table 2 lists the costs that were assumed for subjects in each of the two study designs under consideration. Costs are provided for both never-smokers and a mixed population of smokers and nonsmokers (in naturally occurring proportions).
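The structure of this cost comparison can be sketched as follows. Only the two case-ascertainment figures ($30 and $400) come from the text. The control-ascertainment and exposure-assessment costs below are hypothetical placeholders, not the Table 2 values, and the sketch assumes that each reported sample size is the number of cases, with an equal number of controls.

```python
def study_cost(n_cases: float,
               case_ascertainment: float,
               control_ascertainment: float,
               exposure_assessment: float,
               controls_per_case: float = 1.0) -> float:
    """Data-collection cost: ascertainment plus exposure assessment per subject."""
    n_controls = n_cases * controls_per_case
    case_cost = n_cases * (case_ascertainment + exposure_assessment)
    control_cost = n_controls * (control_ascertainment + exposure_assessment)
    return case_cost + control_cost


# Placeholders for illustration only; these are NOT the paper's Table 2 values.
HYPOTHETICAL_CONTROL_COST = 50.0      # per control, assumed equal in both designs
HYPOTHETICAL_EXPOSURE_COST = 1000.0   # per subject, same for every subject type

# Case-ascertainment costs ($400 and $30) are the figures stated in the text.
never_smoker_study = study_cost(577, 400.0, HYPOTHETICAL_CONTROL_COST,
                                HYPOTHETICAL_EXPOSURE_COST)
mixed_study = study_cost(5651, 30.0, HYPOTHETICAL_CONTROL_COST,
                         HYPOTHETICAL_EXPOSURE_COST)
print(f"never-smoker design: ${never_smoker_study:,.0f}")
print(f"mixed design:        ${mixed_study:,.0f}")
```

Because exposure assessment is incurred for every subject in either design, a higher per-case ascertainment cost among never-smokers can be outweighed when the required sample size is much smaller.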

Results
Under the assumptions listed in Methods, and assuming an additive joint-effect model with an attributable risk of 5000 lung cancer deaths per year, only 111 cases and controls would be required if never-smokers were studied. If a similar study were conducted in a mixed smoker and nonsmoker population, however, the sample size required would increase to 5651 cases and controls. If the underlying joint-effect model were not additive and the attributable risk were nonetheless fixed at 5000 annual cases, the sample-size requirement would remain constant at 5651 in the mixed population but would vary for never-smokers depending on the joint effect between radon and smoking. Under the additive model, only 111 cases and controls would be required, while under a multiplicative model, 5651 would be required. Figure 2 shows sample-size requirements for different values of the joint-effect parameter. With a fixed number of annual lung cancer deaths, the overall relative risk of lung cancer due to radon in the general population remains fixed, explaining the constant sample-size requirement under conditions of a mixed population. On the other hand, if a study were limited to never-smokers, the sample-size requirement would depend on the relative risk of radon among never-smokers. With the overall relative risk of lung cancer due to radon ($R_r$) fixed, the relative risk among never-smokers ($R_{r0}$) will decrease, while the relative risk among smokers ($R_{rs}/R_{0s}$) will increase, as the joint-effect parameter varies from additive (q = 0) to multiplicative (q = 1). As a result, the sample-size requirement is relatively low (111) when radon is responsible for 5000 annual LCDs under an additive model, whereas it is considerable (5651) when the same number of deaths is attributed under the multiplicative model. If the joint effect were taken to be 0.19 [the average of the joint-effect parameters reported in the recent National Cancer Institute report (3)], then the sample-size requirement would be 577 in a never-smoker study and 5651 in a mixed population study.
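The dependence of the never-smoker relative risk on q can be illustrated with a simplified sketch. It is not the authors' spreadsheet model: it collapses the exposure distribution into a single population-average radon contrast, assumes the mixture form of the joint effect with exponent q, and solves numerically for the never-smoker relative risk that reproduces a fixed number of attributable deaths. The prevalences, smoking relative risks, and the 146,000 annual deaths are taken from Methods; all function names are illustrative.

```python
PREVALENCE = {"never": 0.285, "current": 0.404, "former": 0.311}
SMOKING_RR = {"never": 1.0, "current": 19.22, "former": 8.52}
TOTAL_LCD = 146_000  # annual U.S. lung cancer deaths


def joint_rr(radon_rr: float, smoking_rr: float, q: float) -> float:
    """Assumed mixture form: additive at q = 0, multiplicative at q = 1."""
    return (radon_rr * smoking_rr) ** q * (radon_rr + smoking_rr - 1.0) ** (1.0 - q)


def overall_radon_rr(radon_rr_never: float, q: float) -> float:
    """Population relative risk of radon, averaged over smoking groups."""
    with_radon = sum(PREVALENCE[g] * joint_rr(radon_rr_never, SMOKING_RR[g], q)
                     for g in PREVALENCE)
    without_radon = sum(PREVALENCE[g] * SMOKING_RR[g] for g in PREVALENCE)
    return with_radon / without_radon


def never_smoker_rr(attributable_lcd: float, q: float) -> float:
    """Bisection for the never-smoker radon RR matching the attributable deaths."""
    target = TOTAL_LCD / (TOTAL_LCD - attributable_lcd)
    low, high = 1.0, 100.0
    for _ in range(200):
        mid = (low + high) / 2.0
        if overall_radon_rr(mid, q) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


for q in (0.0, 0.19, 1.0):
    print(f"q = {q:4.2f}  never-smoker radon RR = {never_smoker_rr(5000, q):.4f}")
```

Under these simplifying assumptions, 5000 attributable deaths correspond to a never-smoker relative risk of roughly 1.38 when the joint effect is additive but only about 1.04 when it is multiplicative, which is the contrast behind the difference in sample-size requirements.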
Assuming a purely additive model, Figure 3 presents the sample-size requirements as a function of the potency of radon as a carcinogen (expressed as a varying attributable risk) for both never-smokers and a mixed population. The analysis demonstrates that under the additive model, studying never-smokers would require up to two orders of magnitude fewer subjects than studying a mixed population, irrespective of the level of relative or attributable risk within the range of attributable risks studied (500 to 13,500 attributable cases per year). Figure 4 presents cost estimates for identifying subjects and assessing exposure in studies among never-smokers and the general population of lung cancer cases, based on the estimated sample-size requirements of Figure 2. A considerable savings can be appreciated for studies restricted to never-smokers when the joint effect is submultiplicative. In fact, if the true joint effect is approximately 0.19, as suggested by the weighted average of the joint-effect parameters found in the mining studies (3), then the cost estimate for a study of never-smokers is approximately eight times lower than a comparable study of a mixed population of current smokers, former smokers, and never-smokers.

Discussion
A large number of studies have been undertaken to examine the relationship between domestic radon and lung cancer. Since the vast majority of lung cancer patients (typically 95%) are or were smokers, most of these studies are dominated by smokers. If the joint effect between radon and smoking is multiplicative or supramultiplicative, some of these studies would have sufficient power to detect reasonable relative risks of lung cancer due to radon. On the other hand, if the joint effect is closer to additive, then it is unlikely that any of the current studies would have adequate power to detect the effect at risk levels that would be of public health importance. This could inappropriately lead to the conclusion that residential radon does not pose a significant carcinogenic risk.
One solution to this dilemma may be to increase sample sizes with additional studies among smokers. However, given the high cost of exposure assessment for radon, it would be more cost-effective to study a smaller cohort of never-smokers. This advantage appears to hold for most levels of joint effect below multiplicative, even though the cost of case ascertainment for never-smokers is high (it generally requires screening 20 cases to identify one never-smoker). Depending upon the sample size required (based upon the underlying joint effect), the difference in cost between studying smokers compared with never-smokers could be considerable. It may be that meta-analysis could extract additional useful information about the effect of radon among never-smokers. However, even when existing case-control studies are pooled, the total number of never-smokers remains small. This factor, and the usual heterogeneity of data sources, may still preclude a definitive conclusion about radon and lung cancer from a meta-analysis.
Another alternative might be to oversample never-smokers using a randomized recruitment approach to probability matching on smoking status, as suggested by Weinberg and Sandler (27). Such an approach would allow one to take advantage of the extra information the never-smokers would provide if an additive (or submultiplicative) joint effect holds. This extra information is reflected in the contrasting sample sizes of 111 for never-smokers versus 5651 for smokers. If a sample size between these two extremes could be afforded, including data on smokers collected in the same study might allow for assessment of interaction. However, such a study would have a sample-size requirement of greater than 111 for the never-smoker stratum, since either the main effect estimate would be diluted by a much smaller effect of radon among the smokers, or a multiple comparison adjustment would be indicated for the separate subgroup tests.
If there had been no previous investigations, then the case for studying never-smokers versus smokers would not be as clear-cut. In fact, it may be argued that these strategies evaluate different hypotheses. While the smoker study would also be able to examine the hypothesis of joint effect of smoking and radon, the never-smoker study may have greater power to examine the main effect of radon alone. Further, a never-smoker study might provide an opportunity to simultaneously examine the effects of passive smoking, which could not be examined among smokers, since the effect would be overwhelmed by the much stronger effect of active smoking.
The current ongoing studies may be adequate to evaluate the risk of radon among smokers if the joint smoking-radon effect is close to multiplicative. However, if the true joint effect between smoking and radon is close to additive, it would be more cost-effective to study never-smokers. Although case ascertainment costs would be considerably higher in a never-smoker study, this would be more than offset by the reduced sample-size requirement, which would reduce exposure assessment costs. However, a joint effect close to additive would also imply that the public health burden of radon as a lung carcinogen, as measured by the number of attributable cases, is less than it would be if the true joint effect approaches a multiplicative or supramultiplicative model. In this sense, the need to detect the risk associated with radon may be greater if the multiplicative model were correct.
Nevertheless, under the additive model, radon could be responsible for a large number of annual lung cancer cases, which would be extremely difficult to detect with a study of smokers. As previously mentioned, under an additive model, even very low relative risk levels can result in considerable mortality due to the large population exposed to radon and the high overall lung cancer mortality rate.
The present evaluation of differential costs for studies using different strategies was predicated on several simplifying assumptions, and varying these assumptions could affect the conclusions. First, the actual costs of screening for never-smokers and performing other aspects of an investigation may vary depending on where the study is undertaken. The relative advantages of one strategy over another may be affected if relative costs were very different from those assumed in Table 2. The estimates used in this analysis were based on an investigation performed at a center with a tumor registry having the capabilities for rapid reporting of lung cancer cases. If an abstracting unit were not already in existence, then the costs of case ascertainment could be much higher, and this could affect the relative cost-effectiveness of the study design being considered.
Second, a variety of other factors that may increase sample-size requirements have not been considered in this analysis, including measurement error in the use of alpha-track devices, use of current exposure data as a surrogate for historic exposure, error in the reconstruction of home occupancy dates, loss of data on exposure (e.g., homes may be unavailable for measurement; occupancy durations may not meet inclusion criteria for study), use of ambient measurements in fixed locations rather than where the subjects actually spent their time (both within the home and outside the home), inadequacy of ambient radon levels as a measure of alpha radiation dose to the respiratory epithelium, use of radon gas measurements as a surrogate for radon progeny exposure, error associated with estimating the temporal association between radon and lung cancer (e.g., latency periods), inadequate accounting of other carcinogenic exposures (e.g., inaccurate active or passive smoking histories), age effects, and nonresidential radon exposures. Further, variations of the additional assumptions described in the Methods section could affect the outcomes. Although such factors and others may have a significant effect on ultimate sample-size requirements, they should not affect the comparison of design strategies.

Although the focus of this paper was on radon and lung cancer, the underlying principles apply more generally. The assessment of sample size for case-control studies is usually based on an implicit assumption that the risk factor under study interacts in a multiplicative fashion with other risk factors. When this assumption of multiplicativity does not hold, sample-size estimates may not be accurate. When the joint effect is less than multiplicative, larger samples may be required, or else the study could be restricted to a subgroup in which the other risk factors are absent.
The optimal approach would depend on the relative costs of identifying and studying the subgroup as compared with a larger, unrestricted cohort. Unless models other than multiplicative are considered, negative study outcomes may not support the conclusion that an association is absent. A lack of association cannot always be inferred until the body of studies includes a reasonable complement of approaches.
In addition to selecting the population based on smoking status, investigators should also consider other important design issues. The regions for study should have relatively high and heterogeneous exposure levels. This analysis was based on the national exposure distribution in the United States. However, sample-size requirements may be reduced for studies carried out in areas with higher exposure levels. Because population mobility will homogenize exposure and reduce study power (6), areas with low mobility are preferred. Effort must be made to maximize yield and minimize missing data in the reconstruction of the residential histories and in the placement of radon detectors in homes. The most commonly recommended method for exposure assessment is the year-long alpha-track measurement (28). However, such measurements examine only current exposure at fixed ambient locations in the home. A possible complementary method of exposure measurement might include the use of heirloom glass or porcelain objects as dosimeters, or measurements of lead-210 emissions made in vivo (29). The relationship between these measurements and actual historic cumulative dose to the human respiratory epithelium is not well established.
In conclusion, given the lack of well-designed studies restricted to never-smokers, such an investigation may be indicated to examine the hypothesis that radon poses a significant public health risk to both smokers and nonsmokers under a submultiplicative, additive, or nonlinear model of joint effect between radon and smoking.

Appendix
... computed, i.e.: 146,000/(146,000 - 15,000) = 1.1145.
Step 3: From the above relationships, the quantities $C_{r,s}$, $C_{r,f}$, and $C_{r,n}$ may be computed.
Step 4: The quantities $C_{r,s}$, $C_{r,f}$, and $C_{r,n}$ are then used to impute the distributions of radon exposure levels for current smokers, former smokers, and never-smokers. These are in turn used to compute power or sample size using the Lubin method (26).
The starting point for the exposure distributions is the recent EPA residential radon survey data (30). The exposure levels were grouped into convenient intervals of varying widths. Cutpoints were 0, 0.0625, 0.125, . . . 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, . . . 5.0, 6.0, 7.0, . . . 30.0, and 53.0 pCi/L. All subjects in an interval were assumed to have been exposed to the midpoint value, except for the last interval, where the exposure was 53.0 pCi/L. It is assumed that the rate of radon-induced lung cancer is linearly related to the exposure level. Therefore, the proportion of lung cancers in the ith interval is estimated to be: where $H_i$ is the proportion of homes in the ith exposure interval. If $C_{r,s}$, $C_{r,f}$, and $C_{r,n}$ are substituted for $C_r$, and $C_s$, $C_f$, and $C_n$ for $C$, respectively, in the above formula, proportions of lung cancers at each exposure level for current, former, and never-smokers may be estimated.
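The imputation of case exposure distributions under the linearity assumption can be sketched as follows. The closed form used here, $H_i[(1 - AF) + AF \cdot x_i/\bar{x}]$ with $AF = C_r/C$ and $\bar{x} = \sum_j H_j x_j$, is an assumption: it follows from a linear relative-risk model, but it may not be the exact expression used in this appendix. The four bins and their home proportions are hypothetical and are not the EPA survey data.

```python
from typing import List, Sequence


def case_exposure_distribution(home_proportions: Sequence[float],
                               exposures: Sequence[float],
                               attributable_fraction: float) -> List[float]:
    """Proportion of lung cancers in each exposure bin under a linear model.

    home_proportions      -- H_i, share of homes in each bin (sums to 1)
    exposures             -- x_i, exposure assigned to each bin (e.g., midpoint)
    attributable_fraction -- C_r / C, share of lung cancers due to radon
    """
    mean_exposure = sum(h * x for h, x in zip(home_proportions, exposures))
    baseline_share = 1.0 - attributable_fraction
    return [h * (baseline_share + attributable_fraction * x / mean_exposure)
            for h, x in zip(home_proportions, exposures)]


# Hypothetical four-bin distribution, for illustration only.
H = [0.50, 0.30, 0.15, 0.05]
x = [0.5, 1.5, 3.0, 8.0]      # pCi/L
af = 5_000 / 146_000          # 5,000 of 146,000 annual deaths due to radon

cases = case_exposure_distribution(H, x, af)
for h_i, x_i, p_i in zip(H, x, cases):
    print(f"x = {x_i:4.1f}  homes = {h_i:.3f}  cases = {p_i:.4f}")
```

The case proportions sum to 1 and shift toward the higher-exposure bins as the attributable fraction grows. Substituting a group-specific attributable fraction for current, former, or never-smokers would give the corresponding group-specific distribution.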