Data on cardiovascular and pulmonary diseases among smokers of menthol and non-menthol cigarettes compiled from the National Health and Nutrition Examination Survey (NHANES), 1999-2012.

This Data in Brief contains results from three different survey logistic regression models comparing risks of self-reported diagnoses of cardiovascular and pulmonary diseases among smokers of menthol and non-menthol cigarettes. Analyses employ data from National Health and Nutrition Examination Survey (NHANES) cycles administered between 1999 and 2012, combined and in subsets. Raw data may be downloaded from the National Center for Health Statistics. Results changed little with the choice of covariates included in the models, but depended strongly on the NHANES cycles included in the analysis. All three models returned elevated risk estimates for three endpoints when run on individual NHANES cycles (congestive heart failure in 2001-02; hypertension in 2003-04; and chronic obstructive pulmonary disease in 2005-06), and all three models returned null results for these endpoints when data from 1999-2012 were combined.


Value of the data
Results of different models run on the same data set provide insights into how the data (i.e., which cycles of NHANES) and the covariates selected for inclusion in a model influence risk estimates.
Estimates based on individual (i.e., 2-year) cycles of the NHANES versus estimates from combined cycles of NHANES show inconsistency and illustrate that analyses using individual cycles should not be used to draw causal inferences about the population.
The data provided here allow comparisons between analyses published in two recent papers that reported contradictory results.

Experimental design, materials and methods
Two recent publications reported contradictory findings from analyses of data from the National Health and Nutrition Examination Survey (NHANES). Vozoris reported statistically significantly increased adjusted odds of stroke diagnosis among menthol compared with non-menthol cigarette smokers, in particular among non-African Americans, using data from the 2007-2008 cycle of NHANES (incorrectly reported as 2001-2008) [5]. Rostron did not detect a difference in stroke risk among smokers of menthol compared with non-menthol cigarettes, based on analyses of NHANES data from the 1999 through 2010 cycles [3]. Our investigation of the reasons for the discordant results reported by Vozoris and Rostron with respect to stroke risk, and the results of new analyses comparing stroke risks among smokers of menthol and non-menthol cigarettes that use all NHANES cycles from 1999 through 2012, are available elsewhere [4]. The differences between the Vozoris [5] and Rostron [3] results were shown to be mainly due to the inadvertent exclusion of all but the 2007-2008 NHANES data from the Vozoris [5] analysis. The data presented here examine risks of the other endpoints evaluated by Vozoris, i.e., hypertension (HTN), myocardial infarction (MI), congestive heart failure (CHF), and chronic obstructive pulmonary disease (COPD), among smokers of menthol compared with non-menthol cigarettes, estimated according to three different logistic regression models: 1) models proposed by Vozoris, using NHANES 2007-2008, 1999-2010, and 1999-2012; 2) models proposed by Rostron, using NHANES 2007-2008, 1999-2010, and 1999-2012; and 3) a new set of models we developed with purposeful selection techniques using NHANES 1999-2012.
NHANES is a nationally representative survey of non-institutionalized US civilians. It is conducted in two-year cycles, with approximately 10,000 individuals in each cycle. Interviews elicit information on demographic characteristics (e.g., age, gender, race/ethnicity), smoking habits, and whether a health professional had ever diagnosed the participant with certain medical conditions. Cycles of the NHANES can be combined, or they can be analyzed individually. Because NHANES employs a complex, multistage sampling strategy, survey statistics must be used to analyze the data and to generalize findings to the US population. In this case, we used the SURVEYLOGISTIC procedure of SAS/STAT© version 9.4 to perform logistic regression accounting for the complex sampling design, i.e., using both the masked variance pseudo-primary sampling unit (SDMVPSU) and the masked variance pseudostratum (SDMVSTRA) variables, using the adjusted 2-year interview weight (WTINT2YR), and using Taylor series linearization to estimate the covariance matrix. Weights were adjusted for the inclusion of multiple surveys [2] by dividing the WTINT2YR variable by the number of cycles used in each analysis. We additionally ran all models within strata defined by age, race/ethnicity, and gender, using the SAS DOMAIN statement to specify these subpopulations and to ensure the variance and standard errors were calculated correctly. See associated file SAS CODE.DOCX for the code to combine the cycles of NHANES with common variables and an example of the Proc Logistic code used for analysis.
Following both Vozoris and Rostron, we defined current smokers as those who had smoked on ≥1 of the last 30 days and who were ≥20 years old at the time of the interview. Table 1 shows the variables we used in these analyses.
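The weight adjustment and the current-smoker definition described above can be illustrated with a minimal Python sketch. It is not the SAS code in the associated file; the function names and argument names are hypothetical, and the smoking and age thresholds are read here as "at least 1 day" and "at least 20 years".

```python
from typing import Optional


def combined_weight(wtint2yr: float, n_cycles: int) -> float:
    """Rescale a 2-year interview weight (WTINT2YR) for pooled cycles.

    Follows the rule stated in the text: divide the weight by the number
    of NHANES cycles included in the analysis.
    """
    if n_cycles < 1:
        raise ValueError("n_cycles must be at least 1")
    return wtint2yr / n_cycles


def is_current_smoker(days_smoked_last_30: Optional[int],
                      age_years: Optional[int]) -> bool:
    """Apply the current-smoker definition (assumed thresholds: >=1 day, >=20 years).

    Missing values (None) are treated as not meeting the definition;
    that handling is an assumption of this sketch.
    """
    if days_smoked_last_30 is None or age_years is None:
        return False
    return days_smoked_last_30 >= 1 and age_years >= 20
```

For example, pooling the seven cycles from 1999-2000 through 2011-2012 would divide each participant's WTINT2YR by 7, while the 1999-2010 analyses would divide by 6.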
We identified cases by their self-reported diagnoses according to the question "has a doctor or other health professional ever told you that you had [high blood pressure, a heart attack, congestive heart failure, a stroke, or COPD (emphysema or chronic bronchitis)]" (yes/no). We considered all other responses to be non-responses and set them to missing. Stroke was the subject of Van Landingham et al. [4], and data are not presented here.
We ran three sets of models for each outcome using data from NHANES 2007 to 2008 (as used by Vozoris), from 1999 to 2010 (as used by Rostron), and from 1999 to 2012 (all cycles available when we undertook the project) to determine whether the selection of covariates or cycles of the NHANES influenced the results. First, we implemented the model described by Vozoris (Tables 2-4); second, we implemented the model described by Rostron (Tables 5-7); last, we developed a new model for each outcome using purposeful selection of covariates (Table 8). Purposeful selection of covariates was conducted as follows: a preliminary model consisted of cigarette type (menthol or non-menthol) and all relevant, potential covariates (Table 1), with cigarette type forced to remain in all models. We identified each covariate, other than cigarette type, with a p-value greater than 0.05. We refit the model after dropping the covariate with the largest p-value, and repeated this step until only cigarette type and covariates with p-values of 0.05 or less remained. Each covariate that had been dropped was then added back individually, and we calculated the relative percent change in the regression coefficient for cigarette type for the larger model compared with the model containing only statistically significant covariates (Eq. (1)). If including a given covariate resulted in a relative percent change in the regression coefficient greater than 15%, that covariate was retained in the model.
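Eq. (1) is not reproduced in this text. A common form of the relative percent change used in purposeful selection is given below as a hedged reconstruction; the symbols, and the choice of the reduced-model coefficient as the denominator, are assumptions rather than the authors' stated definition.

```latex
\Delta\hat{\beta}\,\% \;=\; 100 \times
\frac{\hat{\beta}_{\text{larger}} - \hat{\beta}_{\text{reduced}}}
     {\hat{\beta}_{\text{reduced}}}
```

Here \hat{\beta}_{\text{reduced}} denotes the cigarette-type coefficient from the model containing only statistically significant covariates, and \hat{\beta}_{\text{larger}} denotes the same coefficient after one dropped covariate is added back.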
Once we determined the covariates to include in the model (main effects), we explored all possible interactions between the covariates (excluding cigarette type). We added each interaction term with a p-value less than or equal to 0.1 to the model individually, along with the main effect terms, and retained it if the relevant coefficient in the fully adjusted model was statistically significant, with a p-value of 0.05 or less. We retained statistically significant interaction terms in the model only if one or both main effects were also statistically significant. We used domain variables to define strata according to race/ethnicity, gender, and age group, but did not repeat the model building process. We then re-ran each model for individual cycles of the NHANES in order to determine whether there were anomalous or secular patterns in the risk of any outcome that might be overlooked in the combined analysis (Figs. 1-4).
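The main-effects part of the covariate selection described above (backward elimination followed by the add-back check against a 15% change) can be sketched in Python as follows. This is an illustrative outline only, not the authors' SAS implementation: the `fit` callable, the toy model, and all names are hypothetical. Using the absolute value of the change and the reduced-model coefficient as the denominator are assumptions. The interaction-term step is not covered.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: fit(covariates) returns the cigarette-type
# coefficient and a p-value for each covariate in the fitted model.
FitFn = Callable[[List[str]], Tuple[float, Dict[str, float]]]


def relative_percent_change(beta_larger: float, beta_reduced: float) -> float:
    # Assumed form of the relative percent change (reduced model as denominator).
    # Undefined if the reduced-model coefficient is exactly zero.
    return 100.0 * (beta_larger - beta_reduced) / beta_reduced


def purposeful_selection(candidates: List[str], fit: FitFn,
                         alpha: float = 0.05,
                         change_threshold: float = 15.0) -> List[str]:
    kept = list(candidates)
    dropped: List[str] = []
    # Step 1: drop the covariate with the largest p-value and refit,
    # until every remaining covariate has p <= alpha.
    while kept:
        _, pvals = fit(kept)
        worst = max(kept, key=lambda c: pvals[c])
        if pvals[worst] <= alpha:
            break
        kept.remove(worst)
        dropped.append(worst)
    # Step 2: add each dropped covariate back individually and retain it
    # if the cigarette-type coefficient changes by more than the threshold.
    beta_reduced, _ = fit(kept)
    retained = list(kept)
    for cov in dropped:
        beta_larger, _ = fit(kept + [cov])
        if abs(relative_percent_change(beta_larger, beta_reduced)) > change_threshold:
            retained.append(cov)
    return retained


def toy_fit(covs: List[str]) -> Tuple[float, Dict[str, float]]:
    # Made-up numbers for demonstration only; not NHANES results.
    pvals = {"age": 0.001, "sex": 0.30, "income": 0.60}
    beta = 0.50 + (0.10 if "sex" in covs else 0.0) + (0.01 if "income" in covs else 0.0)
    return beta, {c: pvals[c] for c in covs}


selected = purposeful_selection(["age", "sex", "income"], toy_fit)
```

In the toy example, "sex" is not statistically significant but shifts the cigarette-type coefficient by roughly 20%, so it is retained as a confounder, whereas "income" shifts it by about 2% and stays out. Passing the model-fitting routine in as a callable keeps the selection logic independent of the survey software used to fit each model.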