Racial and Gender Disparities in Incidence of Lung and Bronchus Cancer in the United States: A Longitudinal Analysis

Background Certain population groups in the United States carry a disproportionate burden of cancer. This work models and analyzes the dynamics of lung and bronchus cancer age-adjusted incidence rates by race (White and Black), gender (male and female), and prevalence of daily smoking in 38 U.S. states, the District of Columbia, and across eight U.S. geographic regions from 1999 to 2012. Methods Data, obtained from the U.S. Cancer Statistics Section of the Centers for Disease Control and Prevention, reflect approximately 77% of the U.S. population and constitute a representative sample for making inferences about incidence rates in lung and bronchus cancer (henceforth lung cancer). A longitudinal linear mixed-effects model was used to study lung cancer incidence rates and to estimate incidence rate as a function of time, race, gender, and prevalence of daily smoking. Results Between 1999 and 2012, age-adjusted incidence rates in lung cancer have decreased in all states and regions. However, racial and gender disparities remain. Whites continue to have lower age-adjusted incidence rates for this cancer than Blacks in all states and in five of the eight U.S. geographic regions. Disparities in incidence rates between Black and White men are significantly larger than those between Black and White women, with Black men having the highest incidence rate of all subgroups. Assuming that lung cancer incidence rates remain within reasonable range, the model predicts that the gender gap in the incidence rate for Whites would disappear by mid-2018, and for Blacks by 2026. However, the racial gap in lung cancer incidence rates among Black and White males will remain. Among all geographic regions, the Mid-South has the highest overall lung cancer incidence rate and the highest incidence rate for Whites, while the Midwest has the highest incidence rate for Blacks. Between 1999 and 2012, there was a downward trend in the prevalence of daily smokers in both genders. 
However, males have significantly higher rates of cigarette smoking than females at all time points. The highest and lowest prevalence of daily smoking are found in the Mid-South and New England, respectively. There was a significant correlation between lung cancer incidence rates and smoking prevalence in all geographic regions, indicating a strong influence of cigarette smoking on regional lung cancer incidence rates. Conclusion Although age-adjusted incidence rates in lung cancer have decreased throughout the U.S., racial and gender disparities remain. This longitudinal model can help health professionals and policy makers make predictions of age-adjusted incidence rates for lung cancer in the U.S. in the next five to ten years.


Introduction
Lung and bronchus cancer is the second most commonly diagnosed cancer (excluding nonmelanoma skin cancer) in the United States [1]. Approximately 7.7% of men and 6.3% of women will be diagnosed with this cancer during their lifetime. The risk factors for lung and bronchus cancer (henceforth lung cancer) are tobacco use, family history, and environmental and occupational exposures, such as second-hand smoke, radon, and asbestos [2]. According to the Centers for Disease Control and Prevention, cigarette smoking is the number one risk factor for lung cancer, linked to about 80% to 90% of lung cancers in the United States [3]. Racial/ethnic and socio-demographic differences exist in the prevalence of cigarette smoking, with higher use among males, especially Black males, persons with lower education, and those with annual household income less than $20,000 [4]. Similarly, lung cancer incidence rates vary substantially by gender, race/ethnicity, socioeconomic status, and geography, in large part because of differences in cigarette-smoking patterns [5]. Lung cancer incidence is higher among Black males, people of lower socioeconomic status, and persons living in the South [6].
Although reduced tobacco use and increased prevention and early detection efforts have improved lung cancer outcomes for both men and women, disparities remain [7,8]. The National Cancer Institute defines cancer disparities as adverse differences in cancer incidence, prevalence, mortality, survivorship, and burden among specific population groups [9]. Several studies have reported pronounced racial/ethnic disparities in lung cancer in the United States [10][11][12][13]. Others have demonstrated that adjusting for socioeconomic status virtually eliminates racial/ethnic differences in stage-adjusted lung cancer mortality [14]. In this study, we investigate disparities in incidence rates of lung cancer by race (Black/White), gender, and prevalence of daily smoking in 38 U.S. states, the District of Columbia, and the eight U.S. geographic regions between 1999 and 2012.
Because of variation in confounders of lung cancer incidence at state, regional, and group levels, the longitudinal analysis was applied as a linear mixed-effects model. Mixed-effects models are powerful tools in longitudinal studies because they allow for a simultaneous estimation of fixed and random coefficients. Random coefficients capture variation among and within elements, whereas fixed parameters improve the predictions through calibration of various dynamics of the model. When the longitudinal response is discrete, generalized linear [15] and nonlinear [16] mixed-effects models are more appropriate for relating changes in the mean response to covariates [17]. Tabatabai et al. [18,19] were the first to use a longitudinal hyperbolastic mixed-effects type II model in cervical cancer research. They analyzed disparities in cervical cancer mortality rates between White and Black women in 13 U.S. states between 1975 and 2010 and attributed racial disparities to differences in socioeconomic factors, such as education and poverty levels, as well as to screening and treatment modalities.
Funding: [...] Institute grant 2U54-CA118948 also supported this work. Mona N. Fouad and Karan P. Singh received this funding. This work was also supported in part by the National Institute on Minority Health and Health Disparities (http://www.nimhd.nih.gov/) (grant U54MD008176, Mona N. Fouad, P.I.). Mona Fouad, Karan P. Singh, and Gabriela R. Oates were supported by this funding. Also, this work was supported intramurally by Cameron University, Lawton, Oklahoma (http://www.cameron.edu/) (Jean-Jacques Kengwoung-Keumo and Akinola Akinlawon were supported by this funding). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
In this paper, we use a longitudinal linear mixed-effects model to analyze disparities in lung cancer incidence in the United States and establish a predictive model that estimates lung cancer incidence rate as a function of time, race, gender, and prevalence of daily smoking.

Methods
Lung cancer incidence data for 1999-2012 were obtained from the U.S. Department of Health and Human Services Cancer Statistics Section, available through the Centers for Disease Control and Prevention WONDER database [20]. The data set included 38 U.S. states (Alabama, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Nebraska, Nevada, New Jersey, New York, North Carolina, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, Tennessee, Texas, Virginia, Washington, West Virginia, and Wisconsin) and the District of Columbia. The total number of observations in this study is 2,169. The choice of states was based on data availability for both racial groups. We doubled the sample size by separating Blacks from Whites. The combined data reflect approximately 77% of the U.S. population and constitute a representative sample for making inferences about lung cancer incidence rates in the U.S.
Smoking prevalence rates were obtained from Dwyer-Lindgren et al. [21], who performed a comprehensive study of smoking prevalence in U.S. counties using data on over 4 million adults from the Behavioral Risk Factor Surveillance System (BRFSS) for 1996-2012 [22]. The authors utilized the BRFSS data and applied validated small-area estimation techniques to estimate daily cigarette smoking prevalence for U.S. counties. Using the county data for male and female smokers from 1999 to 2012, we estimated the percent of daily smokers for all 38 states and the District of Columbia.
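The aggregation from county-level to state-level prevalence is not described in detail above. A minimal sketch of one plausible approach, assuming a population-weighted average of county estimates (the weighting scheme, the function name `state_prevalence`, and the county figures are illustrative assumptions, not the procedure or data used in this study), is:

```python
def state_prevalence(counties):
    """Population-weighted mean of county daily-smoking prevalence (percent).

    `counties` is a list of (adult_population, prevalence_percent) pairs.
    Population weighting is an ASSUMPTION made for illustration; the text
    does not state how county estimates were combined.
    """
    total_population = sum(pop for pop, _ in counties)
    if total_population <= 0:
        raise ValueError("total adult population must be positive")
    weighted_sum = sum(pop * prev for pop, prev in counties)
    return weighted_sum / total_population


# Hypothetical counties: (adult population, percent daily smokers).
example_counties = [(100_000, 20.0), (300_000, 16.0), (50_000, 25.0)]
print(round(state_prevalence(example_counties), 2))
```

Weighting by adult population keeps large counties from being outvoted by many small ones; an unweighted mean would be a different, equally hypothetical, choice.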
We used a mixed-effects model because of observed variability in lung cancer incidence rates at state and regional levels. After initial screening of the data, including tests for linearity, we selected a linear mixed-effects model (Eq 1), in which the response variable $Y_{ij}$ is the $i$th lung cancer incidence rate in the $j$th state at time $T_{ij}$ ($i = 1, 2, \ldots, n_j$), and $n_j$ is the number of observations from the $j$th state. $\beta_{0j}$ and $\beta_{4j}$ are unknown state-specific regression coefficients, and $\beta_{0j} = \beta_0 + U_{0j}$ and $\beta_{4j} = \beta_4 + U_{1j}$ are used to explain the observed variability between the states with respect to daily smoking. $\beta_0$ and $\beta_4$ are unknown regression parameters, and $U_{0j}$ and $U_{1j}$ are state-specific random effects: $U_{0j}$ expresses how much the intercept of state $j$, denoted by $\beta_{0j}$, deviates from the global intercept $\beta_0$, and $U_{1j}$ expresses how much the slope of the percent of daily smokers for state $j$, denoted by $\beta_{4j}$, deviates from the global slope $\beta_4$. The remaining $\beta$ terms are fixed regression coefficients. $R_{ij}$ is the race indicator (0 for Black, 1 for White), $G_{ij}$ is the gender indicator (1 for male), $S_{ij}$ is the percent of daily smokers, and $\varepsilon_{ij}$ is the error term. Eq 1 can be presented in an expanded form (Eq 2).
By replacing $R_{ij}$ by 0 and $G_{ij}$ by 1 in Eq 2, we obtain the following equation for the lung cancer incidence rate for Black males:
$$Y_{ij} = (\beta_{0j} + \beta_2) + (\beta_3 + \beta_5)\,T_{ij} + (\beta_{4j} + \beta_7)\,S_{ij} + \varepsilon_{ij}$$
A one-percentage-point increase in the prevalence of daily cigarette smoking among Black males would result in an increase of $(\beta_{4j} + \beta_7)$ cases per 100,000 in the incidence of lung cancer for Black males. For White males, replacing $R_{ij}$ by 1 and $G_{ij}$ by 1 in Eq 2 gives the following equation for the lung cancer incidence rate:
$$Y_{ij} = (\beta_{0j} + \beta_1 + \beta_2) + (\beta_3 + \beta_5 + \beta_6)\,T_{ij} + (\beta_{4j} + \beta_7 + \beta_8 + \beta_9 T_{ij})\,S_{ij} + \varepsilon_{ij}$$
A one-percentage-point increase in the prevalence of daily cigarette smoking among White males would result in an increase of $(\beta_{4j} + \beta_7 + \beta_8 + \beta_9 T_{ij})$ cases per 100,000 in the incidence of lung cancer for White males.
Similarly, by replacing the corresponding numbers for race and gender using Eq 2, we obtain the equations for White females and Black females.
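The two male subgroup equations can be sketched in code to make the role of each coefficient explicit. The following is a minimal illustration, assuming the race coding used above (0 for Black, 1 for White); the function names and all coefficient values are hypothetical placeholders chosen for arithmetic convenience and are not the estimates reported in Table 1.

```python
def male_incidence(race, t, s, beta, b0j, b4j, eps=0.0):
    """Incidence rate per 100,000 for males in state j.

    race: 0 for Black, 1 for White (coding taken from the text).
    t, s: time and percent of daily smokers.
    beta: dict of fixed coefficients keyed by index (1..9, without 4).
    b0j, b4j: state-specific intercept and smoking slope.
    """
    if race == 0:  # Black males
        return (b0j + beta[2]) + (beta[3] + beta[5]) * t + (b4j + beta[7]) * s + eps
    if race == 1:  # White males
        return ((b0j + beta[1] + beta[2])
                + (beta[3] + beta[5] + beta[6]) * t
                + (b4j + beta[7] + beta[8] + beta[9] * t) * s
                + eps)
    raise ValueError("race must be 0 (Black) or 1 (White)")


def male_smoking_slope(race, t, beta, b4j):
    """Change in incidence per one-percentage-point increase in daily smoking."""
    if race == 0:
        return b4j + beta[7]
    if race == 1:
        return b4j + beta[7] + beta[8] + beta[9] * t
    raise ValueError("race must be 0 (Black) or 1 (White)")


# HYPOTHETICAL values for illustration only (not the Table 1 estimates).
HYPOTHETICAL_BETA = {1: -5.0, 2: 30.0, 3: -1.0, 5: -0.5,
                     6: 0.2, 7: 1.0, 8: -0.4, 9: -0.1}
B0J, B4J = 20.0, 2.0

print(male_incidence(0, 5, 20.0, HYPOTHETICAL_BETA, B0J, B4J))
print(male_smoking_slope(1, 5, HYPOTHETICAL_BETA, B4J))
```

Note that the smoking slope is constant in time for Black males but depends on $T_{ij}$ for White males, which is the interaction the equations encode.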
According to the Surveillance, Epidemiology, and End Results (SEER) data, the formula for incidence rate is:
$$\text{Incidence rate} = \frac{\text{New cancers}}{\text{Population}} \times 100{,}000$$
Age-adjusted incidence rates were calculated with age distribution ratios from the year 2000 standard million population and are shown per 100,000 people. For brevity, in the remainder of this paper we use incidence rates to mean age-adjusted incidence rates. Computer-based analyses were performed with SAS version 9.3, SPSS version 23, and Mathematica version 10.
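As an illustration of how an age-adjusted rate is formed, the sketch below applies direct standardization: each age-specific rate is weighted by that age group's share of a standard population. The three age groups and all counts are hypothetical; the actual rates in this study were obtained already age-adjusted from the source database.

```python
def crude_rate(new_cases, population, per=100_000):
    """Crude incidence rate: new cases divided by population, per 100,000."""
    return new_cases / population * per


def age_adjusted_rate(age_groups, per=100_000):
    """Directly standardized rate.

    age_groups: list of (new_cases, population, standard_population) tuples,
    one per age group. Each age-specific rate is weighted by the group's
    proportion of the standard population.
    """
    total_standard = sum(std for _, _, std in age_groups)
    weighted = sum((cases / pop) * (std / total_standard)
                   for cases, pop, std in age_groups)
    return weighted * per


# HYPOTHETICAL age groups: (new cases, population, standard population).
example_groups = [
    (10, 100_000, 500_000),   # younger group
    (50, 50_000, 300_000),    # middle group
    (100, 20_000, 200_000),   # older group
]
print(age_adjusted_rate(example_groups))
print(crude_rate(160, 170_000))
```

In this toy example the adjusted rate exceeds the crude rate because the hypothetical population is younger than the standard one; adjustment removes such age-structure differences before groups are compared.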

Results
Using the model described above and SAS PROC NLMIXED features, we obtained the model parameter values. Table 1 shows the estimates of coefficients and the associated standard errors. An estimate of the unstructured variance-covariance matrix for the longitudinal linear mixed-effects model was also obtained. As seen in Table 1, race, gender, time, percentage of daily smokers, and the interaction variables are all significant predictors of lung cancer incidence rate. Based on the parameter estimates from Table 1, we obtained the estimated predictive equation for lung cancer incidence (Eq 3), along with the corresponding estimated equations for Black males and White males.
The equation for White males reveals an interaction between time $T$ and percent daily smokers $S$. This indicates that the rate of change of lung cancer incidence with respect to change in daily smoking is different at different time points. For Black males, this rate is constant at 3.9311; for White males, the rate is a linear function of time $T$ and is estimated as $2.8466 - 0.1087T$. The results indicate that among Black males, a one-percentage-point increase in the percent of daily smokers will increase the lung cancer incidence rate by approximately 3.9311 cases per 100,000. For White and Black females, this rate remains constant.
For females, we approximated the percent of daily smokers as a function of time (Eq 4); the corresponding estimated equation for males is Eq 5. Replacing $S$ in Eq 3 using Eq 4 gives the incidence rate of females as a function of race and time. Similarly, replacing $S$ in Eq 3 using Eq 5 gives the incidence rate of males as a function of race and time. This predictive model captures the fluctuating trends of lung cancer incidence rates from 1999 to 2012. Although there is no universally accepted coefficient of determination ($R^2$) for longitudinal mixed-effects models, we use the Xu [23] formula defined as

$$R^2 = 1 - \frac{\text{Variance for full model}}{\text{Variance for null model}}$$
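A small sketch of this ratio-based coefficient of determination follows. The function name and the two variance values are hypothetical and serve only to show the arithmetic; they are not the variance estimates from the fitted model.

```python
def xu_r_squared(variance_full, variance_null):
    """R^2 as one minus the ratio of full-model to null-model variance."""
    if variance_null <= 0:
        raise ValueError("null-model variance must be positive")
    return 1.0 - variance_full / variance_null


# HYPOTHETICAL variance estimates, for illustration only.
print(xu_r_squared(18.0, 100.0))
```

The closer the full-model variance is to zero relative to the null model, the closer this quantity is to 1.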
For our model, $R^2$ is 0.82. Assuming that lung cancer incidence rates remain within reasonable range, the model predicts that by mid-2018, the gender gap in the incidence rate for Whites would disappear. At that time, the common incidence rate for Whites regardless of gender would be approximately 48 cases per 100,000. By 2026, the gender gap in the incidence rate for Blacks would disappear as well. However, the racial gap in lung cancer incidence among males will not disappear in the near future. For 2013, the model estimates of lung cancer incidence rates for Black males, White males, White females, and Black females are 89.87, 65.17, 50.52, and 48.56 per 100,000, respectively. These estimates are consistent with the estimates of the Centers for Disease Control and Prevention [24]. Table 2 gives the mean incidence rates for Blacks and Whites from 1999 to 2012. Fig 2 shows that from 1999 to 2012, Whites continued to have a lower burden of lung cancer incidence than Blacks. The mean incidence rates in lung cancer for Blacks and Whites decreased from 90.72 to 70.46 and from 76.34 to 65.20, respectively.
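The year in which a gender gap closes is the time at which the male and female incidence curves intersect. A minimal sketch of locating such a crossing by bisection is given below. The two trend functions are hypothetical straight lines chosen for illustration; they are not the fitted equations of this study, whose curves need not be linear in time.

```python
def crossing_time(f, g, lo, hi, tol=1e-6):
    """Return t in [lo, hi] where f(t) == g(t), found by bisection.

    Returns None when f - g has the same sign at both ends of the interval,
    i.e. when no crossing is bracketed.
    """
    d_lo = f(lo) - g(lo)
    d_hi = f(hi) - g(hi)
    if d_lo == 0:
        return lo
    if d_hi == 0:
        return hi
    if d_lo * d_hi > 0:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        d_mid = f(mid) - g(mid)
        if d_mid == 0:
            return mid
        if d_lo * d_mid < 0:
            hi = mid            # crossing lies in the lower half
        else:
            lo, d_lo = mid, d_mid  # crossing lies in the upper half
    return (lo + hi) / 2.0


# HYPOTHETICAL incidence trends (cases per 100,000) as functions of year.
def hypothetical_males(year):
    return 80.0 - 1.6 * (year - 1999)


def hypothetical_females(year):
    return 52.0 - 0.2 * (year - 1999)


print(crossing_time(hypothetical_males, hypothetical_females, 1999, 2040))
```

Bisection is used here because it works for any continuous pair of curves, including ones with a time-by-smoking interaction, as long as the search interval brackets a single sign change.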
To further explore differences in lung cancer incidence rates between the two racial groups, we computed mean incidence rates by gender (Table 3 and Table 4). Within each racial group, we observed pronounced gender disparities in lung cancer incidence. Black men have incidence rates about 1.8 times higher than those for Black women, while White men have incidence rates approximately 1.4 times higher than those for White women.
Gender disparities in lung cancer incidence are thus more severe in the Black population than in the White population. In the 14-year time period, the lung cancer incidence rates for females of either race were lower than the incidence rates for their male counterparts and remained stable throughout the study period. Over time, males of either race had decreasing incidence rates, but Black males had a higher incidence curve than White males. Tables 2-4 show the estimated standard deviations, coefficient of variation (CV), and five-number summary. The CV, which is the ratio of the standard deviation to the mean, was computed to control for the differences in the mean incidence rates for each race in the 38 U.S. states and the District of Columbia. The CV for a single variable aims to describe the dispersion of the variable in a way that does not depend on the variable's measurement unit; the higher the CV is, the greater the dispersion of the variable. Among the 38 states and the District of Columbia, Kentucky had the highest incidence rate for all four subgroups: Black males, Black females, White males, and White females. Delaware had the lowest incidence rate among White females, Florida had the lowest incidence rate among Black females, and Nevada had the lowest incidence rate for Black males.
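The descriptive statistics reported in Tables 2-4 can be illustrated with a short sketch. It assumes the sample standard deviation and the "inclusive" quartile convention of Python's standard library; statistical packages differ in their quartile definitions, so these choices are assumptions, and the input values are hypothetical rather than taken from the tables.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# HYPOTHETICAL incidence rates per 100,000, for illustration only.
hypothetical_rates = [60, 70, 80, 90, 100]
print(round(coefficient_of_variation(hypothetical_rates), 2))
print(five_number_summary(hypothetical_rates))
```

Because the CV divides by the mean, it allows dispersion to be compared between groups whose average incidence rates differ.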
There is evidence that lung cancer incidence rates vary by geographic region [25,26]. We grouped the 39 elements in our investigation (38 states and the District of Columbia) into eight U.S. geographic regions to assess regional differences in lung cancer incidence rates by race. The geographic regions were New England (Connecticut, Massachusetts, and Rhode Island), Mid-Atlantic (District of Columbia, New Jersey, New York, and Pennsylvania), Midwest, Mid-South, Pacific Coast, Rocky Mountain, South, and Southwest. Table 5 shows the lung cancer incidence rates for Blacks and Whites by U.S. geographic region. Among Blacks, the highest mean incidence rate (92.92) was observed in the Midwest, and the lowest mean incidence rate (63.19) in New England. The Mid-South had the highest coefficient of variation (42.89%), while the Rocky Mountain region had the lowest coefficient of variation (30.89%). Among Whites, the highest mean incidence rate (86.24) was observed in the Mid-South, and the lowest mean incidence rate (63.20) in the Mid-Atlantic. The Mid-South had the highest coefficient of variation (29.65%), while New England had the lowest coefficient of variation (17.72%).
In five U.S. geographic regions (Mid-Atlantic, Mid-South, Midwest, Pacific Coast, and Southwest), the mean incidence rates for Blacks were higher than those for Whites during the period 1999-2012. In the remaining three regions (New England, Rocky Mountain, and South), this trend was reversed over time. During 1999-2012, the coefficients of variation for Whites in all regions were smaller than those for Blacks, indicating that lung cancer incidence rates for Whites were more homogeneous across U.S. regions. We also analyzed gender disparities in lung cancer incidence rates within each racial group by geographic region. During the 14-year period, incidence rates for men were higher than those for women in both racial groups. The racial difference in the incidence rates of women was not significant, whereas racial disparities in the incidence rates of men were pronounced. Overall, during the 14-year period, Black men continued to have higher incidence rates in all U.S. regions. Table 6 shows that the Mid-South region had the highest mean incidence rate (86.85) of all regions, while the Rocky Mountain region had the lowest mean incidence rate (63.72), regardless of race and gender. The Mid-South also had the highest coefficient of variation, while New England had the lowest coefficient of variation. These findings prompted us to investigate further the dynamics of incidence rates in the Mid-South.
We applied our longitudinal linear mixed-effects models to the Mid-South region, comprising six states: Alabama, Arkansas, Kentucky, Louisiana, Mississippi, and Tennessee. Table 7 shows that in 1999 the difference between mean incidence rates for Blacks and Whites was about 3 per 100,000. The highest mean incidence rate was recorded in Kentucky, while the lowest was recorded in Alabama (Table 8). The racial gap fluctuated but eventually narrowed to almost zero in 2012, suggesting that the Mid-South region has been successful in eliminating racial disparities in lung cancer incidence rates. However, gender disparities persisted throughout the 14-year period. Of note, the mean incidence rate for White women was higher than that for Black women, while the opposite was observed for men (Table 9). For both genders, the Mid-South had the highest prevalence of daily smokers among all U.S. regions, while New England had the lowest (Table 10).
In the Mid-South, Kentucky had the highest percent of daily smokers for both genders, and Mississippi had the lowest percent of daily smokers for both genders (Table 11). Among all 38 U.S. states and the District of Columbia, Kentucky had the highest percent of daily smokers for both races and genders, and California had the lowest percent. Table 12 shows a downward trend in the prevalence of daily smokers in both genders. However, males have significantly higher rates of cigarette smoking than females at all time points.
Among the 8 U.S. geographic regions, the Mid-South had the highest lung cancer incidence rate as well as the highest percentage of daily smokers (Table 13). The estimated correlation coefficient between lung cancer incidence rates and the percentage of daily smokers for the 8 U.S. geographic regions was 0.74, indicating a strong influence of cigarette smoking on regional incidence rates of lung cancer.
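A correlation of this kind can be computed from paired regional values as sketched below. The sketch assumes the Pearson product-moment coefficient, which the text does not name explicitly, and the eight regional pairs are hypothetical placeholders rather than the values in Table 13.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# HYPOTHETICAL regional values, for illustration only:
# percent daily smokers and incidence rate per 100,000 for eight regions.
hypothetical_smoking = [14.0, 15.5, 16.0, 17.2, 18.1, 19.0, 20.4, 23.0]
hypothetical_incidence = [66.0, 64.0, 70.0, 68.0, 75.0, 72.0, 74.0, 87.0]
print(round(pearson_r(hypothetical_smoking, hypothetical_incidence), 2))
```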

Discussion
This longitudinal study assessed differences in lung cancer incidence rates between Black and White males and females in 38 U.S. states, the District of Columbia, and eight U.S. geographic regions from 1999 to 2012. Using a longitudinal linear mixed-effects model, we demonstrated that age-adjusted incidence rates in lung cancer have decreased across the U.S., but racial and gender disparities persist. Although the racial gap has decreased over time, Blacks continue to have higher age-adjusted incidence rates for lung cancer than Whites, with these racial disparities being significantly worse among men than women. Black males bear the highest burden of lung cancer incidence of all subgroups, followed by White males, White females, and Black females. Importantly, our model predicts that the racial gap in lung cancer incidence among males will not disappear in the near future. In contrast, the gender gap will gradually disappear (by mid-2018 for Whites and by 2026 for Blacks), provided current lung cancer incidence rates remain within reasonable range.
The study revealed a strong association between lung cancer incidence rates and prevalence of cigarette smoking. Among all U.S. geographic regions, regardless of race and gender, the Mid-South has both the highest overall lung cancer incidence rate (86.85) and the highest percentage of daily smokers (23.06%). Although there is a clear downward trend in cigarette smoking in both genders, males continue to have significantly higher rates of cigarette smoking than females at all time points, which is reflected in their higher lung cancer incidence rates at all time points. These findings are consistent with previous research, which attributes racial and gender disparities in lung cancer incidence rates to differences in tobacco use [27,28].
In addition to tobacco use, lung cancer incidence is associated with environmental and occupational exposures, family history, stage at diagnosis, and a number of psycho-social factors [27,29-32]. While the National Lung Screening Trial showed a 20% reduction in risk of death from lung cancer in high-risk patients screened with low-dose CT as opposed to chest radiography [33], no screening test has been shown to decrease incidence or mortality rates of lung cancer in the general population [34]. In the absence of viable screening options, prevention efforts need to focus on major population risk factors, such as tobacco use. It has been established that tobacco use is associated with socioeconomic status [35][36][37]. For example, Hu et al. report higher use of tobacco products among persons with a GED certificate or less than high-school education and those with annual household income <$20,000 [4]; they also observe higher prevalence of cigarette smoking in the Midwest and the South, which is corroborated by the findings of our study. Considering such evidence, approaches that may help reduce lung cancer incidence rates include [26,28]: 1) Counseling through tobacco quit lines and free nicotine replacement therapy; 2) Media campaigns to discourage initiation of smoking, encourage smoking cessation, and protect nonsmokers from second-hand smoke; 3) Tobacco and vapor-free policies in institutions and recreation facilities; 4) Reinforcing comprehensive smoking bans on airlines and in buildings; 5) Improved health coverage of smoking cessation treatments for all smokers, especially for pregnant women, federal employees, retirees, and their spouses and dependents; 6) Free radon tests in homes; 7) Continued surveillance of lung cancer incidence and smoking prevalence within racial and ethnic groups in the U.S.
The model presented in this paper accurately describes the dynamics of lung cancer incidence rates by race, gender, and smoking prevalence across the United States from 1999 to 2012. Previous research shows that lung cancer incidence varies also by histology: while squamous, large, and small cell carcinoma rates continue to decrease for all gender-race combinations, adenocarcinoma rates remain relatively constant in males and are increasing in females [38]. Future investigation should therefore include additional covariates of lung cancer incidence, such as histological type and tumor size. To the best of our knowledge, this work is the first attempt to analyze disparities in lung cancer in the United States using a longitudinal linear mixed-effects model. The developed model could help health professionals and policy makers make predictions about age-adjusted lung cancer incidence rates for approximately five to ten years after 2012.