Effect of socioeconomic status on stage at diagnosis of lung cancer in a hospital‐based multicenter retrospective clinical epidemiological study in China, 2005–2014

Abstract
Evidence on the association between socioeconomic status (SES) and lung cancer stage in non‐Chinese populations has so far been inconsistent. We set out to determine how SES affects stage at diagnosis at both individual and area levels, using a hospital‐based multicenter 10‐year (2005–2014) retrospective clinical epidemiological study of 7184 primary lung cancer patients in mainland China. Individual‐level SES was measured using two indicators from the case report forms of the study: an individual's education and occupation. Seven census indicator variables were combined by principal component analysis (PCA) as surrogates for area‐level SES. Multivariate analysis was undertaken using binary logistic regression and a multinomial logit model to describe the association and explore the effect across tertiles on stage after adjusting for demographic variables. There was a significant stepwise gradient of effect across stages in the highest tertile of area‐level SES compared with the lowest tertile (ORs, 0.77, 0.67, and 0.29 for stage II, III, and IV). Patients with higher education were less likely to have stage IV lung cancer than the illiterate group (ORs, 0.64, 0.71, 0.63, and 0.52 for the primary school, middle school, high school, and college degree or above subgroups, respectively). The findings suggest that the most socioeconomically deprived areas may be associated with a higher risk of advanced‐stage lung cancer, and that increasing educational level may be correlated with a lower risk of being diagnosed at an advanced stage in both men and women.


Introduction
Lung cancer is the leading cause of cancer death and the most common incident cancer in China. Strong social gradients in lung cancer incidence have been observed in China, with a significantly higher age-standardized incidence rate in urban areas than in rural areas in 2015 (445.0 vs. 288.3 per 100,000) [1]. A highly unequal distribution of insurance benefits still exists under the current social health insurance programs, especially for vulnerable groups such as children, women, and low-income and rural populations [2,3]. The possible underlying causes of the different cancer outcomes among social groups include health care provision according to socioeconomic areas [4][5][6], health-related behaviors of individuals [7,8], and environmental or occupational exposures across socioeconomic groups [9][10][11]. Up to now, studies in non-Chinese populations have provided inconsistent evidence of the association between socioeconomic status (SES) and lung cancer stage. One Scottish study found that rates of early-stage cancer were higher in more deprived patients than in less deprived patients [12]. Similarly, a US study showed that college graduates were more likely to be diagnosed at an advanced stage than those without a college degree [13]. However, a recent systematic review and meta-analysis found no socioeconomic inequalities in late stage at diagnosis between the most and the least deprived groups [14]; most of the included studies (5/7) were from the UK and USA.
China has been experiencing urbanization at an unprecedented rate over the last two decades [15]. Moreover, the ongoing ambitious healthcare reforms aim to achieve equitable access to basic health services and to build a safe, fair, and effective healthcare system for both urban and rural residents. Socioeconomic measurements have proved useful for monitoring health inequalities and can provide fundamental implications for prevention initiatives and resource allocation [16][17][18]. To date, the role of SES, at either the individual or the area level, in shaping lung cancer risk has been examined only outside China [9,12,19-21]. Since each study has used different variables and different approaches to estimate individual or neighborhood socioeconomic conditions, the accumulated evidence is difficult to apply systematically to China.
The best way to measure the extent to which SES influences health inequalities would be to link electronic medical records from hospitals with census data from the bureau of statistics for the target population. For now, however, the electronic health record systems and electronic medical systems in China are still in their infancy, and the information is not available to researchers. In this study, we set out to determine how SES affects stage at diagnosis at both individual and area levels, using a hospital-based multicenter 10-year (2005-2014) retrospective clinical epidemiological study of primary lung cancer in mainland China. Furthermore, the paper outlines a reproducible approach to the individual and area deprivation index based on readily available data, and explores the joint effects of different levels of SES, gender, age, smoking, and other demographic characteristics in relation to stages of primary lung cancer.

Study design
This study was a hospital-based multicenter 10-year (2005-2014) retrospective clinical epidemiological study of primary lung cancer cases via medical chart review.
Hospital selection, case sampling, and data collection
As part of the Cancer Screening Program in Urban China (CanSPUC) supported by the central government [22], the current survey was conducted in eight tertiary hospitals in eight provinces across China between March 2015 and August 2016. According to the traditional administrative district definition of the National Bureau of Statistics, China is stratified into seven geographic regions (North, Northeast, Central, South, East, Northwest, and Southwest), among which population urbanization and economic development levels vary widely. National representativeness was taken carefully into account when selecting the hospitals. The sampling framework consisted of the highest-level cancer hospitals in each region (eight hospitals in seven regions). Finally, eight tertiary hospitals from seven regions were included in this study using convenience sampling. The medical records of primary lung cancer patients diagnosed between 2005 and 2014 were collected by well-trained health professionals over a period of 2 years from 2015 to 2016. The procedure can be summarized as follows.
Step 1: Randomly assign the month. In each hospital, one month per year was randomly selected for case review. January and February were excluded because most Chinese people travel home during this period to celebrate the Spring Festival with family; far fewer patients were seen in these months, which could have confounded valid comparison of outcomes.
Step 2: Specify inclusion criteria for cases. Cases in the medical records database had to be older than 18 years and have complete information on key demographic and lifestyle factors, diagnostic information (pathologic TNM stage or clinical TNM stage), surgical approaches, use of radiotherapy and chemotherapy, molecular targeted therapy, and pathologic characteristics of lung cancer. Patients who were only diagnosed with lung cancer, or who were only followed up after surgery, without receiving any treatment in the hospital were excluded from our study.
Step 3: Conduct a thorough review. The medical records were reviewed in each local hospital by local clerks who had been systematically trained. The clerks selected records in chronological order starting from the first day of the selected month. As soon as a clerk had extracted one hundred medical records for that month, he or she moved on to the randomly selected month of the next year. At the same time, the clerks recorded the reasons for inclusion or exclusion in a specially designed table to verify the accuracy of the recorded information. The contents of the case report form (CRF) were first drafted as an initial questionnaire by experts in cancer epidemiology, pathology, imaging diagnosis, thoracic surgery, medical oncology, radiation oncology, and general medicine and, after a presurvey, were revised repeatedly into the final questionnaire. The CRF was used to extract the information from the medical records as described above.
Step 4: Recode the raw variables. According to the designed CRF, the raw variables were encoded for analysis. All the variables were double-entered from the paper or electronic medical record to computer-based database (EpiData 3.1) by two local well-trained clerks, and then were sent to National Office of CanSPUC for the data check. Patients were assigned stage based on pathologic TNM when available and clinical TNM otherwise. Staging was categorized into seven groups based on the seventh edition of American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) staging system for lung cancer (I, IIA, IIB, IIIA, IIIB, IV, and unknown or not applicable stage). Advanced stage refers to stage IIIB-IV, with stage I-IIIA classifying nonadvanced-stage cancer. The stage was independently classified and checked by experienced clinicians blinded to the patients' deprivation status.
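The month-based sampling described in Steps 1 to 3 can be sketched as follows. This is only an illustration under stated assumptions, not the tooling used in the study: the function names, the fixed random seed, and the record layout (tuples of year, month, day, and case identifier) are hypothetical, and the records are synthetic.

```python
import random

# Months eligible for selection: January and February are excluded (Step 1).
ELIGIBLE_MONTHS = list(range(3, 13))
CASES_PER_MONTH = 100  # target number of records per selected month (Step 3)


def draw_months(years, rng):
    """Randomly pick one eligible month for each diagnosis year."""
    return {year: rng.choice(ELIGIBLE_MONTHS) for year in years}


def sample_cases(records, months_by_year, limit=CASES_PER_MONTH):
    """Take up to `limit` records per year, in date order from the first day
    of the month drawn for that year.

    `records` is a hypothetical list of (year, month, day, case_id) tuples.
    """
    sampled = []
    for year, month in sorted(months_by_year.items()):
        in_month = sorted(r for r in records if r[0] == year and r[1] == month)
        sampled.extend(in_month[:limit])
    return sampled


rng = random.Random(2015)  # arbitrary seed, used only to make the sketch repeatable
months = draw_months(range(2005, 2015), rng)

# Synthetic records: 150 cases in every month of every year (not study data).
records = [(y, m, i % 28 + 1, f"{y}-{m:02d}-{i:03d}")
           for y in range(2005, 2015)
           for m in range(1, 13)
           for i in range(150)]
cases = sample_cases(records, months)
print(len(cases), "records sampled for one hypothetical hospital")
```

Running the same routine once per hospital would give one randomly drawn month per year and at most one hundred records per drawn month, which is the general framework the steps describe.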
To remain consistent with future research based on this study, we included extensive/localized stage small-cell lung cancer cases without clinical stage I-IV information in the descriptive analysis. For the multivariate analysis, we excluded 171 small-cell lung cancer cases with unknown or not applicable stage. The final study population therefore consisted of 7184 individuals (5262 men and 1922 women), of whom 7013 (5119 men and 1894 women) were used to explore the association between SES and stage at diagnosis.
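The staging rule and the resulting analysis population can be expressed compactly. The sketch below assumes stage labels are stored as plain strings matching the seven groups listed in Step 4; the function names are hypothetical, and only the counts quoted in the text are taken from the study.

```python
# Stage groups as defined in the text (AJCC seventh edition grouping).
ADVANCED = {"IIIB", "IV"}
NONADVANCED = {"I", "IIA", "IIB", "IIIA"}


def assign_stage(pathologic, clinical):
    """Use the pathologic TNM stage when available, otherwise the clinical stage."""
    return pathologic if pathologic else clinical


def stage_group(stage):
    """Collapse a stage label into advanced, nonadvanced, or unknown."""
    if stage in ADVANCED:
        return "advanced"
    if stage in NONADVANCED:
        return "nonadvanced"
    return "unknown"  # unknown or not applicable stage, excluded from modeling


# Counts reported in the text: total population and excluded unstaged cases.
total = {"men": 5262, "women": 1922}
analysed = {"men": 5119, "women": 1894}
excluded = sum(total.values()) - sum(analysed.values())

print(stage_group(assign_stage(None, "IIIB")))   # clinical stage used as fallback
print(sum(total.values()), sum(analysed.values()), excluded)
```

The arithmetic confirms that the reported figures are internally consistent: 7184 cases in total, 171 excluded, and 7013 retained for the multivariate analysis.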
The study protocol was approved by the Institutional Review Board of the Cancer Hospital of Chinese Academy of Medical Sciences.

Measurement of individual-level and area-level SES characteristics
Individual-level SES data were obtained and measured using two indicators from the CRF of the retrospective survey: an individual's educational level and occupational rank, based on Hollingshead's personal rating of people's relative social standing in New Haven, CT, in the early 1960s [23]. Education represents knowledge and skills; occupation captures material and social resources and assets [9]. Education was categorized into five levels: illiterate, primary school, middle school, high school, and college degree or above. With regard to occupation, both European and United States classifications conceptualize occupation as a social relationship based on a graded hierarchy of occupations ranked according to skill [23]. Although such a definition of occupation is the best representation of SES, the raw data were not available in China. Given that farmers and rural migrant workers form a huge and special population in China, occupation in our study was classified into two categories: farmers/migrant workers and nonfarmers/nonmigrant workers.
The CRF does not collect area-level SES characteristics. Therefore, we used census data to reflect the influence of area-based SES. We had to construct some indices ourselves because some of the census indicator variables of SES developed by Yost (such as percent older than 16 years in workforce without job, percent blue-collar workers, and median rent) were not available in China [24]. Thus, GDP per capita, percentage of illiteracy aged 15 and over, per capita annual income of urban households, number of hospital beds per 1000 people, number of health technical personnel per 1000 people, healthy life expectancy, and the infant mortality rate (IMR) were included as surrogates for area-level SES. All variables for the eight districts or provinces were obtained from annual reports issued by the National Bureau of Statistics and the National Health and Family Planning Commission in China from 2005 to 2014. We then used a comprehensive socioeconomic index based on these variables to represent the SES of each area.

Statistical methods
A standardized index score was created to represent the area-level SES by combining the seven indicator variables as defined above using principal component analysis (PCA). The number of components retained for extraction was based on the Kaiser criterion (eigenvalue ≥1.0). According to PCA result, the standardized component scores for the eight areas were sorted and ranked into tertiles (factor scores ranging from low [first tertile] to high [third tertile]).
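The two decision rules in this paragraph, component retention by the Kaiser criterion and ranking of area scores into tertiles, can be illustrated as follows. This is a sketch under assumptions: the eigenvalues, area labels, and scores are invented placeholders rather than study values, the function names are hypothetical, and the rank-based tertile rule is one possible choice because the exact cut rule is not stated in the text.

```python
def kaiser_retained(eigenvalues, threshold=1.0):
    """Keep the components whose eigenvalue meets the Kaiser criterion."""
    return [ev for ev in eigenvalues if ev >= threshold]


def variance_explained(eigenvalues, retained):
    """Share of total variance carried by the retained components."""
    return sum(retained) / sum(eigenvalues)


def assign_tertiles(scores):
    """Rank areas by score and split the ranks into three groups (1 = lowest)."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {area: 1 + (rank * 3) // n for rank, area in enumerate(ranked)}


# Placeholder eigenvalues for seven standardized indicators (they sum to 7).
eigenvalues = [3.9, 1.7, 0.6, 0.4, 0.2, 0.1, 0.1]
retained = kaiser_retained(eigenvalues)
share = variance_explained(eigenvalues, retained)

# Placeholder standardized component scores for eight hypothetical areas.
scores = {"area_1": -1.5, "area_2": -1.0, "area_3": -0.5, "area_4": 0.0,
          "area_5": 0.25, "area_6": 0.5, "area_7": 1.0, "area_8": 1.25}
tertiles = assign_tertiles(scores)

print(len(retained), "components retained,", f"{share:.1%}", "of variance")
print(tertiles)
```

With eight areas this rank rule yields groups of three, three, and two areas. The Results assign three, two, and three provinces to the lowest, middle, and highest tertiles, so the authors presumably cut on the score values in a different way.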
Descriptive analysis was conducted tabulating the demographic and socioeconomic characteristics of the study population. Comparison of proportions between the stage and SES was made using the chi-square test or Fisher's exact test. Multivariate analysis was undertaken using binary logistic regressions to describe association between individual/area-level SES and stage at diagnosis after adjusting for age, sex, marital status, body mass index (BMI), medical insurance type, smoking, drinking, and history of respiratory diseases. In addition, we considered analysis of categorical data using a multinomial logit model to explore the effect across tertiles on stage among lung cancer patients. The statistical analyses were performed using SAS version 9.4 (SAS Institute Inc, Cary, NC).
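For readers less familiar with the odds ratios reported in the Results, the unadjusted OR and its 95% confidence interval can be computed directly from a two-by-two table. The sketch below uses the standard log-odds (Wald) interval; the function name and the counts are hypothetical. The adjusted ORs in this study came from logistic regression models fitted in SAS, which this example does not reproduce.

```python
import math


def odds_ratio_ci(exposed_cases, exposed_noncases, ref_cases, ref_noncases, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval on the log scale.

    "Cases" are patients with advanced-stage disease; "exposed" is the group
    of interest (for example the highest SES tertile) and "ref" the reference
    group (for example the lowest tertile).
    """
    odds_ratio = (exposed_cases * ref_noncases) / (exposed_noncases * ref_cases)
    std_err = math.sqrt(1 / exposed_cases + 1 / exposed_noncases
                        + 1 / ref_cases + 1 / ref_noncases)
    lower = math.exp(math.log(odds_ratio) - z * std_err)
    upper = math.exp(math.log(odds_ratio) + z * std_err)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study tables.
or_value, ci_low, ci_high = odds_ratio_ci(20, 80, 40, 60)
print(f"OR = {or_value:.3f} (95%CI {ci_low:.2f}-{ci_high:.2f})")
```

An OR below 1 with an interval that excludes 1 indicates lower odds of advanced-stage disease in the group of interest than in the reference group.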

Results
A total of 5262 men and 1922 women diagnosed with incident lung cancer between 2005 and 2014 were included from seven regions in China. Table 1 shows the seven census indicator variables of SES and the standardized index score for the eight provinces in these regions. In general, all the indicator variables (except life expectancy and IMR) differed significantly among the areas (P < 0.001). Yunnan, Gansu, and Anhui provinces were assigned to the lowest tertile group, and Guangxi and Hunan provinces to the second tertile group. Shanxi, Liaoning, and Zhejiang provinces were ranked in the highest tertile group, with the highest levels of economic development, medical resource allocation, and health care quality.
Overall, 7013 cases were staged with a detailed assessment of the tumor stage, while a small proportion (2.4%) had unknown or not applicable stage. As can be seen in Table 2, patients with advanced-stage (IIIB-IV) lung cancer accounted for 42.5% of the total population. Patients older than 75 years were less likely than those younger than 50 years to have advanced-stage lung cancer (39.5% vs. 53.4%, P < 0.001). We also observed a clear difference in stage distribution between the BMI ≥ 30 and BMI < 25 groups (24.7% vs. 41.7%, P < 0.001). The proportion of current and ever-smokers at diagnosis was 57.0%. In the individual-level SES analysis, educational level was slightly negatively associated with lung cancer stage. The proportion of patients with advanced stage in the illiterate group (47.7%) was higher than in the primary school (44.7%), middle school (44.8%), high school (41.9%), and college degree or above (46.7%) groups. Farmers were more likely to be diagnosed at advanced stage than nonfarmers (46.2% vs. 42.8%, P < 0.05). Although no consistent negative SES gradient was seen across the tertile groups, the proportion of advanced-stage cases was 22.4 percentage points lower in the highest tertile group than in the lowest tertile group (25.5% vs. 47.9%, P < 0.001).
To further explore the association between demographic characteristics, individual/area-level SES, and advanced-stage (IIIB-IV) disease at diagnosis, we used binary logistic regression with analysis stratified by sex (Table 3). In the unadjusted models, the OR for advanced-stage lung cancer among patients in the least versus most deprived tertile of area-level SES was 0.35 (95%CI 0.30-0.40) for men, 0.29 (95%CI 0.23-0.36) for women, and 0.33 (95%CI 0.29-0.37) for both sexes. Adding demographic variables attenuated the OR to 0.37 (95%CI 0.32-0.44) for men, 0.39 (95%CI 0.30-0.50) for women, and 0.37 (95%CI 0.33-0.42) for both sexes. By contrast, the opposite relationship was observed for the middle versus most deprived tertile of area-level SES. Among women and in both sexes combined, nonfarmers (e.g., government employees, company employees, self-employed individuals, manual workers) were less likely than farmers to be diagnosed at advanced stage before adjustment for age, BMI, and other factors (OR = 0.75; 95%CI 0.62-0.92 in women; OR = 0.87; 95%CI 0.78-0.96 in both sexes). After controlling for all demographic variables, a statistically significant negative association remained between educational SES and advanced-stage lung cancer for all subgroups of patients except high school and college degree or above in women.
Notes to Table 1:
Percentage of illiteracy aged 15 and over (%) (mean ± SD, 2005-2014).
3 Annual income per capita of urban household (yuan per person per year) (mean ± SD, 2005-2014).
4 Number of hospital beds per 1000 people (mean ± SD, 2005-2014).
5 Number of health technical personnel per 1000 people (mean ± SD, 2005-2014).
6 Life expectancy (years) only for 2010 (mean).
7 Infant mortality rate (‰) only for 2010 (mean).
8 Following the principal component analysis, the two factors extracted from the variables were retained with eigenvalue ≥ 1.0, which can explain 79.9% of the variance.
Notes to Table 3:
Result of univariate methods before multiple logistic regression analysis.
2 Adjusted for age, BMI, medical insurance type, smoking, educational level, occupational level, and area-level SES of men in binary logistic regression model.
3 Adjusted for age, BMI, medical insurance type, smoking, history of respiratory diseases, educational level, occupational level, and area-level SES of women in binary logistic regression model.
4 Adjusted for age, BMI, medical insurance type, smoking, history of respiratory diseases, educational level, occupational level, and area-level SES of the whole population in binary logistic regression model.
... of the lowest tertile. Educational SES was also associated with stage IV disease. Patients with higher education were 36%, 29%, 37%, and 48% less likely to have stage IV lung cancer than the illiterate subgroup (OR = 0.64 [95%CI = 0.48-0.85] for primary school; OR = 0.71 [95%CI = 0.53-0.94] for middle school; OR = 0.63 [95%CI = 0.46-0.88] for high school; OR = 0.52 [95%CI = 0.36-0.76] for college degree or above). Nonfarmers appeared to have a lower risk of higher stages at diagnosis than farmers before adjustment, but the staging differences across occupation were no longer significant once the demographic factors were added to the model.
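The percentage reductions quoted for education follow directly from the odds ratios as (1 - OR) x 100. The short check below uses the four ORs reported in the text; the function name is hypothetical, and strictly speaking the result is a reduction in odds rather than in probability.

```python
def percent_lower_odds(odds_ratio):
    """Express an odds ratio below 1 as a percentage reduction in odds."""
    return round((1 - odds_ratio) * 100)


# Odds ratios for stage IV disease versus the illiterate group, as reported.
EDUCATION_ORS = {
    "primary school": 0.64,
    "middle school": 0.71,
    "high school": 0.63,
    "college degree or above": 0.52,
}

reductions = {level: percent_lower_odds(value)
              for level, value in EDUCATION_ORS.items()}
for level, pct in reductions.items():
    print(f"{level}: {pct}% lower odds")
```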

Discussion
This study showed a clinically relevant association between socioeconomic disparities and stage at diagnosis for primary lung cancer in mainland China. In this hospital-based analysis from eight provinces, the least socioeconomically deprived areas were associated with a lower risk of advanced-stage lung cancer, whereas patients from the median deprived areas had a significantly increased risk of advanced-stage cancer compared with patients from the most deprived areas. Moreover, we found that increasing educational level was associated with a decreased risk of being diagnosed with advanced-stage lung cancer in both men and women. To the best of our knowledge, this is the first study to consider simultaneously measures of socioeconomic status at the individual and regional levels and their influence on primary lung cancer clinical outcomes in mainland China.
Information on SES for individuals is not reported in the current national cancer registry database [25]. Therefore, following Krieger's recommendation regarding measures of social class for public health research and surveillance, we used the occupational and educational categories as a proxy measure of individual SES, given that income information was unavailable in our CRF database. For occupation, the best known and longest employed occupational class measure is the British Registrar General's social class schema, which is based on skill and status. In the USA, census occupational data can be meaningfully grouped to create a class-based measure (e.g., administrative support, sales, and six other census-defined occupational groups are defined as working class). The occupation data collected from the medical records in this study differ from those in the USA. There are eight occupational groups: government worker, corporate personnel, office staff, self-employed individual, freelancer, soldier, unemployed, and farmer/rural migrant worker. The occupational class from the records clearly cannot reflect differences in material and social resources and assets. However, farmers and rural migrant workers are a huge and special group in China, and their identity differs from that of urban citizens and other occupational groups. As a vulnerable group in China, farmers are poorly educated and skilled, with limited access to health care. The classification in our study can therefore capture occupation as a measure of what Stevenson termed "standing within the community" or "culture" [23].
We found that patients with a higher educational level had a moderately lower likelihood of advanced stage at diagnosis. Many studies have revealed a similar association between education/income and cancer outcome, with the impact varying by race/ethnicity, smoking, insurance type, and cancer screening program [19,26-30]. Moreover, two studies suggested that more deprived patients were likely to present with more advanced-stage cancer in non-UHCS in the USA. One study, by Silverstein et al., suggested that patients with lower socioeconomic position (SEP; measured by per capita income and education) were likely to present with distant stage at diagnosis compared with those with high SEP, although without statistical significance (OR = 1.06 [95%CI = 0.28-3.96]) [31]. The other study, by Schwartz et al., used an aggregate SES variable (including occupation, poverty, education, and age) and found that it contributed significantly to the risk of nonlocalized stage in higher compared to low SES (OR = 1.28 [95%CI = 1.12-1.45]) [20]. Most other studies found no association, according to a recent systematic review [14]. There are several plausible explanations for this phenomenon in China. Individuals with a low educational level or low income seem to be less likely to seek medical advice or undergo treatment for a cough or hoarseness, more prone to smoking [32], and more likely to live close to cancer-causing substances such as asbestos, arsenic, coal, and diesel engine exhaust [9,33]. According to the 2015 China Adult Tobacco Survey, the current smoking rate among men was higher in those with a low educational level (middle school degree or below; 60.0%) than in those with a high educational level (college degree or above; 41.9%) [34]. Friedemann et al. reported that being a smoker was associated with a reduced likelihood of help-seeking, and one contributor to late-stage diagnosis could be patient delay in help-seeking [35].
Moreover, public hospitals and medical centers that can provide routine physical examination or the lung cancer screening program were primarily located in the advantaged district of urban areas, and the medical expenditure for diagnosis and treatment seemed catastrophic for low-income patients with lung cancer in China.
For area-level or neighborhood SES, the illiteracy percentage, household income, and other typical indices are frequently used to assess area-level SES in most studies. However, there are no established criteria for judging the degree of area-level socioeconomic status. So far, researchers have applied these SES indices according to their own criteria [9,36,37], so comparative data for the individual indicators are lacking. In this situation, we used a multifactorial socioeconomic index created from census indicator variables of SES developed by Yost (education index, median household income) and some new variables (GDP per capita, number of hospital beds per 1000 people, number of health technical personnel per 1000 people, healthy life expectancy, and the infant mortality rate). Each variable measures a different aspect of area-level SES, so together they capture area-level SES more aptly. Because the selected input variables are strongly correlated, PCA is an effective way to deal with multicollinearity. The deprivation index represents an attempt to reflect more accurately the multidimensional character of regional socioeconomic position. We identified low area-level SES as an independent indicator of advanced-stage lung cancer at diagnosis after adjusting for demographic variables. However, we did not observe dose-response gradients between area-level SES and stage. The negative association was entirely limited to the most versus least deprived tertile of area-level SES, suggesting that some confounding factors could influence cancer staging. Hystad et al. likewise observed no linear dose-response relationship in unadjusted or adjusted models, and the elevated OR of 1.66 was restricted to the lowest quintile of the neighborhood SES index [29]. Moreover, we also found a conflicting result regarding the relationship between the median deprived areas and stage.
Our data showed an increased probability of being diagnosed at stage IIIB or IV in the median versus most deprived areas after adjusting for demographic variables (ORs, 1.39, 1.41). Given that some unmeasured risk factors, such as individual health-related behaviors and access to health care, may contribute to the development and progression of lung cancer, we hypothesize that, as living conditions improve, individuals living in the median deprived areas may be more likely than those in the lowest area-level SES group to eat energy-dense, nutrient-poor foods, to smoke more often, and to adopt sedentary behaviors [38,39]. Meanwhile, they still have worse access to health services than those in the highest area-level SES group. In addition, because the eight provinces cover large geographic areas and tens of millions of people, the area-level SES measures may not capture differences in exposure. More research is needed to examine the association between SES and lung cancer in smaller units, such as the community level, while considering other important etiological factors.
Our study has some potential limitations that should be considered when interpreting the results. First, we did not include some indicator variables of SES in our analysis, such as annual personal income and the percentage of those older than 16 years in the workforce without a job, so some influences on the outcome may not have been captured. We used annual averages of census data for 2005-2014 to minimize error in determining the relevant time period for estimating area-level SES. Second, we used hospital-based data, and patients from the leading public cancer hospital of a province may not represent the whole population of the area. Those who chose the highest-level hospital, in other words the most expensive hospital, may differ from those who chose township- or municipal-level hospitals. As a result, the demographic and socioeconomic characteristics of the cancer cases from the eight hospitals may differ from those of the general population nationwide. Third, considering the pathways by which socioeconomic status may affect the stage of lung cancer, our analyses could benefit from additional information such as working conditions, the effect of air pollution, and cancer screening programs in the targeted area. Fourth, we used convenience sampling instead of random sampling. The general sampling framework was based on the number of cases in each month; that is, the clerks were to extract one hundred medical records from one month in every year. Because there were fewer than one hundred cases in January and February in most years in most hospitals, including these two months would have required extracting information from two or three months. We therefore excluded January and February from the convenience sampling process. In addition, this sampling method has been used in a similar retrospective clinical epidemiological study of breast cancer, which has yielded more than 20 published papers [40].
In conclusion, our study outlines a reproducible approach to developing SES indices at the individual and area levels simultaneously from readily available census data in China. Using data from seven socio-demographically diverse regions, it shows a clinically relevant association between socioeconomic disparities and lung cancer stage at diagnosis in China. These results provide evidence that public health policy makers should efficiently allocate limited medical resources to socially deprived individuals, such as farmers or rural migrant workers with a low educational background, and should provide better access to diagnosis and to education on healthy lifestyles in rural and undeveloped areas.

Conflicts of Interest
Authors report no financial disclosures or conflict of interest.