Methylation-derived inflammatory measures and lung cancer risk and survival

Examining immunity-related DNA methylation alterations in blood could help elucidate the role of the immune response in lung cancer etiology and aid in discovering factors that are key to lung cancer development and progression. In a nested, matched case–control study, we estimated methylation-derived NLR (mdNLR) and quantified DNA methylation levels at loci previously linked with circulating concentrations of C-reactive protein (CRP). We examined associations between these measures and lung cancer risk and survival. Using conditional logistic regression and further adjusting for BMI, batch effects, and a smoking-based methylation score, we observed a 47% increased risk of non-small cell lung cancer (NSCLC) for one standard deviation (SD) increase in mdNLR (n = 150 pairs; OR: 1.47, 95% CI 1.08, 2.02). Using a similar model, the estimated CRP Scores were inversely associated with risk of NSCLC (e.g., Score 1 OR: 0.57, 95% CI: 0.40, 0.81). Using Cox proportional hazards models adjusting for age, sex, smoking status, methylation-predicted pack-years, BMI, batch effect, and stage, we observed a 28% increased risk of dying from lung cancer (n = 145 deaths in 205 cases; HR: 1.28, 95% CI: 1.09, 1.50) for one SD increase in mdNLR. Our study demonstrates that immunity status measured with DNA methylation markers is associated with lung cancer a decade or more prior to cancer diagnosis. A better understanding of immunity-associated methylation-based biomarkers in lung cancer development could provide insight into critical pathways.


Background
Lung cancer is the leading cause of cancer death in the USA, projected to account for 21.7% of all cancer deaths in 2021 [1]. A large percentage of lung cancer patients are diagnosed at an advanced stage [2], and five-year relative survival rates for those patients are between 3 and 6% [3]. Thus, early detection remains a key strategy to improve survival. However, the currently recommended strategy for lung cancer screening, low-dose computed tomography (LDCT) for persons 50 to 80 years old who have at least a 20 pack-year smoking history and currently smoke or have quit within the past 15 years, is expensive and has a high false positive rate [4,5]. Modifying the current lung cancer screening strategy by performing risk stratification could help prioritize LDCT screening and optimize secondary prevention. We propose that immune system markers could be incorporated into such risk stratification tools to help identify persons at higher risk of lung cancer to target for screening.
While smoking is the most important risk factor for lung cancer in the population, there is growing evidence that the immune system, in response to or independent of smoking, plays an important role in lung cancer development, potentially acting through the genesis of chronic inflammation [6]. For instance, an aggregated analysis of genome-wide association studies (GWASs) of lung cancer risk found a direct causal effect of BMI on small cell lung cancer and an inverse effect on lung adenocarcinoma, suggesting the complexity of the roles that BMI and chronic inflammation play in lung cancer subtypes [7]. Furthermore, it is plausible that inflammatory profiles prior to lung cancer diagnosis are associated with lung cancer-specific survival. Markers of systemic inflammation, including elevated levels of C-reactive protein (CRP) and the peripheral blood neutrophil-to-lymphocyte ratio (NLR), have also been identified as robust markers of cancer-associated inflammation [8,9]. Elevated CRP levels [8], elevated serum levels of pro-inflammatory cytokines [10][11][12], increased neutrophil counts and decreased lymphocyte counts [13,14], and polymorphisms in inflammation-related genes [15][16][17][18] have been associated with increased lung cancer risk. These inflammatory measures have also been associated with poor survival of lung cancer patients in several retrospective and a few prospective studies [19][20][21]. In addition, both experimental and epidemiologic studies support a role for chronic inflammation as a hallmark of cancer development and progression [8,22,23,24,25]. We posit that a better understanding of the role of inflammation in lung cancer etiology could be gained by examining DNA methylation alterations in blood that are associated with the systemic immune response.
In the current study, we first predicted peripheral blood leukocyte composition and a methylation-derived neutrophil-to-lymphocyte ratio (mdNLR) index using validated DNA methylation markers, then quantified DNA methylation levels at loci previously linked with circulating concentrations of CRP, and calculated methylation-derived immune cell ratios using an expanded deconvolution library. We evaluated the associations of these potential markers with lung cancer risk and lung cancer-specific survival. To do so, we used pre-diagnostic blood samples of cases and controls obtained from the CLUE I/II cohorts. Our analyses controlled for self-reported smoking and methylation-predicted cumulative smoking in order to better focus our examinations on the DNA methylation marks that are informative of the immune response profile [26].

Population characteristics
Characteristics of the 208 lung cancer cases and their 208 matched controls included in this analysis are presented in Table 1. Over 99% of participants were White. The median time between blood draw and lung cancer diagnosis was 14 years. The median age at blood draw in 1989 was 59 and 57 years in cases and controls, respectively. Overall, 55% of cases and controls were women and 11% were never smokers (Table 1).

Methylation-derived mdNLR index, leukocyte proportions, and lung cancer risk
We observed a 47% increased risk of non-small cell lung cancer (NSCLC) for one standard deviation increase in mdNLR (n = 150 pairs; OR: 1.47 [1.08, 2.02]). The NSCLC association was comparable for cases diagnosed within 10 years and beyond 10 years after blood draw. However, higher mdNLR values were not statistically significantly associated with overall risk of lung cancer in our study. No stable associations could be estimated for small cell lung cancer (SCLC). After multiple comparison adjustments, the monocyte/lymphocyte ratio was associated with a borderline significant 65% increased risk of NSCLC for each standard deviation increase (n = 150 pairs; OR: 1.65, adjusted CI: [0.99, 2.76]). In addition, immune cell ratios for CD4/CD8, NLR, B cell/lymphocyte, T cell/lymphocyte, Neu + Mono/lymphocyte, Eos/lymphocyte, CD4nv/lymphocyte, B cell/CD8, CD8/Treg, Bnv/Bmem, CD4nv/CD4mem, CD8nv/CD8mem, and Treg > 0 vs. Treg = 0 were not statistically significantly associated with lung cancer risk overall or by histologic type (Table 2).

Methylation-derived CRP scores and lung cancer risk
CRP Score 1 was built using 54 CpG sites that were previously associated with inflammatory markers, while CRP Scores 2 and 3 were each built with a subset of these 54 CpGs that were putatively cell-specific or cell-type invariant, respectively. In a previously published pancreatic cancer dataset [27], all three scores were moderately correlated with log CRP and log IL-6 levels (Table 3). In this nested case-control study, we found all three CRP Scores inversely associated with risk of NSCLC after additionally adjusting for methylation-predicted pack-years (n = 150 pairs; e.g., Score 1 OR: 0.57 [0.40, 0.81]). We also found statistically significant inverse associations between CRP Score 1 and risk of NSCLC among cases diagnosed within 10 years and beyond 10 years of blood draw, and between CRP Score 2 and risk of NSCLC among cases diagnosed within 10 years of blood draw.

Survival analysis
We examined whether the mdNLR, methylation-derived immune cell ratios, and CRP Scores were associated with risk of dying of lung cancer among lung cancer cases (Table 5, Fig. 1).
We observed a 47% increased risk of dying for one standard deviation increase in mdNLR among NSCLC cases (n = 149 cases; HR: 1.47 [1.20, 1.81]). Among the NSCLC cases whose mdNLR was measured ≤ 10 years before their diagnosis, we found a 73% increased risk of dying for a one standard deviation increase in mdNLR (HR: 1.73 [1.19, 2.51]). In comparison, the risk of dying for a one standard deviation increase in mdNLR was lower among the NSCLC cases whose mdNLR was measured 10 to 25 years prior to diagnosis (HR: 1.39 [1.05, 1.85]). Lastly, among all lung cancer cases, we observed a 28% increased risk of dying from lung cancer for one standard deviation increase in mdNLR (n = 205 cases, after excluding 3 cases with person-years of 0 or with more than 25 years between blood draw and diagnosis; HR: 1.28 [1.09, 1.50]).

Discussion
Our study prospectively assessed predicted immune cell profiles using DNA methylation markers and examined associations between previously identified DNA methylation markers of inflammation and lung cancer risk and survival. Using pre-diagnostic blood samples of lung cancer cases and controls who participated in the CLUE I/II cohorts [23], we found that pre-diagnosis mdNLR was associated with increased risk of NSCLC and, among cases, with lung cancer-specific death both in all lung cancer cases and in NSCLC cases. In addition, we built a series of methylation-derived CRP scores to capture individual systemic inflammatory profiles years before lung cancer diagnosis; these scores were inversely associated with risk of lung cancer, especially for NSCLC after adjusting for methylation-predicted pack-years smoked, but not with lung cancer-specific mortality. Studies on NLR (calculated from measured WBC differentials) and lung cancer risk and survival typically measure pre-treatment NLR at diagnosis or up to 30 days prior to treatment [28][29][30]. Unlike prior studies, we were able to assess individual systemic inflammation profiles many years prior to diagnosis by using methylation markers of inflammation.
Table 3 Correlations between methylation-based CRP scores and circulating log-CRP level, log-IL6 level, peripheral blood leukocyte types, BMI, and smoking score residual among controls only
a Correlations with log-CRP level and log-IL6 level were tested with a pancreatic cancer dataset [43]
b CpG Score 1 is built using 54 CpG sites
c CpG Score 2 is built using the top 10 highly cell-specific CpG sites
d CpG Score 3 is built using the 10 modestly cell-specific CpG sites
e We calculated a pack-years methylation score to represent pack-years smoked associated methylation alterations. This score correlates with gene expression changes that are affected by smoking
Our study is not directly comparable to prior studies because we measured mdNLR in blood samples drawn a median of 14 years prior to lung cancer diagnosis. In addition, most cases in our study were diagnosed before the widespread use of immunotherapy. To our knowledge, only one other cohort, the multicenter β-Carotene and Retinol Efficacy Trial (CARET), examined pre-diagnosis mdNLR and lung cancer risk and survival using blood drawn years prior to diagnosis (median 4.7 years) [31,32]. CARET, a study of heavy smokers, reported a 21% increased risk of lung cancer per one unit increase in mdNLR (OR: 1.21 [1.01, 1.45]), a 30% increased risk of NSCLC for one unit increase in mdNLR (OR: 1.30 [1.03, 1.63]), and no association between higher pre-diagnosis mdNLR and risk of developing SCLC (OR: 1.06 [0.77, 1.47]) [31]. As in CARET, in CLUE I/II we observed an increased risk of NSCLC with higher mdNLR (47% for a one standard deviation increase; n = 150 pairs; OR: 1.47 [1.08, 2.02]), but in contrast to CARET, we found no statistically significant association for overall lung cancer risk.
CARET researchers recently reported that pre-diagnosis mdNLR was positively associated with increased mortality for SCLC cases, but not for other case types [32]. In comparison, we observed a positive association between pre-diagnosis mdNLR and lung cancer-specific and NSCLC-specific mortality. In the case of SCLC, the number of cases was too limited for us to estimate stable associations (N = 29). Taken together, the CLUE and CARET results suggest that a systemic inflammatory profile marked by elevated NLR could indicate a lesser ability to mount a robust immune response to a developing lung cancer and/or a more favorable environment for cancer progression. Differences in findings between the two studies could stem from differences in study populations. The CARET cohort consists exclusively of heavy smokers, including a subgroup exposed to asbestos. In comparison, our analysis in the CLUE I/II cohorts included never, former, and current smokers. Furthermore, the lung cancer cases in our study population had a lower mdNLR (mean 1.86, SD 1.32) than those in CARET (mean 2.18, SD 1.46).
Using a newly expanded deconvolution library, we were able to parse apart the granulocyte subtypes (neutrophils, eosinophils, and basophils) and investigate the balance between naïve and memory cell compartments for lung cancer. Previous research has identified the monocyte/lymphocyte (or lymphocyte/monocyte) ratio as an independent prognostic factor in NSCLC, demonstrating a significant association with overall survival in patients with NSCLC [33][34][35]. In comparison, our exploratory analyses of immune cell ratios suggest that a one standard deviation increase in the monocyte/lymphocyte ratio could potentially indicate increased risk of NSCLC after additionally adjusting for methylation-predicted pack-years. In addition, we found an increased risk of dying from lung cancer associated with an increase in the Neu + Mono/lymphocyte ratio among the NSCLC cases after multiple comparison adjustments.
We also investigated three CRP Scores that we built from 54 CpG sites that had been strongly associated with CRP in previous studies. We found these methylation-predicted CRP Scores to be moderately correlated with log-CRP and log-IL6 in the controls of a previously published pancreatic cancer dataset [27]. CRP is a systemic marker of chronic inflammation and has been reported as a risk factor for cancer development [36]. Previous studies of pre-diagnostic circulating CRP concentration and lung cancer risk (7 cohorts [10,11,19,37,38,39] and 3 nested case-control studies [8,12,40]) have consistently found a moderate positive association between pre-diagnostic CRP concentrations and lung cancer risk. In our study, CRP Scores were not associated with lung cancer risk when taking into account the matching factors, BMI, and batch effects. However, we observed an inverse association when additionally adjusting for methylation-predicted pack-years. Our results suggest that when strict control of smoking is applied, our CRP Score is likely capturing the unique individual immune response that is not driven by smoking.
Furthermore, these results provide preliminary evidence supporting the hypothesis that systemic inflammation not driven by smoking could have a protective effect on individuals. While smoking is by far the most important risk factor for lung cancer, our DNA methylation-based CRP Scores provide the opportunity to examine inflammatory measures not related to smoking that could play a role in modulating cancer risk years prior to diagnosis. Lastly, our experience with the CRP Scores suggests that measuring methylation-derived inflammatory responses using pre-diagnostic samples provides the opportunity to capture informative individual systemic inflammatory profiles years prior to diagnosis, potentially shedding light on risk factors key to lung cancer development and progression, e.g., underlying genetics, exposure to environmental risk factors, and behavioral risk factors.
Like other observational studies, our study included a limited number of NSCLC and SCLC cases. The relatively small sample size of SCLC cases (N = 29) limited our ability to observe associations for this subtype (SCLC comprises about 15% of lung cancer cases in the USA). In our survival analysis, we adjusted for stage and restricted our analysis to samples whose time between blood draw and date of lung cancer diagnosis was less than 25 years; however, our survival analysis did not have access to post-diagnosis smoking status information. Our study is also limited by the lack of a replication dataset and by reduced generalizability, because the study population is mainly White and includes very few cases among never smokers. The CRP Scores we built should be investigated in other populations to ensure that what we observed did not arise due to chance.

Conclusions
Our study suggests that elevated pre-diagnosis mdNLR and a lower non-smoking-related systemic inflammatory profile before diagnosis are associated with higher cancer risk and poorer lung cancer-specific survival. These relationships were especially evident for NSCLC. NSCLC is the most common subtype of lung cancer, and most NSCLC cases are diagnosed with locally advanced or metastatic disease. Our prospective results support future evaluation of whether DNA methylation-based inflammatory measures could enhance lung cancer risk stratification to improve targeted lung cancer screening.

Study Population
This nested case-control study selected cases and controls from individuals who participated and provided blood in both CLUE I and CLUE II [26]. The CLUE I cohort was developed to identify serologic precursors of cancer and was conducted in Washington County, Maryland, in the fall of 1974. A blood sample was collected from 25,620 volunteers at the time of participation [41,42]. The CLUE II cohort was conducted from May through October 1989. During this time, 32,894 participants donated a blood sample, which was collected in tubes containing heparin and kept chilled until centrifuged, aliquoted into plasma, erythrocytes, and buffy coat, and frozen at −70 °C [43]. In CLUE II, the baseline for this study, health information was collected at the time of blood draw, including educational attainment, cigarette smoking status, cigarette smoking dose, cigar/pipe smoking status, and self-reported weight and height.
Incident lung cancer cases were ascertained from linkage to the Washington County cancer registry (before 1992 to the present) and the Maryland Cancer Registry (since 1992 when it began to the present). We ascertained 241 incident lung cancer cases who participated in CLUE I and were diagnosed after the day of blood draw in CLUE II through January 2018. Cases were characterized with respect to histology. We used incidence density sampling to select one control matched to each case on age, sex, smoking status and intensity (cig/day), and cigar/pipe smoking status. Death from lung cancer as the underlying cause was obtained from death certificates. The Institutional Review Board at the Johns Hopkins Bloomberg School of Public Health and the Tufts University Health Sciences Campus Institutional Review Board approved this study.

DNA methylation measurements
Extracted DNA was bisulfite-treated using the EZ DNA Methylation Kit (Zymo), and DNA methylation was measured with the 850 K Illumina Infinium MethylationEPIC BeadChip Arrays (Illumina, Inc., CA, USA). All sample processing and array experiments were performed blinded to case-control status. Details on DNA methylation measurements, data preprocessing, and quality control assessment/screening are provided in Additional file 1. The 850 K methylation microarray has been validated from a biological and technical standpoint. Reproducibility of results from the 850 K Illumina array has previously been shown to be very high (r = 0.997) [44]. DNA volume and quality were sufficient for 208 of the cases and 222 controls, yielding 208 matched pairs.

Methylation-Derived Neutrophil Lymphocyte Ratio (mdNLR)
The peripheral blood neutrophil-to-lymphocyte ratio (NLR) is a cytological marker of both inflammation and poor outcomes in cancer patients [48][49][50][51][52]. We used a DNA methylation-derived NLR (mdNLR) index to predict the common clinical NLR parameter using a previously described approach [9]. This index is based on normal isolated leukocyte reference DNA methylation libraries and established reference-based cell mixture deconvolution algorithms [9,53].
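As an illustration only, the sketch below shows how an NLR-type index could be computed once cell proportions have been predicted by reference-based deconvolution. It assumes the index is the predicted neutrophil proportion divided by the summed lymphocyte proportions. The cell-type labels, the grouping of CD4 T, CD8 T, NK, and B cells as lymphocytes, and the example values are hypothetical and are not taken from the study pipeline.

```python
# Hypothetical labels for deconvolved cell proportions (assumption, not the study's naming).
LYMPHOCYTE_KEYS = ("CD4T", "CD8T", "NK", "Bcell")


def md_nlr(proportions):
    """Return the neutrophil-to-lymphocyte ratio from predicted cell proportions."""
    lymphocytes = sum(proportions[key] for key in LYMPHOCYTE_KEYS)
    if lymphocytes <= 0:
        raise ValueError("summed lymphocyte proportion must be positive")
    return proportions["Neu"] / lymphocytes


# Made-up proportions for a single sample (they sum to 1).
sample = {"Neu": 0.60, "CD4T": 0.15, "CD8T": 0.08, "NK": 0.04, "Bcell": 0.05, "Mono": 0.08}
print(round(md_nlr(sample), 3))
```

The same pattern would extend to the other immune cell ratios by changing the numerator and denominator cell types.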

Inflammation-associated CpG score
We used 54 CpG sites that have been strongly associated with C-reactive protein (CRP) [54,55] to build three CRP Scores. We selected these 54 CpGs (remaining 4 were not on the 850 K array that we used) from the 58 CpGs identified by Ligthart and colleagues [54] for their association with serum CRP level (listed in Table 3) using 450 K DNA methylation data. Forty-five of these 58 CpG sites were validated to have the same direction of protein-methylation associations by Myte et al. [55]. These CpGs, while identified based on their CRP association, have also been shown to be associated with other inflammatory mediators [54][55][56]. To compute CRP Score 1, we multiplied the beta value at each selected CpG site with the effect size estimates reported by Ligthart et al. These estimated beta coefficients represented the change in DNA methylation per one unit increase in log CRP. In the CRP Score 1 formula, we weighted the beta coefficients estimated by Ligthart et al. with their corresponding standard errors.
In this formula, B_ij is the beta value for the ith participant at the jth CpG site, and Δ_j is the beta coefficient reported by Ligthart et al. [54] for the jth CpG site. Since most of the estimated beta coefficients are negative, CRP Score 1 ranged between −0.059 and −0.026 in these participants. A score closer to zero indicated higher CRP levels. Based on CRP Score 1, we computed two additional CRP Scores, one cell-specific (CRP Score 2) and one cell (leukocyte)-type invariant (CRP Score 3). Among the 54 inflammation (CRP)-associated CpGs, we identified putative cell-type invariant and cell-specific CpGs by conducting ANOVA using the dataset described in Salas and Koestler et al. [47] and publicly available on the Gene Expression Omnibus (GSE110555). The dataset used for this ANOVA consisted of EPIC methylation data profiled in purified leukocyte cell populations isolated from different healthy adults. Specifically, methylation signatures were available for CD4 + T cells, CD8 + T cells, NK cells, B cells, monocytes, and neutrophils. One-way ANOVA models were fit independently to each of the 54 CRP-associated CpGs, treating methylation as the dependent variable and cell type as the independent variable. We tested the null hypothesis that the mean methylation beta-value is the same across the cell types. The F-statistic, corresponding p value, and maximum absolute pairwise difference in the mean methylation beta value across cell types were calculated for each of the 54 CpGs. We then selected the subgroups of CpG sites that had the 10 largest or the 10 smallest F-statistic values to build the two additional CRP Scores. CRP Score 2 consists of putative cell-specific CpGs with high F-statistics, i.e., those exhibiting a difference in mean methylation beta-values between at least two of the six cell types. CRP Score 3 is made of cell-type invariant CpGs with low F-statistics, i.e., CpGs for which there did not appear to be a substantial difference in mean methylation beta-values across the six normal leukocyte subtypes.
Score 2 ranged between −0.0002 and 0.0046, while Score 3 ranged between −0.025 and −0.016. In the regression analyses, we used standardized versions of CRP Scores 1, 2, and 3 (mean = 0, SD = 1) for easier interpretation of results and to allow comparison of results across the scores.
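The CpG selection and standardization steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the CpG names and beta values are invented, only three cell types with three samples each are shown instead of six, and the functions `one_way_f`, `rank_by_f`, and `standardize` are hypothetical helpers.

```python
from statistics import mean, stdev


def one_way_f(groups):
    """One-way ANOVA F-statistic; each group holds beta values for one cell type."""
    values = [v for group in groups for v in group]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_by_f(cpg_groups):
    """Order CpG names from most cell-specific (largest F) to most invariant."""
    return sorted(cpg_groups, key=lambda cpg: one_way_f(cpg_groups[cpg]), reverse=True)


def standardize(scores):
    """Rescale scores to mean 0 and sample SD 1."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]


# Invented beta values: three cell types (rows) per hypothetical CpG.
cpg_groups = {
    "cg_specific": [[0.10, 0.12, 0.11], [0.80, 0.82, 0.81], [0.45, 0.44, 0.46]],
    "cg_invariant": [[0.50, 0.52, 0.51], [0.51, 0.50, 0.52], [0.52, 0.51, 0.50]],
    "cg_moderate": [[0.30, 0.33, 0.31], [0.40, 0.38, 0.41], [0.35, 0.36, 0.34]],
}
ranking = rank_by_f(cpg_groups)
print(ranking)

# Invented raw scores on the scale reported for CRP Score 1.
raw_scores = [-0.059, -0.041, -0.035, -0.026]
print([round(z, 2) for z in standardize(raw_scores)])
```

In an analysis like the one described, the first 10 and last 10 entries of such a ranking over the 54 CpGs would define the cell-specific and cell-type invariant subsets.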

Statistical analyses
All statistical analyses were performed in R (version 3.5.1). We estimated mdNLR as described above, used an independent pancreatic cancer dataset [27] to estimate the correlation between estimated values of CRP Scores 1-3 and the log CRP and log IL-6 levels, and tested a series of a priori hypotheses concerning the mdNLR and CRP Scores. We also conducted exploratory analyses to generate novel hypotheses regarding the role of methylation-derived leukocyte proportions in lung cancer. Immune cell ratios (e.g., CD4/CD8, Neu/lymphocyte, B cell/lymphocyte, T cell/lymphocyte, Mono/lymphocyte, Neu + Mono/lymphocyte, Eos/lymphocyte, CD4nv/lymphocyte, B cell/CD8, CD8/Treg, Bnv/Bmem, CD4nv/CD4mem, and CD8nv/CD8mem) were calculated for each sample by taking the ratio of its predicted cell proportions described above and were tested as continuous variables. The presence of Treg was tested as a dichotomous variable. Given the need for multiple comparison adjustment, Bonferroni adjustment (adjusted significance threshold = 0.0013) was applied to all exploratory analyses. We used conditional logistic regression to examine the association between DNA methylation-based inflammatory measures (CRP Scores 1-3 and continuous mdNLR) and lung cancer risk. Models were fit with age, sex, and smoking status (never, former, current) as matching factors and were adjusted for potential confounding factors, including body mass index (BMI), batch effect, and previously described methylation-predicted pack-years smoked [57]. These analyses did not additionally adjust for methylation-derived cell proportions because these proportions were correlated with the methylation-based inflammatory measures (Table 3). We repeated these analyses by lung cancer histology (NSCLC, SCLC), length of time between blood draw and diagnosis (≤ 10, > 10 years), and BMI (< 25, ≥ 25 kg/m²).
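To make the matched-pair model concrete, the following sketch fits a conditional logistic regression for 1:1 matched pairs with a single standardized exposure and no other covariates. In that special case the conditional likelihood depends only on the case-minus-control difference d_i, with each pair contributing 1 / (1 + exp(−β d_i)). The Newton-Raphson routine `fit_matched_pairs` and the differences are hypothetical; the study itself was analyzed in R with additional adjustment variables, so this is a simplified illustration rather than the authors' model.

```python
import math


def fit_matched_pairs(diffs, iterations=25):
    """Newton-Raphson fit of a one-covariate conditional logistic model.

    diffs: exposure of the case minus exposure of its matched control.
    Returns the log odds ratio per unit of exposure and its standard error.
    """
    beta = 0.0
    information = 0.0
    for _ in range(iterations):
        probs = [1.0 / (1.0 + math.exp(-beta * d)) for d in diffs]
        gradient = sum(d * (1.0 - p) for d, p in zip(diffs, probs))
        information = sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
        beta += gradient / information
    return beta, 1.0 / math.sqrt(information)


# Invented case-minus-control differences in a standardized marker (SD units).
diffs = [0.8, -0.3, 1.2, 0.5, -0.6, 0.9, 0.1, -0.2]
beta, se = fit_matched_pairs(diffs)
odds_ratio = math.exp(beta)
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
print(f"OR per SD: {odds_ratio:.2f}, 95% CI: {ci[0]:.2f}, {ci[1]:.2f}")
print(f"Percent change in odds: {(odds_ratio - 1) * 100:.0f}%")
```

Because the exposure is standardized, exp(β) is read as the odds ratio per one SD increase, so an OR of 1.47 corresponds to a 47% increase in odds.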
Among the lung cancer cases, we examined the association between these same pre-diagnostic DNA methylation-based inflammatory measures (CRP Scores 1-3 and continuous mdNLR) and risk of lung cancer-specific death using a series of multivariable Cox proportional hazards regression models adjusting for age, sex, smoking status, BMI, stage at diagnosis (three strata: stage 1 & 2, stage 3 & 4, and missing), cell proportion, batch effects, and methylation-predicted pack-years smoked. The proportional hazards assumption was checked with global tests of the correlation between the scaled Schoenfeld residuals and time for each covariate. We excluded three lung cancer cases whose date of diagnosis and date of death were the same, or whose time between blood draw and date of lung cancer diagnosis was longer than 25 years. Cases were followed until their date of death from lung cancer, death from another cause, or the end of follow-up in 2018, whichever came first.
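For readers less familiar with the Cox model, the sketch below evaluates the partial log-likelihood for a single covariate, which is the quantity maximized to obtain a hazard ratio per SD. It is a simplified illustration: it uses one covariate, the Breslow form for ties, and treats death from another cause or end of follow-up as censoring, which is an assumption about how such a cause-specific analysis would be coded. The function name and the toy data are hypothetical.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Breslow partial log-likelihood for a one-covariate Cox model.

    times:  follow-up time for each case
    events: 1 if the case died of lung cancer, 0 if censored
    x:      standardized covariate (e.g., a marker in SD units)
    """
    loglik = 0.0
    for i, (t_i, event) in enumerate(zip(times, events)):
        if not event:
            continue  # censored cases contribute only through risk sets
        risk_set = sum(math.exp(beta * x[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += beta * x[i] - math.log(risk_set)
    return loglik


# Invented follow-up data for four cases.
times = [2.0, 5.0, 7.0, 9.0]
events = [1, 1, 0, 1]
x = [1.1, 0.4, -0.3, -0.9]

# Crude grid search over beta; real software uses Newton-type optimization.
grid = [b / 100 for b in range(-300, 301)]
best_beta = max(grid, key=lambda b: cox_partial_loglik(b, times, events, x))
print(f"approximate HR per SD on the toy data: {math.exp(best_beta):.2f}")
```

With a standardized covariate, exp(β) is interpreted as the hazard ratio for a one SD increase, so an HR of 1.28 corresponds to a 28% higher hazard of lung cancer death.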