DNA methylation derived systemic inflammation indices are associated with head and neck cancer development

Objectives: Head and neck squamous cell carcinoma (HNSCC) is often associated with chronic systemic inflammation (SI). In the present study, we assessed if DNA methylation-derived SI (mdSI) indices: Neutrophil-to-Lymphocyte ratio (mdNLR) and Lymphocyte-to-Monocyte ratio (mdLMR) are associated with the presence of HNSCC and overall survival (OS). Materials and methods: We used two peripheral blood DNA methylation datasets: an HNSCC case-control dataset (n = 183) and an HNSCC survival dataset (n = 407) to estimate mdSI indices. We then performed multivariate regressions to test the association between mdSI indices, HNSCC development and OS. Results: Multivariate logistic regression revealed that elevated mdNLR was associated with increased odds of being an HNSCC case (OR = 3.25, 95% CI = 2.14-5.34, P = 4 × 10⁻⁷) while the converse was observed for mdLMR (OR = 0.88, 95% CI = 0.81-0.90, P = 2 × 10⁻³). In the HNSCC survival dataset, HPV16-E6 seropositive HNSCC cases had an elevated mdLMR (P = 9 × 10⁻⁵) and a lower mdNLR (P = 0.003) compared to seronegative patients. Multivariate Cox regression in the HNSCC survival dataset revealed that lower mdLMR (HR = 1.96, 95% CI = 1.30-2.95, P = 0.0013) but not lower mdNLR (HR = 0.68, 95% CI = 0.46-1.00, P = 0.0501) was associated with increased risk of death. Conclusion: Our results indicate that mdSI estimated by DNA methylation data is associated with the presence of HNSCC and overall survival. The mdSI indices may be used as a valuable research tool to reliably estimate SI in the absence of cell-based estimates. Rigorous validation of our findings in large prospective studies is warranted in the future.


Introduction
Head and neck squamous cell carcinoma (HNSCC) includes cancers arising from the lining epithelium of the upper aero-digestive tract including the oral cavity, larynx, pharynx, and nasopharynx [1]. Globally, HNSCC accounts for nearly 600,000 newly diagnosed cancer cases leading to approximately 325,000 deaths each year [2]. Tobacco and alcohol abuse are the main risk factors for HNSCC [1]. However, high-risk types of human papillomavirus (HPV), especially HPV type 16, have emerged as risk factors in a subset of HNSCC, particularly oropharyngeal cancer (OPC) [3]. Disease stage has been the single most predictive prognostic factor for OPC [4], although recent studies point to HPV status as an independent marker of prognosis in HNSCC patients [5,6].
Cancer-related chronic systemic inflammation (SI) is an enabling characteristic of cancers which helps them to acquire tumour hallmarks [7], and its role in cancer prognosis has been increasingly recognised [8]. Chronic SI also promotes tumour initiation and progression by induction of immunosuppression via Myeloid Derived Suppressor Cells (MDSCs) [9]. Indeed, HNSCC is often associated with immunosuppression, with an imbalance in both the composition and function of effector immune cells [10].
DNA methylation based cell-type deconvolution algorithms have shown promise in estimating leukocyte cell type proportions [23]. Two recent studies have shown the utility of generating a methylation-derived NLR (mdNLR) index from peripheral blood DNA as a marker of cancer development and progression [19,24]. Furthermore, the authors reported a strong agreement between mdNLR and cell count based NLR estimates, instilling confidence in the DNA methylation based estimates of leukocytes and methylation-derived SI.
In the present study, we estimated mdSI indices (mdNLR and mdLMR) in pre-treatment HNSCC cases, cancer free controls, and in an independent set of HNSCC patients with overall survival (OS) data. We evaluated whether mdSI indices are associated with the presence of HNSCC and OS.

HNSCC case-control dataset
The DNA methylation data, percentage of leukocyte subtypes and covariates for the HNSCC case-control dataset were kindly provided by co-authors (DCK and KTK, GEO accession: GSE30229).
The study consisted of 92 pre-treatment HNSCC cases and 92 cancer-free control subjects with DNA methylation data from peripheral blood samples. The cases and controls were frequency-matched on age and gender. Details about the sample selection and preparation have been described previously [25]. DNA methylation was assessed using the Illumina Infinium HumanMethylation27 BeadChip assay (Illumina, Inc., CA, USA). To avoid potential biases, HNSCC case and control samples were randomized to bead chips. A sample with an unusually high value of mdLMR (mdLMR = 1116) was considered an outlier and was removed leaving a total of 91 HNSCC cases and 92 cancer-free controls for the final analysis.

HNSCC survival dataset
The study population for this analysis comprised individuals enrolled in the Head and Neck 5000 clinical cohort study [26,27]. Briefly, 5511 people with a new head and neck cancer diagnosis were recruited from 76 centres across the UK between April 2011 and December 2014. Individuals were recruited before they started treatment unless their treatment was their diagnostic procedure. Full ethical approval was granted by The South West - Frenchay Regional Ethics Committee (ref: 10/H0107/57).
At baseline, participants were asked to complete three self-administered questionnaires covering socio-economic circumstances, lifestyle, general health and past sexual behaviours. Biological samples (blood, n = 4676 (85%); saliva, n = 4986 (90%); and tissue) were collected from all consenting participants. Information on stage at diagnosis, treatment and various other clinical and pathologic prognostic variables was abstracted from participants' medical records. In total, 5474 (99%) data capture forms and 4099 (74%) health and lifestyle questionnaires were completed [27].
Blood samples were frozen and stored at −80°C and then processed in the Bristol Bioresource Laboratories. Following extraction, DNA was bisulphite-converted using the EZ DNA Methylation™ kit (Zymo, Irvine, CA, USA) as per the previously published protocol [29]. Following conversion, genome-wide methylation status of over 850,000 CpG sites was measured using the Infinium MethylationEPIC BeadChip [30]. The arrays were scanned using Illumina iScan and the initial quality review was performed using GenomeStudio. DNA methylation data for the HNSCC survival dataset have not been previously published.
During the data generation process, a wide range of batch variables were recorded in a purpose-built laboratory information management system (LIMS).

Quality control and normalisation
Raw data (IDAT files) from GenomeStudio were loaded into the R package meffil [31] and quality control (QC) data were extracted (Supplementary Fig. 1). In total, 5 samples failed at least one of the QC steps. Overall, 443 samples passed the QC. Due to the subsequent recoding of the ICD-10 classifications, we had 436 samples from oral cavity and oropharyngeal cancers. After filtering for the samples with complete data on HPV status, alcohol consumption and smoking status, we were left with 407 samples for the final analysis. These samples consisted of 389 OPC and 18 oral cancer cases.
Following QC, we performed functional normalization which exploits control probes to separate biological variation from technical variation [32]. Data were normalised using six control probe principal components derived from the technical probes.

Tobacco, alcohol, comorbidity and HPV exposure
Detailed information on tobacco and alcohol history was obtained at baseline via the self-reported questionnaire. Participants were asked about their use of tobacco and alcohol products prior to receiving their HNSCC diagnosis.
Smoking status was defined as "ever" (current and former) or "never". Former smokers were those who reported having smoked ≥100 cigarettes in their lifetime, whilst never smokers were defined as those who had never smoked at least one cigarette daily for a whole year.
Respondents were asked to report their average weekly alcohol consumption and were classified as "ever" or "never" consumers.
Chronic diseases are associated with increased systemic inflammation [33]. We used the Adult Comorbidity Evaluation 27 (ACE 27) completed by research nurses in clinical centres to record the presence and severity of medical comorbidities including chronic systemic diseases as described by Piccirillo et al. [34]. The participants were grouped into four categories: no comorbidity, mild comorbidity, moderate decompensation and severe decompensation.

Mortality data
Regular updates are received from the NHS Central Register (NHSCR) and the NHS Information Centre (NHSIC), notifying the study of subsequent cancer registrations and mortality among cohort members throughout the Head and Neck 5000 study. Recruitment for the study finished in December 2014 and follow-up information on mortality status was obtained in September 2017, resulting in at least 2.75 years of follow-up for all participants.
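The minimum follow-up figure follows from the two dates given above. The short sketch below reproduces that arithmetic at month resolution; the helper name `months_between` is hypothetical and used only for illustration.

```python
def months_between(start_year: int, start_month: int,
                   end_year: int, end_month: int) -> int:
    """Whole calendar months from the start month to the end month."""
    return (end_year - start_year) * 12 + (end_month - start_month)


# End of recruitment (December 2014) to the mortality update (September 2017).
min_follow_up_months = months_between(2014, 12, 2017, 9)
min_follow_up_years = min_follow_up_months / 12
print(min_follow_up_months, min_follow_up_years)
```

Participants recruited earlier than December 2014 would have correspondingly longer potential follow-up.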

Estimating cell counts and computing the methylation-derived systemic inflammation indices
For the HNSCC case-control dataset, cell counts were estimated as previously described [19,38]. For the HNSCC survival dataset, we used the dataset from Reinius et al. as a cell type reference [39] and cell counts were estimated using the Houseman et al. algorithm for estimating cell counts [38] in meffil. Each sample was normalised individually to the cell type reference, thus avoiding having cell count estimates dependent on other samples being included in the normalisation.
Methylation derived Neutrophil-to-Lymphocyte Ratio (mdNLR) was estimated by dividing estimated proportions of granulocytes by lymphocytes as previously described [19]. Similarly, methylation derived Lymphocyte-to-Monocyte Ratio (mdLMR) was estimated by dividing estimated proportions of lymphocytes by monocytes.
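As an illustration of the two ratios defined above, the following minimal sketch computes mdNLR and mdLMR from one sample's estimated cell-type proportions. It is not the pipeline used in the study: the dictionary keys (`Gran`, `Mono`, `CD4T`, `CD8T`, `Bcell`, `NK`) are hypothetical labels, and treating the lymphocyte fraction as the sum of the T cell, B cell and NK cell estimates is an assumption.

```python
from typing import Dict

# Hypothetical labels; which subtypes count as "lymphocytes" is an assumption.
LYMPHOCYTE_KEYS = ("CD4T", "CD8T", "Bcell", "NK")


def md_indices(props: Dict[str, float]) -> Dict[str, float]:
    """Compute mdNLR and mdLMR from estimated cell-type proportions."""
    lymphocytes = sum(props[key] for key in LYMPHOCYTE_KEYS)
    monocytes = props["Mono"]
    if lymphocytes <= 0 or monocytes <= 0:
        raise ValueError("lymphocyte and monocyte estimates must be positive")
    return {
        "mdNLR": props["Gran"] / lymphocytes,  # granulocytes / lymphocytes
        "mdLMR": lymphocytes / monocytes,      # lymphocytes / monocytes
    }


# Invented proportions for a single sample (they sum to 1).
sample = {"CD4T": 0.15, "CD8T": 0.05, "Bcell": 0.03, "NK": 0.02,
          "Mono": 0.05, "Gran": 0.70}
print(md_indices(sample))
```

Because monocytes appear in the denominator of mdLMR, a monocyte estimate close to zero produces a very large ratio, so extreme values are worth inspecting before analysis.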

Statistical analyses
The analyses were performed using the statistical software R (version 3.4.0). A Wilcoxon rank sum test was used to compare mdNLR and mdLMR between (a) HNSCC cases and cancer-free controls (from the HNSCC case-control dataset) and (b) groups of HNSCC cases with available OS data (from the HNSCC survival dataset).
Multivariate logistic regressions were performed to test the association between mdNLR (continuous), mdLMR (continuous) and HNSCC case-control status. To test the association of mdSI indices (categorical, above and below median) with OS, univariate and multivariate Cox proportional hazards analyses were performed in the HNSCC survival dataset.
Prior to testing associations in the HNSCC case-control dataset, any potential effects of plate and/or BeadChip were regressed out using ComBat [40] as previously described [19]. The multivariate logistic regression model was adjusted for the covariates age, gender, smoking status and HPV status. The ability of mdSI indices to classify HNSCC cases and cancer-free controls was assessed using receiver operating characteristic (ROC) curves and corresponding Area Under the ROC Curve (AUC) values using the R package pROC [41].
For the HNSCC survival dataset, we performed univariate and multivariate Cox proportional hazards analyses using the R package survival (https://cran.r-project.org/web/packages/survival/index.html). For each model, the proportional hazards assumption was tested for violations using the function cox.zph implemented in the same package.
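To make the group comparison concrete, here is a minimal Python sketch of a two-sample rank sum test. It is an illustration under stated assumptions rather than the R routine used in the study: it uses the large-sample normal approximation without continuity or tie correction, and the function name `rank_sum_test` and the example values are invented.

```python
import math
from typing import List, Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic of x, two-sided p-value from a normal approximation)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                      key=lambda pair: pair[0])
    n = len(combined)
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values; ties share their average rank.
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u_x - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Invented mdNLR values for two small groups.
controls = [1.4, 1.9, 2.1, 2.3]
cases = [2.6, 3.0, 3.4, 4.1]
print(rank_sum_test(cases, controls))
```

The test is rank-based, so it compares the location of the two distributions rather than their arithmetic means. Exact p-values, which statistical packages may use for small samples, will differ from this approximation.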
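The median dichotomisation of the indices can be sketched as follows. The handling of values exactly equal to the median is not specified in the text; assigning them to the "below" group here is an assumption, and `split_at_median` is a hypothetical helper name.

```python
from statistics import median
from typing import List, Sequence


def split_at_median(values: Sequence[float]) -> List[str]:
    """Label each value as 'above' or 'below' the sample median.

    Values equal to the median are labelled 'below' (an assumption).
    """
    cutoff = median(values)
    return ["above" if value > cutoff else "below" for value in values]


# Invented mdLMR values for six patients.
md_lmr = [2.1, 3.5, 4.8, 1.9, 6.2, 3.0]
print(split_at_median(md_lmr))
```

The resulting two-level factor is what would enter a Cox model as a categorical predictor.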
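For intuition about the AUC, the sketch below computes it directly as the probability that a randomly chosen case has a higher score than a randomly chosen control, counting ties as one half. This is the standard rank-based interpretation of the AUC and is not a reimplementation of pROC; the function name `auc` and the scores are illustrative.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the fraction of case-control pairs ranked correctly (ties count 0.5)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Invented scores, for example predicted probabilities from a logistic model.
print(auc([0.9, 0.7, 0.6], [0.8, 0.4, 0.2]))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation. For an index expected to be lower in cases, such as mdLMR, the score would need to be negated, or the value interpreted as 1 minus the AUC.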
The multivariate Cox proportional hazards regression model was adjusted for age, gender, smoking status (ever/never), tumour stage (stage I&II (low)/ III&IV (high)), HPV16 E6 serology (positive/negative), alcohol consumption (ever/never) and ACE27 categorisation. Furthermore, to address potential sources of unwanted technical variation, we performed Surrogate Variable Analysis (SVA) [42][43][44]. We created a full model matrix (a model matrix containing mdNLR (above/ below median value), mdLMR (above/below median value), OS status, survival time, HPV16 E6 seropositivity status, smoking status, age, gender, alcohol consumption, ACE27 categorisation and low/high stage of tumour) and a null model matrix (a model matrix containing mdNLR (above/below median value), mdLMR (above/below median value), HPV16 E6 seropositivity status, smoking status, age, gender, alcohol consumption, ACE27 categorisation and low/high stage of tumour) from our phenotype data. Ten surrogate variables were derived as the most variable technical artefacts in our data. The multivariate Cox proportional hazards regression model was further adjusted for these ten surrogate variables along with the covariates mentioned above. Kaplan-Meier survival curves were plotted using the R package survminer (https://github.com/kassambara/survminer).
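The Kaplan-Meier curves mentioned above are built from the product-limit estimator. The following minimal sketch shows the calculation on invented data; it is independent of the survminer package, and `kaplan_meier` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate at each distinct death time.

    events[i] is 1 if a death was observed at times[i] and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == current_time:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((current_time, survival))
        at_risk -= leaving
    return curve


# Invented follow-up times in years; the second patient is censored.
print(kaplan_meier([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1]))
```

Censored patients leave the risk set without lowering the curve, which is why the estimate only steps down at observed death times.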
Finally, a Wilcoxon rank sum test was performed to compare methylation values at five CpG sites associated with myeloid differentiation [24] between HNSCC cases who died during the follow-up period and those who remained alive.

Sample characteristics
HNSCC case-control dataset
The sample characteristics including demographic and clinical data for the HNSCC case-control dataset have been previously described [19] and are shown in Supplementary Table 1. The mean age of the participants was 60 years, with 69% men. The mean mdNLR and mdLMR were 2.35 (SD = 1.36) and 6.25 (SD = 5.37) respectively.

HNSCC survival dataset
The sample characteristics including demographic and clinical data for HN5000 are shown in Table 1. Four hundred and seven people with HNSCC were included in this study. The mean age of the participants was 59 years, with 77% men. For lifestyle-associated risk factors, 97% were ever alcohol consumers and 69% had a smoking history (ever smokers). The majority of the tumours (85%) were of high stage (stage III or IV) rather than low stage (stage I or II). Of all the HNSCC samples analysed for HPV16 E6 protein seropositivity, 67% were positive and 33% were negative. The mean mdNLR and mdLMR were 2.4 (SD = 2.51) and 3.5 (SD = 1.42), respectively.

mdLMR is associated with overall survival in HNSCC
In the HNSCC survival dataset, 109 (27%) participants died during the median follow-up period of 4.54 years (range 0.18-6.41 years).
Elevated mdNLR was observed in HNSCC patients who died during the follow-up period (P = 0.004, Fig. 1C) while mdLMR was elevated in HNSCC patients who remained alive compared to those who died during follow-up (P = 1.1 × 10⁻⁵, Fig. 1D).
There were no serious violations of the proportionality assumption across the predictors used in the univariate and multivariate analyses. Univariate Cox proportional hazards regression analysis (Fig. 2 and Table 3) showed that lower mdNLR was associated with a reduced risk of death (HR = 0.55, 95% CI = 0.38-0.81, P = 0.00253), while lower mdLMR was associated with an increased risk of death (HR = 2.38, 95% CI = 1.66-3.55, P = 0.00002).
Chronological age, advanced stage (stage III/IV) and ever smoking (current/former) were associated with poorer OS in HNSCC. In contrast, HPV16 E6 seropositivity (Table 3) and OPC (Supplementary Table 2) were associated with better OS. Elevated mdNLR and lower mdLMR were observed in HPV negative and ever smoker HNSCC cases (Supplementary Fig. 4).
We found that all five CpGs associated with myeloid cell differentiation (suggested to be a surrogate for mdNLR) were hypermethylated in HNSCC patients who remained alive during the follow-up period (Table 4, Supplementary Fig. 5).
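As a reading aid for hazard ratios reported with confidence intervals, the sketch below shows how a Wald-type 95% interval relates to the standard error of the log hazard ratio, and how an approximate z statistic and two-sided p-value follow from it. This assumes a symmetric Wald interval on the log scale and works from rounded published values, so it will only approximately match p-values computed from the full model; `wald_from_ci` is a hypothetical helper name.

```python
import math
from typing import Tuple


def wald_from_ci(hr: float, lower: float, upper: float,
                 z_crit: float = 1.96) -> Tuple[float, float, float]:
    """Recover (SE of log HR, z, two-sided p) from an HR and its 95% CI.

    Assumes the interval is exp(log(HR) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_two_sided


# Rounded univariate estimate for lower mdNLR quoted in the text above.
se, z, p = wald_from_ci(0.55, 0.38, 0.81)
print(round(se, 3), round(z, 2), round(p, 4))
```

A hazard ratio below 1 gives a negative z, consistent with a reduced risk of death in the lower-mdNLR group.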

Discussion
In the present study, we have identified that methylation-derived systemic inflammation indices may be used to distinguish HNSCC cases and controls. The mdSI indices provide a slight improvement over covariates (age, gender, smoking and HPV status) in distinguishing HNSCC cases from controls. Intriguingly, lower mdLMR was associated with poorer OS.
We observed an elevated methylation-derived circulating neutrophil and monocyte count and a decreased lymphocyte count in HNSCC cases compared to controls. Similarly, HNSCC cases with poor OS showed elevated neutrophil and monocyte cell counts and a lower lymphocyte count. Our findings concur with previous reports on cell count-based leukocyte measurements in HNSCC development and progression [45][46][47].
We utilised mdNLR and mdLMR to understand the contribution of systemic inflammation to HNSCC development and survival. Findings from the present study suggest an association of mdSI indices with the presence of HNSCC and OS similar to previous cell count based reports [16,18]. Interestingly, our DNA methylation-derived estimates of NLR and LMR were similar to the cell count based measures of SI [18,48]. The similarities between DNA methylation and cell count based inflammation indices strengthen the utility of mdSI indices as a valuable research tool to estimate SI in the absence of cell count based measurement, especially in prospective studies.
Although the mdSI indices are associated with the presence of HNSCC and with OS, these associations may also be driven by exposure to inflammation-associated risk factors of HNSCC such as smoking and HPV status that are also associated with poor prognosis in HNSCC [6]. Our findings of an elevated mdNLR and lower mdLMR in HPV negative and ever smoking HNSCCs are in agreement with previous observations [14]. These observations may be indicative of the potential biological differences between HPV positive and negative tumours. HPV effectively evades the innate immune system by confining gene and protein synthesis to the epithelial cells; hence, only nominal amounts of replicating virus are exposed to the immune system [49,50]. Our findings of elevated mdLMR and lower mdNLR in HPV-positive HNSCC may therefore be reflective of an innate immune response. Smoking is associated with increased systemic inflammation [51]. Lifetime smoking-related tobacco exposure measured by pack years was recently shown to be associated with elevated NLR [52]. Indeed, our recent DNA methylation study identified an altered number of immune cells in response to smoking [53].
We observed an association between mdSI indices (elevated mdNLR and lower mdLMR) and increased odds of being an HNSCC case. Our results concur with the previously published work validating the SI indices derived using DNA methylation data [14][15][16]18]. Importantly, the levels of the mdNLR-derived inflammation index were similar to the NLR derived using cell counts in HNSCC [16,48]. Previous studies have reported a higher monocyte count and a lower lymphocyte count associated with poor clinical outcome in HNSCC [54][55][56]. Our finding of lower LMR associated with reduced OS validates previous reports that cell count based pre-treatment LMR may be an independent prognostic marker in cancers, including HNSCC [18,57]. Altered SI indices (NLR, LMR), whether derived from DNA methylation data or from cell counts, are reflective of systemic inflammation [13]. Previous studies have suggested a vicious cycle of interaction between tumour cells and cells of myeloid origin such as neutrophils and monocytes through cytokines, which leads to neo-angiogenesis and poor treatment response [46,47,58,59]. On the other hand, lymphocytes play a critical role in strengthening the host immune response against cancer [55]. In fact, the levels of tumour infiltrating lymphocytes (TILs) are known to predict survival in OPC patients [60]. We observed an increase in baseline methylation-derived myeloid cell counts (neutrophils and monocytes) and a decrease in baseline lymphocytes in incident HNSCC cases and cases with poor survival. Thus, our findings underline the significance of immune homeostasis in HNSCC development and progression.
Strengths of our study include the use of two datasets, giving us the ability to explore varied roles for mdSI indices in distinguishing pre-treatment HNSCC cases and controls, as well as in relation to survival. We performed multivariate analyses adjusting for appropriate potential confounders and possible sources of technical variation.
Our study is not without limitations. Firstly, the sample size for both the case-control and OS studies was small; moreover, we were unable to identify independent prospective datasets to validate our findings. This is attributed to the limited number of published and publicly available HNSCC datasets with genome-wide DNA methylation information on whole blood. Secondly, in the absence of genetic instruments for the cell count based systemic inflammation index, we are unable to evaluate causality of the observed association using Mendelian randomization [61], nor can we rule out reverse causation in the HNSCC case-control study. Thirdly, we were unable to compare mdSI indices to cell count based SI indices due to the lack of availability of directly measured blood cell type proportions for the studied datasets. In spite of these limitations, the confidence in our measured methylation-derived SI is strengthened by (i) previous studies that have validated the use of methylation-derived cell counts in estimating SI [19,24] and (ii) similarities between our mdSI indices and previously published cell count based SI. Finally, we had limited information on the presence of oral inflammatory conditions such as oral lichen planus, Behcet's disease, and recurrent aphthous stomatitis in our datasets, so we were unable to adjust for these factors in the statistical models. However, we did account for the presence of chronic diseases associated with inflammatory conditions by adjusting for the ACE-27 score in our statistical models.
In conclusion, we have demonstrated that systemic inflammation indices are associated with the presence of HNSCC. Further, the mdSI indices are sufficient to distinguish HNSCC cases and controls. In the HNSCC survival dataset, lower mdLMR was associated with poorer OS. The mdSI indices may be useful as a research tool for predicting high-risk HNSCC, especially HPV-negative HNSCC where there is a lack of reliable biomarkers of detection, although this would require rigorous validation in large prospective studies. The mdSI indices will be particularly helpful in prospective studies where the estimates of leukocyte subtypes were not recorded at recruitment. It remains to be tested whether mdSI measures SI independent of acute phase proteins such as CRP. Finally, we would be interested in testing whether mdSI in circulation is reflective of inflammation status in tumours.