Gene-expression profiles in lung adenocarcinomas related to chronic wood smoke or tobacco exposure

Background Tobacco smoke is the major etiological factor related to lung cancer. However, another important factor is chronic wood smoke exposure (WSE). Approximately 30 % of lung cancer patients in Mexico have a history of WSE, and they present different clinical, pathological and molecular characteristics compared to tobacco-related lung cancer, including differences in mutational profiles. Several molecular alterations have been identified in WSE-associated lung cancer; however, most studies have focused on the analysis of changes in several pathogenesis-related proteins.
Methods Our group evaluated gene expression profiles of primary lung adenocarcinoma from patients with a history of WSE or tobacco exposure. Differential expression between these two groups was studied through gene expression microarrays.
Results Gene expression profiling revealed 57 differentially expressed genes (p < 0.01). The associated biological functional pathways included lipid metabolism, biochemistry of small molecules, molecular transport, cell morphology, and cell function and maintenance. A highlight of our analysis is that three of the main functional networks represent 37 of the 57 differentially expressed genes. These hubs are related to ubiquitin C, GABA(A) receptor-associated protein-like 1 (GABARAPL1), and the PI3K/AKT and MEK/ERK signaling pathways.
Conclusion Our results reflect the intrinsic biology that sustains the development of adenocarcinoma related to WSE and show that WSE-associated lung adenocarcinoma has a different gene expression profile from tobacco-associated lung adenocarcinoma. This suggests that the two arise through different carcinogenic mechanisms, which may explain the divergences in clinical and mutational profiles between them.

In the USA, lung cancer death rates declined 36 % among males from 1990 to 2011 and 11 % among females between 2002 and 2011, owing to reduced tobacco use as a result of increased awareness of the health hazards of smoking and the implementation of comprehensive tobacco control [2]. There have been reports of a doubling in the annual incidence of lung cancer in never smokers, which also found that non-smoking patients with non-small cell lung cancer (NSCLC) tend to be female and young [7,8]. Regarding mortality, lung cancer death rates among never smokers are greater in men than in women, and a large fraction of cases have no identified risk factors [9]. Meanwhile, former smokers have an increased risk of lung cancer, but the cumulative risk decreases with earlier smoking cessation compared to smokers who continue smoking [10].
Chronic wood smoke exposure (WSE) is related to obstructive pulmonary disease in developing, European and American countries [11,12]. Wood dust has also been identified as a human carcinogen and a risk factor for lung cancer [3,13]. Wood byproducts such as benzene, 1,3-butadiene, formaldehyde and acetaldehyde are well-known carcinogens [14]. For more than 50 years, WSE has been associated with an increased risk of lung cancer as compared with pulmonary tuberculosis, interstitial lung disease and various pulmonary conditions (OR: 1.9; 95 % confidence interval (CI): 1.1-3.5) after adjusting for age, education, socioeconomic status and tobacco smoke exposure [13]. In Mexico, approximately 16 % of the population has long-term exposure to wood smoke from residential heating and/or cooking, and 30 % of lung cancers are associated with WSE [5,15]. Molecular assays have shown upregulation and phosphorylation of p53 in WSE-related lung cancer [16]. Moreover, WSE is associated with macrophage dysfunction and an increase in the activity of metalloproteinases such as MMP-2 and MMP-9, which could be related to lung injury in chronic obstructive pulmonary disease and have a role in the pathophysiology of lung cancer [17].
Ethnic origin and different risk factors for lung cancer might explain the distinct mutation profiles, as in the case of epidermal growth factor receptor (EGFR) and KRAS among Asians, Caucasians and Latin Americans [18-20]. Our group previously reported a high rate of treatment response and a better outcome in patients with WSE-related lung cancer treated with EGFR tyrosine kinase inhibitors (EGFR-TKIs) [21]. We have further described that WSE-related lung cancer is associated with an older age at diagnosis, adenocarcinoma histology, pleural effusion, a high prevalence of EGFR mutations (55.4 %) and a low prevalence of KRAS mutations (6 %), compared to patients with a smoking history [15]. These findings indicate clear differences in the molecular and clinical evolution of WSE-related lung cancer compared with tobacco-associated lung cancer.
To further analyze the molecular differences observed in WSE-related lung cancer, the objective of our work was to compare the gene expression profiles of lung adenocarcinoma in patients with WSE or a smoking history.

Experimental design
This was a clinical, longitudinal, prospective, observational and analytical cohort study with non-probabilistic sampling. The protocol was approved by the Scientific and Bioethical committees of the Instituto Nacional de Cancerología (INCan, 008102510M1, CB451).

Patients and tissue samples
Patients admitted to the INCan with a pulmonary lesion suggestive of primary lung cancer were prospectively biopsied from January 2008 to June 2011. After informed consent, tissue was obtained from the clinically suspected primary tumor by computed tomography-guided Tru-Cut biopsy (CareFusion, San Diego, CA, USA). Data were excluded from the analysis if there was no histological diagnosis, a different type of primary cancer was present, or the pathology report indicated a histology other than lung adenocarcinoma. Patients with histologically confirmed advanced lung adenocarcinoma (stages IIIB and IV) were eligible for inclusion in the study (Fig. 1).
A complete medical history, including a detailed history of smoking and wood smoke exposure, and a physical examination were obtained. Tumor specimens were collected at the time of diagnosis. WSE was defined as exposure to fumes resulting from burning wood in fireplaces and wood stoves for ≥ 4 h per day for ≥ 5 years. The WSE index was calculated as the average number of hours spent on cooking daily per the total number of years spent cooking [22]. A smoker was defined as someone with a lifetime exposure of more than 100 cigarettes [6]; the tobacco-smoking index was calculated by multiplying the number of cigarette packs consumed per day by the number of years spent smoking [15].
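As a rough illustration of the exposure definitions above, the following Python sketch encodes the two thresholds and the two indices. All function names are hypothetical. Treating the WSE index as the product of daily hours and years of exposure (hour-years) is an assumption about the wording above, made by analogy with the pack-year tobacco index; it is not confirmed by the text.

```python
# Illustrative sketch only; function names are hypothetical.

def meets_wse_definition(hours_per_day: float, years_exposed: float) -> bool:
    """WSE as defined in the text: >= 4 h per day for >= 5 years."""
    return hours_per_day >= 4 and years_exposed >= 5


def wse_index(hours_per_day: float, years_exposed: float) -> float:
    """ASSUMPTION: index taken as hours per day times years (hour-years)."""
    return hours_per_day * years_exposed


def is_smoker(lifetime_cigarettes: int) -> bool:
    """Smoker as defined in the text: more than 100 cigarettes in a lifetime."""
    return lifetime_cigarettes > 100


def tobacco_index(packs_per_day: float, years_smoked: float) -> float:
    """Tobacco-smoking index: packs per day times years smoked (pack-years)."""
    return packs_per_day * years_smoked
```

For example, under these assumptions a patient who cooked with wood for 6 h per day over 30 years would meet the WSE definition and have an index of 180 hour-years.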

RNA isolation and RNA preparation for microarrays
A core biopsy of the primary tumor was performed prior to any treatment, and the tissue was snap-frozen in liquid nitrogen for RNA extraction. A trained pathologist confirmed the histological diagnosis and quantified the tumor cell percentage.
Total RNA was extracted and purified from tissue (up to 5 mg) using the RNeasy Micro Kit (QIAGEN, Germany) (cat. 217084). RNA integrity was evaluated by capillary electrophoresis using the Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA). Samples with an RNA integrity number (RIN) of six or higher were included for microarray analysis.

RNA amplification and expression microarray analysis
Gene expression analysis was done using the Affymetrix GeneChip® Human Gene 1.0 ST Array System, which evaluates the expression of 28,869 different genes. Sample processing was done following the manufacturer's instructions.

Strategy for microarray gene-expression analysis
Statistical analysis for differential expression was conducted using R and Bioconductor. Background correction for nonspecific hybridization was performed with the Robust Multiarray Average (RMA) method [23], whose statistical model assumes both additive and multiplicative noise components. After background correction of the individual probes, quantile normalization [24] was applied; both steps are implemented in the oligo package [25]. Normalized and corrected probes were summarized into probe sets using the median polish algorithm, a type of robust two-way ANOVA in which one factor is the array and the other is the probe. The algorithm is robust to outliers, so single probes with large values are downweighted. Batch correction was also applied to all samples using ComBat in the sva package [26]. Differential expression was identified through linear models implemented in the limma package [27]; genes were selected as significant according to two summary statistics: p-value (< 0.01) and absolute fold change larger than 1.2.
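The two preprocessing ideas described above, quantile normalization and median polish summarization, can be sketched in a few lines of Python. This is a simplified illustration, not the oligo or RMA implementation used in the study: the function names are hypothetical, ties are handled naively, and the input values are assumed to be background-corrected intensities on a log2 scale.

```python
from statistics import mean, median

# Simplified sketches; function names are hypothetical and this is not
# the oligo/RMA code used in the study.

def quantile_normalize(arrays):
    """Give every array the same distribution: the value at each rank is
    replaced by the mean of the values at that rank across all arrays.
    `arrays` is a list of equal-length lists (one list per array)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [mean(a[r] for a in sorted_arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def median_polish(matrix, n_iter=10):
    """Fit value = overall + probe effect + array effect + residual by
    alternately sweeping out row (probe) and column (array) medians."""
    rows, cols = len(matrix), len(matrix[0])
    resid = [list(r) for r in matrix]
    overall = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols
    for _ in range(n_iter):
        for i in range(rows):                      # sweep row medians
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)                        # re-center array effects
        overall += m
        col_eff = [c - m for c in col_eff]
        for j in range(cols):                      # sweep column medians
            m = median(resid[i][j] for i in range(rows))
            col_eff[j] += m
            for i in range(rows):
                resid[i][j] -= m
        m = median(row_eff)                        # re-center probe effects
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid


def summarize_probe_set(matrix):
    """One expression value per array for a probes x arrays matrix."""
    overall, _, col_eff, _ = median_polish(matrix)
    return [overall + c for c in col_eff]
```

Because medians rather than means are swept out, a single aberrant probe value ends up in the residuals instead of shifting the summarized expression value, which is the robustness property mentioned above.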
For the biological networks and functional analysis, we used QIAGEN's Ingenuity Pathway Analysis (IPA®, QIAGEN Redwood City, http://www.ingenuity.com/). IPA was used to identify gene-signaling pathways involved in the biological processes of WSE versus tobacco exposure. Networks of these genes were algorithmically generated based on their connectivity and assigned a score. The score ranks networks according to how relevant they are to the input genes, not necessarily according to the quality or significance of the network. Each network indicates the molecular relationships between genes or gene products. Node color indicates up- or down-regulation, and color intensity reflects the degree of regulation. Uncolored genes were not identified as differentially expressed in our experiment; however, IPA integrates them into the computationally generated networks based on collected evidence indicating their relevance to the network.

Identification of differentially expressed genes
Significant changes in gene expression were selected according to p-values < 0.01 and absolute fold changes larger than 1.2. Differences in gene expression are shown in volcano plots, with log2 fold changes along the x-axis and statistical significance, expressed as -log10(p-value), along the y-axis. A value of 2 on the y-axis represents our cut-off of p = 0.01 (top right and left corners). Heat maps show the gene profiles resulting from unsupervised hierarchical clustering of the genes that differed significantly (p < 0.01) between patients with different risk factors.
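The selection rule and the volcano-plot coordinates described above can be written down as a short Python sketch. The names are hypothetical, and applying the fold-change cut-off symmetrically on the log2 scale (|log2 FC| > log2 1.2, roughly 0.263) is an assumption about how "absolute fold change larger than 1.2" was applied.

```python
import math

# Illustrative sketch; names are hypothetical.
P_CUTOFF = 1e-2   # p-value threshold stated in the text
FC_CUTOFF = 1.2   # fold-change threshold stated in the text


def volcano_point(log2_fc: float, p_value: float) -> tuple:
    """Coordinates of one gene in a volcano plot: (log2 FC, -log10 p)."""
    return (log2_fc, -math.log10(p_value))


def is_significant(log2_fc: float, p_value: float) -> bool:
    """ASSUMPTION: fold-change cut-off applied symmetrically on the log2
    scale, so both up- and down-regulated genes can pass."""
    return p_value < P_CUTOFF and abs(log2_fc) > math.log2(FC_CUTOFF)
```

A gene with p = 0.01 sits at a height of 2 on the y-axis, so genes passing the p-value cut-off lie above that line, and those that also pass the fold-change cut-off fall in the top left and top right regions of the plot.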

Results
Overall, 53 tumor samples were collected, and 29 samples were suitable for gene expression analysis. Two samples with suitable material were excluded because of the patients' history of asbestos exposure and thoracic radiotherapy. The 27 remaining samples included 12 patients with an exclusive history of tobacco exposure and 15 patients with an exclusive history of WSE (Fig. 1).

Differentially expressed genes in WSE compared with tobacco smoke exposure
The comparison of adenocarcinomas from patients exposed to tobacco smoke versus WSE revealed that both groups can be separated based on the differential expression of 57 genes (p < 0.01), 35 up-regulated and 22 down-regulated (Fig. 3).
Enrichment and functional analysis through biological networks was conducted using IPA software. The top functional networks were related to five biological categories: lipid metabolism; biochemistry of small molecules; transport of molecules; cell morphology; and cell function and maintenance. Table 3 lists the up- and down-regulated genes with significant changes in expression in patients with NSCLC exposed to tobacco smoke versus WSE (p < 0.01). The two categories with the greatest differences were transport of molecules and cell function and maintenance.

Discussion
Although the majority of lung cancers occur in smokers, 25 % of lung cancers worldwide occur in lifelong never smokers [28], making it the seventh largest cause of cancer-related mortality in this group [29], with wide-ranging geographic incidence and risk factors such as asbestos, air pollution, radon, arsenic compounds, cadmium, chromium, ionizing radiation and WSE [30]. Additionally, the molecular profiles observed in lung cancer differ markedly between smokers and non-smokers, particularly in genes such as EGFR, KRAS, P53 and ALK [31]. WSE has been associated with NSCLC and adenocarcinoma histology, EGFR mutations, an inverse association with KRAS mutations and a higher response to EGFR-TKIs [15]. Therefore, lung cancer associated with WSE presents unique characteristics that make it a distinctive disease entity within the group of never smokers; thus, it could be a good candidate for personalized diagnostic and therapeutic approaches. There is evidence of differential expression profiles in the bronchial epithelium of tobacco smokers that sustain carcinogenesis [32], as well as of tobacco smoke-induced transcriptional changes in oncogenes and anti-oncogenes [33]. Our study shows that the gene expression profile of samples from patients with WSE differs from that of patients with tobacco exposure.
Our group has previously reported that lung cancers related to tobacco smoke and to WSE exhibit different clinical and pathological characteristics that may be related to different mechanisms, which is reflected in the response rate and overall survival of NSCLC patients [15]. In the present report, we show a specific gene expression profile for WSE that involves 57 genes. Using biological or functional network analysis, 37 genes were identified around the UBC and GABARAPL1 genes and the PI3K/AKT and MEK/ERK signaling pathways.
The UBC hub in Network 1 (Fig. 4) is involved in cellular homeostasis and signaling. It was originally associated with the degradation of misfolded or disused proteins, but it has more recently been linked to the cell cycle, DNA repair, endocytosis, antigen processing and apoptosis [34]. Tang et al. recently demonstrated that inhibition of the ubiquitin system decreased proliferation and radio-resistance in the H1299 NSCLC cell line [35]. In this regard, a clinically relevant observation is the approval of bortezomib as an inhibitor of the protein degradation system in human cancer [32]. GABARAPL1, the hub in Network 2 (Fig. 5), is a protein that has been highly conserved throughout evolution. It is related to autophagy and intracellular vesicle transport [36]. Its participation in cancer is still not clear, but it has been reported that lower levels of this transcript correlate with decreased survival in patients with neuroblastoma [37] and increased metastasis in breast cancer [36]. On the other hand, ectopic over-expression of GABARAPL1 inhibits cancer cell proliferation and tumor growth in mice [38]. Other reports describe low expression of this gene in several cancer cell lines [39]. Regarding the last network, reports show that the PI3K/AKT and MEK/ERK signaling pathways are altered in NSCLC and that their activation is associated with malignant transformation and drug resistance (Figs. 6 and 7). MEK and PI3K inhibitors can inhibit cell proliferation in NSCLC; however, for apoptosis activation, both signaling pathways must be inhibited simultaneously [40,41], a situation that is directly related to the frequently observed EGFR-TKI resistance in this tumor.
On a clinical note, our group has previously reported the association between lung adenocarcinoma and positive EGFR mutation status in patients with a history of WSE compared to tobacco smoke exposure. WSE is also related to gene promoter methylation that synergistically increases the risk for reduced lung function in cigarette smokers [46]. A recent report describing the toxicological characteristics associated with WSE in the A549 cell line, including high levels of polycyclic aromatic hydrocarbons (PAH) and low levels of water-soluble metals, showed enhanced levels of free radicals, DNA damage and increased expression of inflammatory and oxidative stress genes [47]. There is evidence that some potential molecular targets, such as EGFR and other ErbB family receptors, are usually altered in epithelial tumors [48]. EGFR mediates cell proliferation, differentiation, survival, angiogenesis and migration, and is overexpressed in approximately 40-80 % of NSCLC tumors [49-51].
Clinically, it is known that EGFR inhibitors in NSCLC extend survival after first-line or second-line therapy in patients with EGFR mutations [52]. These mutations are more frequent in specific populations, including women, Asian and Hispanic ethnicities, never smokers and patients with adenocarcinoma histology [18,53,54]. Activating mutations in EGFR lead to constitutive tyrosine kinase activation and oncogenic transformation of lung epithelial cells [12,13]. The presence of these common activating EGFR mutations is tightly associated with sensitivity to reversible EGFR-specific tyrosine kinase inhibitors (e.g., erlotinib or gefitinib). Patients with these mutations display EGFR-TKI response rates of approximately 70 %, a median progression-free survival (PFS) of approximately 9-12 months and an overall survival that may exceed 20-32 months [55]. Most patients will experience disease progression and drug resistance attributed to the development of secondary mutations or to the presence of other uncommon EGFR mutations [56]. Therapeutically relevant relationships in NSCLC include the main oncogenic protein KRAS-GTP, which has biological significance between EGFR and the PI3K/AKT or MEK/ERK pathways [56]. The presence of KRAS mutations leads to increased signaling through the MEK/MAPK transduction pathway [56]. Rare cases of MEK mutations have been reported in NSCLC [57]. Preclinical studies in both the KPC mouse model and patient-derived xenografts have shown that blocking the MAPK pathway at MEK results in a decrease in cell proliferation and a subsequent halt in tumor growth [58]. The activation of EGFR recruits PI3K to the cell membrane, where it phosphorylates phosphatidylinositol-4,5-bisphosphate (PIP2) to phosphatidylinositol-3,4,5-trisphosphate (PIP3), which in turn activates AKT and several downstream effectors [59].
Inhibitors of both PI3K and AKT have been developed [60], although inhibition of PI3K is complicated by the fact that there are multiple isoforms of the protein [61]. Another biological interaction is the direct activation of PIK3CA by KRAS [62]. Unlike most oncogenic driver mutations in NSCLC, PI3K mutations may occur in association with EGFR or KRAS mutations [63]. Although rare, PI3K/AKT/mTOR pathway activation may occur through AKT mutations in NSCLC [64]. Clinical evidence has shown that reversible EGFR-TKIs are considered the frontline treatment for advanced NSCLC patients harboring EGFR mutations [65]. Emerging evidence suggests that the anti-tumor activity of EGFR-TKIs in resistant NSCLC cell lines can be enhanced by combined therapy with other regimens. Early efforts have shown that cetuximab produced synergistic anti-proliferative effects when used in combination with gefitinib or erlotinib [66]. Our analyses provide biological network relationships between 37 genes and PI3K/AKT and MEK signaling for understanding the biological properties of WSE as a carcinogenic factor in NSCLC. They also provide useful common pathway maps for a future understanding of the disease and the development of new therapeutic targets. Whilst the differences in gene expression patterns between WSE-related and tobacco-related lung cancer that we identified in this paper provide an important insight into the molecular basis of the clinical and biological differences between these two tumors, the small sample size is a limitation. However, this is offset by a thorough characterization of the samples, a detailed clinical history and close follow-up of all patients. Further studies are needed to validate the potential biological and clinical implications of our findings.

Conclusion
In conclusion, our results suggest that WSE-related and tobacco-related lung cancers have different gene expression profiles, which points to different carcinogenic mechanisms for the two risk factors and sheds light on the differences in clinical-pathological and mutational profiles between the two groups of patients with adenocarcinoma.