Patrícia P Couto, Luciana Bastos-Rodrigues, Hagit Schayek, Flavia M Melo, Raony G C Lisboa, Debora M Miranda, Alyne Vilhena, Allen E Bale, Eitan Friedman, Luiz De Marco, Spectrum of germline mutations in smokers and non-smokers in Brazilian non-small-cell lung cancer (NSCLC) patients, Carcinogenesis, Volume 38, Issue 11, November 2017, Pages 1112–1118, https://doi.org/10.1093/carcin/bgx089
Abstract
Lung cancer (LC) is a leading cause of cancer-related mortality. Although smoking is the major risk factor, ~15% of all cases occur in never-smokers, suggesting that genetic factors play a role in LC predisposition. Indeed, germline mutations in the TP53 gene predispose to multiple cancer types, including LC. To date, few studies have compared the somatic and germline mutational profiles of LC cases by smoking status, and none has been reported in Brazilians. Whole-exome sequencing (WES) was performed on two pools (seven smokers and six non-smokers) of tumor-derived DNA using the Illumina HiSeq2000 platform. Files from the pools were analyzed separately using Ingenuity® Variant Analysis™ and Mendel,MD. Validation of all candidate variants was performed by Sanger sequencing. Subsequently, validated mutations were analyzed in germline DNA from the same patients and in ethnically matched controls. In addition, a single recurring Brazilian TP53 germline mutation (R337H) was genotyped in 45 non-small-cell lung cancer patients.
Four novel germline variants in the ATAD2, AURKA, PTPRD and THBS1 genes were identified exclusively in smoker patients, and four germline missense variants in the PLCD1, RAD52, CP and CDC6 genes were identified solely in non-smokers. There were 4/45 (8.9%) germline carriers of the R337H TP53 mutation. In conclusion, the recurring Brazilian TP53 mutation should be genotyped in all non-small-cell lung cancer patients in Brazil, regardless of smoking status. Distinct pathogenic mutations and novel sequence variants are detected in Brazilian non-small-cell lung cancer patients, depending on smoking status. The contribution of these sequence variants to LC pathogenesis remains to be further explored.
Introduction
Lung cancer (LC), a molecularly heterogeneous disease, is the leading cause of cancer death among men worldwide. In developed countries, LC has the highest death rates among females and is the second major cause of cancer deaths in less developed countries (http://globocan.iarc.fr/Default.aspx, accessed 11 October 2016). In Brazil, 28,220 new cases of LC (4.7% of all new cancer cases) were diagnosed in 2016 (http://www1.inca.gov.br/vigilancia/, accessed 15 November 2016).
Tobacco exposure, primarily cigarette smoking, is the single most important environmental factor that underlies the vast majority of LC cases (1). Yet only ~20% of smokers develop LC and ~15% of LC patients have never smoked. In addition, environmental tobacco exposure, indoor and outdoor pollution, various carcinogens and genetic susceptibility have all been suggested as etiological factors for LC tumorigenesis in never-smokers. Combined, these factors facilitate the accumulation of genetic alterations in lung tissue and thus promote the malignant transformation process (2–4).
Epidemiological studies identified striking demographic, clinicopathological and molecular differences between LC in smokers and non-smokers, indicating that LC may develop through distinct carcinogenic pathways (2,5). When compared with tobacco-related LC, tumors from never-smokers are more frequently diagnosed in Asian women, at a younger age, with a preponderance of adenocarcinoma, and better survival outcome despite a more advanced stage at diagnosis (6).
In addition to the epidemiological and demographic differences in LC between smokers and non-smokers, the tumor (somatic) genomic landscape is markedly distinct in never-smokers compared with smokers: (i) significantly higher mutation frequencies are observed in smokers, (ii) mutation patterns differ between smokers (C:G>A:T predominant) and never-smokers (C:G>T:A predominant) and (iii) distinctive sets of somatic mutations are identified in never-smokers (EGFR mutations, ROS1 and ALK fusions) and smokers (KRAS, TP53, BRAF, JAK2, JAK3, and mismatch repair gene mutations) (2,4,7,8).
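The two substitution patterns named above can be told apart mechanically. The following is a minimal sketch, not part of the study's pipeline, that collapses a single-base change onto the pyrimidine strand (so that G>T is reported as C>A) and labels it as a transition or transversion. The function names are illustrative assumptions.

```python
# Illustrative helper (not from the paper): collapse a substitution onto the
# pyrimidine strand so that C:G>A:T and C:G>T:A each map to one label.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TRANSITIONS = {"C>T", "T>C"}  # purine<->purine or pyrimidine<->pyrimidine


def substitution_class(ref, alt):
    """Return the change as 'C>X' or 'T>X', using the pyrimidine strand."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in "AG":  # purine reference: report the opposite strand instead
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def is_transversion(ref, alt):
    """True for purine<->pyrimidine changes such as C:G>A:T."""
    return substitution_class(ref, alt) not in TRANSITIONS


if __name__ == "__main__":
    for ref, alt in [("G", "T"), ("C", "A"), ("G", "A"), ("C", "T")]:
        kind = "transversion" if is_transversion(ref, alt) else "transition"
        print(f"{ref}>{alt} -> {substitution_class(ref, alt)} ({kind})")
```

Under this convention the smoker-predominant C:G>A:T pattern is the C>A transversion class, and the never-smoker-predominant C:G>T:A pattern is the C>T transition class.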
Germline mutations in TP53 are associated with a rare inherited cancer syndrome, Li–Fraumeni syndrome (LFS, OMIM #151623). Germline TP53 mutation carriers have >90% lifetime risk for developing a variety of cancer types, including LC (9). In the South and Southeast regions of Brazil, a prevalent pathogenic missense mutation in the TP53 gene (R337H) was described, and shown to be a founder mutation (10). The R337H mutation can be detected in about 0.3% of the general Southern Brazilian population, with a higher frequency among children affected with adrenocortical cancer and choroid plexus carcinoma (11,12), LFS and LFS-like families (13), and Brazilian breast cancer cases (14). The rate of this single germline TP53 mutation in Brazilian LC cases has never been reported.
Previous studies of LC focused primarily on the identification of new driver genes somatically. In the present study, we focused on evaluating germline mutations in both the known founder TP53 gene mutation, and novel genes detected in the course of whole-exome sequencing of tumor DNA, with subsequent germline DNA analysis in Brazilian LC cases from smokers and non-smokers.
Materials and methods
Patient identification, recruitment and data collection
The study cohort encompassed patients diagnosed with lung adenocarcinoma who were eligible for surgery, with no history of chemotherapy or radiotherapy. Patients were recruited from a referral center of thoracic surgery (Hospital Julia Kubitscheck, Belo Horizonte) between 2006 and 2013. The Ethics Committee of Universidade Federal de Minas Gerais (#373-05) approved the study protocol and all participants signed a written informed consent.
Data acquisition
From each participant, relevant data including smoking habits, other environmental exposures, personal habits, family history of cancer, staging and complete clinical and imaging evolution were obtained from the medical charts. Pathological data were confirmed by histopathological reports. Tumor-related data were collected from the patients’ files in the Pathology Unit and the clinical course and response to chemotherapy and radiotherapy were all collected from the patients’ files at the Oncology department.
Genetic analyses
DNA extraction was performed using the Qiagen DNeasy kit (Sao Paulo, Brazil) following the manufacturer’s recommended protocol.
Targeted genetic analysis of candidate gene mutations
The R337H TP53 missense mutation was genotyped using a PCR restriction enzyme digest as follows: a 237 bp DNA fragment encompassing exon 10 of TP53 was amplified with primers TGCATGTTGCTTTTGTACCGTC (forward) and GGAAGGGGCTGAGGTCACTC (reverse); thereafter the amplified germline DNA was digested with the restriction endonuclease HhaI, which cleaves the wild-type TP53 sequence but not the R337H mutant. Digestion products were separated on a 3% agarose gel and all abnormally digested fragments were confirmed by Sanger sequencing of an independent PCR product.
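The logic of the restriction digest can be sketched in a few lines. The example below is a toy illustration only: the two 20-base sequences are invented and are not the real 237 bp amplicon, and the function name is an assumption. It relies on the fact that HhaI recognizes GCGC and cuts after the third base (GCG^C), so an allele that has lost the site stays uncut.

```python
# Toy model of a PCR-RFLP readout. The sequences below are made up for
# illustration and do NOT correspond to the TP53 exon 10 amplicon.
HHAI_SITE = "GCGC"  # HhaI recognition sequence
CUT_OFFSET = 3      # HhaI cuts after the third base: GCG^C


def digest(sequence, site=HHAI_SITE, cut_offset=CUT_OFFSET):
    """Return the fragment lengths produced by cutting at every site."""
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)  # allows overlapping sites
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical alleles: the "mutant" differs by one G>A change inside the site.
toy_wild_type = "ATATATATGCGCATATATAT"
toy_mutant = "ATATATATGCACATATATAT"

if __name__ == "__main__":
    print("wild type fragments:", digest(toy_wild_type))
    print("mutant fragments:   ", digest(toy_mutant))
```

On a gel, a heterozygous carrier would be expected to show both the uncut band and the digested fragments, which is why abnormal digestion patterns were then confirmed by sequencing.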
Genotyping of germline mutations in the CHEK2 (p.Ile157Thr; rs17879961) and BRCA2 (p.Lys3326X; rs11571833) genes, mutations that have also been implicated in LC predisposition (15), was performed by bidirectional sequencing.
Whole-exome sequencing (WES)
Exome sequencing was performed on two pools. The first pool included DNA from tumors of seven smoker patients (patients 1–7 in Table 1) and the second pool encompassed combined tumor DNA samples from six non-smoker patients (patients 8–13 in Table 1). Pools were made of DNA extracted from consecutive unselected patients who presented at the Thoracic Clinic and did not have a previous history of chemotherapy or radiotherapy. These cases were selected based on the order of recruitment, with the first recruited cases selected for this phase. In addition, for smokers, we selected the youngest consenting eligible patients. Two micrograms of DNA from each patient were used in each pool as required by Illumina’s protocol (Illumina Inc, San Diego, CA). The two pools of samples were then subjected to whole-exome capturing and sequencing using the Roche NimbleGen V2 chip (Madison, WI) and the Illumina HiSeq2000 sequencing platform (Hayward, CA).
Number | Age, years | Gender | Smoking history | Family history | Tumor staging
---|---|---|---|---|---
1 | 44 | Female | Smoker | | T2AN0M0 IB
2 | 42 | Female | Smoker | Na | T2AN1M0
3 | 53 | Female | Smoker | Mother with lung cancer, Brother with lung cancer, Aunt with throat cancer, Cousin with breast cancer | T3N2M0
4 | 47 | Female | Smoker | Cousin with breast cancer, Uncle with bone cancer | T1BN0M0
5 | 56 | Male | Smoker | Na | T3N2M0
6 | 41 | Male | Smoker | Na | T3N0M0
7 | 57 | Male | Smoker | Na | T2AN2M0 IIIA
8 | 51 | Male | Non-smoker | | T3N0M1 IV
9 | 63 | Male | Non-smoker | Na | T3N2M0 IIIA
10 | 51 | Male | Non-smoker | Na | Na
11 | 74 | Female | Non-smoker | Brother with lung cancer | T3N0M0 IIB
12 | 78 | Female | Non-smoker | Brother with brain cancer, Nephew with prostate cancer | T2AN0M0 IB
13 | 58 | Female | Non-smoker | Na | T2BN0M0 IIA
Na, not available.
Variant calling and annotation
For each sequenced pool, raw sequence files were prepared using the Genome Analysis Toolkit (GATK). Each fastq file was aligned against the human hg19/GRCh37 reference genome. PCR duplicates were removed using Picard (http://picard.sourceforge.net/), reads around known and detected indels were realigned and base quality was recalibrated using GATK. To call variants from the processed BAM (Binary Alignment/Map) files, a variant calling pipeline from GATK was applied.
All VCF files generated from the pools were analyzed separately using two different tools. The first was the Ingenuity® Variant Analysis™ software from Ingenuity Systems, available at www.ingenuity.com/variants. The second was Mendel,MD, developed by the Clinical Genomic Laboratory of Universidade Federal de Minas Gerais (16).
Only variants with a call quality of at least 40 and a read depth of at least 20 were considered for subsequent analysis. Variants with a high allele frequency (≥0.05% in Mendel,MD and ≥1.0% in Ingenuity Variant Analysis) in the 1000 Genomes project (http://www.1000genomes.org/), in the public Complete Genomics data (http://www.completegenomics.com/) or in the NHLBI ESP exomes (http://evs.gs.washington.edu/EVS/) were also excluded from further analysis.
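The quality, depth and frequency cutoffs above amount to a simple filter. The sketch below is an illustration under stated assumptions rather than the code used by either tool: the `Variant` record, its field names and the tool keys are hypothetical, and allele frequencies are expressed as fractions (0.05% = 0.0005, 1.0% = 0.01).

```python
from dataclasses import dataclass

# Thresholds quoted in the text; everything else here is a hypothetical layout.
MIN_CALL_QUALITY = 40
MIN_READ_DEPTH = 20
MAX_POPULATION_AF = {"mendelmd": 0.0005, "ingenuity": 0.01}  # 0.05% and 1.0%


@dataclass
class Variant:
    gene: str
    call_quality: float
    read_depth: int
    # Frequencies in 1000 Genomes, Complete Genomics and NHLBI ESP (fractions).
    population_afs: tuple = ()


def passes_filters(variant, tool):
    """True if the variant survives the quality, depth and frequency cutoffs."""
    if variant.call_quality < MIN_CALL_QUALITY:
        return False
    if variant.read_depth < MIN_READ_DEPTH:
        return False
    highest_af = max(variant.population_afs, default=0.0)
    # Variants at or above the tool-specific frequency ceiling are excluded.
    return highest_af < MAX_POPULATION_AF[tool]


if __name__ == "__main__":
    example = Variant("GENE_A", call_quality=55, read_depth=30,
                      population_afs=(0.0, 0.002, 0.0))
    for tool in MAX_POPULATION_AF:
        print(tool, passes_filters(example, tool))
```

The example variant illustrates why the two tools can disagree: a population frequency of 0.2% is below the 1.0% ceiling used for Ingenuity but above the 0.05% ceiling used for Mendel,MD.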
Ingenuity® Variant Analysis™ software
In addition to the above-specified variant selection criteria, we further selected variants that met at least one of the following conditions: experimentally reported to be associated with a pathogenic or possibly pathogenic phenotype; established gain of function, gene fusion or inferred activating mutation according to Ingenuity®; predicted gain of function by SIFT; located in a microRNA-binding site; or a frameshift, in-frame indel, stop codon change or missense change, unless predicted to be innocuous by SIFT or PolyPhen-2 (17).
Variants were filtered considering their biological context and were selected according to cancer-associated mouse knockout phenotypes, cellular processes, pathways, therapeutic targets, COSMIC or TCGA. Subsequently, variants unique to smoker LC patients were compared with those exclusive to non-smoker LC patients.
Mendel,MD
The confidence and frequency criteria described above were applied in the Mendel,MD analysis. For each of the selected samples, only variants predicted to be pathogenic by PROVEAN (18), SIFT, PolyPhen and CADD (17) were selected and grouped as exclusive to smokers or to non-smokers.
Gene annotation
The results obtained from Ingenuity® Variant Analysis™ and Mendel,MD were compared and only variants detected by both analysis tools were considered for subsequent analysis, and a final combined variant list was generated for each pool.
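Combining the two tools' outputs and separating the pools comes down to set operations. The following minimal sketch uses placeholder identifiers (`GENE_A` and so on) rather than real calls, and the function names are assumptions: candidates are kept only if both tools report them, and a candidate is exclusive to one pool if it is absent from the other.

```python
# Placeholder identifiers only; the real inputs would be the per-pool
# candidate lists exported from each tool.
def shared_candidates(ingenuity_hits, mendelmd_hits):
    """Keep only candidates reported by both analysis tools."""
    return set(ingenuity_hits) & set(mendelmd_hits)


def exclusive_to(pool, other_pool):
    """Candidates present in one pool and absent from the other."""
    return set(pool) - set(other_pool)


if __name__ == "__main__":
    smokers = shared_candidates(
        ingenuity_hits=["GENE_A", "GENE_B", "GENE_C"],
        mendelmd_hits=["GENE_B", "GENE_C", "GENE_D"],
    )
    non_smokers = shared_candidates(
        ingenuity_hits=["GENE_C", "GENE_E"],
        mendelmd_hits=["GENE_C", "GENE_E", "GENE_F"],
    )
    print("smokers only:    ", sorted(exclusive_to(smokers, non_smokers)))
    print("non-smokers only:", sorted(exclusive_to(non_smokers, smokers)))
```

Requiring agreement between tools trades sensitivity for specificity, which is consistent with the much shorter combined lists reported for each pool than for either tool alone.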
Pathway information was gathered from the Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg/) and Reactome, a curated pathway database (http://www.reactome.org/, accessed 14 June 2016); gene biological process properties were gathered from the Gene Ontology database; gene–gene and protein–protein interactions were gathered from the BioGRID interaction database (http://thebiogrid.org/, accessed 24 May 2017) and the STRING protein functional interactions database (http://string-db.org/, accessed 24 May 2017); and data about somatic mutations in cancer were obtained from the COSMIC database (http://cancer.sanger.ac.uk/cosmic, accessed 7 June 2017). In addition, mouse models for each gene were gathered from KOMP (https://www.mousephenotype.org/about-ikmc/about-komp, accessed 15 June 2017), Mouse Genome Informatics (MGI; http://www.informatics.jax.org/, accessed 15 June 2017) and HomoloGene (http://www.ncbi.nlm.nih.gov/homologene, accessed 7 June 2017). Expression of each protein in lung tissue was analyzed through GeneCards (http://www.genecards.org/, accessed 7 June 2017). Following the aforementioned steps of variant and gene annotation and classification, data were manually reviewed to generate a final candidate gene table summarizing any supporting evidence for their putative involvement in lung adenocarcinoma cases.
Validation of the sequencing data
Initial validation of the candidate gene variants found in both tumor pool exomes was carried out using Sanger sequencing of PCR products, with primers flanking the mutation sites (available on request). Primers were designed using the PrimerBlast software, available at http://www.ncbi.nlm.nih.gov/tools/primer-blast/. All PCR products were gel-verified and purified using the PCRLink™ Quick PCR Purification Kit (Life Technologies, Carlsbad, CA), then submitted to a sequencing reaction with the ABI BigDye Terminator Cycle Sequencing Kit v3.1 on an ABI PRISM 3730XL Genetic Analyzer (Applied Biosystems, Foster City, CA).
Subsequently, the validated mutations were also analyzed in all 45 germline DNA samples. In addition, DNA extracted from the blood of 95 ethnically matched elderly controls (≥65 years of age) without any discernible personal or family history of benign or malignant tumor was tested for these sequence variants, to determine whether they represent polymorphisms.
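As a rough guide to what a control panel of this size can and cannot rule out, the sketch below computes the largest carrier frequency that would still be compatible, at a chosen confidence level, with observing no carriers among n people. This is a generic binomial calculation added for illustration, not an analysis reported in the study; the function name is an assumption.

```python
def upper_bound_zero_observed(n, confidence=0.95):
    """Largest carrier frequency p for which seeing 0 carriers among n
    independent individuals still has probability >= 1 - confidence.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of individuals")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


if __name__ == "__main__":
    bound = upper_bound_zero_observed(95)
    print(f"95 controls, 0 carriers: carrier frequency below ~{bound:.1%}")
```

For 95 controls the bound is close to the familiar "rule of three" approximation (3/n), so absence from such a panel argues against a common polymorphism but cannot exclude a rare one.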
Results
Patient characteristics: TP53 subset analysis
Overall, there were 45 non-small-cell lung cancer (NSCLC) cases; 29 (64.4%) were men and 37 (82%) were current or past smokers. Mean age at diagnosis was 59.5 ± 8.7 years (mean ± SD) and six (13.3%) had a positive family history of cancer in first- or second-degree relatives. The most common histological subtype was adenocarcinoma, diagnosed in 36 patients.
TP53, BRCA2 and CHEK2 germline genotyping data
There were four carriers of the R337H TP53 mutation (8.9%). Of these, three had a positive family history of cancer, three were smokers and none had an adverse effect of radiotherapy (Table 2). Genotyping for the germline mutations in CHEK2 (p.Ile157Thr; rs17879961) and BRCA2 (p.Lys3326X; rs11571833) revealed no mutations in any of the samples.
Patient No. | Gender | Age at diagnosis, years | Histology | Smoker | Family history
---|---|---|---|---|---
1 | Female | 78 | Adenocarcinoma | No | FDR, ovary carcinoma, SDR, prostate carcinoma
2 | Female | 64 | Adenocarcinoma | Yes | FDR, prostate carcinoma
3 | Female | 42 | Adenocarcinoma | Yes | None
4 | Male | 46 | Adenocarcinoma | Yes | FDR, unspecified CNS tumor
CNS, central nervous system; FDR, first-degree relative; SDR, second-degree relative.
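Because the carrier rate rests on four carriers among 45 patients, it helps to see how wide the uncertainty around 8.9% is. The sketch below computes a Wilson score interval for a proportion. It is an illustrative calculation that does not appear in the paper, and the choice of the Wilson method (rather than, say, an exact binomial interval) is an assumption.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    carriers, patients = 4, 45
    low, high = wilson_interval(carriers, patients)
    print(f"carrier rate: {carriers / patients:.1%}")
    print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

With these counts the interval spans roughly 3.5% to 21%, which is wide and underlines why the rate needs confirmation in larger cohorts.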
Patient characteristics: WES subset
Relevant clinical and pathological data of all lung adenocarcinoma cases (n = 13) whose DNA was subjected to WES are shown in Table 1. This study population comprised seven women and six men, with a mean age at diagnosis of 63.7 ± 9.15 years (range 41–78 years), all without a past personal history of cancer; only four had any discernible family history of cancer. Seven patients (53.8%) were smokers (current or former smokers) and the other six (46.2%) were classified as never-smokers.
WES from smokers DNA
Variant calling from WES of smoker patients resulted in 51,922 single-nucleotide variations with a coverage of at least 8×. The mean base call quality was 1015 and the average read depth was 110.
Following analysis by Ingenuity® Variant Analysis™, a total of 105 genes with mutations predicted to have a potentially damaging effect were identified (Supplementary Table 1 is available at Carcinogenesis Online). The same analysis steps led to a list of 428 genes using Mendel,MD (Supplementary Table 2 is available at Carcinogenesis Online). Combining the two lists and focusing only on variants that emerged as possible candidates by both analyses, a list of 51 candidate genes was generated (Figure 1).
Finally, these 51 genes were reviewed for any evidence supporting their putative involvement in lung carcinogenesis (see Materials and methods). Following this selection step, a total of 18 genes were defined (Supplementary Table 3 is available at Carcinogenesis Online), of which only 11 genes (AP2A1, ATAD2, AURKA, CACNA1S, EPHA3, MCM7, PARG, PDGFRA, PTPRD, TGM2 and THBS1) were validated via Sanger sequencing. For these 11 validated somatic sequence variants, genotyping of germline DNA was carried out and four genes (ATAD2, THBS1, PTPRD and AURKA) displayed the same sequence variants in the germline DNA (Table 3). These dually validated (somatic and germline) gene variants were genotyped in 95 healthy ethnically matched controls. None of the controls harbored any of these sequence variants.
Gene | Chromosome | AA change | Codon change | SIFT | PolyPhen-2 | CADD | Provean
---|---|---|---|---|---|---|---
ATAD2 | 8 | p.R195H | cGt/cAt | 0.01 | 0.999 | 4.73 | −2.90
AURKA | 20 | p.K14* | Aag/Tag | Nd | Nd | 8.46 | Nd
PTPRD | 9 | p.C1428G | Tgc/Ggc | Nd | 0.992 | 4.123 | −11.02
THBS1 | 15 | p.D899G | gAc/gGc | 0.0 | 0.999 | 4.00 | −6.34
Cutoff: Provean (−2.5), SIFT (0.05) and CADD (20). Nd, no data; SNV, single-nucleotide variation.
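The "Codon change" and "AA change" columns can be cross-checked against the standard genetic code. The sketch below is an illustrative consistency check, not part of the study's pipeline. It builds the standard codon table and translates each reference/variant codon pair, where the capital letter in entries such as cGt/cAt marks the changed base. The dictionary layout and function name are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_change_to_aa(codon_change):
    """Translate 'cGt/cAt' into ('R', 'H'); '*' denotes a stop codon."""
    ref_codon, alt_codon = codon_change.upper().split("/")
    return CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


# Codon and amino acid changes as listed for the variants found in smokers.
SMOKER_VARIANTS = {
    "ATAD2": ("cGt/cAt", "p.R195H"),
    "AURKA": ("Aag/Tag", "p.K14*"),
    "PTPRD": ("Tgc/Ggc", "p.C1428G"),
    "THBS1": ("gAc/gGc", "p.D899G"),
}

if __name__ == "__main__":
    for gene, (codons, aa_change) in SMOKER_VARIANTS.items():
        ref_aa, alt_aa = codon_change_to_aa(codons)
        # p.R195H -> first letter after 'p.' is the reference residue,
        # last character is the variant residue.
        consistent = aa_change[2] == ref_aa and aa_change[-1] == alt_aa
        print(f"{gene}: {codons} -> {ref_aa}>{alt_aa} "
              f"({'matches' if consistent else 'differs from'} {aa_change})")
```

A check of this kind catches transcription slips between columns. It says nothing about pathogenicity, which depends on the prediction scores and on functional evidence.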
WES from non-smokers DNA
Variant calling from WES of tumor samples from non-smokers resulted in a total of 51,011 variants in 13,707 genes with a coverage of at least 8×. The mean base call quality was 993 and the average read depth was 111.
Following the analysis by Ingenuity® Variant Analysis™, a total of 72 genes with a potentially damaging effect were found (Supplementary Table 4 is available at Carcinogenesis Online). The same analysis steps led to a list of 310 genes using Mendel,MD (Supplementary Table 5 is available at Carcinogenesis Online). Combining the two lists resulted in 32 candidate genes shared by both analysis pipelines (Figure 1). Of these 32 genes, supportive evidence for involvement in LC pathogenesis was found for 10 genes: BARD1, CDC6, CP, DGKH, KIF13B, NEO1, NOTCH2, PICALM, PLCD1 and RAD52. All variants found in these selected genes were validated via Sanger sequencing. For these validated somatic sequence variants, genotyping was carried out in germline DNA and four genes (PLCD1, RAD52, CP and CDC6) displayed the same sequence variants (Table 4). These four dually validated (somatic and germline) gene variants were genotyped in 95 healthy age- and ethnically matched controls. None of the controls harbored any of the sequence variants.
Gene | Chromosome | AA change | Codon change | SIFT | PolyPhen-2 | CADD | Provean
---|---|---|---|---|---|---|---
PLCD1 | 3 | p.A349T | Gcc/Acc | 0.04 | 0.996 | 4.853 | −3.05
RAD52 | 12 | p.E53K | Gag/Aag | 0.02 | 0.995 | 5.104 | −3.78
CP | 3 | p.G895A | gGc/gCc | 0.02 | 1.0 | 4.700 | −5.52
CDC6 | 17 | p.D295N | Gat/Aat | 0.1 | 0.831 | 5.05 | −2.56
Cutoff: Provean (−2.5), SIFT (0.05) and CADD (20). Nd, no data; SNV, single-nucleotide variation.
Finally, we genotyped the identified variants in all 45 LC samples described herein. The mutations were found at the following frequencies: ATAD2 and THBS1 (2/45); AURKA, PTPRD, CP and PLCD1 (1/45); and RAD52 and CDC6 (3/45). None were concomitant. No additional mutations were found in exons 4–10 of the TP53 gene.
Discussion
The current study shows that 8.9% of unselected Brazilian LC cases carry the predominant founder R337H TP53 mutation. The carrier rate from this preliminary study should be validated and extended in other cohort studies focusing on Brazilian LC cases from other parts of the country. If these rates are consistently high, it may imply that all cases of LC diagnosed in Brazil should be screened for this specific genetic variant. None of the R337H TP53 mutation carriers in this study had a family history that fulfills the criteria for LFS (19). The absence of a significant family history was also noted by other investigators who evaluated the rate of this mutation in consecutive breast and pediatric tumor cases (20). Thus, using family history as a major criterion for offering genetic testing for this specific mutation is too strict in the Brazilian population.
It has been established that smoking plays a significant role in the initiation and progression of lung adenocarcinoma (2). Patients with a smoking history harbor 10-fold more somatic point mutations than never-smokers (7,20). The vast majority of previous studies that focused on somatic mutations suggested that the spectrum of mutated genes in smokers is largely distinct from that in never-smokers (7,21). Yet, to the best of our knowledge, no comparison of the germline mutational spectra in LC cases by smoking status has been published.
Recently, an abstract (22) presented at the ASHG 2016 meeting showed an association of SHPRH (also known as RAD5; a DNA repair protein) with DNA from never-smokers. However, variants in this gene were not detected in our WES study participants, and further studies are required to establish its function in LC pathogenesis in non-smokers in ethnically diverse populations.
In the present study, four novel, seemingly pathogenic, germline variants in the AURKA, PTPRD, ATAD2 and THBS1 genes were identified in smokers with NSCLC, and four distinct germline missense variants in PLCD1, RAD52, CP and CDC6 genes were only detected in non-smokers with NSCLC.
Aurora kinase A (AURKA) is a cell cycle-regulated kinase and a key regulator of mitotic events (23). Dysfunction of Aurora kinases can cause polyploidy and chromosomal instability, known contributors to lung tumorigenesis (24). AURKA has been implicated in tumorigenesis because it is overexpressed in various epithelial malignancies, such as gastrointestinal tumors (25), in addition to LC (24). Moreover, AURKA and AURKB have been shown to be KRAS targets in LC (26).
Protein tyrosine phosphatase receptor type D (PTPRD) is a member of a family of receptor protein tyrosine phosphatases (PTPs). It seems to be a plausible LC predisposition gene based on several lines of evidence: the PTPRD gene localizes to chromosome 9p, a region displaying the loss of heterozygosity in LC and other malignancies (e.g. neuroblastoma) (27); the PTPRD gene reportedly harbors somatic inactivating mutations in LC as well as head and neck squamous cell carcinomas, melanomas and glioblastomas (28,29).
ATPase family AAA domain-containing 2 (ATAD2) is a member of a family of genes encoding proteins that contain both a bromodomain and an ATPase domain. This gene maps to chromosome 8q24, a region commonly amplified in many cancer types, including LC (30). ATAD2 is a cofactor for MYC, a nuclear phosphoprotein that plays a role in cell cycle progression, apoptosis and cellular transformation. Furthermore, Fouret et al. (31) provided evidence suggesting that amplified ATAD2 is the main driver of the MYC contribution to uncontrolled cell proliferation in lung adenocarcinoma.
Thrombospondin 1 (THBS1 or TSP1) is a natural inhibitor of neovascularization and tumorigenesis in healthy tissue, generally acting to suppress tumor development (32). It has been suggested that TSP1 is upregulated after oncogenic Ras activation in a p53-dependent manner, which then positively feeds back into the Ras-MAPK signaling pathway (33).
These indirect lines of evidence tentatively support our findings that pathogenic inactivating variants in the AURKA, PTPRD, ATAD2 and THBS1 genes in smokers are predisposing factors associated with LC susceptibility. Yet, these data should be interpreted very cautiously, given the lack of functional analyses and the small sample size.
The RAD52 gene encodes a major protein for DNA double-strand break repair that binds to single-stranded DNA ends and mediates the DNA–DNA interaction, together with the RAD51 recombination protein, for the annealing of complementary strands of DNA bound by replication protein A (34,35). Genome-wide association studies have implicated the 12p13.33 locus that encompasses the RAD52 gene, to be associated with the increased risk of LC, mostly with squamous cell carcinoma and small cell LCs (36,37). Tumor expression levels are increased for the RAD52 transcript and protein and somatic gains of the 12p locus have been reported in squamous cell LC but not adenocarcinomas (38), a finding consistent with LC genetic susceptibility (36,37). In the present study, a pathogenic germline mutation in the RAD52 gene was detected in lung adenocarcinoma in non-smoker patients, suggesting that RAD52 activity can be compromised regardless of exposure to tobacco.
The phosphoinositide-specific phospholipase C (PLCD1) gene is a member of a key enzymatic family involved in intracellular signal transduction (39). Multiple PLCs have been implicated in cancer pathogenesis by affecting cytoskeletal GTPase-mediated regulatory pathways (39). The PLCD1 gene has been identified as a novel tumor suppressor gene in esophageal squamous cell cancer and breast cancer (40,41). Only one previous study reported somatic mutations in the PLCD1 gene in lung adenocarcinomas (21).
The Cell Division Cycle 6 (CDC6) gene functions as a regulator at the early steps of DNA replication (42). Karakaidos et al. (43) studied 32 NSCLC adenocarcinomas and showed a synergistic effect in tumorigenesis of hCDT1-hCDC6 overexpression and mutant p53. More recently, Allera-Moreau et al. (44) demonstrated that overexpression of the CDC6 gene is associated with an unfavorable prognosis in NSCLC. These authors proposed that the upregulation of the CDC6 gene helps cancer cells tolerate spontaneous replication stress.
Ceruloplasmin (Ferroxidase) (CP) is a metalloprotein that binds most of the copper in plasma and is involved in the peroxidation of Fe(II) transferrin to Fe(III) transferrin. Excess copper (Cu) is potentially hazardous to human health, primarily by producing free radicals (45). Cross-sectional and case–control data have shown that tumor cells contain relatively high concentrations of Cu (46). As additional evidence for the putative involvement of the CP gene and oxidative stress in LC pathogenesis, Pignatelli et al. (47) studied 52 LC patients and demonstrated that cigarette smoking increased oxidative stress, contributing to DNA damage and promoting chronic lung injury. However, the mechanisms of oxidative stress in non-smokers may be different. Inactivating mutations in the CP gene cause aceruloplasminemia, which, in turn, leads to iron accumulation and secondary tissue damage and could contribute to lung tumorigenesis.
Defining the distinct germline genetic alterations in LC in never-smokers compared with smokers is of critical importance for prevention strategies, diagnostic applications and therapeutic choices. The present study provides preliminary evidence that germline variants in the ATAD2, AURKA, PTPRD and THBS1 genes might contribute to NSCLC tumorigenesis in Brazilian smokers, and that four additional germline missense variants in PLCD1, RAD52, CP and CDC6 may contribute to NSCLC pathogenesis in Brazilian non-smoker patients.
Of the variants described herein, all but PARG and TGM2 mutations have been reported previously as somatic mutations in LC and/or other types of cancer (https://esp.gs.washington.edu/drupal/; https://www.ncbi.nlm.nih.gov/clinvar/; http://cancer.sanger.ac.uk/cosmic; https://cancergenome.nih.gov/; http://www.genecards.org/, all accessed on 7 June 2017). Alexandrov et al. (3) reported the results of somatic genotyping of 871 LC samples from smoker patients and 130 samples from non-smokers. These authors demonstrated one signature (signature 4) hallmarked mainly by C>A alterations that in all likelihood represent the direct mutational consequence of DNA damage induced by tobacco carcinogens. Notably, none of these alterations were detected in the present study. In addition, data reported by the Cancer Genome Atlas Research Network (48) showed somatic mutational spectra different from those reported herein and by Alexandrov et al. (3). As the current study’s main focus is germline and not somatic mutations, it seems plausible that the exact genes underlying LC susceptibility are different in ethnically diverse populations. This is also consistent with the notion of multiple pathways that eventually lead to the LC phenotype. Clearly, expansion and validation of these preliminary results in Brazilian and non-Brazilian cases are needed before firm conclusions can be drawn.
Funding
This work was partially funded by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq no. 405053/2013–4) and Fundação de Amparo à Pesquisa de Minas Gerais (FAPEMIG no. APQ-00220-14) (principal investigator Dr. De Marco), Brasil.
Acknowledgements
We are grateful to all patients.
Conflict of Interest Statement: None declared.
References