Association of TERT, OGG1, and CHRNA5 Polymorphisms and the Predisposition to Lung Cancer in Eastern Algeria

Lung cancer remains the most common cancer in the world. The genetic polymorphisms rs2853669 in TERT, rs1052133 in OGG1, and rs16969968 in CHRNA5 have been reported to be strongly associated with lung cancer risk. Our study aims to determine whether these polymorphisms predispose the Eastern Algerian population to non-small-cell lung cancer (NSCLC); to date, no study has examined this association in the Algerian population. The study included 211 healthy individuals and 144 NSCLC cases. Genotyping was performed using TaqMan probes and Sanger sequencing, and the data were analyzed using multivariable logistic regression adjusted for covariates. The minor allele frequencies (MAFs) of TERT rs2853669, CHRNA5 rs16969968, and OGG1 rs1052133 in controls were 20% (C), 31% (A), and 29% (G), respectively. None of the three polymorphisms showed a significant overall association, but in stratified analysis, carriers of the CHRNA5 rs16969968 AA genotype had a significantly increased risk of adenocarcinoma (pAdj = 0.03, ORAdj = 2.55). Smokers with the AA genotype had a higher risk of lung cancer than smokers with the GG or GA genotype (pAdj = 0.03, ORAdj = 3.91), whereas this was not the case in nonsmokers. Our study suggests that the CHRNA5 rs16969968 polymorphism is associated with a significant increase in lung adenocarcinoma risk and with nicotine addiction.


Introduction
Lung cancer (LC) remains the most common cancer in the world, both in terms of new cases (2 million new cases, 11.6% of all cancers) and deaths each year (1.7 million deaths, 18.4%) [1]. In Algeria, with a population of some 42 million, about 4000 new LC cases are diagnosed every year; LC is the leading cause of cancer mortality in men and the seventh in women [1,2]. Cigarette smoking accounts for the largest part of this burden: over 80% of LC deaths are tobacco related, yet only a small fraction of smokers (usually <20%) develop the disease [3,4]. In addition, some lifelong nonsmokers die from LC, suggesting that LC is a multifactorial disease resulting from complex interactions between genetic and environmental factors [5]. Molecular studies are therefore needed to understand the susceptibility factors involved in LC development and to identify causal variants that could be useful for screening, early diagnosis, and therapy. Independent genome-wide association studies (GWAS), considered the gold standard for reporting genotype-phenotype associations, and several other studies have searched for single-nucleotide polymorphisms (SNPs) that predispose to LC [6][7][8]. Among the genes that could modify LC risk are TERT, CHRNA5, and OGG1, which are the focus of this study.
Telomerase reverse transcriptase (TERT), encoded by the TERT gene, is one of the major components of telomerase; it adds the repeat sequence 5′-TTAGGG to telomere ends [9]. The TERT promoter region, which contains the c.-124C>T and c.-146C>T sites, is considered a regulatory element of telomerase activity [10]. The rs2853669 variant, located 245 bp upstream of the TERT translation start site within an Ets2 binding site, prevents Ets/TCF binding and has been associated with lower TERT expression and decreased telomerase activity [11]. A growing number of epidemiological studies have examined the association between this polymorphism and cancer risk, particularly for breast cancer and LC; however, the results were inconclusive [12].
The OGG1 gene is located on chromosome 3p26.2. Its product is a DNA repair enzyme with both DNA glycosylase and AP lyase (apurinic/apyrimidinic site lyase) activities that removes 8-hydroxyguanine (8-oxoG) lesions produced by reactive oxygen species [13]. The rs1052133 polymorphism of the OGG1 gene is a substitution at codon 326 (Ser326Cys) located in exon 7 [14]. In a complementation assay using an Escherichia coli mutant strain deficient in 8-oxoG repair, the Cys326 protein showed about 7 times weaker 8-hydroxyguanine repair capacity than the Ser326 protein [15]. It has been suggested that rs1052133 is associated with LC risk, and several studies have investigated this association [16][17][18].
CHRNA genes encode nicotinic acetylcholine receptor subunits, which bind nicotine and its metabolites [19]. In a large GWAS, Hung et al. found a strong association between LC risk and the CHRNA5 rs16969968 SNP, which results in the substitution of asparagine for aspartic acid at the highly conserved codon 398 (D398N) of the CHRNA5 protein [7]. In vitro studies indicate that rs16969968 decreases CHRNA5 function and may favor nicotine addiction [20]. Numerous studies have tried to determine whether this association is direct or whether rs16969968 is simply a proxy for increased exposure to tobacco carcinogens, but their results were inconclusive [7,21-26].
In our study, we focused on SNPs in three genes selected on the basis of GWAS and meta-analyses of multiple populations: TERT (promoter region: rs2853669, c.-124C>T, and c.-146C>T), OGG1 (rs1052133), and CHRNA5 (rs16969968). Our aim is to investigate how these SNPs modify LC risk in the Eastern Algerian population. To the best of our knowledge, our population has not been the subject of any molecular study of LC to date; like any pioneering work, this study therefore had to deal with a relative scarcity of data.
Although this study was conducted in various population centers of Eastern Algeria that encompass several ethnic groups (Arabs and Central Algerian Berbers), population mixing, which accelerated after Algeria's independence, together with the standardization of lifestyles and the rural exodus, has largely homogenized the genetic stock [27]. Furthermore, the complex processes of admixture and isolation at work have made the genetic heterogeneity found in Algeria only weakly correlated with geography or culture. It would thus be improper to assume that the various population groups are genetically isolated and free from gene flow [27,28].

Study Population.
We performed a case-control study of 144 LC patients and 211 healthy controls. The patients were first diagnosed with NSCLC between June 2015 and July 2016 and were recruited over a wide area spanning five Wilayas (territorial units) of Eastern Algeria, from public university hospitals (Constantine, Sétif, Batna, and Annaba) and private clinics. No restriction was placed on gender, histologic subtype, or stage; only patients who refused to participate were excluded. Diagnosis was based on imaging and on cytological and histopathological examinations. As explained above, we are confident that our samples are largely homogeneous. Moreover, medical treatment is provided free of charge to all, so city dwellers and rural residents are equally represented in the patient samples, which further supports the genetic homogeneity of the samples.
The control group comprised individuals with no history of cancer or lung disease, to avoid possible interference from overlapping genetic factors. They were recruited from volunteers at blood transfusion centers during the same period as the patients, without restriction of age or gender. All subjects were Algerians residing in Eastern Algeria.

Pulmonary Medicine
We administered a standard epidemiological questionnaire to patients and controls and used medical files to collect personal data, including residential region, age, gender, profession, smoking status, symptoms, family history of cancer, histologic subtype, and TNM stage.

Ethics Statement.
Our research was approved by the local Ethics Committee. The use of human blood samples and the study protocol strictly conformed to the principles of the Declaration of Helsinki, and informed consent was obtained from all participants or from their family members.
DNA Extraction and Genotyping.
Genomic DNA was extracted from 6 to 8 mL of peripheral venous blood collected from each participant into tubes containing ethylenediaminetetraacetic acid (EDTA), using a standard salting-out protocol. DNA concentration was measured with a NanoDrop One spectrophotometer. rs16969968 and rs1052133 were genotyped by TaqMan real-time PCR on a ViiA 7 system (Applied Biosystems) with TaqMan® SNP Genotyping Assays (Applied Biosystems), following the manufacturer's standard protocol.
The TERT promoter was amplified by PCR with the following primers: forward 5′-ACCGTTCAGTTAGCGATTTCCCACGTGCGCAGAGGAC-3′ and reverse 5′-CGGATAGCAAGCTCGTCTCCCAGTGGATTCGCGGGC-3′. The amplicons were then genotyped by Sanger sequencing in both the forward and reverse directions using BigDye Terminator chemistry V.I on an automated fluorescent 48-capillary array sequencer (Applied Biosystems 3730). Data were managed and analyzed with SeqScape. TERT sequencing could be performed for only 126 cases and 94 controls.

Statistical Analysis.
To estimate the sample size, we used functions from the R package "epiCalc" (Version 2.9.0.1) with minor allele frequency data obtained from NCBI (http://www.ncbi.nlm.nih.gov/SNP/) and a study power fixed at 80%. Hardy-Weinberg equilibrium of the genotype distributions was evaluated in the control subjects using a function from the R package "HardyWeinberg" (Version 1.6.1).
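The sample-size step can be illustrated with the standard normal-approximation formula for comparing two proportions. The sketch below is a minimal Python version under stated assumptions: it is not epiCalc's code, the function names are hypothetical, the control allele frequency p0 would come from NCBI in practice, and the target OR of 2.0 is illustrative only. Because p0 is an allele frequency, the formula returns alleles per group; dividing by two gives diploid subjects, assuming alleles are independent (Hardy-Weinberg equilibrium).

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_frequency(p0: float, odds_ratio: float) -> float:
    """Allele frequency among cases implied by control frequency p0 and a target OR."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


def alleles_per_group(p0: float, odds_ratio: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Two-proportion sample size (normal approximation, no continuity correction)."""
    p1 = case_frequency(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


def subjects_per_group(p0: float, odds_ratio: float, **kwargs) -> int:
    """Each diploid subject contributes two alleles to an allelic comparison."""
    return ceil(alleles_per_group(p0, odds_ratio, **kwargs) / 2)


# Hypothetical target: control MAF of 0.31, allelic OR of 2.0, 80% power
print(alleles_per_group(0.31, 2.0), subjects_per_group(0.31, 2.0))
```

Smaller target ORs or rarer alleles drive the required size up quickly, which is why the choice of p0 matters.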
The strength of association between the three polymorphisms and LC risk was assessed by odds ratios (ORs) with 95% confidence intervals (CIs). ORs were computed under a heterozygote model (Aa vs. AA, where A is the major allele and a the minor allele), a homozygote model (aa vs. AA), a recessive model (aa vs. AA+Aa), and a dominant model (Aa+aa vs. AA). Pearson's chi-square test was used to compare the distributions of demographic variables, smoking status, and genotypes of the three genes between cases and controls. Crude ORs and 95% CIs were calculated by unconditional logistic regression, and adjusted ORs and 95% CIs by multiple unconditional logistic regression with adjustment for possible confounders (age, sex, and smoking). Statistical tests, ORs, and CIs were computed with the R Project for Statistical Computing (R version 3.5.1), mainly using functions from the packages MASS and questionr (Version 0.7.0). All statistical tests were two sided, and the significance level was set at p = 0.05.
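The four genetic models differ only in how the three genotypes are grouped into "exposed" and "reference". A minimal Python sketch, assuming genotype counts ordered (AA, Aa, aa) and using hypothetical counts (not the study's Table 2), computes a crude OR with a Woolf (log-scale Wald) 95% CI for each model. The R route used in the study may report profile-likelihood intervals from a logistic model instead, so values can differ slightly.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Genotype counts are ordered (AA, Aa, aa): A = major allele, a = minor allele.
MODELS = {
    # model name: (genotype indices counted as exposed, genotype indices used as reference)
    "heterozygote (Aa vs AA)": ((1,), (0,)),
    "homozygote (aa vs AA)": ((2,), (0,)),
    "dominant (Aa+aa vs AA)": ((1, 2), (0,)),
    "recessive (aa vs AA+Aa)": ((2,), (0, 1)),
}


def odds_ratio(cases, controls, exposed, reference, level=0.95):
    """Crude OR with a Woolf confidence interval from a collapsed 2x2 table."""
    a = sum(cases[i] for i in exposed)       # exposed cases
    b = sum(controls[i] for i in exposed)    # exposed controls
    c = sum(cases[i] for i in reference)     # reference cases
    d = sum(controls[i] for i in reference)  # reference controls
    if 0 in (a, b, c, d):
        raise ValueError("empty cell: use a continuity correction or an exact method")
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)


cases = (50, 60, 34)      # hypothetical counts summing to 144
controls = (100, 85, 26)  # hypothetical counts summing to 211
for name, (exposed_g, reference_g) in MODELS.items():
    print(name, "OR=%.2f (95%% CI %.2f-%.2f)" % odds_ratio(cases, controls, exposed_g, reference_g))
```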

Subjects Characteristics.
To test the association between TERT, OGG1, and CHRNA5 gene polymorphisms and LC risk, we conducted a case-control study of 144 NSCLC cases and 211 controls enrolled in four university hospitals of Eastern Algeria (Constantine, Sétif, Batna, and Annaba). The distributions of age, gender, smoking history, and histologic type among the study subjects are summarized in Table 1. The distributions of gender and age differed significantly between cases and controls (p < 0.01), so cases and controls were incompletely matched in this study population. Among the cases, 66% were diagnosed with adenocarcinoma and 34% with squamous cell carcinoma. Approximately 73% of cases and 37% of controls were ever smokers; as expected, cases smoked significantly more than controls (p < 0.01).
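The Pearson chi-square comparison of cases and controls can be sketched in Python. The counts below are approximate, back-calculated from the rounded percentages reported above (about 73% of 144 cases and 37% of 211 controls were ever smokers), not the exact Table 1 values. The sketch omits Yates' continuity correction, which R's chisq.test() applies to 2x2 tables by default, so R would report a slightly smaller statistic; p-values are provided in closed form only for 1 and 2 degrees of freedom.

```python
from math import erfc, exp, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return erfc(sqrt(x / 2))
    if df == 2:
        return exp(-x / 2)
    raise NotImplementedError("use scipy.stats.chi2.sf for other degrees of freedom")


def pearson_chi2(table):
    """Pearson chi-square test of independence (no continuity correction) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = sum((obs - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(table, row_totals)
               for obs, c in zip(row, col_totals))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


smoking = [[105, 39],   # cases: ever vs never smokers (approximate counts)
           [78, 133]]   # controls: ever vs never smokers (approximate counts)
print(pearson_chi2(smoking))
```

A 2x3 genotype table (cases and controls by AA, Aa, aa) goes through the same function with 2 degrees of freedom.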

Association Analysis of Candidate SNPs with LC Risk.
The genotype distributions of TERT, OGG1, and CHRNA5 polymorphisms in cases and controls are presented in Table 2. In controls, the genotype frequencies of all three polymorphisms were consistent with Hardy-Weinberg equilibrium (p = 0.15 for TERT, 0.49 for OGG1, and 0.15 for CHRNA5).
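Both the Hardy-Weinberg p-values and the MAFs come from genotype counts. A minimal Python sketch, assuming hypothetical control counts (the study's Table 2 is not reproduced here), computes the minor allele frequency and the classical one-degree-of-freedom chi-square goodness-of-fit test without continuity correction. The R package HardyWeinberg also offers exact and continuity-corrected tests, so its p-values can differ from this version.

```python
from math import erfc, sqrt


def minor_allele_frequency(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Frequency of allele 'a' from genotype counts (two alleles per subject)."""
    return (2 * n_aa + n_Aa) / (2 * (n_AA + n_Aa + n_aa))


def hwe_chi2(n_AA: int, n_Aa: int, n_aa: int):
    """One-df chi-square test for Hardy-Weinberg equilibrium (no continuity correction)."""
    n = n_AA + n_Aa + n_aa
    q = minor_allele_frequency(n_AA, n_Aa, n_aa)
    p = 1 - q
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # HWE genotype proportions times n
    observed = (n_AA, n_Aa, n_aa)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, erfc(sqrt(stat / 2))  # upper tail of a chi-square with 1 df


counts = (100, 85, 26)  # hypothetical control genotype counts (AA, Aa, aa)
print(round(minor_allele_frequency(*counts), 3), hwe_chi2(*counts))
```

One degree of freedom remains because three genotype classes are fitted with one estimated allele frequency and a fixed total.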
Sequencing of the TERT promoter did not detect the c.-124C>T and c.-146C>T variants in this population, whereas the rs2853669 polymorphism was observed at a high frequency.

Stratified Analysis for the Three SNPs Studied with LC Risk.
We then performed a stratified analysis according to tobacco consumption, histological type (Table 3), age, and sex (Table 4) to investigate associations between the genetic polymorphisms and the stratifying factors under dominant, codominant, and recessive models. We also analyzed these associations at the allele level (Table 5).
For TERT rs2853669, participants older than 60 years carrying the CT or CC genotype had a higher LC risk than those carrying the TT genotype (p = 0.04). Among males, the CC genotype was associated with a higher LC risk than the TT or TC genotypes (p = 0.03; Table 3). In addition, patients with the CC genotype tended to have squamous cell carcinoma rather than adenocarcinoma. However, none of these three stratified associations remained statistically significant after adjustment.
OGG1 rs1052133 showed no significant association in any stratum except age (>60 years), where carriers of the minor-allele GG genotype had an increased LC risk compared with carriers of the CC genotype (p = 0.04). After adjustment for age, sex, and smoking status, the association became only marginally significant (p = 0.09, OR = 0.49, 95% CI = 0.21-1.11).
For CHRNA5 rs16969968, we observed an increased LC risk among males with the recessive genotype (AA) compared with those having the GG or GA genotype (p = 0.02). However, this association disappeared after adjustment.
Smokers with the AA genotype had a higher LC risk than smokers with the GG or GA genotype (p = 0.02); after adjustment for age and sex, the association persisted (pAdj = 0.03, ORAdj = 3.91, 95% CI = 1.24-17.34). In the stratified analysis by histological type, AA carriers were more likely to have adenocarcinoma than squamous cell carcinoma under the recessive and homozygote models (p = 0.003 and p = 0.005, respectively). This risk remained significant after adjustment (pAdj = 0.03, ORAdj = 2.55). In the allele-level stratified analysis by sex, smoking, histological type, and age, no significant association was found.
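Adjusted ORs such as these come from unconditional logistic regression with a genotype indicator plus the covariates (here age and sex) as predictors; in R this is typically fitted with glm(..., family = binomial). The pure-Python sketch below fits such a model by Newton-Raphson on simulated, hypothetical data (a true OR of 2.5 for AA carriers) and reports a Wald 95% CI. It illustrates the technique under assumptions and is not the authors' code. Note that confint() on a glm object with MASS loaded returns profile-likelihood intervals, which need not be symmetric around the OR on the log scale, so they can differ from the Wald intervals computed here.

```python
import math
import random
from statistics import NormalDist


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def score_and_information(X, y, beta):
    """Gradient of the log-likelihood and Fisher information X'WX at beta."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        w = mu * (1.0 - mu)
        for j in range(k):
            grad[j] += (yi - mu) * xi[j]
            for q in range(k):
                info[j][q] += w * xi[j] * xi[q]
    return grad, info


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Unconditional logistic regression by Newton-Raphson; each row of X starts with 1."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        grad, info = score_and_information(X, y, beta)
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = score_and_information(X, y, beta)  # information at the final estimate
    return beta, info


def adjusted_or(beta, info, j, level=0.95):
    """OR for coefficient j with a Wald CI from the inverse information matrix."""
    unit = [1.0 if i == j else 0.0 for i in range(len(beta))]
    se = math.sqrt(solve(info, unit)[j])  # sqrt of the j-th diagonal of info^-1
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.exp(beta[j]), math.exp(beta[j] - z * se), math.exp(beta[j] + z * se)


# Simulated (hypothetical) data: true OR = 2.5 for AA carriers, adjusted for age and sex.
random.seed(1)
X, y = [], []
for _ in range(3000):
    aa = 1.0 if random.random() < 0.10 else 0.0
    age = random.gauss(0.0, 1.0)  # standardized age
    male = 1.0 if random.random() < 0.5 else 0.0
    eta = -1.0 + math.log(2.5) * aa + 0.3 * age + 0.4 * male
    y.append(1.0 if random.random() < 1.0 / (1.0 + math.exp(-eta)) else 0.0)
    X.append([1.0, aa, age, male])

beta, info = fit_logistic(X, y)
print("adjusted OR for AA: %.2f (95%% CI %.2f-%.2f)" % adjusted_or(beta, info, 1))
```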

Discussion
High incidence and poor prognosis make LC a major health problem worldwide. Although LC is linked to environmental exposure to carcinogens, especially cigarette smoking, only some smokers develop LC, which suggests interindividual differences in susceptibility to the disease. To explore these differences, we examined polymorphisms in three genes involved in telomere maintenance (TERT rs2853669), DNA repair (OGG1 rs1052133), and the response to nicotine (CHRNA5 rs16969968) for association with NSCLC risk in a case-control study of 144 patients and 211 control subjects from the Eastern Algerian population.
In our population, none of the three polymorphisms influenced overall LC risk: the CHRNA5 rs16969968 genotype distribution differed significantly between cases and controls, but the difference disappeared after adjustment. These results agree with the findings of numerous studies but contradict others [16,26,29]. The allele frequencies in our population differ slightly from those reported elsewhere: they are close to those of Caucasian populations but differ markedly from those of Asian and African populations [30].
For the TERT gene, sequencing of the promoter identified the rs2853669 polymorphism in our population, with a MAF of 20% (C allele) in our control group. According to the literature and HapMap data, the MAFs of rs2853669 do not differ much between Caucasian and Asian populations (C: 26%-37%) [12,31]. Overall, we found no significant association between rs2853669 and NSCLC, and in the stratified analysis, no stratifying factor showed a significant difference after adjustment. Only two previous studies have examined rs2853669 in LC, both exclusively in Asian populations, and both found a strong association with LC risk [32,33]. A recent large meta-analysis pooling thirteen studies (16 datasets) concluded that rs2853669 alone does not increase or decrease overall cancer risk or affect prognosis. In its stratified analysis by cancer type, a protective effect was found for breast cancer and a significant association for LC and glioblastoma, although the small number of studies limited the credibility of these findings [29]. Another recent meta-analysis with the same objective found that rs2853669 was significantly associated with increased cancer risk under the homozygote model. In its stratified analysis, cancer risk was significantly increased in Asians but not in Caucasians, and a subgroup analysis by cancer type revealed a significant increase in LC risk but not in breast cancer risk [12].
For OGG1 rs1052133, previous studies have shown that homozygous carriers of the variant appear to have reduced repair capacity for oxidized DNA lesions, and higher levels of 8-oxoG have been reported in lung tissue of LC patients than in lung tissue of patients without cancer [34,35]. Numerous studies have investigated the association between this SNP and LC, with quite diverse results [16][17][18]. In our study, the rs1052133 MAF in the control group was 29% (G allele): slightly higher than the frequencies previously reported for Caucasians (15% to 25%) and lower than those reported for Asian populations (40% to 62%), but consistent with that reported for the Turkish population (33%) and similar to that found in a study of a North African population (G = 27%) [16,36,37]. The frequency of individuals homozygous for the variant allele (GG) was slightly higher in controls (9.5%) than in cases (7.6%), suggesting a protective effect, but the difference was not statistically significant (p = 0.46). Two previous studies found that this SNP could have a protective effect against LC [36,38]. Overall, rs1052133 had no effect on NSCLC risk in our population. These results agree with the findings of three studies but disagree with those of two others [13,36,39-41]. In a meta-analysis of data from 8 studies, Duan et al. found no association between OGG1 rs1052133 and NSCLC risk [16]. Asians may have a higher susceptibility to LC than other populations because of their higher frequency of the variant G allele [42,43].
In our stratified analysis by histological type, gender, age, and smoking habits, only age showed a marginally significant increase in NSCLC risk, for persons older than 60 years carrying the recessive GG genotype. However, the results obtained by Hung et al. do not agree with ours [34].
Our findings cannot explain the significantly lower capacity of leukocytes from NSCLC patients to incise 8-oxoG compared with healthy controls. One possible explanation is that OGG1 gene polymorphisms are only one of many parameters that might affect OGG1 activity. Another, suggested by Lee et al., is that the GG genotype is deficient in the repair of oxidatively generated DNA damage only under conditions of cellular oxidative stress [44]. Both hypotheses need to be confirmed in future studies.
The CHRNA5 rs16969968 polymorphism has been the subject of a large number of investigations [21,23,26,45]. Gene expression and disease association studies have provided evidence that both nicotine-dependence risk and LC risk are influenced by this polymorphism. The variant is common in populations of European and Middle Eastern origin (MAFs = 37%-43%) but uncommon in African, East Asian, and Native American populations (<5%) [20]. In our Algerian control group, the rs16969968 MAF was 31% (A allele), which differs from the value reported for the Mozabite ethnic group (MAF = 18%) [46]. This difference may be explained by the genetic distinctiveness of the Mozabites, which results from their practice of endogamy [46]. Our results show a difference in the variant allele between cases and controls, but after adjustment for sex, age, and smoking, the association disappeared (p = 0.16). Thus, our data do not support an important role for this SNP in overall NSCLC risk, although previous studies in other, predominantly Caucasian, ethnic groups showed a strong association between rs16969968 and LC risk.
When stratified by histologic subtype, rs16969968 was associated with LC risk in adenocarcinoma (ADK) but not in squamous cell carcinoma (SCC) under the recessive model (p = 0.03). Falvella et al. reported that CHRNA5 mRNA levels were upregulated 30-fold in ADK compared with normal lung tissue, in individuals carrying the AA genotype compared with those carrying the GG genotype [47]. Nevertheless, Jaworowska et al. showed that this locus is implicated in all histopathologic subtypes of LC [48]. Although ADK and SCC are both categorized as NSCLC, they have distinct molecular patterns, so rs16969968 may influence LC susceptibility differently according to histologic subtype. Additional studies are needed to confirm this hypothesis.
The stratified analysis by smoking status showed that smokers with the AA genotype had a four-fold higher risk of NSCLC than those carrying the GA or GG genotype. Results reported by Le Marchand et al. could explain this finding: smokers with the AA genotype smoke more cigarettes and also smoke more intensely, extracting more nicotine and carcinogens per cigarette than noncarriers [49]. Another plausible explanation is that the rs16969968 variant reduces receptor activity, so individuals carrying the A allele may require larger amounts of nicotine to achieve the same level of dopamine release [20]. In addition, a meta-analysis concluded that the rs16969968 A allele predicts delayed smoking cessation and an earlier age at LC diagnosis [50]. This is supported by a recent study by Forget et al. showing that transgenic rats expressing this human nicotinic receptor polymorphism self-administer more nicotine at high doses and show stronger nicotine-induced reinstatement of nicotine seeking than wild-type rats, a relapse associated with reduced neuronal activity in the interpeduncular nucleus [51]. Together, these studies highlight the importance of rs16969968, especially for the early diagnosis of LC, and may point to potential targets for new smoking cessation therapies. This is particularly relevant because projections indicate that by 2030, tobacco use will be responsible for 10 million deaths, making it the leading cause of preventable death [52].
It is important to note that the Algerian population, from Africa's largest country, is genetically distinctive for both historical and cultural reasons, the Berbers being the native population of the region [53]. The region's history bears witness to many invasions, conquests, and migrations by Phoenicians, Romans, Vandals, Byzantines, Arabs, Jews, Spanish, and French, making the population a true admixture; this could explain why the MAFs of the Algerian population are close to those of Caucasian and Middle Eastern populations, and why no significant association was found. In addition, some Berber populations (Tuareg, Mozabite) have been shown to differ genetically from other North African populations, having gone through long periods of genetic isolation [27]. This can explain the marked difference between the rs16969968 CHRNA5 MAF reported for the Mozabite (A = 18%) and ours (A = 31%) [46]. Another characteristic of our population, related to cultural factors connected with smoking, is that Algerian women rarely smoke, which could explain why rs16969968, strongly associated with LC risk in Caucasian populations, showed no significant overall association here.

Conclusions
In conclusion, our study, the first molecular study of LC in the Eastern Algerian population, examined the association of polymorphisms in three genes, TERT (rs2853669), CHRNA5 (rs16969968), and OGG1 (rs1052133), with LC and showed that CHRNA5 rs16969968 is significantly associated with lung adenocarcinoma and, interestingly, that the risk is elevated in smokers carrying the AA genotype.
These novel results nevertheless come with some limitations. First, the sample size was limited, and cases and controls were mismatched in sex, age, and smoking, which could bias the results (although adjusting for these confounders in unconditional logistic regression should have largely mitigated this effect). Second, because the sample was not large, the statistical power to detect modest effects of the three potentially functional SNPs in stratified analyses was relatively low. Third, smoking status was missing for some participants, as were data on smoking duration and intensity and on nicotine dependence tests. Finally, fully understanding the differences in genetic background among North African populations will require further studies, and the results reported here may help bring new perspectives to this topic.
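The power limitation can be quantified with the normal approximation for comparing two proportions with unequal group sizes. In this minimal sketch, p0 is the frequency of the risk genotype among controls; using 0.31 squared as the AA frequency assumes Hardy-Weinberg equilibrium with the 31% CHRNA5 MAF, and the stratum of 50 cases and 70 controls and the target OR of 2.0 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p0, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Ignores the negligible opposite rejection tail, so at OR = 1 it returns alpha / 2.
    """
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))  # exposure frequency in cases
    p_bar = (n_cases * p1 + n_controls * p0) / (n_cases + n_controls)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n_cases + 1 / n_controls))
    se_alt = sqrt(p1 * (1 - p1) / n_cases + p0 * (1 - p0) / n_controls)
    return NormalDist().cdf((abs(p1 - p0) - z_alpha * se_null) / se_alt)


p_aa = 0.31 ** 2  # illustrative AA frequency in controls, assuming HWE with MAF 0.31
for n_cases, n_controls in [(144, 211), (50, 70)]:  # full sample vs a hypothetical stratum
    print(n_cases, n_controls, round(power_two_proportions(p_aa, 2.0, n_cases, n_controls), 2))
```

Splitting the sample into strata shrinks both groups at once, which is why stratified tests lose power faster than the overall comparison.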

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare no conflict of interest.