Cumulative evidence for association between genetic polymorphisms and esophageal cancer susceptibility: A review with evidence from meta‐analysis and genome‐wide association studies

Abstract An increasing number of publications have reported associations between single‐nucleotide polymorphisms (SNPs) and esophageal cancer (EC) risk over the past decades, but their results have been controversial. We searched PubMed, Medline, and Web of Science for meta‐analyses published before 30 July 2018 in order to comprehensively assess the cumulative evidence on genetic polymorphisms and the risk of EC and its subtypes. Two methods, the Venice criteria and the false‐positive report probability (FPRP) test, were used to assess the cumulative evidence of significant associations. In total, 107 meta‐analyses met the inclusion criteria, yielding 51 variants studied for association with EC or esophageal squamous cell carcinoma (ESCC). Thirty‐eight variants were nominally significantly associated with the risk of EC or ESCC, whereas the rest showed no association. In addition, five variants on five genes were rated as having strong cumulative epidemiological evidence for a nominally significant association with EC or ESCC risk (CYP1A1 rs1048943, EGF rs4444903, HOTAIR rs920778, MMP2 rs243865, and PLCE1 rs2274223); 10 variants were rated as moderate and 18 as weak. Additionally, 17 SNPs from six genome‐wide association studies (GWAS) were verified as noteworthy using the FPRP method. Collectively, this review offers comprehensive reference information on the cumulative evidence of associations between genetic polymorphisms and EC and ESCC risk.

common pathological types of EC, and they are caused by different risk factors. For ESCC, the risk factors include smoking, alcoholic beverages, socioeconomic status (SES), polycyclic aromatic hydrocarbons (PAHs), betel quid, diet quality, low fruit and vegetable intake, micronutrients, pickled vegetables, and hot food and beverages, whereas gastroesophageal reflux disease, Barrett's esophagus, and tobacco smoking have been verified as risk factors for EADC. Although these environmental factors are considered risk factors for EC, epidemiological and etiological studies have shown that the role of genetic variants also needs to be considered. 8 Over the past few decades, a large number of candidate gene association studies [9][10][11][12] have been performed to explore the relationship between gene polymorphisms and EC risk. However, owing to small sample sizes and inadequate statistical power, their results were unstable. Meta-analysis can provide more credible results and stronger statistical power by integrating the findings of individual studies, 13,14 and more than 100 meta-analyses have been performed in recent years. Most of the results for the same variant, however, were inconsistent. For EGF rs4444903, Xu et al 15 performed a meta-analysis and found that the variant could decrease the risk of EC (OR = 0.73, 95% CI = 0.61-0.86), whereas Li et al 16 found that it could increase the risk of EC (OR = 1.17, 95% CI = 1.09-1.25). For TP53 rs1042522, Zhao et al performed a meta-analysis and found that the variant could increase the risk of EC (OR = 1.20, 95% CI = 1.06-1.36), whereas Jiang et al found the opposite association with EC risk (OR = 0.73, 95% CI = 0.57-0.94). 17 GWAS can screen sequence variation across the human genome and identify SNPs related to human diseases, 18 and they have extended our understanding of associations between genetic variations and cancer risk. 
19 To date, two-stage GWAS (discovery and replication) have proved to be a powerful and widely used tool for identifying genetic variants associated with susceptibility to complex human diseases or phenotypes. 20,21 As early as 2008, Dong et al 22 reported that three variants were significantly related to EC by assessing the FPRP of meta-analyses. In recent years, more than 100 meta-analyses have been performed. With the application of GWAS, gene mutations or susceptibility loci have been linked to many diseases. 23 In 2010, Wang LD et al 24 identified, among 18 SNPs, two susceptibility loci (10q23 and 20p13) associated with ESCC risk. Meanwhile, Abnet et al 25 found variants on the PLCE1 gene associated with ESCC risk, and Wang et al 24 found that the gene C20orf54 was significantly associated with ESCC risk in the Chinese population. Later, Jin et al 26 found consistent associations of two loci (6p21.1 and 7p15.3) with EC risk in both the GWAS and replication stages. In 2013, Levine et al 27 added three new susceptibility loci (3p13, 9q22, and 19p13) for EADC. In the following years, Chang et al 28 found another two variants on 13q22.1.
Although more than 100 meta-analyses and several GWAS on the association between genetic variants and EC susceptibility have been performed, the results of different studies for the same variant were inconsistent, indicating the possibility of false-positive associations. Ioannidis et al indicated that mechanisms for summarizing and assessing genetic epidemiological evidence require periodic updates of all appropriate association studies based on widely accepted assessment criteria. 29 However, the current literature still lacks an updated, comprehensive assessment covering all possible variants of multiple genes in relation to EC risk.
Therefore, we attempted to collect the cumulative evidence of associations between genetic variants and EC risk from published meta-analyses and GWAS and to evaluate these associations, which may offer reference information for further investigation of genetic risk factors for EC and its subtypes.

| Literature search strategy and criteria for inclusion
The following terms were used in the search: ("esophageal") and ("cancer" or "adenocarcinoma" or "carcinoma" or "tumor" or "squamous cell carcinoma") and ("meta-analysis" or "Meta-analysis" or "systematic review" or "literature review") and ("genetic association" or "Genetic" or "SNP" or "polymorphism" or "single nucleotide polymorphism" or "genotype" or "variant" or "variation" or "mutation" or "susceptibility"). We also checked all relevant references to find other potential meta-analyses that could offer relevant data.
Meta-analyses had to meet the following criteria: (a) the publication was in English; (b) the cancer type was EC or one of its subtypes; (c) the patients with EC were diagnosed by pathological or histological examination; (d) the sample size was not fewer than 1000; (e) the study addressed EC incidence/susceptibility (rather than mortality or survival rate). The following criteria had to be met when screening SNPs from GWAS on PubMed: (a) the publication was in English; (b) the cancer type was EC, including all subtypes of EC; (c) the patients with EC were diagnosed by pathological or histological examination; (d) the study included two phases (discovery and replication); (e) the OR and 95% CI were provided, and a P value below the cutoff of 1 × 10⁻⁸ was considered statistically significant; (f) the study addressed EC incidence/susceptibility (rather than mortality or survival rate).

| Data extraction
Data were extracted by J.T and checked by two authors (C.L and GL). The information extracted from each eligible GWAS publication included the PMID of the article, first author, year of publication, gene name, genetic variant, ethnicity of participants, number of subjects (cases and controls), minor allele frequency (MAF), OR, 95% CI, and P value. From each meta-analysis, the following data were collected: first author, year of publication, gene name, genetic variant, OR and 95% CI, number of studies, number of subjects (cases and controls), ethnicity, I-square, the test for heterogeneity (Q test) between studies, 30 and the test for publication bias (Egger's test). 31 I-square refers to the percentage of variation across studies that is due to heterogeneity. We used Cochran's Q test 32 to evaluate heterogeneity between studies. Generally, the cutoff P value for the between-study test of heterogeneity (Q test) was 0.10: a P value >0.10 represents little heterogeneity, whereas a P value <0.10 indicates the presence of heterogeneity. We attempted to extract information on EC, including ESCC and EADC; however, almost all of the articles studied EC and ESCC rather than EADC. Therefore, we were unable to assess EADC and evaluated the cumulative epidemiological evidence only for EC and ESCC. In addition, the eligible studies reported two major ethnicities, Asian and Caucasian. The majority of the meta-analyses included a combination of two or more ethnicities, which we defined as "diverse populations." Therefore, we extracted the information for diverse populations and, when applicable, also for Asian and Caucasian populations. Multiple studies concerning the same SNP reported conflicting results owing to varied sample sizes and the selection of different association models. 
Therefore, given study quality and result credibility, we selected the most recently published study with the largest and most complete set of participants and a standardized report of the genetic association study, based on the guidelines of the Human Genome Epidemiology Network for systematic review of genetic association studies 33 and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). 34 In addition, articles usually offered different genetic models; therefore, the additive model (see Table S1) was given priority for data extraction and evaluation in order to reduce selection bias. The other models were used only when the additive model was not available. For variant names, the most recent gene names were used to identify the different variants. An association was considered statistically significant if the 95% CI excluded 1.0 or if the reported P value was <0.05.
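The heterogeneity statistics and the significance rule described above can be illustrated with a minimal Python sketch. It assumes per-study log odds ratios and standard errors are available and uses inverse-variance (fixed-effect) weights; the function names `heterogeneity` and `is_nominally_significant` are hypothetical and are not taken from the included meta-analyses.

```python
import math


def heterogeneity(log_ors, std_errs):
    """Return (Q, df, I2 in percent) for per-study log odds ratios.

    Assumption: inverse-variance fixed-effect weights, the usual basis
    for Cochran's Q.
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I2: share of total variation attributed to heterogeneity, floored at 0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


def is_nominally_significant(ci_low, ci_high, p_value=None):
    """Significant if the 95% CI excludes 1.0 or the reported P value is < 0.05."""
    if p_value is not None and p_value < 0.05:
        return True
    return ci_low > 1.0 or ci_high < 1.0


if __name__ == "__main__":
    # Illustrative (made-up) study-level inputs, not data from this review
    q, df, i2 = heterogeneity(
        [math.log(1.2), math.log(1.5), math.log(0.9)], [0.10, 0.15, 0.12]
    )
    print(f"Q = {q:.2f} on {df} df, I2 = {i2:.1f}%")
```

The P value of the Q test would come from a chi-square distribution with `df` degrees of freedom (for example `scipy.stats.chi2.sf(q, df)`, if SciPy is available); it is left out here to keep the sketch dependency-free.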

| Assessment of cumulative evidence
The Venice criteria were applied to assess the epidemiological credibility of significant associations identified by meta-analysis. 29,35 Credibility was rated as strong, moderate, or weak (grade A, B, or C) according to three elements: amount of evidence, replication of association, and protection from bias. The first element was evaluated by the total number of test alleles or genotypes among cases and controls, which was divided into three groups: >1000, 100-1000, and fewer than 100 (representing A, B, and C, respectively). When the test allele or genotype amounts were not provided, we obtained the MAF from the NCBI SNP database and calculated the amounts from it. Replication of association was graded using heterogeneity statistics, as follows: grade A (I² < 25%), grade B (25% < I² < 50%), or grade C (I² > 50%). Protection from bias was mainly determined by sensitivity analysis and a series of bias tests, including tests for publication bias, small-study bias, and an excess of significant findings (see Table S2). Briefly, protection from bias was graded as A if there was no observable bias and bias was unlikely to explain the presence of the association, B if bias could be present, or C if bias was evident or was likely to explain the presence of the association. The assessment of protection from bias also considered the magnitude of the association; a grade of C was assigned to an association with a summary OR < 1.15, unless the association had been replicated prospectively by several studies with no evidence of publication bias (ie, GWAS or GWAS meta-analysis from collaborative studies). Associations were not graded if the information was insufficient for assessment. For cumulative epidemiological evidence, an association with grade A for all three elements was considered strong evidence, an association with grade C for any element was considered weak, and all other combinations were considered moderate. 
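The grading rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the bias grade is taken as an input because it rests on judgment rather than a formula, and boundary values (exactly 100 or 1000 alleles, I² of exactly 25% or 50%) are assigned to grade B, which the text does not specify.

```python
def grade_amount(n_test_alleles):
    """Amount of evidence: >1000 -> A, 100-1000 -> B, fewer than 100 -> C."""
    if n_test_alleles > 1000:
        return "A"
    if n_test_alleles >= 100:
        return "B"
    return "C"


def grade_replication(i2_percent):
    """Replication: I2 < 25% -> A, 25-50% -> B, > 50% -> C.

    Assumption: the boundary values 25 and 50 are placed in grade B.
    """
    if i2_percent < 25:
        return "A"
    if i2_percent <= 50:
        return "B"
    return "C"


def venice_rating(amount, replication, bias):
    """Combine the three grades: AAA -> strong, any C -> weak, else moderate."""
    grades = (amount, replication, bias)
    if "C" in grades:
        return "weak"
    if grades == ("A", "A", "A"):
        return "strong"
    return "moderate"


if __name__ == "__main__":
    # Illustrative (made-up) inputs: 2400 test alleles, I2 = 30%, bias grade A
    rating = venice_rating(grade_amount(2400), grade_replication(30.0), "A")
    print(rating)
```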
We performed a false-positive report probability (FPRP) analysis with a prior probability of 0.001 and an FPRP cutoff value of 0.2 to uncover potential false-positive results among the significant associations and to evaluate whether these associations should be omitted. In the FPRP calculations, we used the statistical power to detect an odds ratio of 1.5 for alleles with an elevated risk, as suggested by Wacholder et al. 36 Statistical power and FPRP values were calculated with the Excel spreadsheet offered on Wacholder's website. If the calculated FPRP value was below the prespecified noteworthiness value of 0.2, we considered the association noteworthy, indicating that the association might be true. FPRP values were assigned to three groups: <0.05, 0.05-0.2, and >0.2 (representing strong, moderate, and weak, respectively). Cumulative evidence was upgraded from moderate to strong or from weak to moderate based on FPRP <0.05. Conversely, cumulative evidence was downgraded from strong to moderate or from moderate to weak based on FPRP >0.2.
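The FPRP step can be approximated with the short Python sketch below. It is not the Wacholder spreadsheet itself: it assumes a normal approximation on the log-OR scale, derives the standard error from the reported 95% CI, takes the observed P value as the significance level, and treats risk-increasing and protective associations symmetrically. The function names `fprp` and `adjust_evidence` are hypothetical.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def fprp(or_obs, ci_low, ci_high, prior=0.001, or_alt=1.5):
    """False-positive report probability for one association (sketch).

    Assumptions: normal approximation on the log-OR scale; the standard
    error is back-calculated from the 95% CI; power is the chance of a
    result at least as extreme as the observed one if the true OR were
    or_alt (or its reciprocal).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    z_obs = abs(math.log(or_obs)) / se
    p_value = 2.0 * upper_tail(z_obs)
    shift = math.log(or_alt) / se
    power = upper_tail(z_obs - shift) + upper_tail(z_obs + shift)
    return p_value * (1 - prior) / (p_value * (1 - prior) + power * prior)


def adjust_evidence(venice_rating, fprp_value):
    """Shift the Venice rating one step up (FPRP < 0.05) or down (FPRP > 0.2)."""
    order = ["weak", "moderate", "strong"]
    idx = order.index(venice_rating)
    if fprp_value < 0.05:
        idx = min(idx + 1, len(order) - 1)
    elif fprp_value > 0.2:
        idx = max(idx - 1, 0)
    return order[idx]


if __name__ == "__main__":
    # OR and CI as reported for HOTAIR rs920778 in this review
    value = fprp(2.525, 1.921, 3.320)
    print(f"FPRP = {value:.2e}", adjust_evidence("moderate", value))
```

With a prior of 0.001, a borderline result (for example a 95% CI whose lower limit sits at 1.0) gives an FPRP far above 0.2 in this sketch, which is consistent with the downgrades described in the text.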

| Description of search results and characteristics of the studies
As presented in Figure 1, our search yielded a total of 918 publications. Of these, 179 publications were excluded due to overlaps, 157 irrelevant articles were excluded after reading the title or abstract, 21 articles were excluded because they were not meta-analyses or did not concern genetic polymorphisms or esophageal cancer, and 19 articles were excluded because they were not the latest meta-analysis. In addition, 15 additional articles were identified through relevant reference publications. In total, 107 meta-analyses met the criteria of our review. The data extracted by the method described above yielded 51 SNPs studied for association with EC or ESCC susceptibility (45 SNPs in 41 genes and six SNPs in miRNA). We used PubMed to identify the SNPs in GWAS. As a result, 17 SNPs were identified in six GWAS (15 SNPs in 10 genes and two SNPs near one gene or between genes).

| Significant association in meta-analyses and GWAS
Of the 51 SNPs identified by meta-analysis, 38 had a statistically significant association with EC or ESCC susceptibility. The meta-analysis results showed that 27 SNPs were associated with increased EC or ESCC risk, whereas 11 SNPs were associated with decreased EC or ESCC risk. Seventeen SNPs were evaluated using the additive model; 10 SNPs were evaluated using the dominant model and another 11 SNPs using the recessive or homozygous model because the additive model was not available. As presented in Table 1, cumulative epidemiological evidence was graded for the 38 significant associations among the main meta-analyses. The Venice criteria were first used to assess these associations. Strong, moderate, and weak evidence was assigned to four, seven, and 17 SNPs for EC and to one, three, and two SNPs for ESCC, respectively. Next, based on FPRP <0.05, cumulative evidence was upgraded from moderate to strong for CYP1A1 rs1048943, PLCE1 rs2274223, and MMP2 rs243865 in EC and for HOTAIR rs920778 in ESCC, and from weak to moderate for ADH1B rs1229984 and COX-2 rs20417 in EC. Based on FPRP >0.2, cumulative evidence was downgraded from strong to moderate for IL-18 −607C>A, MMP1 rs1799750, SLC52A3 rs13042395, and MDM2 rs2279744 in EC and for TNF-α rs1800629, C20orf54 rs13042395, and microRNA124 rs531564 in ESCC, and from moderate to weak for GSTT1 null/present, XRCC1 rs1799782, Hsa-mir rs3746444, hOGG1 rs1052133, and STK15 rs2273535 in EC and for microRNA-34b/c rs4938723 and NAT2 rapid/slow in ESCC. Finally, by combining the Venice criteria and the FPRP results, five SNPs on five genes were rated as having strong cumulative epidemiological evidence of association: CYP1A1 rs1048943, EGF rs4444903, MMP2 rs243865, and PLCE1 rs2274223 for EC and HOTAIR rs920778 for ESCC. Seven SNPs for EC and three for ESCC were rated as moderate. Sixteen SNPs for EC and two for ESCC were rated as weak. A large discrepancy between the calculated amount and the true amount makes the grade difficult to determine. 
Therefore, when amounts were calculated from MAFs obtained from dbSNP, calculated amounts of less than 3000 were not included in the assessment.  The P values are all <1.00E-08.
The associations of the 17 SNPs identified in six GWAS are presented in Table 2. 24,[26][27][28][75][76][77] Eleven of the SNPs listed in the table were associated with increased EC risk. An inverse association was found for six SNPs. [26][27][28]75 All 17 SNPs were regarded as noteworthy based on the FPRP method. The Venice criteria were not applicable to the GWAS, which provided too few datasets even when the discovery and replication phases of a two-stage GWAS were regarded as individual studies; 29 therefore, we did not evaluate these results further. In addition, two variants (rs13042395 on C20orf54 and rs2274223 on PLCE1) were evaluated both in meta-analysis and in GWAS.

| Nonsignificant association in meta-analyses
We performed statistical power analyses to determine the stability of the associations. Among the meta-analysis results, 13 variants were not significantly associated with EC or ESCC risk. 37,42,60,78,79 The variant Arg399Gln on XRCC1, with a sample size >10 000, was also not significantly associated with EC; further investigations of this variant may not be fruitful. Certain variants had relatively small sample sizes; as such, the evidence for nonsignificance (see Table S3) was considered unstable.

| DISCUSSION
This review provides a comprehensive investigation of the cumulative evidence on genetic polymorphisms and the risk of EC and its subtypes. We extracted relevant information from meta-analyses and GWAS to support a comprehensive assessment. Using FPRP tests and the Venice criteria, we rated the cumulative evidence as strong, moderate, or weak to reflect the credibility and strength of an association with cancer susceptibility. Five SNPs on five genes with strong evidence of association were identified: CYP1A1 rs1048943, EGF rs4444903, MMP2 rs243865, and PLCE1 rs2274223 for EC risk and HOTAIR rs920778 for ESCC risk. Ten variants were found to have moderate evidence of association with EC or ESCC risk, and 18 variants weak evidence. CYP1A1, located on chromosome 15q22, is an isozyme of cytochrome P450 and encodes aryl hydrocarbon hydroxylase (AHH), which may combine with DNA to form adducts via a series of biochemical reactions. The ultimate carcinogens converted from the DNA adducts were considered to be associated with the development of EC. 37,90 The SNP rs1048943 was rated as having strong evidence of association with a 1.49-fold increased risk of EC in the overall population, based on a sample size of over 6000. This SNP increases enzymatic activity and enzyme induction and thus may accelerate cancer development. 37,91 In our subgroup analysis, this SNP increased EC risk in Asians (sample size of 5431), whereas no association was found in Caucasians (sample size of 734). More studies of this variant in Caucasians or other ethnic groups should be performed.
EGF, located on chromosome 4q25-q27, 92,93 participates in the proliferation and differentiation of cells 94 and promotes gene transcription when EGF binds to its receptor. 95 Quite a few studies have found that the G allele promotes EGF protein expression, which, when EGF binds to its receptor, could interfere with DNA folding and further increase susceptibility to a range of human cancers. 96 Our review showed that the G allele of the EGF +61A>G (rs4444903) polymorphism was rated as having strong evidence of association with a 1.38-fold increased risk of EC, based on a sample size of 1713.
MMP-2 is a zinc-dependent endopeptidase that can regulate various cell behaviors, such as tumor initiation and growth, by modulating cell proliferation, apoptosis, and angiogenesis. 97,98 The SNP rs243865, located in the promoter region of MMP-2, disrupts an Sp1-type promoter site (CCACC box) and thereby affects MMP-2 expression or activity, which is considered to be associated with the development of cancer. 99 Our review showed that the SNP rs243865 was rated as having strong evidence of association with EC risk (OR = 0.67, 95% CI = 0.55-0.80) under the dominant model. All studies were performed in a single ethnic group (Asian), and we recommend expanding studies of this variant to other ethnic groups.
PLCE1, located on chromosome 10q23, participates in cell growth, differentiation, gene expression, and oncogenesis. 100,101 Our review showed that the SNP rs2274223, a common A to G transition in PLCE1 that may increase expression of the PLCE1 protein, 102,103 was rated as having strong evidence of association with increased risk of EC in a diverse population, based on a sample size of over 20 000. In our subgroup analysis, this SNP was significantly associated with EC risk in Asians, whereas no association was found in Caucasians. Although the mechanism behind the ethnic differences is still unclear, one possible reason is differences in genetic background and in environmental and lifestyle context.
HOTAIR, located on chromosome 12q13.13, might transform normal cells into malignant cells. 44 As Zhang et al 105 suggested, the TT genotype was nominally significantly related to ESCC susceptibility in the Chinese population when compared with the rs920778 CC genotype. 44 In our review, the SNP rs920778 was rated as having strong evidence of a true association with ESCC risk, based on a sample size of 4221 (OR = 2.525, 95% CI = 1.921-3.320); the T allele of HOTAIR may induce genome-wide retargeting of polycomb-repressive complex 2, trimethylation of histone H3 lysine-27 (H3K27me3), and deregulation of multiple downstream genes that participate in the development and progression of ESCC. 105 Again, all studies were performed in a single ethnic group (Asian), and we recommend expanding studies of this polymorphism to other ethnic groups.
Six variants showed moderate evidence of association in our review, all of which were downgraded from strong to moderate on the basis of a high FPRP (>0.2): IL-18 −607C>A, MMP-1 rs1799750, TNF-α rs1800629, microRNA124 rs531564, SLC52A3 rs13042395, and MDM2 rs2279744. The FPRP method considers the P value, the prior probability, and the statistical power of the test; because we calculated FPRP at a prior probability of 0.001 and used the statistical power to detect an odds ratio of 1.5 for alleles with an elevated risk, certain otherwise significant associations may have been excluded. Previous studies using different prior probabilities have classified their results as more noteworthy. Further investigations of these six variants may be necessary to analyze their associations in greater depth. Additionally, the SNP rs13042395 on C20orf54 also showed moderate evidence of association with ESCC susceptibility after being downgraded from strong to moderate. This SNP was both verified in GWAS and assessed in meta-analysis. The association found in GWAS, however, may be more statistically significant and convincing when the related meta-analysis results are inconsistent or when the evidence of association is not strong according to the combination of the Venice criteria and the FPRP method. The Venice criteria can assess multiple sources of potential bias, such as genotyping error or misclassification and ethnic stratification, which are difficult to assess in meta-analysis. Therefore, the results might be more convincing if the weights of the elements in the Venice criteria were reset.
Cumulative evidence for two variants (ADH1B rs1229984 and COX-2 rs20417) in relation to EC risk was upgraded from weak to moderate based on FPRP <0.05. Additional assessment of these two variants is necessary, particularly of COX-2 rs20417, since the sample size for this variant is relatively small (3779 in total). 65,106 Thirteen variants were found not to be significantly associated with EC risk, including nine variants on seven genes and two mRNAs, in a sample of approximately 4000 patients, with approximately 85% power to detect an OR of 1.15 under different models for a variant with a MAF of 20%. The MAFs of the aforementioned seven variants were almost all over 0.3, and the sample sizes were >4000. We conclude, therefore, that these seven variants are unlikely to be associated with EC (see Table S5). Further investigations of these seven variants will probably not yield meaningful results with regard to EC.
Certain limitations apply to this report. Although we performed a comprehensive literature search, some articles may have been missed. Sample sizes varied among studies, and the smaller sizes may have affected the credibility of the data. The evaluated data were extracted from only one source, which may be a main cause of bias. Finally, only the association between genetic variants and EC susceptibility/incidence was evaluated; other roles of genetic polymorphisms in EC, such as tumor progression, metastasis, and drug resistance, were not assessed owing to a lack of data or information. Despite the limitations of our method, we believe that our study, as an updated summary and evaluation of the existing literature on genetic predisposition to EC, is of value for further genetic studies.
This review evaluated the cumulative epidemiological evidence of significant associations by combining the Venice criteria and FPRP results, and identified five SNPs with strong evidence of a true association. Collectively, our review provides reference information for further investigation into genetic susceptibility to EC and ESCC.