A Genome-Wide Association Study Identified Novel Genetic Susceptibility Loci for Oral Cancer in Taiwan

Taiwan has the highest incidence rate of oral cancer in the world. Although oral cancer is mostly an environmentally induced cancer, genetic factors also play an important role in its etiology. Genome-wide association studies (GWAS) have identified nine susceptibility regions for oral cancers in populations of European descent. In this study, we performed the first GWAS of oral cancer in Taiwan with 1529 cases and 44,572 controls. We confirmed two previously reported loci on 6p21.33 (HLA-B) and 6p21.32 (HLA-DQ gene cluster), highlighting the importance of the human leukocyte antigen and, hence, the immunologic mechanisms in oral carcinogenesis. The TERT-CLPTM1L locus on 5p15.33, the 4q23 ADH1B locus, and the LAMC3 locus on 9q34.12 were also consistent in the Taiwanese. We found two new independent loci on 6p21.32, rs401775 in the SKIV2L gene and rs9267798 in the TNXB gene. We also found two suggestive novel Taiwanese-specific loci near the TRPS1 gene on 8q23.3 and in the TMED3 gene on 15q25.1. This study identified both common and unique oral cancer susceptibility loci in the Taiwanese as compared to populations of European descent and shed significant light on the etiology of oral cancer in Taiwan.


Introduction
Oral cancer is the eighth most common cancer in men worldwide, with an estimated 264,211 new cases in men in 2020 [1]. The age-standardized incidence rate was 2.6-fold higher in men (6.0 per 100,000) than in women (2.3 per 100,000) worldwide, although the rate varies widely across regions, with a 20-fold difference in men and a 10-fold difference in women [1]. Taiwan has the highest incidence rate of oral cancer in men (27.01 per 100,000) in the world, and the male predominance is even more pronounced, with nearly 90% of oral cancers occurring in men. Among Taiwanese men, oral cancer has the third highest incidence rate and is the fourth leading cause of cancer death among all cancers [2]. The high incidence rate and striking male predominance of oral cancer in Taiwan are attributed to the high prevalence of major risk factors, including smoking, alcohol drinking, and particularly, betel quid chewing [2][3][4][5]. Betel quid chewing confers an approximately 8-fold increased risk of oral cancer, much higher than the risks conferred by smoking (3.6-fold) and drinking (2.2-fold) [6]. More significantly, there is a strong synergistic effect of smoking, drinking, and chewing in promoting oral carcinogenesis, resulting in an over 40-fold increased oral cancer risk for people who smoke, drink, and chew [6][7][8].
Although oral cancer is mostly an environmentally induced cancer, genetic factors and gene-environment interactions also play an important role in its etiology. Epidemiological studies have observed an elevated risk of oral cancer in individuals with a family history of head and neck cancer [9][10][11]. Numerous candidate gene studies have been performed to assess the associations of selected single nucleotide polymorphisms (SNPs) in genes involved in essential biological pathways such as carcinogen metabolism, DNA repair, cell-cycle control, and inflammatory response with the risks of head and neck cancer overall and/or oral cancer specifically [12][13][14][15][16][17][18][19][20][21][22]. The only consistent associations from numerous candidate gene studies are those related to alcohol dehydrogenase (ADH) genes: SNPs in several ADH genes were associated with the risks of upper aerodigestive cancers including oral cancer and the effects became more apparent with increasing alcohol consumption, indicating a gene-environment interaction [20,23]. The genome-wide association study (GWAS) has revolutionized genetic association research and identified hundreds of thousands of novel susceptibility loci for hundreds of genetic traits, including approximately 800 susceptibility loci for over 20 different cancers [24][25][26]. There were only three published GWAS of head and neck cancer or upper aerodigestive cancer that included the subtype of oral cancer, all in European populations [23,27,28]. These GWAS identified at least nine genetic susceptibility regions for oral cancers, including 2p23.3 (GPN1), 4q21 (HEL308 and FAM175A), 4q23 (ADH1B, ADH1C, ADH7), 5p15.33 (CLPTM1L), 6p21 (HLA), 6p22.1 (ZNRD1-AS1), 9p21.3 (CDKN2A-CDKN2B), 9q34.12 (LAMC3), and 12q24 (ALDH2). These loci only explain a small portion of the genetic heritability of oral cancer. More susceptibility loci remain to be discovered. 
Furthermore, these loci were found in populations of European descent, which have a much lower incidence rate than Taiwan and were not exposed to betel quid. No GWAS of oral cancer has been conducted in Taiwan. Given the common (smoking and alcohol drinking) and distinct (betel quid chewing) environmental exposures in Taiwan, and the diversity of genetic structure across different ethnicities, we hypothesize that there are common and unique genetic susceptibility loci of oral cancer in the Taiwanese as compared with the European population. We therefore performed the first GWAS of oral cancer in Taiwan.

Demographics of Study Population
After strict quality control procedures, data from 1529 cases and 44,572 controls were included in the final analysis. Table 1 shows the selected characteristics of the cases and controls. The mean ages (standard deviation) of the cases and controls were 55 (11.5) and 54 (14.5) years, respectively. About 86.9% (N = 1328) of the patients and 77.7% (N = 34,616) of the controls were men.

Novel Variants Associated with Oral Cancer in the Taiwanese
A quantile-quantile plot of the observed versus expected χ² test statistics did not show a large deviation from what was expected by chance (inflation factor λ = 1.023; Figure 1A). Fifteen variants on chromosome 6p21 were associated with oral cancer with genome-wide significance (p < 5 × 10⁻⁸) (Figure 1B, Table 2), twelve of which were in high linkage disequilibrium and located in or near the HLA-B gene on 6p21.33 (Figure 2A). The other three, including one small insertion and two SNPs, were in the CFB, SKIV2L, and TNXB genes, respectively, on 6p21.32 (Figure 2B). The CFB and SKIV2L SNPs were in moderate linkage disequilibrium, and the TNXB SNP was not linked to the other two. Besides these three independent loci on 6p21, several other promising loci were close to genome-wide significance, including another independent locus on 6p21.32 (HLA-DQ gene cluster, Figure 2C), the TERT-CLPTM1L locus on 5p15.33 (Figure 3A), a locus near the TRPS1 gene on 8q23.3 (Figure 3B), and the TMED3 gene on 15q25.1 (Figure 3C).
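The inflation factor λ quoted above is conventionally the median of the observed 1-df χ² statistics divided by the median expected under the null (about 0.455). The following is a minimal sketch of that calculation, assuming the input is a list of two-sided p-values; the function names and the evenly spaced example p-values are hypothetical and are not the study's data.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def p_to_chi2(p: float) -> float:
    """Convert a two-sided p-value to the matching 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def genomic_inflation(p_values) -> float:
    """Lambda = median observed chi-square / median expected under the null."""
    chi2 = [p_to_chi2(p) for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical p-values, evenly spaced as they would be under the null.
n = 9999
null_p = [(i + 0.5) / n for i in range(n)]
lam = genomic_inflation(null_p)
print(round(lam, 3))
```

A value near 1, such as the 1.023 reported here, suggests little inflation from population stratification or other systematic bias.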

Common Variants Associated with Oral Cancer Validated among Various Populations
Previous GWAS in populations of European descent identified nine susceptibility loci for oral cancer. We queried these loci in our population. The consistent loci included rs1229984 in the ADH1B gene on 4q23, rs928674 in the LAMC3 gene on 9q34.12, and several loci on 6p21.32 and 6p21.33 (Table 3). For those loci that were close to genome-wide significance in previous GWAS, the TERT-CLPTM1L locus on 5p15.33 was highly consistent and the lead SNP was rs7726159 in the TERT gene (OR = 1.19, p = 2.18 × 10⁻⁷ in previous GWAS and OR = 1.16, p = 6.40 × 10⁻⁵ in our population) (Table 4).
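Per-allele odds ratios such as those above come from an additive logistic model, in which each genotype is coded as the number of copies (0, 1, or 2) of the effect allele. The sketch below fits that model by Newton-Raphson in plain Python to show the mechanics. It is illustrative only: the study used PLINK and adjusted for age, sex, and principal components, whereas this sketch has no covariates, and the genotype counts and function names are hypothetical.

```python
import math


def expand(counts):
    """Turn {dosage: (n_cases, n_controls)} into per-person lists."""
    dosages, is_case = [], []
    for dosage, (n_cases, n_controls) in counts.items():
        dosages += [dosage] * (n_cases + n_controls)
        is_case += [1] * n_cases + [0] * n_controls
    return dosages, is_case


def fit_additive_logistic(dosages, is_case, n_iter=25):
    """Fit logit P(case) = b0 + b1 * dosage; return (b0, b1, se_b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Gradient (g) and Fisher information (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(dosages, is_case):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * gradient (2x2 case).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of b1 from the inverse information at convergence.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


# Hypothetical genotype counts: {dosage: (cases, controls)}.
toy = {0: (40, 160), 1: (60, 140), 2: (30, 40)}
dosages, is_case = expand(toy)
b0, b1, se = fit_additive_logistic(dosages, is_case)
odds_ratio = math.exp(b1)
ci = (math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se))
print(f"per-allele OR = {odds_ratio:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")
```

Adjusting for covariates would add one column per covariate to the model and replace the hand-written 2 × 2 inverse with a general linear solve.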

Discussion
In this study, we performed the first GWAS of oral cancer in Taiwan. We confirmed two previously reported 6p21.33 and 6p21.32 HLA loci, and found two additional independent 6p21.33 loci for oral cancer in the Taiwanese. We also confirmed the previously reported oral cancer susceptibility loci at the TERT-CLPTM1L locus on 5p15.33, the 4q23 ADH1B locus, and the LAMC3 locus on 9q34.12 in Taiwanese populations. We found suggestive novel susceptibility loci near the TRPS1 gene on 8q23.3 and in the TMED3 gene on 15q25.1.
The most notable finding of this study is the multiple independent oral cancer susceptibility loci spanning the 6p21.3 regions, which contain the human leukocyte antigen (HLA) gene clusters. The HLA system plays essential roles in innate and adaptive immune responses [29]. The HLA system consists of three regions: the class I region encodes HLA-A, -B, and -C; the class II region encodes HLA-DR, -DQ, and -DP; and the class III region genes code for proteins of the complement system and the TNF family members. The functions of class I and class II molecules are to bind intracellular and extracellular peptide antigens and present them to antigen-specific T lymphocytes. Peptide antigens associated with HLA class I molecules are recognized by CD8+ T cells and those associated with HLA class II molecules are recognized by CD4+ T cells [30]. The 6p21.3 region is the most gene-dense region in the human genome and the HLA genes are the most polymorphic in the human genome [31]. Previous GWAS have reported genetic variants in various HLA genes as susceptibility loci for many human diseases, including several cancers such as lung cancer [32], liver cancer [33], cervical cancer [34], colorectal cancer [35], leukemia [36], lymphoma [37,38], and different subtypes of head and neck cancers [23,27,28,39]. Specifically, for oral cancer, previous GWAS identified three loci in the HLA region, including rs3828805 (chromosome 6 position 32636120) [27] and rs3135001 (6:32670136) in HLA-DQB1 (6p21.32) [28] and rs1265081 (6:31111675) in CCHCR1 (6p21.33) [28]. Recently, Ji et al. found that two independent SNPs in the 6p21.33 regions were associated with altered oral cancer risks in a Chinese population: rs2524182 (6:31130593) in TRIM39-RPP21-HLA-E and rs3131018 (6:31143582) in PSORS1C3-TCF19 [40].
In our current study, we found many SNPs in 6p21 that reached genome-wide or borderline genome-wide significance in their associations with oral cancer in Taiwan, covering multiple HLA genes (Table 2, Figure 2), further supporting the important roles of the HLA system in oral cancer etiology.
For other GWAS-identified oral cancer susceptibility loci in European populations, we were able to replicate the loci on 5p15.33, the 4q23 ADH1B locus, and the LAMC3 locus on 9q34.12 (Table 3). The 5p15.33 region encompassing the TERT-CLPTM1L genes has been associated with the risks of at least 11 different cancers, including lung, prostate, breast, pancreatic, bladder, esophageal, endometrial, gastric, and head and neck cancers, glioma, and melanoma [28,32][41][42][43][44][45][46]. Multiple mechanisms, including telomere structure, epigenetic modification, transcriptional regulation, and apoptosis, have been suggested to explain the associations of the TERT-CLPTM1L locus and cancer susceptibility [47,48]. Chromosome 4q23 contains a cluster of alcohol dehydrogenase (ADH) genes and several SNPs in different ADH genes have been identified as susceptibility loci for oral cancer in European populations (Table 3) and other populations [49]. The best-studied SNP is rs1229984, a missense SNP at codon 47 (Arg47His) of the ADH1B gene. The A allele (coding for His) is about 40 times more active in metabolizing alcohol and is associated with a reduced risk of oral cancer. The frequency of the A (His) allele is only ~5% in European populations but reaches ~80% in East Asians. We used the predominant A (His) allele as the reference group, and those with the less active G (Arg) allele had a 20% increased risk of oral cancer (OR = 1.20, 95% CI, 1.06-1.35, p = 0.003), consistent with literature reports in Europeans and Asians [49].
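As a consistency check on the ADH1B estimate above, the reported p-value can be approximately recovered from the odds ratio and its confidence interval under the usual Wald assumption, namely that the log odds ratio is normally distributed and the interval is symmetric on the log scale. The sketch below does this; the function name is hypothetical, and the small discrepancy from the published p-value reflects rounding of the reported OR and CI.

```python
import math
from statistics import NormalDist


def wald_from_ci(odds_ratio, ci_low, ci_high, level=0.95):
    """Recover log-OR, standard error, z, and two-sided p from an OR and its CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    beta = math.log(odds_ratio)
    # On the log scale the CI spans beta +/- z_crit * se.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return beta, se, z, p


# Values reported in the text for the ADH1B G (Arg) allele.
beta, se, z, p = wald_from_ci(1.20, 1.06, 1.35)
print(f"log-OR = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

The recovered p-value rounds to the reported 0.003, so the three published numbers agree with one another.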
We also found a potential novel susceptibility locus at 8q23.3 near the TRPS1 gene. TRPS1 is an atypical member of the GATA transcriptional factor family, exhibiting transcriptional repression by interacting with corepressors [50][51][52]. Recent studies have suggested that TRPS1 is overexpressed in several cancers and can act as an oncogenic driver through various mechanisms [53], such as driving heterochromatic origin refiring and genome amplifications [54], controlling cell-cycle progression [55], promoting epithelial-to-mesenchymal transition [56], promoting angiogenesis [57], and causing epigenetic alterations (DNA methylation and histone acetylation) [50,58,59]. There has been no report of TRPS1 in oral cancer. Future studies are warranted to investigate the role of TRPS1 in oral carcinogenesis. Another novel oral cancer susceptibility locus is in the TMED3 gene on 15q25.1. TMED3 is a transmembrane protein that plays an important role in vesicular transport and innate immunity [60]. Several recent studies have shown increased expression of TMED3 in a number of cancers and have reported that TMED3 promotes the carcinogenesis of liver, breast, colorectal, lung, and endometrial cancer as well as glioma and osteosarcoma [61][62][63][64][65][66][67]. Biologically, TMED3 can activate IL-11/STAT3 and Wnt/beta-catenin signaling pathways [61,62]. The role of TMED3 in oral carcinogenesis remains to be investigated.
Whether these oral cancer susceptibility loci in the Taiwanese are consistent in other Asian populations is largely unknown. A recent study in China [40] used the Human Exome BeadChip (~240K mostly nonsynonymous coding variants), but the final analyzable SNPs numbered only ~63K because most of the SNPs were monomorphic. They found two independent oral cancer susceptibility loci in the 6p21.33 regions: rs2524182 in TRIM39-RPP21-HLA-E and rs3131018 in PSORS1C3-TCF19 [40], consistent with our data. Other regions remain to be investigated. The data in India were more limited. Only one early pilot GWAS using the human CNV370k BeadChip in only 55 cases and 92 controls was published [68]. Due to its small sample size, none of the reported European and Taiwanese oral cancer susceptibility loci were among the top hits in that study. The genetic susceptibility loci to oral cancer in India warrant further investigation.
The findings from our study not only provide significant insight into the biology of oral cancer etiology in Taiwan, but also have an important clinical and public health impact. The identified susceptibility genes may become potential preventive and/or therapeutic targets. For example, strategies that improve immune response, telomere maintenance, and alcohol metabolism may prevent oral cancer development given the importance of relevant genes in these pathways in oral cancer susceptibility. Another potential translational application is to use multiple genetic susceptibility loci to develop a polygenic risk score (PRS) for each person. This PRS can be integrated with environmental exposures such as smoking, drinking, and betel quid chewing to identify individuals at the highest risk of developing oral cancer, who could then be offered targeted cancer prevention and screening.
This is the first GWAS of oral cancer in Taiwan. We had the largest numbers of oral cancer cases and controls of any association study of oral cancer to date. We found both common and unique oral cancer susceptibility loci in the Taiwanese as compared to European populations. Our large sample size enabled the unequivocal confirmation of HLA regions as the most prominent oral cancer susceptibility loci in Taiwan. We also confirmed the susceptibility loci at 5p15.33, the 4q23 ADH1B locus, and the LAMC3 locus on 9q34.12. We found novel suggestive susceptibility loci near the TRPS1 gene on 8q23.3 and in the TMED3 gene on 15q25.1.
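The polygenic risk score mentioned above is commonly computed as a weighted sum of risk-allele counts, with each weight set to the log odds ratio of its SNP. The sketch below illustrates the idea using the two per-allele odds ratios quoted in this paper. It is not a validated score: a usable PRS would draw on many more loci, the function name and example genotype are hypothetical, and the risk allele for rs7726159 is assumed to be the one the reported OR refers to.

```python
import math

# Weights are log odds ratios per copy of the risk allele.
# ORs are those quoted in the text; everything else is illustrative.
WEIGHTS = {
    "rs1229984": math.log(1.20),  # ADH1B, G (Arg) allele
    "rs7726159": math.log(1.16),  # TERT, risk allele as reported in Table 4
}


def polygenic_risk_score(risk_allele_counts, weights=WEIGHTS):
    """Sum of (risk-allele count x log OR) over the SNPs in the score."""
    return sum(
        weight * risk_allele_counts.get(snp, 0)
        for snp, weight in weights.items()
    )


# Hypothetical person: two copies of the ADH1B risk allele, one at TERT.
person = {"rs1229984": 2, "rs7726159": 1}
score = polygenic_risk_score(person)
relative_odds = math.exp(score)  # versus a person carrying no risk alleles
print(f"PRS = {score:.3f}, relative odds = {relative_odds:.2f}")
```

Exponentiating the score gives the odds relative to a carrier of no risk alleles, under the assumption that the loci act independently and multiplicatively on the odds scale.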

Study Population and Data Collection
The study participants were part of the China Medical University Hospital (CMUH) Precision Medicine Project, a systematic effort initiated in 2018 to recruit subjects and collect biospecimens from all patients who come to CMUH for medical visits [69,70]. More than 170,000 subjects have been enrolled to date. The recruitment and sample collection procedures were approved by the ethical committees of CMUH (CMUH107-REC3-058 and CMUH110-REC3-005). Each participant signed an informed consent form and provided blood samples. Clinical information was abstracted from the electronic medical records (EMRs) of the CMUH. A total of 1529 oral cancer patients (ICD-10-CM Diagnosis Code C00 to C06) and 44,572 controls (without a history of any cancer) were included in the study.

Genotyping and Imputation
Whole-genome SNP genotyping using the Affymetrix genome-wide human SNP array 6.0 chip was performed according to the manufacturer's protocol [68]. We excluded samples and SNPs with genotyping call rates of <90%. We filtered out SNPs with a Hardy-Weinberg equilibrium p-value of <10⁻⁶ or a minor allele frequency (MAF) of <10⁻⁴. We excluded SNPs on sex chromosomes. Genotype imputation was performed as we recently described [69]. Briefly, we first constructed a population-specific reference panel by using whole genome sequencing (WGS) data (1463 individuals) from the Taiwan Biobank (TWB). We used four algorithms (IMPUTE2, IMPUTE4, IMPUTE5, and Beagle5.2) and two reference panels (TWB and East Asian participants of the 1000 Genomes Project) to perform genotype imputation and found that Beagle5.2 exhibited the fastest calculation speed, smallest storage space, highest specificity, and highest number of high-quality variants (15,277,414). The Beagle5.2 imputations were performed using its default parameters, except for the effective population size (20,000) and the buffer region (500,000 bases). The accuracy of the imputation result was measured using BCFtools gtcheck [71] to assess the concordance rate between the imputed genotypes and the WGS data. The imputation accuracy was 98.75% by Beagle5.2.
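The three per-SNP quality control filters described above (call rate, Hardy-Weinberg equilibrium, and MAF) can be sketched as a single function over genotype counts. This is a simplified illustration rather than the pipeline used in the study: it substitutes a 1-df chi-square goodness-of-fit test for whatever Hardy-Weinberg test the authors' software applied, and the function names and example counts are hypothetical.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_call_rate=0.90, hwe_p_min=1e-6, maf_min=1e-4):
    """Apply the call-rate, MAF, and HWE thresholds stated in the text."""
    called = n_aa + n_ab + n_bb
    total = called + n_missing
    if called == 0 or called / total < min_call_rate:
        return False
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    if maf < maf_min:
        return False
    return hwe_chi2_p(n_aa, n_ab, n_bb) >= hwe_p_min


# Hypothetical SNPs: (AA, AB, BB, missing) genotype counts.
print(passes_qc(360, 480, 160, 0))    # genotype counts match HWE expectations
print(passes_qc(500, 0, 500, 0))      # no heterozygotes at all
print(passes_qc(360, 480, 160, 200))  # call rate below 90%
```

Checking MAF before the equilibrium test keeps monomorphic SNPs from reaching the chi-square calculation, where their expected counts would be zero.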

Statistical Analysis
For the participants' characteristics, continuous data were presented as the means with standard deviation, and categorical data were presented as proportions. We used t-tests to compare the mean values of continuous variables and chi-squared tests to compare the frequencies of categorical variables between the cases and controls. The association of each SNP with the risk of oral cancer was analyzed using an additive model in the logistic regression analysis with PLINK v1.90 [72]. To control for population structure, we performed principal component analysis (PCA) in EIGENSTRAT and adjusted for the principal components (PCs) significantly associated with cancer status in the unconditional logistic regression analysis, together with demographic variables including age and gender, when estimating the odds ratio (OR) and 95% confidence interval (CI). The genome-wide significance level was set at 5 × 10⁻⁸.

Funding: The authors are grateful to every subject who donated their samples to this study. We appreciate the support from China Medical University Hospital (DMR-112-130). The funders had no role in the study design, data collection, statistical analysis, decision to publish, or preparation of the manuscript.

Institutional Review Board Statement:
The recruitment and sample collection procedures were conducted according to the guidelines of the Declaration of Helsinki, and approved by the ethical committees of CMUH (CMUH107-REC3-058 and CMUH110-REC3-005).