Genome-wide association study (GWAS) of ovarian cancer in Japanese predicted regulatory variants in 22q13.1

Genome-wide association studies (GWAS) have identified greater than 30 variants associated with ovarian cancer, but most of these variants were investigated in European populations. Here, we integrated GWAS and subsequent functional analyses to identify the genetic variants with potential regulatory effects. We conducted GWAS for ovarian cancer using 681 Japanese cases and 17,492 controls and found that rs137672 on 22q13.1 exhibited a strong association with a P-value of 1.05 × 10−7 and an odds ratio of 0.573 with a 95% confidence interval of 0.466–0.703. In addition, three previously reported SNPs, i.e., rs10088218, rs9870207 and rs1400482, were validated in the Japanese population (P < 0.05) with the same risk allele as noted in previous studies. Functional studies including regulatory feature analysis and electrophoretic mobility shift assay (EMSA) revealed two regulatory SNPs in 22q13.1, rs2072872 and rs6509, that affect the binding affinity to some nuclear proteins in ovarian cancer cells. The plausible regulatory proteins whose motifs could be affected by the allele changes of these two SNPs were also proposed. Moreover, the protective G allele of rs6509 was associated with a decreased SYNGR1 expression level in normal ovarian tissues. Our findings elucidated the regulatory variants in 22q13.1 that are associated with ovarian cancer risk.


Introduction
Ovarian cancer (OC) is one of the most common cancers among women worldwide [1]. The high mortality rate in ovarian cancer is due to late diagnosis resulting from the nonspecific nature of symptoms and lack of effective screening tools [2]. In the Japanese population, ovarian cancer exhibits the highest mortality rate compared with other gynecologic malignant tumors, and its prevalence has been increasing since 1975, although the main cause remains unclear [3]. Genome-wide association studies (GWAS) have identified greater than 30 variants associated with OC susceptibility. Most of these studies were conducted in European populations [4][5][6][7][8][9][10], and only two studies included Asian populations [11,12]. Pathogenic variations in the BRCA1 and BRCA2 tumor suppressor genes responsible for most of hereditary breast and ovarian cancer syndromes [13] have been reported in numerous ethnic group including Japanese populations [14]. However, low-penetrance genetic variants still need to be elucidated, especially in Japanese populations.
To understand the functional consequences of cancer risk loci, post-GWAS analysis is performed, particularly with non-protein-coding variants. The goal is to uncover functional or causal SNPs that likely differ from associated SNPs obtained from GWAS. The systematic strategies for post-GWAS [15,16] include the following: (1) targeting SNPs in linkage disequilibrium (LD) with the associated SNP; (2) determining mRNA expression levels of nearby genes that may be affected by the expression quantitative trait loci (eQTL); (3) characterization of gene regulatory regions; (4) identification of potential epigenetic mechanisms using tissue-specific data. In addition, (5) electrophoretic mobility shift assays (EMSA) are used to confirm the potential interaction between the tested variant and transcription factors (TF) [17]. Here, we performed a first population-based case-control GWAS in ethnical Japanese, and then selected the loci with the strongest associations for post-GWAS analyses.

Patients and controls
All participants were ethnic Japanese women. The DNA samples of 681 ovarian cancer patients were stored in an automated DNA storage system; and 5μg of DNA samples (50 μl at a concentration of 100 ng/μl) were provided by Biobank Japan [18]. The 17,492 noncancer control female samples were obtained from four population-based cohorts: the JPHC (Japan Public Health Center)-based Prospective Study [19], the J-MICC (Japan Multi-Institutional Collaborative Cohort) study [20], ToMMo (Tohoku Medical Megabank Organization) and IMM (Iwate Tohoku Medical Megabank Organization) [21,22]. The characteristics of each cohort are presented in Table 1; only the age of subjects was included in this analysis. All participating studies obtained written informed consents from all participants by following the protocols approved by their institutional ethical committees before enrollment. The consent procedure was approved by the ethical committees at each institute. This study was approved by the first ethics committee of Institute of Medical Science, The University of Tokyo (approval number of 29-74). We cannot access to any patient-level identifying information as part of the study.

GWAS genotyping and imputation
DNA genotyping and imputation were conducted at RIKEN Center for Integrative Medical Sciences in the previous studies for cases [23] and controls [24]. Genomic DNA samples were extracted from peripheral blood leukocytes using a standard method. All case and control samples were genotyped using the Illumina OmniExpress Exome or the OmniExpress+Huma-nExome BeadChip (Illumina Inc.). The type, version, and number of SNPs of genotyping platform used in each cohort were described in Table 1. A list of SNPs in each platform was obtained from Illumina. We select 925,436 common SNPs those were genotyped by any platforms for all samples. Allele calling algorithm used to compute the genotyping data was GenomeStudio V2011.1. A quality control was applied to the raw genotyping data to filter unqualified SNPs following the criteria as previously described [24]. We excluded SNPs that met the following criteria: minor allele frequency (MAF) < 0.01; Hardy-Weinberg equilibrium (HWE) P-value < 1 × 10 −6 ; call rate < 0.99. We also exclude SNPs with a large allele frequency difference between the reference panel and the GWAS (> 0.16) as described previously [25]. After quality control, 498,990 SNPs were included for imputation analysis. Imputation of the ungenotyped SNPs was conducted with MaCH [26] and minimac [27] using the data from the JPT/CHS/CHB subjects and the 1000 genomes project phase 1 (release 16, March 2012) as a reference. Allele labels, as an effect or non-effect allele, and allele frequencies of imputed SNPs were obtained from minimac [27]. Post-imputation quality control was performed based on these following exclusion criteria: (1) MAF < 0.01; and (2) HWE P-value < 1 × 10 −6 . Finally, a total of 7,521,072 imputed SNPs were obtained for further analyses. The genotyping data is available at DNA DataBank of Japan (DDBJ) with an accession number: JGAD00000000123.

Statistics
The statistical analyses were done for the SNPs that were common to different genotyping platforms used (Table 1) and whose genotype information was available for all cases and controls after imputations/implementation of quality control measures. The association between SNP and risk for developing ovarian cancer risk was investigated using logistic regression based on the first principal component (PC1) and the second principal component (PC2) as covariates [28]. The genetic inflation factor lambda (λ) was derived from P-values obtained using the Cochran-Armitage trend test for all the tested SNPs [29,30]. The quantile-quantile plot was drawn using the R program. Odds ratios were calculated using the non-effect alleles as references. The effect size (beta) from the logistic model and the standard error for beta (SEbeta) were calculated using R program. The 95% confidence interval was calculated based on the

SNP selection
We selected 201 candidate SNPs within 24 regions exhibiting a high association with ovarian cancer based on the following inclusion criteria: GWAS P-value < 1 × 10 −5 and imputation quality score (Rsq) > 0.3 (S1 Table). Pairwise linkage disequilibrium (r 2 ) between each SNP and lead SNP (the SNP with the lowest P-value in each region) in Japanese was obtained from Ensembl [31]. SNPs previously reported to be associated with ovarian cancer in published GWASs were obtained from GWAS catalog (https://www.ebi.ac.uk/gwas/). The data of reported SNPs, including risk allele and odd ratio, were retrieved from the original publications and further compared to the data of this Japanese dataset.

Analysis of regulatory features
Thirty candidate SNPs at 22q13.1 were analyzed based on their location, epigenetic markers (i.e., H3K4Me1, H3K4Me3, and H3K27Ac) in ENCODE [32], Ensembl regulatory build indicating gene regulation, and TF binding data in Ensembl, ReMap [33], and JASPAR [34]. All data were visualized in the UCSC genome browser using track data hubs [35]. Regarding track settings in ReMap 2018, transcription regulators with peaks greater than 1.5 kb in size were retrieved from all public and ENCODE ChIP-seq data [33]. For JASPAR, we chose predicted binding sites with matching scores greater than 400 (P-value � 10 −4 ). The ovary-specific transcriptional regulations, including epigenome activity representing open chromatin and TF binding retrieved from ChIP-Seq data, were obtained from Ensembl. Regional plots were generated using LocusZoom (http://csg.sph.umich.edu/locuszoom). The SNPs located in regulatory regions were further analyzed. The transcription factors reported in three databases and epigenome activity in ovaries were investigated. eQTL data of each SNP with nearby genes in normal ovarian tissues were obtained from Ensembl. The TF binding motifs containing SNP sequences were downloaded from HOCOMOCO [36], abstracting from ChIP-Seq datasets with quality A ratings.

Electrophoretic mobility shift assay
SKOV3 cells were purchased from the American Type Culture Collection (ATCC). Cell culture was maintained using the depositor's recommendations. Nuclear proteins from SKOV3 cells were extracted using NE-PER nuclear and cytoplasmic extraction reagents (Thermo Fisher Scientific) according to the manufacturer's protocol. Protein concentrations were measured using a BCA protein assay (Thermo Fisher Scientific). EMSA was performed using DIG Gel Shift Kit, 2 nd Generation (Roche) following the manufacturer's instruction with the additional step of re-annealing to eliminate the non-specific bands [37]. EMSA was performed two times separately, including screening for nine SNPs in 22q13.1 (S2 Fig) and confirming positive SNPs. The sequences of oligonucleotide probes are listed in S2 Table. In brief, 60 fmol of labeled probes containing SNP positions were hybridized with 5 mg of nuclear protein extract for 15 minutes at 20˚C. The mixtures were then loaded into a 6% TBE gel, separated by electrophoresis at 4˚C and transferred onto a nylon membrane. The membrane was then hybridized with anti-digoxigenin-AP antibody and developed by CSPD solution. The intensity of the shifted band was quantified using ImageJ software [38].

GWAS of ovarian cancer in a Japanese population
The DNA samples of 681 ovarian cancer patients and 17,492 cancer-free control females were genotyped by Illumina OmniExpress Exome or the OmniExpress+HumanExome Bead-Chip. The characteristics of each cohort are presented in Table 1. We conducted a standard quality control and genome-wide imputation analysis. The SNPs were excluded based on the following criteria: minor allele frequency (MAF) < 0.01; Hardy-Weinberg equilibrium Pvalue < 1 × 10 −6 ; call rate < 0.99; GWAS allele frequency difference from the reference panel > 0. 16. Consequently, we obtained the genotyping results of 7,521,072 imputed SNPs on autosomal chromosomes and analyzed their associations with OC risk (Fig 1A). The genomic inflation factor lambda (λ) was 1.035 ( Fig 1B). We selected 201 candidate SNPs in 24 genomic regions demonstrating a suggestive association (P-value < 1 × 10 −5 ). The most significant SNPs in each region, called lead SNPs, are presented in Table 2. Regional plots of 24 candidate loci are presented in S1 Fig. Among all candidates, rs137672 on 22q13.1 that is located in the upstream region of the SYNGR1 gene (Synaptogyrin 1) exhibited the strongest association (P = 1.05 × 10 −7 ; odds ratio of 0.573 with 95% confidence interval of 0.466-0.703). Detailed information of all 201 candidate SNPs are presented in S1 Table.

Associations of reported variants in a Japanese population
The imputed SNPs in this study were investigated whether they had been previously reported in published GWASs. First, we included reported SNPs that exhibited associations with OC susceptibility in any population, but not including the SNPs associated with specific OC subtypes or OC survival. Next, the reported SNPs were searched in this GWAS and found that 34 SNPs, reported in nine studies [4][5][6][7][8][9][10][11][12], passed the quality control and could be evaluated in the Japanese dataset (Table 3). Among nine studies, two included Asian populations [11,12]. Chen

Analysis of regulatory features
From GWAS results, we selected the most strongly associated loci (22q13.1) in a Japanese population, including 30 candidate SNPs that passed the criteria (P < 1 × 10 −5 and Rsq > 0.3) for post-GWAS analyses. The enlarged view of the regional plot for these 30 candidate SNPs is presented in Fig 2A. We analyzed the following regulatory features (Fig 2B): 1) epigenetic markers indicating an active promotor or enhancer region, i.e., H3K4Me1, H3K4Me3, and H3K27Ac; 2) regulatory build indicating regions that are likely to be involved in gene regulation; 3) transcriptional regulation data. The results demonstrated that nine SNPs were located in regions with positive epigenetic markers and transcription factor binding sites based on ReMap and ENCODE ( Fig 2B); however, these regions did not include the lead SNP rs137672 or seven SNPs with absolute linkage disequilibrium with the lead SNP (r 2 = 1) (S1 Table). Among nine SNPs in regulatory regions in 22q13.1, only rs6509 was located in the proteincoding region on exon 2 of the RPL3 gene. However, this variant was a synonymous SNP that did not affect the protein sequence. Moreover, five SNPs were located in predicted promotor regions active in ovary cells reported by Ensembl, i.e., rs738331, rs6509, rs470082, rs5757613,  and rs137627 (Table 4). The eQTL data demonstrated significant associations (P < 0.05) with SYNGR1 level for six SNPs, i.e., rs137620, rs137621, rs94852, rs6509, rs470082, and rs137627. Only rs137627 was also associated with PDGFB level (Table 4). Noteworthy, the eQTL effect size of all nine SNPs revealed that the trend of association with RPL3 was in the opposite direction to that with PDGFB and SYNGR1.

Allele specific binding of nuclear proteins to rs2072872 and rs6509
To investigate whether the genetic variations in 22q13.1 affect the binding affinity of some transcription factors, we performed the electrophoretic mobility shift assays using the nine candidate SNPs. We examined the binding of nuclear proteins extracted from SKOV3 human ovarian cancer cells and a labeled oligonucleotide corresponding to each allele of the candidate SNPs. Among the three SNPs that exhibited allele-specific binding in the screening (S2 Fig), rs2072872 and rs6509 showed consistent results in the confirmation step (Fig 3). The oligonucleotides corresponding to G alleles of these two SNPs exhibited stronger binding affinity to nuclear proteins compared with A alleles (S2 Fig and Fig 3). Several transcription factors, including KLF6 and TP53, are predicted to bind DNA fragments containing these SNPs with different affinities as shown in

Discussion
This is the first GWAS for ovarian cancer using Japanese case-control samples. Furthermore, the functional analyses were carried out following a GWAS to distinguish functional from non-functional risk SNPs. Novel 201 SNPs in 24 loci exhibited an association with ovarian cancer susceptibility with P-value less than 1 × 10 −5 (S1 Table). Among all candidates, rs137672 in the upstream region of SYNGR1 gene at 22q13.1 was the most associated variant (P = 1.05 × 10 −7 ). Given the relatively small number of patients, if compared to previously published GWASs [4,5,11], no SNPs with a significant GWAS P-value (< 5 × 10 −8 ) was observed in this study. Indeed, the incidence of ovarian cancer in Japanese population was lower (agestandardized rate = 8.4 per 100,000 persons/year in 2012) than that reported in European population that exhibited the highest incidence in central and eastern Europe (age-standardized rate = 11.4 per 100,000 persons/year in 2012) [1]. However, the prevalence of pathogenic variants in BRCA1/2 seems comparable across diverse ethnicities, including European and Asian women [39], suggesting that other risk or protective factors still need to be identified. In the present study, we verified the significant associations (P < 0.05) of three previously reported  Table. The shifted band indicated the interaction between nuclear protein extracted from SKOV3 cells and probes containing the SNP allele as indicated, i.e., A or G allele. The star indicates specific binding to the G allele of each SNP. Arrow indicates non-specific interaction found in every sample. The intensity of a shifted band was quantified based on the fold-change of the G allele with respect to the A allele using ImageJ software. https://doi.org/10.1371/journal.pone.0209096.g003 SNPs in a Japanese population; all SNPs exhibited similar associations with other ethnicities (Table 3). Among these variants, the association of rs10088218 was previously reported in an Asian population (Han Chinese) [12], whereas rs9870207 and rs1400482 were investigated in Asian for the first time in this study.
The regulatory feature analysis of 30 SNPs (P < 1 × 10 −5 ) with the strongest associations (22q13.1) unveiled nine candidate functional SNPs that exhibited interactions with some transcription factors based on ChIP-Seq databases and positive histone marks associated with active promotor or enhancer based on ENCODE (Fig 2). We subsequently analyzed nine candidate causal SNPs using EMSA and identified two regulatory SNPs, rs2072872 and rs6509, that affected the binding affinity to nuclear proteins from ovarian cancer cells (Fig 3). In GWAS, the association results of rs2072872 (G/A) were OR (95% CI) = 0.612 (0.501-0.750) with P = 2.08 × 10 −6 , and the results for rs6509 (C/T or G/A) were OR (95% CI) = 0.613 (0.501-0.750) with P = 2.16 × 10 −6 (S1 Table). The distances between the SNP with the strongest association with OC risk (rs137672) and the regulatory SNPs, rs2072872 and rs6509, were 23.97 kilobase and 22.60 kilobase, respectively. The pairwise r 2 of these two SNPs to the SNP rs137672 was 0.85 in Japanese (S1 Table), suggesting that the strong association of rs137672 may be influenced by the two regulatory SNP. The eQTL results demonstrated that the T (A) allele of rs6509 was associated with increased SYNGR1 levels (effect size = 0.23, P = 0.005) ( Table 4). Given that these two SNP are in complete LD (r 2 = 1, both SNP's G alleles are correlated), the post-GWAS results can possibly predict the regulation of transcription factor(s) that synergistically regulate(s) the decreased expression of SYNGR1 through binding to G allele of rs6509 and rs2072872 simultaneously. However, in vivo experiments are essential to verify that either or both of rs6509 and rs2072872 have the regulatory functions. GWAS identified G alleles of both SNPs as being associated with reduced OC risk. Our finding suggested that SYNGR1 and higher level of SYNGRI expression may plausibly increase ovarian cancer risk.
The RPL3 gene encodes a ribosomal protein L3 that plays an essential role in the initial step of protein translation [40,41]. Moreover, RPL3 is involved in modulation of cell cycle and apoptosis pathways [42] and serves as a target of Omacetaxine, an anticancer drug used for chronic myeloid leukemia [43]. RPL3 mRNA expression is extraordinarily high in ovarian tissue compared with other organs [44], highlighting some important functions that should be investigated. Although SNPs at 22q13.1 were not associated with RPL3 expression based on eQTL data, further studies should focus on functional roles of these SNPs and RPL3 in ovarian cancer risk. In addition, the role of SYNGR1 in ovaries should be clarified.
In conclusion, we utilized GWAS and post-GWAS analyses to identify regulatory genetic variants that were predicted to function as transcriptional regulators, without causing amino acid changes. Although further replication studies are essential, our results elucidated the important role of genetic variations in the development of OC among the Japanese population.