Variant Alleles of the ESR1, PPARG, HMGA2, and MTHFR Genes Are Associated With Polycystic Ovary Syndrome Risk in a Chinese Population: A Case-Control Study

Polycystic ovary syndrome (PCOS) is the most common endocrinopathy in women of reproductive age, with a prevalence of 6–8%. Although the etiology of PCOS has been investigated extensively, the association between genetic predisposition and PCOS risk is largely unknown. In this study, we genotyped 63 SNPs in 10 genes among 361 PCOS patients and 331 healthy controls in a Chinese Han population. The following variant alleles were significantly associated with decreased PCOS risk: ESR1 rs9340799 (P < 0.001), PPARG rs709154 (P = 0.013) and rs1151996 (P = 0.013), HMGA2 rs2272046 (P < 0.001), and MTHFR rs1801133 (P < 0.001). Accordingly, the following genotypes were associated with reduced PCOS risk: the GA genotype at rs9340799 in ESR1 (P < 0.0001), the TA genotype at rs709154 in PPARG (P < 0.0001), and the CA genotype at rs2272046 in HMGA2 (P < 0.0001). Moreover, the GA genotype at rs1999805 in ESR1 (P = 0.013) and the TT genotype at rs1801133 in MTHFR (P < 0.0001) correlated with elevated PCOS risk. Furthermore, haplotype analysis revealed significant differences in the haplotype distributions of the CYP11A1, ESR2, and PPARG genes between cases and controls. In addition to confirming that ESR1 rs9340799, HMGA2 rs2272046, and MTHFR rs1801133 are related to the risk of PCOS, these findings provide the first evidence that PPARG rs709154 and ESR1 rs1999805 are significantly associated with PCOS risk in a Chinese population. Further functional studies are warranted to elucidate the underlying biological mechanisms.


INTRODUCTION
Polycystic ovary syndrome (PCOS) is the most common endocrinopathy in women of reproductive age, with a prevalence of 6-8% (1). This syndrome is characterized by hyperandrogenism, amenorrhea or oligomenorrhea and polycystic ovaries (2,3). Women with PCOS are potentially at elevated risk of multiple diseases and disorders, including infertility (4), insulin resistance (5), obesity (6), premature carotid arteriosclerosis (7), type 2 diabetes mellitus (8), and mood disorders, such as depression and anxiety (9,10). Epidemiological studies suggest that androgen excess, lifestyle (11), ovulatory dysfunction, alteration in intrauterine environment (12), adipose tissue dysfunction and gonadotropin abnormalities can contribute to PCOS risk (13). The etiology of PCOS is complicated and not well elucidated.
A substantial body of evidence has implicated various genetic factors in PCOS development (14). At least 70 genes have been studied, including genes involved in the main processes of the disease, such as steroid synthesis (CYP11A1 and CYP19A1) (15), steroid action (ESR1, ESR2, and PGR) (16), lipid metabolism (PPARG and MTHFR) (17-19), insulin action (HMGA2) (20), and embryonic development (SUMO1P1) (21,22). However, these genetic associations are heterogeneous and do not generalize well across populations, so they do not yet clearly explain the considerable genetic susceptibility to this endocrine-metabolic disorder. For example, FSHR rs6166 was associated with increased PCOS risk in a study of 377 Chinese PCOS patients and 388 age-matched healthy controls (23), but not in a study in a different ethnic population (24). ESR1 rs9340799 has been associated with increased PCOS risk in a Pakistani population (25), but not in a Brazilian population (26).
Therefore, identifying as many PCOS-associated single-nucleotide polymorphisms (SNPs) as possible through population studies will provide valuable assistance to clinicians and patients. In addition to the previously reported genes involved in the pathogenesis of PCOS, morphological changes in the ovary are a key part of PCOS development. Previous studies have shown that LAMC1 can promote the development of follicles (27) and that some polymorphisms are significantly associated with premature ovarian failure (POF) (28). Given the role of follicular development in PCOS, the association between LAMC1 polymorphisms and PCOS risk deserves study.
Considering the important role of genetic factors in the pathogenesis of PCOS, the aim of this study was to identify genetic variants associated with these pathological mechanisms. We therefore selected several sites in CYP11A1, CYP19A1, ESR1, ESR2, PGR, PPARG, LAMC1, HMGA2, MTHFR, and SUMO1P1 according to the functional descriptions in the literature. In a case-control study of 361 PCOS cases and 331 controls in a Chinese population, we genotyped 63 SNPs in these 10 genes, including 40 SNPs that had never been assessed for their potential association with PCOS.

MATERIALS AND METHODS
Sample Collection
A total of 361 PCOS patients and 331 controls, all of Han Chinese ethnicity, were included in this multi-center study. Subjects were recruited between January 2015 and February 2017 at the Reproductive Hospital Affiliated to Shandong University, the Women's Hospital of the School of Medicine of Zhejiang University, and the Renji Hospital Affiliated to Shanghai Jiao Tong University. PCOS was diagnosed based on the NIH criteria [National Institutes of Health/National Institute of Child Health and Human Development (NIH/NICHD), April 1990], which include biochemical and/or clinical hyperandrogenism and ovulatory dysfunction, after the exclusion of related or other disorders (1). This study was approved by the Shanghai Xinhua Hospital Research Ethics Board, and all subjects gave informed consent before participating. Women were excluded if they had been diagnosed with 21-hydroxylase deficiency, androgen-producing tumors, hyperprolactinemia, non-classical adrenal hyperplasia, Cushing's syndrome, or active thyroid disease, since these conditions likely affect reproductive physiology (19,29). Controls were defined as women with normal hormonal status and regular menstrual cycles at intervals of 28-35 days. Information on participants was collected from medical records.

Biochemical and Hormonal Analyses
Peripheral blood samples were taken from participants in a fasting state on days 3-5 of the menstrual cycle. Blood samples were collected into EDTA anticoagulant tubes, centrifuged, and stored at −80 °C. Levels of luteinizing hormone (LH), follicle-stimulating hormone (FSH), testosterone (T), and estradiol (E2) were measured using a chemiluminescent analyzer (Beckman Coulter, Fullerton, CA, USA). The limits of detection (LOD) for LH, FSH, T, and E2 were 0.1 IU/L, 0.1 IU/L, 0.1 nmol/L, and 0.1 pg/ml, respectively. The intra- and inter-assay coefficients of variation (CV) were <6% and <10%, respectively.

SNP Selection and Genotyping
A total of 63 SNPs were selected in the following 10 genes, mainly based on literature review: CYP11A1, CYP19A1, ESR1, ESR2, PGR, PPARG, MTHFR, HMGA2, LAMC1, and SUMO1P1 (21). Five of the SNPs were 3′-flanking variants; 57 were intronic variants; 3 were non-synonymous exonic variants; 3 were synonymous exonic variants; and 2 were located near 5′-flanking regions. Genomic DNA was extracted from the buffy coat using DNA isolation kits (TIANGEN, China) according to the manufacturer's protocol. Genotyping was carried out using the polymerase chain reaction-ligation detection reaction (PCR-LDR) and was performed by technicians blinded to case or control status. In addition, 5% of samples were re-genotyped, and the results agreed 100% with the first results. PCRs were carried out on an ABI 7300 (Applied Biosystems, Foster City, CA, USA) in a total volume of 20 µl including 20 ng genomic DNA, 2.5 mM MgCl2, 0.2 mM dNTPs, 0.5 µM each primer, 1 × PCR Buffer, and 1 U Hot-Start Taq DNA polymerase (QIAgen). Cycling parameters were as follows: 95 °C for 10 min; 38 cycles of 94 °C for 15 s, 60 °C for 1.5 min, and 72 °C for 60 s; followed by 72 °C for 10 min. The ligation reaction was carried out in a final volume of 20 µl containing 2 µl template DNA, 1 µl 10 × ligation buffer, 20 U Taq DNA ligase, and 1 pmol of each discriminating probe (New England Biolabs, USA). The LDR parameters were as follows: 94 °C for 1.5 min, then 35 cycles of 94 °C for 15 s and 58 °C for 1.5 min. Following the LDR, 1 µl of reaction product was mixed with 1 µl loading buffer and 1 µl ROX. The mixture was then analyzed using the ABI 7900HT (Applied Biosystems).

Statistical Analysis
Differences in hormone levels, age, and body mass index (BMI) between cases and controls were evaluated using Student's t-test.
The genotype distribution of each SNP in controls was tested against the expectations of Hardy-Weinberg equilibrium (HWE) using the χ² test. Associations between selected polymorphisms and PCOS were assessed using unconditional logistic regression with adjustment for BMI and age. Pairwise linkage disequilibrium (LD) among the selected SNPs was examined using Lewontin's standardized coefficient D′ and the LD coefficient r² (30). Haplotype blocks were defined by the confidence-interval method (31) in the publicly available Haploview software (www.broadinstitute.org/haploview/downloads) using default settings: for strong LD, the confidence-interval minima were 0.98 (upper bound) and 0.7 (lower bound); for strong recombination, the upper confidence-interval maximum was 0.9; and the fraction of strong LD in informative comparisons was at least 0.95. Haplotype analysis was performed using the SNPStats tool (www.snpstats.net/snpstats/start.htm?q=snpstats/start.htm), with adjustment for age and BMI where appropriate. Multiple-testing correction was performed for each gene using the Bonferroni method. All statistical analyses were performed using SAS 9.4 software and R. P < 0.05 was considered statistically significant.
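The HWE check described above can be sketched as follows. This is a minimal illustration in Python, not the SAS or R code used in the study; the function name `hwe_chi2` and any genotype counts passed to it are hypothetical. It assumes a biallelic SNP and a χ² goodness-of-fit test with one degree of freedom.

```python
import math
from typing import Tuple


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test against Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are observed genotype counts (AA, AB, BB) in controls.
    Returns (chi-square statistic, P value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")

    # Allele frequencies estimated from the observed genotypes.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # Monomorphic SNP (minor allele frequency of 0): test is undefined.
        raise ValueError("SNP is monomorphic; HWE test is undefined")

    # Expected genotype counts under HWE: p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Under the threshold stated above, a SNP would be flagged as deviating from HWE when the returned P value is below 0.05.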

RESULTS
Characteristics of the Participants
The characteristics of the participants are shown in Table 1. The age of cases ranged from 19 to 40 years (mean ± SD, 28.1 ± 3.7); the age of controls ranged from 22 to 45 years (28.4 ± 4.2, P = 0.041). As expected, LH and testosterone levels and BMI in the PCOS group were significantly higher than those in healthy controls.

Association Between Individual SNPs and PCOS Risk
Results on minor allele frequencies and HWE at selected SNPs are shown in Table 2. Among the 63 polymorphisms, five deviated from HWE (P < 0.05): rs1801132, rs1884051, rs722208, rs932477, and rs728524. Minor allele frequencies of 0 were observed for the following five SNPs: rs1913474, rs1042839, rs480851, rs1805192, and rs709150. These 10 SNPs were excluded from subsequent analysis. Among the remaining 53 polymorphic loci, the following variant alleles were significantly associated with decreased PCOS risk: A>G at rs9340799 in ESR1 (P < 0.001), A>T at rs709154 (P = 0.013) and A>C at rs1151996 (P = 0.013) in PPARG, A>C at rs2272046 in HMGA2 (P < 0.001), and T>C at rs1801133 in MTHFR (P < 0.001; Table 3). Furthermore, the following genotypes were associated with decreased PCOS risk: the GA genotype at rs9340799 in ESR1 (P < 0.0001), the TA genotype at rs709154 in PPARG (P < 0.0001), and the CA genotype at rs2272046 in HMGA2 (P < 0.0001). Moreover, the GA genotype at rs1999805 in ESR1 (P = 0.013) and the TT genotype at rs1801133 in MTHFR (P < 0.0001) correlated with elevated PCOS risk (Table 4). We also conducted a stratified analysis of risk-related loci by BMI (BMI < 24, BMI ≥ 24; Supplementary Table 1). BMI did not appear to modify the associations of rs9340799 and rs1999805 in ESR1, rs2272046 in HMGA2, rs1801133 in MTHFR, and rs709154 in PPARG with PCOS risk.
Table notes: Boldface indicates P < 0.05; CI, confidence interval; NA, not available; OR, odds ratio; PCOS, polycystic ovarian syndrome; SNP, single-nucleotide polymorphism. a, P value from unconditional logistic regression with adjustment for BMI and age. Adjusted P values were calculated for each gene using Bonferroni correction.
Frontiers in Endocrinology | www.frontiersin.org
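The per-gene Bonferroni correction mentioned in the table notes can be sketched as below. This is an illustrative Python sketch, not the authors' implementation. It assumes that the correction family for each SNP is the set of SNPs analysed in the same gene. The function name `bonferroni_by_gene` and the input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def bonferroni_by_gene(
    raw_p: Dict[Tuple[str, str], float]
) -> Dict[Tuple[str, str], float]:
    """Bonferroni-adjust P values separately within each gene.

    raw_p maps (gene, snp_id) -> unadjusted P value.
    Each P value is multiplied by the number of SNPs tested in the
    same gene and capped at 1.0.
    """
    # Count how many SNPs were tested per gene (assumed family size).
    tests_per_gene: Dict[str, int] = defaultdict(int)
    for gene, _snp in raw_p:
        tests_per_gene[gene] += 1

    return {
        (gene, snp): min(1.0, p * tests_per_gene[gene])
        for (gene, snp), p in raw_p.items()
    }
```

An adjusted value below 0.05 would then be treated as significant under the threshold used in this study.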

Haplotype Block Structure and Haplotype Analysis
Haploview analysis indicated strong LD among SNPs in CYP11A1, ESR1, ESR2, LAMC1, PGR, and PPARG (Figures 1-3). In CYP11A1, three haplotype blocks were defined using 8 genotyped SNPs. In PGR, two haplotype blocks were formed: the first spanned 17 kb and contained five tested SNPs, and the second contained three tested SNPs. ESR1, ESR2, LAMC1, and PPARG each formed a single block. Table 5 summarizes the associations between haplotypes and PCOS risk. After adjusting for age and BMI, the risk of PCOS was significantly increased among individuals carrying the haplotype "CGA" in block 3 of CYP11A1 (OR = 2.30, 95% CI = 1.45-3.66, adjusted P = 0.013), compared with those carrying the most common haplotype "TAG." Similarly, the risk was increased among individuals carrying the haplotype "GTGC" in ESR2 (OR = 2.01, 95% CI = 1.33-3.02, adjusted P = 0.020), compared with those carrying the haplotype "GCAC." Furthermore, one highly protective haplotype in PPARG, "CGCA," was associated with an 87% reduction in the risk of developing PCOS compared with the haplotype "AGCA" (OR = 0.13, 95% CI = 0.04-0.40, adjusted P = 0.010).
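As a rough illustration of how an odds ratio, its 95% confidence interval, and a percentage change such as the 87% figure above relate to one another, the following Python sketch computes a crude odds ratio with a Wald interval from a 2 × 2 table. It is not the adjusted haplotype model fitted in this study, which used SNPStats with age and BMI as covariates. The function names and any counts passed in are hypothetical. Note that 1 − OR is strictly a reduction in odds, which approximates a reduction in risk only when the outcome is rare.

```python
import math
from typing import Tuple


def odds_ratio_wald(
    case_carrier: int,
    case_noncarrier: int,
    control_carrier: int,
    control_noncarrier: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Crude (unadjusted) odds ratio with a Wald 95% confidence interval."""
    cells = (case_carrier, case_noncarrier, control_carrier, control_noncarrier)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")

    odds_ratio = (case_carrier * control_noncarrier) / (
        case_noncarrier * control_carrier
    )
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percentage change in odds relative to the reference group."""
    return (odds_ratio - 1.0) * 100.0
```

For example, `percent_change_in_odds(0.13)` gives a value of about −87, matching the reduction quoted for the PPARG haplotype.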
Figure legends (Figures 1-3): In CYP11A1, three haplotype blocks were defined using 8 genotyped SNPs. In PGR, two haplotype blocks were defined using 10 genotyped SNPs. In ESR1, ESR2, LAMC1, and PPARG, one haplotype block was defined in each gene. All blocks were defined using the Haploview program with default settings: the confidence-interval minima were 0.98 (upper) and 0.7 (lower) for strong LD; the upper confidence-interval maximum was 0.9 for strong recombination; and the fraction of strong LD in informative comparisons had to be at least 0.95.
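The two pairwise LD measures used in this analysis, Lewontin's D′ and r², can be computed from haplotype and allele frequencies as sketched below. This is a minimal Python illustration under the assumption of two biallelic loci with known (phased or estimated) haplotype frequencies. It is not the Haploview implementation, and the function name `ld_stats` is hypothetical.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2.
    p_a:  frequency of allele A at locus 1.
    p_b:  frequency of allele B at locus 2.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")

    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b

    # Maximum |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))

    d_prime = abs(d) / d_max
    r_squared = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d_prime, r_squared
```

When two SNPs always occur together, both measures equal 1, which corresponds to complete LD. Independent loci give values near 0.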

DISCUSSION
In this PCOS case-control study in a Chinese Han population, we found that the following genotypes were associated with a lower risk of developing PCOS: the GA genotype of rs9340799 in ESR1, the TA genotype of rs709154 in PPARG, and the CA genotype of rs2272046 in HMGA2. Conversely, the GA genotype of rs1999805 in ESR1 and the TT genotype of rs1801133 in MTHFR were significantly associated with increased risk of PCOS. Associations of ESR1 rs9340799 (25), HMGA2 rs2272046 (21), and MTHFR rs1801133 (32) with PCOS risk have been reported in other studies. In contrast, our results appear to provide the first evidence linking rs1999805 in ESR1 and rs709154 in PPARG with PCOS risk in a Chinese population. These findings provide evidence that polymorphisms in genes involved in hormonal action, lipid metabolism, and insulin action may modify PCOS risk.
Polymorphism in the ESR1 gene may influence PCOS risk because the protein is necessary for the proper functioning of the hypothalamic-pituitary-ovarian axis. The protein is up-regulated in theca cells of polycystic ovaries, such that the ratio of ESR1 to ESR2 expression is elevated in PCOS, which may contribute to abnormal follicular development (33). Whatever the underlying mechanism, the association between ESR1 polymorphism and risk of PCOS appears to depend in complex ways on ethnicity. A Pakistani study linked the CC genotype of ESR1 rs2234693, the GG genotype of rs9340799, and the CT genotype of rs8179176 with elevated disease risk (25). However, studies in Greeks (34) and Caucasians (35) revealed no differences in the frequencies of these genotypes between PCOS patients and controls. The present study found that the GA genotype of rs9340799 protected Chinese women from PCOS. Further study should clarify to what extent these divergent findings reflect ethnicity. Our results also raise the possibility that polymorphism at rs1999805 influences PCOS risk: we found that the GA genotype of rs1999805 increased that risk in a Chinese population. Since rs1999805 lies in an ESR1 intron, the question arises whether the polymorphism itself contributes to disease or is simply a marker in LD with other untyped functional variants. Our LD analysis does not predict other disease-relevant sites in this gene. Therefore, the pathway(s) connecting the rs1999805 polymorphism with PCOS warrant further exploration.
Table notes: Boldface indicates P < 0.05; CI, confidence interval; OR, odds ratio. a, Polymorphic bases are listed in 5′-3′ order as in Figure 1. b, Adjusted for age and BMI. Adjusted P values were calculated for each gene using Bonferroni correction.
Our finding of a link between risk of PCOS and rs709154 and rs1151996 in PPARG, which lie in an intronic region of the gene, may be explained by the fact that the encoded protein functions as a nuclear hormone receptor and plays a crucial role in lipogenesis, cell differentiation, inflammatory cytokine production, glucose homeostasis, and insulin sensitization (36). We found evidence that PPARG rs709154 and rs1151996 act as protective factors against PCOS. Similarly, a meta-analysis has concluded that the PPARG rs1801282 C>G polymorphism is associated with decreased PCOS risk. Conversely, the PPARG rs1801282 (Pro12Ala) polymorphism increased PCOS risk in a study comparing 100 African PCOS patients and 120 healthy controls (37). We found PPARG rs709154 to be in strong LD with rs709149 and rs709151, but we are unaware of studies investigating how these three polymorphisms jointly relate to PCOS risk. Future work should examine this question.
Our results with rs2272046 in HMGA2 are consistent with those of a previous study of 744 Han Chinese PCOS patients and 895 healthy controls linking the SNP with decreased PCOS risk (21). The polymorphism rs2272046 is located in an HMGA2 intron, raising the possibility that it affects risk of PCOS by altering degradation or translation of the HMGA2 mRNA (38). The SNP rs2272046 is in complete LD with rs74980477 (D′ = 1, r² = 1), and the latter is predicted to alter the function of the myc gene. Normal expression of the myc oncoprotein plays a critical role in initial oocyte growth and autonomous growth of granulosa cells, so its dysregulation may affect follicular development (39). However, this study did not find an association between SUMO1P1 and PCOS risk, a result inconsistent with the GWAS findings (21). Geographic differences between populations, our limited statistical power, and differences in the clinical features of the subjects may partly explain the inconsistent results.
The results with rs1801133 in MTHFR are consistent with an Iranian study linking the CC genotype at this locus with decreased PCOS risk (32). The SNP rs1801133 occurs in an MTHFR exon and changes an Ala to Val, based on South Han Chinese data from the 1000 Genomes Project (40). This polymorphism is associated with acetylation of lysine 27 of the H3 histone protein. This H3K27Ac mark, which is usually found near regulatory elements, is believed to enhance transcription, possibly by blocking the spread of the repressive histone mark H3K27Me3 (41). In this way, polymorphism at rs1801133 may affect MTHFR expression and thereby affect risk of PCOS.
This study has some limitations. Our relatively small sample size limited the statistical power of our findings. We recruited only Han Chinese participants, which prevented us from analyzing how our SNP results depend on ethnicity. Because five of our 63 SNPs deviated from HWE and five had a minor allele frequency of 0, the number of loci available for association analysis was reduced. Since our study lacked a validation group, further work is needed to confirm our observed associations between SNPs and PCOS risk.
In summary, we provide the first evidence that polymorphisms at ESR1 rs1999805 and PPARG rs709154 are significantly associated with PCOS risk in a Han Chinese population. The GA genotype at rs9340799 in ESR1, the TA genotype at rs709154 in PPARG, the CA genotype at rs2272046 in HMGA2, and the GA genotype at rs6022786 in SUMO1P1 may be associated with a lower risk of developing PCOS. Conversely, the GA genotype at rs1999805 in ESR1 and the TT genotype at rs1801133 in MTHFR correlated with elevated PCOS risk. Our findings may help clarify the genetics of PCOS and generate leads for further functional research on the disease. Further investigations with larger, multi-ethnic samples are required to confirm our results.