Exploring Genetic Interactions in Colombian Women with Polycystic Ovarian Syndrome: A Study on SNP-SNP Associations

Polycystic ovary syndrome (PCOS) is an endocrine and metabolic disorder with high prevalence in women around the world. The identification of single-nucleotide polymorphisms (SNPs) through genome-wide association studies has classified it as a polygenic disease. Most studies have independently evaluated the contribution of each SNP to the risk of PCOS. Few studies have assessed the effect of epistasis among the identified SNPs. Therefore, this exploratory study aimed to evaluate the interaction of 27 SNPs identified as risk candidates and their contribution to the pathogenesis of PCOS. The study population included 49 control women and 49 women with PCOS with a normal BMI. Genotyping was carried out through the MassARRAY iPLEX single-nucleotide polymorphism typing platform. Using the multifactor dimensionality reduction (MDR) method, the interaction between SNPs was evaluated. The analysis showed that the best interaction model (p < 0.0001) was composed of three loci (rs11692782-FSHR, rs2268361-FSHR, and rs4784165-TOX3). Furthermore, a tendency towards synergy was evident between rs2268361 and the SNPs rs7371084–rs11692782–rs4784165, as well as a redundancy in rs7371084–rs11692782–rs4784165. This pilot study suggests that epistasis may influence PCOS pathophysiology. Large-scale analysis is needed to deepen our understanding of its impact on this complex syndrome affecting thousands of women.


Introduction
Polycystic ovary syndrome (PCOS) is a complex chronic disorder that manifests in women of reproductive age. The prevalence of PCOS varies according to the diagnostic criteria used and the study population. Globally, the prevalence ranges from 5 to 15% [1]. The Rotterdam 2003 criteria have been the most widely used for the diagnosis of PCOS. These require the presence of at least two of three characteristics: clinical/biochemical hyperandrogenism, polycystic ovarian morphology, and oligo/amenorrhea [2].
The heterogeneity of PCOS is manifested throughout a woman's life through reproductive, metabolic, dermatological, and psychological consequences [3]. Although research efforts for this disorder are considerable, the etiology remains unknown [4]. Both environmental and genetic factors have been found to contribute to the progression of the disease. From a genetic perspective, genome-wide association studies (GWASs) have been established as the most effective approach to identify single-nucleotide polymorphisms (SNPs) in complex diseases like PCOS [5]. However, the identified SNPs generally show modest effects on disease risk, which is known as the "missing heritability" problem. In response to this challenge, identifying SNP-SNP interactions (also called epistasis) has been proposed, as complex diseases are determined by multiple genetic factors that interact with each other [6].
Various methods have been described for the analysis of these interactions, mostly based on statistical regression models [7]. However, these models require a priori genetic models and face challenges with data dimensionality, given that the number of possible higher-order interactions grows exponentially with the number of variables (SNPs) [8]. Increasing the sample size reduces this problem and allows for a robust estimation of interactions, but at substantially higher cost [7].
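To make the dimensionality problem concrete, the following minimal sketch counts, for the 27 SNPs and 98 participants of this study, how many SNP combinations exist at each interaction order and how thinly the sample is spread over the resulting multi-locus genotype cells. It assumes biallelic SNPs with three genotypes each; the function and constant names are illustrative and are not taken from any MDR software.

```python
from math import comb

N_SNPS = 27            # SNPs genotyped in this study
N_SUBJECTS = 98        # 49 women with PCOS + 49 controls
GENOTYPES_PER_SNP = 3  # assumption: biallelic SNPs with three genotypes each


def interaction_search_space(n_snps, max_order, n_subjects=N_SUBJECTS):
    """Return (order, n_models, n_cells, subjects_per_cell) for each interaction order."""
    rows = []
    for order in range(1, max_order + 1):
        n_models = comb(n_snps, order)        # distinct SNP combinations of this order
        n_cells = GENOTYPES_PER_SNP ** order  # multi-locus genotype cells per combination
        rows.append((order, n_models, n_cells, n_subjects / n_cells))
    return rows


for order, n_models, n_cells, per_cell in interaction_search_space(N_SNPS, 4):
    print(f"order {order}: {n_models} models, {n_cells} cells, "
          f"{per_cell:.1f} subjects per cell on average")
```

At order three there are 2925 candidate models, each with 27 genotype cells, so fewer than four participants fall in a cell on average. This sparsity is what regression-based approaches struggle with in small samples.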
The multifactor dimensionality reduction (MDR) method developed by Ritchie et al. [9] was the first machine learning approach proposed to address small sample sizes and the limitations of statistical methods in gene-gene interaction analyses. MDR is a nonparametric method used in case-control studies, which reduces the dimensionality of SNP genotypes by grouping them into high- and low-risk groups. This diminishes type I and II errors [10]. Additionally, MDR can detect high-order interactions, even in the absence of statistically significant main effects [11]. Since MDR does not assume a specific inheritance model, it selects the best SNP model among all possible combinations through a cross-validation procedure, achieving maximum balanced accuracy. Permutation tests allow one to identify whether the model is statistically significant [8,12].
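The dimensionality reduction step described above can be sketched as follows. This is a simplified illustration on hypothetical toy data, not the implementation in the MDR software: each multi-locus genotype cell is labeled high risk when its case:control ratio exceeds a threshold (assumed here to be 1, as for a balanced sample), ties and unseen cells are assumed to be low risk, and the accuracy is computed on the same data used for labeling rather than on held-out folds.

```python
from collections import Counter


def label_cells(genotypes, status, threshold=1.0):
    """Label each multi-locus genotype cell 'high' or 'low' risk.

    genotypes: one tuple of genotypes per subject; status: 1 = case, 0 = control.
    Assumption: ties (ratio equal to the threshold) are labeled low risk.
    """
    cases = Counter(g for g, s in zip(genotypes, status) if s == 1)
    controls = Counter(g for g, s in zip(genotypes, status) if s == 0)
    labels = {}
    for cell in set(cases) | set(controls):
        labels[cell] = "high" if cases[cell] > threshold * controls[cell] else "low"
    return labels


def balanced_accuracy(labels, genotypes, status):
    """Mean of sensitivity and specificity when high-risk cells predict case status."""
    tp = fn = tn = fp = 0
    for g, s in zip(genotypes, status):
        predicted_case = labels.get(g, "low") == "high"  # unseen cells: assumed low risk
        if s == 1:
            tp += predicted_case
            fn += not predicted_case
        else:
            fp += predicted_case
            tn += not predicted_case
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


# Hypothetical two-locus toy data: 7 cases followed by 7 controls.
genotypes = ([("AA", "bb")] * 3 + [("aa", "BB")] * 3 + [("AA", "BB")] * 1
             + [("AA", "BB")] * 3 + [("aa", "bb")] * 3 + [("AA", "bb")] * 1)
status = [1] * 7 + [0] * 7

labels = label_cells(genotypes, status)
accuracy = balanced_accuracy(labels, genotypes, status)
print(labels, round(accuracy, 3))
```

In the toy data neither locus separates cases from controls on its own, yet the two-locus cells do, which illustrates how interactions can be detected without main effects.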
For PCOS, there are large amounts of genomic data and studies focusing on detecting SNPs in isolation. However, few studies have explored interactions between polymorphisms for this complex disorder. Therefore, this research aimed to evaluate the epistatic effect of 27 SNPs from the genes THADA, LHCGR, FSHR, DENND1A, YAP1, HMGA2, ERBB3, AMHR2, TOX3, INSR, and AMH in a sample of Colombian women with PCOS.

Characteristics of the Study Sample
Tables 1 and 2 present the clinical, endocrine, and metabolic characteristics of the study sample, previously reported by Alarcón-Granados et al. [13]. Significant differences were observed in weight, with the PCOS group having a higher median weight (60.8 kg) compared to the controls (60 kg) (p = 0.037). No significant differences were noted in height and body mass index (BMI) between the groups. It should be noted that the PCOS group had an average BMI within the normal range (23.16 kg/m²), representing the lean PCOS phenotype. This subgroup is not reflective of the broader PCOS population, which generally includes more women with overweight or obesity.
The PCOS group had significantly higher levels of follicle-stimulating hormone (FSH), antimüllerian hormone (AMH), luteinizing hormone (LH), estradiol (E2), total ovarian volume, and total antral follicular count (AFC) compared to controls (all p < 0.0001). Family history data revealed higher incidences of polycystic ovaries and endometriosis among the PCOS group. Reproductive features indicated significantly fewer pregnancies and higher incidences of early pregnancy loss in women with PCOS compared to controls (Table 1).
In women with PCOS, hyperandrogenism was evident from elevated levels of androstenedione (1.49 ± 0.59 ng/mL), DHEAS (152.8 ± 64.51 µg/dL), and free testosterone (median 1.34 pg/mL), as shown in Table 2, with clinical manifestations including acne (60%), facial hair (68%), and abdominal hair (60%). Amenorrhea, or the absence of menstrual periods, was reflected in the significantly longer menstrual cycle length in women with PCOS (31 days) compared to controls (28 days, p < 0.0001), with 60% experiencing menstrual bleeding cessation for more than 3 months and 50% reporting multiple menstrual bleeds in one month. Ovarian ultrasound findings, crucial for PCOS diagnosis, showed a significantly higher total ovarian volume (12.25 cm³ vs. 7.61 cm³, p < 0.0001) and total antral follicular count (median 27 vs. 16, p < 0.0001), indicating the typical polycystic ovarian morphology associated with the syndrome.

Epistasis Analysis
The basic information related to the 27 SNPs included in this study is shown in Table 3. The association between polymorphisms in the THADA, LHCGR, FSHR, DENND1A, YAP1, HMGA2, ERBB3, AMHR2, TOX3, INSR, and AMH genes and the risk of PCOS was evaluated under the allelic model. However, no statistically significant association was observed between any polymorphism and PCOS risk (p > 0.05). Table 4 summarizes the results of the SNP-SNP interaction analysis. We found that the three-locus model including rs11692782-FSHR, rs2268361-FSHR, and rs4784165-TOX3 was the best model, with a cross-validation consistency of 7/10, a testing balanced accuracy of 0.6327, and p < 0.0001. Figure 1 details the combinations of genotypes associated with PCOS risk in this model. We identified interactions in high-risk genotypes for rs4784165, rs11692782, and rs2268361 such as (GG + TT + CC), (GG + AA + CT), (GG + TA + CT), (GG + AA + TT), (GG + TA + TT), (GT + TA + CC), (GT + AA + CT), (GT + TT + TT + TT), (TT + TA + CC), (TT + TT + CC), (TT + AA + CT), (TT + TT + TT + CT), and (TT + AA + TT), respectively, and low-risk genotypes such as (GG + AA + CC), (GG + TA + CC), (GG + TT + CT), (GG + TT + TT + TT), (GT + AA + CC), and (TT + AA + CC), respectively. The entropy analysis, which evaluates what types of effects are represented in the model, is detailed in Figure 2.
In this analysis, an additional SNP was included in the model (rs7371084-LHCGR). In the interaction map (Figure 2a), each node represents an SNP, as well as the individual entropy percentage for each polymorphism (main effects). The values between the nodes represent the interaction effects between SNPs. Positive entropy values (represented by red or orange lines) between polymorphisms indicate information gain or synergy, and negative values (represented by yellow or green lines) indicate redundancy or independence. Our results show a synergistic interaction between rs2268361 and rs7371084 (0.21%), rs4784165 (0.19%), and rs11692782 (3.30%). These last three SNPs present a redundancy effect among themselves. The dendrogram representing the interactions between SNPs (Figure 2b) groups the polymorphisms with the highest redundancy (rs4784165-rs7371084) and synergy (rs2268361).
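The sign convention used in the interaction map can be illustrated with a small entropy calculation. The sketch below computes the interaction information between two SNPs and case-control status, I(A,B;C) − I(A;C) − I(B;C), on hypothetical toy data; it is meant only to show why positive values indicate synergy and negative values redundancy. The percentages shown in Figure 2 are produced by the MDR software, and any normalization it applies (for example, expressing values relative to the entropy of case-control status) is not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_information(snp_a, snp_b, status):
    """Information about status gained from the SNP pair beyond the two main effects.

    Positive: synergy (the pair is more informative than the sum of its parts).
    Negative: redundancy (the two SNPs carry overlapping information).
    """
    pair = list(zip(snp_a, snp_b))
    return (mutual_information(pair, status)
            - mutual_information(snp_a, status)
            - mutual_information(snp_b, status))


# Hypothetical synergy: status depends on the combination of A and B only.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
synergy = interaction_information(a, b, [0, 1, 1, 0])

# Hypothetical redundancy: B duplicates A, and both track status.
redundancy = interaction_information(a, a, a)

print(synergy, redundancy)
```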

Discussion
This is the first study in a sample of Colombian women that applies the MDR method to identify SNP-SNP interactions in 27 variants located in the genes THADA, LHCGR, FSHR, DENND1A, YAP1, HMGA2, ERBB3, AMHR2, TOX3, INSR, and AMH, associated with PCOS. In the individual risk analysis under the allelic model, no statistically significant differences were identified between women with PCOS and controls. In previously published [13,14] and unpublished pilot studies for the same cohort, we identified a negative association between rs7371084-LHCGR and rs4784165-TOX3, and a positive association between the SNPs rs10986105, rs10818854, rs7857605, and rs12337273 in the DENND1A gene and the risk of PCOS.
Using the MDR method, it was evident that the three-locus model including rs11692782-FSHR, rs2268361-FSHR, and rs4784165-TOX3 was the best (p < 0.0001). This model presented an OR (95% CI) of 11.29 (4.183–30.49), a value that indicates a significantly increased risk of the syndrome. These results were confirmed when creating the interaction map and dendrogram, where the SNP rs7371084-LHCGR was included. A tendency towards synergy was observed between rs2268361 and the SNPs rs7371084-rs11692782-rs4784165. This suggests that their combined effects are larger than, or different from, what would be expected by simply adding the individual effects of each variant [15]. The synergy not only suggests a greater combined effect between SNPs but may also indicate a functional relationship between the variants and a higher penetrance of the PCOS phenotype [16].
The genes involved in this interaction were FSHR, LHCGR, and TOX3. The relationship between FSHR and LHCGR has been well described since both encode receptors for key hormones in the regulation of the menstrual cycle and reproductive function: follicle-stimulating hormone (FSH) and luteinizing hormone (LH). In healthy women, the LH:FSH ratio is 1 [17]. During the menstrual cycle, FSH and LH act together to regulate the growth and maturation of ovarian follicles, as well as ovulation [18]. FSH stimulates the growth of follicles in the ovaries, while LH induces ovulation and helps maintain the corpus luteum, which produces progesterone [19]. The expression and activity of these receptors are coordinated and regulated in a complex manner to guarantee adequate maturation of the follicles, ovulation, and preparation of the endometrium for implantation of the fertilized egg [20].
In women with PCOS, one of the most common hormonal alterations is increased levels of LH. This increase causes a high LH:FSH ratio, from 1 to 5.5 [21]. Elevated LH stimulates the production of androgens in the theca cells of the ovary, which can contribute to hyperandrogenemia, manifesting in symptoms such as hirsutism and acne [22]. Meanwhile, FSH generally remains normal or low compared to LH. In women with PCOS, the altered LH:FSH ratio prevents adequate maturation of the follicles, leading to anovulatory cycles [23]. This hormonal dysregulation contributes to menstrual irregularity, a cardinal symptom of PCOS [24], and reproductive problems such as infertility or difficulties conceiving, which are common concerns among women with this condition [25].
On the other hand, TOX3 is a transcription factor that plays a role in regulating gene expression in various cells and tissues, including reproductive ones [26]. Variants in TOX3 have been associated with an increased risk in women with PCOS, and their possible involvement in the regulation of ovarian function and fertility has been suggested [27]. However, at the functional level, the interaction of TOX3 with FSHR and LHCGR has not been described.
In contrast, it was observed that the interaction between rs7371084-rs11692782-rs4784165 was represented by redundancy values, which means that the combined effects of the SNPs are similar to the sum of the individual effects of each SNP [28]. Redundancy may be at a functional level, suggesting that these variants affect similar or overlapping biological pathways or processes in the development of PCOS [29], or indicate the presence of compensatory mechanisms, where a variant that increases the risk of PCOS could be compensated by the presence of another variant that has a protective or neutralizing effect [30]. Similar results were reported by Thathapudi et al. [31], who established a weak interaction between LHCGR and FSHR/CAPN-10 variants in women with PCOS.
Considering that PCOS is a multifactorial and complex disorder, presenting a wide variability in symptoms and severity, studies on epistasis are essential for understanding the genetic basis and phenotypic variability of the disease. These studies can identify new therapeutic targets and improve individual risk prediction, which could have significant implications for developing more effective prevention and treatment strategies for PCOS.
While our study provides an initial exploration of epistatic effects, the small sample size and limited number of SNPs per gene constrain the depth of our analysis. Future research with larger and more diverse samples is needed to robustly assess these interactions and their implications. In particular, future studies with larger sample sizes could investigate the impact of gene interactions on phenotypic features of PCOS in both lean and obese cohorts of women. To increase the number of PCOS patients available for analysis, it would be beneficial to establish connections and networks with research groups, associations, and organizations focused on women's sexual and reproductive health. Collaborative efforts and engagement with these networks could facilitate access to larger and more diverse patient populations.
Regarding the limitation of our study's focus on Colombian women, we acknowledge that genetic variations and disease manifestations can differ across populations. This regional focus may impact the generalizability of our findings. To address this, we suggest that future studies include diverse populations to better understand the global applicability of the results. Furthermore, the insights gained from our study, despite its limitations, provide a foundation for exploring the clinical applications of epistatic interactions in PCOS. With broader and more inclusive research, there is potential for translating these findings into improved risk assessment and targeted treatment strategies.

Study Participants
We conducted an exploratory case-control study in Colombia with a sample of 49 control women and 49 women with a confirmed diagnosis of PCOS. The characteristics of the sample and the inclusion and exclusion criteria have been detailed previously [13].

SNP Selection and Genotyping
We included 27 SNPs from 11 genes widely reported as risk candidates for PCOS in different studies [32,33]. We used the Invisorb® Spin Universal Kit (Stratec Molecular, Berlin, Germany) to extract total genomic DNA from the peripheral blood of the participants, according to the manufacturer's instructions. DNA concentration was measured using an EPOCH™ 2 Microplate Spectrophotometer (Biotek, Winooski, VT, USA). Genotyping was performed using the MassARRAY iPLEX single-nucleotide polymorphism typing platform (Agena Bioscience, San Diego, CA, USA). This platform employs matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) in conjunction with single-base extension polymerase chain reaction (PCR) to enable high-throughput multiplex detection of SNPs [34]. The extension primers were designed using the Assay Design Suite (ADS) software version 2.0, and allelic discrimination, after the iPLEX reaction, was performed using the Typer software version 4.0. Design details are shown in Table 5.

Statistical and SNP-SNP Interaction Analysis
The R software version 4.2.3 (run in RStudio) was used to evaluate the association of the polymorphisms with PCOS risk under the allelic model. The minor allele frequency (MAF) for each SNP, Hardy-Weinberg Equilibrium (HWE), odds ratio (OR), and 95% confidence interval (95% CI) were determined. The p-value was estimated using a chi-square test.
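For illustration, the allelic-model quantities listed above can be computed from a 2 × 2 table of allele counts as in the sketch below. The study itself used R; this Python version is only a sketch under stated assumptions: a Pearson chi-square test without continuity correction, a Woolf (log-based) 95% CI, and hypothetical allele counts that are not taken from Table 3. All four counts must be non-zero.

```python
from math import erfc, exp, log, sqrt


def allelic_test(case_minor, case_major, ctrl_minor, ctrl_major):
    """Odds ratio, Woolf 95% CI, and Pearson chi-square test for allele counts."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    odds_ratio = (a * d) / (b * c)

    # Woolf method: standard error of log(OR) from the four cell counts.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = exp(log(odds_ratio) - 1.96 * se_log_or)
    ci_high = exp(log(odds_ratio) + 1.96 * se_log_or)

    # Pearson chi-square for a 2 x 2 table (1 degree of freedom, no correction).
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

    return odds_ratio, (ci_low, ci_high), chi2, p_value


# Hypothetical counts: 49 cases and 49 controls give 98 alleles per group.
odds_ratio, ci, chi2, p_value = allelic_test(40, 58, 30, 68)
print(f"OR = {odds_ratio:.2f}, 95% CI = {ci[0]:.2f}-{ci[1]:.2f}, "
      f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```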
SNP-SNP interactions in PCOS risk were evaluated between the 27 SNPs by the nonparametric model-free multifactor dimensionality reduction (MDR) method using the MDR software version 3.0.2 (open-source version available at https://www.epistasis.org, accessed on 5 February 2024).
The SNP combination with maximum cross-validation consistency (CVC) and test accuracy was considered the best model [35]. For the best model, a dendrogram and an interaction map for PCOS risk were created to represent the interactions between SNPs. In the interaction map, each node corresponds to a polymorphism and the individual effect value of each SNP. The values between SNPs represent the entropy and interaction strength. Negative values mean redundancy and positive values mean synergy between polymorphisms. Additionally, the OR, 95% CI, and p-value were calculated in the best models obtained. Significance was considered with a p-value < 0.05.
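The meaning of cross-validation consistency can be sketched as follows: the data are split into folds, the SNP combination with the highest training balanced accuracy is identified with each fold held out in turn, and CVC is the number of folds in which the same combination wins. This is a simplified illustration on hypothetical toy data, not the MDR software's implementation. It assumes a case:control threshold of 1, labels ties and unseen cells as low risk, and omits the testing balanced accuracy that is also used for model selection.

```python
import random
from collections import Counter
from itertools import combinations, product


def fit_cells(rows, status, snps):
    """Mark a multi-locus genotype cell high risk (True) when cases outnumber controls."""
    balance = Counter()
    for row, s in zip(rows, status):
        balance[tuple(row[i] for i in snps)] += 1 if s == 1 else -1
    return {cell: diff > 0 for cell, diff in balance.items()}


def balanced_accuracy(cells, rows, status, snps):
    """Mean of sensitivity and specificity; both classes must be present."""
    tp = fn = tn = fp = 0
    for row, s in zip(rows, status):
        predicted_case = cells.get(tuple(row[i] for i in snps), False)
        if s == 1:
            tp += predicted_case
            fn += not predicted_case
        else:
            fp += predicted_case
            tn += not predicted_case
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def cross_validation_consistency(rows, status, order, n_folds=10, seed=0):
    """Return (best SNP index combination, number of folds in which it won)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::n_folds] for f in range(n_folds)]
    winners = Counter()
    for fold in folds:
        held_out = set(fold)
        train_rows = [rows[i] for i in indices if i not in held_out]
        train_status = [status[i] for i in indices if i not in held_out]

        def training_score(snps):
            cells = fit_cells(train_rows, train_status, snps)
            return balanced_accuracy(cells, train_rows, train_status, snps)

        best = max(combinations(range(len(rows[0])), order), key=training_score)
        winners[best] += 1
    return winners.most_common(1)[0]


# Hypothetical data: 4 binary-coded SNPs; status depends only on SNPs 0 and 2 jointly.
rows = list(product([0, 1], repeat=4)) * 3
status = [row[0] ^ row[2] for row in rows]
model, cvc = cross_validation_consistency(rows, status, order=2)
print(model, f"CVC = {cvc}/10")
```

On this noise-free toy data the same pair wins in every fold. A lower value, such as the 7/10 reported in Table 4, means that a different combination was selected in some of the folds.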

Conclusions
This exploratory study in a sample of Colombian women evaluated 27 polymorphisms previously identified as risk candidates for PCOS. Through the MDR method, we identified that the best interaction model was rs11692782-FSHR, rs2268361-FSHR, and rs4784165-TOX3. The interaction graphs showed a tendency towards synergy between rs2268361 and the SNPs rs7371084-rs11692782-rs4784165, and a tendency towards redundancy between rs7371084-rs11692782-rs4784165. These findings suggest that polymorphisms in complex diseases can interact with each other and contribute to the pathogenesis of the disease. Therefore, it is necessary to carry out large-scale studies that allow us to elucidate, even at a functional level, the effect of epistasis in PCOS.

Figure 1. Distribution of high-risk and low-risk genotypes in the best three-locus model. Each cell shows counts of the control group on the left and the PCOS group on the right. Dark-gray cells represent high risk and light cells represent low risk.


Figure 2. Entropy analysis. (a) Interaction map for PCOS risk. (b) Dendrogram between SNPs in PCOS patients. The coloring scheme represents a continuum from synergy (red, representing a non-additive interaction) to redundancy (blue, representing loss of information).


Table 3. Basic information of the 27 polymorphisms included in this study.

Table 4. Best MDR models of SNP-SNP interactions.


Table 5. Design details for SNP genotyping.

Table 5. Cont.
Uniplex Amplification Score (this score indicates how well the amplicon meets the design criteria, individually); MP_CONF: multiplex amplification score (this score indicates how well the amplicon meets the design criteria, taking into account the other primers included in the multiplex reaction); Tm (NN): melting temperature for the extension primer; PcGC: percentage of GC contained in the first extension; UEP_DIR: direction of the first extension; UEP_MASS: mass of the first extension; UEP_SEQ: sequence of the first extension; EXT1_CALL: first allelic variant; EXT1_MASS: mass of the sequence of the first extension + genotype of the first allelic variant; EXT1_SEQ: extension primer sequence + first allelic variant; EXT2_CALL: second allelic variant; EXT2_MASS: mass of the sequence of the first extension + g; EXT2_SEQ: extension primer sequence + second allelic variant.