Sex modifies the effect of genetic risk scores for polycystic ovary syndrome on metabolic phenotypes

Females with polycystic ovary syndrome (PCOS), the most common endocrine disorder in women, have an increased risk of developing cardiometabolic disorders such as insulin resistance, obesity, and type 2 diabetes (T2D). Although PCOS is diagnosable only in females, males with a family history of PCOS can also exhibit a poor cardiometabolic profile. Therefore, we aimed to elucidate the role of sex in the cardiometabolic comorbidities observed in PCOS by conducting bidirectional genetic risk score analyses in both sexes. We first conducted a phenome-wide association study (PheWAS) using PCOS polygenic risk scores (PCOS PRS) to identify potential pleiotropic effects of PCOS PRS across 1,380 medical conditions recorded in the Vanderbilt University Medical Center electronic health record (EHR) database, in females and males. After adjusting for age and genetic ancestry, we found that European (EUR)-ancestry males with higher PCOS PRS were significantly more likely to develop hypertensive diseases than females at the same level of genetic risk. We performed the same analysis in an African (AFR)-ancestry population, but observed no significant associations, likely due to poor trans-ancestry performance of the PRS. Based on observed significant associations in the EUR-ancestry population, we then tested whether the PRS for comorbid conditions (e.g., T2D, body mass index (BMI), and hypertension) also increased the odds of a PCOS diagnosis. Only BMI PRS and T2D PRS were significantly associated with a PCOS diagnosis in EUR-ancestry females. We then further adjusted the T2D PRS model for measured BMI and for BMI residual (BMI regressed on the BMI PRS, and therefore enriched for the environmental contribution to BMI). Results demonstrated that genetically regulated BMI primarily accounted for the relationship between T2D PRS and PCOS. Overall, our findings show that the genetic architecture of PCOS has distinct sex differences in associations with genetically correlated cardiometabolic traits.
It is possible that the cardiometabolic comorbidities observed in PCOS are primarily explained by their shared genetic risk factors, which can be further influenced by biological variables including sex and BMI.


Introduction
Polycystic ovary syndrome (PCOS) is a highly heritable endocrine disorder that affects 5%-21% of females of reproductive age, who are typically diagnosed by having two or more of the following features under the Rotterdam criteria: polycystic ovaries, oligo- and anovulation, or hyperandrogenism [1][2][3]. Although the Rotterdam criteria are the most commonly used, other criteria can be used for diagnosis, including the National Institutes of Health criteria, the Androgen Excess and PCOS Society criteria, or the 2018 International Evidence Based PCOS guidelines [2]. Each set of criteria differs slightly in its requirements in an effort to cover the range of PCOS symptoms exhibited by patients. However, as a result of the differing diagnostic criteria, the heterogeneous presentation of symptoms, and the prevalent comorbidities that fall outside of diagnostic requirements, patients often spend years seeking a diagnosis or, worse, may be among the estimated 75% of females with PCOS who are undiagnosed [4,5].
Many clinicians select criteria based on their perception of the most defining PCOS feature [6]. In some cases, as with the Androgen Excess and PCOS Society criteria [7], this will mean hyperandrogenism, a symptom that typically manifests as acne, hirsutism, or alopecia [8]. Androgen excess is also hypothesized to underlie many of the comorbid metabolic dysfunctions experienced by patients such as insulin resistance, obesity, metabolic syndrome, type 2 diabetes (T2D), and cardiometabolic diseases (CMDs). However, as previous studies have shown, the genetic risk factors present in patients with metabolic manifestations of PCOS could differ from others who have primary presentations of reproductive dysfunction [9,10].
phs001672.v1.p1) and phs001672.v3.p1 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001672.v3.p1). The electronic health record data that support the findings of this study are available from Vanderbilt University Medical Center, but restrictions apply to the availability of these data, which were used under license for the current study. The data are only available from the institution with appropriate material transfer agreements or data use agreements and permission of Vanderbilt University Medical Center. The data in question must first be reviewed by the Integrated Data Access and Services Core to ensure that the deidentification is complete and no potentially identifying information remains.
PCOS is multifactorial and twin studies estimate heritability at 70% [11][12][13][14]. With an underlying polygenic architecture, multiple variants are hypothesized to be involved in the development of PCOS [15,16]. Furthermore, the variation of clinical features can be partially explained by ancestry informative markers, indicating population specific genotypes may be correlated with PCOS [17,18]. Despite the small effect size of individual common variants, aggregation of common risk variants together as a polygenic (or genetic) risk score (PRS) reflects the overall additive genetic liability to PCOS. This marker of disease risk is associated with PCOS diagnosis in multiple ancestries and offers many advantages to parsing out the genetic etiology of PCOS that is entangled with its comorbid presentations [11][12][13]. Furthermore, there is increasing evidence that a spectrum of clinical PCOS manifestations is also correlated with the PCOS PRS [11].
Therefore, in this study, we aimed to determine whether the PCOS PRS demonstrated pleiotropic associations with other health conditions in a hospital biobank population through a phenome-wide association study (PheWAS). We then tested the PCOS PRS in males and females separately, revealing the impact of PCOS-associated inherited genetic variation in males, even though clinical PCOS is only diagnosed in females. Guided by the results of the PheWAS, we performed additional analyses to determine whether genetic risk for associated cardiometabolic phenotypes also increases risk of PCOS. Lastly, we characterized the impact of body mass index (BMI) on the genetic relationship between PCOS and cardiometabolic comorbidities.

VUMC EHR-linked biorepository
This study used a retrospective cohort design including individuals who visited Vanderbilt University Medical Center (VUMC) between January 1990 and February 2020. VUMC is a tertiary care hospital in Nashville, Tennessee, with several outpatient clinics throughout Tennessee and the surrounding states offering primary and secondary care. Medical records have been electronically documented at VUMC since 1990, resulting in a clinical research database of over 3 million EHRs referred to as the Synthetic Derivative [19]. EHRs include demographic information, health information documented through International Classification of Disease, Ninth Revision (ICD9) and Tenth Revision (ICD10) codes, procedural codes (CPT), clinical notes, medications, and laboratory values. This information is linked with a DNA biorepository known as BioVU. Use of EHRs and genetic data for this study was approved by the Vanderbilt University Institutional Review Board (IRB #160279).

Genetic data
BioVU contains 94,474 individuals genotyped on the MEGA EX platform, which was designed with an increased number of variants found in diverse ancestries to improve genotyping coverage in these populations [20]. We applied a standard quality control pipeline which removed SNPs with low genotyping call rate (< 0.98) and individuals who were related (pi-hat > 0.2), had low call rates (< 0.98), sex discrepancies, or excessive heterozygosity (Fhet > 0.2). A principal component (PC) analysis (PCA) was performed on remaining individuals to determine genetic ancestry using FlashPCA2 [21]. BioVU genotyped samples were stratified by ancestral origin based on PCs herein referred to as the European (EUR) or African (AFR) ancestry dataset. Extended details for the quality control of these datasets have been described previously [22].
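The sample-level filters described above can be sketched as a simple predicate. This is an illustrative Python sketch only, not the pipeline actually used (which is described in [22]); the `Sample` record and its field names are hypothetical, and relatedness is simplified to dropping any sample whose highest pairwise pi-hat exceeds the threshold, whereas real pipelines usually retain one member of each related pair.

```python
from dataclasses import dataclass

# Thresholds quoted in the text above.
MIN_CALL_RATE = 0.98
MAX_PI_HAT = 0.2
MAX_F_HET = 0.2


@dataclass
class Sample:
    # Hypothetical per-sample QC summary; field names are illustrative.
    sample_id: str
    call_rate: float     # fraction of variants successfully genotyped
    max_pi_hat: float    # highest pairwise relatedness to any other sample
    f_het: float         # heterozygosity statistic (Fhet)
    sex_mismatch: bool   # reported sex disagrees with genetically inferred sex


def passes_sample_qc(s: Sample) -> bool:
    """Return True if the sample survives all four sample-level filters."""
    return (
        s.call_rate >= MIN_CALL_RATE
        and s.max_pi_hat <= MAX_PI_HAT
        and s.f_het <= MAX_F_HET
        and not s.sex_mismatch
    )


samples = [
    Sample("s1", 0.995, 0.03, 0.01, False),  # passes every filter
    Sample("s2", 0.970, 0.03, 0.01, False),  # fails call rate
    Sample("s3", 0.995, 0.45, 0.01, False),  # fails relatedness
]
kept = [s.sample_id for s in samples if passes_sample_qc(s)]
```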

Generation of polygenic risk scores (PRS) for PCOS
PCOS PRS were calculated with PRS-CS software using the weighted sums of the risk allele effects as reported in the summary statistics from the Day et al. GWAS of PCOS and applying a Bayesian continuous shrinkage parameter to select SNP features and to model linkage disequilibrium [16,23]. We calculated PCOS PRS for both EUR and AFR BioVU genotyped ancestry samples. We previously demonstrated the PCOS PRS is associated with a PCOS diagnosis defined by our previously published EHR-based algorithm [11]. PCOS PRS were trained on the full PCOS GWAS dataset (N SNPs-EUR = 777,507, N SNPs-AFR = 766,260) and the limited dataset that included only the most significantly associated 10,000 SNPs, due to the restrictions on data sharing when including participants from 23andMe (N SNPs-EUR = 1,210, N SNPs-AFR = 1,192).
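For intuition, the scoring step (a weighted sum of risk-allele effects) can be sketched as follows. This Python sketch assumes per-SNP weights have already been produced (for example, shrunken effect sizes from PRS-CS); the SNP identifiers, the weights, and the final standardization step are illustrative assumptions, not values or steps taken from the study.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages for one individual.

    dosages: dict mapping SNP id -> effect-allele count (0 to 2)
    weights: dict mapping SNP id -> per-allele effect size
    SNPs without a weight are skipped.
    """
    return sum(weights[snp] * d for snp, d in dosages.items() if snp in weights)


def standardize(scores):
    """Scale scores to mean 0 and sample standard deviation 1.

    Assumption: standardizing makes odds ratios interpretable per
    standard deviation of the score.
    """
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / (n - 1)) ** 0.5
    return [(s - mean) / sd for s in scores]


# Hypothetical weights and genotypes, for illustration only.
weights = {"rsA": 0.10, "rsB": -0.05, "rsC": 0.02}
person = {"rsA": 2, "rsB": 1, "rsD": 1}  # rsD has no weight and is ignored
score = polygenic_score(person, weights)
```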

Phenome-wide association study (PheWAS) of PCOS PRS
Next, we were interested in identifying the pleiotropic effects of the genetic susceptibility to PCOS on the medical phenome. Therefore, we analyzed the effects of PCOS PRS across 1,380 medical conditions using a PheWAS framework in which multivariable logistic regressions were performed with PCOS PRS as the predictor variable. This analysis was first performed in a sex-combined sample for our EUR and AFR-ancestry datasets. While males cannot be diagnosed with PCOS, they still harbor genetic risk for PCOS and thus were included. In the sex-combined logistic regression model, covariates included the median age of individuals across their medical record, sex, and the first ten principal components (PCs) estimated from genetic data to control for ancestry. Next, females and males were analyzed separately. In the sex-stratified models, covariates included median age across an individual's EHR and the first ten PCs.

Sex interaction analysis for phenotypes with significant PCOS PRS main effects
For each phenotype with evidence of a significant main effect of PCOS PRS in either sex (defined as false discovery rate [FDR] q < 0.05), we tested for two-way interactions (sex*PCOS PRS) to determine whether the phenome-wide PCOS PRS associations were influenced by biological effects of sex. These selected phenotypes included hypertension; essential hypertension; hypertensive heart disease; coronary atherosclerosis; ischemic heart disease; loss of teeth or edentulism; obesity; type 2 diabetes; diabetes mellitus; and overweight, obesity, and other hyperalimentation. For these interaction analyses, sex-combined models included the multiplicative effects of sex*PCOS PRS, main effects of sex, PCOS PRS, median age of individuals across their medical record, and the top ten PCs.
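The FDR q < 0.05 selection rule can be illustrated with a short Python sketch. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up method used here is an assumption, and the p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Illustrative p-values for four hypothetical phenotypes.
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
selected = [i for i, q in enumerate(qvals) if q < 0.05]
```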
Additionally, we tested for the modifying effects of sex on BMI for the same significant traits observed in the PheWAS analysis. This model included the multiplicative effects of sex*BMI, main effects of sex, median BMI of individuals across their medical record, PCOS PRS, median age of individuals across their medical record, and the top ten PCs.
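Written out, the two interaction models described in this section take roughly the following form. The notation is ours, introduced only for illustration: Y_i is the binary phenotype for individual i, and the coefficient symbols are not taken from the original analysis. The interaction coefficients (beta_3 and alpha_3) are the terms whose p-values are reported as p interaction.

```latex
\begin{align}
\operatorname{logit} P(Y_i = 1) &= \beta_0 + \beta_1\,\mathrm{PRS}_i + \beta_2\,\mathrm{sex}_i
  + \beta_3\,(\mathrm{sex}_i \times \mathrm{PRS}_i) + \beta_4\,\mathrm{age}_i
  + \sum_{k=1}^{10} \gamma_k\,\mathrm{PC}_{ik} \\
\operatorname{logit} P(Y_i = 1) &= \alpha_0 + \alpha_1\,\mathrm{BMI}_i + \alpha_2\,\mathrm{sex}_i
  + \alpha_3\,(\mathrm{sex}_i \times \mathrm{BMI}_i) + \alpha_4\,\mathrm{PRS}_i
  + \alpha_5\,\mathrm{age}_i + \sum_{k=1}^{10} \delta_k\,\mathrm{PC}_{ik}
\end{align}
```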

PheWAS sensitivity analyses
Several sensitivity analyses were performed to assess the robustness of the significant phenome-wide findings. First, we evaluated which phenotypes were independent of a PCOS diagnosis in females by accounting for PCOS diagnosis [11]. This analysis allowed us to identify true pleiotropic associations and provided insight into which phenotypes were exclusively correlated with genetic risk, even in the absence of PCOS.
BMI is strongly correlated with both PCOS and its comorbidities, and thus, can influence the strength of the results [24]. Therefore, we adjusted for BMI (median measurement across an individual's EHR) to test whether the significant phenotypes associated with PCOS PRS were independent of obesity-related effects. In addition to adjusting for BMI, the models were adjusted for median age, sex (only in the sex-combined sample), and the top ten PCs.
Finally, given that BMI has strong contributions from both genetic and environmental sources of variance, we specifically accounted for the environmental contribution to BMI. To do this, we calculated the residuals of individuals' median BMI (i.e., median measurement across the medical record) adjusted for BMI PRS (residuals(medianBMI ~ BMI PRS)) and used it as a covariate in a subsequent sensitivity analysis of the previously described PheWAS. This residual BMI variable is herein referred to as BMI residual.
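The construction of BMI residual can be sketched in a few lines. This Python sketch assumes a simple linear regression of median BMI on BMI PRS with an intercept and no further covariates, which is our reading of residuals(medianBMI ~ BMI PRS); the measurements and scores are invented for illustration.

```python
from statistics import median


def ols_residuals(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical BMI measurements per individual across their record.
bmi_records = [[23.5, 24.0, 24.5], [27.0, 27.0], [28.0, 29.0, 31.0], [35.0]]
median_bmi = [median(r) for r in bmi_records]

# Hypothetical standardized BMI PRS for the same four individuals.
bmi_prs = [-1.0, 0.0, 1.0, 2.0]

# By construction the residuals are uncorrelated with the BMI PRS.
bmi_residual = ols_residuals(median_bmi, bmi_prs)
```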

Publicly available summary statistics for cardiometabolic disorders (CMDs)
Genome-wide association study (GWAS) summary statistics were acquired for BMI (as a proxy for obesity), diastolic blood pressure, systolic blood pressure, pulse pressure, T2D, heart failure, and coronary artery disease (i.e., representing each of the significant phenotypic associations observed in the PheWAS models described below). Each GWAS was selected based on public availability, maximal sample size, and maximal sample diversity.
The Genetic Investigation of Anthropometric Traits (GIANT) consortium BMI summary statistics included 339,224 individuals of European and non-European ancestry [25]. Blood pressure traits (diastolic, systolic, and pulse) and type 2 diabetes (T2D) summary statistics were obtained from the Million Veteran Program (MVP), a large biobank consortium effort that houses biobank data from various sites in the Department of Veterans Affairs health system [26]. Blood pressure traits were generated from a trans-ethnic sample of over 750,000 individuals from MVP [27]. T2D summary statistics were generated from a meta-analysis using data from 1.4 million participants in various biobanks and consortia groups [28]. Heart failure summary statistics were collected from 47,309 cases and 930,014 controls of European ancestry across nine studies in the Heart Failure Molecular Epidemiology for Therapeutic Targets (HERMES) consortium [29]. Finally, coronary artery disease (CAD) datasets generated from the Coronary Artery Disease Genome-wide Replication and Meta-analysis plus The Coronary Artery Disease (CARDIoGRAMplusC4D) consortium were used as the genetic measurement for the coronary atherosclerosis phenotype [30]. This meta-analysis assembled 60,801 cases and 123,504 controls of multiple ancestries across forty-eight study sites.

Genetic correlation between PCOS and CMDs
Linkage Disequilibrium Score Regression (LDSC) was used to calculate genetic correlation, an estimate of genetic similarity, between traits [31]. LDSC only utilizes GWAS summary statistics and is not sensitive to sample overlap, which may be present across the publicly available GWAS datasets used in this study. This method utilizes the effect estimate of each SNP to estimate genetic correlation between traits while accounting for the effects of SNPs in linkage disequilibrium based on the GWAS reference population. The European reference panel was used for all analyses based on the demographic majority of each of the GWAS samples.
BioVU genotyped datasets were filtered to females and contained 365 PCOS cases and 6,597 controls in the EUR dataset and 149 PCOS cases and 2,182 controls in the AFR dataset. In brief, cases required PCOS billing codes for polycystic ovaries or irregular menstruation and hirsutism, and no exclusion codes (i.e., coded strict algorithm). Exclusions comprised codes for conditions that could affect menstrual cycles and mimic PCOS symptoms, such as Cushing's syndrome. For a complete list, see the supplementary material found in Actkins et al. 2020 [11]. Controls excluded individuals who had any inclusion or exclusion codes. The algorithmic positive predictive value was 96%, the sensitivity was 68%, and the specificity was 25% when compared to a gold standard of clinician-adjudicated chart review of 200 patients meeting a data floor and filter criteria [11]. Additional details regarding the selection of PCOS cases and controls in the VUMC EHR dataset have been described elsewhere [11].
Each CMD PRS was used as the independent variable in a logistic regression model with algorithmically defined PCOS diagnosis as the dependent variable. All models were adjusted for median age across each individual's medical record and the top ten PCs for each ancestry. BMI was included as a covariate in the sensitivity analysis.
Lastly, we adjusted for BMI residual, instead of clinically measured BMI, in the logistic regression models. A multiple testing correction of p < 1.16 × 10⁻³ (0.05/43) was implemented to account for all statistical tests in the PCOS PRS sex interaction analysis, BMI and sex interaction analysis, genetic correlation analysis, and CMD PRS regression analysis for EUR-ancestry females. All statistical analyses were done using R 3.6.0.
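The correction threshold above is a Bonferroni-style division of the significance level by the number of tests. A minimal Python sketch of the arithmetic follows (the authors worked in R; the function name is ours and the example p-value is arbitrary).

```python
ALPHA = 0.05
N_TESTS = 43

# Per-test significance threshold: 0.05 / 43.
threshold = ALPHA / N_TESTS


def passes_correction(p, n_tests=N_TESTS, alpha=ALPHA):
    """True if a p-value is below the multiple-testing threshold."""
    return p < alpha / n_tests


print(f"{threshold:.2e}")  # prints 1.16e-03
```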
No associations in the sex-stratified AFR-ancestry analysis reached statistical significance after correction for multiple testing (S1 Supporting Information).

Sex interaction analysis of significant PCOS PRS PheWAS associations
The sex interaction analysis demonstrated that males of EUR-ancestry with a high PCOS PRS were more likely to be diagnosed with hypertension (p interaction = 7.24 × 10⁻³), essential hypertension (p interaction = 7.71 × 10⁻³), and hypertensive heart disease (p interaction = 1.19 × 10⁻²) than females with the same PCOS PRS when an FDR threshold of 0.05 was implemented (Table 1). These sex differences were also observed when calculating the prevalence for each trait by decile of PCOS PRS (Fig 3). We also found that males were at greater risk of hypertensive heart disease (p interaction = 7.13 × 10⁻⁶) than females with the same BMI (Table 2). Together, these results indicate that sex is an important modifier of both PCOS genetic risk and BMI.
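The prevalence-by-decile summary mentioned above can be computed along the following lines. This Python sketch is an assumption about the general approach, not the code behind Fig 3: it ranks individuals by score, splits them into ten near-equal groups, and reports the fraction diagnosed in each. The toy data are invented, and the function expects at least ten individuals. In a sex-stratified comparison it would be run once per sex.

```python
def prevalence_by_decile(scores, diagnoses):
    """Fraction of diagnosed individuals within each PRS decile.

    scores: list of PRS values (at least ten individuals)
    diagnoses: list of booleans, True if the individual has the trait
    Returns ten prevalences, lowest-score decile first.
    """
    paired = sorted(zip(scores, diagnoses), key=lambda pair: pair[0])
    n = len(paired)
    prevalences = []
    for d in range(10):
        group = paired[d * n // 10:(d + 1) * n // 10]
        cases = sum(1 for _, diagnosed in group if diagnosed)
        prevalences.append(cases / len(group))
    return prevalences


# Toy example: 100 individuals, trait present only in the top decile.
toy_scores = list(range(100))
toy_dx = [s >= 90 for s in toy_scores]
toy_prev = prevalence_by_decile(toy_scores, toy_dx)
```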

Sensitivity analyses adjusting for PCOS case status, BMI, and BMI residual
In a separate sensitivity analysis (EUR-ancestry females only), we found that after adjusting for PCOS diagnosis, females with a high PCOS PRS still demonstrated a significant positive phenome-wide association with T2D and diabetes mellitus (S3 Fig). Next, we adjusted for median BMI in all of the EUR PCOS PRS PheWAS models. There were no surviving associations in the sex-combined model or stratified analyses (S3 and S4 Figs), suggesting BMI may mediate the pleiotropic effects observed via PCOS PRS. However, we observed almost no difference from the original results in the female stratified analysis when adjusting for BMI residual, the environmental component of BMI (S5 Fig). Indeed, T2D remained significantly associated with PCOS PRS in females (OR = 1.10, 95% CI = 1.05-1.14, p = 5.45 × 10⁻⁶), as did diabetes mellitus (OR = 1.09, 95% CI = 1.04-1.13, p = 2.78 × 10⁻⁵) when adjusting for the BMI residual. These two associations also remained in the BMI residual adjusted sex-combined results (S6 Fig). None of the associations passed Bonferroni correction in the male stratified analysis when adjusting for BMI residual (S5 Fig).

CMD PRS analysis of PCOS diagnosis
Outside of T2D and BMI, none of the PRS built for CAD, heart failure, or blood pressure traits were significantly associated with a PCOS diagnosis in females of either EUR or AFR-ancestry (Fig 4). BMI PRS was positively associated with PCOS diagnosis in EUR females (OR = 1.32, 95% CI = 1.18-1.47, p = 6.13 × 10⁻⁷) and this finding was replicated in the AFR-ancestry population (OR = 1.20, 95% CI = 1.00-1.44, p = 0.05). However, the BMI PRS was not significant after adjusting for clinically measured BMI. T2D PRS was also associated with PCOS diagnosis for EUR-ancestry females, again losing significance when conditioned on BMI (OR unadjusted = 1.16, 95% CI = 1.04-1.30, p = 6.75 × 10⁻³; OR adjusted = 1.09, 95% CI = 0.97-1.22, p = 0.16). To determine whether this reduction in effect size was due to the genetic correlation between T2D and BMI, we tested a model in which T2D PRS was covaried for BMI residual (Fig 5). This model indeed recovered the original association between T2D PRS and PCOS diagnosis for the EUR population (OR = 1.14, 95% CI = 1.02-1.28, p = 0.02), suggesting that genetically predicted BMI, not BMI residual, mediates the association between T2D PRS and PCOS diagnosis.
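As a reading aid, the reported odds ratios and confidence intervals can be related back to an approximate test statistic. The Python sketch below assumes Wald-type 95% intervals that are symmetric on the log-odds scale; because the published values are rounded to two decimals, the recovered p-values only approximate those reported above.

```python
import math


def z_and_p_from_or(odds_ratio, ci_low, ci_high):
    """Approximate Wald z and two-sided p from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    # Width of a 95% CI on the log scale is 2 * 1.96 standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    z = beta / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# BMI PRS and PCOS diagnosis in EUR-ancestry females, as reported above.
z_bmi, p_bmi = z_and_p_from_or(1.32, 1.18, 1.47)
```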

Discussion
First, through a comprehensive analysis of PCOS genetic risk across multiple phenotypes, we identified sex differences in the cardiometabolic traits associated with PCOS genetic risk. Among these, the most notable difference was that males with high PCOS PRS were at greater risk of cardiovascular conditions than females at the same level of genetic risk. Furthermore, only T2D PRS and BMI PRS were associated with PCOS diagnosis, indicating that many of the associations observed in the PCOS PRS PheWAS were primarily driven by PCOS genetic risk and not the genetic effects of the identified comorbidities. Second, BMI also had strong effects on many of these associations with genetically regulated BMI being a primary driver of the risk between PCOS and T2D (S7 Fig).
There is growing interest in studying PCOS-related effects in males, whether through their relationship to first-degree family members with PCOS or by an equivalent phenotype [32][33][34]. Generally, males with mothers or sisters with PCOS tend to exhibit a poorer cardiometabolic profile, which can be observed as early as infancy [33]. Previous studies suggest males who exhibit high genetic risk for PCOS are more likely to present with CMDs such as obesity, T2D, and diabetes mellitus, findings which were among the top associations for males in this study [12,35]. Our sex*PCOS PRS interaction analysis further showed that males with a high genetic risk for PCOS had an increased risk for hypertension compared to females. Additionally, the sex*BMI interaction analysis demonstrated that males had a greater likelihood of hypertensive heart disease compared to females with the same BMI. We found that associations with CMD phenotypes in males were largely accounted for by genetically predicted BMI, which can significantly increase the lifetime risk for CMD and mortality rates of high-risk individuals [36]. Genetic susceptibility for PCOS includes contributions from BMI and metabolic pathways. While the independent and sex-differential effects of BMI contribute to CMDs, these results also raise the hypothesis that sex hormones could play a role in the sex differential risk of CMDs conferred by the PCOS PRS. Together, these processes could be an additional catalyst for CMD events in males, making individuals already predisposed to adverse metabolic outcomes more vulnerable.
In an effort to determine whether genetic risk for phenotypes comorbid with PCOS could also increase risk for PCOS, we conducted separate multivariable logistic regressions with PRS for BMI, diastolic blood pressure, systolic blood pressure, pulse pressure, T2D, heart failure, and CAD on the PCOS diagnosis outcome and found no significant associations outside of T2D PRS and BMI PRS. As with T2D, we found that genetically predicted BMI is primarily responsible for the association between T2D PRS and PCOS, and for the reverse association between PCOS PRS and T2D diagnosis. Given that females with PCOS are more likely to develop T2D at an earlier age, PCOS patients with a family history of T2D and obesity may have an even greater risk of morbid outcomes [37]. For example, this genetic susceptibility could explain the high prevalence of insulin resistance in PCOS patients, which can be as high as 70% regardless of BMI status [38]. Poor outcomes can also be further exacerbated by the genetic predisposition to high BMI, which already increases the risk of PCOS and T2D as shown by Mendelian randomization studies [39,40]. Given that the environmental and genetically regulated variance of BMI can have differential effects on PCOS and resulting comorbidity risk, this study continues to underscore the importance of BMI in PCOS risk. Therefore, the repercussions of these effects should be investigated further, especially since the genetic and biological pathways could differ in lean PCOS patients who also experience a high rate of insulin resistance [41].
This study offers several strengths. First, we showed that cardiometabolic associations vary with sex and that the metabolic outcomes related to PCOS genetic architecture can be further understood by studying both males and females. Second, we decomposed EHR-measured BMI into genetically predicted (i.e., BMI PRS) and environmentally enriched (i.e., BMI residual) variance and evaluated their respective roles in mediating the cardiometabolic profiles associated with PCOS PRS. The study also has limitations. First, we had low power to detect significant associations in our African ancestry sample. To date, there is no PCOS GWAS of African ancestry individuals, limiting all current similar studies to building PRS using European-based genetic variants, which do not perform as well in non-European populations [11,12]. Second, we only examined one environmental risk factor. Although many effects such as lifestyle and diet can be captured through BMI, it is not an exhaustive measurement, nor does it accurately account for the full wellness of an individual [42,43]. Although other anthropometric features like waist-to-hip ratio (WHR) may be better indicators of health for some phenotypes [44], this information is not routinely collected in clinical settings or reported in EHRs. Furthermore, evidence does suggest that clinically ascertained BMI may be more informative for PCOS than WHR [39]. Finally, despite using the largest PCOS GWAS to date for this analysis, our PCOS PRS still only explains a small portion of PCOS genetic variance. As these analyses expand, so too will our ability to detect the full genetic spectrum of PCOS and its subphenotypes.
PCOS is a multifaceted disorder with genetic architecture that is reflective of its heterogeneous outcomes. This polygenic structure captures a spectrum of metabolic comorbidities that is even more apparent when compared between sexes. Our findings show that males with high PCOS liability are indeed a high-risk group and those with a family history of PCOS should be closely monitored for hypertension and other CMDs. This is also true for females with PCOS and a family history of T2D, whose genetic risk could induce more severe comorbid outcomes. As such, management and screening strategies should be updated to reflect advances in PCOS etiology. This call to action is paramount and requires both widespread dissemination of risk factor information to relevant stakeholders and increases in PCOS research priorities and funding. This becomes even more crucial as PCOS comorbidities are often under-recognized in clinical settings and metabolic features are not included in many PCOS screening methods [45][46][47][48].