Genome-Wide Association Study of Meiotic Recombination Phenotypes

Meiotic recombination is an essential step in gametogenesis, and is one that also generates genetic diversity. Genome-wide association studies (GWAS) and molecular studies have identified genes that influence human meiotic recombination. RNF212 is associated with the total or average number of recombination events, and PRDM9 is associated with the locations of hotspots, or sequences where crossing over appears to cluster. In addition, a common inversion on chromosome 17 is strongly associated with recombination. Other genes have been identified by GWAS, but those results have not been replicated. In this study, using new datasets, we characterized additional recombination phenotypes to uncover novel candidates and further dissect the role of already known loci. We used three datasets totaling 1562 two-generation families, including 3108 parents with 4304 children. We estimated five different recombination phenotypes, including two novel phenotypes (average recombination counts within recombination hotspots and outside of hotspots), using dense SNP array genotype data. We then performed gender-specific and combined-sex GWAS meta-analyses. We replicated associations for several previously reported recombination genes, including RNF212 and PRDM9. By looking specifically at recombination events outside of hotspots, we showed for the first time that PRDM9 has different effects in males and females. We identified several new candidate loci, particularly for recombination events outside of hotspots. These include regions near the genes SPINK6, EVC2, ARHGAP25, and DLGAP2. This study expands our understanding of human meiotic recombination by characterizing additional features that vary across individuals, and identifying regulatory variants influencing the numbers and locations of recombination events.

Several studies have examined variation in meiotic recombination in humans using either direct observation in gametes or inferring recombination based on family genotype data (Chowdhury et al. 2009; Kong et al. 2010, 2014; Fledel-Alon et al. 2011) or population-level data (Myers et al. 2005). Different studies have focused on different aspects of trait variation, such as the average number of recombination events, the location and frequency of recombination in different areas of the genome, and different patterns in males and females. The most commonly studied recombination phenotype is average recombination count (ARC) over multiple gametes in a single proband (parent). One gene, RNF212, has been conclusively shown to affect ARC (Coop et al. 2008; Kong et al. 2008, 2014; Chowdhury et al. 2009; Fledel-Alon et al. 2011). Kong et al. (2008) first reported the RNF212 gene in a GWAS conducted in an Icelandic population, and showed that specific SNPs in RNF212 have opposite effects on male and female recombination rates. This result was later replicated by other studies (Coop et al. 2008; Kong et al. 2008, 2014; Chowdhury et al. 2009; Fledel-Alon et al. 2011). In addition to specific genes, an inversion on chromosome 17q21.31 is also associated with female recombination rate (Kong et al. 2008, 2014; Chowdhury et al. 2009; Fledel-Alon et al. 2011). Other genes putatively associated with ARC include KIAA1462 in females, and UGCG and NUB1 in males (Chowdhury et al. 2009; Fledel-Alon et al. 2011), but these have failed to replicate in other studies (Fledel-Alon et al. 2011; Kong et al. 2014). Most recently, Kong et al. (2014) reported eight new variants (including two rare variants) associated with recombination in the Icelandic population. This latter study used methods based on long-range haplotyping uniquely applicable to the Icelandic population.
Because of the extremely large sample size, and the highly significant p values reported by Kong et al. (2014), it is likely that most or all of these are true positive associations, at least in this population, but they have not yet been examined in any other population (Kong et al. 2014).
Several studies have shown that, in addition to the total recombination rate, the location of recombination events is also under genetic control. Abnormal recombination location has been associated with improper chromosomal segregation (Kimura et al. 2006; Cheung et al. 2010). Based on historical population-based information as represented in patterns of linkage disequilibrium (LD), the frequency of recombination events is higher at some locations of the genome. These 1-2 kb areas of the genome are known as "hotspots" (Kauppi et al. 2004; Neale 2010). Hotspot areas may be determined by multiple factors, such as the presence of a particular motif in the hotspot regions, the presence of epigenetic factors, and trans-acting loci (Sandovici and Sapienza 2010).
PRDM9 has been shown in several recent studies to affect recombination within hotspots. Activity of various alleles of PRDM9 differs, thus genotype may affect genome-wide hotspot activity (Berg et al. 2010; Kong et al. 2010; Hinch et al. 2011; Segurel et al. 2011). The role of PRDM9 is not limited to human recombination hotspot usage. A recent study showed that PRDM9 is also involved with nonexchange gene conversion (Sarbajna et al. 2012). All of these findings suggest there are other unknown determinants that will add to our understanding of the mechanism of PRDM9 and its role in human recombination and hotspot usage.
The human consensus PRDM9 allele is predicted to recognize the 13-mer motif enriched at human hotspots, and PRDM9 is considered one of the major regulators of meiotic recombination hotspots (Yang et al. 2014); thus, the percentage of recombination near these motifs may show individual variability that is genetically determined, although the role of the motif is itself controversial (Kong et al. 2014). The motifs discovered from hotspot locations initially included 9-mer and 7-mer motifs, and were later extended to degenerate 13-mer motifs containing zinc finger-binding arrays (Myers et al. 2008; Yang et al. 2014).
The goals of our study are to find additional recombination genes, and to gain greater understanding of previously discovered genes. In particular, we consider new phenotypes related to hotspot usage to dissect further the genetic architecture of recombination control. We consider percent of recombination occurring in historical hotspots (HS_PCT), average count of recombination occurring in historical hotspots (HS_CNT), average count of recombination occurring outside of historical hotspots (NHS_CNT), and percentage of recombination occurring near the putative motif (MOTIF). The rationale for looking separately at recombination in and out of hotspots, and looking at hotspot recombination as both a percentage and a count, is that these different measures may add insight about the effects of genes. For example, if a variant increases recombination in hotspots but decreases recombination outside of hotspots, there may be a compensatory regulatory mechanism acting to keep total recombination constant. We studied all phenotypes separately in males and females, and also performed combined-sex analyses. Most previous studies of the ARC phenotype have found very different effects in males and females, while previous studies of hotspot phenotypes have shown similar effects in both sexes (Berg et al. 2010; Kong et al. 2014). In addition, we focus on the question of whether the genes discovered by Kong et al. (2014) are associated with recombination phenotypes in a European descent population, given that some of them are relatively rare variants in the Icelandic population.
Table note: Column 6 of the table represents the direction of the effect size of each SNP presented in column 2 in each study. In combined analysis, studies were included in the following order (GDCS female, GDCS male, AGRE female, and AGRE male). In female only analysis, first position in the direction column is for GDCS female, and the 2nd position is for AGRE female and same ordering is used in male only analysis and for the remaining phenotypes.

Study population and samples
This study included three populations: the Geneva Dental Caries Study (GDCS) (Shaffer et al. 2011), the Autism Genetic Resource Exchange (AGRE) (Weiss 2008), and the Framingham Heart Study (FHS) (Dawber et al. 1951). The GDCS and FHS samples were ascertained without regard to any particular phenotype. There is no known relationship between autism and meiotic recombination. The GDCS and AGRE samples were genotyped on the Illumina Human610-Quad Beadchip, and FHS samples were genotyped on the Affymetrix 5.0 chip. After quality control, the final analysis was limited to autosomes, and included a total of 551,227, 520,018, and 388,060 SNPs from the GDCS, AGRE, and FHS datasets, respectively.

Pedigrees
Two-generation nuclear pedigrees with two or more children were used for this study; 171 families came from GDCS, 737 from AGRE, and 654 from FHS. Genotype data on each family were used to score recombination in each parent. Quantitative measures of meiotic recombination in the parents were then used for the GWAS analyses.

Phenotypes
Recombination events in each parent of a nuclear family were called according to the method described in Chowdhury et al. (2009). Briefly, the method is as follows: first, the set of informative markers is identified in each family. A locus is informative if one parent is heterozygous and the other is homozygous. Among two or more children, one is taken as the reference child; in a sibling pair, a switch from one allele to the other in a particular parental haplotype along the chromosome indicates a recombination in the heterozygous parent. A recombination thus observed in a sibling pair cannot be assigned to a specific offspring, but we do not need to do so since we are calculating the recombination phenotype for the parent. When there are three or more siblings, recombinations observed in more than one pair can be resolved as described by Chowdhury et al. (2009) to correctly score recombination in the parent. To avoid spurious double-recombinants due to genotyping error, we required five or more consecutive markers to call each observed recombination event.
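The sibling-pair scoring described above can be illustrated with a minimal Python sketch. It is not the implementation of Chowdhury et al. (2009): the function names, the 0/1 encoding of allele sharing, and the way the five-marker rule is applied (runs of fewer than `min_run` informative markers are discarded as likely genotyping errors) are assumptions made for illustration only.

```python
from itertools import groupby


def sharing_states(ref_alleles, sib_alleles):
    """Per informative marker: 0 if the sibling inherited the same allele
    from the heterozygous parent as the reference child, otherwise 1."""
    return [0 if r == s else 1 for r, s in zip(ref_alleles, sib_alleles)]


def count_switches(states, min_run=5):
    """Count switches in sharing state along one chromosome.

    Assumption: a run shorter than `min_run` consecutive informative
    markers is treated as genotyping error and ignored, so a single
    discordant marker cannot create a spurious double recombinant.
    """
    # Run-length encode the state sequence.
    runs = [(state, len(list(group))) for state, group in groupby(states)]
    # Keep only runs supported by enough consecutive markers.
    kept = [state for state, length in runs if length >= min_run]
    # Each change of state between adjacent kept runs is one recombination.
    return sum(1 for a, b in zip(kept, kept[1:]) if a != b)


# Hypothetical chromosome: two well-supported switches plus one 2-marker blip.
example = [0] * 8 + [1] * 6 + [0] * 2 + [1] * 7 + [0] * 5
n_events = count_switches(example)
```

In this toy example the two-marker run is dropped, so only the switches flanked by at least five concordant markers are counted. Short runs at the chromosome ends are also dropped under this rule, which is a simplification.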
From the recombination data, we calculated five different recombination phenotypes: ARC, HS_PCT, HS_CNT, NHS_CNT, and MOTIF. A set of predefined historical hotspot regions identified by the HapMap project (International HapMap Consortium 2007) was used to calculate the three phenotypes related to hotspots: HS_PCT, HS_CNT, and NHS_CNT.

Genotypes, error checking, and data handling
For the GDCS dataset, 589,735 SNPs were released by the Center for Inherited Disease Research (CIDR). The AGRE dataset had 520,018 SNPs, and FHS had 388,060 SNPs available for analysis. To ensure data quality, extensive data cleaning was performed for these datasets. Full details of the data cleaning steps for GDCS can be found on the Geneva consortium website (https://www.genome.gov/27550876/). Detailed data cleaning steps for the AGRE and FHS datasets are presented in Chowdhury et al. (2009). Briefly, measures of identity-by-descent were used to verify relationships, SNP intensities of the X and Y chromosomes were used to verify gender, and principal component analysis (PCA) was used to summarize genetic ancestry. Two thresholds were applied to all SNPs: a Hardy-Weinberg disequilibrium cut-off of p < 0.0001, and a minimum minor allele frequency cut-off of <2%.
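As a rough illustration of the two SNP-level thresholds just described, the following Python sketch filters a SNP from its genotype counts. The text does not say which Hardy-Weinberg test was used, so the one-degree-of-freedom chi-square goodness-of-fit test here is an assumption, as are the function names and the reading that SNPs failing either threshold are excluded.

```python
import math


def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (assumes at least one genotype)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_aa, n_ab, n_bb, hwe_cutoff=1e-4, maf_cutoff=0.02):
    """Keep a SNP only if it is common enough and not out of HWE (assumed rule)."""
    return (maf(n_aa, n_ab, n_bb) >= maf_cutoff
            and hwe_chisq_p(n_aa, n_ab, n_bb) >= hwe_cutoff)
```

For example, genotype counts of (25, 50, 25) are in exact Hardy-Weinberg proportions and pass, whereas (50, 0, 50) show a complete heterozygote deficit and fail the HWE threshold.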

Genome-wide association studies
To identify genes or SNPs associated with different aspects of recombination, we conducted three genome-wide association studies for each phenotype: separate male and female analyses, and a combined-sex analysis. We used PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/) to conduct all GWAS using an additive genetic model. All of our phenotypes are continuous, so we used the linear regression option in PLINK for the association tests. We used p < 10^-7 as the threshold for genome-wide significance. We combined the AGRE and GDCS GWAS results using meta-analysis instead of combining all three datasets, because the AGRE and GDCS datasets were genotyped on the same platform (Illumina 610 chip), while the FHS dataset was genotyped on the Affymetrix 5.0 chip, which has a very different coverage profile. Because the Affymetrix 5.0 platform has very different coverage than the Illumina platform in a number of key regions, we did not impute genotypes, since imputation does not "fix" lack of coverage (Begum et al. 2012). We used fixed effects meta-analysis to combine the GDCS and AGRE datasets; this approach has been shown to perform very similarly to mega-analysis (directly combining datasets), but is slightly more robust to population differences in the phenotype (Lin and Zeng 2010; Sung et al. 2014). We performed GWAS meta-analysis for each gender separately, and also performed combined-sex GWAS meta-analysis, using the software METAL (Willer et al. 2010). We used R for most of the data analysis, and LocusZoom (Pruim et al. 2010) to plot the data for each genomic region. We then used the FHS dataset for qualitative replication in regions suggestive or significant in the meta-analyses.
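The two-stage analysis described in this section (per-cohort additive linear regression, then fixed-effects combination) can be sketched in a few lines of Python. This is a conceptual illustration and not the PLINK or METAL code path: METAL offers more than one weighting scheme and the text does not state which was used, so the inverse-variance weighting below is an assumption, as are the function names and the normal approximation for the combined p value.

```python
import math


def additive_regression(dosages, phenotypes):
    """Least-squares slope and standard error of a continuous phenotype
    on allele dosage (0, 1, or 2). Assumes n > 2 and a polymorphic SNP."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects combination of per-study effects."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    # One character per study, in input order, giving the sign of the effect.
    direction = "".join("+" if b > 0 else "-" if b < 0 else "0" for b in betas)
    return beta, se, z, p, direction


# Hypothetical effect estimates from two cohorts for one SNP.
beta, se, z, p, direction = fixed_effects_meta([0.5, 0.3], [0.1, 0.1])
```

The `direction` string plays the same role as the direction column in the result tables: one sign per contributing study, in a fixed order.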

RESULTS
Important characteristics of these datasets are summarized in Table 1.
The GDCS dataset has not been used previously in any published study of recombination. The AGRE resource was used in Chowdhury et al. (2009), but the dataset used here is larger, and was genotyped with a denser GWAS array. The FHS dataset used here is the same as that used in Chowdhury et al. (2009). In GDCS, a total of 421 children were used to score recombination for 171 male and 171 female meioses. Similarly, 1987 and 1858 children were used to score recombination in 736 male and 737 female meioses in AGRE, and 639 male and 654 female meioses in FHS, respectively. We used nuclear families with two or more children to score recombination for each of the parents. The p values from GDCS and AGRE were combined by meta-analysis for each sex individually, and for both sexes combined. FHS was then used as a replication dataset at the gene level.

GWAS for new recombination phenotypes
For each of the recombination phenotypes, we performed a GWAS in males (meta-analysis of AGRE and GDCS), a GWAS in females (similarly), and a GWAS combining both datasets for both sexes. The 579,043 SNPs overlapping between the GDCS and AGRE datasets are included in this meta-analysis. The most significant new results for each phenotype are presented below. We used two different cut-offs for statistical significance in our GWAS analyses: p < 10^-7 as genome-wide significant, and 10^-7 < p < 10^-5 as a suggestive signal. Following the new results, the subsequent section discusses replication of previously reported associations. In discussing replication of previously published results, we considered significance levels appropriate for candidate gene analyses. This is followed by a qualitative description of replication in the FHS dataset. The final section of results examines our associations across all five phenotypes in order to infer new information about RNF212 and PRDM9.
Average recombination count: ARCs for three different datasets are presented in Table 1. The ARCs for each of these studies, and the variation between males and females, are quite consistent with previous studies of human meiotic recombination (Chowdhury et al. 2009; Kong et al. 2010, 2014; Fledel-Alon et al. 2011). The distribution of the male and female average recombination counts per meiosis is presented in Figure 1. The top five most highly associated SNPs for all GWAS analyses of the ARC phenotype (male, female, and combined-sex) are listed in Table 2, which also includes nearby flanking genes for each region. In the male analysis, RNF212 was the most significant gene (p = 1.695e-08). Males and females have estimated effects in opposite directions, which is consistent with the previous literature. The Manhattan plot for the male-only analysis is presented in Figure 2, and the QQ plot of the same analysis is presented in Figure 3. Manhattan and QQ plots of the female meta-analysis, and the pooled meta-analysis results are presented in Supplemental Material, Figure S1, Figure S2, Figure S3, and Figure S4.
Percent of recombination occurring in hotspots: Distribution of the HS_PCT phenotype is presented in Figure S5. For the HS_PCT phenotype, the top signals for male only, female only, and combined-sex GWAS analysis are presented in Table 3. The strongest association (p = 1.20e-13) was with multiple SNPs in and near the PRDM9 gene in the combined-sex analysis (top SNP reported). In the separate male and female analyses, PRDM9 was also among the most statistically significant results. Manhattan plots and QQ plots for female and male are presented in Figure S6, Figure S7, Figure S8, and Figure S9. Figure 4 presents the Manhattan plot of the combined-sex analysis, with the QQ plot in Figure 5. It is notable that other regions showed similar levels of association as observed for PRDM9, particularly in males.
Average count of recombinants in hotspots: Our third phenotype was HS_CNT. Distribution of the HS_CNT phenotype is presented in Figure S10. Table 4 shows the top five hits for single-sex and combined-sex GWAS meta-analyses. Males showed a stronger effect of PRDM9 on HS_CNT than did females, the opposite of what was observed for HS_PCT. Other suggestive SNPs for HS_CNT had very little overlap with the suggestive SNPs for HS_PCT. Among the top hits for the male-only analysis of HS_CNT was RNF212, while the top hit in the combined analysis was in PRDM9. The top hit for the female-only analysis was in an intergenic region. Manhattan plots and QQ plots for HS_CNT are presented in Figure S11, Figure S12, Figure S13, Figure S14, Figure S15, and Figure S16.
Average count of recombinants in nonhotspot areas: In the analysis of recombination events outside of hotspots, we looked at NHS_CNT. Distribution of the NHS_CNT phenotype is presented in Figure S17. The top five SNPs from each analysis are presented in Table 5. In the combined-sex analysis, one of the SNPs (chr5: rs12186491) was genome-wide significant (p = 6.36E-08); this SNP is in the gene SPINK6, which encodes a serine protease inhibitor. The next most significant hit was in PRDM9. In female-only analysis, none of the SNPs were genome-wide significant. In male-only analysis, one SNP (chr4: rs10937651) in EVC2 showed genome-wide significance. EVC2 is a protein-coding gene related to bone formation and skeletal development, and is well known as causal for Ellis-van Creveld syndrome, which has clinical features including limb and facial abnormalities, and heart defects (D'Asdia et al. 2013; Kamal et al. 2013). The Manhattan plots and QQ plots are presented in Figure S18, Figure S19, Figure S20, Figure S21, Figure S22, and Figure S23.
Percent of recombination occurring near the motif: The distribution of the MOTIF phenotype is presented in Figure S24. As our last phenotype, we looked at the percent of recombination occurring near the 13-bp motif. Table S1 lists top hits from each analysis (female-only, male-only, and combined-sex). The Manhattan plots and QQ plots are presented in Figure S25, Figure S26, Figure S27, Figure S28, Figure S29, and Figure S30.
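To make the count- and percentage-based phenotypes analyzed in this section concrete, the following Python sketch derives ARC, HS_CNT, NHS_CNT, and HS_PCT for one parent from a list of called recombination intervals. The data layout, the rule that an event counts as "in a hotspot" when its interval overlaps any hotspot region, and the averaging over the number of scored meioses are all assumptions for illustration; MOTIF is omitted because the distance defining "near the motif" is not given in the text.

```python
def overlaps(event, hotspot):
    """True if two (chrom, start, end) intervals share at least one position."""
    chrom_e, start_e, end_e = event
    chrom_h, start_h, end_h = hotspot
    return chrom_e == chrom_h and start_e <= end_h and start_h <= end_e


def recombination_phenotypes(events, hotspots, n_meioses):
    """Per-parent phenotypes from called events.

    events    -- list of (chrom, start, end) recombination intervals
    hotspots  -- list of (chrom, start, end) historical hotspot regions
    n_meioses -- number of meioses (children) scored for this parent
    """
    total = len(events)
    in_hs = sum(1 for e in events if any(overlaps(e, h) for h in hotspots))
    return {
        "ARC": total / n_meioses,
        "HS_CNT": in_hs / n_meioses,
        "NHS_CNT": (total - in_hs) / n_meioses,
        # Percentage is undefined when no events were observed.
        "HS_PCT": 100.0 * in_hs / total if total else float("nan"),
    }


# Hypothetical parent with four events scored over two meioses.
hotspots = [("1", 1000, 2000), ("2", 500, 1500)]
events = [("1", 900, 1100), ("1", 5000, 6000), ("2", 1400, 1600), ("3", 100, 200)]
pheno = recombination_phenotypes(events, hotspots, n_meioses=2)
```

By construction HS_CNT and NHS_CNT sum to ARC, which is why opposite-signed genetic effects on the two counts can leave ARC unchanged.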

Replication of previously reported genes
Over the past decade, several studies have characterized meiotic recombination variation, and identified a handful of genes/loci associated with different aspects of recombination. We replicated the two most well-known genes (PRDM9 and RNF212).
In addition to PRDM9 and RNF212, the most recent study by Kong et al. (2014) nominated eight new loci as being associated with total recombination, including some rare variants. While they also examined recombination events within hotspots, they found no new evidence of association with hotspot recombination. Because of the enormous sample size used (35,927 parents, and 71,929 offspring), most of these loci were highly significant, and are likely to be true associations with recombination in the Icelandic population. However, these have not been examined in other populations. Table S2 qualitatively summarizes our results at the gene level for reported top hits from Kong et al. (2014). LocusZoom plots for selected loci are presented in Figure S31. Our sample size is much smaller than that of Kong et al. (2014), and our study population is from the United States (primarily of European ancestry), but we were able to see evidence of replication of several of their loci. Poor coverage limited our ability to replicate others. Though our analysis was limited to common markers, at the gene level we were able to replicate evidence for CPLX1 (p ~ 10^-7) and MSH4 (p ~ 10^-3), which carried rare variants in the data of Kong et al. (2014).
SNPs in the inverted segment on chromosome 17 showed consistent evidence of replication (lowest p ~ 10^-4) across three phenotypes (ARC, HS_CNT, and MOTIF) in females, but not in males, which is consistent with Kong et al. (2010, 2014). Different SNPs in the region were associated with different phenotypes, however. Selected LocusZoom plots for that region across phenotypes are presented in Figure S32, and the plot for HS_CNT female is presented in Figure 8A. Other previous GWAS of recombination have also reported several possible associations, including NUB1, UGCG, and a SNP (chr5: rs17542943) for male average recombination counts (Chowdhury et al. 2009; Fledel-Alon et al. 2011). Similarly, previously reported genes for female average recombination include PDZK1, KIAA1462, CRHR1, LRRC37A, OBSCN, and a SNP (chr9: rs10985535) (Chowdhury et al. 2009; Fledel-Alon et al. 2011). LocusZoom plots of these previously reported genes from our male and female analyses are presented in Figure S33 and Figure S34, respectively. In males, the UGCG gene replicated moderately (p = 1.34E-04), and others showed hints of replication. In females, only CRHR1 (p ~ 10^-4) and KIAA1462 (p ~ 10^-3) showed suggestive replication.
Figure caption (fragment): ... inversion region in three datasets. SNPs in the FHS dataset are in high LD compared to the two other datasets. The SNPs are color-coded according to the HapMap Phase II CEU LD pattern between SNPs (presented in the inset in the upper right corner). Known genes and orientation notes are plotted below the SNPs. HapMap recombination rates are shown with a blue line behind the SNPs. SNP coverage in the FHS dataset and the Illumina one million chip is noted by tick marks above the plot.

Replication of GDCS and AGRE study findings in FHS study
To support our GWAS meta-analysis findings in GDCS and AGRE, we examined regions of interest in the FHS dataset that included at least the top 10 significant SNPs from the fixed effect meta-analyses of GDCS and AGRE for each phenotype, and made a LocusZoom plot for each region in the FHS dataset, totaling around 150 plots. We compared male-only analysis with FHS male GWAS results, and female-only analysis with FHS female GWAS results. To compare combined-sex analysis, we combined FHS male and female analyses using fixed effect meta-analysis. Because the FHS dataset and the two other datasets examined here had limited SNP overlap, we performed this replication analysis at the gene level. We did not impute because imputation would not overcome the problem of significantly different coverage for the two chips. Since many of the SNPs/genes of interest were not among the top hits of the FHS dataset (for example, the top hits for the ARC phenotype in the FHS dataset are presented in Table S3), instead of presenting top hits for each phenotype for the FHS dataset, we extracted our SNPs/genes of interest from the FHS dataset and provide p values as well as LocusZoom plots. For the ARC phenotype, the only replication observed in the FHS dataset was for RNF212 in males (p ~ 10^-5; Figure 6). In males, a SNP near NAV2 (the fifth most significant SNP, rs1035699; Table 2) also showed p ~ 10^-5. Only three of the 11 most significant SNPs in AGRE/GDCS were genotyped in the FHS dataset. Among the eight other SNPs, two were tagged by SNPs in strong LD (0.8 < r^2 < 1.0) in FHS, and four were in medium to high LD.
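The notion of one SNP "tagging" another through LD can be illustrated with a short Python sketch that estimates r^2 between two SNPs from unphased genotype dosages. The squared genotype correlation used here is a common approximation to haplotype-based r^2 and is an assumption of this sketch; the text does not say how the LD values quoted above were computed (the LocusZoom plots rely on HapMap LD), and the function name is hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two dosage vectors (0, 1, 2).

    Returns NaN when either SNP is monomorphic in the sample.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


# Hypothetical dosages for an ungenotyped lead SNP and a candidate tag SNP.
lead = [0, 1, 2, 1, 0, 2]
tag = [2, 1, 0, 1, 2, 0]  # same information, opposite allele coding
r2 = genotype_r2(lead, tag)
strong_tag = 0.8 < r2 <= 1.0
```

Because r^2 ignores the sign of the correlation, a tag SNP coded on the opposite allele still counts as a perfect proxy.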
For our HS_PCT, HS_CNT, and NHS_CNT phenotypes, the PRDM9 gene was the center of interest. However, the FHS dataset showed no SNP in PRDM9 significantly associated with any of these phenotypes, owing to extremely poor coverage (see Figure 7).
For HS_CNT, a few of the top results from the AGRE/GDCS meta-analyses showed gene-level replication in FHS. In females, the fourth most significant SNP was in ARHGAP25. In the FHS dataset, several SNPs in ARHGAP25 showed p ~ 10^-4, and SNPs near SULF2 (the 10th most significant hit) showed p ~ 10^-3. In males, the second most significant hit was rs13378443 (nearby genes GPC5, GPC6). In the FHS dataset, SNPs near GPC5 showed p ~ 10^-3.
For the NHS_CNT phenotype, there were again some gene-level replications in the FHS dataset, including rs12186491 in SPINK6 (p = 1.1e-04), which is presented in Figure S35. In the male analysis, the most significant hit in the GDCS/AGRE meta-analysis was in EVC2, and the third most significant hit was near DLGAP2. In FHS, a nearby SNP in EVC2 showed p ~ 10^-3 (Figure S36), and a nearby SNP in DLGAP2 showed p ~ 10^-5.
We also looked at the previously reported genes from Kong et al. (2010, 2014), and others, in the FHS dataset. For the ARC phenotype, in males NUB1 (p ~ 10^-3), UGCG (p ~ 1.34e-04), and chr5: rs17542943 (p ~ 10^-4), and in females CRHR1 (p ~ 10^-4), KIAA1462 (p ~ 10^-3), LRRC37A (p ~ 10^-3), and PDZK1 (p ~ 10^-3), were well replicated in the FHS dataset. Among the previously reported genes/SNPs for the HS_PCT phenotype, only one SNP (chr18: rs1864309) was replicated, with p ~ 10^-3. The FHS dataset also showed replication (p ~ 10^-5) of the association between the chromosome 17 inversion and the ARC phenotype in females, as presented in Figure 8B. A group of SNPs in strong LD across that 900 kb region showed association with the ARC phenotype in females.

Further dissection of PRDM9 and RNF212
To gain insight into the roles of the previously reported genes influencing recombination rates, we looked at our association results across all five phenotypes.
PRDM9: The PRDM9 gene association results for different phenotypes are presented in Table 6. We selected the three SNPs with the lowest p values in our study, and examined their p values across all other phenotypes. PRDM9 showed no evidence of association with the average recombination count and MOTIF phenotypes. In combined analysis, PRDM9 SNPs are significantly associated with HS_PCT (p < 10^-13), and also with HS_CNT. PRDM9 SNPs are associated with HS_PCT in both males and females, with similar effect sizes. The male and female effect sizes are also similar for HS_CNT, although the p values were smaller in males. NHS_CNT showed much stronger association (both p value and effect size) in females than in males. Notably, the effect sizes for HS_CNT were in the opposite direction of those for NHS_CNT, suggesting that these PRDM9 variants are in some sense shifting recombination out of nonhotspot areas, and into hotspot areas, particularly in females. Equivalently, this can be seen as evidence of the existence of a compensatory mechanism that keeps total recombination relatively constant as PRDM9 increases or decreases hotspot recombination (again primarily in females).
RNF212: Table 7 presents the RNF212 association p values across all phenotypes, though it is primarily associated with the ARC phenotype. Females show no association with RNF212 for any phenotype. In males, RNF212 SNPs show association with HS_CNT (with p ~ 10^-5) but not with NHS_CNT, and only slight association with HS_PCT.

DISCUSSION
The goal of this work was to expand our understanding of genetic control of meiotic recombination, finding new recombination genes and more information about already known genes by analyzing new datasets and new phenotypes, particularly phenotypes involving recombination in and out of recognized hotspot regions, and to ask whether the recently discovered recombination genes in the Icelandic population also show association in a United States population.
With regard to the most well-established recombination genes, RNF212 and PRDM9, our results provide new insight into recombination differences between males and females. RNF212 is well known to affect total recombination, particularly in males, and PRDM9 is similarly conclusively associated with recombination in hotspots in both males and females, but recombination outside of hotspots has not previously been studied specifically. Kong et al. (2014) showed that markers in PRDM9 are associated with total recombination in males but not females. This suggests that females might have a compensatory mechanism, such that increased recombination in hotspots is balanced by decreased recombination elsewhere. Our results provide further evidence for this hypothesis. In females, we observed that PRDM9 was associated with both HS_CNT and NHS_CNT, but with effects in opposite directions, which is exactly what would be expected if the hypothesized compensatory mechanism existed. In males, we observed an effect of PRDM9 only on HS_CNT, not NHS_CNT, consistent with the lack of the compensatory mechanism in males. We also observed that markers in RNF212 are associated with HS_CNT but not NHS_CNT in males, which is again consistent with the idea that males lack such a regulatory mechanism. While far from proof of any hypothesis, these results raise important questions that could be explored further in larger datasets.
We nominated several potential new recombination genes, including a SNP on chromosome 5 (rs12186491) in the protein-coding gene SPINK6, a serine protease inhibitor, in combined-sex analysis with p = 6.36E-08. Another SNP of interest is chr4: rs10937651, with p = 5.16E-08, in the protein-coding gene EVC2, which showed genome-wide significant association with recombination outside of hotspots in males. Two other genes showed lesser statistical significance in our GWAS but replicated in the FHS dataset: ARHGAP25 (associated with female HS_CNT), and DLGAP2 (associated with male NHS_CNT). ARHGAP25 plays a role in actin remodeling, cell polarity, and cell migration (Katoh and Katoh 2004). DLGAP2, which was associated with recombination in males in our study, is an imprinted gene that is highly expressed in the testes (Luedi et al. 2007).
This was also the first study to attempt to replicate the genes found by Kong et al. (2014) in the Icelandic population. We conducted our replication at the gene level, in consideration of the significant population and chip differences. We clearly replicated the association near CPLX1 and GAK on chromosome 4 in females. We also replicated their findings on chromosome 14 near SMEK1 for female recombination. Another association on chromosome 14 from Kong et al. (2014) was near C14orf39 in females; we detected only a small signal in females, but a strong association (p < 10^-6) in males, a new result that may reflect differences between the Icelandic and United States populations. Other associations from Kong et al. (2014) were not replicated in our study, primarily in regions in which our study had poor coverage, or in which the associated variant in Kong et al. (2014) was rare. In that sense, we replicated all of the Kong et al. (2014) results that we could have expected to, which supports the conclusions of most literature to date that recombination genes tend to have consistent effects across populations.