Exchange of genetic information between therian X and Y chromosome gametologs in old evolutionary strata

Abstract
Therian X and Y sex chromosomes arose from a pair of autosomes. Y chromosomes consist of a pseudoautosomal region that crosses over with the X chromosome and a male‐specific Y‐chromosomal region that does not. The X chromosome can be structured into “evolutionary strata”. Divergence of X‐chromosomal genes from their gametologs is similar within a stratum, but differs among strata, likely caused by a different onset of suppression of crossing over between gametologs. After stratum formation, exchange of information between gametologs has long been believed absent; however, recent studies have shown limited exchange, likely through gene conversion. Herein we investigate exchange of genetic information between gametologs in old strata that formed before the split of Laurasiatheria (cattle) from Euarchontoglires (primates and rodents) with a new phylogenetic approach. A prerequisite for our test is an overall preradiative topology, that is, all X‐chromosomal gametologs are more similar among themselves than to Y‐chromosomal sequences. Screening multiple sequence alignments of the coding sequences of genes from cattle, mice, and humans identified four genes, DDX3X/Y, RBMX/Y, USP9X/Y, and UTX/Y, exhibiting a preradiation topology. Applying our test, we detected exchange of genetic information between all four X and Y gametologs after stratum formation.

of (1) a pseudo-autosomal region (PAR), where crossover between the sex chromosomes is still possible and (2) a male-specific Y-chromosomal region (MSY) in which recombination is suppressed.
The lack of recombination between parts of the X and Y chromosomes has resulted in the differentiation of allelic variants on the X and Y chromosomes, that is, the differentiation of gametologs (Garcia-Moreno & Mindell, 2000); for a review, see, for example, Ellegren (2011).
Strikingly, the therian Y chromosome is degraded, such that it contains only a few functional genes but many repetitive sequences, whereas gene content and repetitiveness of the X chromosome are similar to those of autosomes. Genes on the MSY are thought to degenerate (Graves, 2006) due to:
1. a higher mutation rate: The microenvironment of the Y chromosome in the testes boosts mutations due to a lack of repair enzymes, oxidative stress, and a high number of cell divisions.
2. the inefficiency of selection on the Y chromosome due to lack of recombination: Recombination generates variance in fitness, which provides opportunity for selection, whereas genetically linked regions are inherited (and selected for) as a whole. Deleterious mutations then cause gradual degeneration of the linked genes on the Y chromosome (Muller's ratchet).
As a result, most genes on the therian Y chromosome have been lost completely during evolution, and some have remained only as pseudogenes, that is, dysfunctional DNA sequences that are no longer expressed. Several preserved Y-chromosomal genes with homologs on the X chromosome have acquired a male-specific function. The high proportion of Y-chromosomal genes with male-specific function can be explained by strong selection preventing degeneration. Some genes on the Y chromosome have counterparts on the X chromosome that escape X chromosome inactivation in females (Lahn & Page, 1997). As these genes are thus expressed on both chromosomes in females, selective pressure on the Y-chromosomal gene may act to retain the proper gene dosage also in males. Stabilizing selection on the male copy may then prevent degeneration (reviewed in Ellegren, 2011; Bachtrog, 2013). Additionally, other male-advantage genes with no known X homolog appear to have been transposed to the Y chromosome from autosomes (reviewed in Graves, 2006).

| Evolutionary strata
Mechanisms inhibiting recombination likely involved chromosomal rearrangements (especially inversions) (Lahn & Page, 1999; Lemaitre et al., 2009) and occurred in successive evolutionary events, each suppressing recombination between X and Y chromosomes in a certain region, or "evolutionary stratum." Gametologs in older strata show higher divergence. Four such evolutionary strata have been identified in human sex chromosomes by Lahn and Page (1999); later studies differentiate five (Ross et al., 2005) or even nine (Pandey, Wilson Sayres, & Azad, 2013). The position of the strata on the X chromosome corresponds to the timing of the stratum formation. The position of gametologs on the Y chromosome generally does not correspond to the strata on the X chromosome, which indicates that rearrangements on the Y chromosome might indeed have been responsible for the inhibition of recombination (Lahn & Page, 1999). Note, however, that Katsura and Satta (2012) show that the pairwise divergence times of genes in strata one and two differ little between marsupials and eutherians. Graves and colleagues (Graves, 1995, 2006) showed that the third and fourth strata of Lahn and Page (1999), located on the short arm of the eutherian X chromosome, arose through a translocation from an autosome (see Figure 1). As this translocation is present in all eutherians but not in marsupials, it must have occurred in the ancestor of extant eutherian mammals after divergence from the metatherians, that is, between about 180 and 100 MYA. Because of this chromosomal rearrangement, the eutherian X and Y chromosomes should be considered neo-X and neo-Y chromosomes. Thus, the boundary between strata two and three also corresponds to the boundary of a chromosomal rearrangement.
FIGURE 1 Schematic representation of the strata of a eutherian X chromosome. Numbers 1-4 represent the strata according to Lahn and Page (1999), "ancestr-X" refers to the original X chromosome, "transl-X" to the translocated region of eutherians, and "PAR" to the pseudo-autosomal region. The genes considered in this article are indicated at their relative positions.

| Genetic exchange between the sex chromosomes
Presently, only the PAR exhibits crossovers between the X and Y chromosomes. In the PAR, gametologs are nearly identical, which is necessary for pairing and proper segregation of the sex chromosomes during meiosis (Burgoyne, 1982). The borders of the PAR, that is, the pseudo-autosomal boundaries, sometimes seem to be displaced during evolution by attrition, but details remain unclear (Van Laere, Coppieters, & Georges, 2008). In contrast to the PAR, the MSY is generally highly differentiated from the X chromosome. However, limited genetic exchange is apparently still possible, most likely through gene conversion.
Gene conversion may occur during the repair of double-strand breaks when only the gametolog is available as a template (Trombetta et al., 2014). As only little exchange of genetic information is necessary to prevent the rapid degeneration of the Y chromosome (Pecon Slattery et al., 2000), these gene conversion events challenge the current models of Y-chromosome evolution presented above.
In this article, we investigate evolutionarily old strata, in which crossover had already been suppressed before the split of the three taxa we investigate: cattle (of the order Artiodactyla), mice (of the order Rodentia), and humans (of the order Primates). Aligned sequences of genes in these strata generally show site patterns consistent with a preradiation topology with respect to the split between, on the one hand, Laurasiatheria (including artiodactylans) and, on the other, Euarchontoglires (including primates and rodents) (see subsection 2.3 for details). This split dates to about 84 MYA and the split between Euarchonta and Glires to about 76 MYA (dos Reis et al., 2012). Note that a gene conversion within rodents or primates must have occurred at least 8 million years after stratum formation. Our aim is to detect evidence of gene conversion that occurred after the inhibition of crossover in these strata. We screened all available genes; however, only a few functional genes have remained on the Y chromosome, and in rodents only a handful. Indeed, some rodents have entirely lost their Y chromosome (reviewed in Graves, 2006). Furthermore, sequencing and annotation of the Y chromosome are difficult because of the largely repetitive Y-chromosomal sequence, such that reliable Y-chromosomal sequences are available from only a few species.
We aligned MSY sequences, as frequencies of substitution site patterns of homologous sites may provide information about evolution. With a variant of the "co-double method" (Balding, Nichols, & Hunt, 1992), we compare frequencies of substitution site patterns that (1) can only arise by double substitutions with those that (2) may arise by either a double substitution or a single substitution and subsequent gene conversion between gametologs. An excess of the latter indicates exchange of information between gametologs after stratum formation. A prerequisite for this test is an overall preradiative topology, that is, all X-chromosomal sequences are more similar among themselves than to Y-chromosomal sequences. Statistical power for our study mainly comes from the number of sites, rather than from the number of species studied. Adding more species would have reduced the number of sites or maybe even the number of genes further. We screened all genes in the MSY that could be aligned in these three well-studied taxa and the outgroup platypus.

| Gene sequences and multiple sequence alignment
According to our screening of the databases, only five genes potentially satisfy our criteria: RBMX/Y, DDX3X/Y, USP9X/Y, UTX/Y, and ZFX/Y. Coding sequences of these genes in humans (Homo sapiens), mice (Mus musculus), and cattle (Bos taurus) as well as the coding sequence of the respective homologs in platypus (Ornithorhynchus anatinus) were obtained from http://www.ncbi.nlm.nih.gov (Table 1). The locations of these genes in strata according to Lahn and Page (1999) and Pandey et al. (2013) are given in Table 2, along with information on X-inactivation in humans and mice. Further information on the location of the genes on the X and Y chromosomes, on paralogs, evolution, function, and expression is given in Data S1.
For all five genes, multiple sequence alignments (MSAs) of all seven sequences were produced using T-coffee (Notredame, Higgins, & Heringa, 2000). The outgroup species platypus was used for polarization of the states. For further investigation, we only used sites on which the MSA consists of exactly two different variants, because these are likely not hypermutable, as is plausible for sites which include three different nucleotides. A binary site pattern of, for example, 10 10 10 suggests a preradiation topology and a single substitution on the X gametolog after the split from the Y gametologs, but before the split of Laurasiatheria (cattle) from Euarchontoglires (primates and rodents). A binary site pattern of, for example, 00 10 10 suggests a single substitution on the X gametolog on the stem connecting humans and mice, that is, after the X gametologs split from the Y gametologs and after the phylogenetic split of cattle from the other two species, but before the split of primates from rodents.
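The site filtering and polarization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys, the function name `binary_site_patterns`, and the ordering of the pattern (cattle X/Y, human X/Y, mouse X/Y, inferred from the examples in the text) are ours, and state 0 denotes the nucleotide shared with the platypus outgroup.

```python
# Hypothetical ordering of the six ingroup sequences within a site pattern,
# inferred from the examples in the text: cattle X/Y, human X/Y, mouse X/Y.
ORDER = ["cattle_X", "cattle_Y", "human_X", "human_Y", "mouse_X", "mouse_Y"]


def binary_site_patterns(alignment, outgroup="platypus"):
    """Return (column index, pattern) for every column with exactly two variants.

    `alignment` maps sequence names to aligned strings of equal length.
    State 0 is the outgroup nucleotide, state 1 the other variant.
    """
    patterns = []
    for i, ancestral in enumerate(alignment[outgroup]):
        column = [alignment[name][i] for name in ORDER]
        states = set(column) | {ancestral}
        if not states <= set("ACGT"):
            continue  # skip gaps and ambiguous characters
        if len(states) != 2:
            continue  # keep only sites with exactly two different variants
        bits = "".join("0" if c == ancestral else "1" for c in column)
        patterns.append((i, f"{bits[0:2]} {bits[2:4]} {bits[4:6]}"))
    return patterns


# Toy alignment (invented for illustration, not real data).
toy = {
    "cattle_X": "AGA", "cattle_Y": "AAA",
    "human_X":  "AGT", "human_Y":  "AAC",
    "mouse_X":  "AGA", "mouse_Y":  "AAA",
    "platypus": "AAA",
}
result = binary_site_patterns(toy)
```

In the toy alignment, the first column is invariant and the third carries three nucleotides, so only the second column is retained, with the pattern 10 10 10.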

| Binary site patterns
We code such a site pattern, together with a substitution on the cattle branch, for example, the site pattern 10 00 00, as bovine and stem substitutions on the X chromosome, that is, as bsx, and the corresponding substitution site pattern on the Y chromosome as bsy (see Table 3).
The linkage structure may also provide information about underlying processes. We thus looked for autocorrelation among site patterns along the alignment. In particular, we compared site patterns that suggest a gene conversion along the rodent branch, that is, 00 00 11, to the theoretical expectation if there is no autocorrelation, that is, the geometric distribution, using Q-Q plots (R Core Team, 2015). We chose the rodent branch, because it is the only branch with enough substitutions that makes detection of deviation likely. As we observed little autocorrelation, we treated each site of the MSA independently.
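As a rough illustration of this comparison, the following Python sketch computes the spacing between successive occurrences of a focal pattern and pairs the sorted spacings with quantiles of a geometric distribution, which is what a Q-Q plot displays. The original analysis was done in R; the interpretation that the spacings are the compared quantity, the plotting positions (i + 0.5)/n, and the estimate of the success probability as the reciprocal of the mean spacing are all assumptions made for this sketch.

```python
import math


def gaps_between_hits(flags):
    """Distances (in sites) between successive sites carrying the focal pattern."""
    hits = [i for i, flag in enumerate(flags) if flag]
    return [b - a for a, b in zip(hits, hits[1:])]


def geometric_qq(gaps):
    """Pair theoretical geometric quantiles with the sorted observed gaps.

    Assumption: gaps follow a geometric distribution on 1, 2, 3, ... if
    occurrences are independent along the alignment; p is estimated as
    1 / mean(gap).
    """
    n = len(gaps)
    if n == 0:
        return []
    p = n / sum(gaps)
    observed = sorted(gaps)
    if p >= 1.0:
        expected = [1] * n  # every gap equals 1
    else:
        expected = [
            max(1, math.ceil(math.log(1 - (i + 0.5) / n) / math.log(1 - p)))
            for i in range(n)
        ]
    return list(zip(expected, observed))


# Toy indicator vector: 1 marks a site with the focal pattern (e.g. 00 00 11).
flags = [1, 0, 0, 1, 1, 0, 1]
gaps = gaps_between_hits(flags)
pairs = geometric_qq(gaps)
```

Points close to the diagonal in a plot of these pairs would be consistent with little autocorrelation; clustering of sites would show up as an excess of short gaps.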
Similarly, function may constrain evolution. We therefore compared the synonymous/nonsynonymous and transition/transversion rates between different groups of patterns for all genes, using chi-square tests.
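A chi-square comparison of this kind can be sketched for a 2 x 2 table, for example synonymous versus nonsynonymous substitutions in two groups of patterns. The function name and the counts below are invented for illustration, and no continuity correction is applied, which may differ from the settings used in the original analysis.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 df) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = pattern groups, columns = substitution classes.
chi2_equal, p_equal = chi_square_2x2([[10, 20], [20, 40]])
chi2_diff, p_diff = chi_square_2x2([[10, 0], [0, 10]])
```

With identical proportions in both rows the statistic is zero; with very low counts, as reported for nonsynonymous substitutions, such a test has little power.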
TABLE 2 Characteristics of genes: stratum according to Lahn and Page (1999) (LP99) and Pandey et al. (2013) (PAND13); preradiation (PRR) versus postradiation phylogeny; X-inactivation in humans (XIH) and mice (XIM); and the copy number of the gene in the whole genome (see also Data S1)

| Test for gene conversion
For the co-double method, the divergence between X and Y gametologs must precede the earliest relevant phylogenetic split between Laurasiatheria (cattle), on the one hand, and Euarchontoglires (humans and mice), on the other (Figure 2). The orthologs on the X chromosome should thus be more closely related to each other than to their respective gametologs on the Y chromosome, that is, the gene topology should be consistent with a "preradiation topology" (Wilson & Makova, 2009). For each gene, we constructed a phylogenetic tree with the neighbor-joining method (Saitou & Nei, 1987) and evaluated consistency with a preradiation topology using the binary site patterns and a parsimony principle.
The proposed test compares the frequencies of substitution site patterns that (1) can only arise by double substitutions with those that (2) may arise by either a double substitution or a single substitution and subsequent gene conversion between gametologs. The reasoning is as follows: Given a preradiation topology and in the absence of gene conversion, for some site patterns double substitutions need to be assumed, for example, for 00 11 00 or 01 10 11. The first site pattern 00 11 00 can also be produced by a single substitution of either the X- or the Y-chromosomal nucleotide of humans and a gene conversion within the human lineage.
In contrast, the second site pattern can only be produced by double substitutions, because after the split of the Laurasiatheria from the Euarchontoglires, the X-chromosomal gametolog of cattle and the Y-chromosomal gametolog of humans were never in the same organism.
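The distinction between the two kinds of site patterns can be made concrete with a small Python sketch. It covers only the simplest case, in which one X-chromosomal and one Y-chromosomal sequence share the minority state, and it is not a reproduction of the full classification used in the article. The labels "within" and "among", the function name, and the pattern ordering (cattle X/Y, human X/Y, mouse X/Y) are assumptions for illustration.

```python
def classify_pattern(pattern):
    """Classify a binary site pattern such as '00 11 00'.

    Returns 'within' if the X and Y gametologs of the same species share the
    minority state (double substitution, or single substitution plus gene
    conversion), 'among' if an X and a Y from different species share it
    (double substitution only), and 'other' for all remaining patterns.
    """
    bits = pattern.replace(" ", "")
    if len(bits) != 6 or set(bits) - {"0", "1"}:
        raise ValueError(f"not a binary six-state pattern: {pattern!r}")
    ones = bits.count("1")
    if ones not in (2, 4):
        return "other"
    minority = "1" if ones == 2 else "0"
    first, second = [i for i, b in enumerate(bits) if b == minority]
    # Even positions hold X gametologs, odd positions hold Y gametologs.
    if first % 2 == second % 2:
        return "other"  # both X or both Y: not informative here
    return "within" if first // 2 == second // 2 else "among"


examples = {p: classify_pattern(p) for p in ["00 11 00", "01 10 11", "10 10 10"]}
```

For a given X-chromosomal substitution, an independent second substitution on a Y chromosome can fall in the same species or in either of the two other species, which is one way to see why "within" and "among" patterns are expected at a ratio of 1:2 when only double substitutions occur.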
The substitution site patterns of the gametologs with the most parsimonious double substitutions are summarized in ...
... is shown next to the nodes. The trees are drawn to scale, and the branch lengths (shown next to the branches) represent the number of base substitutions per site. The evolutionary distances were computed using the maximum composite likelihood method (Tamura, Nei, & Kumar, 2004).

| RESULTS
The gene pairs DDX3X/Y, RBMX/Y, USP9X/Y, and UTX/Y from cattle, mice, and humans showed a preradiation topology, when phylogenies were inferred using neighbor joining (Figure 3). The gametologs ZFX/Y showed an unclear topology and were excluded from further studies.
Note that in most cases, the branches of Y gametologs are longer, which reflects the higher mutation rates on Y chromosomes. We note that no differences between transitions and transversions in within- and among-lineage site patterns could be observed (Table 5). Likewise, proportions of within- and among-lineage patterns between synonymous and nonsynonymous substitutions were similar, but numbers of nonsynonymous substitutions were too low for meaningful tests (Table 6).
We thus lumped all data for the analyses with the variant co-double method. The frequencies of within- and among-lineage substitution site patterns were compared with a one-sided test to the proportion of 1:2, which is expected assuming only double substitutions (Table 4). For all four genes, the expected proportion was rejected (p < .05). (Note that a two-sided chi-square test for UTX/Y would be just nonsignificant with p = .05004.) We thus accepted the alternative hypothesis that information was exchanged between the X and Y gametologs, that is, that gene conversion contributed to the observed site patterns. The frequencies of the site patterns differ among genes (Table 7).
Of the four genes investigated herein, Wyckoff, Li, and Wu (2002) already noted gene conversion between gametologs of RBMX/Y, located in the old stratum 2 (Pandey et al., 2013), but did not present evidence for it. Previous studies (see Section 1) showed gene conversion between gametologs in younger strata (seven to nine according to Pandey et al., 2013). Katsura and Satta (2012) provide evidence for gene conversion in the eutherian lineage between SMCX/Y and UBE1X/Y, located in the old stratum six (Pandey et al., 2013).
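The one-sided comparison of within- and among-lineage pattern counts with the 1:2 proportion described in this section can be illustrated as follows. The text does not state which test statistic was used, so the exact binomial test below, with an expected within-lineage fraction of 1/3, is an assumption for this sketch; the function name and the counts are hypothetical and are not the values of Table 4.

```python
from math import comb


def one_sided_binomial_p(within, among, p_within=1 / 3):
    """P(X >= within) for X ~ Binomial(within + among, p_within).

    A small p-value indicates an excess of within-lineage patterns over the
    1:2 proportion expected if only double substitutions occurred.
    """
    n = within + among
    return sum(
        comb(n, k) * p_within**k * (1 - p_within) ** (n - k)
        for k in range(within, n + 1)
    )


# Hypothetical counts for illustration only.
p_excess = one_sided_binomial_p(within=3, among=0)
p_expected = one_sided_binomial_p(within=10, among=20)
```

Counts that match the 1:2 proportion, as in the second call, give a large p-value, whereas an excess of within-lineage patterns gives a small one, in line with the alternative hypothesis of gene conversion.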

| DISCUSSION
We note that exchange of genetic information without recombination has also been observed in other contexts, for example, in allopolyploid plants with disomic inheritance (Ma, Li, Vogl, Ehrendorfer, & Guo, 2009). It thus seems to be more frequent than commonly thought.
We do not infer rates of gene conversion and mutations, which would be a harder problem. We also do not include information on the positions of potential gene conversion sites, which has been used to test for gene conversion (Sawyer, 1989), because little autocorrelation along alignments was evident, such that inference of tract lengths seemed impossible. Our test would not work if gene conversion rates were so high that the majority of sites were affected, because then the overall tree topology would no longer be preradiative (this might have been the case with ZFX/Y). With our approach, it is impossible to infer the original mutation, and thus impossible to infer the direction of gene conversion, that is, either from the X to the Y chromosome or the reverse. Furthermore, the power of a test is limited by the sample size, which corresponds to the number of sites per gene in our test.
Note that even the shortest gene RBMX/Y showed evidence for gene conversion. The test is affected little by differences in substitution rates among sites or among species and, generally, provides a simple, fast, and robust way to detect exchange of genetic information between gametologs.
Only the four genes investigated herein and ZFX/Y could be aligned among our four study species platypus, mice, humans, and cattle. This is mainly because rodents, in our case mice, have very reduced Y chromosomes. Note that the power of the tests comes mainly from the number of aligned sites. Adding more species would likely have reduced this number and thus likely have reduced power.
Excluding mice but adding Y-chromosomal sequences from other mammalian orders could possibly have increased the number of genes with multispecies alignments that could be tested with our method. At present, however, nearly complete coding sequences of Y chromosomes are rare.