Genes involved in the convergent evolution of asexuality in 1 stick insects 2

The ability to reproduce is one of the most fundamental traits that distinguishes living organisms 16 from inorganic matter, yet, organisms use a panoply of strategies for reproduction. The evolution 17 of these strategies, especially sexual and asexual reproduction, has been the focus of intensive 18 study. By contrast, the molecular underpinnings of sexual and asexual reproduction remain 19 relatively unknown. We investigated convergent gene expression changes and patterns of 20 molecular evolution across five independent transitions to asexuality in stick insects. We 21 compared gene expression of asexual females to those of females from close sexual relatives in 22 whole-bodies and two tissues: the reproductive tract and legs. We identified a striking amount of 23 convergent gene expression change, ranging from 5 to 8% of genes examined. Convergent 24 changes were also tissue-specific, with most convergent genes changing in only one tissue type. 25 Functional enrichment tests found that genes showing convergent changes in the reproductive 26 tract were associated with meiotic spindle formation and centrosome organization. These genes 27 are particularly interesting as they can influence the production of unreduced eggs, a key barrier 28 to asexual reproduction. Changes in legs and whole-bodies were likely involved in female sexual 29 trait decay, with enrichment in terms such as sperm-storage and pigmentation. By identifying 30 changes occurring across multiple independent transitions to asexuality, our results provide a rare 31 insight into the molecular basis of asexual phenotypes and suggest that the evolutionary path to 32 asexuality is highly constrained, requiring repeated changes to the same key genes. 33


38
Sexual reproduction is extremely costly. Sex is less efficient than asexuality for transmitting genes 39 to future generations [1] and in order to outcross, an individual has to find a partner, forgo foraging, 40 and risk contracting sexually transmitted diseases and predation while mating [2,3]. Yet, the 41 overwhelming number of sexual, as compared to asexual, animal and plant species [4,5] indicates 42 that sexual reproduction is highly advantageous. Identifying potential advantages conferred by 43 sex has motivated decades of research and a rich body of work on the evolution and maintenance 44 of sexual and asexual reproduction has been produced (reviewed in [2,6-9]). By contrast, little is 45 known about the molecular underpinnings of transitions between reproductive systems [10]. Yet 46 these molecular underpinnings have the potential to provide insights into the processes involved 47 in the evolution of asexuality, and to help understand how sex is maintained. For example, sex is 48 more easily maintained if asexuality evolves gradually in a sexual population than if it emerges 49 suddenly via major effect mutations [11][12][13]. 50 51 Some insight into the genetic basis of asexuality has been gained from studies of individual 52 asexual lineages [14-17], but a broad comparative framework for exploring common principles of 53 the molecular basis of asexuality is lacking. For example, a major unresolved question is whether 54 independent transitions to asexuality involve similar or different molecular changes. To address 55 these shortcomings, we explored the molecular underpinnings of asexuality in stick insects of the 56 genus Timema, a genus of wingless, herbivorous insects native to the West coast of North 57 America and the mountains of the Desert Southwest. This group is uniquely suited for 58 comparative studies of asexuality, as asexuality has evolved at least seven times independently 59

85
Transcriptomes and orthology 86 87 Reference transcriptome assemblies for each species were generated previously [26]. Bast et al. 88 [26] also identified 3010 one-to-one orthologs, which were used as our transcriptome reference. 89 For each tissue, orthologs with low expression (counts per million less than 0.5 in two or more We identified convergent gene expression changes between sexual and asexual species by 97 modelling gene expression as a function of species-pair (see Fig. 1), reproductive mode (sexual 98 or asexual), and their interaction in edgeR [27]. In such a model, convergence is indicated by an 99 overall effect of reproductive mode (FDR < 0.05), but no interaction (FDR > 0.05) (Supplemental 100 Table 1). Approximately four times as many genes changed convergently in the reproductive tract 101 (7%; 203/2754) and legs (8%; 206/2737) as compared to the whole-body (2%; 57/2985), perhaps 102 reflecting the relative difficulty in identifying expression changes in complex tissue assemblies 103 such as whole-bodies [28]. The amount of convergence we observe is considerable and 104 approximately double what we would expect by chance, for all tissues (whole-body: p = 0.0128, 105 reproductive tract: p < 0.0001, legs: p < 0.0001, Supplemental Fig. 1). The amount of change 106 between sexual and asexual females was relatively small for convergent genes, with a mean fold 107 change of approximately 1.4 (absolute log2 expression change for whole-body = 0.55, 108 reproductive tract = 0.68, and legs = 0.46) (Fig. 2). 109 interpretation of selection also predicts that convergent genes will be involved in divergent 118 functions in each tissue which, as we show below, is indeed the case. 119 120

Functional processes of convergently expressed genes 121 122
To detect convergence at the process level, we performed gene set enrichment analyses (GSEA). 123 Briefly, we scored Gene Ontology (GO) terms according to the rank of convergent expression 124 change of genes annotated to the terms; GO terms were then called significant if they had a better 125 average rank than expected by chance (see Methods). More than 100 GO terms are enriched in 126 each tissue studied (FDR < 0.05), providing strong support for convergence of biological 127 processes between asexual species (Supplementary Tables 2-4). This signal is not dependent 128 on any threshold at the gene level, and thus provides information on convergence at the process 129 level due to small but consistent contributions from many genes. Consistent with the gene 130 expression results, enriched GO terms were generally tissue specific; we found no significant 131 overlap between GO-terms enriched in the legs and reproductive tract (11 shared terms, FDR = 132 0.123), between the legs and whole-body (4 shared terms, FDR = 0.799), or between whole-body 133 and reproductive tract samples (10 shared terms, FDR = 0.064). 134

135
To reduce the number of enriched GO terms to examine we semantically clustered enriched GO 136 terms using ReviGO [29] (Supplemental Tables 5-7). The annotations of convergent changes in 137 the reproductive tract reflect the convergent evolution of parthenogenesis in asexual Timema, as 138 they were linked to meiosis (meiotic spindle organization, meiosis II, centrosome duplication, 139 meiosis I cytokinesis, meiosis II cytokinesis), and reproduction (growth of a germarium-derived 140 egg chamber, sperm individualization, gamete generation). However, convergent changes were 141 also linked to neuron development (neurogenesis, neuron development, neuron recognition), as 142 well as several GO terms involved in development and metabolic processes for which the link to 143 asexuality is less clear. In legs we identified GO terms involved in immune defence (response to 144 fungus, regulation of production of molecular mediator of immune response, regulation of 145 antimicrobial peptide production, regulation of humoral immune response), which may be 146 because asexual females are no longer susceptible to the costs associated with diseases 147 transmitted from sexual interactions, which can be considerable [30]. Convergent changes were 148 also linked to sex determination (primary sex determination; soma, primary response to X:A ratio), 149 which may control changes in the expression of sexual traits, and several metabolic processes. 150 In whole-body samples we find some reproduction associated terms (courtship behavior, male 151 Although all the convergent genes we identified showed an overall shift in expression across the 188 five species-pairs, often expression change in one or two of the pairs was small (<1.2 fold 189 change). We defined top candidate genes as convergent genes for which the absolute log2 fold 190 change in expression was more than 0.25 (~1.2 fold change) for all species-pairs. Most of these 191 top genes showed convergent shifts in the reproductive tract (36 genes, relative to 4 and 15 genes 192 for whole-body and legs, respectively) ( Figure 4, Supplemental Table 1 Table 8). To investigate more genes, 237 we mapped reads from all samples to genes from each species which had a reciprocal-best-blast-238 hit between species-pairs (which includes the 1-to-1 orthologs analysed above). This approach 239 generated 10 different datasets (one for each species assembly), with between 15500 and 17583 240 genes. After filtering out genes with low expression (using cpm, see Methods) in each dataset, 241 this approach allowed us to examine between 2.43-3.12 (dependent on species and tissue) times 242 more genes than using the 1-to-1 orthologs (Supplemental Table 8). Results from this approach 243 qualitatively confirmed the results found using only the 1-to-1 orthologs: the percentage of genes 244 showing a convergent expression ranged from 4-5% for whole-body samples and 6-8% for the 245 reproductive tract and leg samples, dependent on which of the species transcriptome was used 246 (Supplemental Table 8 The approach taken above allowed us to identify genes which showed convergent changes in 252 expression across independent transitions to asexuality. This approach will not identify expression 253 . CC-BY 4.0 International license is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/364869 doi: bioRxiv preprint changes confined to a single or few species-pairs. Such changes are clearly not convergent at 254 the gene expression level, however, these changes could be convergent at the functional process 255 level, whereby species-pair specific changes in gene expression are involved in common 256 functional processes between species-pairs. To test this, we compared each asexual species to 257 its closest sexual relative and called differentially expressed ( There were no genes that showed overlap between all sexual-asexual species-pairs in any tissue 265 ( Fig. 5A). Examination of overlaps between pairs of sexual-asexual species-pairs found some 266 overlapping genes, but these were close to the expectation by chance (Fig. 5B, for all levels see 267 Supplemental Table 12). The majority of the DE genes also showed a significant interaction 268 between species-pair and reproductive mode in the model used to identify convergently changing 269 genes (whole-body = 69%, reproductive tract 66%, and legs = 81%), corroborating the finding 270 that the vast majority of the DE genes are species-pair specific. Note the species-pair by 271 reproductive mode interactions do not appear to be generated by one specific species-pair as 272 generally genes DE between one species-pair were not DE between the other 4 species-pairs. 273

274
The DE genes of the different species pairs are not involved in convergent functional processes. 275 Species-pair-specific genes were enriched for a number of GO terms, however no GO terms were 276 found to overlap between all pairs in any tissue (Supplemental Fig. 6). Examination of overlaps 277 between pairs of sexual-asexual species-pairs found some overlapping GO terms, but these were 278 close to the expectation by chance (Supplemental Table 13). This pattern remained even when a 279 more liberal approach, whereby related GO-terms were considered as a unit, was applied 280 (Supplemental Fig. 7A). This overall lack of overlap suggests that species-pair-specific genes are 281 not involved in producing convergent phenotypes but are instead the product of either lineage-282 specific selection or drift. These two processes are difficult to disentangle, but our results are 283 more consistent with drift rather than lineage-specific selection. Indeed, species-pair specific 284 genes showed similar changes in gene expression across tissues, in contrast to the mainly tissue-285 specific changes uncovered for convergently changing genes. The overlap of species-pair specific 286 genes between tissues was significantly greater than expected by chance (Table 1). 287 . CC-BY 4.0 International license is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/364869 doi: bioRxiv preprint Finally, these results were reproduced when examining a much larger set of genes (genes with 288 reciprocal-best-blast-hits between species-pairs, see above) as both genes DE between each 289 species-pair, and their enriched GO terms, showed little overlap (Supplemental Figs 8 and 9, and  290   Supplemental Tables 14 and 15). Asexuality is a complex adaptation that includes two major components: the ability to produce 323 viable asexual offspring, and secondary adaptive changes that would not have been selected for 324 in sexual species (e.g. the reduction of costly sexual traits). A key change necessary for the 325 production of asexual offspring is the ability to produce unreduced eggs [44]. Convergently 326 . CC-BY 4.0 International license is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/364869 doi: bioRxiv preprint expressed genes in the reproductive tract were enriched for changes in meiosis, and in particular 327 meiotic spindles, which are key for the proper division of cells during meiosis. Mutations in meiotic 328 spindles have been shown to result in unreduced meiotic products in D. melanogaster, and 329 specifically in two genes (bond [31] and pelo [45]) which show convergent changes in expression 330 in asexual Timema. As such we suggest that these changes may underlie the non-reduction of 331 eggs in asexual Timema. An alternative hypothesis is that since Timema reproduce 332 parthenogenetically (and thus likely no longer recombine) changes in meiotic genes represent 333 trait decay. Although possible, previous work has shown that, in fact, meiotic genes are not only 334 retained in asexual lineages without damaging mutations, but often appear to be subject to 335 selection for changes in expression, via duplication or differential upregulation of promoters [46-336 49]. Taken together with our results, we suggest that modifications to meiotic genes, specifically 337 those that that disrupt meiotic cell division, are key in overcoming a major barrier to the evolution 338 of asexuality: the production of unreduced eggs. 339

340
The production of unreduced eggs is not the only barrier to producing offspring asexually. In most 341 species, sperm transfer essential components for the formation of a functioning centrosome 342 [50,51]. This paternal contribution represents a second key barrier in the evolution of 343 parthenogenesis in many systems [44]. However, in phasmids the centrosome is assembled 344 without any contribution from sperm in both sexual and asexual species [52]. This may act as pre-345 adaptation for asexuality in stick insects and account, in part, for the large number of asexual stick 346 insects. 347 348 A final barrier to asexual offspring production in many systems is egg activation. In many species 349 mature oocytes are arrested at a specific stage (e.g. at metaphase II in mammals, and metaphase 350 I in most insects), and must be activated by sperm to re-enter the cell-cycle In addition to convergent changes, we also identified many species-pair specific gene expression 409 changes. In contrast to convergent genes, species-pair specific genes showed common shifts in 410 expression across tissues, and inconsistent associations with functional processes between 411 species-pairs, that were largely unrelated to asexual phenotypes. Taken together, these results 412 suggest that the majority of changes we observe from a single sex-asex species-pair comparison 413 are due to drift rather than selection. Our findings thus highlight the problem of drawing inferences 414 on the causes or consequences of asexuality from the examination of only a single transition to 415 asexuality, whereas examining several transitions allows us to disentangle adaptive changes and 416 those due to drift. 417 418 Overall, we find evidence for a striking number of convergent changes across five transitions to 419 asexuality. The amount of molecular convergence to expect, however, is dependent on several 420 factors including the complexity of the phenotype, and the size of the mutational target [65]. For 421 instance, here we find that a key change required for asexual reproduction, the production of 422 unreduced eggs, likely requires changes to meiotic spindle regulation. The pathways that govern 423 meiotic spindle regulation are relatively small in number [66], meaning that only a small minority 424 of genes are likely able to confer the relevant changes, making the chance of convergence for 425 this trait relatively high. In contrast, the observed reduction of sexual traits could be produced by 426 changes to numerous genes and pathways (i.e. there is a large mutational target) making 427  The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/364869 doi: bioRxiv preprint likelihood ratio test to determine significance of model terms for each gene by comparing 500 appropriate model contrasts. P-values were corrected for multiple tests using Benjamini and 501 Hochberg's algorithm [78], with statistical significance set to 5%. Using this approach, we 502 classified genes as convergently differentially expressed when there was a significant effect of 503 reproductive mode (FDR < 0.05) but no interaction effect of species-pair by reproductive mode 504 (FDR > 0.05). Differentially expressed genes within each species-pair were identified using 505 pairwise contrasts between each sexual and asexual pair. To test if the observed number of convergent genes was significantly greater than expected by 513 chance we performed a permutation test by whereby, for the read counts of each gene, we 514 randomly switched the assignment of reproductive mode (sexual or asexual) within a species-515 pair. Note that all biological replicates from a particular group were always assigned to the same 516 reproductive mode (i.e. In the event of a switch, all sexual replicates were assigned as asexual, 517 and vice versa). This process was repeated to produce 10,000 permuted data sets, which were 518 then ran through the gene expression pipeline described above to generate a distribution of the 519 number of convergent genes we expect to find by chance. (default settings within Blast2GO) was then run for each sequence, and the results merged with 528 the blast results to obtain GO terms. This produced two sets of functional annotations, one derived 529 from all arthropods and one specifically from Drosophila melanogaster. The D. melanogaster GO 530 term annotation generated around four times more annotations per sequence than NCBI's nr-531 arthropod database. We therefore conducted all subsequent analyses using the GO terms derived 532 . CC-BY 4.0 International license is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/364869 doi: bioRxiv preprint from D. melanogaster, but note that results using the annotations from all arthropods were 533 qualitatively the same (see Supplementary Fig. 7B). 534 535 We conducted gene set enrichment analyses using the R package TopGO (v. 2.28.0) [81] using 536 the elim algorithm to account for the GO topology. Gene set enrichment analyses identify enriched 537 GO terms in a threshold-free way, by finding GO-terms that are overrepresented at the top of a 538 ranked list of genes. For comparisons within a species-pair, genes were ranked by FDR; to identify 539 enrichment of convergent genes, genes were ranked by FDR value for reproductive mode, with 540 the FDR value for genes that showed a significant lineage by reproductive mode set to 1. GO 541 terms were considered to be significantly enriched when p < 0.05. Enriched GO terms were then 542 semantically clustered using ReviGO [29] to aid interpretation. 543

544
The significance of overlapping GO terms was determined using SuperExactTest as described 545 above. The hierarchical nature of GO terms generates a bias towards finding a significant amount 546 of overlap, since enrichment terms are non-independent. It is however possible that the 547 complexity of the GO term hierarchy could lead to convergent functional processes being 548 overlooked. For instance if a GO term is enriched in one comparison, but its parent term is 549 enriched in another comparison, then there would be no apparent overlap. To address this, we 550 also looked at the amount of 'linked overlap' of GO terms, whereby significant GO terms were first 551 clustered together based on parent or child terms. 552 553 For the GO term enrichment analyses of convergently differential expressed genes we used only 554 the annotation from T. bartmani as it had the most number of sequences annotated. Annotations 555 to each of the other species were very similar to those from T. bartmani, with 80% of annotations 556 being identical across all 5 species annotations. The remaining 20% of sequences were typically 557 characterized by an additional term in one or more of the species. For comparisons within a 558 lineage we used the annotation of the sexual species in that lineage. Although the annotations 559 are very similar across all ten species the small differences in annotation could create differences 560 in the amount of overlap observed between contrasts (e.g. if a term is annotated to an ortholog in 561 one annotation but not another). To examine this, we repeated the analysis using only annotations 562 from T. bartmani. This produced a virtually identical result (Supplemental Fig. 7C)  To test for differences in the rate of evolutionary divergence between gene categories, we used 567 dN/dS ratios for each of the one-to-one orthologs from [26]. To obtain an estimate for pN/pS reads 568 from the whole-body libraries for each asexual species were mapped to the reference using 569 RSEM/bowtie2 with default parameters and fragment length mean = 200 fragment length sd = 570 100 [82,83]. Samtools v1.2 was then used to create an mpileup file, which was filtered with 571 All of the above analyses used only the one-to-one orthologs. To examine a larger fraction of the 581 transcriptome we produced species-pair references by using a reciprocal blast between the 582 assemblies of sexual-asexual sister species (blastN, minimum e-val = 0.00001, minimum query 583 coverage = 30%). Prior to this step potential contaminants were filtered from these by blasting 584 transcripts to local versions of the nt (using blastN, default options except task blastn, 585 max_target_seqs = 10) and nr (using blastX, default options except, max_target_seqs = 10) 586 databases (downloaded: 07/08/2016) using NCBI's blast client (v. 2.2.30+). Blast hits with an e-587 value > 0.0000001 were discarded. The remaining blast hits were used to assign a phylum to 588 sequences if >=50% of Blast hits came from one phylum (in the event of a tie, the taxa with the 589 highest e-value was used as a tiebreaker). Transcripts that were assigned to a non-arthropoda 590 phylum were discarded (note that transcripts with no Blast hits or that blasted to mixed phyla were 591 retained). This filtering removed between 4-8% of transcripts (see Supplemental Table 8). Reads 592 of each species were then mapped to each species-pair reference in the same way as for the 1-593 to-1 orthologs. Differential expression analyses and GO-term enrichment analyses were then 594 repeated as described above. 595 596 597 . CC-BY 4.0 International license is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the author/funder.  The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/364869 doi: bioRxiv preprint