Evolution of Yin and Yang isoforms of a chromatin remodeling subunit results in the creation of two genes

Genes can encode multiple isoforms, broadening their functions and providing a molecular substrate to evolve phenotypic diversity. Evolution of isoform function is a potential route to adapt to new environments. Here we show that de novo, beneficial alleles in the nurf-1 gene fixed in two laboratory strains of C. elegans after isolation from the wild in 1951, before methods of cryopreservation were developed. nurf-1 encodes an ortholog of BPTF, a large (>300 kD) multidomain subunit of the NURF chromatin remodeling complex. Using CRISPR-Cas9 genome editing and transgenic rescue, we demonstrate that in C. elegans, nurf-1 has split into two largely non-overlapping isoforms (NURF-1.B and NURF-1.D, which we call Yin and Yang) that share only two of 26 exons. Both isoforms are essential for normal gametogenesis but have opposite effects on male/female gamete differentiation. Reproduction in hermaphrodites, which involves production of both sperm and oocytes, requires a balance of these opposing Yin and Yang isoforms. Transgenic rescue and genetic position of the fixed mutations suggest that different isoforms are modified in each laboratory strain. In a related clade of Caenorhabditis nematodes, the shared exons have duplicated, resulting in the split of the Yin and Yang isoforms into separate genes, each containing approximately 200 amino acids of duplicated sequence that has undergone accelerated protein evolution following the duplication. Associated with this duplication event is the loss of two additional nurf-1 transcripts, including the long-form transcript and a newly identified, highly expressed transcript encoded by the duplicated exons. We propose these lost transcripts are non-functional byproducts necessary to transcribe the Yin and Yang transcripts in the same cells. Our work suggests that evolution of nurf-1 isoforms in nematodes creates adaptive conflict that can be resolved by the creation of new, independent genes.


There is general interest in understanding how animals adapt to new environments. What are the alleles [...] The low genetic diversity between these strains enables identification of not only the causal genes for these traits, but the exact causal nucleotides.

To date, five de novo, causal genetic variants have been identified in either the N2 or LSJ2 lineage (de [...] Xu et al. 2016). We decided to test whether this additional genetic variant or variants affected evolutionary fitness of the animals in laboratory conditions using a previously described pairwise competition assay (Zhao, Long et al. 2018). To do so, we took advantage of three strains we had previously created; CX12311 is a near isogenic line used to eliminate the fitness and phenotypic effect of derived alleles of N2 npr-1 and

To determine if the N2-derived intron SNV in nurf-1 (Figure 1B) was responsible for the fitness gains (as opposed to one of the seven linked LSJ2/N2 variants), we used CRISPR-Cas9 to directly edit the LSJ2 allele of the intron SNV into the standard N2 strain to create a strain we will refer to as ARL(intron,LSJ2>N2). [...] again contains a dpy-10 silent mutation). The ARL(intron,LSJ2>N2*) strain was significantly less fit than the N2 strain at a level similar to the difference between the NIL(nurf-1,LSJ2>N2*) and ARL(del,LSJ2>N2) strains (Figure 1D). These results indicate that beneficial alleles of nurf-1 arose in both laboratory lineages: the 60 bp deletion makes LSJ2 animals more fit in liquid, axenic media (Large, Xu et al. 2016), and the intron SNV makes N2 animals more fit on agar plates seeded with bacteria.

Brood size of C. elegans hermaphrodites is an important trait for evolutionary fitness in laboratory [...] spermatogenesis before transitioning to oogenesis; a concomitant lengthening of spermatogenesis time increases the total brood size of hermaphrodites but also delays when reproduction can start. When we compared the total fecundity produced by the N2 and ARL(intron,LSJ2>N2) strains, we found a significant difference, with the ARL(intron,LSJ2>N2) strain producing ~30 fewer offspring than N2 (Figure 1E). The

RNAseq analysis identified transcriptional differences caused by the intron SNV during spermatogenesis, supporting our hypothesis that sperm development is affected by this SNV. We collected RNA from synchronized N2 and ARL(intron,LSJ2>N2) hermaphrodites at two timepoints, 52 and 60 hours after hatching, [...] Table S1). Although a portion of these 3,384 genes are expressed in the germline, these genes are also expressed in additional tissues (Figure S2B). Gene ontology analysis suggests that cuticle development and innate immune responses are regulated by nurf-1 (Table S2), consistent with the role of its orthologs in regulating immunity and melanocyte proliferation in [...] Table 1).

To identify other transcripts produced by nurf-1 and quantify the relative proportions of each that are [...] Table 1). nurf-1.a encodes a full-length 2,197 amino acid isoform analogous to the primary isoform of BPTF in humans and NURF301 in Drosophila (Figure 1C). Despite the expectation that C. elegans would produce a similar protein, the Oxford Nanopore long-read data is the only evidence supporting its existence.

The nurf-1.q transcript is predicted to produce a 243 amino acid unstructured protein. With the exception of the full-length nurf-1.a transcript, the overlap of these transcripts is quite minimal, resulting in predicted isoforms with unique protein domains and functions (Figure 2B).

We quantified the relative expression of these five transcripts by either counting the number of Nanopore reads that matched the transcript or by using kallisto (Bray, Pimentel et al. 2016) to predict transcript abundance using Illumina short-read sequencing data (Figure 2C). These predictions qualitatively agreed in transcript ranking of expression strength (although quantitative variation in predictions was observed, reflective of the different technologies or developmental stages of the animals). Surprisingly, the newly described nurf-1.q transcript was the most highly expressed, followed by the nurf-1.b transcript, and the nurf-1.a, nurf-1.d and nurf-1.f transcripts were expressed at similar lower levels.
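The read-counting approach described above can be illustrated with a minimal Python sketch. It converts per-transcript read counts into relative proportions and ranks the transcripts. The function names and the counts in `example_counts` are hypothetical and were chosen only to mirror the qualitative ranking reported in the text; they are not data from this study, and the sketch does not reproduce what kallisto does.

```python
from typing import Dict, List


def relative_expression(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Return each transcript's share of all assigned reads."""
    total = sum(read_counts.values())
    if total <= 0:
        raise ValueError("no reads were assigned to any transcript")
    return {name: count / total for name, count in read_counts.items()}


def rank_by_expression(read_counts: Dict[str, int]) -> List[str]:
    """Order transcript names from most to least supported."""
    return sorted(read_counts, key=lambda name: read_counts[name], reverse=True)


# Hypothetical read counts (NOT measurements from the study).
example_counts = {
    "nurf-1.a": 5,
    "nurf-1.b": 12,
    "nurf-1.d": 4,
    "nurf-1.f": 4,
    "nurf-1.q": 25,
}

if __name__ == "__main__":
    for name, fraction in relative_expression(example_counts).items():
        print(f"{name}: {fraction:.2f}")
    print("ranking:", rank_by_expression(example_counts))
```

Comparing rankings rather than raw proportions is the more cautious choice here, since the text notes that the two technologies agree qualitatively but differ quantitatively.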

Although each of the five major transcripts is transcribed, this result does not necessarily mean they are translated into stable protein products. To facilitate analysis of NURF-1 proteins, we used CRISPR-Cas9 to fuse two distinct epitope tags (HA and 3xFLAG) to the endogenous nurf-1 locus, just prior to the stop codons in the 16th and 28th exons, respectively (Figure S5A). Immunoblot analysis supported the expression of the B, D, and F isoforms, but not the A or Q isoforms (Figure S5B). Although larger proteins, such as the A isoform, can be difficult to transfer during immunoblots, the lack of a band matching the small Q isoform suggests that the highly expressed nurf-1.q transcript is not translated into protein or that the protein is rapidly degraded.

The B and D isoforms are both essential for reproduction and the F isoform modifies the heat shock

Comparison of the phenotypes of the n4293 and n4295 homozygotes leads to the model that the B isoform is essential for reproduction and the A, D, and/or F isoforms have subtle effects on growth rate and reproductive rate (Table 1).

To further delineate the biological role of each isoform, we used CRISPR-Cas9 to engineer nine stop codons in eight exons of the nurf-1 gene: the first, second (two positions), 7th, 15th, 18th, 19th, 23rd, and 26th exons (Figure 3A). The predicted effects of these stop codons on each major isoform are shown in Figure S6 and Table 2. Animals homozygous for each mutation were assayed for total brood size and growth rate.

Analysis of the phenotypes of these mutants indicated that our working model was incorrect. Instead, we propose that both the B and D isoforms are essential for reproduction.

As expected, engineering stop codons in the first, second, and 7th exons greatly reduced fecundity, resulting in either sterility or a mortal germline phenotype, in which total brood size is initially reduced before complete sterility sets in after around three to five generations of homozygosity (Figure 3B and C). Although the qualitative phenotypes of these four alleles agreed, we observed interesting quantitative differences between them. The second stop codon in the second exon (kah106) and the stop codon in the 7th exon (kah142) reduced growth and fecundity more than the first exon stop codon (kah90) or the first stop codon in the second exon (kah91) (Figure 3B

Unexpectedly, engineering stop codons in the 18th and 19th exons also caused a mortal germline phenotype (kah96 and kah99) (Figure 3B). This result was surprising, because the n4295 allele, predicted to be a loss-of-function allele for the D and F isoforms due to the loss of the PHD and bromodomains, does not have a mortal germline phenotype. We excluded a number of potential explanations for this discrepancy. A suppressor for the n4295 allele could have fixed during the construction of this strain. However, the kah68 allele, which contains a stop codon within the n4295 deleted region, phenocopies the n4295 allele and not the kah96 and kah99 animals (Figure 3B, 3C, and Figure S7). Another possibility is that the D isoform suppresses the F isoform; loss of both isoforms (in the n4295 background) is tolerated, but loss of just the D isoform (in the kah96 or kah99 backgrounds) allows the F isoform to prevent reproduction. However, we could exclude this possibility, as the double mutant containing both the n4295 allele and the 18th exon stop allele phenocopied the kah96 single mutant (Figure S8). Additionally, specific loss of the F isoform by the 23rd exon stop allele (kah11) did not affect the phenotype of animals (Figure 3B and C). Our data suggest that, unlike human BPTF, the ability of NURF-1 to bind modified histones is not required for its function. We further confirmed this hypothesis by editing conserved residues in the PHD and bromodomains that are necessary for recognition of the H3K4me3 and H4K16ac marks (Figure S9).

The most parsimonious explanation of our data is that either the A or D isoform is essential for reproduction in C. elegans. Compound heterozygote tests allowed us to distinguish between these possibilities, indicating that the D isoform is required for reproduction and wild-type growth rate, and the A isoform is dispensable for reproduction and development (Figure 4). We first verified that the kah93, kah96, and kah106 alleles were recessive by measuring the fecundity of heterozygous animals (Figure 4B). Next, we examined the fecundity of kah106/kah96 compound heterozygotes, which are predicted to lack only the A isoform, due to the production of a single unaffected copy of the B isoform from the kah96 haplotype and a single unaffected copy of the D isoform from the kah106 haplotype. If the A isoform were essential for reproduction, we would expect these compound heterozygotes to be sterile or have severe defects in fecundity. However, these animals were indistinguishable from wild-type, suggesting that the full-length A isoform is not essential (Figure 4B). The kah106/kah93 compound heterozygotes showed similar results. These animals are predicted to encode one unaffected copy of the D isoform, one truncated copy of the B isoform, and zero unaffected copies of the A isoform. These animals were mostly wild-type, with a small reduction in total fecundity (Figure 4B). We believe that the A isoform is not essential and that the truncation of the B isoform slightly perturbs its function, causing a slight reduction in fecundity. Finally, we analyzed kah93/kah96 compound heterozygotes. These animals are predicted to encode zero wild-type copies of the D isoform, one wild-type copy of the B isoform, and zero wild-type copies of the A isoform. These animals were essentially sterile. Taken together, we believe that the B and the D isoforms are both essential for reproduction.
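The logic of the compound heterozygote tests can be made explicit with a short Python sketch. For each pair of haplotypes it counts how many intact and truncated copies of the A, B, and D isoforms are predicted. The `ALLELE_EFFECTS` table is an assumption inferred from the predictions stated in the paragraph above (it is not taken from Table 2 or Figure S6), and the names `ALLELE_EFFECTS` and `isoform_copies` are hypothetical.

```python
from typing import Dict

ISOFORMS = ("A", "B", "D")

# Predicted effect of each allele on each isoform. Isoforms not listed are
# treated as intact. These entries are inferred from the text (assumption).
ALLELE_EFFECTS: Dict[str, Dict[str, str]] = {
    "wt": {},
    "kah106": {"A": "lost", "B": "lost"},
    "kah96": {"A": "lost", "D": "lost"},
    "kah93": {"A": "lost", "D": "lost", "B": "truncated"},
}


def isoform_copies(hap1: str, hap2: str) -> Dict[str, Dict[str, int]]:
    """Count intact and truncated copies of each isoform in a diploid animal."""
    summary = {}
    for isoform in ISOFORMS:
        states = [
            ALLELE_EFFECTS[hap].get(isoform, "intact") for hap in (hap1, hap2)
        ]
        summary[isoform] = {
            "intact": states.count("intact"),
            "truncated": states.count("truncated"),
        }
    return summary


if __name__ == "__main__":
    for pair in (("kah106", "kah96"), ("kah106", "kah93"), ("kah93", "kah96")):
        print("/".join(pair), isoform_copies(*pair))
```

Under these assumptions, only the kah93/kah96 combination leaves no intact copy of the D isoform, which matches the one genotype reported as essentially sterile.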

To confirm that the D isoform is essential, we also created a transgenic strain containing an integrated

Transcriptional analysis of strains lacking the F isoform indicated that the initial transcriptional response to heat shock was largely the same, but the long-term transcriptional response of a subset of genes was affected (Figure S11E-G). We conclude that the F isoform is specifically up-regulated by heat shock and plays a modulatory role in determining the long-term transcriptional response to heat shock.

Although the B and D isoforms are both required for reproduction, the molecular mechanisms through which these isoforms operate could differ. One possibility is that the long form of NURF-1 has split into two subunits: both isoforms participate as part of the NURF complex, cooperating to regulate reproduction. However, the D isoform might instead modify NURF activity by competing for binding with transcription factors or regions of the genome to which NURF is recruited. A third possibility is that the D isoform acts through a NURF-independent pathway.

To gain insights into the molecular nature of the D isoform, we decided to determine precisely how the B and D isoforms regulate reproduction, using three nurf-1 stop alleles (Figure 5A). For hermaphrodites to produce a fertilized egg, the gonads must produce both male and female gametes at different developmental times (Figure 5B). Initially, gametogenesis produces sperm, creating approximately 300 sperm, at which point a permanent sperm-to-oocyte switch occurs. From this time, gametogenesis produces oocytes until the animal dies or the gonad ceases to function (Hubbard and Greenstein 2005).

A number of defects could cause sterility: inability to form gametes, inability to create sperm, inability to create oocytes, or defects in sperm and/or oocyte function. We used DAPI staining to characterize the production of sperm and oocytes in three nurf-1 mutants (Figure 5C and D). We first tested kah106 mutants, which lack the B isoform (Figure 5A), for the ability to produce sperm. Compared with N2 animals, which create ~300 sperm per animal, the number of sperm produced by kah106 animals was greatly reduced, to only approximately 60 sperm (Figure 5D). These animals produced a normal number of oocytes, indicating that spermatogenesis was affected specifically (Figure 5E). We interpret these data as evidence that hermaphrodites that lack the NURF-1.B isoform spend less time in spermatogenesis before transitioning to oogenesis. We next tested kah96 mutants, which lack the D isoform. These animals produced approximately 500 sperm (Figure 5C and D) and almost no oocytes (Figure 5E). We interpret these data as evidence that hermaphrodites that lack the D isoform are unable to transition from spermatogenesis to oogenesis. Finally, we performed similar experiments on kah93 mutants, which lack the D isoform and have a truncated B isoform. These animals showed an intermediate phenotype, with a normal number of sperm but a reduced number of oocytes (Figure 5D and E). The reduced activity of the B isoform due to its truncation potentially allows other factors to transition the animals to oogenesis, resulting in the milder defects found in the kah93 animals (Figure 3B).

Although animals that lack either the B or D isoform are unable to reproduce, the cause of sterility is

To study the effects of the 60 bp deletion and intron SNV on transcription, we focused on two comparisons: 1) N2* vs ARL(del,LSJ2>N2*), which will identify transcriptional changes caused by the 60 bp deletion, and 2) NIL(nurf-1,LSJ2>N2*) vs ARL(del,LSJ2>N2*), which will identify transcriptional changes caused by the intron SNV (as well as linked mutations in the NIL other than the 60 bp deletion). We believe that the latter comparison will mostly report the changes of the intron SNV, as it accounts for most of the fitness differences between the two strains. We observed a large negative correlation between these two comparisons (Figure S12B). The most parsimonious explanation for this observation is that both the N2 and LSJ2-derived alleles in nurf-1 regulate the activity of a common molecular target, which is likely to be the NURF complex.

[...] and nurf-1.d were isolated from this species, they no longer shared any exons with each other, suggesting that they were expressed from two separate genes (Figure 6A). We compared the gene products using [...] (Figure S13, S14, and S15). Like C. briggsae, the species in the brenneri/tribulationis clade express a transcript matching nurf-1.b from a single gene (which we call nurf-1-1). These species also express two transcripts matching nurf-1.d and nurf-1.f from a second gene, called nurf-1-2. None of these species appears to express nurf-1.a or nurf-1.q transcripts (Figure S13, S14, and S15). RNA-seq data for species outside of this clade (Figure S13, S14, and S15) matched the transcription pattern of C. elegans, suggesting that these species express five major transcripts from a single nurf-1 gene: nurf-1.a, nurf-1.b, nurf-1.d, nurf-1.f, and nurf-1.q. These data suggest that the C. elegans transcript structure is ancestral.

Phylogenetic analysis of the duplicated ~200 amino acid sequence was used to evaluate different hypotheses surrounding the timing and number of duplication events. The analysis supported the model that the split of nurf-1 into two distinct genes happened once within the common ancestor of the brenneri/tribulationis clade (Figure 6C; additional possible trees are shown in Figure S16). The topology recovered for the region of nurf-1 outside the duplication is congruent with the species tree (Figure S17).

Interestingly, the rate of amino acid substitution in the duplicated region was accelerated in nurf-1-1, suggesting that this region experienced positive selection and/or relaxed selection after this duplication event occurred.

In this paper, we make use of CRISPR/Cas9-enabled gene editing to characterize the nurf-1 gene in C. elegans and then study the sequence and expression of nurf-1 orthologs in other Caenorhabditis species. The combination of genetics and evolutionary analysis allowed us to make a number of surprising observations. First, we show that an SNV in the 2nd intron of nurf-1 that fixed in the N2 laboratory strain increases the evolutionary fitness and fecundity of the N2 strain. Second, we show that the full-length isoform of nurf-1 has split into two essential, mostly non-overlapping isoforms with opposite effects on cell

Genetic analysis suggests that full-length NURF301 is required for gametogenesis in both sexes while the N-terminal isoform is required for regulation of pupation and innate immunity.

Nematodes have retained the N-terminal isoform but seem to have lost use of the full-length isoform for [...] We propose a molecular mechanism to explain the actions of the B and D isoforms to regulate transcription (Figure 7B). These two isoforms share 207 amino acids of protein sequence, which falls in a region that

In general, such a mutation would not be predicted by most bioinformatic approaches to have a phenotypic effect. Only the low genetic diversity between the LSJ2 and N2 strains allowed us to focus on this variant, and eventually demonstrate that this particular variant is causal.

The probability of two beneficial mutations happening in both lineages by random chance is quite small.

Only a handful of these fixed mutations are expected to be beneficial; our recent QTL mapping of fitness

Targeting of nurf-1 is consistent with its role as a regulator of life history tradeoffs. Many traits influence individual and offspring survival; however, the mapping of these traits onto fitness is thought to be dependent on the environmental niche an organism occupies. The LSJ2-derived deletion in nurf-1 modified life history tradeoffs to prioritize individual survival over reproduction; by shunting energy away from reproduction and growth, LSJ2 animals increased their chances of surviving on the poor, unnatural food. N2 animals grew on agar plates seeded with E. coli bacteria, which they can readily consume and metabolize into a useful energy source. In these conditions, survival is not the primary concern; each animal has three days to eat as much food as possible and produce as many progeny as possible to maximize the probability that one of its offspring is transferred to the new food source. It is reasonable to think that the N2 and LSJ2 [...] potentially changing the transcription factors NURF-1 binds to.

One potential issue that arises in this situation is the pleiotropy of genetic changes in the shared region; changing the amino acid sequence of the B isoform also changes the D isoform. Are there situations where modifying one isoform but not the other is preferred? Escape from adaptive conflict is a mechanism by which gene duplication can resolve the situation where a single gene is selected to perform multiple roles (Des Marais and Rausher 2008). After duplication, each copy is free to improve its function independently.

In a clade of Caenorhabditis nematodes, the nurf-1 gene has split into two separate genes in a manner consistent with escape from adaptive conflict. Duplication of the shared exons releases each isoform to evolve independently. Our data suggest that after this event, the duplicated region in the nurf-1-1 gene experienced accelerated evolution, consistent with an increased rate of adaptive evolution.

This duplication event also resolved a molecular conflict between the nurf-1.b and nurf-1.d transcripts. To produce both transcripts in the same cell, there must be a mechanism to distinguish between transcripts containing the 1st to 15th exons (the nurf-1.b transcripts) and transcripts initiating from the 14th exon (the nurf-1.d transcript). In the former case, the 15th exon is spliced to the 16th exon to terminate the transcript. In the latter case, the 15th exon is spliced to the 17th exon and transcription continues. Alternatively, the cell might not distinguish between transcripts, but instead use each alternative splice site at a constant ratio (i.e., 80% of the time, the 15th exon is spliced to the 16th exon and 20% of the time, the 15th exon is spliced to the 17th exon). In the latter scenario, two additional transcripts must be produced. Intriguingly, these two transcripts match nurf-1.a and nurf-1.q, suggesting these transcripts are non-functional byproducts of molecular conflict between nurf-1.b and nurf-1.d.
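The constant-ratio scenario can be worked through numerically. The Python sketch below assumes that the choice of start site (1st or 14th exon) and the choice of splice partner for the 15th exon (16th or 17th exon) are independent, so that every combination is produced. The 80%/20% splice ratio is the illustrative value from the text; the start-site fraction of 0.4 and the function name `transcript_fractions` are hypothetical and not measured quantities.

```python
from typing import Dict


def transcript_fractions(
    p_start_exon1: float, p_splice_15_to_16: float
) -> Dict[str, float]:
    """Expected transcript fractions if start site and splice choice are independent.

    p_start_exon1:     fraction of transcripts initiating at the 1st exon
                       (the remainder initiate at the 14th exon).
    p_splice_15_to_16: fraction of transcripts in which the 15th exon is spliced
                       to the 16th exon (the remainder splice to the 17th exon).
    """
    for p in (p_start_exon1, p_splice_15_to_16):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    p_start_exon14 = 1.0 - p_start_exon1
    p_splice_15_to_17 = 1.0 - p_splice_15_to_16
    return {
        "nurf-1.b": p_start_exon1 * p_splice_15_to_16,   # exon 1 start, ends at exon 16
        "nurf-1.a": p_start_exon1 * p_splice_15_to_17,   # exon 1 start, full length
        "nurf-1.q": p_start_exon14 * p_splice_15_to_16,  # exon 14 start, ends at exon 16
        "nurf-1.d": p_start_exon14 * p_splice_15_to_17,  # exon 14 start, continues
    }


if __name__ == "__main__":
    # 0.8 is the example ratio from the text; 0.4 is a hypothetical start-site fraction.
    for name, fraction in transcript_fractions(0.4, 0.8).items():
        print(f"{name}: {fraction:.2f}")
```

The point of the sketch is only that, without a mechanism coupling start site to splice choice, all four products appear whenever both nurf-1.b and nurf-1.d are made; the specific fractions depend entirely on the assumed parameters.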

Multiple lines of evidence are consistent with the second scenario. First, while the nurf-1.q transcript is produced at high levels, we were unable to observe its product in our immunoblots, suggesting that it is either not translated or the protein product is rapidly degraded. Second, our genetic tests were unable to identify a [...]

The following strains were used in this study:

Resequencing of the PTM88 strain identified a number of background mutations, including an A to G missense SNV that is predicted to change an asparagine to an aspartic acid, which we named kah132. The flanking sequence of this mutation is 5'-cgacaatgac[a]atcgccaggg-3'. We backcrossed out this spe-9(kah132) mutation, along with additional background mutations, to create PTM417.

To create PTM416, we designed a number of guide RNAs near the intron SNV. However, we were unable to identify editing events using these guide RNAs, presumably due to the high usage of As and Ts. We turned to a two-step strategy to create the edit, first creating a deletion of the 2nd intron along with flanking exon regions using guide RNAs with high predicted efficiency. We created the following constructs driving the following sgRNAs:

We also ordered a repair oligonucleotide:

Jackpot broods were identified and roller animals were genotyped using the following primers along with

A single heterozygous worm was identified. Wild-type heterozygous progeny were identified (to remove the linked dpy-10 mutation) and this mutation was balanced (homozygous animals were sterile) with an integrated GFP marker near the nurf-1 gene (oxTi924). This strain was frozen with the following genotype:

To create the PTM420 epitope-tagged strain, the following guide RNA and repair oligo were used to first add an HA epitope tag into the 16th exon:

We next added a 3xFLAG tag to the C-terminus of the nurf-1 gene using purified Cas9 protein (IDT, Catalog

To facilitate the genotyping process, some of the repair oligos for STOP codon replacement sites contain restriction sites that will alter some of the amino acids; exact changes are listed in Table S4. In C. elegans nomenclature, identical edits must be given different allele names if they were isolated independently.

For mutants that were sterile (or led to sterility), we balanced these mutations using a GFP (oxTi924) or [...] Pnurf-1.d::nurf-1.d-SL2-GFP (insertion vector with homologous arms), 2.5ng/ul pCFJ90 (Pmyo-2::mCherry), 5ng/ul pCFJ104). This was injected into EG6699 uncoordinated animals. Three injected animals were placed on a single plate at 30°C to facilitate starvation. After 5 days, coordinated animals with GFP fluorescence and no red fluorescence were singled to new NGM plates and allowed to proliferate. Their progeny were singled and a single homozygote without uncoordinated offspring was maintained. This homozygote was then backcrossed to N2 for 4 generations to remove unc-119(ed3) III to create the PTM337 strain containing the integrated rescue construct. This strain was then crossed to a variety of nurf-1 alleles using standard protocols.

[...] were placed on NGM agar plates and incubated at 20°C until they reached young adulthood, as determined by when eggs were observed on assay plates. These worms were then harvested, washed 3 times with M9 buffer, and frozen in a -80°C freezer for later processing.

RNAseq samples for heat shock
N2 and PTM416 worms were synchronized using a 3-hour hatch-off. Eggs were cultured at 20°C until they reached L4 stage. Heat shock assay plates were then wrapped with parafilm and placed in a water bath pre-heated to 34°C for 2 hours or 4 hours. Worms were either collected right after heat shock or after 30 minutes at 20°C for the recovery group.

Alternative splicing sites in the 10th, 16th, and 21st exons were also removed from this reference database to ensure they were consistent between all isoforms. We used wildtype L2 RNAseq data from Brunquell

All samples were loaded on 5% SDS-PAGE gel at 3 μl, 5 μl and 7 μl volumes followed by Coomassie blue staining and washing steps. Gels were then dried using DryEase Mini-Gel Drying System (Invitrogen, Catalog number: NI2387). These gels were used to normalize protein loading volume for different samples.

Each sample was loaded onto a freshly made 6% or 10% SDS-PAGE gel and run at 25 mA. Gel samples were then transferred in 10 mM CAPS pH 10.5 buffer at 20 V and 20 mA for 17 hrs to a PVDF membrane.

Protein products with HA tag were detected using 1:500 anti-HA antibody (Life Technologies, Catalog

Egg-laying analysis
Egg-laying assays were performed as previously described (Large, Xu et al. 2016). All egg-laying assays were carried out at 20°C using standard 3 cm NGM plates seeded with the OP50 strain of Escherichia coli. OP50 was prepared freshly by streaking a glycerol stock of OP50 on an LB plate and letting it grow at 37°C overnight. A single colony was then picked into 5 ml fresh LB and cultured overnight in a shaking incubator at 200 rpm. 1 ml of the overnight culture was used to inoculate 200 ml of LB for 4-6 hours of growth at 37°C with shaking. The 200 ml OP50 culture was concentrated via centrifugation to an OD600 of 2.0 and this culture was used for seeding experimental plates with 50 μl aliquots. All experimental plates were prepared the week of the assay and left at 22.5°C for 18-24 hrs following seeding. Plates were then placed at 4°C until the day of the assay and warmed to 20°C for 12 hours before each time point.

For strains that have severely reduced fertility when homozygous, one L4 nematode was transferred to the 50 μl experimental plate. The number of eggs laid was measured every 12 or 24 hours, and eggs laid per

Oocytes were counted while imaging and sperm number was measured manually by analyzing z-stack images on ImageJ through the CellCounter plugin.

34
To identify nurf-1 orthologs, we used homology information included in www.wormbase.org or by blasting      For some of the species that we were unable to resolve the full nurf-1 region (due to missing sequence for part of the region), we were able to 1 identify the duplicated region and included this in the phylogenetic analysis.  . We noted that the resulting topology recovered for the duplicated region was incongruent with the 9 species tree, likely due to limited phylogenetic signal in the short alignment ( Figure S18). To address this,   The N2 and LSJ2 lineage split sometime around 1958. N2 grew on agar plates with E.coli OP50 as a food source for around 11 years until they were cryopreserved. LSJ2 animals were cultured in liquid axenic media containing sheep liver extract and soy extract peptone as a food source for about over 50 years until they were cryopreserved. 302 genetic variations were fixed between these two strains, including two that fall in the nurf-1 gene -WBVar00601361 and WB00601565. B) Genetic location of two nurf-1 variations. WBVar00601361 (in red box) is an N2-derived intron single nucleotide substitution T/A (N2/ancestral) in the 2nd intron of nurf-1. WBVar00601565 is an LSJ2-derived 60bp deletion in the 3' end of nurf-1 that removes the last 18 amino acids and part of the 3'-UTR. C) Comparison of NURF-1 orthologs from Drosophila and humans showing position of protein domains and conserved regions as determined by Blastp and Clustal Omega. D) Boxplot of pairwise evolutionary fitness differences between the indicated strains measured by directly competing the indicated strains against each other for five generations. PTM288 and PTM229 are the same genotype as N2* and N2, respectively, with the exception of an engineered DNA barcode in the dpy-10 gene. PTM88 is the same genotype as the ARL (del_LSJ2>N2) , with the exception of a background DNA barcode in the spe-9 gene (for details see Methods). 
The genotype of each nurf-1 allele (shown in B) is indicated by color. The NIL strain also contains LSJ2 alleles of additional linked mutations, which is indicated by the blue horizontal line. For all figures, each dot represents an independent replicate, the box indicates the interquartile values of all data, and the line indicates the median of all data. Positive values indicate strain one is more fit than strain two. Negative values indicate strain two is more fit than strain one. For all figures, n.s. indicates p>0.05, one star indicates significant difference at p<0.05 level, two stars indicate significant difference at p<0.01 level, and three stars indicate significant difference at p<0.001 level. E) Total brood size of the N2 and ARL (intron,LSJ2>N2) strains. F) Number of differentially expressed genes between synchronized N2 and ARL (intron,LSJ2>N2) animals harvested 52 hours (L4 stage -when spermatogenesis is active) or 60 hours (young adults -when oogenesis is active) after hatching.   1  2  9  8  7  6  5  4  3  13  12  11  10  18  17   16   15  14   23   21  20  19  25  24  28  27                Proposed molecular mechanism for NURF-1 isoforms. The NURF-1.B isoform interacts with ISWI through its DDT domain to form a NURF complex capable of remodeling chromatin at specific regions of the chromosome. NURF is recruited to these regions through interactions with specific transcription factors using protein domains encoded by the overlapping exons. This remodeling is necessary for transcriptional responses for spermatogenesis. Due to some unknown signal, after spermatogenesis has resulted in the production of ~300 sperm, the NURF-1 D isoform outcompetes the NURF complex away from its target loci, casued the loss of transcription of key spermatogenesis genes, resulting in gametogenesis transitioning from spermatogenesis to oogenesis. 
The binding affinity of the PHD domain and bromodomain for histones strengthens this repression, but these domains are not strictly necessary for the D isoform to outcompete the B isoform.

Figure 7
Egg-laying rate of four strains. Egg-laying rate was calculated at the indicated time points. Six L4 animals were picked onto each assay plate (time = 0) for the indicated times. The total number of eggs was counted for each plate and used to calculate the average number of eggs laid per hour for each animal. Each individual trial is shown as a dim line. The mean for each strain is shown as a bold, colored line.

However, the effect of the intron SNV on this trait was subtle, especially when compared to the difference in reproductive rate caused by the nurf-1 60 bp deletion (Figure S1). We believe this is explained by epistasis between the two nurf-1 mutations, with the LSJ2 combination of both alleles having a non-linear effect on reproductive rate. We are unable to test this hypothesis because we did not construct a double ARL strain containing LSJ2 alleles of both nurf-1 mutations, owing to the difficulty of creating the intron edit (see Methods). Alternatively, additional mutations in NIL (nurf-1,LSJ2>N2*) could contribute to reproductive rate.

Coverage plot of reads from RNA. Note the high expression of the 14th, 15th, and 16th exons, supporting the existence of the nurf-1.q transcript. Reads covering the 23rd exon were also observed, supporting the expression of the nurf-1.f transcript. Enlarged views of the 10th, 16th, and 21st exons indicate that alternative splice sites are used at these exons. Clipped reads containing SL1 sequence support transcriptional start sites at the 1st and 14th exons.

Figure S4. nurf-1 encodes multiple transcripts. A) Subset of nurf-1 transcripts analyzed in this paper. Each blue box is an exon. Exon number is indicated on the figure. Dark blue exons are alternatively spliced, resulting in a 6-9 bp difference in length. B) Nanopore sequencing reads aligned to nurf-1. Reads were grouped by the nurf-1 transcripts they support. Dark purple marks are mismatches from the reference sequence.
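The per-animal egg-laying rate described in the egg-laying assay caption above can be illustrated with a minimal sketch. It assumes the rate is the number of eggs counted on a plate divided by the number of animals on that plate and by the hours elapsed in the counting window; the function name, argument names, and the example counts are hypothetical and do not come from the original analysis.

```python
def egg_laying_rate(total_eggs: int, hours: float, animals_per_plate: int = 6) -> float:
    """Average number of eggs laid per hour per animal on one assay plate.

    Assumption (not stated explicitly in the caption): the rate is the plate
    egg count divided by (number of animals x hours in the counting window).
    The default of six animals per plate follows the caption.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    if animals_per_plate <= 0:
        raise ValueError("animals_per_plate must be positive")
    return total_eggs / (animals_per_plate * hours)


# Hypothetical example: 144 eggs counted on a plate of six animals over 4 hours.
example_rate = egg_laying_rate(total_eggs=144, hours=4.0)
print(f"{example_rate:.1f} eggs per hour per animal")
```

Averaging this per-plate value across replicate plates would give the per-strain mean that the caption describes as the bold line.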
We also show the position of a precise deletion of the 23rd exon, created using CRISPR/Cas9 and edited into two strains. In C. elegans genetic nomenclature, each independently generated genetic mutation is given a unique allele name, even if the mutations are genetically identical. The deletion edited into the N2 strain is named kah149.

Figure S1
The deletion edited into a strain containing an in-frame FLAG epitope tag (shown as a black box) is named kah144. The y-axis indicates the number of reads at each location. B) Quantification of RNA abundance for five nurf-1 transcripts in response to heat shock. Data taken from Li et al., who heat shocked L2 animals at 34°C for 30 minutes, and Brunquell et al., who heat shocked L4 animals at 33°C for 30 minutes. The y-axis is the estimated transcripts per million (tpm) for each isoform in each condition. C) Western blot, using an anti-FLAG antibody, of a strain containing the FLAG tag fused at the position shown in panel A. We detected two bands, one matching the predicted size of the NURF-1.F isoform, that were both upregulated by heat shock (34°C). D) Western blot of three strains containing a FLAG tag and/or a deletion allele predicted to ablate the nurf-1.f transcript. The x-axis shows the presence or absence of the various alleles along with the environmental condition. We detected two bands that were induced by heat shock. Observation of these bands required the FLAG epitope tag and could be ablated by the 23rd exon deletion. E) Multi-dimensional scaling (MDS) plot of N2 (red) and a strain carrying the kah149 deletion of the 23rd exon (blue) in response to various heat shock conditions. No HS indicates no heat shock, 2 hr or 4 hr HS indicates two or four hours of heat shock at 34°C, and 2 hr HS + recovery indicates animals that experienced two hours of heat shock at 34°C followed by 0.5 hours of recovery at 20°C. The overall transcriptional response was the same in both strains. F) Number of genes significantly up- or downregulated between N2 and kah144 animals at the indicated conditions. G) Scatter plot of all genes differentially expressed in the four-hour heat shock or two-hour heat shock + recovery conditions. The R² value was 0.4421.

Figure S13

Figure S14
Figure S14. Sashimi plots for Caenorhabditis species with one nurf-1 gene. Only species with a published genome and transcriptome were plotted. Each peak shows the coverage of an exon, and each arc shows an exon-exon junction supported by RNA-seq reads.

Figure S15
Figure S15. Sashimi plots for Caenorhabditis species with two nurf-1 genes. Only species with a published genome and transcriptome were plotted. Each peak shows the coverage of an exon, and each arc shows an exon-exon junction supported by RNA-seq reads. No reads support splicing from nurf-1-1 to nurf-1-2, which further supports the split of nurf-1 in these species.