Sex Determination in Ceratopteris richardii Is Accompanied by Transcriptome Changes That Drive Epigenetic Reprogramming of the Young Gametophyte

The fern Ceratopteris richardii is an important model for studies of sex determination and gamete differentiation in homosporous plants. Here we use RNA-seq to de novo assemble a transcriptome and identify genes differentially expressed in young gametophytes as their sex is determined by the presence or absence of the male-inducing pheromone called antheridiogen. Of the 1,163 consensus differentially expressed genes identified, the vast majority (1,030) are up-regulated in gametophytes treated with antheridiogen. GO term enrichment analyses of these DEGs reveals that a large number of genes involved in epigenetic reprogramming of the gametophyte genome are up-regulated by the pheromone. Additional hormone response and development genes are also up-regulated by the pheromone. This C. richardii gametophyte transcriptome and gene expression dataset will prove useful for studies focusing on sex determination and differentiation in plants.

Male gametophytes never develop a lateral meristem and are much smaller than hermaphrodites (Figure 1d), with nearly all cells of the male gametophyte terminally differentiating as antheridia. If removed from media containing A CE , undifferentiated cells of a male can form a new hermaphroditic prothallus. Based upon these observations, A CE has many functions in gametophyte development: it prevents the establishment of the lateral meristem; it promotes the rapid differentiation of antheridia; it prevents its own synthesis or secretion in the male; and is necessary to maintain the male program of expression.
To date, all fern antheridiogens characterized are gibberellins (GAs) (Yamane et al. 1987;Yamane 1998;Takeno et al. 1989;Furber et al. 1989). Although the structure of A CE is unknown, the GA biosynthetic inhibitors ancymidol, AMO-1618, and uniconazole-P reduce the proportion of males in a population of Ceratopteris gametophytes suggesting that A CE and GA share a common biosynthetic pathway (Warne and Hickok 1989). That ABA completely blocks the A CE response in Ceratopteris is also consistent with A CE being a GA (Hickok 1983;McAdam et al. 2016).
To understand how A CE determines the sex of the Ceratopteris gametophyte, mutations affecting sexual phenotype have been characterized and used to develop a genetic model of the sex-determining pathway (Strain et al. 2001;Banks 1994Banks , 1997Eberle and Banks 1996). Cloning these genes is challenging because of the large genome size of C. richardii, ca. 11Gb (Li et al. 2015), and the lack of a reference genome sequence for this or any other homosporous fern species. An alternative approach to identifying sex-determining genes involves de novo transcriptome assembly using RNA-seq, which provides a means to perform sensitive gene expression studies in organisms that do not have a reference genome (Grabherr et al. 2011;Robertson et al. 2010;Schulz et al. 2012). Here we describe the de novo assembly of the transcriptome of young Ceratopteris gametophytes and identify genes whose expression differs as their sex is being determined by the absence or presence of A CE ; thus providing a snapshot of the transcriptional changes that occur as the sex of the spore becomes determined and prior to the differentiation of male or female traits in the developing gametophyte.

Plants and Growth Conditions
The origins of Hn-n, an isogenic, wild-type strain of Ceratopteris richardii used in this study, is described in (Hickok et al. 1987). The conditions for spore sterilization and gametophyte culture are as previously described (Banks 1994). Medium used to culture gametophytes in the absence of exogenous A CE is as described in (Banks et al. 1993). and is referred to as fern medium, or FM. A CE was obtained as a crude aqueous filtrate from media previously supporting gametophyte growth in FM as described in (Banks et al. 1993) and is referred to as conditioned FM (CFM). Scanning electron micrographs (SEMs) were performed on a FEI NOVA nanoSEM on samples prepared as previously described (Banks 1994).
For both RNA-seq and qRT-PCR, spores were grown aseptically in liquid FM at 28°in a growth chamber, shaken at 100 rpm, and at a density of 1g spores/L. Three days after spore inoculation, gametophytes were filtered from media; 1/6 of the spores were added to each of three flasks containing 200 mL sterile FM, which is the 2A CE treatment, and 1/6 were added to each of three flasks containing 200 mL sterile CFM, which is the þA CE treatment. After 36 hr, gametophytes were vacuum filtered from media and frozen in N 2 ðlÞ. Tissue was subsequently stored at -80°. All samples were randomized throughout incubators and during sample preparation and harvesting protocols.

Library Preparation and Sequencing
Frozen tissue was ground under liquid nitrogen for 20 min and total RNA extracted using the RNeasy Plant Mini Kit (Qiagen, CA). The TruSeq kit (Illumina, CA) was used to select poly-adenylated mRNA and prepare libraries for sequencing. Libraries were sequenced on an Illumina HiSeq2000 platform using paired-end technology.

Transcriptome Assembly and Annotation
DeconSeq v.0.4.1 was run on each of the FASTQ read files to remove reads aligning to bacterial, viral, rRNA, mitochondrial RNA, and chloroplast DNA . After removing adapter sequences and trimming reads based on quality score with Trimmomatic v.0.22 (Lohse et al. 2012), reads were assembled using the de novo transcriptome assembler Trinity (Grabherr et al. 2011), with a minimum contig length cutoff of 150 and a fixed k-mer size of 25. An assembly with unique genes was generated by selecting the longest component from each Trinity de Bruijn graph. These were used in subsequent differential expression analyses in order to avoid biasing analyses toward genes that were more difficult to assembly and thus had many more contigs (subcomponents). The program Assembly Stats in the iPlant Discovery environment was utilized to obtain basic assembly statistics (Goff et al. 2011;Earl et al. 2011). Protein-encoding, differentially expressed genes were annotated using the Trinotate workflow (Ashburner et al. 2000; Figure 1 Ceratopteris gametophyte development. (a) SEM of spores three days after inoculation showing trilete markings. (b-d) SEMs of 4.5d, 6d and 14d gametophytes grown in the presence of A CE . (e-g) SEMs of 4.5d, 6d and 14d gametophytes grown in the absence of A CE . The mature hermaphrodite (g) has a meristem notch (mn), archegonia (ar) and antheridia (an) while the mature male (d) has only antheridia (an). Bars = 0.15mm. Finn et al. 2011;Grabherr et al. 2011;Kanehisa et al. 2011) using a 50 amino acid minimum cutoff.

Differential Expression Analysis
Paired reads were aligned to the assembled transcriptome using RSEM v.1.0.1 (Li and Dewey 2011;Grabherr et al. 2011;Li et al. 2015). Only the transcripts with at least one read aligned in at least one of six samples were used. edgeR v.3.0.8 (Robinson et al. 2010), DESeq v.1.10.1 (Anders and Huber 2010), and EBseq v.1.1.4 (Leng et al. 2013) were used to identify differentially expressed genes at a Benjamini-Hochberg (Benjamini et al. 2001) corrected FDR of q = 0.01. In edgeR, dispersion was estimated as tagwise dispersion. To retain as much rigor in our methods as possible, genes that were identified as statistically significantly differentially expressed in all three packages and displayed at least a twofold expression difference between conditions were identified as "consensus DEGs" and used in all downstream analyses.

GO Enrichment and Assembly Validation
Because there is no reference genome sequence for C. richardii, GO enrichment was performed by annotating the C. richardii transcriptome and the differentially expressed genes against the Arabidopsis proteome (Araport11) (Hanlon et al. 2015), the non-redundant database, and the Selaginella moellendorffii proteome v. 1.0 (Banks et al. 2011), using BLASTx and an e-value threshold of 10 28 : Gene Ontology (GO) terms were then assigned from the Arabidopsis accession identifier from the best hit associated with the differentially expressed transcripts and the reference C. richardii transcriptome. ClueGO (version 2.3.4), a Cytoscape (version 3.5.1) plug-in (Cline et al. 2007;Smoot et al. 2010;Saito et al. 2012;Shannon et al. 2003), and GO Term Fusion were used to distill and visualize the GO term enrichments within the biological processes category using default parameters, with the following exceptions: the minimum number of genes/cluster was set to 5, the Benjamini-Hochberg method was used to correct the p-values for multiple testing, with a significance threshold of P , 0.05 and a custom background model supplied. The GO terms mapping to the entire nonredundant Ceratopteris richardii transcriptome was used as the background when assessing the enrichment of GO terms. To assess the quality of the C. richardii Trinity assembly, the 5133 C. richardii Sanger-generated ESTs available in GenBank were used to blast the entire Ceratopteris transcriptome assembly using BLASTn and a BUSCO (Benchmarking Universal Single-Copy Orthologs) analysis was performed using BUSCO v.2.0 to assess completeness of the assembled transcriptome using the 'eukaryotic' dataset, which consists of 303 highly conserved genes (Simão et al. 2015;Waterhouse et al. 2018).

Expression Analysis Validation
Total RNA was isolated from six gametophyte populations cultured and harvested in the same manner as that used to generate the RNA-seq data. RNA was reverse transcribed using the Tetro cDNA Synthesis Kit (Bioline, MA); qRT-PCR was performed using the SYBR green PCR Master Mix (Applied Biosystems), 3ng cDNA template and the StepOne Real-Time PCR System (Applied Biosystems, NY). PCR conditions were: one cycle of 20 min at 95°, 40 cycles of 3 sec at 95°and 30 sec at 60°. Melt curves were analyzed and only those reactions producing a single Tm peak were used. Three technical replicates were performed for each sample. Measurements were normalized to the amount of CrEF1a (GenBank accession number BE642078) transcript in the samples. Reactions without template added served as the negative control. The DCt method was used in calculating relative fold changes (Livak and Schmittgen 2001). The primer sequences used are listed in Table S1.

Data Availability
Strains are available upon request. Table S1 contains primers used in qRT-PCR and supplemental figures. Table S3 contains a list of all 1,163 differentially expressed genes found by all three statistics packages with annotation and statistical support included. This Transcriptome Shotgun Assembly project has been deposited at DDBJ/EMBL/GenBank under the accession GBGN00000000. The version with 82,870 genes used in the differential expression analysis is the second version, GBGN02000000. RSEM results and statistical support for all Trinity predicted transcripts are available upon request. Supplemental material available at Figshare: https://doi.org/10.25387/g3.6100139.

Gametophyte Morphology and Selection of Tissue Samples
The early development of Ceratopteris gametophytes can be divided into distinct stages (Banks et al. 1993). During the first stage (0-3d after spore inoculation), the spore swells but remains intact ( Figure 1a). During stage 2 (3-4d), the spore coat cracks along its trilete markings. The first rhizoid emerges from the spore during stage 3 and the twodimensional protonema (Figure 1b and e) emerges during stage 4 (4-6d). The male and hermaphrodite gametophytes become morphologically distinct at stage 5 (6-7d; (Figure 1c and f) at which time hermaphrodites begin to secrete A CE . For a gametophyte to develop as a male, it must continuously be exposed to A CE during stages 2 and 3 (Banks et al. 1993). Because we are interested in identifying genes that are differentially expressed by A CE treatment during the period of time that the sex of the gametophyte is determined, three populations of gametophytes were grown without A CE for three days; at day three (end of stage 1), each population was divided into two and either media without A CE or media with A CE was added to the split samples. All gametophytes were harvested and processed 36hr later (stage 3; (Figure 1b and e)) when the sex of the gametophyte was determined but male and hermaphrodites were morphologically indistinguishable.

Transcriptome Assembly and Annotation
The Ceratopteris transcriptome was assembled from 188 million Illumina paired end reads generated from the six gametophyte samples (see Table S2 for a summary of run metrics, analysis and assembly of the transcriptome). Three biological replicate samples were sequenced and analyzed for each treatment condition. A Trinity (Grabherr et al. 2011) de novo assembly resulted in 82,820 genes with read support of which 24% could be annotated with the Arabidopsis proteome, and 23% could be annotated by the Selaginella proteome. A large number of genes (1,064) could be annotated using the Selaginella proteome but did not have hits in the Arabidopsis proteome. Most of the top hits of these sequences are from the Selaginella moellendorffi genome (302), however many are also from Physcomitrella patens (175), and Marchantia polymorpha (131). Of these, the majority (738 sequences) had hit descriptions of predicted/hypothetical, uncharacterized, or unknown proteins. That most of these sequences do not have known gene functions is not surprising given that Arabidopsis is generally used in annotation of plant datasets. Of the remaining sequences which were annotated, many are sperm-related. Motile sperm are a characteristic of early divergent land plants such as Selaginella and Ceratotperis (reviewed in (Hodges et al. 2012)), and thus it is not surprising that such proteins would be present in these assemblies but notably absent from Arabidopsis. A total of 44 sequences have blast hits to dynein related proteins and 18 have hits to flagellar associated proteins. Additional sequences are also present which are sperm-related, including radial spoke protein 9 and sporangia induced deflagellation-inducible protein. Of the annotated sequences, 43 are annotated with the cellular component GO term cilium, and 13 are annotated with cilium or flagellum-dependent cell motility; these are likely sperm-related proteins as flagellum are solely found in sperm cells in seedless vascular land plants (Raven et al. 2005).
Following the assembly and annotation of the Ceratopteris gametophyte transcriptome, the quality of the assembly was assessed. First, the quality of the Trinity assembly was assessed by comparing 5,133 Ceratopteris Sanger EST sequences available in GenBank to transcript sequences generated by Trinity using BLASTn. 87% of the Sanger ESTs, generated either from C. richardii sporophyte and gametophyte tissues were identical or almost identical (E-value of 0.0) to transcripts in the transcriptome assembly, indicating that Trinity accurately assembled transcript sequences from the short Illiumina reads. The expression of the Sanger ESTs not represented in the transcriptome assembly may be age or tissue specific and thus not captured in the transcriptome assembly described here. A BUSCO analysis (Waterhouse et al. 2018;Simão et al. 2015) was also performed to assess the completeness of the transcriptome. BUSCO identifies highly conserved genes as complete, complete and single-copy, fragmented, or missing in the transcriptome. Of the 303 total BUSCO groups searched, 290 were complete (95.7%), 181 were complete and single-copy (59.7%), 10 were fragmented (3.3%), and only 3 were missing (1%). This suggests that the assembled Ceratotperis gametophyte transcriptome is quite complete.

Identification and Validation of Differentially Expressed Genes by Antheridiogen Treatment
Three programs, edgeR (Robinson et al. 2010), DESeq (Anders and Huber 2010), and EBSeq (Leng et al. 2013), were used to identify genes that differ in their expression by A CE treatment (See Table S2 for number of differentially expressed genes found by each package). A scatterplot (Figure 2) that assesses the overall expression pattern across all transcripts shows that the expression of most transcripts is similar regardless of treatment, as expected. The majority (88%) of differentially expressed genes were more highly expressed in A CE treated gametophytes ( Figure 2). Of the 1,183 DEGS identified using DESeq, 1,163 were also identified by EBSeq and edgeR; these 1,163 DEGS were used in subsequent analyses. A list of the 1,163 DEGS, their annotation and supporting statistics is provided in Table S3. Of the 133 DEGS more abundant in the non-A CE -treated gametophytes, 55% were annotated as protein-encoding genes, while 71% of the 1,030 DEGS more abundant in the A CE treated samples could be annotated.
To test the validity of the DEG analysis, the expression of 10 genes, including genes more abundant in A CE -treated samples, genes more abundant in the non-A CE -treated samples and genes showing a less than twofold difference in abundance between treatments were assessed by qRT-PCR. As shown in Figure S1, the qRT-PCR expression data are consistent with the RNA-seq expression data in the direction of the fold change.

GO-Enrichment of Differentially Expressed Genes
The enrichment of Gene Ontology (GO) Biological Process terms associated with the genes that are up-regulated by A CE (Figures 3 and Figure S2) reveals four major networks of enriched GO terms. One cluster includes genes related to various aspects of development, including meristem, shoot and tissue development. Another cluster includes genes involved in hormone (ABA, auxin, ethylene and GA) signaling or responses. A third cluster includes genes that affect chromatin structure and epigenetic regulation of gene expression. The fourth cluster includes genes broadly involved in regulating gene expression; genes within this cluster are included in the "chromatin" cluster. Only a single GO term (response to light stimulus) was enriched for genes that are up-regulated in the non-A CE treated samples.

Hormone and Development Genes Responsive to A CE
Given that all characterized fern antheridiogens are gibberellins (Yamane 1998), genes involved in GA hormone biosynthesis, signaling and responses are likely to be involved in sex determination in Ceratopteris. Of the differentially expressed genes, COPALYL DI-PHOSPHATE SYNTHASE/KAURENE SYNTHASE (CPS/KS), which encodes a key enzyme in GA biosynthesis (Sun and Kamiya 1994;Hedden and Thomas 2012), is more abundant in gametophytes that will become A CE secreting hermaphrodites (Table 1). No other known GA biosynthetic genes, including kaurene oxidase and GA20 oxidase are differentially expressed in C. richardii, indicating that sex-specific A CE biosynthesis may be regulated or limited by the expression of the CPS/KS gene in Ceratopteris, and that its expression is down-regulated by A CE (males do not secrete A CE ). All major ABA and GA signaling genes (Yamauchi et al. 2004;Chan 2012;Sun 2008) are present in the Ceratopteris transcriptome and are listed in Table S4. Ceratopteris seems to have all the components seen in Arabidopsis, though instead of the 7 DELLA proteins, responsible for repressing GA responses, Ceratopteris has only 2 and two F-box protein encoding genes (SNE   (Anders and Huber 2010). Genes which are more highly expressed in þA CE treatment are shown in blue whereas those more highly expressed in 2A CE treatement are shown in purple. The majority (88%) of differentially expressed genes were more highly expressed in A CE treated gametophytes.
have been described. Furthermore, three independent alleles of Ceratopteris GAMETOPHYTE INSENSITIVE TO ABA 1 (GAIA1) were shown to be mutations of a gene homologous to OST1/SLAC1 of Arabidopsis (McAdam et al. 2016). Wild type gametophytes are hermaphroditic in the presence of A CE and ABA, gaia1 mutants are male in the presence of A CE and ABA. However, while the GA receptor (GID) and GA signaling (the DELLA GAI/RGA and GID2) genes are present in the transcriptome, they are not differentially expressed by antheridiogen in C. richardii as they are in gametophytes of the fern Lygodium japonicum (Tanaka et al. 2014). Three MYB genes are up-regulated (or de-repressed) by A CE treatment in Ceratopteris (Table 1). These genes are well-characterized regulators of GA-induced responses during angiosperm seed germination (Gubler et al. 2002) and GA-dependent anther development (Aya et al. 2011) and may serve similar functions by promoting the development of antheridia and/or suppressing archegonia development in Ceratopteris gametophytes exposed to A CE .
Although ferns never evolved seeds, there are interesting physiological parallels between seed germination in Arabidopsis and sex determination in Ceratopteris in that both processes are regulated by two antagonistic hormones, ABA and GA. In germinating Arabidopsis seeds, the expression of Mother of FT and TFL (MFT) gene is modulated by GA and ABA (Xi et al. 2010). MFT is up-regulated by ABA treatment via the ABA-INSENSITIVE3 (ABI3) and ABI5 transcription factors, as well as by DELLA proteins in the GA signaling pathway. Because MFT represses ABI5, MFT is central to a negative feedback loop that regulates seed germination by GA and ABA in Arabidopsis. Given that the Ceratopteris MFT and ABI3 genes are upregulated by A CE and ABA antagonizes the A CE response (Hickok 1983;McAdam et al. 2016;Sussmilch et al. 2017), A CE may promote male development by ultimately repressing ABA signaling in the gametophyte through a pathway that involves MFT and ABI3.
Several additional genes involved in ABA, ethylene, auxin and cytokinin perception, signaling or response are up-regulated by A CE Figure 3 Functionally grouped Biological Process GO terms specific for A CE -up regulated DEGs. The size of each node represents the term enrichment significance. Node labels are shown in the bar graph in Figure (Table 1). While ABA is known to affect sex determination by blocking the A CE response, these results point to roles for additional hormones in the sex-determining process. Studies of the effects of exogenous auxin (Gregorich and Fisher 2006;Hickok and Kiriluk 1984), ethylene (Ka zmierczak 2010) and cytokinin (Menéndez et al. 2009) on fern gametophyte development have shown that these hormones can affect the overall size and organization of the gametophyte as well as the number of sex organs in a gametophyte. However, neither auxin, ethylene or cytokinin substitute for or completely block the male-inducing effects of antheridiogen, indicating that A CE may influence these hormones, or the crosstalk among these hormones, in modulating cell division and expansion in young gametophytes that will become important as they differentiate. This DEG analysis suggests that A CE affects the sex of the gametophyte by not only activating genes associated with development, but also by epigenetically reprogramming the nucleus that will divide and ultimately give rise to a male gametophyte. The relatively few genes that are up-regulated in gametophytes not treated with A CE likely represent genes that are normally expressed in the gametophyte destined to become hermaphrodite but are repressed by A CE : An Epigenetic Response to A CE A striking number of DEGs up-regulated by A CE encode factors involved in epigenetic regulation of gene expression or epigenetic reprogramming of the genome. These genes were sorted into five groups (Table 1) following the classification of Pikaard and Scheid (Pikaard and Scheid 2014): DNA modification, histone modification, Polycombgroup proteins and interacting components, chromatin formation/ chromatin remodeling and RNA silencing.
The first group includes DNA modification genes that affect cytosine methylation. The DEGs assigned to this group encode DNA METHYLTRANSFERASE 1 (MET1), which maintains CpG methylation (Saze et al. 2003;Jullien et al. 2012), CHROMOMETHYLASE 3 (CMT3), which maintains CpHpG methylation (Law and Jacobsen 2010) and REPRESSOR OF SILENCING 1 (ROS1), a cytosine demethylase (Gong et al. 2002). Differences in global DNA methylation patterns between gametes and adjacent cells of both male and female gametophytes of Arabidopsis have been observed (Pillot et al. 2010;Calarco et al. 2012;Ibarra et al. 2012;Jullien et al. 2012) and are thought to silence transposable elements and reset silenced imprinted genes in sperm cells (Kawashima and Berger 2014;Martínez et al. 2016). While sex determination in a homosporous fern, which occurs during the gametophyte generation, differs from sex determination in the heterosporous angiosperms, which occurs during the sporophyte generation (Tanurdzic and Banks 2004), the up-regulation of these genes during sex determination in Ceratopteris adds another stage of plant development where DNA methylation may play an important role in stabilizing or destabilizing transposable elements and contributes to epigenetic reprogramming of the male gametophyte. Whether the observed differential expression of these DNA methylation genes alters DNA methylation patterns in the genomes of young Ceratopteris gametophytes, and whether additional changes in DNA methylation occur as their gametes differentiate, remain to be tested.
A number of A CE -up-regulated DEGs encode proteins belonging to the second group, histone-modifying enzymes known to affect gene expression (Table 1). Among them are the histone acetyltransferases HAC1, HAC12 and ROS4, a histone deacetylase (HDA14), the histone methyltransferases TRITHORAX-LIKE PROTEIN 2 and 3 (ATX2 and 3), the SU(VAR)3-9 related proteins SUVH4/KYP and SUVH6, and EARLY FLOWERING IN SHORT DAYS (EFS/SDG8). These proteins are involved in either maintaining transcriptionally active states or transcriptionally inactive states (reviewed in Bannister and Kouzarides 2011;Grossniklaus and Paro 2014;Pikaard and Scheid 2014;Xiao et al. 2016) and can contribute to the maintenance of DNA methylation at silenced loci. ATXR3 is notable in that it is essential for male and female gametophyte development (Berr et al. 2010) in angiosperms. Only one DEG, CURLYLEAF (CLF), was classified as encoding proteins from the third group of chromatin modifiers: Polycomb proteins. Polycomb proteins and interacting partners are often involved in determining cell proliferation and identify through methylation and chromatin compaction (Grossniklaus and Paro 2014;Kingston and Tamkun 2014). The fourth group of genes, those involved in n chromatic formation/remodeling, are also represented among the genes up-regulated in response to A CE . PICKLE (PKL), the gene encoding for a chromatin remodeling factor which is necessary for gibberellin modulated development in Arabidopsis, (Park et al. 2017) and , are both members of remodeling complexes and are required for normal development (Zhang et al. 2015).
The fifth group of genes involving epigenetic regulation is those relating to RNA-mediated gene silencing pathways (Table 1). Argonaute (AGO) 1, a core member of the RNA-induced silencing complex (RISC) which is involved post transcriptional gene silencing (PTGS) through cleavage or transcriptional inhibition (reviewed in (Czech and Hannon 2011;Martienssen and Moazed 2015)) is significantly up-regulated by A CE . Also up regulated are genes encoding two Dicer endonucleases: DCL1 which generates miRNAs of mostly 21nt and DCL4, which generates siRNAs that are 21nt (Pouch-Pélissier et al. 2008). Additional genes involved in RNA-mediated PTGS are XRN4, which encodes a nuclease involved in small RNA processing (Cao et al. 2014), and SUO, which encodes a component of the miRNA pathway (Yang et al. 2012). NRPD2, encoding the catalytic subunit of RNA polymerase IV and V in plants (Ream et al. 2009) is also up-regulated by A CE . Pol IV and V are both required for intercellular RNA interference and are involved in PTGS maintenance (Onodera et al. 2005;Pontier et al. 2005;Kanno et al. 2005). Also modulated by A CE is a component of the THO/TREX complex, which has a putative role in siRNA biosynthesis (Furumizu et al. 2010). Interestingly, the THO complex represses female germline specification in Arabidopsis (Su et al. 2017). Together, these results show that small RNA-mediated PTGS is involved in the suppression of female characteristics in C. richardii gametophytes.
All of the epigenetic mechanisms known to occur in plants are represented among the genes up regulated by A CE . The importance of epigenetic regulation for sex determination in C. richardii should perhaps not come as a surprise. Gametophytes which are removed from A CE containing media will over time develop into hermaphrodite gametophytes, thus the promotion of male/suppression of female traits must be reversible. Epigenetic regulation of sex determination would allow for such plasticity in development.

Conclusions
This work reports the first transcriptome of Ceratopteris richardii, along with a survey of significant differential gene expression changes between male and hermaphrodite gametophytes as sex is being determined. A high-quality reference gametophyte transcriptome was assembled and used in the identification of genes which may be involved in sex determination. The majority of differentially expressed genes were more highly expressed in the male gametophyte. Many of these up regulated genes are known to be involved in development and in response to hormones. A significant number of differentially expressed genes are involved in chromatin remodeling and epigenetic regulation. Outcomes of this research shed light on the molecular mechanisms involved in sex determination of C. richardii as well as provide a resource for other plant science researchers. Future work will probe and functionally classify these differentially expressed genes and as well as survey how these changes persist as the gametophyte moves from sex determination to differentiation.

ACKNOWLEDGMENTS
This research was supported by the National Science Foundation (NSF 1258091 to JB) and the University of Queensland. We thank the Purdue University Rosen Center for Advanced Computing (RCAC) for access to computational resources throughout the project, the Texas Advanced Computing Center (TACC) and Matthew Vaughn for computational resources and support. We are grateful to Michael Gribskov, Steve Kelly, Mikhail Atallah, Phillip SanMiguel, Joseph Ogas, Qiong Wu, and Rick Westerman for technical and computational support.
Author Contributions: NA performed tissue preparation and treatment as well as harvesting, performed bioinformatics analysis, performed qRT-PCR, interpreted data, and wrote part of the manuscript. FG provided bioinformatics guidance and support. OV participated in designing the experiment as well as provided statistical guidance and support. JB and MT designed the experiment, conceived of experimental design and setup, interpreted data, and wrote manuscript. All authors reviewed and approved the final manuscript.