De novo transcriptome assembly of the Southern Ocean copepod Rhincalanus gigas sheds light on developmental changes in gene expression

Copepods are small crustaceans that dominate most zooplankton communities in terms of both abundance and biomass. In the polar oceans, a subset of large lipid-storing copepods occupy central positions in the food web because of their important role in linking phytoplankton and microzooplankton with higher trophic levels. In this paper, we generated a high-quality de novo transcriptome for Rhincalanus gigas, the largest—and among the most abundant—of the Southern Ocean copepods. We then conducted transcriptional profiling to characterize the developmental transition between late-stage juveniles and adult females. We found that juvenile R. gigas substantially upregulate lipid synthesis and glycolysis pathways relative to females, as part of a developmental gene expression program that also implicates processes such as muscle growth, chitin formation, and ion transport. This study provides the first transcriptional profile of a developmental transition within Rhincalanus gigas or any endemic Southern Ocean copepod, thereby extending our understanding of copepod molecular physiology.


Introduction
The copepod Rhincalanus gigas ( Fig. 1) is one of the dominant members of the mesozooplankton of the Southern Ocean, numerically and in terms of biomass (Ommanney, 1936;Mackey et al., 2012). It is also by far the largest of the Southern Ocean copepods, reaching more than 10 mm in length (Jansen et al., 2006). R. gigas is most abundant at the Polar Front where it is associated with the Antarctic Circumpolar Current, but its range extends south to the Antarctic continental shelf (Ward et al., 1997;Gleiber, 2014). Like the other dominant large-bodied copepods of the Southern Ocean-Calanoides acutus and Calanus propinquus-R. gigas has a circumpolar distribution and is abundant in both pelagic and coastal waters (Conover and Huntley, 1991). The diet of R. gigas is mostly herbivorous, but omnivory is an important aspect of its overwintering strategy when phytoplankton are scarce Kattner et al., 1994;Pasternak and Schnack-Schiel, 2002;Schmidt et al., 2003). R. gigas is a "slow and steady" species, characterized by low rates of growth, feeding, and metabolism (Conover and Huntley, 1991;Atkinson et al., 1992;Mayzaud et al., 2002;Maps et al., 2014), low fecundity (Ward and Shreeve, 1995), and relatively small seasonal fluctuations in abundance (Atkinson, 1991;Ward et al., 1997). Its large size and opportunistic feeding mode allow R. gigas to survive periods of low food availability. Compared to C. acutus, R. gigas exhibits low winter mortality (Ward et al., 1997) and small seasonal changes in lipid content, losing only about half as much relative dry weight and lipids over the winter Hagen and Schnack-Schiel, 1996;Tarling et al., 2007).
Historically, there has been disagreement whether R. gigas has a one- (Ommanney, 1936) or two-year life cycle (Atkinson et al., 1992). In fact, like Calanus finmarchicus in the North Atlantic, R. gigas can almost certainly exhibit either a one-or two-year life cycle depending on the location and environmental conditions. Both field observations (Ommanney, 1936;Voronina, 1970;Marin, 1988;Chiba, 2002) and modelling studies (Tarling et al., 2007) indicate that populations are frequently composed of overlapping generations with both single-and multi-year lifespans. Towards the southern end of its distribution, the waters are probably too cold for R. gigas to complete its life cycle in one year, but its large size allows young juvenile (copepodite) stages to successfully overwinter and complete the life cycle during the following growing season (Ward et al., 1997;Maps et al., 2014).
Lipid storage is a critical strategy for zooplankton in highly seasonal polar environments. Several groups of copepods, particularly predominantly herbivorous copepods at high latitudes and in coastal upwelling zones, store large amounts of lipids that they use as fuel for reproduction and overwintering (Hagen and Schnack-Schiel, 1996;Lee et al., 2006). The overwintering strategy of these lipid-storing copepods frequently includes a period of seasonal dormancy called diapause, which takes place during a late juvenile or adult stage of development. During diapause, copepods descend to depth, arrest development, and become metabolically inactive; they ultimately emerge from diapause during the spring to molt (in the case of juveniles) and reproduce (reviewed in Baumgartner and Tarrant, 2017). It is unclear whether, or to what extent, R. gigas performs diapause. In contrast to the canonical diapause behavior of species heavily reliant on herbivory, such as the sympatric Calanoides acutus, omnivory allows R. gigas to remain active or semiactive throughout the winter. Although it accumulates large lipid stores and migrates seasonally to deeper waters (Schnack-Schiel and Hagen, 1994), it is capable of actively feeding and reproducing yearround (Rakusa-Suszczewski et al., 1976;Pasternak and Schnack-Schiel, 2002).
Like most of the large lipid-storing species in the families Calanidae, Neocalanidae, and Eucalanidae, the primary storage lipids of R. gigas are wax esters. However, these lipids contain mostly short saturated fatty acids and alcohols Ward et al., 1996), in contrast to the long-chain monounsaturated lipids that characterize predominantly herbivorous, diapause-associated copepods-such as C. acutus, and the Arctic species Calanus glacialis and Calanus hyperboreus Atkinson, 1998;Lee et al., 2006). Long-chain wax esters are more energy-efficient and are thought to be an adaptation for long periods of food deprivation (Lee et al., 1974), so the predominance of short-chain wax esters in R. gigas may be a biochemical signature of its omnivorous feeding mode and weak reliance on diapause . These interspecific differences in lipid composition require differences in lipid synthesis and elongation pathways, but the genetic basis of these differences has yet to be closely examined in copepods.
We report a highly-complete transcriptome for R. gigas, assembled from two life stages, to serve as a bioinformatic resource, and we investigated changes in gene expression between late stage juveniles (CV copepodites) and adult females. Calanoid copepods progress through six naupliar (NI-NVI) and five copepodite (CI-CV) larval stages via a series of molts, which are accompanied by metabolic and physiological transitions and, in the case of the terminal molt, the formation of mature gonads (Mauchline, 1998). The developmental transition between late juveniles and adults has previously been profiled in a diapausing species (C. finmarchicus), where the terminal molt coincides with emergence from diapause. In C. finmarchicus, there is a clear connection between lipid metabolism and development, and transcriptomic studies identified large shifts in the expression of lipid synthesis and degradation enzymes between adults and juveniles (Lenz et al., 2014;Skottene et al., 2019). In contrast, the extent to which changes in lipid and carbohydrate metabolism are associated with development of R. gigas was unknown, so we specifically assessed changes in expression within these pathways during maturation to adulthood. Indeed, we found that R. gigas undergoes strong shifts in lipid synthesis and carbohydrate metabolism pathways during maturation, similar to previous observations in C. finmarchicus. This study opens the door for comparative analyses of copepod physiology and development.

Animal collection
Rhincalanus gigas were collected during austral summer 2019 aboard the "ARSV Laurence M. Gould" in association with sampling conducted by the Palmer Antarctica Long-Term Ecological Research (PAL LTER) program. The collection site was on the shelf of the West Antarctic Peninsula (LTER grid Station 600.040; 64 • 56.04 ′ S, 64 • 23.58 ′ W). Copepods were collected at 18:25 on January 7, 2019 by conducting a double oblique net tow to 300 m with a 2 × 2 m rectangular frame, a 700-μm mesh net, and a non-filtering cod end. The surface temperature was 1.5 • C, and the bottom depth was 1422 m. R. gigas individuals were contained in an ice-chilled bucket during sorting. Copepods were stored in RNAlater at − 20 • C or colder and shipped to Woods Hole Oceanographic Institution (WHOI) on dry ice. Staging was conducted in Woods Hole back at WHOI by examining the preserved individuals under a stereomicroscope, examining urosomes for signs of maturity by shape and presence of genital pores (Fig. 1), and measuring the prosome length (females >6 mm) (Seret, 1979;Razouls, 1994).

RNA extraction, library preparation, and sequencing
Total RNA was extracted from individual copepods using the Aurum Fatty and Fibrous Tissue Kit (Bio-Rad). The optional DNase treatment was omitted because we have empirically found that omitting the step can improve RNA quality and yield without observable DNA contamination (for further discussion see Tarrant et al., 2019). RNA yield, purity, and quality were assessed using a Nanodrop Spectrophotometer and denaturing agarose gel. Because the 28S rRNA in arthropod samples frequently fragments during sample preparation (McCarthy et al., 2015), samples were deemed to be of suitable quality if they possessed a distinct 18S band with no evidence of genomic DNA contamination (large bands) or degradation (smear of smaller bands). RNA was submitted to Arraystar Inc. (Rockville, MD) for library construction and sequencing. Libraries were constructed from six individual copepods using the KAPA Stranded RNA-seq Library Preparation Kit. Library quality was assessed using an Agilent 2100 Bioanalyzer, and yield was assessed through quantitative PCR. All libraries showed the expected fragment size distribution (400-600 bp) without evidence of adapter dimer contamination. The barcoded libraries were sequenced with 150 base pair paired-end reads to a depth of 40 M reads (20 M pairs) on an Illumina HiSeq 4000. The MIxS (Minimum Information about any Sequence) descriptions, which provide a standardized checklist for sequencing metadata (Yilmaz et al., 2011), are given in Table 1. To further confirm species identity, mtCOI sequences from each individual library were BLASTed against the nr database; all sequences were >99% identical to a deposited NCBI R. gigas mtCOI sequence (GenBank JN663361.1).
BLASTX and BLASTP v2.7.1 (Altschul et al., 1990) were used to search transcripts and peptides against the Swiss-Prot database (Boutet et al., 2007, downloaded March 18, 2020, and hmmscan (http://hmmer.org/) was used to search the PFAM database (Finn et al., 2014, downloaded March 16, 2020. Any peptides with a hit (e-value = 1e-10) to either SwissProt or PFAM were retained in addition to computationally predicted peptides. Sequence whose top five BLASTP hits by e-value contained a nonmetozoan sequence and no metazoan sequences were considered contamination and removed. We also used DIAMOND v0.9.30 (Buchfink et al., 2015) with the 'more-sensitive' setting to search proteins and nucleotides against NCBI's non-redundant protein database (nr, downloaded March 23, 2019).
We used the "shuffle" mode of FEELnc v0.1.1 (Wucher et al., 2017) to annotate long non-coding RNA's (lncRNA's). This is a machine learning approach that can be used to predict lncRNA's in de novo assemblies. The training set for coding transcripts was constructed from the set of complete universal single copy orthologs found from the BUSCO analysis (as in Huerlimann et al., 2018).

Annotation and gene set analysis
Because automatic annotation of non-model organisms is not always reliable, we manually annotated certain KEGG (Kanehisa and Goto, 2000) metabolic pathways of particular interest (as described in Section 3.5) using BLAST and a gene tree-based approach. For each KEGG gene ID, we aligned sequences with MAFFT (Katoh et al., 2009), constructed gene trees using iQTREE (Nguyen et al., 2015), and visualized trees with FigTree (http://tree.bio.ed.ac.uk/software/figtree). Sequences were annotated based on their top BLAST hits and their placement in the gene tree. Gene set enrichment analysis (GSEA) was performed on manually annotated KEGG pathways in the R package 'clusterProfiler' (Yu et al., 2012). As input, we ordered genes by shrunken log-fold change and used all genes annotated for a particular pathway as a custom gene set. A gene set was considered enriched at an FDR of 0.05.

Sequence read data
Three field-collected individuals from each of two developmental stages (CV copepodite and adult female, six individuals total) were sequenced to an average depth of 18.5 ± 2.2 M (mean ± SD) read pairs per sample, generating 111.1 M total reads. K-mer correction, quality trimming, and rRNA removal removed 17.6% of the raw reads for a final set of 91.5 M reads in total and 15.2 ± 2.4 M (mean ± SD) read pairs per sample. GC content of the final assembly was 39.7%.

Transcriptome assembly and quality control
We used a non-standard transcriptome assembly workflow in which the transcriptomes for each of the six individuals were assembled individually and subsequently merged. This follows a similar approach used by Roncalli et al. (2018) to successfully assemble another recent copepod transcriptome; in that study, this strategy was necessary to obtain a high-quality assembly. To validate this approach in our case, we compared our assembly to an alternative transcriptome generated from the same libraries using the default Trinity pipeline, in which all reads are initially merged ("Default assembly", Table 2). The final nonstandard assembly contained 265,173 contigs and had a TransRate score of 0.44, with 89.1% of all reads mapping back to the transcriptome. BUSCO results indicate a relatively complete assembly with 93.7% of single-copy arthropod orthologs recovered, although many of these were duplicated (C:93.7%[S:13.1%, D:80.6%], F:2.0%, M:4.3%, n:1013). The default Trinity assembly contained a smaller number of sequences (152,092 versus 265,173) but had a lower TransRate score (0.30 versus 0.44). The mapping rate of the default assembly was lower at 80%, and BUSCO scores for the default assembly were also slightly lower than for our final assembly, albeit with lower levels of duplication. For these reasons, we continued to use the non-standard assembly for downstream analysis.

Annotation
TransDecoder predicted peptides for 155,948 (58.8%) transcripts in the full assembly. We removed 2860 non-metazoan sequences, retaining a filtered set of 262,313 contigs for downstream analysis. Of the filtered contigs, 188,349 had at least ten total counts and were retained by Corset, which grouped them into 89,528 clusters; 62,148 (69.4%) Corset clusters had at least one predicted peptide and 39,361 (44%) had at least one BLASTP hit to the NCBI non-redundant database (nr) with a threshold e-value of 1e-10. Based on these BLAST results, 34,624 (38.7%) clusters received at least one gene ontology annotation, and 27,298 (30.5%) were annotated with at least one KEGG term.

Differential gene expression analysis
We sequenced six R. gigas individuals, three of which were identified as CV copepodites and three as adult females based on microscopic examination. Images of R. gigas are shown in Fig. 1. However, principal component and distance-based clustering analyses identified one juvenile ("RJ2") that strongly grouped with the adult animals (Fig. 2). This grouping was also apparent using an alternative dimensionalityreduction approach, t-SNE (van der Maaten and Hinton, 2008). It is possible that the RJ2 sample was misidentified as a juvenile and is actually another adult female, or that it is a late-stage CV close to its terminal molt, which could explain its greater transcriptional similarity to adults. This individual was also the largest of the juvenile samples. To avoid any ambiguity, we excluded this sample from subsequent analyses, thus comparing three adults and two juveniles. DE analysis without RJ2 identified 5198 (5.9%) genes with higher expression in juveniles and 1234 (1.4%) with higher expression in adults (FDR ≤ 0.05, LFC ≥ 1).
In order to see which of our results were robust to including the outlying sample, we also performed DE and GO enrichment analysis including the outlier. DE analysis using all samples identified 1344 (1.5%) genes with higher expression in juveniles and 353 (0.39%) with higher expression in adults. Complete differential expression results and gene annotations are given in Supplementary File 1.

GO enrichment and KEGG pathway analysis
After excluding the outlier, we identified 449 gene ontology (GO) categories enriched among genes with higher expression in juveniles (263 BP, 67 CC, 119 MF), and 70 GO categories enriched among genes with higher expression in females (41 BP, 15 CC, 14 MF). Including the outlier, we found a total of 110 GO categories enriched among genes with higher expression in juveniles (70 BP, 10 CC, 30 MF), and 11 GO categories enriched among genes with higher expression in females (4 BP, 4 CC, and 3 MF). The GO terms with strongest statistical significance (lowest adjusted p-value) were identical or similar in both cases (Supplementary File 2), so we only discuss the results excluding the outlier in the rest of the paper. Complete GO enrichment results are given in Supplementary File 2.
We used REVIGO to interpret and visualize the large number of GO categories, and the results are summarized in Fig. 3. After clustering with REVIGO, the top five biological process (BP) terms by p-value with higher expression in juveniles were "myofibril assembly", "muscle contraction", "epithelial cell migration, open tracheal system", "supramolecular fiber organization", and "locomotion"; the top five molecular function (MF) terms were "structural constituent of muscle", "calmodulin binding", "actin binding", "motor activity", and "signaling receptor activity". The top five BP terms with higher expression in females were "chitin metabolic process", "aminoglycan metabolic process", "nucleosome assembly", "carbohydrate derivative metabolic process", and "drug metabolic process"; the top five MF terms were "chitin binding", "structural constituent of chitin-based larval cuticle", "structural constituent of cuticle", "protein heterodimerization activity", and "hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds".
Because automated BLAST-based pipelines typically result in incomplete annotation (e.g., Tarrant et al., 2016;Skottene et al., 2019), we manually annotated genes in KEGG pathways related to lipid and carbohydrate metabolism (glycolysis, fatty acid synthesis, fatty acid elongation, fatty acid degradation, the pentose phosphate pathway, glycerophospholipid metabolism, and the TCA cycle) as well as steroid biosynthesis. We then performed gene set enrichment analysis on the manually annotated pathways. Three of the eight pathways were enriched among genes upregulated in juveniles: glycolysis, fatty acid   synthesis, and glycerophospholipid metabolism (Fig. 4). These results broadly agree with the GO enrichment analysis, as GO terms such as "lipid metabolic process", "lipid transport", "calcium-dependent phospholipid binding", and "carbohydrate catabolic process" were upregulated in juveniles. For the list of genes annotated with these KEGG pathways, see Supplementary File 3. Lenz et al. (2014) identified three lipid metabolism genes with differential expression between CV copepodite and adult female Calanus finmarchicus. Two of these genes, an elongation of very long-chain fatty acids protein (Elov) and a delta-9 desaturase (Scd), had much higher expression in C. finmarchicus CV juveniles compared to adult females. The C. finmarchicus Elov gene was the best BLASTN hit for two R. gigas transcripts, and neither was differentially expressed in our dataset. However, three other R. gigas Elov genes were upregulated in juveniles, suggesting that upregulation of enzymes in the Elov family may be a general feature of late-stage copepodite development, even if the specific genes involved differ between species. The C. finmarchicus Scd gene had no significant similarity to any transcripts in the R. gigas assembly (BLASTN search with an e-value cutoff of 1e-5), but five other Scd-like genes were upregulated in juvenile R. gigas, again suggesting that developmental similarities may be found within gene families but not necessarily with the same orthologs. The third gene identified by Lenz et al. (2014), a diaglycerol o-acyltransferase (Dgat), also had no significant similarity to any R. gigas transcripts, and we did not identify any Dgat enzymes with differential expression in R. gigas.

Identification of long non-coding RNA's
We used FEELnc to annotate potential long non-coding RNA's (lncRNA's) in our transcriptome. We used the assembly's complete BUSCO orthologs as the training set for classification of coding and noncoding transcripts. FEELnc annotated 99,375 out of 262,313 transcripts as lncRNA's (37.9%). Of these, 14,645 transcripts had at least one hit to the nr protein database (e-value cutoff 1e-5) and were considered false positives, leaving 84,730 (32.3%) putative lncRNA's. This 85% accuracy of lncRNA annotation is similar to the error rate reported by the authors of FEElnc (Wucher et al., 2017) and in another recent crustacean transcriptome (Huerlimann et al., 2018). Of the 188,349 contigs included in Corset clusters, 50,293 (26.7%) were annotated as lncRNA's, and 22,361 out of 89,528 clusters (25.0%) consisted entirely of predicted lncRNA transcripts. Consistent with the fact that lncRNA's generally have low expression (Ilott and Ponting, 2013), 41% of lncRNA's fell below Corset's expression filter compared to 22% of mRNA transcripts. Indeed, the mean normalized counts value for lncRNA transcripts was 24.6, compared to 160.7 for mRNA transcripts. In total, 634 lncRNA clusters were differentially expressed between adults and juveniles (~9% of all DE genes), with 272 more highly expressed in adults and 362 more highly expressed in juveniles. Overall, a similar proportion of our assembly was annotated as lncRNA compared to the recent crustacean transcriptome assembled by Huerlimann et al. (2018).

Discussion
Despite the global abundance and ecological importance of marine copepods, the development of transcriptomic resources for these taxa has been somewhat limited (reviewed in Tarrant et al. (2019); see Asai et al. (2020) and Russo et al. (2020) for other recent copepod transcriptomes). The current study provides a high-quality transcriptome for the important Southern Ocean endemic Rhincalanus gigas. Another assembly for R. gigas was recently published by Lauritano et al. (2020). Their assembly is derived only from adult individuals, whereas the transcriptome reported here was assembled from more than one life stage and is thus considerably more complete (94% versus 70% BUSCO). Lauritano et al. (2020) focused on the identification of antifreeze proteins and glutathione metabolism enzymes in adult copepods. On the other hand, our study profiles differences in gene expression between two developmental stages (copepods and adult females), highlighting that juveniles upregulate lipid synthesis and glycolysis pathways compared to adults.

Transition between juvenile and adult is accompanied by a largescale transcriptional shift
In this study, we observed extensive changes in gene expression across the terminal molt between juvenile (CV copepodite) and adult Rhincalanus gigas copepods. Relative to adult females, we found that juvenile R. gigas substantially upregulate muscle-related genes, including functions and processes such as actin binding, myofibril assembly, and muscle contraction (Fig. 3a, b). This likely reflects preparation for molt-associated muscle remodelling. Crustacean molting is associated with cycles of muscle atrophy and growth, and pre-molt periods are often characterized by enhanced muscle growth and musclerelated gene expression (Mykles and Skinner, 1982;Kuballa et al., 2011). A previous study of copepod development observed large changes in myosin expression within the CV copepodite stage of Calanus finmarchicus (Tarrant et al., 2014).
This developmental gene expression program involves substantial changes to cell signalling, as juveniles upregulate ion transport, transmembrane transporter activity, and signalling receptor activity. Regulation of ion balance is a critical aspect of crustacean molting because molting requires major changes in ion concentrations and water balance as the exoskeleton is shed and replaced (Ahearn and Zhuang, 1996).  Table S3. FA: fatty acid; Phospholipids: glycerophospholipid metabolism.
Another interesting category of genes with higher expression in juveniles are those related to oxidative stress, including peroxiredoxin and antioxidant activity. Oxidative stress proteins are known to cycle over the course of the molt cycle in crabs, perhaps because oxidative species are associated with cell signalling and/or increased mitochondrial activity and steroid synthesis (Head et al., 2019). Female R. gigas displayed higher expression of genes related to chitin binding, chitin metabolism, and cuticle development (Fig. 3c, d). Copepod exoskeletons are made mostly of chitin-in fact, copepods are probably the largest source of chitin in the ocean (Souza et al., 2011). There is evidence that chitin plays a role in copepod reproduction, providing a possible explanation for the upregulation of chitin metabolism in females. The application of a chitin synthesis inhibitor to adult copepods (Acartia tonsa) substantially reduced egg and larval viability of exposed females, although it did not cause mortality for the females themselves (Tester and Costlow, 1981). Chitin also lines the reproductive tissues of adult copepods of both sexes (Sugier et al., 2018). Interestingly, female banana shrimp (a decapod crustacean) express more transcripts related to chitin binding and catabolism compared to males (Powell et al., 2018).
We found that long non-coding RNA's (lncRNA's) make up roughly 30% of the R. gigas transcriptome, in line with other recent crustacean assemblies (Huerlimann et al., 2018). The functions of lncRNA's remain poorly understood, particularly outside of model systems, although they generally have roles as regulators of gene expression (Engreitz et al., 2016). In Drosophila, lncRNA's are implicated in embryogenesis and development, and are often expressed specifically in the nervous system and gonads (Li et al., 2019). Interestingly, lncRNA's play roles in sex determination and X-chromosome dosage compensation in both humans (Pontier and Gribnau, 2011) and flies (Li et al., 2019), and have also been implicated in sex determination in crustaceans (Yang et al., 2018;Kato et al., 2018). The only other existing study of lncRNA's in a copepod showed that some lncRNA transcripts were perturbed by drug exposure (Valenzuela-Miranda et al., 2017). In the current study, lncRNA's accounted for just ~9% of differentially expressed genes despite representing ~25% of all genes, a pattern that can be explained (at least in part) by low lncRNA abundance. Nonetheless, some lncRNA transcripts are apparently involved in the developmental transition between CV juvenile and adult. The annotation of lncRNA's in non-model organisms such as copepods may help clarify the functions of non-coding transcripts in these groups, and perhaps identify homologous lncRNA's between closely-related species.

Juvenile R. gigas upregulate lipid and carbohydrate metabolism
Aspects of lipid and carbohydrate metabolism, including glycolysis, fatty acid synthesis, and glycerophospholipid metabolism, were upregulated in juveniles compared to adult females. These broad changes in energy metabolism may reflect preparation for an energetically expensive molt. For example, molt-associated increases in muscle growth would require large amounts energy. Other studies of crustacean development have similarly suggested that actively growing juveniles may have higher energetic requirements than adults (Kuballa et al., 2011), consistent with the upregulation of glycolysis. Juvenile copepods must also accumulate large amounts of lipids (Baumgartner and Tarrant, 2017), providing a straightforward explanation for the upregulation of fatty acid synthesis observed in R. gigas.
Among large lipid-storing copepods, the molecular basis of development is by far the best-studied in Calanus finmarchicus, which has a North Atlantic and subarctic distribution. In this species, large stores of wax esters are accumulated during the CV stage, which serve as fuel for molting, reproduction and diapause (Lee et al., 2006). Studies of C. finmarchicus development identified substantial shifts in gene expression related to lipid synthesis and degradation; these shifts occur both between CV and adult individuals (Lenz et al., 2014;Skottene et al., 2019), as well as within the CV stage (Tarrant et al., 2014(Tarrant et al., , 2016. Tarrant et al. (2016) profiled gene expression in early-and late-stage CV copepodites of C. finmarchicus, and found that many genes related to glycolysis and lipid metabolism were upregulated in early-stage CV's (Table S2 in Tarrant et al., 2016). Skottene et al. (2019) found that fatty acid degradation genes were downregulated in adult C. finmarchicus and suggested that protein catabolism is relatively more important than lipid catabolism in adults under starved conditions. Pathway analysis did not identify a consistent difference in the expression of fatty acid degradation genes in R. gigas (Fig. 4), although several individual genes in this pathway were expressed more highly in juveniles (Supplementary File 2).
Lipid metabolism is clearly linked to the development of polar Calanus spp. and Calanoides spp. because they undergo their terminal molt after emerging from a long period of lipid-fueled dormancy as juveniles (Lee et al., 2006). In contrast, R. gigas does not appear to rely on diapause as its primary overwintering strategy and is active or semiactive year-round in the Southern Ocean (Pasternak and Schnack-Schiel, 2002). Although R. gigas does store wax esters in oil sacs, it accumulates smaller lipid stores relative to species with a well-defined winter diapause (Tarling et al., 2007) and loses less lipid weight during the winter, probably because it actively feeds during that time . Nonetheless, we observed upregulation of lipid synthesis in juvenile R. gigas relative to adult females. Regulation of lipid metabolism may be a general aspect of development for large lipidstoring copepods, rather than being strictly connected to diapause. Lipid metabolism is also important in the development of decapod crustaceans (Kuballa et al., 2011) and insects (Gilbert and O'Connor, 2012), and may be a conserved feature in pancrustacean development.
In a recent study of the lipid-storing copepod Neocalanus felmingerii, high expression of glycolysis and glycerophospholipid metabolism were characteristic of females in the early-to-mid stage of oogenesis . Phospholipids can be important components of copepod eggs (Ohman and Runge, 1994;Lee et al., 2006)-although this does not seem to be true in all copepod species, and some studies of the same species differ in the reported abundance of egg phospholipids (Lee et al., 1972;Leiknes et al., 2016). In R. gigas, the main lipid components of eggs are triglycerides and polar lipids, which include phospholipids (Ward et al., 1996). The upregulation of glycolysis and glycerophospholipid metabolism observed in R. gigas juveniles could reflect early oogenesis in CV copepodites. Although the reproductive biology of R. gigas has not been well-studied, this would be consistent with other copepods, where gonadogenesis can begin during the CV stage (Hirche, 1996) or in some species even earlier (Kosobokova, 1999). Regardless of when oogenesis begins, the lipids accumulated by juveniles serve as fuel for adult reproduction as well as for the molt itself in many copepods (Lee et al., 2006). There is evidence that this is true in R. gigas because it is able to reproduce in apparently food-poor conditions (Rakusa-Suszczewski et al., 1976;Ward and Shreeve, 1995;Shreeve et al., 2002).
Some of the other changes in gene expression observed in this study are likely associated specifically with female reproductive development, as we did not sample adult males. Many calanoid copepods have highly female-skewed sex ratios, due to male-specific mortality (Hirst et al., 2010) and/or skewed sex ratios at birth (Burris and Dam, 2015). Indeed, low male:female sex ratios have frequently been observed for R. gigas (Marin, 1988;Spiridonov and Kosobokova, 1997). Future studies of males could clarify which aspects of the developmental program observed here are generally related to molting and development and which are specific to female reproductive development; lncRNA transcripts may be particularly interesting in this regard because of their known roles in sex determination and gonadal function (Yang et al., 2018;Kato et al., 2018;Li et al., 2019).

Conclusions and future directions
We present a high-quality transcriptome assembly for Rhincalanus gigas, a large and abundant lipid-storing copepod endemic to the Southern Ocean. Using this assembly, we found that the developmental transition between late-stage (CV) juveniles and adults is marked by substantial changes in gene expression. Compared to adult females, juveniles upregulated lipid and carbohydrate metabolic pathways, which likely reflects ongoing lipid accumulation and preparation for molting and reproduction. A wide variety of other molecular processes are also implicated in molting and maturation in R. gigas, including muscle growth, chitin metabolism, and regulation of ion transport. The new transcriptome presented here extends the taxonomic breadth of molecular studies in copepods and contributes to our understanding of developmental transitions and metabolic adaptations of pelagic copepods. In conjunction with the ongoing development of molecular resources for other species, this assembly will enable future comparative studies for this ecologically important group of zooplankton. This will help determine whether copepods share a common developmental gene expression program, or if differences in lifestyle (herbivory versus omnivory) and lipid composition (wax esters versus triglycerides) are associated with differences in developmental transitions at the gene expression level.

Data availability
Sequenced reads and assemblies are available on the Short Read Archive and the Transcriptome Shotgun Assembly (TSA) database, respectively (BioProject PRJNA666170). The TSA project has been deposited at GenBank under the accession GIVD00000000.

Funding
Funding for this project was provided by the National Science Foundation (Grants OPP-1746087 to AMT and OPP-1440435 to DKS).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.