Hex Hox: De Novo Transcriptome Assembly and Embryonic Hox Gene Expression in the Burrowing May y Hexagenia Limbata (Ephemeridae)


 Background: Hox genes are key regulators of appendage development in the insect body plan. The body plan of Mayfly (Ephemeroptera) nymphs differs due to the presence of evolutionarily significant abdominal appendages called gills. Despite mayflies’ basal phylogenetic position and novel morphology amongst insects, little is known of their developmental genetics. Here we present an annotated transcriptome for the mayfly Hexagenia limbata, with annotated sequences for putative Hox peptides and embryonic expression profiles for the Hox genes Antp and Ubx/abd-A. Results: Transcriptomic sequencing of early instar H. limbata nymphs yielded a high-quality assembly of 83,795 contigs, of which 22,975 were annotated against Folsomia candida, Nilaparvata lugens, Zootermopsis nevadensis and UniRef90 protein databases. Peptide annotations included eight of the ten canonical Hox genes (lab, pb, Dfd, Scr, Antp, Ubx, abd-A and Abd-B), most of which contained all functional domains and motifs conserved in insects. Expression patterns of Antp and Ubx/abd-A in H. limbata were visualized from early to late embryogenesis, and are also highly conserved with patterns reported for other non-holometabolous insects.Conclusions: We present evidence that both H. limbata Hox peptide sequences and embryonic expression patterns for Antp and Ubx/abd-A are extensively conserved with other insects. These findings suggest mayfly Antp and Ubx/abd-A play similar appendage promoting and repressing roles in the thorax and abdomen, respectively. The identified expression of Ubx and abd-A in early instar nymphs further suggests that mayfly gill development is not subject to Ubx or abd-A repression. Previous studies have shown that insect Ubx and abd-A repress appendages by inhibiting their distal structures, which can permit the development of proximal appendage types. In line with past morphology-based work, we propose that mayfly gills are proximal appendage structures, possibly homologous to the proximal appendage structures of crustaceans.


Background
Arthropods are the most speciose clade of animals on earth, an evolutionary success widely attributed to the evolution and diversi cation of segmented body plans [1]. Of particular note is the insect body plan, which consists of a head with antennae and gnathal appendages, a thorax with three pairs of walking legs, and an abdomen devoid of appendages except external genitalia [2,3]. This relatively simple body plan is the basis for a vast range of appendage diversi cation. One example is the transition of ancestral gnathal appendages to piercing and sucking structures in hemipterans, an elongated proboscis in many lepidopterans, a sponge-like proboscis in many dipterans, and structures adapted for nest construction or defense, as in numerous hymenopterans [4,5]. Similarly, the thoracic leg segments may be elongated for mobility on the water surface, as in Gerridae hemipterans, or enlarged for jumping, as in many orthopterans. With such immense diversity, insects provide unique opportunities to study the evolutionary mechanisms of body patterning and appendage diversi cation.
From the perspective of developmental genetics, insect appendage diversity has been explained in part by the Hox genes, a highly conserved gene family rst characterized in the fruit y Drosophila melanogaster [6,7]. The canonical Hox family comprises ten genes organized on a single chromosome, each expressed along the anterior-posterior axis of the embryo in parallel with their chromosomal order [8, 9,10,11]. Hox genes are key factors in regulating body patterning and appendage identity, developing unique appendage phenotypes, and shifting appendage morphology [12,13,14,15,11,16,17]. Insect Hox genes are organized in two complexes. The anterior Antennapedia complex is key for specifying the development of antennae, gnathal appendages, and thoracic legs, and consists of the genes labial (lab), proboscipedia (pb), Hox3, Deformed (Dfd), Sex combs reduced (Scr), fushi tarazu (ftz) and Antennapedia (Antp) [11]. The posterior Bithorax complex functions primarily to regulate appendage development in the abdomen, and contains the genes Ultrabithorax (Ubx), abdominal-A (abd-A), and Abdominal-B (Abd-B) [11]. All Hox genes contain a DNA binding homeodomain sequence, and can be further distinguished by the presence or absence of several conserved functional regions, such as the SSYF motif, hexapeptide, and UbdA motif [18,19]. Collectively, these genes are crucial for evolving novelties in the insect body plan, such as the specialized thoracic legs of orthopterans and Gerridae hemipterans, and possibly the abdominal appendages of ephemeropterans [20,21,22].
May ies (Ephemeroptera) belong to one of the earliest branching clades of winged insects [23,24], and develop from aquatic nymphs that develop paired abdominal gills on the rst seven abdominal segments (Additional les 1: Fig.S1). Gill morphology is incredibly diverse, ranging from small thin threads to attened, leaf-like lamellae, highly sclerotized plates, and bilamellate, feathery structures. Moreover, gill position on the abdomen may also be dorsal, lateral, or ventral, and many species exhibit shape and size differences in gills along the abdominal segments [25,26]. Coincident with this morphological diversity, gills serve a variety of functions in food acquisition/water movement, locomotion, oxygen and ion uptake, protection of other gills, and adherence to the substrate (e.g. [27,28,29]). The unique morphology and functional roles of may y gills represent a distinct divergence from the appendage-less abdomen that de nes most insects, raising key questions on the origins of the insect body plan and the evolution of novel appendage types, such as wings.
The evolutionary position of may ies has sparked recent interest in their genetics. The rst may y transcriptome for Cloeon viridulum (Baetidae) was sequenced to study differential gene expression during metamorphosis, and a genome and several transcriptomes of C. dipterum were sequenced to assess may y lifecycle adaptations and support the development of C. dipterum as an emerging model system [30,31,32]. However, next-generation sequencing outside the Baetidae family is sparse, as is the availability of data for Hox genes from early clades of winged insects. Furthermore, only two published studies have documented gene expression patterns during may y embryogenesis, both focused on the expression of the segment polarity genes Engrailed and wingless [33,34]. The developmental genetics of nymphal body patterning, and the potential role of Hox genes, thus remains unknown.
In this study, we assembled an annotated transcriptome for early instar nymphs of the North American burrowing may y Hexagenia limbata (Ephemeridae; [35]), with detailed annotations of eight putative H.
limbata Hox proteins. We also provide the rst visualization of Hox gene expression in a may y, focusing on H. limbata Antp and Ubx/abd-A from early to late embryogenesis. Our data shows that Hox peptide sequences contain all expected functional sequences for insects, while Antp and Ubx/abd-A expression domains are highly conserved with non-holometabolan insects. These results suggest that may y gills are proximal appendage structures not regulated by Hox gene regulation, and provide a basis for further exploration of insect appendage evolution via may ies as an evo-devo model.

Egg collection and maintenance
Mature H. limbata females were collected by black lighting at Sky Pond (New Hampton, Belknap Co., NH) on peak hatch nights in June and July, 2013-2016. Eggs were extracted from captured females by submerging the abdomen into a conical tube containing pond water to stimulate egg laying [36]. Egg production amongst female may ies tends to scale with overall body size, with each large female Hexagenia producing ~8,000 eggs [37,36,38].
Collected eggs were washed in a solution of 10% bleach and rinsed thoroughly with aged (24h) distilled water. All eggs were maintained in aged distilled water at room temperature (approximately 25°C) until xation or reaching approximately 50% development, then stored at 4°C to induce a diapause-like state [39] for future study. Hatched nymphs were reared in glass containers with aged distilled water, while older nymphs were collected directly from pond mud samples and housed in containers lled with pond water and mud. All nymphs were maintained at room temperature and ambient light conditions (approximately 12-hour light-dark cycles). Immunohistochemistry Embryos were xed by modifying an established protocol [34]. Live eggs were rst washed thoroughly with PBTw (1X phosphate-buffered saline + 0.1% Tween), soaked for 6 minutes in a 50% bleach solution to remove the chorion, then xed for 30-50 minutes with agitation in a 6% formaldehyde and PBTw xative with heptanes at a 2:1 ratio. Following xation, eggs were washed in PBTw and stored at -20°C in absolute methanol.
Fixed eggs were rinsed in PBTw and stripped of the vitelline membrane by submersion in a waterbath sonicator (Fisher Scienti c FS20D) at 42Khz (+/-6%) for several seconds. Embryos were then soaked in SuperBlock T20 (Thermo Scienti c, MA) for 30 minutes at room temperature and incubated overnight at 4°C in either 12ng/µl of Antp 4C3 (DSHB, University of Iowa; deposited by Brower, D.) or 20ng/µl of Ubx/ABDA FP6.87 primary antibody (DSHB, University of Iowa; deposited by White, R.) diluted in SuperBlock T20. Following primary antibody incubation, the embryos were washed with PBTw for one hour (1 wash/10 minutes), then incubated for two hours in a 1:500 dilution of horseradish-peroxidase conjugated goat anti-mouse secondary antibody (Jackson ImmunoResearch, PA) in SuperBlock T20.
Embryos were washed again as above in PBTw, equilibrated for 20 min (1 wash/5 minutes) in 1X stable peroxide buffer (1XHP) (Thermo Scienti c, MA), and developed for 10 minutes using 1:10 dilution of metal-enhanced diaminobenzadine substrate (Thermo Scienti c, MA) in 1XHP buffer. After developing, embryos were washed in PBTw and stored at -20°C in 80% glycerol. Negative control embryos of all stages were incubated in SuperBlock T20 instead of primary antibody, and showed little sign of nonspeci c staining (Additional les 1: Fig. S2).

Nymphal cDNA library preparation
We used multiple nymphs as starting material to represent the full pool of transcripts present in early nymphal development. Of the approximately 100 µl of nymphs used for RNA sequencing, most were rst instar, and the rest second instar. Total mRNA was extracted using TRIzol (Ambion), then column puri ed with RNeasy (Qiagen). Puri ed RNA was treated with Turbo DNase (Ambion), quanti ed, and checked for purity with a NanoDrop 2000 (NanoDrop Technologies, Wilmington, DE) before storage at -80℃.
Total mRNA was sent to the Hubbard Center for Genome Studies (HCGS, University of New Hampshire, Durham, NH) and checked for quality and quantity with an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA). An Illumina compatible library was constructed using an Illumina TruSeq RNA Prep Kit V2 with index Set A (RS-122-2101), following the low sample input protocol (Part #15026495 Rev. F). Brie y, 1 µg of total mRNA was used as initial input; mRNA was then puri ed, fragmented, and primed with random hexamers using poly-T oligo attached magnetic beads. cDNA was reverse-transcribed with SuperScript II Reverse Transcriptase, 3' adenylated, ligated with RNA adapter indices, and PCR-enriched. Finally, the cDNA library was checked for quality with an Agilent 2100 Bioanalyzer, and normalized to 10 nM prior to sequencing.

H. limbata read & assembly statistics
Sequenced paired-end raw reads were 151 base pairs in length. Phred-based quality scores for read pairs differed dramatically, with the forward R1 reads having a mean Phred score of 33 (99.95% call accuracy) or greater across all bases, while the reverse R3 reads had mean Phred scores that varied extensively, from 22 (99.37%) or greater initially, but dropping rapidly after base pair 64 down to 2 (36.90%) by base pair 90 (Additional les 1: Figs. S3 and S4). No reads were agged as poor quality. The ORP and CLC assemblies of H. limbata produced a similar number of contigs; however, the ORP assembly produced generally longer contig sequences ( Table 1). The two assemblies differed notably in terms of quality, with the ORP producing far higher TransRate and BUSCO completion scores (Table 1). Given the superior quality of the ORP assembly, open reading frame (ORF) annotation and alignments were done exclusively with ORP contigs. The only two exceptions are H. limbata lab, which is derived from a CLC contig, and H. limbata Dfd, which used the consensus from combining an ORP and CLC contig (Additional les 1: Fig.  S5). The ORP assembly was annotated against Folsomia candida, Nilaparvata lugens, Zootermopsis nevadensis and UniRef90 peptide databases (Table 1; Additional le 2). The number of contigs with ORFs was provided by TransRate. The TransRate assembly score assesses how accurate and complete an assembly is, while the optimal score is the assembly score obtained after removing all poorly assembled contigs. The complete BUSCO score represents the number of database sequences identi ed in the assembly, and was calculated using the insect ortholog database Odb10.   (Figs. 1-7), and contained a core YKWM (Fig. 1) or YPWM (Figs. 2-7) sequence. The H. limbata Abd-B peptide lacks the Hx motif, and instead has only one conserved tryptophan residue, similar to that of other hexapod homologs (Fig. 8). Including this conserved tryptophan, all eight H. limbata Hox peptides contained a linker region (LR) between the Hx and HD that was longest in the anterior Hox peptides (Figs. 1 and 2, 69 and 15 residues long respectively), and progressively shorter in most posterior peptides (e.g. Figs. <link rid=" g4">4</link> and 5, 14 and 4 residues long respectively), with three semi-conserved residues in the Abd-B homolog (Fig. 8).
Speci c subgroups of Hox proteins contain additional conserved motifs. At the N-terminus is the SSYF motif found in Scr, Antp, Ubx, and abd-A peptides (Figs. 4-7). As the N-terminus arm is missing in our partial H. limbata Dfd peptide, it is unclear if limbata Dfd contains the SSYF motif as do hexapod homologs (Fig. 3). Ubx and abd-A contain a number of unique signatures identi ed in H. limbata, including the UbdA peptide, the QAQA and Poly-A sequences unique to Ubx, and the TDWM and PFER motifs unique to the abd-A linker region (Figs. 6-7).

Antennapedia and Ubx/abd-A gene expression patterns
Early H. limbata embryos do not show Antp expression (Fig. 9a, b). Once the precursors to the gnathal and thoracic segments form, weak expression appears in the three thoracic segments, especially along the sides where the thoracic legs will develop (Fig. 9c, d, black arrowheads). As the thoracic limb buds appear, expression becomes prominent at their edges ( Fig. 9e-g), with midline expression present but weaker (Fig. 9h, black arrow). Expression in post-segmentation embryos becomes stronger in the thoracic midline, and extends through the abdominal segments beginning in A1-A3 (Fig. 9i), then through to A9 (Fig. 9j, k). In late-stage embryos (Fig. 9k), abdominal expression becomes uniform across the thorax and abdomen, but is still absent from the abdominal lateral edges and the A10 segment (Fig. 9k).
Like Antp, Ubx/abd-A expression is not evident in early H. limbata embryos (Fig. 10a), including those that have nearly developed the presumptive gnathal and thoracic segments (Fig. 10b). Once segmentation is complete, expression is prominent throughout all abdominal segments except for A10, with weak expression appearing at the posterior edges of T2 and T3 (Fig. 10c, black arrows). Thoracic expression becomes more prominent in post-segmentation embryos (Fig. 10d), with expression beginning to spread in T3 (black arrowheads), and intensifying at the T3/A1 border (black arrow). In the oldest embryos imaged (Fig. 10e), these expression patterns persist, with strong expression at the T3/A1 border and T2 expression evident along the entire posterior border. In all documented stages, expression is not observed in the developing appendages or in the A10 segment. Unlike Antp expression, lateral expression in the segments is strong, while midline expression in the thorax (Fig. 10d) and abdomen (Fig. 10d, e) is reduced or absent.

ORP Assembly Quality is Comparable to Assemblies of Non-model Insects
Our 88.8% ORP BUSCO completion score was similar to or higher than those seen in recently published insect transcriptome data (e.g. [53,54,55] . Thus, the low quality seen in many of our R3 reads may have depressed the assembly score despite read trimming done by the ORP. These factors also highlight the importance of read trimming in assembly quality, as trimming was only conducted for the ORP assembly.
May y Hox peptides are highly conserved relative to other hexapods H. limbata Hox peptides contain all functional domains and motifs widely conserved in insect Hox genes, including a homeodomain, linker region, and hexapeptide motif or residue. Several functional regions speci c to particular Hox genes are present in the H. limbata peptides. These include the presence of an N-terminal SSYF motif in Scr, Antp, Ubx, and abd-A, and its absence in lab and Abd-B; TDWM and PFER motifs in the linker region of abd-A; the C-terminal UbdA peptide in both Ubx and abd-A; and C-terminal QAQA and poly-A sequences in Ubx [19].
Further evidence of high sequence conservation in H. limbata Hox genes comes from speci c residues within these functional regions. For example, there are four residues unique to Hox homeodomains: a glutamic acid in alpha-helix 1, an arginine and glutamic acid in alpha-helix 2, and a methionine in alphahelix 3 [19]. These residues were all identi ed in our H. limbata sequences (e.g., residues Glu-136, Arg-148, Glu-150, and Met-171 in Fig. 1; see Figs. 2-8 and [19]). Additional unique homeodomain residues exist that are speci c to particular Hox genes. Homeodomains for the Hox genes lab and pb have the largest number of unique residues, primarily within the N-terminal arm and rst and third alpha-helices [19].
A number of residues in the linker regions of lab, pb, Dfd, and Scr peptides are also conserved, though these vary more than the homeodomain and hexapeptide regions. For example, many animal lab linker regions share a VKRXXPKTXKXE sequence [19], which in H. limbata is represented by VKRXXPKP (Fig. 1, residues 7-14; with the conserved threonine replaced by proline). The rest of the linker sequence varies in most aligned hexapods, a phenomenon that is also seen in the conserved XKKXXK sequence for pb (Fig. 2, residues 7-13), and the KVHL sequence in Dfd and Scr (Figs. 3, 4, residues 11-14 and 9-12 respectively; [19]). The role of Hox genes in nymphal body patterning

Antp and Ubx/abd-A embryonic expression is highly conserved amongst insects
The body plan of may y nymphs diverges from that of most other insects in possessing unique abdominal appendages. Developmental regulation of thoracic legs and abdominal appendages in insects is controlled by the Hox genes Antp, Ubx, and abd-A, each of which contain functional regions well conserved between H. limbata and many insects. These include the SSYF motif necessary for the transcriptional activation of downstream target genes [69], and the hexapeptide motif, which contributes to extradenticle protein binding [70]. The length of each Hox gene linker region is likewise widely conserved between H. limbata and other insects, and facilitates proper gene function [71,19]. However, as little is known regarding the functional signi cance of most linker region residues [19], the potential impact of both H. limbata speci c and gene-speci c differences in linker region sequences remains unknown. Conserved N-terminal residues of Antp, Ubx, and abd-A were also identi ed in H. limbata, and play a major role in specifying DNA binding a nity [72,19]. While it remains unknown if H. limbata Dll is abdominally expressed, it is possible that may y gills are exclusively proximal structures, akin to the prolegs of saw y larvae, and thus not subject to Dll inhibition by Ubx or abd-A. Such a hypothesis is further supported by previous work that posits apterygote styli and may y gills as proximal appendicular structures possibly related to the proximal appendicular structures of crustacean epipodites [3,91].

Conclusions
Our investigation of H. limbata Hox peptides identi ed eight of the ten canonical Hox genes, and revealed the presence of key functional regions highly conserved with other Hox insect homologs. The expression of embryonic Antp also is highly conserved with that of other insects, becoming apparent in the developing thoracic limb buds and subsequently spreading throughout the thoracic and abdominal midline. Similarly, embryonic Ubx/abd-A expression closely matches what is observed in other studied insects, particularly non-holometabolan species. The extensive conservation of both H. limbata Hox sequences and expression pro les with those of other insects suggests that the unique body plan of may y nymphs may not be directly regulated by Hox genes during embryogenesis. Based on previous morphological studies and our current ndings, we hypothesize that may y abdominal gills are proximal morphological structures. As the gills of H. limbata develop in the second nymphal instar [92,93], Additional les 1: Fig. S1) rather than during or immediately after embryogenesis, studies assessing the expression of Ubx/abd-A and Dll during rst instars, and their overall functional roles during H. limbata development, will rigorously test this hypothesis and provide further clarity on the unique evolution of may ies.  Ephemera danica; C.d., Cloeon dipterum. All sequence accession numbers are reported in Additional le 1: Table S1.  Horseradish peroxidase staining of H. limbata embryos using the Antp (4C3) antibody. Early em-bryos do not show any expression (a, b). As the embryo adds segments to the posterior, expression is present in the three thoracic segments, where thoracic limb buds will eventually protrude (c, d). As the limb buds elongate, expression extends into the buds (f), eventually appearing as patches of strong expression at the anterior of T2 and T3 (arrowheads in g, h), while expression at the midline of each segment is less strong (g, h). At this time, expression also begins to faintly appear on the posterior edge of the labial segment (g, h). After segmentation concludes, strong expression in the thorax and thoracic limbs is present, weak expression is still evident at the posterior edge of labial segment, and expression is seen in the center of the A1-A3 segments (i, j). In the oldest embryos we imaged, expression in the abdomen becomes stronger extends up to the A10 segment, but is absent from the lateral edges (k). Expression at this stage extends from the posterior of the labial segment through the A10 segment (k). Mn, mandible; Mx, maxilla; Lb, labium; T1-T3, thoracic segments; A1-A10, abdominal segments. Ventral view, anterior to the top in all panels. Scale bars 0.10mm. Image magni cations are 200X for a-b and i-k, 100X for c and eg.

Figure 10
Horseradish peroxidase staining of H. limbata embryos using the Ubx/ABD-A (FP6.87) antibody. Early in embryogenesis, expression was not visible (a, b). The rst apparent expression was ob-served as small, centrally located patches in the posterior of the T2 and T3 segments (arrowheads in c). Abdominal expression was stronger and extended through all abdominal segments until A10, with the strongest staining found in the A1 segment (arrow in c). Thoracic staining was weak but apparent at the