Evolution of the insect Hox gene cluster: Comparative analysis across 243 species

The Hox gene cluster is an iconic example of evolutionary conservation between divergent animal lineages, providing evidence for ancient similarities in the genetic control of embryonic development. However, there are differences between taxa in gene order, gene number and genomic organisation implying conservation is not absolute. There are also examples of radical functional change of Hox genes; for example, the ftz, zen and bcd genes in insects play roles in segmentation, extraembryonic membrane formation and body polarity, rather than specification of anteroposterior position. There have been detailed descriptions of Hox genes and Hox gene clusters in several insect species, including important model systems, but a large-scale overview has been lacking. Here we extend these studies using the publicly-available complete genome sequences of 243 insect species from 13 orders. We show that the insect Hox cluster is characterised by large intergenic distances, consistently extreme in Odonata, Orthoptera, Hemiptera and Trichoptera, and always larger between the ‘posterior’ Hox genes. We find duplications of ftz and zen in many species and multiple independent cluster breaks, although certain modules of neighbouring genes are rarely broken apart suggesting some organisational constraints. As more high-quality genomes are obtained, a challenge will be to relate structural genomic changes to phenotypic change across insect phylogeny.


Introduction
Insects display an astounding range of developmental and morphological diversity.Comprising over half of all described animal species, insect diversity has been attributed to high rates of speciation and adaptive radiation in association with flowering plant diversification, underpinned by dynamic rates of gene and genome evolution.Together with the orders Protura, Diplura and Collembola, insects make up the Hexapoda, a clade within Arthropoda consisting of six-legged, mostly terrestrial species.Within Hexapoda there have been several major evolutionary transitions associated with novel phenotypic traits.The evolution of insect wings is one such event which resulted in diversification of body forms within the clade Pterygota [1].A later event was the emergence of complete metamorphosis in the holometabolous insects, thought to have permitted rapid diversification.Indeed, the most diverse and speciose insect orders are found within the holometabolous pterygotes (Hymenoptera, Coleoptera, Diptera and Lepidoptera).While the insect body plan is generally well conserved, a myriad of morphological novelties have emerged through insect radiation, ranging from pronotal horns on some beetles, sucking mouthparts in Hemiptera and (most) Lepidoptera, stings in bees and wasps, and halteres in Diptera and Strepsiptera.
Changes in developmental processes underlie morphological diversity, and ultimately these developmental changes must be underpinned by inherited genetic changes.Identifying these genetic changes is one of the goals of evolutionary developmental biology (evo-devo) although this is a difficult task when the morphological transitions occurred tens or hundreds of millions of years ago.One place to start is with the genes shared between taxa, and with key roles in development: a set of genes sometimes called the developmental toolkit.The Hox genes are examples of such core developmental genes, encoding position along the anteroposterior axis of most animal embryos.Furthermore, Hox genes code for transcription factors that activate and repress cascades of downstream genes to sculpt the morphology appropriate to that position.Later in development, Hox genes also orchestrate cell differentiation decisions, primarily though not exclusively within their original embryonic expression domains [2].Changes in the content, order and expression domains of these genes have been implicated in a huge range of morphological novelties in the arthropod body plan [2].
The insect Hox cluster is thought to have consisted ancestrally of 10 genes: labial (lab), proboscipedia (pb), zerknüllt (zen), Deformed (Dfd), Sex combs reduced (Scr), fushi tarazu (ftz), Antennapedia (Antp), Ultrabithorax (Ubx), abdominal-A (abdA), and Abdominal-B (AbdB), similar to that of the bilaterian ancestor [3,4].Of these, zen and ftz have 'altered' roles, having switched from their ancestral roles in anteroposterior position specification to extraembryonic membrane patterning (zen) and segmentation (ftz).Until recently, we have lacked a thorough knowledge of the multiple evolutionary paths that have been taken from this ancestral state across the diversity of insect orders.Here, we provide an updated view of the evolution of the Hox gene cluster across the largest sample of insect genomes sampled to date.We focus on the evolution of Hox cluster organisation, and do not discuss recent work on Hox gene regulation or the changing downstream functions of Hox genes, such as co-option of Hox genes to accessory roles in different orders (e.g.[5][6][7][8][9]).Furthermore, we focus on protein-coding loci within Hox clusters, and do not cover non-coding RNAs since these cannot be predicted reliably from genome sequences.There is certainly good evidence for antisense lncRNAs produced from within the Hox cluster of insects and other arthropods, as well as annelids, but comparative data are sparse (for review see [10]).We do not report new sequence data here: these analyses are based on publicly-available complete genome sequences, interpreted in the light of previous analyses.

Hox genes in a new era of insect genomics
In the pre-genomic era of molecular biology, from the early 1980s to around 2005, Hox gene clusters were analysed either by painstaking positional cloning of mutants or by cross-hybridisation of probes to genomic libraries followed by laborious genomic walking and clone-byclone sequencing, sometimes coupled with in situ hybridization to chromosome spreads [11,12].These heroic efforts were limited to a few species, but they started to generate a picture of comparative stasis of insect Hox gene clusters.For example, in the fruitfly Drosophila melanogaster, the insect in which Hox genes were first studied, it was clear that the zen gene had undergone tandem duplication to give three genes: zen, z2 and bcd.Otherwise, there were no examples of Hox gene duplication in that evolutionary lineage.Similarly, in the red flour beetle Tribolium castaneum, there is a zen duplication to give two genes, but otherwise the cluster is unaltered [13,14].It was, of course, clear from the earliest days that splitting of the cluster was possible, as evidenced by the cluster split in Drosophila melanogaster with a 9.6 Mb gap between Ubx and Antp.It was also clear that intergenic distances could be very large, as in Drosophila melanogaster and the locust Schistocerca gregaria [11].Nonetheless, these are relatively minor changes compared to whole Hox gene cluster duplication in vertebrates and extensive posterior Hox gene duplication in vertebrates and amphioxus [15,16].
From the mid-2000s onwards, complete genome sequencing began to be applied to single species, or small sets of related species, and the evolutionary picture was refined.First, it became clear that splitting of the cluster was not a unique event in Drosophila melanogaster, as there have been independent splits at different positions in some other Drosophila species [17].Second, lability of the zen gene, in terms of propensity to duplication, was reinforced by a remarkable discovery in the genome of the silk moth Bombyx mori [18].This analysis revealed extensive tandem gene duplication, generating at least 13 copies of zen: one locus with an amino acid sequence similar to the ancestral zen gene, and 12 that have diverged extensively and given the name Shx (Special homeobox) genes.Later analysis of a refined genome assembly suggested the number may be even greater: zen plus 15 Shx loci, although not all can encode functional proteins [19].Genome sequences of two butterflies (Heliconius melpomene and the Monarch Danaus plexippus) revealed presence of four Shx genes [20,21], as did low coverage genome skims of the Comma butterfly Polygonia c-album, Speckled Wood butterfly Pararge aegeria, Scarlet Tiger moth Callimorpha dominula and Horse Chestnut leaf-miner Cameraria ohridella [19].
The above historical perspective is one of a gradually unfolding picture emerging in a piecemeal manner as each additional genome is sequenced or analysed.But the landscape is now changing rapidly.In 2018, the Earth BioGenome Project was announced, as a bold vision to determine the complete genome sequence of all living eukaryote species [22].This vision has galvanised action from over 40 affiliated projects, each attempting to determine high quality genome sequences at scale [23].Among these, the project that has generated the largest number of high quality insect genomes to date is the Wellcome Trust-funded 'Darwin Tree of Life' (DToL), focussed on species living in Britain and Ireland [24].Since 2019, the DToL project has generated 381 complete genome assemblies with over 2000 more species in the genome sequencing pipeline (data as of July 2022: https://portal.darwintreeoflife.org/tracking).These genome assemblies have been determined using long-read DNA sequencing technology (primarily PacBio HiFi) and scaffolding to chromosome-level using Hi-C.As such, they surpass in quality the large majority of genome assemblies previously available.In particular, the large contig sizes scaffolded to chromosomes provides opportunity to determine gene order and distances.Importantly, all data from the DToL project are released openly.
In reviewing the evolution of Hox gene clusters, we consider that the landscape of the field has changed so remarkably in the past two years that we cannot draw conclusions solely from previously published analyses.Instead, we supplement previous findings with analyses of the openly released data from DToL and other genome sequencing projects.We do not report new experimental data here, but rather draw new conclusions from available data.We use these data to summarise patterns of Hox gene duplication and the changes to genomic organisation across insects, using genomic data from 243 species representing 13 insect orders, plus one order of non-insect hexapod as an outgroup (Table 1).We show that insects continue to be an important model for understanding Hox gene evolution and, with the development of further methods and models for genetic manipulation from a phylogenetically diverse set of orders, will be vital for progress in the field of evolutionary developmental biology [25].

Gene loss in insect Hox clusters
There are no clear examples of Hox gene loss within insects, at least for the 'canonical' Hox genes that play roles in specifying anteroposterior position.All eight of the expected canonical Hox genes -pb, lab, Dfd, Scr, Antp, Ubx, abdA, AbdB -are present in all insects (Fig. 1).This contrasts to some other arthropod lineages (see [26]).For example, within Crustacea the abdA gene is proposed to be missing in three barnacles that have been studied (Elminius modestus, Trypetesa lampas and Sacculina carcini) and within Chelicerata the same gene has not been found in two mites (Archegozetes longisetosus and Tetranychus urticae) and a pycnogonid (Endeis spinosa), although not all these surveys were based on high quality genome assemblies (see [26]).
The ftz gene, which evolved a role in segmental patterning in insects rather than specification of position, seems to be absent in the genome of the stick insect Timema cristina (order Phasmatodea; [27]; assembly tcristinae_2.1).However, since this is a finding from analysis of a single genome assembly, verification is needed.The other Hox gene with a changed function in insects, the paralogy group 3 gene zen, is present in most insects.Interestingly, zen appears to be lost completely from the genomes of two related flies, Epicampocera succincta and Thecocarcelia acutangulata, which are both within the dipteran family Tachinidae.Similar loss of zen may have occurred within some Chelicerata, where this gene is reported absent from the genomes of the mites Tetranychus urticae [28] and Metaseiulus occidentalis [29].Other cases of gene loss affect more recent duplicates.For example, zen has undergone tandem duplication in several lineages of insects and in some cases there has been secondary loss of derived zen duplicates (see Section 3.4).This includes a shared loss of the zen-derived ShxD gene in all Lycaenidae butterflies.

Splits, rearrangements and inversions in the insect Hox cluster
Even before the molecular cloning of Hox genes, it was clear that the mutant loci giving homeotic phenotypes in Drosophila melanogaster were located in two distinct complexes on chromosome 3: the ANT-C and the BX-C [30,31].Cloning revealed the ANT-C contains from lab to Antp (of the ancestral gene order), whereas BX-C contains the genes from Ubx to AbdB, with a 9.6 Mb gap between them.Splits have also occurred, at different positions, in other Drosophila species [17].In the mosquito Anopheles gambiae, the cluster is not split.The clear implication is that an unbroken Hox cluster is ancestral for this clade of Diptera and by implication (assuming a split cluster cannot reform into a complete cluster) ancestral for all insects, as it is for the Bilateria.In surveying the structure of Hox clusters across insects, therefore, we are not asking whether a split cluster is ancestral.Instead, we can ask whether there are particular intergenic regions where splits are more frequent evolutionarily, and conversely whether particular sets of Hox genes always stay together in evolution.Here we define intergenic regions as the genomic content between the homeobox sequences of the Hox genes, as current data do not allow us to identify the ends of every transcription unit.
Examination of 243 insect genomes reveals Hox cluster splits in many species (Fig. 1).For example, these include splits in the Hox clusters of Platycnemis pennipes (Odonata), Aelia acuminata (Hemiptera), Chrysoperla carnea (Neuroptera), Coremacera marginata (Diptera), Limnephilus marmoratus (Trichoptera) and in all Lepidoptera.In some cases, these splits lead to dramatic expansion in the overall size of the Hox cluster.For example, the lab, pb and zen genes in Platycnemis pennipes are located ~84 Mb from the rest of the cluster.Similarly, splits between Scr and Dfd and between pb and lab in Aelia acuminata, resulted in genomic distances of ~33 Mb and ~21 Mb between these genes, respectively.In Chrysoperla carnea the split occurs between Scr and Dfd and results in a distance of ~67 Mb, and in Coremacera marginata a distance of ~66 Mb separates zen and pb.The cluster split in Lepidoptera lies between lab and the rest of the cluster, and is present in every lepidopteran species analysed (124 species) including two representatives of the most basal family, Micropterygidae (Neomicropteryx cornuta and Micropterix aruncella).
The lab gene is found distal to the 'posterior' end of the cluster in most Lepidoptera (represented in Fig. 1 by the Silver-Y moth Autographa gamma).This repositioning is clearly a secondary event following 'escape' of the Hox gene from tight linkage to other Hox genes, since in Neomicropteryx cornuta (in the basal family Micropterygidae) the split has occurred but the repositioning has not.The finding that the lab gene is also distant from the rest of the Hox cluster in Trichoptera (caddisfly) genomes suggests this split probably occurred prior to the common ancestor of Lepidoptera+Trichoptera (Amphiesmenoptera).Interestingly, relocation of lab to a different chromosome was also found in two mosquito species, Aedes aegypti and Culex quinquefasciatus [32].
Similar cases of translocation or inversion have occurred in Odonata, Thysanoptera and Trichoptera, after splitting of the Hox cluster.This is the case for zen, pb and lab split from the rest of the cluster in Platycnemis pennipes (Odonata), Thrips palmi (Thysanoptera) and Limnephilus marmoratus (Trichoptera).Other inversions of genes occur unrelated to splits, for example, Dfd is inverted in Coremacera marginata and Nephrotoma flavescens (Diptera) and Neomicropteryx cornuta and Autographa gamma (Lepidoptera) (Fig. 1).
While splits have occurred frequently in insect evolution, the overall genomic order of Hox genes in insects is comparable to that seen for the homologous genes in vertebrates.This represents the colinear correspondence between gene order and the body position where each gene is expressed and functional during early embryonic development, for those Hox genes that still play this role (Fig. 2).Although we do not find clear cases of shuffling this order when the genes are together in a single intact cluster, there are cases of rearrangement caused by cluster breakage, in some cases involving inversions.Interestingly, these changes are almost always associated with paralogy groups 1-4: lab, pb, zen and Dfd (Fig. 2).The rearrangements found affect these four Hox genes in different ways.In each case of gene, or gene block rearrangement, there is a link between splits in the Hox cluster and subsequent rearrangement events within insect orders.
In some insect orders, there are rearrangements in all species sampled; for example, five Trichoptera (caddisfly) species have pb, zen and Dfd in derived positions.In all four species in the family Limnephilidae, pb, zen and Dfd are located at the 'posterior' (AbdB) end of the cluster, with an inversion in two species, Limnephilus marmoratus and Limnephilus rhombicus.In Eubasilissa regina (family Phryganeidae), pb, zen and Dfd are found outside the cluster, upstream of lab.Similarly, in all Lepidoptera (butterflies and moths; 124 species) lab is found away from the rest of the Hox cluster.The rearrangement of lab away from the rest of the Hox cluster was noted previously in the silk moth Bombyx mori [18,33]; the higher quality genome assemblies now available confirm that the lab gene is usually located at the 'posterior' end of the lepidopteran Hox cluster, separated by a large distance (from 1.4 Mb in Tinea semifulvella to 24 Mb in Phalera bucephala) containing numerous non-Hox genes.
In two of the three Odonata (dragonfly and damselfly) species analysed, lab, pb and zen are rearranged, but Dfd is in its ancestral position in the cluster.In the white-legged damselfly Platycnemis pennipes there has been an inversion that switched the order and transcriptional orientation of these genes as a block, and in the blue-tailed damselfly Ischnura elegans there has been an inversion plus a translocation to the other end of the cluster.In Plecoptera (stoneflies), Thysanoptera (thrips) and Neuroptera (lacewings and allies) various rearrangements are found.In certain species in these groups lab, pb, zen and Dfd have all been translocated to the 'posterior' end of the cluster, nearer to AbdB, with a subsequent inversion of this gene cassette in the plecopteran species Nemurella pictetii.In the only thysanopteran species in our dataset (Thrips palmi), an additional rearrangement resulted in lab positioned on a separate scaffold to the rest of the cluster and pb, zen and Dfd translocated to the posterior end.In Hemiptera, lab is located at the 'posterior' end of the cluster in Acanthosoma haemorrhoidale, although larger rearrangements affecting lab, pb, zen and Dfd in another hemipteran species (Diaphorina citri) has been observed [34].Diptera displays the largest number of rearrangements, with at least five different rearrangement events occurring across the tree, resulting in translocations of one or more of the lab, pb, zen and Dfd genes to the opposite end of the gene cluster.In Coleoptera, three species show translocation of zen copies outside of the Hox cluster, resulting from independent lineage-specific events.

Hox cluster size across insects
While splits and rearrangements in the Hox cluster occur frequently across insects, there are certain genes which have rarely been split apart in the insect genomes studied to date.For example, the three genes found in the Bithorax complex of Drosophila (AbdB, abdA and Ubx) are found in the same order, in all 243 insect genomes studied, although we note that a cluster split between Ubx and abdA occurred in a clade of Drosophila [17].Within the set of genes corresponding to the ANT-C of Drosophila (Figs. 1-2), the genes Antp, ftz and Scr are most conserved in their organisation and orientation.To our knowledge, there are no known cases of split between these genes, indicating there may be a selective pressure to maintain their linkage.Indeed, overall there are relatively few cluster splits between AbdB and Scr.
When the intergenic distances between each pair of genes (measured as the distance between homeobox sequences of the Hox genes) are compared between insect orders, a very intriguing pattern emerges (Fig. 3A).Excluding the first three Hox genes located at the 'anterior' end of the cluster (lab, pb and zen), which underwent significant rearrangements in many different species, we see that the intergenic distances between the next four genes (Antp, ftz, Scr and Dfd) are consistently small.These four 'tightly linked' genes are all orthologues of the ANT-C genes of Drosophila melanogaster.In contrast, the intergenic distances between the three orthologues of the BX-C genes (AbdB, abdA and Ubx) are consistently larger.This trend is most easily seen when the distances are compared within an insect order, and is seen regardless of whether the insect order has more or less 'relaxed' organisation of the Hox cluster (Fig. 3A).This may imply that there is a deep and fundamental difference between Hox gene organisation between ANT-C and BX-C genes, dating to long before the homeotic complex split in Drosophila.Interestingly, the intergenic distance between Ubx and Antp in most insects (the position of the BX-C/ANT-C split in Drosophila melanogaster) falls into the range of the BX-C intergenic distances, even in gene clusters that are not split.
The relative conservation in gene order and organisation from AbdB to Scr across all orders provides a useful opportunity to compare the evolution of the Hox cluster size across insects.The size of this conserved core region of the Hox cluster ranges from 0.57 Mb (Common Plume moth Emmelina monodactyla; Lepidoptera) to 5.8 Mb (Tachina fera; Diptera).These genomic distances are much larger than the same region in vertebrates where whole Hox clusters are only ~0.1 Mb [35,36].Odonata, Hemiptera and Trichoptera have consistently large sizes for Fig. 1.Genomic organisation and gene orientation across insect Hox clusters (A) Left shows the phylogeny for subset of species analysed.Hexapod orders, from top to bottom are: Collembola (grey), Odonata (dark purple), Plecoptera (light purple), Orthoptera (dark pink), Phasmatodea (light pink), Psocodea (brown), Hemiptera (dark orange), Thysanoptera (light orange), Hymenoptera (dark red), Neuroptera (light red), Coleoptera (dark green), Diptera (light green), Trichoptera (dark blue), and Lepidoptera (light blue).Dots on the phylogeny represent hexapod orders for which data are shown from more than one species.Right shows the order and transcriptional orientation of Hox genes (coloured as per the legend) in each species.Splits within the Hox cluster are denoted by double orange lines, inversions are annotated with a black border around the gene.Slanted double black lines represent translocation to a separate scaffold.(B) Structure of the Hox cluster per species shown using actual genomic distances.Each line represents a Hox gene as it occurs in the genome, coloured as per the legend.Genomic distances are shown in Megabases.this core cluster, reflecting large intergenic distances, while Coleoptera and Diptera each show great variation in cluster size across the order (Fig. 3B).For most insect orders, the size of the core part of the Hox cluster correlates with genome size (Fig. 3C).However, Diptera and Lepidoptera both show low correlation values (r = 0.37 and 0.27, respectively), suggesting that there are other factors driving the size of the Hox cluster other than genome size in these groups of insects.
The size of the Hox cluster in Schistocerca piceifrons (Orthoptera; assembly iqSchPice1.1) is significantly expanded with larger distances between all genes, relative to most other insects, suggesting relaxation of the constraints acting on the whole cluster (Fig. 1).When we compare cluster size across other Schistocerca species (Schistocerca americana; iqSchAmer2.1 and Schistocerca gregaria; iqSchGreg1.1), the total size of the Hox cluster ranges from 16 Mb to 17.8 Mb, and the 'core' Hox cluster size (AbdB to Scr) ranges from 10.8 Mb to 12.2 Mb, significantly larger than any other insect species analysed in this study (Fig. 3B).This contrasts with earlier (pre-genomic) analysis in Schistocerca gregaria, where the total cluster size was determined using chromosomal in situ hybridization and estimated to be at least 700Kb in length, and no longer that 2 Mb in total [11].Although linkage in the Hox cluster in this genus has relaxed significantly, there are no rearrangements found in the order of the Hox genes within the genome.

Tandem duplication of insect Hox genes: Zerknüllt and fushi tarazu
Tandem duplication within a Hox gene cluster is rare, with some of the clearest examples being the initial expansion of the Hox cluster in early bilaterian evolution [37,38] and expansions at the 'posterior' end of the cluster in vertebrates, amphioxus and echinoderms [15,39].In analysing publicly-available insect genomes, we find only two cases of putative tandem duplication of a canonical Hox gene (Fig. 4): two copies of Dfd present in Acronicta aceris (Sycamore Moth) and two copies of pb present in Micropterix aruncella.While genomic position and gene trees provide support for these putative Hox duplicates as real events, since they are present in a single species each of the findings needs further verification.Indeed, it is expected that tandem duplications of canonical Hox genes would be deleterious since they could disrupt the spatial regulation of these genes, and thereby disrupt anteroposterior body patterning.In contrast, the two Hox genes that have derived roles, ftz and zen, might be expected to have less constraint against duplication.This is because the zen gene lost its ancestral homeotic function in early insect evolution, acquiring a novel role in the formation of extraembryonic tissues, while ftz has a new role in segmentation.
As noted in Section 3.1, a putative loss of ftz is observed in one insect, and conversely there are two ftz copies in Spilarctia lutea (Buff Ermine Moth), as well as two closely related wasps: Vespula germanica and Vespula vulgaris (Fig. 4).Finding the duplication in related species gives stronger support to this observation.Duplications of zen in insects have been known about and intensively studied for many years.First, there is a well-studied duplication of zen in Tribolium castaneum, which gave zen and zen2 [13].Recent work has shown that this duplication is shared by three closely related Tribolium species, and that the gene products interact in a negative feedback loop that may confer precision of temporal expression [40].Second, it was shown over 20 years ago that the dipteran bcd gene is a derived tandem duplicate of zen [41].This duplication was followed by extensive sequence divergence in the locus Fig. 2. Hox genes prone to rearrangement in insect Hox clusters Left shows a time calibrated species tree of insect Orders analysed in this study.Right shows the composition of the Hox gene cluster, in their ancestral order.Shaded orange regions represent genes that have undergone rearrangement from the ancestral order of Hox genes.Those conserved across species sampled have a border around the box.Splits in the Hox cluster are not depicted.

P.O. Mulhair and P.W.H. Holland
which became bcd, in a classic case of 'asymmetric sequence divergence' where one daughter gene undergoes far more sequence change than the other [42,43].Key amino acid changes in the homeodomain include a mutation from glutamine to lysine at position 50 (Q50K) and a switch from methionine to arginine at position 54 (M54R); these substitutions contributed to changing downstream targets and altered biological role [44,45].Third, a further duplication of zen within the Drosophila genus produced zen2.Fourth, the most dramatic cases of zen duplication have been reported in Lepidoptera.In ditrysian lepidopterans, the zen gene duplicated to give four additional fast-evolving copies named ShxA, ShxB, ShxC and ShxD [19].These highly derived Shx genes are expressed in the developing serosa, not the embryo itself, and may pre-pattern this extraembryonic tissue, as judged by the striking pattern of maternal RNA localisation [19].These genes duplicated even further in Bombyx mori resulting in at least 12, possibly 15, tandem Shx gene copies [18,19].Furthermore, recent work has shown that the extreme duplication in Bombyx is not unique: at least 18 other lineages of Lepidoptera have highly expanded sets of Shx genes, in some cases reaching over 100 homeobox copies (Fig. 4).There has also been occasional loss of specific Shx genes; for example, ShxD was lost in butterflies of the family Lycaenidae ('blues' and their allies) and fritillary butterflies of the genus Melitaea [46]).
With the availability of many high-quality insect genomes, it is now possible to ask if there are additional cases of zen duplication, in addition to those mentioned above (Fig. 4).Within the coleopteran species for which genomes are available, multiple duplications of zen occur in the Cucujiformia infraorder and range from 2 copies in Polydrusus cervinus (weevil) to 17 copies in Harmonia axyridis (harlequin ladybird) and 19 copies in Pyrochroa serraticornis (cardinal beetle).This is in addition to the well-studied duplication in Tribolium.In Diptera, copy number of zen ranges from a single copy in the early diverging lineages to 118 in Tachina fera and 93 in Sarcophaga variegata.Even within a family of flies for which there is a large number of species sampled, Syrphidae, there is significant variation in zen copy number between these related species (Fig. 4).The number of copies and the branching patterns within the gene tree (Fig. 4) suggest that large tandem duplication events occurred multiple times independently in this lineage.
It is striking that duplication of zen, and its progenitors (Shx in Lepidoptera) occur only in the highly speciose orders Diptera, Lepidoptera and Coleoptera, within the holometabolous insects.As described above, zen lost its homeotic function early on in insect evolution, and in many insect species is involved in development of extraembryonic membranes.In insects these membranes consist of two distinct layers: the amnion and serosa (these form a single epithelium known as the amnioserosa in higher flies) [47,48].The amnion is the inner membrane which surrounds the ventral side of the developing embryo, while the serosa is an outer membrane which lies just inside the chorion and envelops the embryo, amnion and yolk [49][50][51].This structure is hypothesised to be involved in a wide range of functions unrelated to development of body form, such as a general protective role including structural stability, water regulation and desiccation resistance [52][53][54], and innate immune response [55][56][57].Interestingly, while this dual structure is present in most pterygote insects, derived hymenopterans (Apocrita) usually lack an amnion, or have a temporary amniotic-like structure which covers the yolk [58,59].It is intriguing to consider whether the highly dynamic copy numbers of zen along with its functions in the extraembryonic tissues may have played a role in facilitating speciation and adaptation to diverse habitats in Diptera, Lepidoptera and Coleoptera.Indeed, it is striking that copy number variation of zen is particularly variable in highly speciose families such as Syrphidae and Coccinellidae (Fig. 4).However, whether these large expansions in gene number are functional, or even expressed during early development, requires further analyses.Furthermore, the neutral theory posits that increases in copy number may not always have  adaptive significance, and may instead result from mutational processes within the genome, affected by intragenomic variation in copying fidelity and the effects of transposable element accumulation.

Conclusions
We are entering a new era of genomics, as new technologies are facilitating the imminent sequencing and assembly of thousands of eukaryotic species.At the time of writing, there are more than 200 high quality, complete insect genomes available for analysis, and although this number is expected to rise very rapidly, now is an excellent time to pause and take stock of the lessons that can be learned.This is an opportune time for two reasons.First, the available high quality genomes span a wide phylogenetic diversity of insects, including representatives of at least 13 orders (Odonata, Plecoptera, Orthoptera, Phasmatodea, Psocodea, Hemiptera, Thysanoptera, Hymenoptera, Neuroptera, Coleoptera, Diptera, Trichoptera, Lepidoptera).Second, within some taxa (notably Lepidoptera and Diptera) 'deep dives' have been undertaken, yielding genomes from closely related species, thereby permitting insights into genomic change on shorter time frames.Here we have used these data, in combination with previously published analyses, to compare Hox gene cluster organisation across insects.We have searched for patterns of evolutionary conservation or general trends across insects, examples of convergent evolution, and anomalies.
First, we examine gene loss and conclude that canonical Hox genes have not been lost in insect evolution: pb, lab, Dfd, Scr, Antp, Ubx, abdA, AbdB are present in all insects studied.The two 'non-canonical' Hox genes, zen and ftz, are lost rarely.We find two closely related insect species putatively lacking zen, possibly a shared loss inherited from a common ancestor, and one example of a putative loss of ftz.The rarity of these losses highlights that further verification is needed.However, the finding that canonical Hox genes are never lost in insects has a biological implication.We suggest that each Hox gene has remained indispensable through insect radiation because segment number and tagmatization, has remained consistent, giving no opportunity for gene redundancy and loss.
Second, we find many independent cases of splitting of the insect Hox gene cluster, in an analogous fashion to the separation of ANT-C and BX-C in Drosophila melanogaster.Although these splits can occur in several different places in the Hox cluster, they are most commonly seen affecting the first four paralogy groups (PG1 to PG4): lab, pb, zen, and Dfd.There are cases where just lab (PG1) is split away (Lepidoptera and Trichoptera), one dipteran in which lab plus pb are separated away, Odonata with lab, pb and zen split away, and many insects with a split between Dfd and Scr (separating PG1 to PG4 from the rest).We do not find cases of complete 'atomisation' of the Hox cluster, as seen in larvacean chordates and predatory mites for example.It would be interesting to compare patterns of Hox cluster breakage and rearrangement with the overall genome-wide recombination and inversion rates for each taxa, to test if Hox cluster rearrangements reflect general genomic properties.From the pattern of splitting observed, we suggest that insect Hox genes are not generally regulated as a whole cluster, but there are selective pressures acting to prevent many rearrangements.These selective pressures could include shared regulation of neighbouring games, interdigitated control (enhancers for one Hox gene located beyond the neighbouring gene) or simply a high density of regulatory elements.We suggest these constraints are lowest around paralogy groups 1-4.We speculate that shared and interdigitated control may have evolved around 'posterior' insect Hox genes to fine-tune expression within overlapping domains in the abdomen.
Third, it has long been known that insect Hox gene clusters have much larger intergenic distances than in vertebrates.We find that intergenic distances in the Hox cluster vary greatly across insects, with particularly large genomic distances in Orthoptera, Odonata, Hemiptera and Trichoptera, and highly variable intergenic distances in Coleoptera and Diptera.Intergenic lengths correlate with genome size in most, but not all, insect orders.We note a striking and puzzling trend in intergenic distance within insect Hox clusters: the distances between 'posterior' genes are consistently greater than distances between each pair of 'central' or 'anterior' genes.Specifically, intergenic distances from AbdB to Antp are greater than intergenic distances across the rest of the cluster.We do not know the biological basis for this observation.Counterintuitively, the region with the largest intergenic distances is also the region least prone to genomic rearrangement in evolution.We suggest that fundamental mechanisms of gene regulation may be different at the two ends of the insect Hox gene cluster.
Fourth, we examine gene duplication and conclude that insect Hox genes are rarely duplicated, with the exception of zen.We do find putative cases of Dfd duplication and pb duplication, but these are seen in single genomes and require further verification.A ftz duplication is seen in genome assemblies for two wasps and can be treated as more definitive.The zen gene, in contrast, has undergone tandem duplication many times independently, undergoing dramatic copy number expansion in some insect lineages.The most striking examples of zen duplication are seen in genomes from the highly speciose orders, Coleoptera, Diptera and Lepidoptera, where over 100 zen-derived homeobox sequences can be present in some species.It is unclear why such dramatic copy number changes have occurred, and indeed whether retention of extra genes was selectively advantageous through subfunctionalization, neofunctionalization or simply dosage effects.The fact that zen genes play roles in extraembryonic patterning, rather than position-specific cell fate in the embryo, may underpin why tandem duplications are not instantly deleterious, but this does not seem to explain the preponderance of zen gene arrays observed.Further work is required to determine if the locus is particularly prone to unequal crossover at meiosis, and therefore a hotspot of mutation, and/or whether duplicated zen genes were repeatedly recruited to novel roles in extraembryonic membrane patterning as insects adapted to their multitude of ecological niches.

Fig. 3 .
Fig. 3. Hox cluster size across insects.(A) Distribution of intergenic regions per Hox gene in different insect orders.Only shows orders where the Hox genes are conserved in the 'normal' order.(B) Distribution of core Hox cluster (AbdB to Scr) size for each order, ordered by phylogeny.(C) Correlation between core Hox cluster size and genome size.Each dot represents a species, which are colour by order as per the figure legend.Dots are sized by the total number of Hox genes present in each species.In (A) and (B) boxplots have a rectangle between the 25th and 75th percentiles of the range, with the median as a dark line, whiskers reach the largest and smallest values within 1.5x Interquartile range, and outliers are points beyond 1.5 × Interquartile range (plotted using geom_boxplot in ggplot2).

Fig. 4 .
Fig. 4.Copy number of Hox genes across insects.Left shows the phylogeny of all species analysed, with block colours signifying the Order to which they belong.Heatmap shows the copy number of all Hox genes per species.Barchart of the right shows the total number of Hox genes annotated per species.

Table 1
Order, species and genomes used in this study.

Table 1 (
continued ) (continued on next page) P.O.Mulhair and P.W.H.Holland