Arthropod ILGF, Relaxin and Gonadulin, putative orthologs of Drosophila insulin-like peptides 6, 7 and 8, likely originated from an ancient gene triplication

Background Insects have several genes coding for insulin-like peptides and they have been particularly well studied in Drosophila. Some of these hormones function as growth hormones and are produced by the fat body and the brain. These act through a typical insulin receptor tyrosine kinase. Two other Drosophila insulin-like hormones are either known or suspected to act through a G-protein coupled receptor. Although insulin-related peptides are known from other insect species, Drosophila insulin-like peptide 8, one that uses a G-protein coupled receptor, has so far only been identified from Drosophila and other flies. However, its receptor is widespread within arthropods and hence it should have orthologs. Such putative orthologs were recently identified in decapods and have been called gonadulins. Methodology In an effort to identify gonadulins in other arthropods public genome assemblies and short-read archives from insects and other arthropods were explored for the presence of genes and transcripts coding insulin-like peptides and their putative receptors. Results Gonadulins were detected in a number of arthropods. In those species for which transcriptome data from the gonads is available insect gonadulin genes are expressed in the ovaries and at least in some species also in the testes. In some insects differences in gonadulin expression in the ovary between actively reproducing and non-reproducing females differs more than 100-fold. Putative orthologs of Drosophila ilp 6 were also identified. In several non-Dipteran insects these peptides have C-terminally extensions that are alternatively spliced. The predicted peptides have been called arthropod insulin-like growth factors. In cockroaches, termites and stick insects genes coding for the arthropod insulin-like growth factors, gonadulin and relaxin, a third insulin-like peptide, are encoded by genes that are next to one another suggesting that they are the result of a local gene triplication. Such a close chromosomal association was also found for the arthropod insulin-like growth factor and gonadulin genes in spiders. Phylogenetic tree analysis of the typical insulin receptor tyrosine kinases from insects, decapods and chelicerates shows that the insulin signaling pathway evolved differently in these three groups. The G-protein coupled receptors that are related to the Drosophila ilp 8 receptor similarly show significant differences between those groups. Conclusion A local gene triplication in an early ancestor likely yielded three genes coding gonadulin, arthropod insulin-like growth factor and relaxin. Orthologs of these genes are now commonly present in arthropods and almost certainly include the Drosophila insulin-like peptides 6, 7 and 8.


Introduction
Insulin may well be the best studied and perhaps the best known hormone, due to its essential role in the regulation of glucose homeostasis and its effective and widespread use to treat diabetes. Insulin is related to a number of other hormones with different functions, such as insulin-like growth factor, relaxin and Ins3. One of the interesting aspects of these hormones is their use of two structurally very different receptors, receptor tyrosine kinases (RTKs) and leucine-rich repeat G-protein coupled receptors (LGRs). Thus, whereas insulin and insulin-like growth factors (ILGFs) act through an RTK, relaxin and Ins3 use an LGR for signal transduction.
An intriguing question remains as to how this switch was made from one type of receptor to another, or alternatively whether the ancestral insulin used perhaps both types of receptors and during evolution its descendants became specific ligands for only one of the two receptors.
Like other hormones and neuropeptides, insulin was already present in the ancestral bilaterian that gave rise to both protostomes and deuterostomes. The first indication that an insulin-like peptide (ilp) might exist in protostomes was the observation that insulin enhances cell differentiation in cultured Drosophila cells (Seecof & Dewhurst, 1974). The identification of one ilp in the silkworm Bombyx mori that can break diapause (Nagasawa et al., 1984(Nagasawa et al., , 1986 and another one in neuroendocrine cells known to produce a growth hormone in the pond snail Lymnaea stagnalis (Smit et al., 1988) reinforced the hypothesis that insulin may functions as growth hormones in protostomes. Since then a large variety of invertebrates has yielded a still larger variety of ilps (e.g. Murphy & Hu, 2013;Mizoguchi & Okamoto, 2013;Nässel & Vanden Broeck, 2016;Yu, Han & Liu, 2020). In insects the ilps of Drosophila and the silkworm Bombyx mori have been extensively studied and these hormones are best known in fruit fly due to the genetic power that can be employed in this species. There are eight ilp genes in Drosophila melanogaster, which are referred to as Drosophila ilps 1-8. Drosophila ilps 1, 2, 3 and 5 are co-expressed in a single cell type of neuroendocrine cells of the brain (Brogiolo et al., 2001;Grönke et al., 2010). Of these ilp 2 seems to be the most important and it also seems to be expressed exclusively or predominantly in these brain neuroendocrine cells. Drosophila ilps 3 and 5 are also expressed in other tissues, e.g. ilp 3 is expressed in midgut muscle of both larvae and adults where its expression stimulates midgut growth in response to feeding (O'Brien et al., 2011). Drosophila ilp 1 has been shown to be expressed in the brain neuroendocrine cells, but its expression is largely limited to stages when the animal does not feed, i.e. metamorphosis and diapause (Liu et al., 2016). The expression of dilp 4 seems limited to the embryonic stage, while ilp 6 is expressed predominantly if not exclusively by the fat body (Slaidina et al., 2009;Okamoto et al., 2009b). All these ilps are believed to activate the single Drosophila insulin RTK, while Drosophila ilps 7 and 8 are either known (ilp 8) or suspected (ilp 7) to activate Drosophila LGRs 3 and 4 respectively (Vallejo et al., 2015;Gontijo & Garelli, 2018;Veenstra et al., 2012). Drosophila ilp 7 is expressed by neurons in the abdominal neuromeres in a sex specific manner (Miguel-Aliaga, Thor & Gould, 2008;Yang et al., 2008;Castellanos, Tang & Allan, 2013), while ilp 8 is expressed by the imaginal disks as well as the ovary and testes as shown by flyatlas (Chintapalli, Wang & Dow, 2007;Gontijo & Garelli, 2018).
The primary amino acid sequences of the Drosophila ilps vary considerably and this is also the case in other arthropod species that have multiple genes coding insulin-related peptides. There is not only large sequence variability within a species, but also between species. Only when species are relatively closely related is it possible to reliably identify orthologous genes in different species. However, while in most insects the A-and B-chains have thus quite variable amino acid sequences, this not the case for orthologs of Drosophila ilp 7. The strong conversation of the primary amino acid sequence of these peptides allows for easy identification of its orthologs, not only in other insect species, but also in other protostomes like various mollusks and even in some deuterostomes (Veenstra et al., 2012). The strongly conserved primary amino acid sequence of these peptides suggests that it interacts with another receptor than the other ilps, perhaps in addition to the RTK. As some ilps act through a G-protein coupled receptor (GPCR), it seemed a distinct possibility that Drosophila ilp 7 and their orthologs might also stimulate a GPCR.
Interestingly, genes coding LGR4 and its orthologs are present in the same genomes as those that have genes coding orthologs of Drosophila ilp 7. This holds not only for insects, but also other arthropods, mollusks and even some basal deuterostomes. Every genome that has a Drosophila ilp 7 ortholog also has a LGR4 ortholog and vice versa (Veenstra et al., 2012;Veenstra, 2014Veenstra, , 2019. Furthermore, LGR3 and LGR4 are holomologs of vertebrate LGRs that use ilps as ligands. This means that the ligands for the LGR4s must be the Drosophila ilp 7 orthologs. Since these peptides are so different from the typical insect neuroendocrine insulins, it made sense to give it a different name. Earlier work on Drosophila suggested that it might have a role similar to relaxin in vertebrates (Yang et al., 2008) and since LGR4 is an ortholog of the relaxin receptor (Veenstra, 2014), has also been called relaxin, but it might be better to call them arthropod or invertebrate relaxin.
Drosophila ilp 8 (for review see Gontijo & Garelli, 2018) is another ilp (for review see Gontijo & Garelli, 2018) that acts through a leucine-rich repeat GPCR, LGR3 (Garelli et al., 2015;Vallejo et al., 2015;Colombani et al., 2015). However, whereas Drosophila ilp 7 orthologs have well conserved primary amino acid sequences, this is not the case for Drosophila ilp 8. Indeed, if it were not for the common presence of LGR3 orthologs in insect and other arthropod genomes one might believe that this peptide hormone evolved within the higher flies and is absent from other insects. The imaginal disks in Drosophila produce and release ilp 8 as long as they develop and also when they get damaged. When it is no longer released this is used by the brain as a signal to initiate metamorphosis (Garelli et al., 2012;Colombani, Andersen & Léopold, 2012;Jaszczak et al., 2016). Drosophila ilp 8 is furthermore produced by the testes and ovaries (Chintapalli, Wang & Dow, 2007) and since imaginal disks are only present in holometabolous insects, it is tempting to speculate that the gonads are the original site of expression of orthologs of this peptide. I had previously suggested that the crustacean androgenic insulin-like peptide that stimulates premature sexual maturation in male crustaceans and can induce sex reversal in females, might be an ortholog of Drosophila ilp 8 (Veenstra, 2016b). However, more recently a fourth type of ilp was identified in two decapod species, that seem to be structurally more similar to Drosophila ilp 8 than the androgenic insulin-like peptides (Chandler et al., 2017). It has now been shown that these peptides, which have been called gonadulins, are generally present in decapods and commonly expressed by the gonads (Veenstra, 2020). Since gonadulins might be orthologs of Drosophila ilp 8 (Veenstra, 2020), it seemed worthwhile to look for this hormone in other arthropods. Analysis of arthropod genome and transcriptome sequences revealed that such peptides are not limited to decapods but are also present in insects as well as chelicerates.
During this analysis interesting new details of the putative orthologs of Drosophila ilp 6 were also encountered as well as evidence suggesting that the putative orthologs of Drosophila ilps 6, 7 and 8 arose from an ancestral gene triplication.

Materials & Methods
The sratoolkit (https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software) in combination with Trinity (Grabherr et al., 2011) was used in the search for transcripts encoding peptides that might be somewhat similar to insulin in insect gonad transcriptome short read archives (SRAs).
Once such transcripts had been found, it was often possible to find orthologs from related species.
Alternatively genes coding such transcripts were identified in genome assemblies using Artemis (Rutherford et al., 2000). For example, once the honeybee gonadulin was found, it was much easier to find it in other Hymenoptera. Details of the methods used have been previously described (Veenstra and Khammasi, 2017;Veenstra, 2019) and the only detail that is significantly different is the large -evalue option of 100 in the tblastn_vdb command from the sratoolkit. This implied that subsequently many sequences needed to be checked for possible gonadulin sequence similarity. The same method was used to identify relaxin and C-terminally extended ilps, which have much better conserved primary amino acid sequences and consequently are more easily identified, as well as their putative receptors. Whenever possible all sequences were confirmed in both genome assemblies and in transcriptome SRAs. In many cases transcripts for the various ilps and receptors were already present in genbank, although they were not always correctly identified. All these sequences are listed in Suppl. Spreadsheet 1.
Expression was estimated by counting how many RNAseq reads in each SRA contained coding sequence for each of the genes. In order to avoid untranslated sequences of the complete transcripts, that sometimes share homologous stretches with transcripts from other genes and can cause false positives, only the coding sequences were used as query in the blastn_vdb command from the sratoolkit. This yielded the blue numbers in Suppl. Spreadsheet 2. In order to more easily compare the different SRAs these numbers were then expressed as per million spots in each particular SRA. These are the bold black numbers in Suppl. Spreadsheet 2.
For the expression of alternative aILGF splice forms reads for each splice variant were first separately identified. Unique identifiers in these two sets were determined to obtain the total number of reads for aILGF. Those identifiers that were present in the initial counts for both splice forms were counted separately and subtracted from the initial counts of the two splice variants to obtain the number of reads specific for each isoform.

Phylogenetic and sequence similarity trees
For the phylogenetic tree of the insulin RTKs sequences were aligned with clustal omega (Sievers et al., 2011). Using Seaview (Gouy, Guindon & Gascuel, 2010) only well aligned sequences were retained and saved as a fasta file. Fasttree2 (Price et al., 2010) was then used to produce a phylogenetic tree using the ./FastTreeDbl command with the following options: -spr 4mlacc 2 -slownni. The phylogenetic GPCR tree was constructed in the same way using only the transmembrane regions.
The amino acid sequences of arthropod ilps are not well conserved and hence one can not make phylogenetic trees, as it is impossible to know which amino acid residues can be reliably aligned apart from the cysteines. Even the latter can cause problems when the spacing between is not the same in different peptides. To compare the different ilps an unbiased method is needed. For this clustal omega was used to align the complete precursor sequences and even though visual inspection with Seaview reveals very poor alignments, the alignment was not changed but saved as a fasta file and Fasttree was used with the same parameters as above to construct trees.
Although such trees are not phylogenetic trees they do allow for an unbiased comparison of the various sequences. The trees so produced have been called sequence similarity trees. Note that the branch probabilities of such trees give useful information as to how reliable the grouping of the various ilp precursors is.

Prediction of prepropeptides processing
Signal peptides were predicted using signal P-5.0 (Almagro Armenteros et al., 2019). No attempts were made to predict convertase cleavage sites in the gonadulin and aILGF precursors.
Both of these putative hormones are likely not made by neuroendocrine cells. This implies that these hormones may not be exposed to neuroendocrine convertases and hence rules that describe how these convertases might cleave would not be applicable.

Gonadulin-like peptides are present in many arthropods
Peptides that share the typical location of six cysteine residues with insulin but are insufficiently similar to known insect ilps to be easily recognized as such, were identified in a large number of arthropods. Their sequences differ not only from other ilps, but are also very variable between them. As a consequence they are difficult to find in genome assemblies, unless a sequence from a not too distantly related species is available. This explains why searches were most successful when done in ovary and/or testes transcriptomes. Putative gonadulin orthologs were identified not only in insects, but also in several chelicerates, notably spiders, a spider mite and scorpions. A list of gonadulin propeptides in representative species is given in Figure 1 and additional sequences are provided in Suppl. Spreadsheet 1. It is evident that although these peptides have been given the same name, their sequences diverge even more than the neuroendocrine arthropod insulins. When constructing a sequence similarity tree from the insect ilps the gonadulins are well separated from the relaxins and the other insect ilps. Interestingly even the insulins and aILGFs are reasonably well separated, except for the precursors from highly evolved flies (Drosophila and Glossina) and the head louse (Fig. 2). When this is repeated on sequences from a set of arthropod ilps, the gonadulins are once again well separated from the other insulin related peptides (Fig. 3, Fig. S1).
Publicly available transcriptomes were used to explore in which tissues they are expressed. By nature such data is imperfect, as these transcriptomes were not made to answer the question where gonadulin, other insulin-related peptides or their putative receptors are expressed and hence such data is limited. Some of the more salient examples are illustrated in Table 1. In honeybees the ovaries of virgin queens do not express significant amounts of gonadulin, but those of egg-laying queens produce it in large quantities. In a single queen bumblebee larva transcriptome the gonadulin reads are at least 15 times more numerous than those in the three transcriptomes each for male and worker larvae (Suppl. Spreadsheet 2). In the tsetse fly Glossina morsitans the gene is strongly expressed in ovaries of non-pregnant females, i.e. those that mature an egg, and hardly at all in pregnant/lactating females when egg maturation is arrested. In the bugs Rhodnius prolixus and Oncopeltus fasciatus the ovaries also express this gene, as do ovaries and testes of the stick insect Timema cristinae, while short read archives of unfecundated eggs from Blattella germanica similarly contain large amounts of gonadulin reads. In the termite Zootermopsis nevadensis, gonadulin reads are abundant in reproducing females but rare or absent in alate females or reproducing males. The gene is also expressed in the testis of the American cockroach and possibly in the ovary as well, since it can be detected in whole body transcriptomes from females (Suppl. Spreadsheet 2). However, as in decapods (Veenstra, 2020), in insects gonadulin expression is not limited to the gonads (Table 1; Suppl. Spreadsheet 2). In the spider Parasteatoda tepidariorum ovary expression of gonadulin varied significantly between different samples (Table 1), and even larger variability in gonadulin expression has previously been reported for the crab Portunus trituberculatus (Veenstra, 2020). This shows that data from a single SRA are not necessarily informative as to the level of gonadulin expression in this organ.

Arthropod insulin-like growth factors
Most insect ilps contain only a few amino acid residues after the sixth cysteine residue in the precursor and sometimes there are none, however some ilps have a long C-terminal extension.
Such ilps are commonly present in hemimetabolous insects as well as several holometabolous species (Fig. 4, Fig. S2 , Suppl. Spreadsheet 1). In some species the C-terminal extension of these peptides are easily missed since if one ignores an intron donor site in the genome sequence the conceptual translation of such sequences predicts much smaller ilps that look similar to the well known Drosophila peptides. Nevertheless, analysis of RNAseq SRAs from several species shows that such intron donor sites are functional. These C-terminal extensions are coded by two additional exons that are not present in the typical insect neuroendocrine insulin genes. These extensions are not only commonly present, but also look similar to one another, providing further evidence that they are genuine parts of these ilps (Fig. 4). Furthermore, Trinity analysis of SRAs containing reads for such transcripts reveals that they are alternatively spliced which leads to the production of precursors with different C-terminal extensions. The difference often consists of the inclusion or exclusion of a sequence rich in dibasic amino acid residues, mostly arginines, that in many species has two characteristic cysteine residues. The alternative splice site is in the middle in the first additional exon, the last coding exon of these genes is less well conserved but contains a sequence that conforms more or less to the GTVX 1 PX 2 (F/Y) consensus sequence. Such genes are present in species as diverse as cockroaches, termites, stick insects, beetles, bees, ants and moths (Fig. 5). Interestingly, in the stick insect Timema cristinae this gene underwent a local gene duplication, with one gene coding a peptide with the arginine-rich peptide sequence and the second one lacking it.
In hemiptera ilp genes exist that similarly code for C-terminally extended ilps that are alternatively spliced, but in those species the extended C-terminals of the predicted peptides are not as well conserved (Fig. S3). Other C-terminally extended ilps were found in spiders and scorpions, but in those species no evidence was found for alternative splicing. Such C-terminally extended ilps appear absent from decapods (Veenstra, 2020).
The term insulin-like growth factors was initially used as a description of substances in plasma that had insulin-like biological activity, but it is now mostly used as a name for the vertebrate hormones that are predominantly made in the liver. The use of the same term for both a group of molecules that have similar characteristics as well two specific hormones is confusing. This is particularly the case for insects, since hormones that have been called insulin-like growth factors are not necessarily orthologs of one another nor of these vertebrate hormones. One of the two types insect insulin-like growth factors, Bombyx IGF-like peptide (BIGFLP; Okamoto et al., 2009a), has only been found in Bombyx mori, although it can be expected to be present in other Lepidoptera as well. The insulin-like growth factor described above seems to be commonly present in arthropods, including Bombyx mori and I propose to call it arthropod insulin-like growth factor (aILGF).
The data from an extended set of Bombyx transcriptome SRAs shows that in this species both insulin-like growth factors are expressed by the fat body, but the temporal patterns of expression of the two differ. Thus, aILGF is also expressed in larvae, when there is very little expression of BIGFLP while during the pupal stage their peaks of expression do not coincide (Fig. 6). Alternative splicing of aILGF mRNA in the silkworm and the honey bee is variable between the different samples (Fig. 7, Fig S4). In Bombyx SRAs from adults show a relatively higher expression of the longer isoform that has the additional sequence sequence rich in dibasic amino acid residues (Fig. S5), while in the honeybee those are more common in SRAs from the bodies of larvae destined to become queens as well as in samples from the Nasonov glands from nurse bees and in the only available sample of queen heads (Fig. S4). In most species there are insufficient data or a single isoform may be predominantly expressed such as in the bumblebee (Suppl. Spreadsheet 2).

Is Drosophila ilp 6 an aILGF ortholog ?
Given the absence of prominent sequence similarity between Drosophila ilp 6 and the aILGFs from more basal insect species, the obvious question is whether or not Drosophila ilp 6 is an ortholog of aILGF or whether it arose independently. The archetype aILGF gene contains four coding exons, the first two of which code for the insulin structure. The third consists of two parts, the first half which is present in both isoforms and the last part of this exon which is alternatively sliced. The fourth and last coding exon contains the GTVX 1 PX 2 (F/Y) consensus sequence.
During evolution, this basic pattern has been modified on several occasions. In Oncopeltus and other hemiptera the third coding exon was modified, while in both the mosquito Aedes aegypti and the honeybee, the intron between the first two coding exons was eliminated. In Aedes other changes occurred as well, but the gene can still be recognized as an aILGF gene by the presence of an exon that shows sequence similarity to the fourth coding exon of the aILGF genes. Soldier flies and robber flies have aILGF genes that are complete except for the third coding exons. It thus seems that the third coding exon was lost when Diptera evolved (Fig. 8). However, whereas the aILGF genes in those flies can still be recognized as such by the presence of what once was the fourth coding exon of a classical aILGF gene, this exon is not only absent from Drosophila, but is missing from all Erenomeura (Fig. 8). There are two possible explanations for this absence, either this exon was also lost, or the entire aILGF gene was lost and a novel insulin-like growth factor-like gene evolved in those flies, perhaps not unlike the origin of BIGFLP in Bombyx. The amino acid sequences of the aILGFs from the most highly developed non-Eremoneura flies are very similar to those from the least evolved Erenomeura flies (Fig. S6). So the aILGF gene persisted but it lost the last coding exon reminiscent of a classical aILGF gene and the only physico-chemical characteristic that separates aILGF from the neuroendocrine insulins is the very short sequence connecting the putative A-and B-chains.

Arthropod relaxins
The arthropod relaxins are known from a large number of arthropods, although they are lacking in many insect species. As noted previously they have by far the best conserved amino acid sequences in both the A-and B-chains of all the protostomian ilps and paralogs of this particular peptide are also found in other protostomians and even some basal deuterostomes. Neuropeptides containing cysteine residues typically have them in pairs, since as soon a cysteine containing peptide enters the endoplasmatic reticulum these residues get oxidized and form disulfide bridges. For this reason it is very surprising to see a decapod relaxin having seven cysteine residues (Fig. 9). The presence of this particular cysteine is unlikely to be an artifact as it is found in relaxins from a number of decapod species from different orders (Fig. S7). Although most arthropods appear to have a single relaxin gene, two scorpion genomes have two such genes; in both cases the predicted amino acid sequence of one of the relaxins seems less conserved (Veenstra, 2016a).

Gonadulin, relaxin and aILGF likely evolved from a local gene triplication
The very large primary amino acid sequence variability of arthropod ilps makes it difficult to establish their phylogenetic affinities. Sequence similarity trees show that the gonadulins resemble one another and hence may share a common ancestor, but such trees do not provide details. Synteny offers another means to establish evolutionary relationships and, as chance has it, the genomes of some species suggests that aILGF, gonadulin and arthropod relaxin originated from an ancient gene triplication. Thus the three genes coding gonadulin, aILGF and relaxin are located next to one another in the genomes of the German cockroach and the termite Zootermopsis nevadensis. In the stick insect Timema cristinae the aILGF underwent a local duplication (see above) and the four genes coding the two aILGFs, gonadulin and relaxin, are also next to one another. Furthermore, in the genomes from the spiders Parasteatoda tepidariorum and Pardosa pseudoannulata the gonadulin genes are next to an aILGF gene, which in these spiders has undergone one or more local duplications (Fig. 10). Such a close genomic association of these genes strongly suggests that they arose from a local gene triplication.

Receptors
Two different types of receptors are used by ilps, LGRs and RTKs. Insect have two different types of insulin RTKs, that have been labeled A and B. Most species have one gene coding an type A and one gene coding for an B type B RTK. However, cockroaches, termites and at least some stick insects have two B type RTKs and the American cockroach has also two type A RTKs. The genome assembly of the American cockroach shows that the two type A RTKs originated from a local gene duplication, as did the two type B RTKs. In Lepidoptera and Diptera all insulin RTKs are type A and there is no type B RTK. However, Lepidoptera do have a second gene coding for a protein with clear structural similarity to insulin RTKs, but these proteins lack the tyrosine kinase domain. I have called those proteins Insulin RTK-like (IRTKL). This gene is more similar to the type B RTKs than type A and it may have evolved from those RTKs. The presence of an IRTKL coding gene in at least two genomes from Trichoptera, suggests that it may have evolved before the two groups diverged. Interestingly, the genome of the beetle Tribolium castaneum has genes for both a type A and a type B RTK, as well as for an IRTKL. No IRTKL orthologs was detected in any of other sequenced beetle genomes, but such an ortholog was found in the transcriptome data of Tenebrio molitor, a species that is closely related to Tribolium. A phylogenetic tree of the various RTKs and IRTKL proteins suggests that the origin of the IRTKL proteins in Lepidoptera is ancient, but that of Tribolium is quite recent, indicating convergent evolution (Fig. 11). In chelicerates there are also two types of RTK, while there are four in decapods (Veenstra, 2020), however the various gene duplications of the arthropod RTKs occurred after these groups diverged (Fig. 11).

Drosophila
LGR3 is activated by Drosophila ilp 8 and orthologs of this receptor are present in many but not all arthropods, always in a single copy. As described LGR4 is likely the arthropod relaxin receptor and is similarly present in many arthropods, usually in a single copy, but chelicerates have two copies and some spider have even three. LGR3 and LGR4 are closely related GPCRs together with another receptor that was first described from the pond snail Lymnaea stagnalis as GRL101 (Tensen et al., 1994) to stress its similarity with LGR3 and LGR4 it is called here LGR5. The strong sequence similarity of LGR5 with LGR3 and LGR4 suggests that it, like LGR3 and LGR4 might have an insulin-like ligand and for this reason it is included here.
LGR5 is commonly present in arthropods. Unlike LGR3 and LGR4, that are missing in many insect species, LGR5 is consistently present in hemimetabolous insects, but it is completely absent from all holometabolous insects (the GRL101 from Rhagoletis zephyria [XP_017487580] is appears to be a mite GPCR). In both chelicerates and decapods there are usually several paralogs (Fig. 12). All three of these receptors are widely expressed and it is difficult to discern clear expression patterns (Suppl. Spreadsheet 2).

Discussion
I describe a number of novel arthropod ilps, the gonadulins and aILGFs, that are putative orthologs of Drosophila ilp 8 and ilp 6 respectively and I present evidence that the genes coding these peptides and relaxin are commonly present in arthropods and most likely originated from an ancient gene triplication.
There are four arguments that together suggests that the various gonadulins and Drosophila ilp 8 are indeed orthologs. First, the gonadulins cluster together with Drosophila ilp 8 on on a sequence similarity tree that bundles peptides with similar structures. It is clear that even though the amino acid sequences of the gonadulins are poorly conserved, they do resemble one another better than each one of them resembles any of the better known arthropod ilps. Secondly, for those members of this group where this could be determined, they are all made by the gonads, although this is not the only tissue expressing these peptides and expression by the gonads is variable. Thirdly, all species for which a gonadulin could be identified also have an ortholog of Drosophila LGR3, even though not all arthropod species have such a receptor. Unfortunately, due to the large structural variability of the gonadulins, it was not always possible to demonstrate the existence of a gonadulin gene in each species that has an ortholog of LGR3. Nevertheless, no gonadulins were found in species lacking such a receptor. Finally, peptides that were identified as putative gonadulins, but that are present in species as distantly related as spiders on one hand and stick insects and cockroaches on the other, are produced by genes for which orthology is independently confirmed by synteny with aILGF genes.
The physiological significance of the presence of both aILGF and BIGFLP in Bombyx is an intriguing question, perhaps even more so as there seems to be only a single insulin receptor in this species that might induce an intracellular response. As shown by the data of an impressive set of transcriptome SRAs from this species, expression of the two differs during development (Fig.   6). Although the physiological meaning of this remains to be investigated, it is tempting to speculate that it is related to oogenesis and metamorphosis taking place simultaneously. In most insect species juvenile hormone and ecdysone play preponderant roles in the regulation of both processes, but using the same hormones for two different processes at the same time might be counterproductive and may have led to the evolution of BIGFLP.
It has been reported previously that, unlike Drosophila, most insect species have more than one insulin RTK (Kremer, Korb & Bornberg-Bauer, 2016). This is confirmed here and in the American cockroach there are actually four insulin RTKs. The presence of three insulin RTKs in termites has led to the suggestion that they might be involved in caste determination (Kremer, Korb & Bornberg-Bauer, 2016). Although this may be so, it is somewhat surprising in this context that the American cockroach, a close relative, has even four such receptors, yet has no caste system. In Lepidoptera and at least two tenebrionid beetles one of the putative insulin receptors lacks the tyrosine kinase domain, like the vertebrate ILGF2 receptor and the Drosophila decoy insulin receptor (Okamoto et al., 2013). The Drosophila receptor also lacks the transmembrane domaine and is released into the hemolymph. These receptors may well have similars role in regulating the hemolymph ilp concentrations.
In some insects the expression of insulin RTKs have been studied sometimes together with the effects of inactivating one or both insulin RTKs by RNAi (Wheeler, Buck & Evans, 2006;Sang et al., 2016;Okada et al., 2019). Interestingly, in honeybee larvae that are changed from worker food to royal yelly, that is rich in proteins, it is the RTK A that is upregulated (Wheeler, Buck & Evans, 2006), whereas in two beetle species the effects of RTK inactivation by RNAi is much stronger for type A than for type B (Sang et al., 2016;Okada et al., 2019). In the honeybee and the beetle Gnatocerus these receptors have been indirectly linked to an aILGF peptide. In Diptera and in the head louse there is only a single RTK, which in both case are also type A and this is also the type that is most abundantly expressed in insects. This suggests that the A type insulin RTK is the more important than the B type RTK.
The structural difference between neuroendocrine ilps and aILGFs, as illustrated by the large Cterminal extensions absent in the former but present in the latter, suggest the use of different receptors. Indeed, the similarities between insulin and vertebrate ILGF on one hand and insect neuroendocrine ilps and aILGF on the other, are striking. Thus whereas the liver is the major tissue expressing ILGF, in insects aILGF is expressed by the fat body, the tissue that performs the functions of both the vertebrate liver and adipose tissue. Furthermore, both ILGF and aILGF are C-terminally extended insulin-like molecules and in both cases the primary transcripts produced are alternatively spliced (Roberts et al., 1987). Finally, insulin and ILGF act on two RTKs and many insects also seem to have two insulin RTKs, although structural similarity is insufficient proof that these are insulin receptors, as illustrated by the insulin receptor-related receptor that functions as an alkali receptor (Deyev et al., 2011). Nevertheless, this suggests that the Cterminal extension of aILGF may allow for the differential activation of two different types of insulin RTKs and the presence of only one insulin receptor in Diptera may well explain why Drosophila ilp 6 has lost this C-terminal extension. Furthermore, it is perhaps no coincidence that the ilps that did not separately clearly into either an aILGF or an insulin branch on the sequence similarity tree (Fig. 2) are from Drosophila, Glossina and Pediculus. These are species that have only one insulin RTK and thus there would be no physiological need to maintain different molecular structures for peptides in order to preferentially activate one or the other of the two insulin RTKs.
The close genomic association of the gonadulin, relaxin and ILGF genes in some arthropod genomes suggest that they originated from an ancient gene triplication. Although the primary amino acid sequences of the different gonadulins is limited, as discussed above, there is reason to believe they are also orthologs and the same holds for the various aILGFs. The Drosophila gonadulin (dilp 8) receptor has been shown to be Drosophila LGR3, while LGR4 must be an arthropod relaxin receptor. As demonstrated by the sequence similarity of their transmembrane regions, these LGRs and LGR5 are evolutionary cousins. Insect aILGFs on the other hand are known to stimulate RTKs. When neuropeptide genes undergo a local duplication the ligands they encode either keep using the same receptor or use paralogs that have their origin in a gene duplication of the receptor. The gene coding aILGF that uses an RTK is flanked on both sides by genes that code LGR ligands. The only reasonable explanation is that the original gene coding for an insulin-like peptide (the one that got triplicated) used both types of receptors, i.e. it had two receptors, both an LGR and an RTK.
It is clear that during evolution at least holometabolous species no longer have such a GPCR, as these species have at most only two such receptors, LGR3 and LGR4 (for gonadulin and relaxin respectively), while some species have none. Nevertheless, it is interesting to note that the hemimetabolous insects have LGR5, a receptor that is evolutionarily closely related to the receptors for gonadulin and relaxin. Thus LGR5 could be a second receptor for aILGF. If this were so, then the remarkable absence of LGR5 from holometabolous species might be related to the switch to holometaboly itself. Maggots and caterpillars undergo essentially linear growth, while in non-holometabolous species, growth is accompanied by development at the same time.
Perhaps, LGR5 stimulated by aILGF is responsible for this.
Arthropod relaxin has an primary amino acid sequence that is much better conserved than that of the typical arthropod insulins or ILGFs. As arthropod relaxin shares what seem to be the structural characteristics common to both aILGF and neuroendocrine insulins, it seems plausible that relaxin can also stimulate the insulin RTK. It is of interest in this respect that Drosophila relaxin, ilp7, binds to the decoy insulin receptor as well other Drosophila ilps (Okamoto et al., 2013) and previous work suggested that it acts through the Drosophila RTK (Linneweber et al., 2014). When there is only a single receptor for a ligand, both can co-evolve and over time structures of both the ligand and its receptor may change. However, when the ligand activates two different receptors, all three components have to coevolve and one would expect that this would restrain the structures of all three elements much more than when there is only a single receptor and such restraints would be the strongest on the ligand that activates both receptors.
This might explain why arthropod relaxin is so well conserved.
It is tempting to speculate that orthologous genes have similar expression patterns and functions.
In related species this is a reasonable hypothesis and the observed expression of aILGFs in cockroaches (Castro-Arnau et al., 2019) and beetles (Okada et al., 2019) is predominantly in the fat body and this seems to be the case for Bombyx aILGF too. What is likely the Rhodnius aILGF is also expressed by the fat body (Defferrari, Orchard & Lange, 2015) as is Drosophila ilp 6 (Okamoto et al., 2009b). This suggest that within insects these peptides have the same function. It is notable however, that at least in Pardosa spiders the fat body does not express aILGF but a specific insulin , while aILGFs expression is limited to the cephalothorax. This suggests that in spiders aILGFs are expressed in the brain, is indeed observed in the spider Stegodyphus dumicola (Suppl. Spreadsheet 2). In decapods, which are more closely related to insects than chelicerates, no aILGFs were found (Veenstra, 2020). The phylogenetic tree of the various arthropod insulin RTKs also shows that the various paralogs of this receptor are not direct orthologs of one another, but must have evolved independently in each subphylum or even class.
This within arthropods the functions of the various insulin-like peptides may be significantly different. It suggests that the apparent resemblance between insect neuroendocrine insulins and aILGF on one hand and insulin and ILGF on the other could reflect a case of convergent evolution rather than one of orthology.
In the beetle Gnatocerus cornutus it has been shown that aILGF specifically stimulates the growth of a sexual ornament (Okada et al., 2019), while higher levels of aILGF are observed in honeybee larvae that are destined to become queens and thus develop functional ovaries (Wheeler, Buck & Evans, 2006). In Gnatocerus aILGF release depends on nutrition status and in honeybees protein rich royal yelly is associated with in increase of aILGF. Although we don't know as much detail for Blattella aILGF, its expression is strongly inhibited during starvation  (Castro-Arnau et al., 2019). This suggests that in insects aILGF is released by the fat body in response to nutritious food.
The physiological function of gonadulin is less clear. Insulin and related peptides typically stimulate growth and reproduction, so its presence in the ovaries and testes suggests a function in reproduction. Its presence in unfecundated eggs of Blattella suggests that within the ovary it are the oocytes themselves that express gonadulin, likely the follicle cells that in Drosophila have been shown to express Drosophila ilp 8 (Liao & Nässel, 2020). In the crab Portunus trituberculatus gonadulin expression is on occasion very high in the gonads (Veenstra, 2020), but the very variable degree of expression makes it difficult to see this hormone as merely stimulating reproduction. The expression of gonadulin in hematopoetic tissue and the anterior proliferation center of the brain in Procambarus clarkii (Veenstra, 2020), neither suggest a role limited to reproduction but hints at a more general role in promoting growth. Spider silks are proteins and thus its production requires plenty of amino acids, not unlike vitellogenesis, or the development and reparation of imaginal disks. Gonadulin secreted by these organs might therefore suggest that, not unlike insulin, it stimulates growth, but more intensely and/or more localized. Such an intensified stimulation of growth might be achieved by increasing not only the uptake of glucose as an energy substrate but also that of amino acids. Under this hypothesis, it might act as both an autocrine to stimulate uptake of metabolites and an endocrine to make these available and by doing so it might be able to stimulate growth of specific organs, such as imaginal disks and/or gonads that secrete it.

Conclusions
A local gene triplication in an early ancestor likely yielded three genes coding gonadulin, arthropod insulin-like growth factor and relaxin. Orthologs of these genes are now commonly present in arthropods and almost certainly include the Drosophila insulin-like peptides 6, 7 and 8. I gracefully acknowledge the contributions made by all those who not only produced but also made available the numerous SRAs that I analyzed here. This work would neither have been possible without the various programs employed .  551  552  553  554  555   556  557  558  559  560  561  562  563  564  565  566  567  568  569  570  571  572  573  574  575  576  577  578  579  580  581  582  583  584  585  586  587  588  589  590  591  592  Figure 1 Gonadulin sequence alignment.

Acknowledgements
Note the very large sequence variability of the gonadulin propeptides. Cysteine residues are indicated in red and sequences are from Suppl. Spreadsheet 1.

Figure 2
Insect ilp sequence similarity tree.
Note that the gonadulins and relaxins are well differentiated from the insulins and aILGFs, but that the latter are only partially separated from one another on the tree. Sequences are from Suppl. Spreadsheet 1.

Figure 3
Radial arthropod ilp sequence similarity tree.
Note that the gonadulins cluster and are well separated from the other arthropod ilps. A more detailed sequence similarity tree with species names is present in the supplementary data as  Sequence alignment of some insect aILGF propeptides; where applicable only the isoforms with the arginine-rich sequences are shown. Note that these proteins have three different regions with sequence similarity: the insulin subsequence, the arginine-rich sequence containing an additional disulfide bridge and the C-terminal parts (underlined in blue), that is also somewhat conserved in Gryllus rubens, even though that species lacks an arginine-rich subsequence. Sequences are from Suppl. Spreadsheet 1; cysteine residues are indicated in red and the predicted disulfide bridges by lines. Figure 5 Intron-exon structures of aILGF genes.
(A) Bombyx mori, (B) Tribolium castaneum, (C) Apis mellifera, (D) Blattella germanica. In the top part for each species the gene structure with the exon represented as small colored boxes and the introns as a black line. The orange brown color in the first exon corresponds to the coding sequence of the signal peptide and the blue corresponds to the insulin sequence, while the red color corresponds to the coding sequence of the arginine-rich region that is alternatively spliced in the two isoforms produced from these genes. The yellow exons contains coding sequence for the GTVX 1 PX 2 (F/Y) consensus sequence. The amino acid sequence coded by this alternatively spliced DNA sequence is indicated. The numbers 1 and 2 show the coding sequences of the two mRNA species produced from these gene using the same colors as for the gene structures. Note that the structures of these genes are very similar, with the major differences being the size of the introns, some of which are very large, as indicated by interruption sign in the gene structures, and the loss of an intron in Apis. The only other notable difference is that in the honey bee the second transcript is produced in a different fashion and only consists of one coding exon. The alternative splice site in this species have been indicated together with how this results in either splicing or the inclusion of a stop codon in the mRNA.   it is on a transcript (GCGU01007956.1) and assumes that its structure is identical to those of its closest relatives.
Note how the relaxin propeptides are much better conserved than the either the gonadulins or the aILGFs. Also not that the exception is the single decapod relaxin that has also an additional cysteine residue. This is not a sequencing or other technical error, as the same residue is also found in several others decapods from different decapod orders (Fig. S7).
Cysteine residues are indicated in red; sequences are provided in Suppl. Spreadsheet 1 and from Veenstra (2020).

Figure 10
Synteny of arthropod ilp genes.
Schematic representation of relaxin, gonadulin and aILGF genes in three insect and two spider species. Genbank accession numbers are indicated below the species name.
Arrowheads indicate the direction of transcription. The numbers in Kbp indicate the distances between the coding regions of neighboring genes.

Figure 11
Phylogenetic tree of arthropod insulin RTKs.
Phylogenetic tree of various arthropod RTKs. Note that the decpods, chelicerates and insect RTKs evolved independently. Only branch probabilities of more than 0.900 have been indicated. Sequences are from (Veenstra, 2020) and others provided in Supp. Spreadsheet 1.

Figure 12
Phylogenetic tree of putative arthropod insulin GPCRs.
Phylogenetic tree based exclusively on the transmembrane regions of various decapod LGRs that might function as receptors for insulin-related peptides. Only branch probabilities of more than 0.900 have been indicated. Sequences are from (Veenstra, 2020) and others provided in Supp. Spreadsheet 1. Numbers are the number of gonadulin half reads per million in one or more transcriptome SRAs. This is a selection of the data from Suppl. Spreadsheet 2.