CLE peptide-encoding gene families in Medicago truncatula and Lotus japonicus, compared with those of soybean, common bean and Arabidopsis

CLE peptide hormones are critical regulators of many cell proliferation and differentiation mechanisms in plants. These 12-13 amino acid glycosylated peptides play vital roles in a diverse range of plant tissues, including the shoot, root and vasculature. CLE peptides are also involved in controlling legume nodulation. Here, the entire family of CLE peptide-encoding genes was identified in Medicago truncatula (52) and Lotus japonicus (53), including pseudogenes and non-functional sequences. An array of bioinformatic techniques was used to compare and contrast these complete CLE peptide-encoding gene families with those of fellow legumes, Glycine max and Phaseolus vulgaris, in addition to the model plant Arabidopsis thaliana. This approach provided insight into the evolution of CLE peptide families and enabled us to establish putative M. truncatula and L. japonicus orthologues. These include orthologues of nodulation-suppressing CLE peptides and of AtCLE40, which controls the stem cell population of the root apical meristem. A transcriptional meta-analysis was also conducted to help elucidate the function of the CLE peptide family members. Collectively, our analyses considerably increased the number of annotated CLE peptides in the model legume species, M. truncatula and L. japonicus, and substantially enhanced the knowledgebase of this critical class of peptide hormones.

CLAVATA3/Endosperm Surrounding Region-related (CLE) peptides belong to a class of cysteine poor, post-translationally modified peptides that are derived from a prepropeptide [1][2][3] . The mature CLE peptide is 12 to 13 amino acids long and those that have been structurally confirmed all possess a tri-arabinose moiety attached to a highly conserved hydroxylated central proline residue [4][5][6] . They act as hormone-like signals 7 and are perceived by class XI leucine-rich repeat receptor kinases 8 . They are also unique to plants, with the exception of CLE peptide-encoding genes of the cyst-knot nematode 9 , which were likely acquired from plants via horizontal gene transfer 6,10 . CLE peptides have roles in regulating stem cell populations of various plant organs 11,12 . Prominent examples include CLAVATA3 (CLV3) in the shoot apical meristem [13][14][15] , AtCLE40 in the root apical meristem [16][17][18] , a number of legume-specific CLE peptides that suppress nodule organogenesis 2,19 , and a sub-class of highly conserved CLE peptides that regulate vascular differentiation [20][21][22][23][24] . Those of the cyst-knot nematode are thought to have a role in establishing the pathogen's feeding site 25 .
Medicago truncatula and Lotus japonicus are model legume species that offer a number of molecular advantages to understanding aspects of legume development, as well as microbial and fungal symbioses 26 . However, only a few CLE peptide-encoding genes have been functionally characterised in these species to date. These include LjCLE-RS1, LjCLE-RS2, LjCLE-RS3, MtCLE12 and MtCLE13, which are involved in nodulation regulation 2,5,[27][28][29] . Others include LjCLE7, LjCLE15, LjCLE19, LjCLE20, LjCLE24 and LjCLE29, which are up-regulated in response to phosphate and/or mycorrhizae 30,31 ; and MtCLV3 32 and LjCLV3 27,33 , the orthologues of the most thoroughly characterised CLE peptide-encoding gene, AtCLV3 15 . In M. truncatula, the likely orthologues of the Tracheary Element Differentiation Inhibitory Factor (TDIF) encoding genes, AtCLE41, AtCLE42 and AtCLE44 23,24 , have also been identified 3 .
Recent genomic and bioinformatic advances allow for the identification of entire peptide families. This is extremely helpful for comparative genomic studies and for advancing the functional characterisation of individual peptide members. Here, we used a genome-wide approach to identify the complete CLE peptide-encoding gene families of M. truncatula and L. japonicus. Comparative bioinformatic approaches were used to assist in identifying orthologous genes between these and other plant species, as well as in the categorisation and functional characterisation of these critical peptide-encoding genes.

Identification of CLE peptide-encoding genes in L. japonicus and M. truncatula. A thorough genome-wide search of the M. truncatula and L. japonicus genomes was conducted to identify the complete CLE peptide-encoding gene families of these species. Multiple BLAST searches identified 52 and 53 CLE peptide-encoding genes in each of the two species respectively (Figs 1-3, Table 1). Initial BLAST and TBLASTN queries used sequences of known soybean and A. thaliana CLE peptide-encoding genes and prepropeptides 3 to ensure all genes of interest were captured. The resulting identified sequences were verified and false-positives removed from further analyses. Additional CLE peptide-encoding genes were identified by BLAST and TBLASTN reciprocal searches of the M. truncatula and L. japonicus genomes using the sequences identified in the initial searches. A number of the genes identified are reported here for the first time, with the nomenclature of the newly discovered genes consistent with previously identified CLE peptide-encoding genes (Figs 1-3, Table 1). A recent study published after our searches were conducted included 20 M. truncatula CLE peptide-encoding genes (Goad et al., 2016), but no nomenclature was given as species-specific analyses were not conducted. A complete listing of all CLE peptide-encoding gene family members from M. truncatula and L. japonicus is provided in Supplementary Table S1.
[Figure caption, partial: MtCLE19, which has a premature stop codon very early in the prepropeptide (see Fig. 4). MtCLE34 is a likely pseudogene without a functional CLE domain. The signal peptide approximate location and CLE domain is shown on the consensus sequence.]
Additional CLE peptide-encoding genes in both L. japonicus and M. truncatula were identified that contain multiple CLE domains, some of which are also reported here for the first time. These multi-CLE peptide domain encoding genes include LjCLE32, LjCLE33, LjCLE46 and LjCLE47 in L. japonicus; and MtCLE14, MtCLE22, MtCLE26 and MtCLE27 in M. truncatula (Fig. 3). LjCLE32 and LjCLE33 encode eight and nine putative CLE peptides respectively; MtCLE22 encodes four putative CLE peptides; MtCLE26 and MtCLE27 encode three putative CLE peptides; whereas all others contain seven putative CLE peptide domains (Fig. 3a; Supplementary Table S1). Interestingly, these multi-CLE domain containing genes contain repeating motifs of 24 to 35 amino acids, with each motif having a consistent length within their respective prepropeptide, with the sole exception of LjCLE33 which has varying motif lengths (Supplementary Table S2).
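The repeat structure of the multi-CLE domain prepropeptides can be illustrated with a minimal Python sketch. It is not the procedure used in this study. The regular expression is an assumption that loosely encodes the conserved CLE domain positions (arginine or histidine at position 1, proline at positions 4, 7 and 9, glycine at position 6, histidine at position 11), and the toy sequence is hypothetical. The sketch locates each CLE-like window and reports the spacing between consecutive windows, which corresponds to the repeat unit length.

```python
import re
from typing import List

# Illustrative CLE-like pattern (an assumption, not the authors' search rule):
# [R/H] at 1, P at 4, G at 6, P at 7, P at 9, H at 11 of a 12-residue window.
# The lookahead lets overlapping candidates be reported as well.
CLE_PATTERN = re.compile(r"(?=([RH]..P.GP.P.H.))")


def cle_domain_starts(prepropeptide: str) -> List[int]:
    """Return the 0-based start position of every CLE-like window."""
    return [m.start() for m in CLE_PATTERN.finditer(prepropeptide.upper())]


def repeat_lengths(prepropeptide: str) -> List[int]:
    """Return the distance between consecutive CLE-like windows."""
    starts = cle_domain_starts(prepropeptide)
    return [b - a for a, b in zip(starts, starts[1:])]


# Hypothetical toy sequence: a short leader plus three 24-residue repeat units.
unit = "AAAASSSSEEEE" + "RTVPSGPDPLHH"
toy = "MKKLLL" + unit * 3

print(cle_domain_starts(toy))
print(repeat_lengths(toy))
```

A prepropeptide with a consistent motif length yields identical spacing values, whereas a sequence such as LjCLE33, with varying motif lengths, would yield a mixed list.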
Pseudogenes were also identified in both the L. japonicus and M. truncatula genomes. These genes include mutations where the CLE domain is not translated in frame, likely resulting in a non-functional gene. This includes the pseudogene MtCLE34, which is annotated within the M. truncatula genome (Fig. 1, Table 1; […] Table 1) and LjCLE48 are also unlikely to be functional (Fig. 6). These pseudogenes, and the genes containing multiple CLE domains, were excluded from the sequence characterisation studies detailed below because they fail to align well with the more typical single-CLE domain sequences.
A BLAST search of the L. japonicus genome with the LjCLE34 nucleotide sequence (first reported by Okamoto et al. 27 ) identified two possible genes that differ by two synonymous nucleotide changes and therefore encode identical prepropeptides. These genes are located at chr3:27855838..27856107 and chr0:126894445..126894714, and both are found within a larger predicted protein. It therefore appears that these two genes arose from a transposable element and a subsequent duplication event, or that they are the result of a genome sequencing error. Interestingly, the CLE domain of LjCLE34 is not located at the C-terminus of the prepropeptide but towards the centre, similar to that of AtCLE18, which has a C-terminal CLE-Like/Root Growth Factor/GOLVEN (CLEL/RGF/GLV) domain in addition to a CLE domain 34 . LjCLE34 shares some homology at the C-terminus with AtCLE18, which includes the region of the CLEL/RGF/GLV domain (Supplementary Fig. S2).
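The check that two gene copies differ only by synonymous changes can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' workflow: the two coding sequences are hypothetical toy inputs of equal length (both reported loci span 270 bp), and the standard genetic code is assumed.

```python
from itertools import product
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code listed in TCAG order ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def compare_copies(cds_a: str, cds_b: str) -> Tuple[List[int], bool]:
    """Return the 1-based positions that differ and whether the translations match."""
    diffs = [
        i + 1
        for i, (a, b) in enumerate(zip(cds_a.upper(), cds_b.upper()))
        if a != b
    ]
    return diffs, translate(cds_a) == translate(cds_b)


# Hypothetical toy copies: two third-position changes, same peptide (M L G stop).
copy_a = "ATGCTTGGATAA"
copy_b = "ATGCTCGGGTAA"
print(compare_copies(copy_a, copy_b))
```

When the second value is true and the list of differing positions is not empty, every nucleotide change between the two copies is synonymous.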
CLE peptide-encoding genes of M. truncatula and L. japonicus are located across all chromosomes, with the greatest number located on chromosome two of M. truncatula (eleven) and chromosome three of L. japonicus (thirteen) (Table 1). There are five CLE peptide-encoding genes of L. japonicus currently located on unassigned scaffolds (Table 1). The CLE prepropeptides of both species vary in length, with the average single-CLE domain […] (Table 1). Interestingly, the genes appearing directly in tandem within the L. japonicus genome share >50% amino acid sequence similarity, while only some of the tandem gene pairs in M. truncatula exhibit more than a 50% level of similarity (Supplementary Table S3).
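The 50% threshold applied to tandem gene pairs can be illustrated with a short Python sketch. It assumes the two prepropeptides have already been aligned (gaps written as "-") and uses percent identity as a simple stand-in for the similarity measure, which is not specified here; a substitution-matrix-based similarity score would give different values. The pair names and sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns with identical residues (gap-only columns skipped)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def tandem_pairs_above(
    pairs: Dict[str, Tuple[str, str]], threshold: float = 50.0
) -> List[str]:
    """Return the names of pairs whose identity exceeds the threshold."""
    return [
        name for name, (a, b) in pairs.items() if percent_identity(a, b) > threshold
    ]


# Hypothetical pre-aligned tandem pairs.
toy_pairs = {
    "pairA": ("MKLR-TVP", "MKIRSTVP"),
    "pairB": ("MAAAA", "MCCCC"),
}
print(tandem_pairs_above(toy_pairs))
```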

Identification of orthologous CLE peptide sequences.
To identify gene orthologues of the M. truncatula and L. japonicus CLE prepropeptides, multiple sequence alignments were generated. Most orthologues were present in a 1:1 ratio between the two species (Supplementary Fig. S3). When no orthologue was evident, further BLAST searches were conducted in an attempt to identify one. In some instances, this yielded additional CLE peptide-encoding genes. Subsequent multiple sequence alignments with the CLE prepropeptides of M. truncatula, L. japonicus, soybean, common bean and A. thaliana were constructed (data not shown) and used to identify additional CLE peptide-encoding genes. All orthologous sequences identified are shown in Figs 1 and 2.
A multiple sequence alignment of the prepropeptides of M. truncatula, L. japonicus, common bean and A. thaliana was used to construct a phylogenetic tree (Supplementary Fig. S3). Similar phylogenetic trees have been constructed using only the CLE domain of the prepropeptides; however, this domain is highly conserved and only 12-14 amino acids long, and hence alignments and trees constructed using only the conserved motif can be less informative. In contrast, the tree constructed here, using the entire prepropeptide sequences, allows for the identification of conserved residues within other domains that may relate to cleavage and other important facets of post-translational modification 2 .
[…] Table S4). The CLE domain represents the functional peptide ligand, which is post-translationally cleaved and modified to 13 amino acids in AtCLV3 and LjCLE-RS1 4-6,35 . A total of 66% (L. japonicus) and 61% (M. truncatula) of the prepropeptides have an amino acid at the 13th residue, with the remainder having a stop codon at position 13 and thus being only 12 amino acids long. In both species, the amino acid most commonly found at position 13 is arginine (Figs 1 and 2, Supplementary Fig. S4).
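The position 13 summary can be reproduced in principle with a few lines of Python. This is a minimal sketch: it assumes each CLE domain has already been extracted as a string that ends at the stop codon (12 or 13 residues), and the four input domains are toy examples rather than the sequences analysed in this study.

```python
from collections import Counter
from typing import List, Optional, Tuple


def position13_summary(cle_domains: List[str]) -> Tuple[float, Optional[str]]:
    """Return the percent of domains with a 13th residue and its most common identity."""
    with_13 = [d for d in cle_domains if len(d) >= 13]
    percent = 100.0 * len(with_13) / len(cle_domains)
    most_common = Counter(d[12] for d in with_13).most_common(1)
    return percent, (most_common[0][0] if most_common else None)


# Toy inputs: three 13-residue domains and one 12-residue domain.
toy_domains = [
    "RTVPSGPDPLHHH",
    "RLVPSGPNPLHNR",
    "RRVPTGPNPLHNR",
    "HEVPSGPNPISN",
]
print(position13_summary(toy_domains))
```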
An arginine residue is found at the start of 83% of L. japonicus and 87% of M. truncatula CLE domains. Although less common, a number of CLE domains also begin with a histidine, and this is conserved between orthologues of different species. Three of the four peptides beginning with a histidine in A. thaliana are Tracheary Differentiation Inhibitory Factors (TDIF) that are involved in vascular differentiation 36 . L. japonicus and M. truncatula each have three CLE peptides beginning with a histidine (LjCLE26, LjCLE29 and LjCLE31, and MtCLE05, MtCLE06 and MtCLE37) that appear orthologous to the TDIF factors. However, they do not appear to have an orthologue of the functionally unrelated fourth CLE peptide of Arabidopsis to begin with a histidine, AtCLE46, and its putative soybean orthologue, GmCLE13 3 .
The most highly conserved CLE domain residues of M. truncatula are arginine at position one, glycine at position six and histidine at position 11, with all three present in 87% of the peptides (Fig. 1). Interestingly, the most conserved CLE domain residue of L. japonicus is histidine at position 11 (91%), with only three sequences having a serine at this position and one sequence having a glutamine (Fig. 2). Residues 1, 4, 6, 7, 9 and 11 are also highly conserved (>82%) in the CLE domain of both species (Figs 1 and 2, Supplementary Fig. S4). These residues are all considered critical for function except for the proline at position nine 37 .
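Per-position conservation values of this kind can be computed as sketched below in Python. The sketch assumes ungapped CLE domains of at least 12 residues and uses four toy domains, so the percentages it prints are illustrative only and do not correspond to the values reported for either species.

```python
from collections import Counter
from typing import Dict, List, Tuple


def position_conservation(
    cle_domains: List[str], n_positions: int = 12
) -> Dict[int, Tuple[str, float]]:
    """Map each 1-based position to its most common residue and that residue's percent."""
    result = {}
    for pos in range(n_positions):
        counts = Counter(d[pos] for d in cle_domains if len(d) > pos)
        residue, n = counts.most_common(1)[0]
        result[pos + 1] = (residue, 100.0 * n / sum(counts.values()))
    return result


# Toy inputs, not the gene families analysed in this study.
toy_domains = [
    "RTVPSGPDPLHHH",
    "RLVPSGPNPLHNR",
    "RRVPTGPNPLHNR",
    "HEVPSGPNPISN",
]
conservation = position_conservation(toy_domains)
for position in (1, 6, 11):
    print(position, conservation[position])
```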
Outside of the CLE domain there is little conservation within the L. japonicus and M. truncatula CLE prepropeptide families (Figs 1 and 2). However, the signal peptide, which is predicted to either export the entire prepropeptide or the cleaved propeptide outside of the cell 1,38 , contains a typical hydrophobic motif consisting of predominantly leucine and isoleucine (Figs 1 and 2). The size of the predicted signal peptide ranges from 19 to 43 […] (Table 1). Additionally, the truncated LjCLE5 prepropeptide has a predicted signal peptide cleavage site between residues 14 and 15 (Table 1).
Hastwell et al. 3 classified the CLE prepropeptides of soybean and common bean into seven distinct Groups (I to VII). The prepropeptides within each group show sequence conservation within and outside of the CLE domain. Based on the phylogenetic tree of the prepropeptides in L. japonicus, M. truncatula, A. thaliana and P. vulgaris, these groups remain conserved (Supplementary Fig. S3, Supplementary Table S5). This is especially evident with the Group VI CLE prepropeptides, which function in nodulation regulation, and Group III CLE prepropeptides, which show high sequence conservation with the Arabidopsis TDIF peptides, AtCLE41, AtCLE42 and AtCLE44 (Supplementary Fig. S3, Supplementary Table S5).

Identification of CLE40.
A well characterised peptide, AtCLE40, has been shown to act as the root paralogue of AtCLV3 to regulate the stem cell population of the root apical meristem [16][17][18] . Putative orthologues of AtCLE40 have been identified in M. truncatula, P. vulgaris and G. max (MtCLE39, PvCLE40, GmCLE40a and GmCLE40b 3 ). Interestingly, our BLAST searches using the L. japonicus genome failed to identify a CLE40 orthologue. However, a region on chromosome 3 (chr3:40213173..40213683) exhibits a very high level of sequence similarity to these CLE40 orthologues, in addition to having a similar genomic environment to them (Fig. 6). All previously identified CLV3 and CLE40 orthologues contain two introns. The putative L. japonicus CLE40 orthologue, identified here as LjCLE48, contains conserved predicted intron boundaries for the second intron, which correspond to the CLE40 orthologues, but there are no predicted boundary sites for the first intron. Given this critical change at the 5′ end of LjCLE48, it appears unlikely that the resulting prepropeptide would produce a functional peptide product. This may suggest that another CLE peptide has evolved to perform the function of CLE40 in L. japonicus.

Nodulation CLE peptides.
CLE genes in Group VI of soybean and common bean are known to respond to symbiotic bacteria, collectively called rhizobia, and act to control legume nodulation. The rhizobia-induced nodulation-suppressing CLE peptide encoding genes of L. japonicus and M. truncatula, known as LjCLE-RS1, LjCLE-RS2, LjCLE-RS3, MtCLE12 and MtCLE13 [27][28][29]39,40 , cluster with these Group VI members of soybean and common bean 3 . Interestingly, two additional CLE prepropeptides of unknown function, called MtCLE35 and LjCLE5, also group closely (Supplementary Fig. S3). Okamoto et al. 27 noted that LjCLE5 did not have a predicted signal peptide and that no expression could be detected. However, upstream of the previously predicted LjCLE5 start codon is another possible methionine (Fig. 5). The sequence following this alternative start codon corresponds closely with that of MtCLE12 (71.1% similarity), but the translation would result in a truncated protein prior to the CLE domain. Signal peptide prediction using SignalP (www.cbs.dtu.dk/services/SignalP/) suggests that there is a possible cleavage site at position 30 of the longer (but non-functional) LjCLE5. Interestingly, MtCLE35 contains the consensus sequence TLQAR, which is consistent with the nodulation-suppressing CLE peptides, whereas LjCLE5 does not. The functional analysis of MtCLE35 would be of great interest to the nodulation field.
In addition to having rhizobia-induced CLE peptides, soybean has an additional nitrate-induced CLE peptide, GmNIC1a, which acts locally to suppress nodulation 39 . To date, no orthologue of GmNIC1a has been reported in L. japonicus or M. truncatula. Here, we used GmNIC1a and a BLAST search of the L. japonicus and M. truncatula genomes to reveal likely orthologous candidates (Supplementary Fig. S3). In soybean and common bean, NIC1 and RIC1 are located tandemly within the genome 39,40 . In L. japonicus, the putative NIC1 and RIC1 orthologues (LjCLE40 and LjCLE-RS2, respectively) appear in tandem with LjCLE-RS3 and are approximately 24 kb apart on chromosome 3. Interestingly, LjCLE40 was also recently found to be induced by rhizobia inoculation 29 . In M. truncatula, the predicted orthologue of NIC1 is MtCLE34, which is located tandemly on chromosome 2 with MtCLE35. However, a C > T mutation at base 148 of MtCLE34 results in a premature stop codon and thus the translated product of this gene is likely non-functional. Further investigations are required to determine if the product is indeed truncated.
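How a single C > T substitution can introduce a premature stop codon is shown in the following Python sketch. Only the substitution type and the base position (148) are taken from the text. The coding sequence is a hypothetical construct built so that base 148 is the first base of a CAA codon; it is not the MtCLE34 sequence.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def apply_substitution(cds: str, position: int, ref: str, alt: str) -> str:
    """Replace the base at a 1-based position after checking the reference base."""
    cds = cds.upper()
    if cds[position - 1] != ref:
        raise ValueError("reference base does not match the sequence")
    return cds[:position - 1] + alt + cds[position:]


def first_stop_codon(cds: str) -> Optional[int]:
    """Return the 1-based index of the first in-frame stop codon, if any."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


# Hypothetical CDS: ATG, 48 alanine codons, CAA at bases 148-150,
# 10 further alanine codons, then the native stop codon.
toy_cds = "ATG" + "GCT" * 48 + "CAA" + "GCT" * 10 + "TAA"
mutant = apply_substitution(toy_cds, 148, "C", "T")

print(first_stop_codon(toy_cds), first_stop_codon(mutant))
```

In this toy case the substitution converts codon 50 from CAA (glutamine) to TAA, so translation would end 11 codons before the native stop.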
The legume nodulation CLE peptides are most similar to AtCLE1-7 of A. thaliana; however, no direct orthologues have been identified, as A. thaliana lacks the ability to form a symbiotic relationship with rhizobia or arbuscular mycorrhizae 2 . A targeted phylogenetic analysis was utilised here to investigate whether there are specific A. thaliana CLE peptides within AtCLE1-7 that are more closely linked with the nodulation CLE peptides of M. truncatula, L. japonicus, P. vulgaris and G. max (Fig. 7). As expected, the rhizobia-induced CLE peptides form a distinct branch from the nitrate-induced CLE peptides of legumes, and not surprisingly, the A. thaliana CLE peptides AtCLE1-7 group closer to these nitrate-induced sequences. This finding further supports the distinction of Group VI made by Hastwell et al. 3 .
Expression of CLE peptide-encoding genes of M. truncatula and L. japonicus. It would be of little biological relevance to apply the peptides identified here to plants without first understanding their structural modifications and location of synthesis. We therefore used an in silico approach to further assist in the functional characterisation of these genes. Publicly available transcriptome databases of M. truncatula and L. japonicus were used to collect expression data of the CLE peptide-encoding genes. A meta-analysis was performed to determine if putative orthologues identified by sequence characterisation and phylogenetic analyses exhibited similar expression patterns (Tables 2 and 3). Some similarity was seen between the putative orthologues, but the number of currently annotated CLE peptide-encoding genes limited a more detailed analysis.
A number of putative orthologues identified in the phylogenetic tree (Supplementary Fig. S3) showed similar expression trends across tissues, such as PvCLE25 3 and MtCLE08, which were both expressed in the root, nodules and stem (Table 2). LjCLE15 is expressed most highly in the stem, with lower expression levels found across all other tissue types. The genes that group closely with it, MtCLE18 and PvCLE24, are expressed in both the stem and root, whereas AtCLE12, which also groups closely, is only found in the root (Tables 2 and 3). MtCLE17 shares a similar expression pattern with PvCLE23, GmCLE23a and GmCLE23b 3 , being expressed across all tissue types except seeds; however, MtCLE17 has notably higher expression in flowers than its putative orthologues, which show little expression in flower tissue (Table 2). MtCLE12 and MtCLE13 are currently the only functionally characterised M. truncatula CLE peptide-encoding genes, and the transcriptomic data for both genes are consistent with the literature 28 , being expressed in the nodules at different stages of development.
In contrast, some CLE peptide-encoding gene orthologues did not exhibit similar expression patterns within the transcriptomes according to the tissues and treatments available. PvTDIF1, GmTDIF1a and GmTDIF1b show high levels of expression across the different tissues 3 , with high root expression being of particular importance, as it is the only TDIF peptide-encoding gene to exhibit expression in the root. Their putative orthologues, AtCLE41 and AtCLE44, are also expressed in the root, in addition to other tissue types tested 3 , whereas the M. truncatula orthologue, MtCLE06, shows no expression in the seeds and is only lowly expressed in the root. PvCLE29 was noted by Hastwell et al. 3 to have very high expression only in the flower. Its putative orthologue, LjCLE19, has previously been shown to respond in the root to phosphate treatment 30 and, more recently, to mycorrhizal colonisation 31 , which is not consistent with the expression of PvCLE29 3 .

Discussion
The importance of peptides in plant development is becoming increasingly evident with an extensive number of peptides and peptide families being discovered 1 . CLE peptides are no exception, with confirmed roles in meristematic tissue maintenance, and abiotic and biotic responses; however, the precise function of most is yet to be elucidated. To assist in the discovery of novel CLE peptide functions, the entire CLE peptide family of two model legumes, M. truncatula and L. japonicus, was identified here. Our analyses increased the number of annotated CLE peptides from 24 to 52 in M. truncatula and from 44 to 53 in L. japonicus. These were subjected to a range of comparative bioinformatics analyses to create a resource that can be utilised for further reverse-genetics-based functional characterisation. Additionally, six multi CLE domain-encoding genes and a number of pseudogenes were identified across the two species.
The phylogenetic analysis conducted using entire families of CLE prepropeptides of M. truncatula, L. japonicus, A. thaliana and P. vulgaris shows strong groupings between those having a similar CLE domain and a known or predicted function. The gene clusters identified here are generally conserved with those identified by Hastwell et al. 3 , which were divided into seven groups (Groups I-VII).
M. truncatula and L. japonicus have similarly sized genomes (500 Mbp) and shared a common ancestor ~37-38 MYA, which is more recent than their shared ancestry with P. vulgaris (~45-59 MYA) 41 . The number of CLE peptide-encoding genes present (52 and 53, respectively) is consistent with the number in the P. vulgaris genome, 46, and is roughly half that of G. max, which has 84 3 due to a more recent (~13 MYA) whole genome duplication event 42 .
The number of CLE peptide-encoding genes in the legumes is higher than that of A. thaliana, which has 32. This is predominately due to the absence in A. thaliana of CLE peptide-encoding genes involved in symbioses with rhizobia (Group VI) or mycorrhizae 3,31,43 . The symbioses formed by legumes enable them to acquire nutrients that would otherwise be unavailable 44,45 . Nodulation control pathways are well characterised in M. truncatula and L. japonicus, beginning with the production of a CLE peptide 2,19,46 . However, a separate nitrate-regulated nodulation pathway identified in G. max has not yet been established in these two species. Here, a putative orthologue of GmNIC1 and PvNIC1, which responds to the level of nitrate in the rhizosphere to inhibit nodulation 2,39,40 , has been identified in M. truncatula. However, MtCLE34 is likely to be non-functional as a result of a truncation before the CLE domain. The putative orthologue in L. japonicus, LjCLE5, which has not yet been detected in gene expression studies, is likely to be non-functional as a result of a naturally occurring insertion/deletion mutation. Further analysis is also needed to determine if MtCLE35 has a functional role in nodulation and if another gene in L. japonicus has gained the ability to regulate nodulation in response to nitrogen. Indeed, the latter is hinted at by the ability of LjCLE-RS1 to be induced by both rhizobia and nitrate to control nodule numbers 2,27 .
Although A. thaliana does not enter into a symbiosis with either rhizobia or mycorrhizae, its genome contains orthologues to known symbiosis genes, such as AtPOLLUX 47 . However, our work indicates that no CLE peptide-encoding genes have yet been identified that show homology or synteny to the rhizobia-induced CLE peptides. It would be of interest to determine if such CLE peptide encoding genes previously existed, or exist but have been overlooked in A. thaliana due to being highly divergent from the symbiosis CLE peptides in legumes and other species.
Recent advances in genome sequencing, bioinformatics resources and the identification of entire CLE peptide families of soybean, common bean and Arabidopsis, have been utilised to capture the entire CLE peptide-encoding gene families of two important model legume species, M. truncatula and L. japonicus. Further characterisation of these CLE peptide-encoding genes revealed orthologues amongst the species, many of which appear functional, with some likely to be pseudogenes. The identification and genetic characterisation of these genes will benefit future studies aimed at functionally characterising these integral molecular components of plant meristem formation and maintenance.

Methods
Gene Identification. Candidate CLE peptide-encoding genes were identified in L. japonicus and M. truncatula using TBLASTN searches with all known CLE prepropeptides of G. max 3 , P. vulgaris 3 and A. thaliana 48 . The M. truncatula Mt4.0v1 genome was searched in Phytozome (https://phytozome.jgi.doe.gov/) 49,50 and the L. japonicus v3.0 genome was searched in Lotus Base (https://lotus.au.dk/). Initial searches were conducted with E-value = 10. The results were manually validated for the presence of a CLE peptide-encoding gene in an open reading frame. Orthologues were also identified using TBLASTN of newly identified CLE prepropeptide sequences where clear orthologues were not identified between M. truncatula and L. japonicus, using E-value = 1.
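The two E-value cut-offs can be illustrated with a short Python sketch that filters BLAST hits. It assumes the hits are available in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score), as would be produced by a local BLAST+ run. The searches described here were run through the Phytozome and Lotus Base web interfaces, so a local tabular file is an assumption, and the hit lines are hypothetical.

```python
from typing import Iterable, List, Tuple


def filter_hits(
    lines: Iterable[str], max_evalue: float
) -> List[Tuple[str, str, float]]:
    """Return (query, subject, evalue) for hits at or below the E-value cut-off."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            hits.append((query, subject, evalue))
    return hits


# Hypothetical tabular hits (identifiers and values are illustrative only).
toy_hits = [
    "GmCLE01\tchr2\t85.0\t40\t6\t0\t50\t89\t1200\t1319\t2e-10\t55.1",
    "GmCLE01\tchr5\t40.0\t30\t18\t0\t10\t39\t300\t389\t4.5\t20.3",
    "AtCLE40\tchr3\t60.0\t35\t14\t0\t45\t79\t900\t1004\t0.8\t28.0",
]

permissive = filter_hits(toy_hits, max_evalue=10)  # initial search threshold
stricter = filter_hits(toy_hits, max_evalue=1)     # orthologue search threshold
print(len(permissive), len(stricter))
```

A permissive threshold suits short, divergent sequences such as CLE prepropeptides, which is why the retained hits still require manual validation for a CLE domain in an open reading frame.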
Hidden Markov Models (HMMs) were generated for M. truncatula and L. japonicus CLEs individually, using all full-length prepropeptide sequences of each species as input into HMMER3 (www.hmmer.org). Next, based on the generated HMMs, jackHMMER (www.hmmer.org) was applied to iteratively search for CLE sequences in M. truncatula and L. japonicus protein databases using a bit score threshold of 50.

Phylogenetic analysis.
Multiple sequence alignments were constructed as outlined in Hastwell et al. 3 . Manual adjustments were made to some predicted sequences, particularly with regard to their start codon, based on similarity to duplicate genes, clustering genes, and/or likely orthologous genes. Multiple sequence alignments constructed without truncated or likely non-functional CLE prepropeptides were used to generate phylogenetic trees. The trees were constructed using methods described in Hastwell et al. 3 using 1,000 bootstrap replications in all cases, except for the tree constructed using the entire families of L. japonicus, M. truncatula, A. thaliana and P. vulgaris CLE peptides, which used 100 bootstrap replications. Where orthologues were not apparent, the genomes of L. japonicus and M. truncatula were re-searched in an attempt to identify a possible orthologue.

Sequence Characterisation.
The presence of a signal peptide encoding domain and putative signal peptide cleavage site of the CLE prepropeptides was identified using SignalP (http://www.cbs.dtu.dk/services/SignalP/) 51 . If no signal peptide was detected, the sequence was manually examined for an up- or downstream methionine, which could be the likely start codon. The modified sequence was re-entered into SignalP and a signal peptide was detected in most instances. Possible intron boundary sites were identified using the NetPlantGene Server (http://www.cbs.dtu.dk/services/NetPGene/) 52,53 and the nucleotide splice sites and resulting prepropeptides were compared with orthologous sequences. Sequence logo graphs of the CLE domain were generated using multiple sequence alignments in Geneious Pro v10.0.2 53 .
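The manual search for an alternative start codon can be partly sketched in Python. The sketch below covers only the downstream case: it lists every internal methionine of a predicted prepropeptide together with the shortened sequence that would start there, so that each candidate can be resubmitted to SignalP. Upstream candidates would require the flanking genomic sequence and are not handled. The function name and the input sequence are hypothetical, and no SignalP call is made.

```python
from typing import List, Tuple


def alternative_starts(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based position, truncated sequence) for each internal methionine."""
    protein = protein.upper()
    return [
        (i + 1, protein[i:]) for i in range(1, len(protein)) if protein[i] == "M"
    ]


# Hypothetical prepropeptide with one internal methionine.
toy_protein = "MASSTMKLLLLAVR"
for position, candidate in alternative_starts(toy_protein):
    print(position, candidate)
```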
Genomic environments were established using five up- and downstream annotated genes in Phytozome and Lotus Base (https://phytozome.jgi.doe.gov/; https://lotus.au.dk/) 49,50 . Orthologues of individual genes within the genomic environment lacking functional family annotations were identified using BLAST within and between the two databases.
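Selecting the five annotated genes on either side of a gene of interest can be sketched in Python as follows. The sketch assumes the annotation of one chromosome is available as a list of (gene name, start coordinate) pairs; the gene names and coordinates are hypothetical, and the genomic environments in this study were assembled from the Phytozome and Lotus Base genome browsers rather than with code.

```python
from typing import List, Tuple


def flanking_genes(
    genes: List[Tuple[str, int]], target: str, n: int = 5
) -> Tuple[List[str], List[str]]:
    """Return up to n gene names upstream and downstream of the target gene."""
    ordered = sorted(genes, key=lambda gene: gene[1])  # order by start coordinate
    names = [name for name, _ in ordered]
    i = names.index(target)
    return names[max(0, i - n):i], names[i + 1:i + 1 + n]


# Hypothetical chromosome annotation: g1..g12 spaced 1 kb apart.
toy_genes = [(f"g{k}", 1000 * k) for k in range(1, 13)]

upstream, downstream = flanking_genes(toy_genes, "g7")
print(upstream, downstream)
```

Genes close to the end of a chromosome or scaffold return fewer than five neighbours on that side, which is relevant for genes on short unassigned scaffolds.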