Evolutionary relationships and expression analysis of EUL domain proteins in rice (Oryza sativa)

Lectins, defined as ‘Proteins that can recognize and bind specific carbohydrate structures’, are widespread among all kingdoms of life and play an important role in various biological processes in the cell. Most plant lectins are involved in stress signaling and/or defense. The family of Euonymus-related lectins (EULs) represents a group of stress-related lectins composed of one or two EUL domains. The latter protein domain is unique in that it is ubiquitous in land plants, suggesting an important role for these proteins. Despite the availability of multiple completely sequenced rice genomes, little is known on the occurrence of lectins in rice. We identified 329 putative lectin genes in the genome of Oryza sativa subsp. japonica belonging to nine out of 12 plant lectin families. In this paper, an in-depth molecular characterization of the EUL family of rice was performed. In addition, analyses of the promoter sequences and investigation of the transcript levels for these EUL genes enabled retrieval of important information related to the function and stress responsiveness of these lectins. Finally, a comparative analysis between rice cultivars and several monocot and dicot species revealed a high degree of sequence conservation within the EUL domain as well as in the domain organization of these lectins. The presence of EULs throughout the plant kingdom and the high degree of sequence conservation in the EUL domain suggest that these proteins serve an important function in the plant cell. Analysis of the promoter region of the rice EUL genes revealed a diversity of stress responsive elements. Furthermore analysis of the expression profiles of the EUL genes confirmed that they are differentially regulated in response to several types of stress. These data suggest a potential role for the EULs in plant stress signaling and defense.


Background
Proteins of non-immune origin with at least one noncatalytic domain that can recognize and reversibly bind to specific carbohydrate structures are referred to as 'lectins'. The occurrence of carbohydrate-binding proteins in all kingdoms of life illustrates their importance. Indeed, the specific interaction between lectins and their corresponding carbohydrate partner(s), either occurring as a free ligand or as part of a glycoconjugate, mediates a multitude of biological processes. These interactions can relay cellular signaling and are of utmost importance for defense reactions, stress signaling, growth and development.
In plants, lectins are divided into 12 families of structurally and evolutionary related proteins based on the presence of a conserved carbohydraterecognition domain . Like animals, plants can experience several forms of stress due to e.g. environmental conditions or biological agents. Because of their sessile lifestyle, plants cannot move away from these stresses, and therefore have developed a sophisticated system to recognize the different stressors and initiate a specific response to this stress. It was shown that carbohydrate-binding proteins play a pivotal role in plant defense (review De Schutter and Van Damme 2015).
In addition to the structural classification, plant lectins can also be subdivided in 2 classes depending on their expression pattern. The first class groups all lectins that are constitutively expressed. These lectins are usually present at high concentrations in specific cells and organs (e.g. seeds and specialized vegetative tissues), suggesting a dual role as a storage protein involved in plant defense. The second class groups all lectins which are present at a low basal level but when plants are exposed to biotic or abiotic stresses, the expression of these lectins is significantly upregulated. At present, the expression of at least 6 different lectin domains from plants has been shown to be stress regulated. Most of these stress related lectins locate to the cytoplasm and the nucleus of plant cells (Lannoo and Van Damme 2010).
The EUL family, grouping all proteins that show homology to the Euonymus europeaus agglutinin (EEA) (Petryniak et al. 1977;Fouquaert et al. 2008), belongs to the group of stress related lectins. In contrast to many other lectin families that occur only in some plant families, the EUL family represents a group of nucleocytoplasmic proteins that are found throughout the plant kingdom, suggesting they fulfill an essential role in plants (Fouquaert et al. 2009). The genome of Arabidopsis thaliana harbours only one EUL gene, referred to as ArathEULS3. The expression of this gene is upregulated after exposure to some plant hormones, drought and salt stress as well as Pseudomonas syringae infection (Li et al. 2014;Van Hove et al. 2014). Furthermore, it was shown that changes in expression levels of the lectin are accompanied with altered levels of stress resistance. Overexpression of ArathEULS3 enhanced drought resistance (Li et al. 2014) and plants showed less disease symptoms after P. syringae infection compared to wild type plants or plants with lower transcript levels of the lectin gene (Van Hove et al. 2015).
Oryza sativa L., or Asian cultivated rice, is one of the most important food crops and staple foods. More than half of the world population is dependent on rice, it has shaped their diet, culture and economics. Rice breeding and cultivation has given rise to the existence of several subspecies. The sticky, short-grained japonica rice is grown in dry fields at higher altitudes in temperate environments, whereas the non-sticky, long-grained indica rice grows mostly submerged in lowlands in tropical and subtropical environments. Besides these morphological and agronomical differences, both rice varieties show distinct physiological and biochemical features. Amongst other, these features translate in differences in stress resistance Hu et al. 2014;Liu et al. 2010). Although the phenotypic differences between the two subspecies are well studied, the molecular mechanisms behind these traits are only poorly characterized. Since plant lectins play an important role in plant defense and stress signaling, we focus on the identification of lectins in rice.
In this paper we identified the putative lectin genes encoded in the japonica rice genome. An in-depth molecular characterization and analysis of the transcription profiles of the EUL family was performed under abiotic stress conditions. The strong conservation of the EUL lectin domain and the stress regulated transcription profiles suggest that this family of lectins fulfills an important role in plant stress signaling.

Results
Distribution and organization of lectin sequences in the O. sativa genome Lectin genes in the genome of O. sativa subsp. japonica were identified through extensive BLAST searches. Using the protein sequences for each of the representative members for the different plant lectin families, pFam identifiers and conserved domains, a total of 329 putative lectin sequences were identified in the japonica genome (Table 1). All these lectin sequences were further classified into nine of the 12 plant lectin families (Table 1). No homologs representing lectins belonging to the Agaricus bisporus agglutinin, Amaranthin and Cyanovirin families were found. A large variation exists in the abundance of lectin genes for the different lectin families. The Galanthus nivalis agglutinin (GNA) family represents the largest lectin family with 134 genes (40.7%) whereas the Robinia pseudoacacia chitinaserelated agglutinin (CRA) family is the smallest with only 2 genes (0.6%). Lectin sequences of the GNA and legume lectin families account for 72.3% of all lectin sequences in rice. A comparative analysis between the subspecies japonica and indica revealed only minor differences in the number of lectin sequences for most lectin families, but considerably more lectin homologs were retrieved from the japonica genome for the legume family and the GNA family (Table 1). This study will focus on the lectin sequences retrieved from the genome sequence O. sativa subsp. japonica.
The EUL family in the O. sativa genome In contrast to the Arabidopsis genome which contains a single gene belonging to the EUL family, five EUL genes were identified in the genomes of rice (Table 1), further referred to OsEULS2, OsEULS3, OsEULD1A, OsEULD1B and OsEULD2 according to the nomenclature suggested by Fouquaert et al. (2009). The S-type lectins are composed of a single EUL domain, whereas the D-type lectins contain two tandem arranged EUL domains. These five EUL genes are distributed over three chromosomes, in particular chromosomes 1, 3 and 7 (Fig. 1a). When mapping segmental duplications, a duplication block was identified linking LOC_Os03g21040 (chromosome 3) with LOC_Os07g48460 (chromosome 7). Further analysis of tandem duplicated genes revealed a duplication block containing the three EUL genes on chromosome 7 (Fig. 1a).

Genomic sequences of the EULs
Detailed analysis of the genomic sequences and the deduced coding sequences revealed that the EUL sequences contain multiple intron sequences (Fig. 1b). While the OsEULS2 gene is interrupted by a single intron, the genes encoding OsEULD1A and OsEULD2 both contain five introns, and the OsEULS3 and OsEULD1B genes contain three introns. A summary of the size for each of the different introns and exons is shown in Additional file 1: Table S1.
Several exons showed the same size (Additional file 1: Table S1) and had similar sequences at their boundaries. When plotting the exon structure of the EUL domain to the EUL protein sequence, it was clear that all EUL genes possess an exon-intron-exon transition at identical positions in the protein sequence (Additional file 2: Figure S1). Further comparison of the intron-exon structures for the different OsEUL genes revealed several similarities. Despite small differences in the size of the introns and two exons (OsEULD2 has an extended Nterminal region (nine nucleotides) and an extended linker region (162 nucleotides) compared to OsEULD1A), the intron-exon structure of the genes OsEULD2 and OsEULD1A is very similar (Fig. 1b). The exon-intron transitions of the different exons in OsEULD1A and OsEULD2 plot at the same sites of the protein (Additional file 2: Figure S1). In addition, the structure of the OsEULS3 gene correlates with that of the second domain in the OsEULD1A/D2 gene. Furthermore, when comparing the structure of OsEULD2/D1A with that of OsEULD1B, it is observed that exons 1 and 2 and exons 3 and 4 from OsEULD2/D1A are fused in OsEULD1B (Fig. 1b). Although the structure of the OsEULS2 gene deviates from the overall structure of the OsEUL genes at the 3′ end (i.e. no split stop codon), there is some similarity with the intron-exon structure of OsEULD1B showing boundaries at the same sites (Fig. 1b). This similarity in the intron-exon structures suggests a strong conservation and phylogenetic relationship between the different EUL genes from rice.

Genomic variation in the O. sativa EULs
When comparing the coding sequences of the EUL genes between the japonica and indica subvarieties, it was observed that the japonica sequence coding for the N-terminal region of OsEULS3 contained an additional stretch of 33 nucleotides that was absent in the indica sequence. This sequence, located in the first half of the first exon, is flanked by sequences conserved between the two varieties. Further analysis of the coding sequences revealed only small differences between the subvarieties. Except for OsEULS2, eight SNPs were found in the coding sequences of the OsEULs. Three of these SNPs give rise to an amino acid change in the protein sequence between japonica and indica. One SNP is located in the N-terminal region of the coding sequence of OsEULD1A, and two SNPs are found in the linker sequence connecting both EUL domains of the OsEULD2 coding sequence. With the availability of SNP data from the 3000 rice genome project (SNP-seek database, IRRI), an extensive search for SNPs in the EUL genes was performed. This database contains 3024 rice genomes classified into 12 subvarieties. Japonica species are subdivided into tropical japonica (Trop), subtropical japonica (Subtrop), temperate japonica (Temp) and Japx which contains all other japonica varieties. Indica species are divided into Ind1A, Ind1B, Ind2, Ind3 and Indx which contains the remaining indica varieties. Further classes consist of aux and aromatic (Aro) rice and the admix group containing all other unassigned varieties. In total 46 SNPs were identified, of which 13 are located in exons (Fig. 1b). Of the latter SNPs, seven yielded a silent mutation and 6 SNPs introduced an amino acid change (Additional file 3: Table S2). For the SNPs that gave rise to an amino acid change, the distribution of the two alleles in the different rice subvarieties was analyzed (Fig. 1c). Almost all genomes contain the Nipponbare reference nucleotide for OsEULS2, only in 11% of the tropical japonica genomes the alternative allele is present. Similarly for OsEULS3, the Nipponbare reference allele is used preferentially, except for a low percentage of the other allele in the indica2 (7%), Aro (9%) and admix (9%) groups. However, within the Aux subvarieties only 37% has the reference allele. When analyzing the SNPs in the two-domain EULs, a clear separation is observed in the use of the alleles for the japonica and indica species (Fig. 1c). While japonica subvarieties and the aromatic and admix groups preferentially use the Nipponbare reference allele, the indica subvarieties together with the aus subvariety have a preference for the alternative allele. Tropical japonica is the exception, with a preference for the indica allele in the SNP in OsEULD1A and OsEULD2 (Fig. 1c).

Domain architecture and structure of the rice EULs
Analysis of the domain structure for the different EUL lectin sequences revealed two putative lectins composed of a single EUL domain (further referred to as S-type EULs) and three proteins consisting of two tandem arrayed EUL domains (further referred to as D-type EULs) (Fig. 2a). When analyzing the sequence similarity between the different EUL domains (151 amino acids), a high degree of conservation was observed. Sequence identity between the different EUL domains can be as high as 85%, while sequence similarity reaches up to 92% (Fig. 2b, Additional file 4: Table S3). The Nterminal region preceding the EUL domain (ranging from 19 to 117 amino acids) and the linker region connecting the two EUL domains in D-type EULs (ranging from 18 to 76 amino acids) were analyzed using Interproscan and BLAST searches against the NCBI database. None of the sequences corresponded to any known protein domain. Further analysis of the protein sequences for the presence of signal peptides and Fig. 1 Comparison of DNA sequences encoding EUL homologs from O. sativa. a. Distribution and duplication of EUL genes in the Oryza sativa subsp. japonica genome. Sequences for the different EULs were mapped on the chromosomes based on their start position and annotated with their locus number. Blocks containing EUL genes resulting from segmental duplication events are indicated in gray and connected by lines. Tandem duplications are indicated by an asterisk. The chromosome map was prepared using MapChart and drawn to scale. b. Intron-exon structure of the EUL genes. Shown in white are the 5′ and 3′ UTRs, exons are shown in purple. Introns are represented by grey bars between the exons. SNPs identified by SNP-seek are indicated in yellow.c. Distribution of SNP alleles in different rice varieties. The percentage of the Nipponbare reference allele in the different subvarieties is indicated by the heat map. The dendrogram is generated by DendroUPGMA (http://genomes.urv.cat/ UPGMA/) (Garcia-Vallve et al. 1999) transmembrane domains did not retrieve any of these sequences in the OsEULs.
Analysis of the phylogenetic relationships between the different EUL domains revealed a clustering between the different EUL domains of the D-type EULs. For this analysis the EUL domains composing the D-type lectins were separated and referred to as domain 1 and 2 for the N-terminal and C-terminal domain, respectively. Analysis of the relationship between the S-type EUL domains and the N-and C-terminal domains of the D-type EULs revealed that the C-terminal EUL domain sequences of the D-type EULs (domain 2) are more closely related to the sequences of the S-type EULs than the Nterminal EUL domain sequences of the D-type EULs (domain 1) are related to the S-type domains (Fig. 2c). This is in accordance with the higher sequence similarity between the S-type EUL domains and the C-terminal EUL domain of the D-type EULs (Additional file 4: Table S3). Within the cluster of the D-type EUL domains, the second domain of OsEULD2 is positioned between the branch containing the sequences for the N-terminal domain (domain 1) and the branch representing the C-terminal domains (domain 2) of OsEULD1A and D1B. This is also reflected in the sequence similarity between the different sequences in rice: OsEULD1A_1 and OsEULD1B_1 share the highest similarity to OsEULD2_1 (89 and 88% respectively) and the latter domain shares the highest similarity to OsEULD2_2 (90%). OsEULD1A_2 and OsEULD1B_2 share the highest similarity to OsEULD2_2 (88 and 92% respectively) (Additional file 4: Table S3).

Molecular modeling of OsEULs
Molecular modeling of the EUL domains revealed that they all consist of the canonical β-trefoil conformation found in bacterial lectins, made of three bundles of βstrands, labelled as subdomains I, II and III, linked by more or less extended loops which protrude out of the protein core ( Fig. 3a-h). A single functional carbohydrate-binding site was identified in subdomain III, whereas three functional carbohydrate-binding sites usually occur in the bacterial β-trefoil structures. Modeling of the D-type lectins, OsEULD1A, OsEULD1B and OsEULD2, revealed that they consist of two tandemly arrayed β-trefoil domains, linked by a more or less extended proline (P)-rich linker region . Each β-trefoil domain in the lectin contains one carbohydrate-binding site which upon folding is located at opposite ends of the polypeptide chain.
As previously reported by Fouquaert and Van Damme (2012) both EEA and OsEULS2 exhibit a high specificity towards mannose and galactose containing saccharides. Docking experiments performed with α-D-mannose (Man), α-1,2-dimannoside (Man1,2Man), and N-acetyl-D-lactosamine (LacNAc), confirmed the promiscuity of the carbohydrate-binding site of the single-domain lectins OsEULS2 and OsEULS3 (Fig. 4, Additional file 5: Figure S2). A network of 8-9 hydrogen bonds participate in the anchorage of Man or Gal to the D116, N143 and Q144 residues forming the functional triad of the carbohydrate-binding site. An additional hydrogen bond occurs between K122 of OsEULS2 and the second Man ring of Man1,2Man. Additional stacking interactions between the sugar rings and the aromatic residues F118 and W136, complete the binding of simple sugars and disaccharides to the carbohydrate-binding site. Because of the high sequence conservation for the residues forming the active triad D-N-Q of EULs ( Fig. 2b) a very similar binding scheme was observed in docking experiments performed with the two-domain lectin OsEULD1A (Fig. 4) and OsEULD2 (Additional file 6: Figure S3). Depending on the lectins, some discrepancies were noticed in the binding of simple sugars to the carbohydrate-binding site. These discrepancies mainly concern 1) the number and the length of hydrogen bonds connecting the sugar rings to the D116-N143-Q144 triad and, 2) the stacking interactions between the sugar rings and the aromatic residues F118 and W136. In this respect, the aromatic residue F118 is replaced by the hydrophobic residue L118 in both OsEULS3 (Additional file 6: Figure S2) and the second EUL domain of OsEULD1B, preventing a stacking interaction with the Man or Gal ring. In spite of these few discrepancies, the binding scheme of Man and Gal remains very similar from one to another EUL domain.

Expression profiles of EUL genes upon plant stress
A search in the Transcriptome Encyclopedia of Rice (TENOR) database (Kawahara et al. 2016) was performed and changes in the expression levels were observed for the EUL genes after different abiotic stresses and plant hormone treatments. Figure 5 shows the expression profiles for the different OsEUL genes based on the expression data after application of drought or osmotic stress as well as after exogenous application of absiscic acid (ABA, 100 μM) or jasmonic acid (JA, 100 μM).
Transcripts for OsEULD1B and OsEULD2 are highly and significantly upregulated in the shoots in response to drought and osmotic stress (43 and 24 times for OsEULD1B, respectively, and 19 and 11 times for OsEULD2, respectively), especially after 3-12 h of treatment. Though the expression of OsEULS2 and OsEULS3 was also increased after drought stress the fold change is only eight times for OsEULS2 and three times for OsEULS3 after one day of treatment which is significantly lower in comparison with OsEULD1B and OsEULD2. The effect from the osmotic stress on the , OsEULD1B_2 (f), OsEULD2_1 (g), and OsEULD2_2 (h). Beta-sheets, stretches of α-helices and loops are colored magenta, orange and green, respectively. The bundles of β-sheet are numbered I, II and III, respectively. The carbohydrate-binding site of the EUL domain is located at the bundle III. i-j. Ribbon structure of the two-β-trefoil domain OsEULD1A (i) and OsEULD1B (j), showing the respective arrangement of the two β-trefoil domains on both sides of the central linker region (L). The arrow indicates the carbohydrate-binding site in subdomain III. N and C correspond to the N-and C-terminal ends of the polypeptide chain of OsEULD1A and OsEULD1B, respectively expression of the S-type EULs in the rice shoots is not so prominent, being only three and one times for OsEULS2 and OsEULS3, respectively. Upregulation of OsEULD1B and OsEULS2 was also observed in the roots whereas transcript levels for OsEULD2 were downregulated. Similarly transcript levels for OsEULS3 were downregulated in roots subjected to osmotic stress. Expression of OsEULD1A was significantly downregulated in shoots and roots after drought and osmotic stress.
Upon hormone treatment with ABA and JA, OsEULS2 is highly upregulated, especially in roots. Transcript levels for all other OsEULs were also significantly changed in shoots and roots, except for a slight downregulation in shoots for OsEULS3. In contrast to ABA, JA mainly causes a downregulation for all EULs with exception of OsEULS2 (up to 20 fold upregulation in roots) and OsEULD1B (5 fold upregulation in shoots).

Cis-regulatory elements in EUL promoter sequences
The identification of cis-regulatory elements in the promotor sequences of the OsEUL genes can provide information about environmental and developmental stimuli that might influence OsEUL gene expression which could support the expression data. Therefore 2000 bp promoter fragments for each OsEUL gene were screened for cis-regulatory elements. To decrease the high number of false positives associated with simple mapping of , Man (f-g) or LacNAc (h-i) to the carbohydrate-binding site of the first (d, f and h) or second (e, g and i) EUL domain. The H-bond distances are indicated (Å). A stacking interaction occurs between the first Man ring (a, d and e), Man (b, f and g) or Gal (c, h and i) and the aromatic residues F118 and W136 (colored orange) located in the vicinity of the active site cis-regulatory elements and strongly increase the likelihood of retaining biologically functional elements, an integrative bioinformatics approach was followed which takes into account co-expressed genes, evolutionary sequence conservation and information on open chromatin regions (DNaseI-hypersensitive sites) (Vandepoele et al. 2009;Van de Velde et al. 2014). Cis-regulatory elements were identified in a set of approximately 150 coexpressed genes identified for each OsEUL gene. Furthermore an enrichment analysis was performed on this set of genes to identify the cis-regulatory elements significantly shared between each OsEUL and its coexpressed genes. In total 41 cis-regulatory elements were identified in the OsEUL promoters and their respective co-expressed genes, yielding regulatory information about OsEULS2, OsEULS3 and OsEULD1B ( Fig. 6; Additional file 7: Table S4; Additional file 8: Figure S4).
In addition, 111 cis-regulatory elements were identified in open chromatin regions of OsEULS2, OsEULD1A and OsEULD1B (Additional file 7: Table S4; Additional file 8: Figure S4) and 410 sites were identified within the evolutionary conserved regions of all OsEUL promoters (Additional file 7: Table S4; Additional file 8: Figure S4).
Analysis of the transcription factors binding to the identified cis-regulatory elements shared between the OsEULs and their co-expressed genes (OsEUL regulons) retrieved two major transcription factor families involved in stress response and development, in particular the WRKY family and the basic leucine-zipper (bZIP) family. While the WRKY transcription factors are mainly found in association with OsEULS2, the bZIP family as well as some NAC transcription factors are associated with OsEULD1B. Within the promoter sequence of OsEULS2, 78% of the significantly overrepresented motifs were Fig. 5 Expression profiles of the rice EUL genes upon plant stress represented as fold change. The false discovery rate is represented with asterisks to denote statistical significance. One star (*) is used if the value is less than 0.05, two stars (**) represent values less than 0.05 but higher or equal to 0.001, and three stars (***) show a value below 0.001. a. Drought stress. b. Osmotic stress. c. ABA treatment. d. JA treatment identified as elements associated to the WRKY family of transcription factors ( Fig. 6; Additional file 7: Table S4; Additional file 8: Figure S4). Our analysis retrieved binding sites for 5 rice WRKY transcription factors : WRKY11 (LOC_Os01g43650), WRKY28 (LOC_Os06g44010), WRKY62 (LOC_Os09g25070), WRKY71 (LOC_Os02g08440) and WRKY76 (LOC_ Os09g25060). In addition, binding sites of five maize and one Arabidopsis WRKY transcription factor and one additional W-box motif were retrieved (Additional file 7: Table S4). Furthermore, binding sites for two bZIP transcription factors from Arabidopsis (bZIP19 and bZIP23) were identified. The other identified motifs in the OsEULS2 regulon are associated with FUS3 and MADS box transcription factors. The presence of a TATABOX1 motif was also confirmed in the OsEULS2 promoter.
Most of the significantly overrepresented motifs in the promoter sequence of the OsEULD1B gene belong to the so-called ABRE elements (e.g. ABRELATERD1, ABRETAEM, ABREBNNAPA, ABADESI2) or are associated with the bZIP family of transcription factors according to their annotation in the PLACE database (HY5AT, ACGTTBOX). In total, 59.4% of the significantly overrepresented motifs/transcription factors in the promoter sequence of the OsEULD1B gene are associated to bZIP transcription factors. Among these elements TRAB1, OSBZ8 and LOC_Os07g37920 represent bZIP transcription factors from rice. The second largest group are the NAC transcription factors which comprise 20.8% of the enriched promoter elements for OsEULD1B. Furthermore two binding sites for bHLH factors were retrieved. The motif enrichment analysis in the OsEULS3 regulon identified only two significantly enriched elements in the OsEULS3 promoter: a PALBOXAPC motif and a binding site for the Arabidopsis response regulator ARR11.
A GO term enrichment analysis was performed for the different sets of OsEUL regulons (Additional file 9: Table S5) and revealed several GO terms related to stress perception and signaling (Additional file 10: Table S6). For example, in the OsEULD1B regulon GO terms such as ' ABA metabolism' (GO:0009687), 'response to abiotic stimuli' (GO:0009628) and 'abiotic stress' , 'response to osmotic stress((GO:0006970), 'response to salt stress' (GO:0009651) and more general 'response to stress' (GO:0006950) and 'response to stimulus' (GO:0050896) were found to be enriched.

Phylogenic relationships between the EUL domains from rice and other monocot and dicot species
While the occurrence of many lectin families is restricted to a few plant families, the EUL family is present in Embryophyta. However, differences in the number of EUL genes are observed between different plant species. Analysis of 8 monocot genomes retrieved 5 to 8 putative EUL genes (Additional file 11: Table S7). In dicot genomes, the number of putative EUL genes is remarkably smaller. Three EUL genes were identified in the genome of   Table S7). In addition, analysis of the domain structure of the EULs in dicots and monocots revealed that the D-type EULs are only present in monocots while the S-type EULs are found in all species (Additional file 11: Table S7).
Since a high level of sequence conservation was observed between the different EUL domains in rice (Additional file 4: Table S3), this analysis was extended with the EUL domains obtained from the different monocot and dicot species. In total 70 EUL domains belonging to 33 S-type and 19 D-type EULs (Additional file 11: Table S7) were aligned and the variability in the amino acids at each position of the sequence was analyzed and quantified using WebLogo (Fig. 7). Despite variation in the first 33 amino acids of the EUL domain, a high level of conservation is observed starting from the first QxW motif (at position 34-36) in the EUL domain. The D116-N143-Q144 consensus sequence representing the functional triad of the carbohydrate-binding site was conserved in all analyzed sequences. Next to this triad, the aromatic residues involved in stacking interactions with the sugars are conserved. As expected from the modeling studies, the W136 residue was conserved in all species except for Vitis vinifera where an asparagine (N) is present, while for the residue F118 a substitution with the hydrophobic residue L118 was observed in 37% of the sequences.
To study the evolutionary relationships between these EUL domains, a phylogenetic analysis was performed. All EUL domain sequences were aligned, after which a phylogenetic tree was generated (Fig. 8). The dendrogram shows 4 main branches (indicated as I to IV) and reveals a rather strict separation between the EUL domains originating from S-type and D-type EULs (Fig. 8).
Within the cluster of D-type EUL domains (branch I), a separation can be observed between the N-terminal (domain 1) and C-terminal (domain 2) EUL domains. The branches show multiple clusters defined by the rice D-type EULs: OsEULD1A, OsEULD1B and OsEULD2 (Fig. 8). While the D-type EUL domains group closely together, the clustering of the S-type EUL domains shows more divergence.
The EUL domains from dicot species cluster together in one branch (branch IIIA) separately from the monocot species. In addition, the sequences from Musa acuminata, the only monocot with only S-type EULs, cluster together in one branch (branch IIIB) separated from the other monocots but close to the branch of the dicot EUL domains. For the other monocot S-type EUL domains two clusters are observed, these two clusters can be defined by OsEULS2 (branch IV) and OsEULS3 (branch II).
Reconciliating the domain tree with the species tree gave insights into the duplication events that gave rise to the S-and D-type EULs (Additional file 12: Figure S5). After the divergence between Musa acuminata and the Poaceae, the ancestral S-type was submitted to three consecutive duplication events. The first two duplication events gave rise to the two S-type EULs (represented by OsEULS2 and OsEULS3), the third duplication event gave rise to the 2 domains of the D-type EULs. Further Fig. 7 Sequence logo for the EUL domains of all species under study. Amino acid residues forming the active triad of the carbohydrate-binding site are indicated with an asterisk (*). Aromatic residues involved in stacking interactions with the sugars are indicated by a hash (#) duplication events yielded the different D-type EULs and increased the copy number of EULs in the different species. Increasing the domain tree with sequences from monocots outside the Poaceae family might increase the resolution of the timeframe when the different duplication events occurred.

Discussion
With changing environmental conditions, the ability of plants to react to different forms of stress (e.g. cold, drought and flooding) becomes of key importance for a sustainable future. Understanding the mechanisms governing plant stress responses and identification of the individual components involved, will shed light on the molecular mechanisms behind stress signaling which in turn can be applied to improve stress resistance (Jung et al. 2013). Accumulating evidence shows that proteincarbohydrate interactions are of vital importance for plant immunity, involving lectin-carbohydrate interactions at the cell surface as well as in the cytoplasmic compartment (De Schutter and Van Damme 2015; Lannoo and Van Damme 2014).
Based on their sequence similarity to reference members for the different plant lectin families, we retrieved 325 putative lectin genes belonging to nine different lectin families from the genome of Oryza sativa subsp. japonica, indicating that the rice genome harbours a large set of putative lectin domains each characterized by a specific carbohydrate-recognition domain. Since experimental data points to the involvement of EULs in the plant stress response (Van Hove et al. 2015;Al Atalah et al. 2014a;Fouquaert et al. 2009), the focus of this paper is on an in-depth molecular characterization of this lectin family in rice.
Five EUL genes were identified in the genome of japonica rice. Orthologs of these 5 genes were also found in the indica genome (Additional file 11: Table S7). Since several databases provide evidence that these EUL genes are expressed, we assume that these five genes are functional. In addition, four pseudogenes have been reported (Fouquaert et al. 2009) for which no evidence of transcription was found. Although all four pseudogenes were retrieved from the japonica genome based on homology with the E. europeaus lectin (Additional file 11: Table  S7), only two of them were found to contain an EUL domain using interproscan. Similarly, 2 pseudogenes containing EUL domains were identified in the subspecies indica. Since there is no evidence for transcription of these pseudogenes, the sequences were not taken into account for further analysis.
Analysis of the genomic structure of the rice EUL genes revealed a high degree of conservation. When plotting the intron-exon structure of the OsEUL genes, to the protein sequence, it is noted that exon boundaries

N-terminal EUL domain) (blue) and domain 2 (C-terminal EUL domain) (green). Branch labels show bootstrapping values
are present at the same positions in the protein sequences. Furthermore, when analyzing the position of the different rice EULs in the genome, it is observed that OsEULD1A and OsEULD2 are linked by a tandem duplication, which can account for the high similarity in genomic structure. Interestingly, a high degree of similarity is also observed between the intron-exon structure of OsEULS3 and ArathEULS3, the only EUL gene in the Arabidopsis thaliana genome. Due to the similarity between the structure of ArathEULS3 and OsEULS3/D1A/ D2, it is suggested that the S3 type represents the more ancestral structure.
A search in the SNPSeek database comprising the data of the 3000 rice genome project, identified 46 SNPs in the rice EUL genes. Analysis of the SNPs that give rise to an amino acid change revealed clear preferences between indica and japonica subspecies for the SNPs in the D-type EUL genes. Further research is needed to elucidate whether the SNPs contribute to differences in stress tolerance of rice.
In PFam (PF14200, Ricin-B lectin 2, 103 amino acids) and SSF (SSF30570, Ricin-B like lectins, 151 amino acids) the EUL domains are annotated as ricin-B like lectins based on the presence of the QxW motif typical for ricin-B domains. Compared to the shorter pFam domain, the annotation of the EUL domain by SSF (SSF50370) shows an extended domain of 151 amino acids including a second QxW motif. Since the EUL genes share no significant overall sequence similarity with any protein comprising a ricin-B domain (Fouquaert et al. 2008), lectins containing the EUL domain were grouped as a separate lectin family.
Phylogenetic analysis of the EUL domains showed a close relationship between the EUL domains of the Stype EULs and the second domain of the D-type EULs. The percentage sequence identity between the S-type EUL domain and domain 1 of D-type EULs ranges between 68 and 76% (similarity between 80 and 86%) while the sequence identity between S-type EUL domains and domain 2 of D-type EULs ranges between 71 and 80% (similarity between 82 and 92%) (Additional file 4: Table S3). This higher sequence similarity between the S-type EULs and the second domain in D-type EULs is in agreement with the higher degree of conservation in the genomic structure of these domains.
As pointed out by Fouquaert and Van Damme (2012), and documented by Agostino et al. (2015), the carbohydrate-binding domain of the Euonymus lectin exhibits a notable promiscuity in the accommodation of diverse carbohydrate structures. The spatial organization of the amino acid triad D-N-Q forming the carbohydrate-binding site of EULs, accounts for the promiscuity in the binding of simple sugars and oligosaccharides to the β-trefoil domains of E. europaeus lectins. In this respect, docking experiments performed with mannose, α-1,2-dimannoside and LacNAc, resulted in the anchorage of the sugars to the carbohydratebinding site of the β-trefoil domains through a very similar network of hydrogen bonds and stacking interactions. However, some discrepancies occurred according to the number and length/strength of hydrogen bonds anchoring the sugars to the carbohydrate-binding site. The docking experiments with OsEULS2 confirmed the previously reported substrate specificity for high-mannose N-glycans and lactosamine related structures (Al Atalah et al. 2012). However, for OsEULD1A a substrate specificity for galactose related sugars or galactose containing glycoproteins was reported whereas mannose did not inhibit the agglutination of rabbit erythrocytes caused by OsEULD1A.
Previous promoter analysis performed using the PLACE database (Al Atalah et al. 2013) identified the presence of several TATA and CAAT boxes and identified three major classes of promoter elements, among which the light responsive elements, the ABA and GA responsive elements and the elements related to other (a)biotic stresses. There is some controversy about the significance of the TATA box for transcription initiation since analyses in A. thaliana (Molina and Grotewold 2005) and rice (Civan and Švec 2009) genomes revealed that only 29% of the Arabidopsis genes and 19% of the genes in O. sativa comprise a TATA box in their promoter region. The enrichment analysis identified one TATABOX1 motif in OsEULS2 promoter, but similar motifs were absent in other OsEUL promoter sequences. Identification of cis-regulatory elements revealed a high number of promoter elements putatively related with the abiotic stress responsiveness of the OsEUL genes. To discriminate between false positives and true biologically significant elements, coregulatory genes, evolutionary sequence conservation and information about open chromatin regions were integrated into the analysis. A large set of cis-regulatory elements retrieved from the OsEUL promoter sequences can be classified in the WRKY, basic leucine-zipper (bZIP) and NAC transcription factor families, known to be associated with abiotic stress responses (Banerjee and Roychoudhury 2017;Fang et al. 2008;Nijhawan et al. 2008). These data are in agreement with the GO enrichment analysis that yielded GO terms related to stress signaling and metabolism, response to abiotic stimuli and abiotic stress, cell signaling etc.
Motifs associated with WRKY transcription factors are abundantly present within the promoter sequences of the OsEULS2 regulon. WRKY transcription factors have been implicated in different biological processes such as response to wounding, senescence, development, dormancy, cold and drought tolerance, metabolism and hormone signaling pathways. In addition, numerous WRKY genes are involved in response to biotic and abiotic stress (Berri et al. 2009). The five rice WRKY transcription factors with binding sites identified in the OsEULS2 promoter all show stress regulated expression profiles. The responsiveness of WRKY11, WRKY28 and WRKY62 to abiotic stress was shown in a microarray analysis (Jain et al. 2007), where WRKY11 was upregulated in drought stress, whereas WRKY28 and WRKY62 were responsive to salt stress. With the exception of WRKY11, these transcription factors were also upregulated under several biotic stresses, such as bacterial, viral and fungal infection (Marcel et al. 2010;Zhou et al. 2010;Berri et al. 2009).
Within the promoter of OsEULS3, and its coregulated genes, a PALBOXAPC element was identified (Additional file 7: Table S4). This element was found to be significantly enriched in promoters of stress related genes that are downregulated by ABA (Yazaki and Kikuchi, 2005) and agrees with the downregulation of OsEULS3 in rice shoots treated with ABA (Fig. 5).
A predominant group of motifs in the OsEULD1B promoter are associated with transcription factors belonging to the bZIP family, these are known as stress inducible transcription factors (Banerjee and Roychoudhury 2017) involved in drought, osmotic, salt stress and some of them also in cold stress (Liu et al. 2014;Lu et al. 2009). Among them TRAB1 and OSBZ8 are reported as ABA related elements. While OSBZ8 expression is reported in the roots of ABA treated seedlings and in developing embryos, TRAB1 shows more general expression in different plant organs as well as in different stages of embryo development and in seeds (Banerjee and Roychoudhury 2017). Their responsiveness to dehydration and salt is confirmed in microarray data (Nijhawan et al. 2008) and in the case of OSBZ8 the upregulation is also associated with salt tolerance in indica rice cultivars (RoyChoudhury et al. 2008). Another group of transcription factors retrieved in the EULD1B promoter analysis belongs to the NAC transcription factor family, involved in different stress and developmental processes (Fang et al. 2008). Previously the NAC/ NAM transcription factor LOC_Os07g37920 was identified as a stress inducible gene under water deficit conditions (Ray et al. 2011). Similarly Fang et al. (2008) reported upregulation of this transcription factor in response to salt and drought stress. RNA sequencing data showed that the same transcription factor was involved in root development and auxin signaling (Hiltenbrand et al. 2016).
Enhanced expression for OsEUL genes in plants subjected to abiotic stress was confirmed by proteomics data. Already in 1997 Moons et al. reported on the stress induced expression of one of the OsEULs after treatment of rice seedlings with ABA (Moons et al. 1997). More recent proteomic analyses (Cheng et al. 2009), qRT-PCR investigations (Al Atalah et al. 2014a) have confirmed the differential transcription of several OsEUL genes upon ABA and salt stress. Expression of OsEULD1B was found to be upregulated upon drought stress (Rabello et al. 2008). EUL proteins in barley and wheat were also reported as drought related proteins. EULS3 and EULD1B genes are induced and upregulated at protein level in barley crown and roots from drought sensitive cultivars while the drought tolerant cultivar shows a constant protein level for EULD1B (Vítámvás et al. 2015). Similarly, two proteins belonging to the EUL family, one of them classified as the EULD2 type are detected at higher protein levels in a drought tolerant wheat cultivar especially in drought treated plants compared to the control plants at 10 days after opening of the flowers (Jiang et al. 2012b).
Transcript levels for ArathEULS3, the EUL homolog from Arabidopsis were also significantly upregulated after ABA treatment (Li et al. 2014;Van Hove et al. 2014), MeJA treatment (Van Hove et al. 2014) and drought (Li et al. 2014). In addition, Arabidopsis plants overexpressing ArathEULS3 revealed a better tolerance to drought stress (Li et al. 2014) and Pseudomonas syringae infection (Van Hove et al. 2014). All these findings support the statement that EUL genes are actively involved in the signal cascades triggered by several abiotic stresses.
In contrast to ABA, JA causes a significant downregulation for most EULs in rice with the exception of OsEULS2 (shoots and roots) and OsEULD1B (mainly shoots). The information retrieved from TENOR is confirmed by proteomics studies involving some EULs (Cho et al. 2007). Moons et al. (1997) also report the downregulation of OsEULD1B in roots after 24 h after JA treatment which is in agreement with the RNA sequencing data and in the same paper EULD1B is defined as not responsive to salicylic acid (SA) or ethylene.
At present, there are only a few reports that show the involvement of OsEUL genes in response to pathogen attack. qRT-PCR experiments and microarray data revealed that transcript levels for some OsEULs are upregulated after infection of rice with Xanthomonas oryzae pv. oryzae (OsEULD1B, OsEULD2), Magnaporthe oryzae (OsEULD2) or root knot nematodes (Meloidogyne graminicola) (OsEULD2) (Al Atalah et al. 2013;Kyndt et al. 2012). In line with this, several WRKY transcription factors with binding sites identified in the OsEUL promoters are regulated by diverse biotic stresses. For example, WRKY71 for whom a binding site was identified in the OsEULS2 promoter was upregulated upon infection with Magnaporthe grisea (Berri et al. 2009). Other WRKY transcription factors with binding sites in the OsEULS2 promoter were upregulated upon infection with Xanthomonas oryzae (Zhou et al. 2010) and Magnaporthe oryzae (Marcel et al. 2010).
The EUL lectin family is unique in that representatives of this family are present throughout the plant kingdom and have been identified in every fully sequenced genome of land plants. However, differences are observed in the number of EUL genes and the type of EULs in different plant species. The analysis of 9 monocot and 8 dicot genomes revealed a higher number of putative EUL genes in monocots (ranging 5 to 8) compared to dicots (ranging 1 to 3). In addition, it is observed that dicot species only possess S-type EULs while S-type as well as D-type lectins are identified in monocots (Additional file 7: Table S4). One exception is Musa acuminata where only S-type EUL sequences were identified. Judging from 70 EUL domain sequences, representing 33 S-type and 19 D-type EULs, identified in all genomes under study, the conservation of the EUL sequences is high. A high degree of sequence conservation was observed not only for the D-N-Q triad and the aromatic residues F/L118-W136 but also for the complete EUL domain.
In addition, the phylogenetic tree built from all EUL domains under study, revealed distinct clusters. Overall a good correlation was observed between the annotation of the EUL domains for the different species and the rice domain in this cluster. Reconciliation with the species tree provided insight in the three duplication events leading to the different S-type and D-type EULs. Interestingly the EUL domains from the dicot species are grouped in a separate branch and this branch clusters close to the cluster containing the EUL domains from Musa acuminata which, like the dicots, solely encodes the S-type lectins. Analysis of the reconciliated tree revealed that Musa acuminata diverged from the other monocots before the first duplication event that gave rise to the monocot S-and D-type EULs. This can explain why banana only contains S-type EUL lectins of the same (ancestral) type as the dicot species. In other monocot species the S-type EULs were subject to differentiation that gave rise to the S2-type and S3-type EULs.

Conclusion
The strong conservation of the Euonymus-related lectins within the plant kingdom, not only at protein level but also at genomic level suggests an important role for these proteins in the plant cell. The identification of multiple stress responsive elements in the promoter sequences of these genes as well as the transcriptional regulation of these genes in response to stress or hormonal treatments are in agreement with a role in plant stress signaling. However, further research, such as the generation of rice plants overexpressing OsEULs, is needed to experimentally prove the involvement of EULs in stress signaling and plant defense in rice.  (Kawahara et al. 2013) and screened for the presence of conserved protein domains using interproscan 5 (Mitchell et al. 2015). The program was downloaded (https://www.ebi.ac.uk/ interpro/download/) and locally installed. Indica lectins were identified by BLAST searches with the lectin domains of the japonica hits against the indica rice genome (ASM465v1) available from EnsemblPlants (http://plants.ensembl.org). As for the japonica sequences, these protein sequences were analyzed for the presence of conserved protein domains using Interproscan 5 (Mitchell et al. 2015). Only sequences with at least one lectin domain were retained. The protein sequences of the lectins were analyzed for the presence of signal peptides using SignalP 4.1 (Petersen et al. 2011) and the presence of transmembrane domains was analyzed using TMHMM 2.0 (Krogh et al. 2001).

Mapping lectin sequences to chromosomal locations and analysis of duplication events
The putative lectin genes were mapped on the different chromosomes using the MapChart software (Voorrips 2002). The transcription start sites of the putative lectin genes were retrieved from the MSU database (Kawahara et al. 2013) and used for the construction of the map.
Gene expansion through segmental duplication or tandem duplication was analyzed for the japonica lectin sequences. Tandem duplications were assigned if: 1. Both genes belong to the same lectin family, 2. There are no more than ten intervening genes and 3. They reside on the same chromosome. Segmental duplications were identified using the Plant Genome Duplication Database (PGDD) . Collinear blocks within the O. sativa subsp. japonica genome were determined by McScan v8 (http://chibba.agtec.uga.edu/duplication/ index/files), the output data was downloaded and searched for presence of lectin genes. Duplicated genes with a Ks (synonymous substitution) value higher than 1.0 were omitted.

Genomic analysis of the O. sativa EUL genes
The genomic structure of the OsEUL genes was analyzed in GenomeView (Abeel et al. 2011) using the japonica RGAP release seven sequence. Visual representation of the genomic structure of the EUL genes was made in IBS 1.0.1 (http://ibs.biocuckoo.org/). Sequences of the intron boundaries were analyzed by Weblogo (http://weblogo.berkeley.edu/logo.cgi).
Single nucleotide polymorphisms (SNP) in the EUL genes between different rice subspecies were identified using the SNP-Seek database containing SNP genotyping data (called against Nipponbare reference Os-Nipponbare-Reference-IRGSP-1.0) from the 3000 Rice Genomes Project (Alexandrov et al. 2015). The use of the Nipponbare reference allele in the different rice groups (in %) is represented in a heat map using the BAR HeatMapper Plus Tool (http://bar.utoronto.ca/ ntools/cgi-bin/ntools_heatmapper_plus.cgi). Based on this data a dendrogram is constructed using DendroUPGMA (http://genomes.urv.cat/UPGMA/). For the generation of the dendrogram, the RMSD (Root Mean Square Deviation) distance coefficient has been used to compare between sets of variables.

Molecular modeling of OsEULs
Homology modeling of the β-trefoil domains of EULs was performed with the YASARA Structure program (Krieger et al. 2002), running on a 2.53 GHz Intel core duo Macintosh computer. Up to eighteen different models were built up for each of the EUL domains corresponding to LOC_Os07g48500.1 (OsEULS2), LOC_Os01g01450.1 (OsEULS3), LOC_Os07g48490.2 (OsEULD1A), LOC_Os03g21040.2 (OsEULD1B), and LOC_Os07g48460.1 (OsEULD2) sequences, using the Xray coordinates of different bacterial hydrolases and lectins as templates: a β-l-arabinopyranosidase from Streptomyces avermitilis (PDB code 3A21) (Ichinose et al. 2009), the HA-33/HA-17 hemagglutinin complex of Clostridium botulinum (PDB code 5B2H) (Sagane et al. 2016), the mosquitocidal holotoxin from Bacillus sphaericus (PDB code 2VSE) (Treiber et al. 2008), a computationally reconstructed β-trefoil subdomain module (PDB code 3PG0) (Broom et al. 2012), the HA33 neutotoxin from Clostridium botulinum (PDB code 4OUJ) (Lee et al. 2014), the 1,3Gal43A exo-β-1,3-galactanase from Clostridium thermocellum (PDB code 3VSF) (Jiang et al. 2012a), the β-trefoil lectin HA33/C from Clostridium botulinum type C neurotoxin (PDB code 3AJ5, 3AJ6) (Nakamura et al. 2011), the retaining xylanase from Streptomyces olivaceoviris E-86 (PDB code 2D24) (Suzuki et al. 2009), a family ten xylanase from Streptomyces olivaceoviridis E-86 (PDB code 1V6W) (Fujimoto et al. 2004), and the hemagglutinin of Clostridium botulinum (PDB code 3WIN) (Amatsu et al. 2013). These templates were selected based on the best BLAST E-values found for the amino acid sequences matching with the query sequence, using a cutoff value of 0.5 for the alignment score. Finally, a single hybrid model was built up, from the different previous models combining the best refined regions, for each of the EUL domains. In addition, a tentative modeling of OsEULD1A and OsEULD1B proteins corresponding to LOC_Os07g48490.2 and LOC_Os03g21040.2 sequences, was performed with the same templates to get an insight into the molecular organization of these two-domain lectins. Similarly, a single hybrid model was built up from the different previous models for these two OsEULs. PROCHECK (Laskowski et al. 1993) and ANOLEA (Melo and Feytmans 1998) were used to assess the geometric and thermodynamic qualities of the three-dimensional models (Additional file 13: Table S8). Using ANOLEA to evaluate the models, some residues of the EUL models exhibited an energy over the threshold value. These residues are mainly located in the loop regions, especially those loops connecting the β-sheets in the models. However, the calculated QMEAN6 score (Benkert et al. 2011;Arnold et al. 2006) of all of the models gave values > 0.5 (results not shown).

Expression analysis of EUL genes based on publicly available resources
The accession numbers of the representative transcripts of OsEULs were downloaded from Rice Annotation Project Data Base (RAP DB) (http://rapdb.dna.affrc.go.jp/). A search for changes in the expression of all rice EULs transcripts was performed in TENOR database (http:// tenor.dna.affrc.go.jp/). Data for the fold change after drought, osmotic stress, abscisic acid (ABA) and jasmonic acid (JA) were downloaded and represented graphically. During our search in TENOR we found three representative transcripts for OsEULS3 and two for OsEULD1A. Since all transcripts for each of these OsEULs showed very similar results, their fold change was represented as mean values.
In order to reduce the high false positive rates associated with inferring regulatory interactions based on simple motif mapping, two approaches were used. A first approach consisted of filtering motif matches using crossspecies sequence conservation or open chromatin regions. Therefore, motif matches were filtered using all 4 sets of conserved non-coding sequences from De Witte et al. (http://bioinformatics.intec.ugent.be/blsspeller/) (De Witte et al. 2015) and DNaseI-hypersensitive sites downloaded from Zhang et al. (2012). Only the motif matches that overlap by 50% with the above functional regions (open chromatin or conserved non-coding sequences) were retained (using intersectBed -f 0.5 from BEDTools; Quinlan and Hall 2010).
For the second approach motif enrichment analysis was used. Motif mapping information was combined with a set of up to 200 co-expressed genes (according their Pearson correlation coefficient)from the TENOR database (Kawahara et al. 2016). With the help of the RAP-DB ID convertor tool (http://rapdb.dna.affrc.go.jp/ tools/converter) the RAP DB identifiers were converted to MSU locus numbers and used for the subsequent motif enrichment analysis. During the process of conversion some of the annotated RAP DB genes were not identified in the MSU database which reduced the number of co-expressed genes to approximately 150 genes for each OsEUL gene. Motif enrichment on these coexpressed gene was determined using the hypergeometric test with false discovery rate correction. Only significantly (q-value < 0.05) enriched motifs also present in the OsEUL promoters were retained as enriched motifs. The GO enrichment analysis on the different OsEUL regulons was performed using the GO enrichment tool in the PLAZA workbench of PLAZA 3.0 Monocots (Proost et al. 2015).

Phylogenetic analysis
Sequences corresponding to the different EUL domains of the OsEULs were extracted and aligned with MUSCLE using the default settings (Edgar 2004). Based on this alignment a maximum likelihood phylogenetic tree was build using RAxML v8.2.4. RAxML used the GTRGAMMA model with automated determination of the best amino acid substitution model (i.e. the model with the highest likelihood score on the starting tree), random number seed and distinct starting trees. Bootstrap iterations to assess the robustness of the generated trees were decided automatically by RAxML. The phylogenetic tree was displayed with FigTree v1.4.2 (http:// tree.bio.ed.ac.uk/software/figtree).
The EUL domain of all putative EUL sequences from the monocot and dicot species under study was extracted and aligned with MUSCLE using the default settings (Edgar 2004). A logo of the EUL domain was generated with WebLogo 3 (http://weblogo.threeplusone.com/ create.cgi). From the aligned sequences a maximum likelihood phylogenetic tree was built with RAxML v8.2.4 as described before. The FigTree v1.4.2 software was used to visualize and edit the phylogenetic tree. Reconciliation of the phylogenetic tree with the species tree was performed in Notung 2.9 (Stolzer et al. 2012). The species tree, containing all species from which EUL domains were sampled, was constructed in NCBI taxonomy (https://www.ncbi.nlm.nih.gov/Taxonomy/Com-monTree/wwwcmt.cgi).