Pseudogenization of the rhizobium-responsive EXOPOLYSACCHARIDE RECEPTOR in Parasponia is a rare event in nodulating plants

Nodule symbiosis with diazotrophic Frankia or rhizobium occurs in plant species belonging to ten taxonomic lineages within the related orders Fabales, Fagales, Cucurbitales, and Rosales. Phylogenomic studies indicate that this nitrogen-fixing nodulation trait has a single evolutionary origin. In legume model plants, the molecular interaction between plant and rhizobium microsymbiont is mapped to a significant degree. A specific LysM-type receptor kinase, LjEPR3 in Lotus japonicus and MtLYK10 in Medicago truncatula, was found to act in a secondary identity-based mechanism, controlling intracellular rhizobium infection. Furthermore, LjEPR3 showed to bind surface exopolysaccharides of Mesorhizobium loti, the diazotrophic microsymbiont of L. japonicus. EPR3 orthologous genes are not unique to legumes. Surprisingly, however, its ortholog EXOPOLYSACCHARIDE RECEPTOR (EPR) is pseudogenized in Parasponia, the only lineage of non-legume plants that nodulate also with rhizobium. Analysis of genome sequences showed that EPR3 orthologous genes are highly conserved in nodulating plants. We identified a conserved retrotransposon insertion in the EPR promoter region in three Parasponia species, which associates with defected transcriptional regulation of this gene. Subsequently, we studied the EPR gene of two Trema species as they represent the sister genus of Parasponia for which it is assumed it lost the nitrogen-fixing nodulation trait. Both Trema species possess apparently functional EPR genes that have a nodulation-specific expression profile when introduced into a Parasponia background. This indicates the EPR gene functioned in nodulation in the Parasponia-Trema ancestor. We conclude that nodule-specific expression of EPR3 orthologous genes is shared between the legume and Parasponia-Trema lineage, suggesting an ancestral function in the nitrogen-fixing nodulation trait. Pseudogenization of EPR in Parasponia is an exceptional case in nodulating plants. We speculate that this may have been instrumental to the microsymbiont switch -from Frankia to rhizobium- that has occurred in the Parasponia lineage and the evolution of a novel crack entry infection mechanism.


Introduction
The ability to engage in a nodule endosymbiosis with diazotrophic Frankia or rhizobium soil bacteria is a trait present in ten plant lineages within the taxonomic orders Fabales, Fagales, Cucurbitales and Rosales [1]. These four orders are commonly known as the nitrogen-fixing clade (NFC), but also represent multiple lineages of non-nodulating plants [2]. Recent phylogenomic studies indicated that the absent-present pattern of nitrogen-fixing root nodules in the NFC is the result of a single evolutionary gain of the nodulation trait, followed by multiple parallel losses [3][4][5]. In such a scenario, switches of microbial partners may have occurred.
A key feature of the nodulation trait is the potential to form a partnership with diazotrophic bacteria, in which the bacteria are carried to a newly formed root organ -the nodule-to establish an endosymbiosis. This bacterial infection is typically supported by plant-derived tubular structures, called infection threads, that transport bacteria to dividing root cortical cells that form the nodule primordium. Finally, infection threads penetrate into nodule cells allowing bacteria to fill most of the cytoplasmic space. The plant host provides carbohydrates to symbiotic bacteria that then fix di-nitrogen gas (N 2 ) to ammonia, which is metabolized by the plant.
Nodule formation relies on a complex cross-talk between plant and microbial partners. This cross-talk can vary in its specificity depending on which partners are involved. For example, in Lotus japonicus -an important legume model species-infection thread progression and cell infection are granted by recognition of compatible rhizobia surface exopolysaccharides (EPS) by the host's trans-membrane lysin motif (LysM) receptor kinase EXOPOLYSACHARIDE RECEPTOR 3 (LjEPR3) [6,7]. LjEPR3 harbours a singular configuration of its three LysM domains (LysM1-LysM2-LysM3) due to the atypical topology of LysM1 [6,8]. As a result, the extracellular domain of LjEPR3 is specific to EPS and does not bind to fungal and rhizobia chitooligosaccharide and lipochitooligosaccharide signal molecules (COs and LCOs) [8]. Thus, LjEPR3 works as a secondary identity-based mechanism in the establishment of nitrogen-fixing nodule symbiosis between L. japonicus and its microsymbiont Mesorhizobium loti.
Studies on EPR3-type LysM receptors in species other than L. japonicus are limited. In Medicago truncatula, the LjEPR3 ortholog MtLYK10 is crucial for the progression of the infection thread to the nodule primordia. But recognition of succinoglycan -the surface EPS of the M. truncatula compatible microsymbiont Sinorhizobium meliloti-was not found [9]. EPR3-type receptors do occur also in non-nodulating plant species [8,9], however, surprisingly is lost in the nodulating Cannabaceae species Parasponia [4].
Parasponia is the only taxonomic lineage outside the legume clade that can establish nitrogen-fixing root nodules with rhizobium. Parasponia represents five nodulating tropical tree species growing on volcanic islands of Indonesia and Papua New Guinea [10,11]. Parasponia is closely related to the genus Trema, which includes 18 species that do not nodulate [4,12]. Comparative analysis of Trema and Parasponia species showed that loss of the EPR3-type receptor EPR is specific to Parasponia species [4]. Here we aim to characterise the evolutionary trajectory of EPR in the Parasponia-Trema lineage. Specifically, we ask whether EPR may have functioned in nodulation in an ancestral Parasponia-Trema species, and how common loss of EPR3 orthologous genes is in nodulating species. Furthermore, we discuss whether the loss of EPR in Parasponia was instrumental to the microsymbiont switch that occurred in this lineage.

A retrotransposon insertion caused epr pseudogenization in Parasponia species
Parasponia represents five species, three for which genome sequence data have been generated; P. andersonii, P. rigida, and P. rugosa, respectively [4]. Earlier analysis revealed that these Parasponia species, as well as close relatives of the genus Trema, possess a single LjEPR3/MtLYK10 orthologous gene named EXOPOLY-SACCHARIDE RECEPTOR (EPR). P. andersonii and P. rigida EPR accumulated different mutations in the first exon causing a disruption of the predicted open reading frame (ORF), whereas P. rugosa EPR experienced a large deletion affecting exons 1 to 5 ( Table 1). As these mutations in EPR are not shared between the three Parasponia species, they must have occurred in parallel. This may suggest that the loss of EPR in Parasponia is the result of genetic erosion rather than specific selection. Alternatively, a shared, but yet unknown mutation occurred in the non-coding region of the gene affecting its functioning. To find evidence for this latter scenario, we investigated the putative promoter region of EPR in Parasponia and Trema species. In L. japonicus the functional promoter region of LjEPR3 is relatively short, spanning only 329 bp upstream of the translational start codon [7]. We analysed the EPR promoter region in three Parasponia and two Trema species. The alignment of these promoters revealed a large 5,7 kb insertion in all three Parasponia species, just 154 bp upstream of the predicted translational start codon ( Fig. 1A; Supplemental data file 1). Homology searches using BLAST revealed that this insertion represents a unique TY3-GYPSY-type retrotransposon element, which occurs only as a single copy in the genomes of the three Parasponia species, whereas it is absent in Trema. We compared the expression of the P. andersonii epr pseudogene to close homologs of the LysM-type receptor kinase (LYK) family [13]. This uncovered that in none of the samples, Panepr expression was observed, including roots and nodules at different stages of development ( Figure S1). This supports that the retrotransposon insertion in the putative regulatory region of EPR could have been instrumental for the pseudogenization of this gene in the Parasponia lineage.

Trema EPR is expressed in rhizobium-induced nodules
In L. japonicus, the LjEPR3 promoter possesses putative binding sites for the nodulation-specific transcription factors NIN and ERN1 [7]. We analysed the putative promoter region of Trema and Parasponia EPR using MEME combined with manual curation [14,15]. This predicted the occurrence of conserved putative transcription factor binding sites for ERN1 (1x) and NIN (3x), both in Trema and Parasponia EPR promoters in a confined ~ 500 bp region (Fig. 1B). This may suggest that transcriptional regulation of EPR3 ortholog genes is conserved in legumes and non-legumes. As the putative NIN and ERN1 binding sites are present also in the T. orientalis EPR promoter, we questioned whether Trema EPR still possesses a nodule-enhanced expression profile, despite the loss of the nodulation trait.
To find support for the functioning of EPR in nodulation in a Trema-Parasponia ancestor, we generated transgenic P. andersonii lines carrying a TorEPR promoter GUS reporter construct (pTorEPR:GUS). As a putative promoter, a fragment of 1,730 bp upstream of the translational start codon was used, which includes the putative NIN and ERN1 binding sites. Two independent transgenic lines were studied. GUS staining of root tissue did not reveal any blue staining. Subsequently, plantlets (2 × n = 10) were inoculated with the compatible strain Bradyrhizobium elkanii WUR3 [16] and studied 4 and 8 weeks post-inoculation. TorEPR protomer GUS activity was observed in rhizobium-induced cell divisions ( Fig. 2A,B), which in P. andersonii occur in the root epidermis and outer cortical cell layers [17]. In mature nodules, pTorEPR:GUS induced blue staining is confined to the meristematic zones in the apex of the nodule (Fig. 2C-E). In both cases, the GUS expressing cells were yet to be infected by rhizobium.
To find additional support for Trema EPR expression in nodules, we studied gene expression in an intergeneric F1 hybrid of the cross P. andersonii x Trema tomentosa. Earlier studies showed such hybrid plants can be nodulated, but are hampered in hosting rhizobium intracellularly [4]. T. tomentosa is an allotetraploid. We analysed available genome sequence data and identified two T. tomentosa EPR genes, which were named TtoEPRa and TtoEPRb (Supplemental data file 2). Next, we studied EPR allelespecific expression in P. andersonii x. T. tomentosa F1 hybrid roots and nodules. This revealed a nodule-specific expression of TtoEPRa and TtoEPRb whereas no expression of P. andersonii epr was detected (Fig. 3).
Taken together, expression analysis of the P. andersonii x T. tomentosa F1 hybrid as well as pTorEPR:GUS reporter studies in P. andersonii confirm that Trema EPR

The loss of EPR in nodulating species is specific to the Parasponia lineage
In L. japonicus and M. truncatula, LjEPR3 and MtLYK10 commit essential functions in rhizobium infection, whereas in Parasponia the orthologous gene is pseudogenized. Earlier studies showed that also in the legume Aeschynomene evenia the LjEPR3/MtLYK10 orthologous gene is absent [18]. However, this species possesses a close paralog, which possibly evolved as a result of a legume-specific duplication event and that may commit a similar function [9,18]. To determine whether loss of EPR3 occurred more often in nodulating species, we analysed genome sequences of 34 species; 26 legumes (including A. evenia, L. japonicus, and M. truncatula), 7 actinorhizal plant species that nodulate with Frankia, and P. andersonii. In all species, 1 to 4 putative ERP3 orthologous genes were identified. Many of these gene models have been predicted based on automated bioinformatics, without manual curation. As LjERP3, MtLYK10 and TorEPR/Panepr have a conserved gene structure consisting of 10 exons, we used these to manually curate the gene models in other species (Table S1). This revealed that all species investigated possess at least one gene copy that can encode a LysM-type receptor kinase that in length and structure is comparable to LjEPR3/MtLYK10/ TorEPR. Subsequent phylogenetic reconstruction, based on a coding sequence alignment and using close paralogs LjLYS4, LjLYS5, MtLYK11, and PanLYK4 as an outgroup, supported the orthologous relation ( Fig. 4; Supplemental data file 3). Also, it supports the occurrence of a duplication event in the legume Papilionoid subfamily, and the subsequent loss of one copy in the so-called galagoid clade formed by Cicer, Medicago, Trifolium, Vicia, and Pisum. As all analysed plant genomes -except Parasponia-possess an EPR3-type gene, we conclude that loss of this gene in nodulating plant species is uncommon.

Discussion
The legumes L. japonicus and M. truncatula use a specific LysM receptor kinase, namely LjEPR3 and MtLYK10, as a secondary identity-based mechanism to control rhizobium infection [6,9]. We showed that the occurrence of LjEPR3/MtLYK10 orthologous genes is highly conserved in nodulating plants, including actinorhizal plants that interact with diazotrophic Frankia. This suggests that a secondary identity-based mechanism allowing diazotrophic microsymbionts to infect is more generic. Strikingly, however, this mechanism seems to be mutated in Parasponia due to the pseudogenization of LjEPR3/ MtLYK10 orthologous gene EPR. This raises the question of why this gene was lost in the only non-legume lineage that nodulates with rhizobium?
First, we found that epr pseudogenization in the Parasponia lineage is associated with a unique retrotransposon insertion near the transcriptional start of the gene. Second, we studied the transcriptional regulation of the EPR gene in a nodulation context of the nearest non-nodulating sister species of Parasponia; namely Trema spp. It is hypothesized that Trema spp. lost the nodulation trait after the divergence of Parasponia, which is supported by the pseudogenization of several key nodulation genes in Trema [4]. We anticipated that Trema EPR may still possess the cis-regulatory elements critical for expression upon rhizobium-induced signalling since T. orientalis EPR contains putative ERN1 and NIN binding sites in its promoter region, similar as was reported for L. japonicus EPR3 and M. truncatula LYK10 [7,9,[19][20][21][22]. Expression analysis using a T. orientalis EPR putative promoter GUS (pTorEPR:GUS) reporter construct in P. andersonii supported this view. The TorEPR putative promoter showed to be induced in the first dividing epidermal and cortical cells that occur upon rhizobium inoculation, whereas in the nodule expression is confined in the cell clusters that are about to be infected. Likewise, enhanced expression of the Trema EPR alleles was found in nodules formed on T. tomentosa x P. andersonii F1 hybrid plants, whereas the Parasponia allele is not expressed. Together, these findings support the hypothesis that EPR committed a function in nodulation in the last Trema-Parasponia ancestor.
Loss of EPR in the Parasponia lineage may have been instrumental for the microsymbiont switch from Frankia to rhizobium. We speculated earlier that such a switch occurred at the base of the Parasponia lineage, based on evolutionary signatures in the nodule-specific haemoglobin gene [4,23]. The EPR receptor of the nodulating Parasponia-Trema ancestor could have been co-evolved with its (ancestral) Frankia microsymbiont. Such EPR receptor may have hampered the interaction of ancestral Parasponia with rhizobium in a somewhat similar manner as observed in L. japonicus where LjEPR3 hampers infection of the M. loti exoU exopolysaccharide mutant [6]. We initiated the first experiment to find evidence for this hypothesis and introduced the T. orientalis EPR gene in P. andersonii and quantified the nodulation efficiency of these transgenic lines. Though, despite nodule-specific expression of TorEPR ( Figure S2), no phenotypes in nodulation were observed (Figures S3 and S4). This suggests that the effect of trans TorEPR in P. andersonii nodulation is only subtle and difficult to observe under given laboratory conditions. Alternatively, trans TorEPR in P. andersonii may contribute to the interaction potential withcertain Frankia species. Although it is a tempting hypothesis, we considered it experimentally extremely challenging to prove. This is because we anticipate that the ancestral Frankia microsymbiont of the nodulating Trema-Parasponia ancestor most probably belonged to the taxonomic cluster-2 [13], of which species possess LCO biosynthesis genes [24][25][26]. Frankia cluster-2 strains are notoriously difficult to culture [27], and current (non-sterile) inocula only have limited compatibility with actinorhizal plants of the Southern hemisphere [26,28,29].
M. truncatula and L. japonicus, knock mutants in Mtlyk10 and Ljepr3 are affected in the progression of root hair-based infection threads [6,9]. This results in a reduced number of trans-cellular cortical infection threads. In the L. japonicus Ljepr3 mutant, successful infection often occurs from large intercellular pockets of bacteria from which subsequently cell penetration can occur (so-called peg infections) [7]. Parasponia does not support a root hair-based infection mechanism. Instead, rhizobium enters the roots by a novel crack entry mechanism, not found in legumes or actinorhizal plants. Upon inoculation, root epidermal and most outer cortical cells will divide. The newly formed daughter cells remain only loosely attached creating openings that are colonized by rhizobium [17,30,31]. From such an infection pocket, infection threads are formed to enter the nodule primordial cells. In comparison, crack entry infection is also  Table S1 and Supplemental Data file 3 found in some legume species. However, in these cases rhizobium exploits disruptions in the epidermis, e.g. due to later root emergence, rather than actively inducing the formation of such openings [32]. The evolution of this unique crack entry mechanism in Parasponia may have coincided with the loss of epr and the acceptance of rhizobium as a microsymbiont.
Taken together, this study highlights that LjEPR3/ MtLYK10 controlled secondary identity-based mechanism may predate the legumes, as cis-regulatory elements essential for a nodulation associated expression are present in the orthologous gene of Trema. Studies in Parasponia show, however, that the occurrence of a LjEPR3/MtLYK10-orthologous gene is not essential to allow effective rhizobium nodulation.

Plant materials and nodulation
Experiments were conducted using P. andersonii WU1 or its offspring [33] and the interspecific hybrid P. andersonii x T. tomentosa line H9 [4]. P. andersonii plantlets used for nodulation experiment and qRT-PCR analysis were grown in 1 L clear polypropylene containers allowing for gas exchange (Duchefa Biochemie, The Netherlands). Pots were filled with agraperlite type 3 (Maasmond-Westland, The Netherlands), saturated with EKM nutrient solution ( [16]. Plants were placed in a conditioned climate room set at 28˚ C and a 16/8 h day/night regime.

Phylogenetic reconstruction
Protein sequences of publicly available genomes belonging to the Fabid clade (a taxonomic clade within the clade eurosids) were clustered into orthologous groups using Orthofinder (v2.5.1, Emms & Kelly, 2015). The orthogroup containing the EPR3 orthologues was extracted by searching for the L.japonicus LjEPR3 (Lj2g3v14154105) gene name. The EPR3 orthologous proteins were aligned using MUSCLE, and a phylogenetic tree based on this alignment was made using RAxML on the CIPRES Science Gateway version 3.3 [34]. EPR protein alignment was then used to manually curate the data set by assessing the protein model integrity based on the MUSCLE alignment. A protein model was scored complete when all three key EPS receptor domains were present (LysM motifs, trans-membrane and kinase domains). Otherwise, it was scored truncated when missing part of a domain or elongated when it had additional amino acids at the N or C terminal. Any species without a complete orthologous EPR protein model was annotated as a putative gene loss.

Plant transformation
P. andersonii stable transformation was conducted as described in Wardhani et al., (2019) [35]. For the T. orientalis promoter the 1,730 bp upstream region was cloned (Supplemental Data File 1) in a Golden Gate compatible level 0 clone (clone i.d. EC74289). This clone was subsequently used to assemble a pTorEPR:GUS level 2 binary vector EC74794. using the Moclo backbone pICH86966. As empty vector control, the binary vector EC74842 was used, expressing only a kanamycin resistance gene. Golden Gate constructs used in this study are listed in Table S2. Genotyping of transgenic lines was conducted using the Phire Plant Direct PCR Kit (Thermo Scientific, USA) and specific primers for T. orientalis pTorEPR. Amplicons were subsequently confirmed by sequencing.

Microtome sectioning
Longitudinal sections of root nodules were made from root nodules 5 weeks post-inoculation. Plant tissue was fixed and embedded in technovit 7100 as previously described [35]. Thin Sects. (5 µm) were cut using a microtome (Leica Microsystems, Germany) placed on a glass slide and stained with 0.05% Toluidine blue for imaging. Pictures were taken using a Leica DM5500B microscope coupled with a DFC425C camera (Leica Microsystem, Germany).

Library preparation and RNA sequencing
Nodules from the two P. andersoniistable transformation lines 1.3 and 2.1 containing the pTorEPR:TorEPR trans gene, as well as from an empty vector control line, were harvested and flash-frozen in liquid nitrogen in two biological replicates.
Frozen samples were homogenized for 2 min with a bead beater at 2000 rpm and the homogenized sample was immediately resuspended in modified RB buffer, 500 µl RB buffer, 10 µl beta-mercaptoethanol and 50 µl Plant RNA isolation aid (Thermo Fisher Scientific, USA). Then, RNA was extracted using the E.Z.N.A. Plant RNA Kit (Omega Bio-tek, USA) following manufacturer recommendations. Library preparation and sequencing were performed by Novogene (Cambridge, UK). In short, the mRNA is fragmented randomly by adding a fragmentation buffer, then the cDNA is synthesized by using mRNA template and random hexamers primer. Samples were barcoded and pooled according to their effective concentration determined with qPCR and expected data