Reference nodule transcriptomes for Melilotus officinalis and Medicago sativa cv. Algonquin

Abstract Host/symbiont compatibility is a hallmark of the symbiotic nitrogen‐fixing interaction between rhizobia and legumes, mediated in part by plant‐produced nodule‐specific cysteine‐rich (NCR) peptides and the bacterial BacA membrane protein that can act as a NCR peptide transporter. In addition, the genetic and metabolic properties supporting symbiotic nitrogen fixation often differ between compatible partners, including those sharing a common partner, highlighting the need for multiple study systems. Here, we report high‐quality nodule transcriptome assemblies for Medicago sativa cv. Algonquin and Melilotus officinalis , two legumes able to form compatible symbioses with Sinorhizobium meliloti . The compressed M. sativa and M. officinalis assemblies consisted of 79,978 and 64,593 contigs, respectively, of which 33,341 and 28,278 were assigned putative annotations, respectively. As expected, the two transcriptomes showed broad similarity at a global level. We were particularly interested in the NCR peptide profiles of these plants, as these peptides drive bacterial differentiation during the symbiosis. A total of 412 and 308 NCR peptides were predicted from the M. sativa and M. officinalis transcriptomes, respectively, with approximately 9% of the transcriptome of both species consisting of NCR transcripts. Notably, transcripts encoding highly cationic NCR peptides (isoelectric point > 9.5), which are known to have antimicrobial properties, were ∼2‐fold more abundant in M. sativa than in M. officinalis , and ∼27‐fold more abundant when considering only NCR peptides in the six‐cysteine class. We hypothesize that the difference in abundance of highly cationic NCR peptides explains our previous observation that some rhizobial bacA alleles which can support symbiosis with M. officinalis are unable to support symbiosis with M. sativa .

partners (Oldroyd, 2013): legumes secrete flavonoids that attract soil rhizobia and induce expression of rhizobial nod genes, leading to rhizobial production of chito-oligosaccharide Nod factors that elicit the nodulation process by legumes. This process involves the curling of root hairs to trap rhizobia and the formation of infection threads within which rhizobia divide and move toward the root cortical layer (Gage, 2004). Rhizobia released from infection threads are internalized by nodule cells, where they develop into mature N 2 -fixing bacteroids.
In some legumes, such as those belonging to the Inverted Repeat Lacking Clade (IRLC), the rhizobia undergo an irreversible hostinduced process known as terminal differentiation that is largely driven by a unique class of legume proteins known as nodule-specific cysteine-rich (NCR) peptides ( Van de Velde et al., 2010). Terminal differentiation involves cell enlargement, genome endoreplication, and increased membrane permeability and is thought to increase the efficiency of N 2 -fixation (Haag & Mergaert, 2019;Lamouche et al., 2019;Mergaert et al., 2006). Not all rhizobium/legume pairings are compatible (Pueppke & Broughton, 1999;Wilson, 1939). Partner compatibility is determined by numerous factors impacting both early and late stages of the symbiotic interaction (Walker et al., 2020). The flavonoids secreted by legumes vary, as does the ability of rhizobia to respond to different flavonoids (Kosslak et al., 1987;Maxwell et al., 1989;Pueppke et al., 1998;Recourt et al., 1991). Similarly, the Nod factor produced by rhizobia differ and legume hosts respond only to Nod factors with specific structures (DHaeze & Holsters, 2002). Moreover, legume infection depends on rhizobia producing particular host-compatible exopolysaccharide molecules (Finan et al., 1985;Leigh et al., 1985), and variations in exopolysaccharide structure can impact specificity at the level of plant ecotype and bacterial strain (Simsek et al., 2007). In addition, some rhizobia secrete effector proteins that induce effectortriggered immune responses in a cultivar-specific manner, thereby influencing host range (Tsukui et al., 2013;Tsurumaru et al., 2015;Yang et al., 2010). Moreover, for IRLC legumes, an effective symbiotic interaction requires compatibility between the host-produced NCR peptides and the rhizobial membrane protein BacA (diCenzo et al., 2017;Wang et al., 2017Wang et al., , 2018Yang et al., 2017).
NCR peptides are a large class of legume-specific proteins, with $600 members in Medicago truncatula (Young et al., 2011). These proteins display little conservation in amino acid composition but possess four or six cysteine residues at conserved positions (Mergaert et al., 2003). The length of mature NCR peptides varies from about 20 to 50 amino acids and includes two or three disulfide bridges (Mar oti et al., 2015). NCR peptides can be classified as cationic (isoelectric point [pI] ≥ 8), neutral (6 ≤ pI < 8), or anionic (pI < 6) (Mar oti et al., 2015). Highly cationic peptides (pI ≥ 9.0) display antimicrobial activity in vitro, likely through disrupting microbial membranes, thereby leading to permeabilization and cell lysis (Mar oti et al., 2011;Tiricz et al., 2013). In planta, NCR peptides are required for rhizobium terminal differentiation and an effective symbiosis in IRLC legumes (Van de Velde et al., 2010). Deletion of individual NCR genes has shown that at least two of the $600 NCR peptides in M. truncatula are essential for N 2 -fixation (Horváth et al., 2015;Kim et al., 2015); however, mutation of other NCR genes resulted in N 2 -fixation in previously incompatible symbioses Yang et al., 2017), demonstrating the role of NCR peptides in partner compatibility.
The ability of rhizobia to establish an effective symbiosis with IRLC legumes requires the membrane protein BacA (Glazebrook et al., 1993). BacA is a peptide transporter whose deletion results in multiple phenotypes including increased resistance to bleomycin, and gentamicin increased sensitivity to detergents, and altered membrane composition (Ferguson et al., 2004(Ferguson et al., , 2002Ichige & Walker, 1997;Marlow et al., 2009). In addition, bacA deletion mutants are both unable to import NCR peptides and show increased sensitivity to cationic NCR peptides (Barrière et al., 2017;Guefrachi et al., 2015;Haag et al., 2011). Moreover, rhizobium bacA mutants are unable to fix nitrogen in symbiosis with IRLC legumes; instead, the rhizobia are quickly killed in a NCR peptide-dependent fashion upon release from the infection threads (Glazebrook et al., 1993;Haag et al., 2011).
Intriguingly, BacA appears to be a host-range determinant factor in IRLC legumes. For example, studies have shown that introduction of the bacA or bacA-like genes of Mesorhizobium loti and Bradyrhizobium species into a Sinorhizobium meliloti bacA mutant is insufficient to allow N 2 -fixation during interaction with IRLC legumes of the genus Medicago (Guefrachi et al., 2015;Maruya & Saeki, 2010). Similarly, we previously demonstrated that replacement of the S. meliloti bacA with the bacA alleles of Sinorhizobium fredii NGR234 or Rhizobium leguminosarum bv. viciae 3841 does not allow for N 2 -fixation during symbiosis with Medicago sativa (alfalfa) but does support N 2 -fixation on the IRLC legumes Melilotus alba (white sweet clover) and Melilotus officinalis (yellow sweet clover)   Table S1).
In addition to the above-noted comparison, several symbiotic differences have been observed when S. meliloti mutants interact with Medicago versus Melilotus plants (diCenzo et al., 2015;Geddes et al., 2021;Honma & Ausubel, 1987;Zamani et al., 2017), suggesting that Melilotus plants are a valuable secondary model system to study the symbiotic properties of S. meliloti. To further develop M. officinalis as a model species for studying symbiosis, here we report a reference nodule transcriptome for M. officinalis. We further compare the characteristics and the expression of NCR genes between M. officinalis and M. sativa to investigate whether the ability of certain bacA alleles to support symbiosis with Melilotus but not Medicago plants is correlated with differences in the NCR peptide profile of these genera.

| Plant materials and sample collection
M. sativa cv. Algonquin (alfalfa) and M. officinalis (yellow blossom sweet clover) seeds were purchased from Speare Seeds Limited (Harriston, Ontario, Canada). Seeds were surface sterilized with 95% ethanol for 5 min followed by 2.5% hypochlorite for 20 min and then soaked in sterile double-distilled water (ddH 2 O) for 1 h. The sterilized seeds were plated on 1X water agar plates and incubated at room temperature in the dark for 2 days. Five germinated seeds were placed in autoclaved Leonard Assemblies consisting of two Magenta Jars with a cotton wick extending from the top jar (containing vermiculite mixed with silica sand [1:1 w/w]) into the bottom jar (containing 250 ml of Jensen's media; Jensen, 1942) and then incubated in a Conviron growth chamber for two nights. Wildtype S. meliloti strain Rm2011 was grown overnight at 30 C in LBmc broth (10 g L À1 tryptone, 5 g L À1 yeast extract, 5 g L À1 NaCl, 2.5 mM CaCl 2 , and 2.5 mM MgCl 2 ), washed with 0.85% NaCl, and diluted to a density of $1 Â 10 7 CFU ml À1 in sterile ddH 2 O. Ten milliliters of cell suspension was then added to each Leonard Assembly. Plants were grown in a Conviron growth chamber with a day (18 h, 21 C, light intensity of 300 μmol m À2 s À1 ) and night (6 h, 17 C) cycle. Root nodules were collected 4 weeks postinoculation and immediately flash frozen with liquid N 2 and stored at À80 C until use. All nodules collected from plants grown in the same Leonard Assembly were stored in a single tube and treated as one replicate. The shoots from each pot were dried at 60 C for 2 weeks prior to measuring shoot dry weight (Table S2).

| RNA extraction and sequencing
Total RNA from three replicates of frozen M. sativa and M. officinalis

| Sequencing read trimming
Preprocessing of the raw reads was performed to ensure only highquality data were used for de novo transcriptome assembly and differential expression analysis. Read quality was initially evaluated using FastQC version 0.11.9 (Andrews et al., 2015), following which errors in raw reads were identified and corrected by the k-mer-based method of Rcorrector version 1.0.4 (Song & Florea, 2015). The FilterUncorrectablePEfastq.py script (github.com/harvardinformatics/ TranscriptomeAssemblyTools/) was used to remove any read pairs where at least one read had an unfixable error identified by Rcorrector. Adaptors sequences, short reads (< 25 bp), and low-quality reads (Q score < 20) were removed using Trim Galore version 0.6.6 (bioinformatics.babraham.ac.uk/projects/trim_galore/), which is a wrapper calling cutadapt version 3.2 (Martin, 2011) and FastQC. The  (Table S3), with a total of 174,707,055 and 119,333,821 paired-end reads ($43.7 and $29.8 Gb, respectively) remaining for M. sativa and M. officinalis, respectively (Table S4).

| Transcriptome de novo assembly and quality control
The nodule transcriptomes of M. sativa and M. officinalis were de novo assembled following the same procedure. First, the preprocessed reads from the triplicate samples were simultaneously provided to Trinity version 2.9.0 for assembly without genome guidance (Grabherr et al., 2011). Then, the assembled contigs were clustered into genelevel clusters using SuperTranscripts (Davidson et al., 2017). Gene isoforms were identified by Corset version 1.09 with the log likelihood ratio threshold set to very high (Davidson & Oshlack, 2014). Based on the Corset clusters, Lace version 1.14.1 was used to merge the gene isoforms into single long supertranscripts meant to provide a gene-like view of the transcriptome (Davidson et al., 2017).
Multiple methods were used to examine the quality of the Trinity and SuperTranscript assemblies. First, the alignment rates of the preprocessed reads to the assemblies were inspected using STAR version 2.7.8a with the two-pass mode that is more sensitive to alternative splicing (Dobin et al., 2013). Second, assembly statistics such as N50 and number of contigs were calculated using the seqstats software (github.com/clwgg/seqstats). Third, the completeness of the assemblies was evaluated using BUSCO version 5.1.2, run separately using the OrthoDB v10 'Fabales' and 'Viridiplantae' reference databases (Seppey et al., 2019). The assemblies were also compared with the S. meliloti Rm2011 genome (Sallet et al., 2013) using BLASTn version 2.5.0 + (Camacho et al., 2009), which confirmed the absence of contaminating S. meliloti transcripts in the assemblies. Finally, the M. sativa de novo assembly was aligned to a publicly available genome of M. sativa cultivar XinJiangDaYe (Chen et al., 2020) with MUMmer version 4.0+, and 87.3% of transcripts were sucessfully aligned to the genome.  (Eddy, 2011;Haft et al., 2012;Mistry et al., 2021), and results were filtered to remove annotations with a Bit-score < 50. For repetitive annotations from isoforms of a gene, only the consensus annotations were retained. For contigs successfully annotated by more than one of the annotation methods, results from the bidirectional BLAST took priority, followed by the results of eggNOG-mapper, then the Pfam searches, and finally the TIGRFAM searches.

| NCR peptide identification
Considering the high degree of sequence diversity of NCR peptide sequences, the functional annotation methods described above were not sufficiently sensitive to discover genes encoding NCR peptides in the assemblies. Therefore, the SPADA version 1.0 pipeline was used to identify NCR peptides (Zhou et al., 2013). SPADA is specialized to predict cysteine-rich peptides in plant genomes and is distributed with a M. truncatula prediction model. Cysteine-rich peptides in the M. sativa and M. officinalis assemblies were predicted using the SPADA pipeline with the following software: HMMER version 3.0, Augustus version 2.6, GeneWise version 2.2.0, GeneMark.hmm eukaryotic version 3.54, GlimmerHMM version 3.0.1, and GeneID version 1.1 (Birney et al., 2004;Blanco et al., 2007;Lukashin & Borodovsky, 1998;Stanke et al., 2006). The putative NCR peptide sequences were filtered to remove those without a signal peptide and then further filtered based on the E-value (cutoff of 1e-5) and hmm score (cutoff of 50). Filtered sequences were then verified via hmmscan searches against the Pfam database, and they were aligned using Clustal Omega version 1.2.4 (Sievers et al., 2011) to ensure the presence of the signature cysteine motif and N terminal signal peptide that are present in bona fide NCR peptides.

| NCR peptides classification and clustering
To predict the lengths of mature NCR peptides, signalP version 4.1g with the notm network was used to predicted cleavage sites and extract mature NCR peptides (Petersen et al., 2011), and the number of cysteine residues in each motif were counted. The pI values of the NCR peptides were predicted using the pIR R package, and the value for each peptide was calculated based on the mean values from all prediction methods excluding the highest and lowest values (Audain et al., 2016). The M. sativa, M. officinalis, and M. truncatula NCR peptides were clustered using CD-HIT-2D with an identity threshold of 0.7 (Li & Godzik, 2006).

| Gene-expression level estimation and differential expression analysis
Gene-expression levels were estimated individually for each library based on transcript abundance estimation using salmon version 0.12.0 in mapping-based mode (library type automatic, validate Mapping) (Patro et al., 2017) and the reference transcriptomes produced as described above. R package deseq2 version 1.32.0 (Love et al., 2014) was used to perform differential expression analysis between M. sativa and M. officinalis, using the raw counts for each replicate from salmon, the length of each gene in each species as an additional parameter during normalization, and limiting the analysis to one-to-one orthologs identified by OrthoFinder version 2.5.2 (Emms & Kelly, 2019). OrthoFinder was run with default settings using the total predicted M. sativa and M. officinalis proteomes including all isoforms, following which orthologs were reduced to one per supertransript.

| Gene ontology term analysis
The Gene Ontology (GO) terms for M. sativa and M. officinalis were obtained from the M. truncatula A17 proteome (assembly release r5.0 1.7) and annotations from eggNOG-mapper. For transcripts annotated with GO terms from both sources, the consensus GO term annotations were retained. Then, the GO terms were reduced based on the Generic GO subset (download 10 August 2021).

| Software information
All analyses were performed in an Ubuntu 20.04.2 LTS (Linux 5.8.0-48-generic) operation system or on the Compute Canada Graham cluster. Custom scripts were written in Python version 3.8.5, and bash. R version 3.6.3 was used during data analysis.  Table S4  M. officinalis contigs (Datasets S1 and S2). Of these, $52% (M. sativa) and $58% (M. officinalis) are high confidence annotations as they were transferred from the M. truncatula whole genome annotation following identification of putative orthologs using a BLAST bidirectional best hit approach (   and $47,000 lncRNAs in the legume Pisum sativum (pea) (Kerr et al., 2017), we hypothesize that the majority of the unannotated M. sativa and M. officinalis transcripts reflect lncRNAs.
All of the examined assembly summary statistics (mean and median contig length, contig N50) were improved in the compressed assemblies compared with the original de novo assemblies, indicating that the compressed assemblies are of higher structural quality (Table 1). The M. sativa transcriptome summary statistics, such as N50 and and transcript length, are consistent with those reported for other M. sativa de novo transcriptome assemblies, although the number of transcripts varies likely due to each study examining different tissues (Arshad et al., 2018;Zhang et al., 2015). In addition, the assemblies appear to be robust; greater than 90% of the filtered reads used for transcriptome assembly could be mapped to the corresponding assemblies by STAR (Table 1)   We next examined the predicted functions of the proteins encoded by the 50 most abundant transcripts in both species (Tables 2 and 3). Not surprisingly, these transcripts were enriched in those predicted to encode nodulins and leghaemoglobin-like proteins.
Nodulins refer to diverse proteins expressed specifically in nodule tissue, which play various structural or metabolic roles during symbiotic nitrogen fixation. Among the nodulins are the leghemoglobin proteins that account for up to 40% of the total soluble protein in legume nodules (Nash & Schulman, 1976 identified. Interestingly, the abundance of the conserved supertranscripts was significantly higher, on average, than that of the species-specific transcripts (p-value < 2.2e-16; Figure 3). In both plant species, the majority of the most abundant, species-specific annotated transcripts were also nodulins, globin family proteins that are likely species-specific leghaemoglobin isoforms, and some housekeeping genes such as ribonuclease and ribosomal proteins. It is noteworthy that the most abundant M. sativa-specific supertranscript is predicted to encode albumin I. Similarly, M. officinalis also has a highly expressed albumin I supertranscript. The albumin I peptide family is known to be highly expressed in legume seeds and play roles in seed protection (Rahioui et al., 2014). Expression of albumin I genes has also been observed in M. truncatula root nodules, with expression specific to uninfected cells in the nitrogen fixation zone (Limpens et al., 2013).     Figure 4). Considering the limitations of interspecies differential expression analyses, we restricted our investigation to supertranscripts with absolute log 2 fold changes > 5 and a p-value < .05. Using these thresholds, we identified 290 differentially  (Table S2), we cannot rule out that some of these transcriptomic differences may also reflect variances in nodule maturity and/or host metabolic activity at the time of harvest.

| NCR peptide diversity and expression profile
We previously observed that replacing the bacA allele of S. meliloti  Table S1). We hypothesized that this was due to differences in the NCR peptide profiles of these species . To test this hypothesis, supertranscripts encoding NCR peptides were identified in the M. sativa and M. officinalis transcriptome assemblies using the SPADA pipeline (Zhou et al., 2013 (Horváth et al., 2015;Kim et al., 2015).
Overall, these results are consistent with rapid evolution of the  Figure 5c). Strikingly, when subdividing the NCR peptides with pI values > 9.5 into those with four or six cysteine residues, we observed that those with six cysteines were $27-fold more abundant in M. sativa than M. officinalis (Figure 4e,f). Considering these results, we hypothesize that the ability of the R. leguminosarum bacA allele to support symbiosis with M. officinalis and P. sativum, but not M. sativa, is a consequence of the elevated abundance of highly cationic (pI > 9.5) NCR peptides in Medicago nodules. Consistent with this, two M. sativa NCR peptides were identified that had high sequence idenitity (83% and 73%) with M. truncatula NCR035, which is a cationic NCR peptide with antimicrobial activity against S. meliloti (Haag et al., 2011), whereas similar sequences were not found in M. officinalis. It may be that the BacA proteins of S. fredii and R. leguminosarum are less capable of transporting highly cationic NCR peptides, and consequently, strains with these BacA proteins may be more sensitive to the antimicrobial activities of these NCR peptides.

| CONCLUSION
We report high-quality nodule transcriptome assemblies for M. sativa cv. Algonquin and M. officinalis that we expect will serve as valuable resources for the legume research community. In particular, we expect that the availability of a nodule transcriptome for M. officinalis will help establish this plant as a secondary model system for studies of the symbiotic properties of S. meliloti.
We were particularly interested in using these transcriptomes to compare the properties of the NCR peptides encoded by both species.
Despite predicting 33% more NCR peptides in M. sativa than M. officinalis, NCR transcripts accounted for roughly 9% of the transcriptome (based on TPM values) in both species. In general, the characteristics of the NCR peptides of M. sativa and M. officinalis were highly similar. However, transcripts encoding cationic NCR peptides with a pI > 9.5 were $2-fold more abundant in M. sativa than in M. officinalis and 27-fold more abundant when considering only six-cysteine NCR peptides. These results are consistent with previous observations that transcripts encoding cationic NCR peptides with a pI > 9.5 account for $2-fold more NCR transcripts in M. truncatula compared with P. sativum. Cationic, but not neutral or anionic, NCR peptides display antimicrobial activity through disrupting the integrity of microbial membranes (Mikuláss et al., 2016). It has been hypothesized that BacA provides protection against these NCR peptides by importing them into the cytoplasm and thus away from the membrane (Arnold et al., 2017;Nicoud et al., 2021). Considering that the BacA proteins of S. fredii and R. leguminosarum share less than 60% amino acid identity with the BacA protein of S. meliloti, it is reasonable to speculate that they have different substrate specificity and may be less capable of transporting cationic NCR peptides . If true, this could explain why the bacA alleles of S. fredii and R. leguminosarum can support symbiotic nitrogen fixation with M. officinalis but not M. sativa; the increased production of cationic NCR peptides in M. sativa, coupled with lower rates of import into the S. meliloti cytoplasm, could result in an accumulation of these peptides in the periplasm, resulting in a loss of viability and lack of nitrogen fixation . In future work, it will be interesting to test whether S. meliloti strains with different bacA alleles display differing sensitivities to these highly cationic NCR peptides or differences in their abilities to transport these peptides.