The Bacillaceae-1 RNA motif comprises two distinct classes

Non-coding RNAs are key regulatory players in bacteria. Many computationally predicted non-coding RNAs, however, lack functional associations. An example is the Bacillaceae-1 RNA motif, whose Rfam model consists of two hairpin loops. We find the motif conserved in nine of 13 non-pathogenic strains of the genus Bacillus but only in one pathogenic strain. To elucidate functional characteristics, we studied 118 hits of the Rfam model in 11 Bacillus spp. and found two distinct classes based on the ensemble diversity of their RNA secondary structure and the genomic context concerning the ribosomal RNA (rRNA) cluster. Forty hits are associated with the rRNA cluster, of which all 19 hits upstream flanking of 16S rRNA have a reverse complementary structure of low structural diversity. Fifty-two hits have large ensemble diversity, of which 38 are located between two coding genes. For eight hits in Bacillus subtilis , we investigated public expression data under various conditions and observed either the forward or the reverse complementary motif expressed. Five hits are associated with the rRNA cluster. Four of them are located upstream of the 16S rRNA and are not transcriptionally active, but instead, their reverse complements with low structural diversity are expressed together with the rRNA cluster. The three other hits are located between two coding genes in non-conserved genomic loci. Two of them are independently expressed from their surrounding genes and are structurally diverse. In summary, we found that Bacillaceae-1 RNA motifs upstream flanking of ribosomal RNA clusters tend to have one stable structure with the reverse complementary motif expressed in B. subtilis . In contrast, a subgroup of intergenic motifs has the thermodynamic potential for structural switches.


Introduction
A diverse collection of non-coding RNAs (ncRNAs) regulates bacterial gene expression.These include trans-acting small regulatory RNAs, antisense RNAs, and cis-acting RNA structures in the untranslated regions of operons for controlling transcription or post-transcriptional fate.Regulatory structured and ncRNAs have been associated with multiple physiological processes and molecular mechanisms among bacteria (Repoila and Darfeuille, 2009), making them promising targets in bacterial metabolic engineering for biotechnological purposes (Nshogozabahizi et al., 2019;Wang et al., 2012).One example is the cisregulatory motif EAR which controls exopolysaccharide biosynthesis genes (eps) expression through a processive antitermination mechanism to ensure the complete synthesis of the 16 kbp eps operon within the order Bacillales (Irnov and Winkler, 2010).Cis-regulatory RNAs also play important roles in proper ribosome synthesis, such as the ribosomal protein L20-interacting RNA that interacts with the L20 protein to repress further L20 synthesis and, hence, contributes to the maintenance of appropriate stoichiometry between ribosomal proteins and rRNAs (Choonee et al., 2007;Meyer et al., 2018).Some RNAs adopt multiple functional structures regulated through a structural switch that can be activated by, for example, trans-acting factors (Nagel and Pleij, 2002).Among the best-studied bacterial regulatory ncRNAs are riboswitches, i. e., ligand-dependent regulatory ncRNAs, and RNA thermometers (RNATs), i.e., temperature-sensitive regulatory ncRNAs, both acting through structural switches sensing changing environments.
The extent of RNA contribution to gene regulation is, however, still unknown even in the model microorganisms Escherichia coli and Bacillus subtilis (Abduljalil, 2018).Many putative structured RNAs have been revealed by computational tools such as CMfinder (Yao et al., 2006), which ranks candidates by both sequence conservation and structure conservation.Further, novel members of known RNA families, e.g., riboswitches, can be predicted by utilizing structure motifs, sequence conservation, and covariation of known domains, e.g., aptamers (Retwitzer et al., 2015).Subsequent experimental tests of hypotheses regarding function are often applied to assess the accuracy of such predictions (Stav et al., 2019), such as differential RNA stability of diverse RNA structures (Wan et al., 2012).
Among those prokaryotic ncRNAs with unknown function is the Bacillaceae-1 RNA motif (Rfam family RF01690), which we term B-1 motif in the following.Weinberg et al. (2010) bioinformatically predicted de novo the conserved structure as a putative ncRNA within bacteria of the family Bacillaceae using CMfinder (Yao et al., 2006).Similar putative RNA structures are found in other families of the order Bacillales, such as in the genus Sporolactobacillus and Brevibacillus brevis, and are available in the complete (full) Rfam alignment (Kalvari et al., 2021).The B-1 motif consists of two terminal loops with the nucleotide consensus RUCCU in both loops, where R stands for a purine (A or G).In the Rfam database (v14.4),2,235 members of the B-1 motif are annotated in 120 species, with nine instances found in the Gram-positive model microbe B. subtilis.Many of them are located adjacent to a ribosomal RNA.Recently a comparative transcriptomic study measured the expression landscape of B. subtilis under over a hundred different conditions comprising, besides others, temperature and saline concentration (Nicolas et al., 2012).Based on this study, a review about RNAbased regulation in B. subtilis (Mars et al., 2016) discusses a transcribed intergenic region (S160) with the highest expression under salt stress that overlaps one specific copy of the B-1 motif.The authors predicted an "intricate secondary structure" for the entire S160 locus but did not investigate the role of the B-1 motif.To our knowledge, no further functional description is available for neither the B-1 motif nor the S160 region.This work studies the conservation, structure diversity, and expression of the B-1 motif and describes two distinct classes related to ribosomal RNA clusters and intergenic loci.

Data
All sequence and annotation data from bacterial genomes were taken from GenBank release 234 (Sayers et al., 2019) and included 20 different Bacillus spp.and Listeria monocytogenes as an outgroup.The complete list and accession number of genomes are available in Table 1.

Structure-based alignments
To identify all B-1 motifs in the 21 genomes, a refined search was performed using the Rfam v. 14.1 covariance model (CM; Kalvari et al (Sundfeld et al., 2016) to the 21 genomes to search for more distant members.From the analysis detailed in the Supplementary Methods S1-2, we conclude that the B-1 motif CM did not miss any structurally conserved RNA structures neither in the 10 genomes where the CM motif was found nor in the 11 genomes where initially no CM matches were found.

Reverse complementary Bacillaceae-1 RNA motif
All B-1 motifs identified among the 20 Bacillus spp.genomes (Table 1) were aligned with Infernal cmalign to the Rfam covariance model (RF01690).The resulting alignment (97 bp long) was used to refine the CM and to trace the evolutionary history of the B-1 motif.Structure-based multiple sequence alignments of the reverse complementary sequences of the 118 hits of the B-1 motif and all seed sequences of 3016 Rfam families were built with mlocarna v. 1.9.2.2 (Will et al., 2012;11 Rfam families of more than 200 seed sequences, sequence length greater than 1000 nucleotides, or both were not completed due to running time greater than 72 h).Then, PETfold v. 2.1 (Seemann et al., 2008) was used to predict the conserved secondary structure from the retrieved alignment, and Infernal cmbuild v.1.1.2(Nawrocki and Eddy, 2013) to build the CM.The CM relative entropy (sequence) and the HMM relative entropy (structure) were calculated for the CMs of both forward and reverse complementary motif using Infernal cmstat v 1.1.2.The mean relative entropy per match state, in bits, describes the expected (mean) score per consensus position of the CM.The larger the difference between the CM and HMM relative entropy, the more the model will rely on structural conservation relative sequence conservation when identifying members.

16S rRNA phylogenetic tree
To trace the evolutionary history of the B-1 motif among the genus Bacillus, the 16S rRNA phylogenetic tree was made.For that reason, 16S rRNA from the species listed in Table 1 were downloaded from the SILVA SSU database 138 (Yilmaz et al., 2014).As the rRNA is present under multiple copies and evolves under concerted evolution (Espejo and Plaza, 2018), one of the 16S sequences was chosen randomly per genome as their representative.Then, the 21 sequences were aligned and masked using SSU-Align 0.1.1 (Nawrocki, 2009), which aligns the rRNA sequences based on their structure.The final alignment consisted of 1511 positions.The phylogenetic reconstruction was performed by Maximum Likelihood (ML) with IQ-TREE v. 1.6.12(Nguyen et al., 2015) and the substitution model Transitional model 3 (TIM3; Posada, 2003) with empirical base frequencies and using the FreeRate model (Soubrier et al., 2012;Yang, 1995) with three categories ("TIM3 + F + R3") as determined by the minimum Bayesian Information Criterion according to ModelFinder (Kalyaanamoorthy et al., 2017).Branch and node support values were obtained with the ultrafast bootstrap (Hoang et al., 2018) and the Shimodaira-Hasegawa-like approximate Likelihood Ratio Test (Anisimova et al., 2006;Guindon et al., 2010;Shimodaira and Hasegawa, 1999), respectively.Parsimonious reconstruction of the evolutionary history of the B-1 motif trait was done by tracing the presence or the absence of the trait mentioned above over the 16S rRNA ML tree using Mesquite v. 3.61 (Maddison and Maddison, 2019).

B-1 motif phylogenetic tree
To trace the evolutionary history of the sequence of the B-1 motif, its alignment among the genus Bacillus was used.The phylogenetic reconstruction was performed as described before, except that the substitution model was Kimura-2-parameter (K2P;Kimura, 1980) using a discrete Gamma model (Yang, 1994) with four categories ("K2P + G4") according to ModelFinder (Kalyaanamoorthy et al., 2017).Parsimonious reconstruction of the association of the RNA motif with the rRNA clusters and structure ensemble diversity was done tracing the genomic location of the different B-1 motifs over the B-1 motif ML tree using Mesquite v. 3.61 (Maddison and Maddison, 2019).

Evaluation of ensemble diversity to test for specific RNA families
The ensemble diversity was calculated with RNAfold v. 2.4.14 (Lorenz et al., 2011), defined as , where p ij is the basepair probability between nucleotide i and j.We applied the lengthnormalized ensemble diversity, that is, the ensemble diversity divided by the squared sequence length.Its distribution was compared between all Rfam v. 14.1 seed sequences, 30 riboswitch families from Rfam, RNAT families from Rfam, and the B-1 motifs identified among the Bacillus spp.genomes.
To test for structural switches, we calculated the maximal base-pair distance between MFE structure and all suboptimal structures with delta free energy less than 0.5 kcal/mol to the MFE (ViennaRNA2/2.4.14 RNAsubopt -e 0.5 -s) and normalized by the number of base-pairs in the MFE structure ("Base-pair distance per MFE base-pair").The mean value of "Base-pair distance per MFE base-pair" of 4,773 seed sequences of the 30 riboswitch families from Rfam v. 14.1 is 0.463 and was applied as a threshold for RNAs with putative structural switches.

Expression analyses of tiling array dataset concerning temperature and saline concentrations
A publicly available tiling array dataset from Nicolas et al. (2012) was considered, which provides expression data for selected genomic regions at 104 different conditions.We investigated three experimental setups concerning temperature and saline concentrations to evaluate a putative function of the B-1 motif located in the S160 locus as RNAT or salt-related riboswitch.The first experiment tested the impact of the extreme temperatures (51 • C and 16 • C) and the high salinity stress (1.2 M NaCl) in B. subtilis growth.The second experiment was designed to analyze the impact of the temperature shifting (37 • C to 48 • C or 18 • C) and the 4% ethanol stress.The last experiment examined the impact of moderate salinity stress (0.4 M NaCl) in B. subtilis growth.Each experiment consists of three replicates.Wet laboratory procedures were described in detail in Nicolas et al. (2012).

Expression analyses of total RNAseq dataset concerning high-saline concentrations
A publicly available Ion Torrent Proton total RNAseq dataset was applied to study the impact of high-salinity (1.2 M NaCl) in outgrowing spores for B. subtilis (Nagler et al., 2016;SRA project SRP074602).The dataset consists of case/control duplicates at the four time points (0, 30, 60, and 90 min after the experiment).The libraries contain 8 -12.4 million reads.We cleaned the raw reads with trimmomatic v. 0.39 (Bolger et al., 2014) by clipping after positions with a Phred score below ten or an average Phred score of at least ten within a four bp window.Reads shorter than 40 bp were discarded.The filtering reduced the number of reads by at most 8%.Afterward, the cleaned reads were mapped against the RefSeq reference genome (accession ASM904v1) with segemehl v. 0.3.4 using default parameters (Hoffmann et al., 2009).The expression counts for the gene, and UTR features in the BSGatlas v. 1.1 (Geissler et al., 2021) were quantified with featureCount v. 2.0.1 E. González-Tortuero et al. (Liao et al., 2014).We improved the cross-library content by stabilizing the counts against CDS mapping reads after excluding the spores (timepoint 0 min) and highly variant genes between biological replicates (rRNA, tRNA, SRP, tmRNA, and ribozyme).The differential expression due to high salinity was conducted with the LRT option of DESeq2 v. 1.26.0 (Love et al., 2014) in R v. 3.6.3 (R Core Team, 2019).This analysis was conducted in a computationally reproducible framework with snakemake v. 5.7.1 and anaconda v. 4.7.12.

Statistical analyses
We applied Pagel's test of correlated discrete traits (Pagel, 1994) using 1000 replicates to test the association between the presence of the B-1 motifs and the pathogenicity of the different Bacillus spp.To test the association between low or high ensemble diversity B-1 motifs and the different genomic contexts (e.g., proximity to rRNAs), we applied Felsenstein's phylogenetic independent contrasts test (Felsenstein, 1985).In both cases, the test provides Pearson's correlation coefficient of two 21 long vectors corresponding to the studied species where each number is the count of low or high ensemble diversity motifs in a specific genomic context.Pagel's test and Felsenstein's phylogenetic independent contrasts were performed using Mesquite v. 3.61 (Maddison and Maddison, 2019).
For each expression analysis experiment from Nicolas et al. ( 2012), we evaluated the normality and homoscedasticity of the data by performing the Shapiro-Wilk test and Levene's test, respectively.Since the data did not violate any assumption for normality (Shapiro-Wilk [first experiment]: 0.174; [second experiment]: 0.552; [third experiment]: 0.072) and homoscedasticity (Levene's test [first experiment]: 0.799; [second experiment]: 0.392; [third experiment]: 0.392), we looked for differences in the expression levels of the S160 locus between the treatments by performing a one-way ANOVA.To find significant differences between pairs of treatments, we applied the Tukey HSD.In the case of the moderate salinity stress, the Student's t-test was applied to test for the differential expression of S160.All these statistical tests were performed in R v. 3.6.0(R Core Team, 2019) using the car package (Fox and Weisberg, 2019).

RNA motif is enriched for non-pathogenic Bacillus
We searched the genomes of 21 bacteria, comprising 20 Bacillus species (spp.) and Listeria monocytogenes as an outgroup (see Methods), for regions that align to the RNA structure of the B-1 motif as described by its covariance model (CM) in Rfam (Kalvari et al., 2021).Our results in B. subtilis str.168 are consistent with the Rfam annotation, which annotates nine hits with a score cutoff of 35, while we find ten hits with a score cutoff of 29.However, two of the ten hits in B. subtilis have much lower scores of 30.9 and 37.0 than the remaining hits with scores above 59.Therefore, in the following, we focus on the eight most confident hits for our in-depth analysis in B. subtilis (Table 2; Supplementary Table S1).
Within our score cutoff of 29, we detected 118 hits of the B-1 motif in 11 Bacillus spp., of which 88 hits in 10 species are exact matches to the curated set of representative sequences for the family, i.e., Rfam seed sequences.The CM build of the 118 hits retained the same structure information content and slightly increased sequence information content as the respective Rfam model built from 95 seed sequences, all from the Bacillus genus (Fig. 1A and B).The latter results from a lower average pairwise sequence identity of the 118 hits considered in this study compared to the Rfam seed alignment (59.5% and 61.2%, respectively).We did not find more distant members of the B-1 motif in the 21 genomes by applying RNA structure-based local alignments (see Methods).
We found the B-1 motif in the Bacillus clades II, Geo/Ano, III, VI, and VII (Hernández-González et al., 2018; Table 2).Tracking the B-1 motif trait over the 16S rRNA Maximum Likelihood (ML) tree of the genus Bacillus suggests that it was present on some basal Bacillus spp.such as B. infantis or Oceanobacillus iheyensis (Table 2; Fig. 2).Thus, the B-1 motif was fixed among the genus Bacillus evolution, although it had been lost for two clades.Recently, multiple cis-and trans-acting ncRNA elements were described in host-defense processes and virulence-relevant physiological and metabolic processes in bacteria (Ahmed et al., 2016;Diallo and Provost, 2020;Heroven et al., 2017;Krawczyk-Balska et al., 2021).To evaluate if the presence of the B-1 motif is associated with pathogenicity, Pagel's test was performed.In short, such a test evaluates the relationship between two discrete characters considering the evolution of these traits across the phylogenetic tree (Pagel, 1994).Interestingly, the presence of the B-1 motif and the pathogenicity trait are significantly negatively correlated (Pagel's test: Likelihood Ratio = -0.521;P = 0.001).Indeed, none of the pathogenic and symbiotic Bacillus spp.has a B-1 motif associated with rRNA clusters.

Evolution and genomic context among the genus Bacillus
The evolutionary history (ML tree) of all detected B-1 motif sequences (Supplementary Fig. S2) is not congruent with the 16S rRNA ML tree (Fig. 2).The B-1 motifs are clustered by neither species nor genomic context (adjacent genes).The only exception is comprised by 14 hits in clade III (B. licheniformis, B. subtilis, B. amyloliquefaciens, and B. pumilus;Hernández-González et al., 2018) that are grouped by both species and genomic context, i.e., located between coding sequences (CDSs) and 16S rRNA (Supplementary Fig. S2).However, not all rRNA clusters have an upstream flanking B-1 motif.
In B. subtilis, four of the eight B-1 motifs are adjacent to the 16S rRNA, one is located between the 23S and the 5S rRNA, and the remaining three are located between two CDSs (Table 2).We used the sequences of the adjacent up-and down-stream genes to identify syntenic regions covering the B-1 motif among the genus Bacillus.Homologous genes were searched using BLASTn and tBLASTx (Camacho et al., 2009), disabling the DUST and the SEG (Wootton and Federhen, 1996) masking, respectively (Supplementary Table S1).A B-1 motif between 23S and 5S rRNA exists in several copies in B. licheniformis and Oceanobacillus iheyensis as part of the rRNA cluster.The genomic context of all four upstream flanking B-1 motifs of 16S rRNA is conserved in at least

Table 2
Coordinates and genomic context of eight B-1 motifs in the reference genome B. subtilis str.168.Genes are categorized as up-and downstream relative to the B-1 motif.Strand specifies the genomic orientation of the up-stream gene, B-1 motif, and down-stream gene.

Mirrored conserved RNA secondary structures coexist on both strands
The structure-based alignments of the reverse complementary sequences of Rfam families result in CMs that are numerically equally good scoring as the Rfam model (see Methods and Supplementary Fig. S1).The same applies for the 118 hits in the 11 Bacillus spp.(CM relative entropy of 0.838 for both Rfam and the reverse complementary  2), (C) reverse complementary (revcomp B-1) motif in 11 Bacillus spp.The CM relative entropy, a measure of structure (base-pair) information in the alignment, is 0.838 for all three alignments (A-C).The HMM relative entropy, a measure of sequence information in the alignment, is 0.622 for (A), 0.640 for (B), and 0.618 for (C).A nucleotide of red, black, or grey color indicates whether its weighted (distances between the sequences to cluster them in a bifurcating tree) frequency exceeds 97%, 90%, and 75%, respectively.A circle of red, black, or grey color indicates whether a nucleotide is present in the position with weighted frequency more than 97%, 90%, or 75%, respectively.Green shady base pairs indicate covarying pairs, and blue shady base pairs are compatible.The figures were made using R2R (Weinberg and Breaker, 2011).model).The retrieved reverse complementary structure is mirrored to the original B-1 motif (Fig. 1C) with folding energy comparable to the Rfam model.The loop sequence motif described in Rfam (nucleotide consensus RUCCU) is also reverse complementary (Shine-Dalgarno (SD)like AGGAY; Band and Henner, 1984;Wei et al., 2017), allowing putative loop-loop interactions between instances of the forward and reverse complementary structure.A similar feature exhibits the ydaO/yuaA leader harboring a pseudo-symmetrical, pseudo-dimeric conserved secondary structure in multiple thermophilic microorganisms (Gao and Serganov, 2014;Jones and Ferré-D'Amaré, 2014;Ren and Patel, 2014).We term the reverse complementary motif as revcomp B-1 motif.
During the development of the RNA motif database RMfam (Gardner and Eldai, 2015), the authors hypothesized that the terminal loops of the revcomp B-1 motif could be related to carbon regulation as they match the carbon storage regulator A (CsrA)/RsmA binding motif.The CsrA/ RsmA binding motif is a stem-loop with the loop sequence GGA and occurs in pairs in Gammaproteobacteria (Gardner and Eldai, 2015).CsrA generally binds the SD sequence of some mRNAs associated with carbohydrate metabolism and cellular communication to inhibit their expression and, thus, to ease mRNA decay (Liu et al., 1997;Mercante et al., 2009).Hence, the loop sequence GGA of the revcomp B-1 may exhibit a binding site for RNA binding proteins.

The B-1 motif comes in two distinct structural classes
The two hairpin loops of the B-1 motif are conserved between all CM hits despite small variations of the minimum free energy (MFE) structure (see Supplementary Methods S3).To hypothesize functional implications of RNA structures, we inspected their thermodynamic ensemble diversity that measures the dissimilarity of structural alternatives.Due to its length dependency (see the formula in Methods), we normalized the ensemble diversity by the square of sequence length.The lengthnormalized ensemble diversity for 118B-1 motifs shows two modes, one close to zero and one in the range of 0.0017 to 0.00572 (Fig. 3A).The revcomp B-1 motifs are dominated by low ensemble diversity (Fig. 3B).The two modes are not correlated to the nucleotide distance between the motif sequences as they do not cluster in the B-1 motif ML tree (Supplementary Fig. S3).Most of the B-1 motifs with low ensemble diversity also have a revcomp B-1 motif with low ensemble diversity (Supplementary Fig. S136A).In contrast, only 40% of B-1 motifs with high ensemble diversity also have a reverse complementary structure with high ensemble diversity (Supplementary Fig. S136B).Based on the observation that B-1 motif members can be separated into low and high structural diversity, we compared their ensemble diversity to known regulatory RNAs: all Rfam seed sequences, 30 Rfam riboswitch families, and 31 Rfam RNAT families.As riboswitches and RNATs can fold into alternative RNA structures, albeit metabolite binding or temperature changes induce the refolding, we expect that they exhibit distinguishable ensemble diversity.In Fig. 3A, we see increased ensemble diversities of riboswitches and RNATs compared to all other Rfam families (mean ± standard deviation: 0.0019 ± 0.0014, 0.0015 ± 0.0010, 0.0014 ± 0.0011, respectively).Nevertheless, only for riboswitches and not RNATs, the fraction of members with high structural diversity is higher than among all other Rfam seed sequences (Fisher's Exact Test P = 3 × 10 -77 and P = 0.5627, respectively).Supplementary Fig. S4 shows that most riboswitch families in Rfam have larger median lengthnormalized ensemble diversity than all other Rfam families.About half of the riboswitch families have median length-normalized ensemble diversity below our threshold, which can be, at least partly, explained as Rfam alignments often cover only the ligand-binding domains of riboswitches, so alternative structures are not necessarily exhibited in the Rfam seed sequences.However, more importantly, a fraction of B-1 motifs different from those 5 ′ of 16S rRNA has rather high structural diversity similar to the other half of riboswitch families (Supplementary Fig. S4A), whereas the revcomp B-1 motifs have not (Supplementary Fig. S4B).Interestingly, almost all B-1 and their reverse complementary motifs 5 ′ of 16S rRNA have length-normalized ensemble diversity less than 0.0017 (Fig. 3).The exceptions are three B-1 motifs in B. pumilus from the same strand as the 16S rRNA.Based on these observations, we propose that this single Rfam family consists of two distinct classes.The first class has a length-normalized ensemble diversity of less than 0.0017.Both forward and revcomp B-1 motifs upstream flanking of a 16S rRNA are almost exclusively members of this class.The second class has length-normalized ensemble diversity larger or equal than 0.0017 and may act as structural switches comparable to riboswitches.

The class of stable structures comprises upstream flanking motifs of 16S ribosomal RNA
Bacterial 16S rRNA, 23S rRNA, and 5S rRNA are organized as co- transcribed operons.40B-1 motifs are associated with the rRNA cluster, of which 19 are 5 ′ of 16S.The B-1 motif also occurs in the internal transcribed spacer between 23S and 5S, and between 16S and 23S rRNA.Twenty-six of rRNA associated B-1 motifs, and 36 reverse complements have length-normalized ensemble diversity lower than 0.0017 (Supplementary Fig. S5; Supplementary Table S2).Interestingly, all 19 revcomp B-1 motifs upstream of 16S rRNA have low structural diversity (Fig. 3B).Despite the clear association between B-1 motifs of low structural diversity and their location upstream flanking 16S rRNA, this class is not restricted to this genomic location (Felsenstein's phylogenetic independent contrasts test ([B-1: 5 ′ 16S rRNA vs. not 5 ′ 16S rRNA]: Pearson's r = 0.154; P = 0.252, and [revcomp B-1: 5 ′ 16S rRNA vs. not 5 ′ 16S rRNA]: Pearson's r = 0.053; P = 0.409; see also Fig. 3 and Supplementary Fig. S5 in comparison).Whereas B-1 motifs located in the internal spacers of the rRNA transcriptional unit are preferably in the same orientation as the rRNA, B-1 motifs upstream of 16S rRNA are preferably located on the reverse strand and share high sequence similarity (Supplementary Fig. S2).
The B. subtilis genome has ten copies of the rRNA cluster, and five of them have one B-1 motif located either 5 ′ of 16S or between 23S and 5S (Table 2).For evaluating the transcriptional activity of the rRNAassociated motifs, we investigated two public expression datasets in B. subtilis (see Methods): A large scale tiling array study of over a hundred diverse environmental conductions (Nicolas et al., 2012) and an RNAseq study of outgrowing spores under extreme salt stress (1.2 M NaCl; Nagler et al., 2016).Both datasets support that all four revcomp B-1 motifs are transcribed with their downstream 16S rRNA (Fig. 4, Supplementary Fig. S6-9).In contrast, there is little to no expression of the actual B-1 motif, e.g., no antisense expression 5 ′ of 16S.This observation suggests that in B. subtilis only the reverse complement of the B-1 motif is transcribed as part of the rRNA transcriptional unit.Furthermore, the RNAseq data under high-salinity (Nagler et al., 2016) shows high expression elevation of both revcomp B-1 motif (log2 fold change (log2FC) ≈ 4.5 after 30 min) and 16S rRNA (log2FC ≈ 10 after 30 min) after introducing salt stress (1.2 M NaCl) but with a much higher expression level for 16S (Fig. 4, Supplementary Table S4).Over time, the expression level of the entire transcriptional unit goes down again to normal.Due to the typically large amount of rRNA expression, we cannot entirely exclude the possibility that the expression signal of the B-1 motifs upstream flanking of the 16S rRNA is transcriptional noise.However, B-1 motif expression is supported by the conservation of the B-1 motif upstream flanking of the rRNA cluster and its putative function in 16S rRNA maturation (see below).
In E. coli, the endoribonuclease RNase III cleaves the primary rRNA transcript to separate the individual rRNAs.The complementary sequences flanking 16S rRNA are cleaved at positions + 115 (5 ′ ) and + 33 (3 ′ ) by RNase III during rRNA maturation-Fig.7 in Li et al. (1999).We ran RNAcofold with default options (Lorenz et al., 2011) on 150 nucleotides upstream and 50 nucleotides downstream of all ten 16S rRNAs in B. subtilis, of which four 16S rRNAs have an upstream revcomp B-1 motif, and six do not.Interestingly, the RNase III cleavage site in E. coli perfectly matches the start of long-range interaction between the flanking loci of the four 16S rRNA with the revcomp B-1 motif upstream of 16S (Supplementary Fig. S14).In contrast, all six 16S rRNAs without an upstream revcomp B-1 motif have weak base pairing, no base pairing or pairing at alternative positions between the flanking regions (Supplementary Fig. S15).Already Ogasawara et al. (1983) had found several cleavage sites for 16S, of which one is located at the 5 ′ -terminal end of a complex secondary structure, i.e., the B-1 motif, adjacent to 16S rRNA in one specific rRNA operon in B. subtilis (rrnO).Further, through in vitro transcription, they showed that this unusually processed transcript is stable to rifampicin treatment and is only detectable in germinating spores, but neither in the spore itself nor during exponential growth.
As all 16S rRNA-associated motifs have one fixed structure, this may be a consequence of the RNA mimicry.Such RNA mimicry phenomenon is defined as the structural similarity of some mRNA regulatory elements and other RNA elements such as ribosomal RNAs (Meyer et al., 2018).RNA mimicry plays an essential role in gene regulation (Marzi and Romby, 2012), and hence, this class of B-1 motifs could play a role in ribosomal gene regulation.We also considered the possibility that the SD-like sequences in both terminal loops of the revcomp B-1 motif help initiate the translation of predicted downstream ORFs (Supplementary Methods S5).However, we rejected this putative function because only the 16S rRNA associated revcomp B-1 motifs are supported through transcriptional activity and an annotated peptide homolog.S2).The genomic context of B-1 motifs and revcomp B-1 motifs of high structural diversity is not random.This class is mainly found between coding sequences (Felsenstein' = 3 × 10 -3 ).Of 71 non-hypothetical proteins adjacent to B-1 motifs, 23 proteins were associated with cations such as Na + and Mn 2+ or related to the sporulation process (Supplementary Table S1).As support for RNA switches that reorganize their structure within a preserved free energy range, we checked for suboptimal structures with large base-pair distance but comparable free energy to their MFE structure (energy difference of less than 0.5 kcal/mol; see Supplementary Table S2, Supplementary Fig. S16-S133).We found 34 B-1 motifs and 23 revcomp B-1 motifs (four instances with both forward and reverse motif) that satisfy our criteria for this specific type of structural switches (see Methods and Supplementary Fig. S134).The majority of these putative RNA switches are not associated with the rRNA clusters (28 B-1 and 17 revcomp B-1).We also checked if members of the B-1 motif may behave like RNATs but did not find any such association (see Supplementary Methods S4).

The class of structurally diverse motifs comprises putative in trans acting structural switches
In B. subtilis, the two B-1 motifs of high structural diversity (lengthnormalized ensemble diversity of 0.0027; 0.0022 for revcomp B-1) have identical sequences.As one is located in the genomic region S160, we have termed the genomic region of the other as S160*.The B-1 motif in S160 and S160* has two likely variants of the large hairpin loop, but the revcomp B-1 motif has not (Fig. 5).The hairpin loop matching the Rfam model is the non-MFE structure.The two alternative hairpins of the B-1 motif result in different loop sequences: CGU (MFE) and GUCCU (Rfam model).The public RNAseq experiment of Nagler et al. (2016) tested in duplicates the time-series effect of the maximum tolerated salinity in B. subtilis (1.2 M NaCl) on outgrowing spores (Boch et al., 1994).The expression profiles of this dataset clearly show that the B-1 motifs in both S160 and S160* are expressed as individual transcripts and independently from their adjacent coding genes (Supplementary Fig. S10-S11).The difference in expression due to high salinity was detected for 32% of all coding genes, but also 45 riboswitches, nine sRNA, and 14 putative ncRNA (DESeq2 Log-likelihood Ratio Test (LRT): FDR-adjusted p less than 0.01).Both B-1 motifs in S160 and S160* had significantly decreased expression levels due to high salinity (Fig. 4 and Supplementary Table S4; log2FC of − 1.3 and − 2.2, respectively).In contrast, the up-and downstream coding genes ydbT and ydcA of S160 had increased expression levels for high salinity (log2FC = 1.1), the upstream niaP of S160* had a decreasing expression level in the same log2FC magnitude as B-1 (log2FC = − 1.9), though, the downstream yceJ of S160* had weak expression under all conditions (Supplementary Table S4).The difference in expression between the high salinity and the control condition of both B-1 motifs and their adjacent genes decreased with time.
After re-examining the tiling array data from Nicolas et al. ( 2012) and the RNAseq data from Nagler et al. (2016), it seems that the B-1 motif in the S160 locus was inhibited under heat and cold stress conditions, while the same motif was highly expressed under moderate salinity stress conditions.Moreover, the opposite was observed under high salinity stress.This observation agrees with previous empirical studies on the S160 locus (Mars et al., 2016), and it could suggest the role of the intergenic B-1 motif as a putative salt-related RNA switch.In fact, in both independent expression studies, B-1 was lower expressed  under 1.2 M NaCl.In the RNAseq data, the downstream neighbor gene ydcA was up-regulated in equal proportion (slightly above two-fold).The Nagler et al. dataset started from outgrowing spores, whereas the Nicolas et al. dataset started with fully-grown cells subjected to stress only after growth.Despite the drastically different developmental stages in both experiments, the expression of the RNA motif was significantly different as a consequence of the salt priming (Elabed et al., 2019;Navada et al., 2020).This is consistent with salt level-dependent effects on expression level and RNA structure of the motif; however, a secondary effect may be more likely as many genes, such as those involved in oxidative and osmotic stresses, changed expression due to changed salt concentration (Höper et al., 2006;Nicolas et al., 2012;Rath et al., 2020).
Based on the available public expression data and thermodynamic folding, we hypothesize that at least a subset of the class of structurally diverse B-1 motifs acts as structural switches, including the B. subtilis hits in S160 and S160*.The lack of synteny conservation of the structurally diverse B-1 motifs in Bacillus spp.suggests their function in trans.A speculative function of the trans-acting members of the B-1 motif family is the inhibition of the rRNA maturation during salt-induced stress as the sequences of the suboptimal large hairpin loops of S160 and S160* B-1 motifs and 16S associated revcomp B-1 motifs are complementary, and their differential expression goes in the opposite direction under high salt levels.

Conclusion
In this study, we found that the 118 hits of the B-1 motif in 11 Bacillus spp.have different structural diversity and genomic context concerning the rRNA cluster.We characterized in detail the members in B. subtilis and found that the four B-1 motifs upstream of 16S rRNAs are not transcriptionally active, but instead, their reverse complement (revcomp B-1 motif) is expressed together with 16S rRNA.These revcomp B-1 motifs are similar to the B-1 motif with two hairpin loops and low structural diversity.Public RNAseq data (Nagler et al., 2016) supports increased expression under salt stress of the entire SSU rRNA cluster and the upstream flanking revcomp B-1 motif.However, none of the other revcomp B-1 motifs is expressed.Furthermore, the revcomp B-1 motifs upstream of the 16S rRNA exclusively coexist with a long-range interaction of 16S rRNA flanking loci consistent with RNAse III's target site during rRNA maturation in E. coli (Li et al., 1999).Two identical B-1 motifs of very different characteristics than the rRNA-associated motifs are located inside the B. subtilis' intergenic loci S160 and S160*.They are expressed independently from their adjacent genes under several conditions (Nagler et al., 2016;Nicolas et al., 2012), and both motifs are differentially inhibited under salt stress.Their high structural diversity suggests that they act as RNA switches.As their genomic context is not conserved, they likely act in trans.Our findings suggest two distinct classes of the RNA family RF01690: (I) motifs of one stable RNA structure and (II) motifs with (mainly two) alternative structures.Expression data in B. subtilis and synteny conservation support that class I contains cis-acting B-1 motifs that regulate the fate of downstream 16S rRNAs, and class II contains B-1 motifs that act in trans as structural switches.However, further studies based on structural biology and peptide expression experiments are necessary to elucidate the mechanism involved in the function of members of the B-1 RNA motif.
one other Bacillus spp.than B. subtilis.The adjacent genes YgxA and ThiT were only present in the clade III Bacillus spp.Moreover, the TPP riboswitch was always present before ThiT.A continuous intergenic region between GyrA and the 16S rRNA was noticed in the clades I, III, and IV; however, the B-1 motif only co-occurs in B. licheniformis.In contrast, the genomic context of the three B-1 motifs surrounded by two CDSs is not conserved among Bacillus spp.For instance, the upand down-stream genes, ydbT, and ydcA, of the genomic segment S160(Nicolas et al., 2012; Table 2)  were only detected in B. amyloliquefaciens and B. licheniformis, both of them belonging to the clade III.However, the intergenic region between these two genes (S160 locus including the B-1 motif) exclusively exists in B. subtilis.For the other two B-1 motifs flanked by two CDSs, only one adjacent gene had homologs in the other species, namely NiaP in Bacillus spp.clades I, Geo/Ano, III, and IV, and FdhD in B. licheniformis and B. amylolichefaciens, respectively.

Fig. 1 .
Fig. 1.Structure conservation of B-1 motifs among the genus Bacillus.(A) Rfam seed, (B) 11 Bacillus spp.(see Table2), (C) reverse complementary (revcomp B-1) motif in 11 Bacillus spp.The CM relative entropy, a measure of structure (base-pair) information in the alignment, is 0.838 for all three alignments (A-C).The HMM relative entropy, a measure of sequence information in the alignment, is 0.622 for (A), 0.640 for (B), and 0.618 for (C).A nucleotide of red, black, or grey color indicates whether its weighted (distances between the sequences to cluster them in a bifurcating tree) frequency exceeds 97%, 90%, and 75%, respectively.A circle of red, black, or grey color indicates whether a nucleotide is present in the position with weighted frequency more than 97%, 90%, or 75%, respectively.Green shady base pairs indicate covarying pairs, and blue shady base pairs are compatible.The figures were made using R2R(Weinberg and Breaker, 2011).

Fig. 2 .
Fig. 2. Phylogenetic reconstruction of the genus Bacillus based on the 16S rRNA.Branch colors indicate the presence (red) or absence (black) of the B-1 motif.Those branches with a Shimodaira-Hasegawa approximate Likelihood Ratio Test support of less than 50 or a bootstrap value of less than 50 were collapsed.The scale indicates the average number of nucleotide substitutions per site.The figure was constructed using Mesquite v. 3.61 (Maddison and Maddison, 2019).

Fig. 3 .
Fig. 3. Length-normalized ensemble diversity of 118B-1 motifs found in 11 Bacillus spp.(A) B-1 motif; (B) revcomp B-1 motif.Both panels show B-1 motifs as a histogram (bin width of 0.00025 normalized ensemble diversity and dodge positioning) and Gaussian kernel density estimate (scaled to the maximal count).Panel A adds kernel density estimates for 4774 Rfam seed sequences of riboswitch families, 174 Rfam RNATs, and all other 67,104 Rfam seed sequences.Panel B adds kernel density estimates for 4286 reverse complementary Rfam seed sequences of riboswitch families, 174 Rfam RNATs, and other 65,276 Rfam seed sequences.Motifs 5 ′ of 16S rRNA have mostly length-normalized ensemble diversity of less than 0.0017, with three exceptions in B. pumilus (all cases on the same strand as 16S rRNA).We propose a cut-off for structural switches of ≥ 0.0017 (dashed vertical line).
52 B-1 motifs (35 revcomp B-1 motifs) have a length-normalized ensemble diversity larger or equal to 0.0017, and 38 (31 revcomp B-1 motifs) of them are located between two CDSs (Supplementary Table Fig. 4. Normalized expression counts of eight B-1 motifs and their revcomp B-1 motifs in B. subtilis.Nagler et al. dataset provide duplicate samples for high-salinity treatment and control at 30, 60, and 90 min.Lines are LOESS regression with the grey area indicating the standard error.The data supports that all four revcomp B-1 motifs upstream flanking of 16S rRNA are active and that both B-1 motifs in S160 (ydbT -B-1 -ydcA) and S160* (niaP -B-1 -yceJ) are differentially inhibited under salt stress.

Fig. 5 .
Fig. 5. Base-pairing probability matrices of (A) B-1 motif and (B) revcomp B-1 motif in B. subtilis region S160 and S160*.The lower triangle shows the MFE structure.The base-pairing probabilities in the upper triangle were calculated by RNAfold from the Vienna RNA package.Cells with a green background present the Rfam structure (A) and its reverse complementary structure (B).The structurally diverse B-1 motif and revcomp B-1 motif have length-normalized ensemble diversity of 0.0027 and 0.0022, respectively.However, only the B-1 motif has a high probability of two alternative long stem-loops.

Table 1
., Number of B-1 motifs detected among 20 Bacillus spp.and one outgroup.the RF01690 family with Infernal cmsearch v. 1.1.2.To detect more distant hits, we used a score cutoff of 17.5, which is half the size of the Rfam gathering score threshold (the command line used is cmsearch -Z 8 -T 17.5).All hits that passed a bit score of 29 were considered for further analyses.We operate with the score threshold of 29 in contrast to the Rfam gathering score cutoff of 35 since we (1) observed an extra hit in Bacillus subtilis with a score of 30.9 and (2) observed a gap in the scores between 27 and 29 if considering all hits in all genomes.The ten intergenic regions harboring the B-1 motifs in B. subtilis str.168 were structurally aligned by Foldalign v. 2.5.2