Comparative genomics provides structural and functional insights into Bacteroides RNA biology

Bacteria employ noncoding RNA molecules for a wide range of biological processes, including scaffolding large molecular complexes, catalyzing chemical reactions, defending against phages, and controlling gene expression. Secondary structures, binding partners, and molecular mechanisms have been determined for numerous small noncoding RNAs (sRNAs) in model aerobic bacteria. However, technical hurdles have largely prevented analogous analyses in the anaerobic gut microbiota. While experimental techniques are being developed to investigate the sRNAs of gut commensals, computational tools and comparative genomics can provide immediate functional insight. Here, using Bacteroides thetaiotaomicron as a representative microbiota member, we illustrate how comparative genomics improves our understanding of RNA biology in an understudied gut bacterium. We investigate putative RNA‐binding proteins and predict a Bacteroides cold‐shock protein homolog to have an RNA‐related function. We apply an in silico protocol incorporating both sequence and structural analysis to determine the consensus structures and conservation of nine Bacteroides noncoding RNA families. Using structure probing, we validate and refine these predictions and deposit them in the Rfam database. Through synteny analyses, we illustrate how genomic coconservation can serve as a predictor of sRNA function. Altogether, this work showcases the power of RNA informatics for investigating the RNA biology of anaerobic microbiota members.

sRNAs were enabled by the availability of multiple enterobacterial genome sequences that allowed for the evaluation of sequence conservation (Argaman et al., 2001;Rivas et al., 2001;Wassarman et al., 2001). Conservation has also been used as evidence of function for particular regions of sRNAs (Papenfort et al., 2010) and appears to be a general marker of target interaction sites (Peer & Margalit, 2011;Richter & Backofen, 2012). The use of comparative genomics to determine secondary structure has an even longer history; for instance, manual comparisons of related sequences allowed the determination of highly accurate secondary structures for basic biomolecules such as tRNAs (Madison et al., 1966) and rRNAs (Woese et al., 1983) decades before experimentally determined structures were available. This basic approach, mathematically formalized and computationally automated, has driven a long series of noncoding RNA (ncRNA) discoveries in bacterial genomes. These include identification of 6S as an ancient housekeeping RNA in bacteria (Barrick et al., 2005), the discovery of an astonishingly diverse menagerie of metabolite-responsive riboswitches (McCown et al., 2017), studies of autoregulatory sequences in the untranslated regions of mRNAs encoding ribosomal proteins (Fu et al., 2013), and many more. As these approaches only require access to genomic sequences, they are particularly well suited to investigating the structure and function of noncoding RNAs in organisms that are difficult to cultivate or manipulate.
Bacteroides spp. are dominant colonizers of the mammalian large intestine (Wexler, 2007) and have emerged as model organisms for the microbiota (Wexler & Goodman, 2017). These obligate anaerobic, Gram-negative bacteria are metabolic specialists with the capacity to break down a variety of dietary fiber-and host mucus-derived polysaccharides, thereby facilitating nutrient absorption by the gut epithelium (Glowacki & Martens, 2020). Additionally, intestinal Bacteroides spp. protect their host from enteric infections by priming the development of the immune system and providing colonization resistance against pathogens (Buffie & Pamer, 2013;Hooper et al., 2000). In contrast to the situation in the model organisms of the Proteobacteria, Bacteroidetes RNA research is only in its infancy (Ryan et al., 2020b). For example, in contrast to other Gram-negative phyla where sRNA function is often tied to assisting RNA chaperones such as Hfq (Kavita et al., 2018) or FinO domain-containing proteins such as ProQ (Olejniczak & Storz, 2017), no global RBP is known in Bacteroidetes, though RRM-domain-containing proteins have recently been suggested as candidates (Adams et al., 2021).
RNA sequencing (RNA-seq) was applied to Bacteroides fragilis (Cao et al., 2016) and Bacteroides thetaiotaomicron (Ryan et al., 2020a)-the two workhorses of Bacteroides research-and revealed hundreds of noncoding RNA candidates, yet their conservation, secondary structures, and functions have not been determined systematically. For example, our differential RNA-seq (Sharma & Vogel, 2014) screen identified 151 sRNAs in B. thetaiotaomicron type strain VPI-5482 including 124 intergenically encoded candidates (Ryan et al., 2020a). As of now, however, only two trans-encoded B. thetaiotaomicron sRNAs have been functionally characterized: GibS, which posttranscriptionally binds and controls the expression of metabolic target mRNAs (Ryan et al., 2020a), and RteR, which likely acts as a cotranscriptional repressor of a transposon operon (Waters & Salyers, 2012). For GibS, we recently determined the secondary structure-composed of a single-stranded 5′ portion harboring the seed sequence, two meta-stable hairpins in the central region, followed by the intrinsic terminator hairpin-by a combination of computational prediction and chemical/enzymatic validation (Ryan et al., 2020a). In the case of RteR, a structure consisting of a long 5′ hairpin, an eight nt-long single-stranded stretch, followed by the 3′ terminator was proposed based on minimum free energy (MFE) calculations (Waters & Salyers, 2012). It is, however, currently unclear to what degree the structures and working mechanisms of GibS and RteR might be representative for the bulk of sRNAs in this genus.
Here, we apply a suite of in silico analyses to improve our understanding of Bacteroides sRNA biology. We start with a computational search for Bacteroides proteins with known RNA-binding domains, identifying a conserved Bacteroidetes cold-shock protein (CSP) as a putative RBP in this phylum, and examine global properties of Bacteroides sRNAs in comparison with Proteobacteria and Firmicute sRNAs. We then predict structures of a number of previously identified intergenic B. thetaiotaomicron sRNAs and core ncRNAs, harnessing information about covarying base-pairing ribonucleotides.
Using structure probing, we validate and refine the predicted consensus structures for selected ncRNAs, namely the ubiquitous 6S and 4.5S RNAs as well as two intergenic sRNAs (an RteR homolog and BTnc201). The curated alignments and consensus structures of Bacteroides RNA families have been deposited in the Rfam database (Kalvari et al., 2018). Finally, based on the assumption that adjacently encoded genes and operons might be functionally connected, we perform synteny analyses to predict biological pathways that individual sRNAs might be involved in. Altogether, this study illustrates the power of in silico approaches to infer structural and functional features of sRNAs in human-relevant bacterial species for which the experimental toolkit is still in the making.

| RBP candidates in Bacteroides
Bacterial species of the Proteobacterium phylum-including the genuses Escherichia and Salmonella-have long served us as model Gramnegative organisms including for RNA research (Hör et al., 2020).
In stark contrast, RNA biology is poorly understood in the distantly related genus Bacteroides (Figure 1a) (Ryan et al., 2020b). Given the profound impact global RBPs exert on proteobacterial sRNA function, we first performed an in silico search for Bacteroides proteins harboring known RNA-binding domains using Pfam release 32 (El-Gebali et al., 2019). Neither an Hfq nor a ProQ homolog could be identified ( Figure 1b). Likewise, CsrA/RsmA-an otherwise highly conserved translational regulatory RBP (Romeo & Babitzke, 2018)-is absent from the Bacteroidetes. Instead, the analysis yielded B. thetaiotaomicron proteins with domains found occasionally in other bacterial RBPs, namely putative K homology (KH), RNA recognition motif (RRM), or cold-shock domains (CSDs). Specifically, BT_2563, BT_4417, BT_3835, BT_2721, and BT_3403 contain KH domains (Pfam IDs: PF00013,   PF07650, PF13184), BT_0784, BT_1887, and BT_3840 contain each an RRM-1 domain (PF00076), whereas BT_1884 harbors a putative CSD (PF00313) (Figure 1c). Based on their homology to known transcriptional regulators (BT_3403 is a homolog of the transcription termination/antitermination factor NusA), ribosomal proteins and biogenesis factors (BT_3835 is a putative GTP-binding protein with a role in ribosome biogenesis; BT_2721 is the 30S ribosomal protein S3), or ribonucleases (BT_2563 is a putative polyribonucleotide nucleotidyltransferase; BT_4417 is a putative ribonuclease Y), the B. thetaiotaomicron KH-containing proteins appear unlikely to have a global RNA chaperone function. In contrast, the three RRM-1 proteins and the CSP could indeed act as regulatory RBPs in Bacteroides.

| Cold-shock protein BT_1884 has an RNA-related function
Closer inspection of these candidates sparked our interest in the CSD-containing protein BT_1884, as the eggNOG-mapper tool for functional annotation (Huerta-Cepas et al., 2017) suggested it to be a likely homolog of CspC/E (COG1278)-known global RBPs in Enterobacteriaceae with hundreds of mRNA and sRNA ligands (Michaux et al., 2017). We generated a B. thetaiotaomicron deletion mutant devoid of this protein (strain ∆BT_1884), its complementation in trans under its native promoter (strain BT_1884 + ), and a strain overexpressing BT_1884 from a strong constitutive phage promoter  Table S1). Note that strain BT_1884 + harbors BT_1884 as a standalone gene, rather than as part of an operon as in the wild type (Supplementary Figure S1a). As a likely consequence, we detected BT_1884 mRNA at elevated levels (~100-fold) in this transcomplemented strain compared with the wild type (Supplementary Figure S1b). Based on observations made in Proteobacteria (Chao & Vogel, 2010;Romeo & Babitzke, 2018), where global RBP deletion or overexpression may result in pleiotropic effects, including altered growth rates, we recorded growth curves of the B. thetaiotaomicron strains with varying BT_1884 levels. However, at least when cultured in rich media at 37℃, depletion or constitutive expression of BT_1884 had no obvious effect on bacterial growth (Supplementary Figure S1c).
Next, we isolated total RNA from the same B. thetaiotaomicron strains in the stationary phase. Assuming that sRNA-binding proteins may affect the abundance of their ligands in the bacterial cell, we profiled the steady-state levels of 11 randomly selected sRNAs by northern blot (Figure 1d and Supplementary Figure S1d). None of the selected sRNAs showed altered levels upon the deletion of BT_1884, not surprising in light of the low expression of endogenous BT_1884 mRNA in the wild-type strain (Supplementary Figure S1b; Ryan et al., 2020a). In contrast, two sRNAs were slightly (approximately twofold) yet significantly upregulated in the BT_1884 overexpression background compared with the wild-type strain, namely 5′ UTRderived BTnc013 and intergenic BTnc201 (Figure 1d). Overexpression of BT_1884 did not show an effect on the steady-state levels of the remaining nine sRNAs (Supplementary Figure S1d). Mature 5S ribosomal RNA (rRNA) was used as our loading control on the northern blots. Interestingly, whereas the levels of mature 5S (~110 nt) were not affected by the BT_1884 status, a putative precursor 5S (~150 nt) accumulated in a BT_1884-dependent manner (Figure 1d; Supplementary Figure S1e). In Escherichia coli, RNA helicase RhlE is implicated in rRNA processing and ribosome maturation (Jain, 2008) and the B. thetaiotaomicron RhlE homolog (BT_1885) is encoded upstream of BT_1884 within the same operon (Supplementary Figure S1a). However, we are confident that the observed effect on 5S rRNA processing is due to altered BT_1884 levels and independent of RhlE, because we excluded major polar effects in the B. thetaiotaomicron mutant strains (Supplementary Figure S1b).
Salmonella deletion mutants devoid of both CspC and CspE exhibit severe phenotypes, ranging from an increased sensitivity to peroxide, bile salts, and antimicrobial peptides to defects in biofilm formation, motility, and pathogenesis (Michaux et al., 2017). These phenotypes are likely a consequence of aberrant transcript levels for CspC/E ligands in the double mutant background and could be reverted when complementing with either CspC or CspE, establishing functional redundancy between these two Salmonella CSPs (Michaux et al., 2017). Compared with Salmonella CspC and −E, Bacteroides BT_1884 harbors an N-terminal extension yet shares the two RNA-binding motifs (RNP-1 and RNP-2; Figure 1e). We therefore asked whether BT_1884 might have a similarly global effect and would complement a Salmonella ΔcspC/E double mutant. Reading out the survival rates when bacteria were challenged with the antimicrobial peptide polymyxin B, we found that heterologous BT_1884 expression overcomplemented polymyxin B sensitivity of Salmonella ΔcspC/E (Figure 1f). Swimming assays further revealed that the Bacteroides CSP homolog can efficiently cross-complement the motility defect of Salmonella ΔcspC/E (Figure 1g). Taken together, these observations suggest the CSP protein BT_1884 may indeed have a global RNA-related function and should encourage future studies of Bacteroides RBPs.
Overall, neither of these parameters differed markedly between ncRNAs in B. thetaiotaomicron and the nine unrelated species, indicating that the absence of the classical RNA chaperones and low GC content of Bacteroides do not appear to be reflected in these general ncRNA properties.
We used iterative sequence homology searches over a database of genomes from class Bacteroidia with the nucleotide profile hidden Markov model (profile HMM)-based nhmmer (Wheeler & Eddy, 2013) to identify 22 intergenic B. thetaiotaomicron ncRNAs as conserved within and occasionally beyond Bacteroides spp. (Ryan et al., 2020a; Figure 2c). The most broadly conserved ncRNAs, as in most bacteria, are housekeeping transcripts (Jose et al., 2019 proposed as a likely 6S molecule (Wehner et al., 2014), and we previously showed that it produces the expected short product RNAs (pRNAs) (Ryan et al., 2020a).
Beyond core bacterial ncRNAs, the only two Bacteroides transencoded sRNAs characterized so far are RteR (Waters & Salyers, 2012) and GibS (Ryan et al., 2020a). RteR was not included in our original sRNA annotation (Ryan et al., 2020a

| Integrated structure and sequence analysis of B. thetaiotaomicron core ncRNAs
In the following, we sought to determine the secondary structures of B. thetaiotaomicron ncRNAs. To this end, we focused on the well-conserved housekeeping RNAs and the partially conserved intergenic sRNAs, as this allowed for the incorporation of sequence conservation and single-nucleotide covariation information to strengthen the computational predictions. We based our approach on a published protocol to construct RNA families (Barquist F I G U R E 1 Bacteroides RNA-binding protein candidates. (a) Phylogenetic tree with zoom-in on Bacteroides species. The horizontal scale indicates the number of substitutions per site between the representative genomes of the selected genera/species. (b) Pfam search for RNAbinding domain (RBD)-containing proteins in Bacteroidetes and bacterial landmark species of other phyla. White square: not detected by our search; gray square: detected by our search; black square: detected and previously confirmed to be a (global) RNA binder. CSD, cold-shock domain; KH, K homology; RRM, RNA recognition motif. Asterisk: There is no completed genome available yet for Prevotella copri, so the hits are not guaranteed to be complete. (c) RBD proteins in Bacteroides thetaiotaomicron and their domain structure as determined using the Pfam sequence search function. (d), Steady-state levels of B. thetaiotaomicron sRNAs BTnc013 and BTnc201 and of a putative 5S rRNA precursor (see Supplementary Figure S1e) in different BT_1884 backgrounds. Northern blots were performed with total RNA samples extracted from wild-type, ∆BT_1884, BT_1884 + (trans-complemented), or BT_1884 ++ (overexpression) B. thetaiotaomicron strains grown in TYG medium to stationary phase and were probed with RNA-specific, radiolabeled oligonucleotides. Mature 5S rRNA serves as loading control. Values to the right refer to the size of the corresponding marker bands (in nt). One representative out of three biological replicate experiments is depicted and quantification over all replicates is given at the bottom. See Supplementary Figure S1d for the full set of sRNAs selected for probing. (e) Amino acid alignment of Bacteroides BT_1884 and Salmonella CspC and −E. Conserved residues are colored according to the Clustal X color scheme (blue: hydrophobic, red: positively charged, magenta: negatively charged, green: polar, pink: cysteine, orange: glycine, yellow: proline, cyan: aromatic). The two RNA-binding motifs (RNP-1 and RNP-2) are highlighted in bold font. Rfam database (Gardner et al., 2009). Using sequences gathered by our previous iterative HMMER searches, we created sequence alignments with consensus secondary structure predictions using the Webserver for Aligning structural RNAs (WAR) (Torarinsson & Lindgreen, 2008). WAR comprises 14 different alignment and structure prediction methods that variously consider MFE, residue conservation, and covariation and can provide a maximum consistency alignment and structure integrating individual predictions with T-Coffee (Notredame et al., 2000). The consensus structures were then manually inspected and adapted where needed (Supplementary Table S2).
As a proof-of-concept, we began with the core conserved ncRNAs 6S and 4.5S, which were not already captured by alignments thetaiotaomicron (their "BTnc'" ID according to Ryan et al., 2020a or, where available, their trivial name) are indicated at the top, sorted by conservation from left to right (gray triangle). We refer to ncRNAs as "conserved" when they are present in more than one strain and as "strain-specific" when they are only found in B. thetaiotaomicron. The color code denotes nucleotide identity as indicated at the lower right.  and a conserved, functionally important GGAA tetraloop (Jagath et al., 2001;Larsen & Zwieb, 1991). However, our WAR predictions  Figure 3a). We revised our computational prediction of the 6S secondary structure so that it was compatible with both probing data and base-pair conservation in our alignment ( Figure 4a). We evaluated it using R-scape (Rivas et al., 2017), a statistical test for base covariation in secondary structures, and found evidence for evolutionary selection for base-pairing in three of the four predicted stem structures (asterisks in Figure 4a).
In contrast, single sequence probing of 4.5S RNA was less conclusive and the manually inferred structure quite different from that inferred from sequence alignment (compare Figure 3b; Supplementary Figure S2b). However, unlike the 6S structure, which was supported by numerous cleavage events, few unambiguous cleavage events were detected in 4.5S (Figure 3b). We resolved this by using these unambiguous cleavage events as structural constraints in RNAalifold (Bernhart et al., 2008), resulting in a single long stem-loop structure well supported by covariation (Figure 4b).

| Secondary structure determination of trans-encoded sRNAs in B. thetaiotaomicron
The GibS and RteR sRNAs are conserved within Bacteroides spp. and are currently the only two functionally characterized trans-encoded sRNAs in this genus (Ryan et al., 2020a;Waters & Salyers, 2012). The WAR-derived GibS structure prediction (Supplementary Figure S2c) was similar to that previously determined by in vitro probing (Ryan et al., 2020a) and required only minor manual editing for consistency (Figure 4c 1 2 3 4 5 6 Lane:   1 2 3 4 5 6 Lane:

| Synteny and target co-occurrence analyses for conserved sRNAs
Gene neighborhood can be predictive of cellular processes and bio-  (Weinberg & Breaker, 2011) and statistically significant base covariations according to R-scape (Rivas et al., 2017) are labeled by asterisks. See Supplementary Figure S2a-e for the corresponding original predictions, prior to manual curation GibS, as one of its targets, BT_0771, is encoded directly downstream of the gibS gene itself (Ryan et al., 2020a).
Inspired by a previous study on the proteobacterial sRNA SgrS (Horler & Vanderpool, 2009), we performed gene synteny analyses for some of the most conserved (Figure 2c), yet unstudied, Bacteroides sRNAs. BTnc060, for example, was a candidate sRNA with a predicted size of ~220 nt (Ryan et al., 2020a). Here, northern blotting revealed the existence of two stable BTnc060 variants, a longer isoform (matching the predicted 220 nt) and a shorter one  (Figure 5a). This could imply further roles of GibS, for example, in the control of the Bacteroides cell envelope and transmembrane transport processes-a common theme among proteobacterial sRNAs (Guillier et al., 2006).
In an attempt to assess the completeness of our previous GibS target identification screen through transcriptomic profiling (Ryan et al., 2020a), we here performed an in silico co-occurrence analysis between this sRNA and its targets across Bacteroides spp. (Figure 5b).
The known GibS target BT_0771 (Ryan et al., 2020a) exhibits an overall similar prevalence pattern to the sRNA itself. While this might suggest that regulation is also conserved beyond B. thetaiotaomicron, we note that genomic linkage likely contributes to their co-occurrence, as GibS is encoded adjacent to the BT_0769-0771 operon. We also report ~30 instances, where GibS was present in a genome devoid of any of its three known target genes, arguing that this sRNA could indeed have additional, currently unknown targets. Future studies might employ approaches orthogonal to transcriptomic profiling, such as sRNA affinity purification and sequencing (Correia Santos et al., 2021;Lalaouna et al., 2015), to more fully define GibS targets and gain further insight into the physiological role of this sRNA.

| Identification of K homology, RNA recognition motif, and cold-shock domain proteins in Bacteroides
In Gram-negative bacteria, sRNA functionality often depends on RNA chaperones . Hfq-dependent sRNAs, for example, associate with this widely conserved RBP, which protects them from cellular ribonucleases, facilitates annealing to complementary regions within target transcripts, and may recruit endonucleases to cleave the bound target, effecting rapid decay (Kavita et al., 2018;Santiago-Frangos & Woodson, 2018;Vogel & Luisi, 2011). FinO domain proteins (Olejniczak & Storz, 2017), such as proteobacterial ProQ (Holmqvist et al., 2020), have only recently emerged as global RBPs but seem to also stabilize their ligands, thereby aiding sRNA-mediated control. CsrA is a prevalent regulatory protein that binds to GGA motifs within loop regions in mRNA 5′ UTRs and influences translation initiation, an activity that in many bacteria is controlled by antagonistic sRNAs that sequester the protein in an inactive complex (Romeo & Babitzke, 2018

Intersection size
Occurrences bind to single-stranded regions within nucleic acids, and are prevalent in all three domains of life (Graumann & Marahiel, 1998;Maris et al., 2005;Valverde et al., 2008). Type II KH domains-which are predominant in prokaryotes-consist of three β-strands, two of which are in parallel orientation. Nucleic acid binding occurs in a hydrophobic cleft formed between two α-helices and a variable loop (Nicastro et al., 2015). KH domains are, for example, contained within PNPase and ribosomal protein S3, where they mediate binding to RNA ligands, but also in NusA-like transcription elongation proteins, in which they initiate binding to chromosomal DNA. The Firmicute KH-containing protein KhpB (also known as EloR) has been recently described as a new globally acting RBP in Gram-positives (Lamm-Schmidt et al., 2021;Winther et al., 2019;Zheng et al., 2017). However, no KhpB homolog was predicted for Bacteroides spp. by our screen. Rather, the KH domain proteins found showed homology to known transcriptional regulators, ribosomal factors, or ribonucleases.
RRM domains consist of each four antiparallel β-strands and two α-helices, and are best understood in the Eukarya, where they engage in various posttranscriptional processes (Maris et al., 2005). While our study was in review, the three B. thetaiotaomicron RRM-1-containing proteins (Figure 1c) were independently predicted as Bacteroides RBPs (Adams et al., 2021). What is more, the deletion of two of them-renamed to RbpA (BT_0784) and RbpB (BT_1887)-resulted in widespread gene expression changes. Additionally, purified RbpB bound single-stranded RNA baits with high affinity in vitro, suggesting these RRM proteins function as global RBPs in the Bacteroidetes phylum (Adams et al., 2021).
CSDs also adopt a β-barrel structure and the binding of CSPs remodels the folding of their RNA ligands (Phadtare & Severinov, 2010).
The best-studied CSP, E. coli CspA, is produced upon a temperature decrease to 20℃ and modulates both the transcription and translation of its mRNA ligands to grant survival below the optimum growth temperature (Bae et al., 2000;Giuliodori et al., 2010;Jiang et al., 1997). Specifically, CspA prevents the formation of inhibitory secondary structures in the 5′ portion of mRNAs to overcome premature transcription termination and facilitate ribosome binding and translation initiation in the cold (Jiang et al., 1997 (Michaux et al., 2017). In Salmonella, altered CspC/E levels are reflected in the steady-state levels of their direct RNA ligands (Michaux et al., 2017). In the absence of RIP-seq (RNA immune-precipitation and sequencing) or CLIP-seq (crosslinking immunoprecipitation-high-throughput sequencing) data, we randomly selected Bacteroides sRNAs for probing. This revealed 2 out of 11 sRNAs to have mildly elevated expression levels when BT_1884 was overexpressed, although their cellular half-lives remained largely unchanged.
The binding of CSPs and the consequent suppression of local RNA folding may also impact RNA processing. For example, the plant CSP GRP2 melts secondary structures-and thereby likely interferes with the processing-of its RNA ligands Sasaki et al., 2007). Likewise, specific CSD-containing proteins associate with rRNA and are implicated in ribosomal maturation. For instance, the proteobacterial cold-induced ribosome-binding factor A (RbfA) binds 16S rRNA and is required for efficient processing of this ribosomal constituent (Dammel & Noller, 1995;Gualerzi et al., 2003). Here, our presented data suggest that BT_1884 affects 5S rRNA processing in B. thetaiotaomicron. Of note, BTnc013-one of the putative sRNAs that were upregulated when this CSP was overexpressed-is a 5′ UTR-derived sRNA. Future studies might therefore explore whether BT_1884 plays a general role in Bacteroides RNA remodeling and processing.
Absence of Hfq, ProQ, and CsrA homologs, yet the presence of RRM-1-and CSD-containing proteins in the Bacteroidetes phylum might be indicative of an RNA biology fundamentally different from that of Proteobacteria, whose members have served us as bacterial models of RNA biology for decades. This, in turn, could imply that many new RNA-related mechanisms and functions await discovery and should foster future studies of RBPs in Bacteroides and other microbiota members. F I G U R E 5 Synteny and target co-occurrence analyses for GibS homologs across Bacteroides spp. (a), Comparison of the regions flanking gibS in Bacteroides genomes for which a homolog of this sRNA was predicted. For each strain, the 5,000 nucleotides upstream and downstream of the gibS gene (red arrow) are plotted. Genes within these regions are represented as arrows and colored according to their predicted function (see legend). Trapezoids connecting regions of adjacent genomes indicate the BLAST-derived sequence identity, on a gradient from 66% (light gray) to 100% (dark gray) identity. Gene locus tags (without the literal prefix, e.g., "BT_" for Bacteroides thetaiotaomicron) are included within or above each gene arrow. (b), Co-occurrence analysis of GibS and its three established targets BT_0771, BT_1871, and BT_3893 (Ryan et al., 2020a) across Bacteroides genomes. The top bar graph indicates the number of genomes that contain a single gene or specific combination of the four genes, as denoted by the black dots below each bar. The bar graph to the left shows the number of genomes in which each gene was detected The presented data reveal additional interesting facets of Bacteroides RNA biology. For instance, the consensus secondary structure of Bacteroides 6S RNA is a bulged hairpin, with pRNA transcription initiating from the central bulge region, as previously described for well-characterized proteobacterial 6S RNA (Barrick et al., 2005;Wassarman, 2018). However, the central bulge of Bacteroides 6S RNA is substantially larger than that of any other described 6S molecule. It is thus tempting to speculate that the absence of the canonical sigma factor (σ 70 ; encoded by the gene rpoD) and the presence of a different "housekeeping" sigma factor in this phylum (σ ABfr ; Bayley et al., 2000;Vingadassalom et al., 2005) could be responsible for the divergent 6S structure. In other words, our findings might imply that the transcription bubble in actively transcribed genomic DNA, which is mimicked by 6S RNA to sequester the RNA polymerase holoenzyme, may be larger in the Bacteroidetes than in other phyla.

| What secondary structures imply about
The bacterial SRP, consisting of 4.5S RNA and the Ffh protein, binds hydrophobic residues in nascent polypeptides. The ribonucleoprotein complex then docks to its membrane receptor FtsY, thereby cotranslationally inserting proteins into the membrane or tethering them for transport out of the cytosol (Peluso et al., 2000). Reflecting this housekeeping function, the SRP system is essential across the bacterial kingdom including in B. thetaiotaomicron (Goodman et al., 2009;Liu et al., 2021). Our analysis was unable to resolve a central tetraloop structure in the Bacteroides 4.5S RNA by either experimental structure probing or computational conservation-based analysis. In E. coli, this loop contains the sequence GGAA, which is important for stimulating GTPase activity of the SRP-FtsY complex (Siu et al., 2007) and whose mutation abolishes SRP function (Jagath et al., 2001). Interestingly, additional base pair disruptions in the tetraloop-adjacent stem partially restore E. coli 4.5S RNA function (Jagath et al., 2001).
Consequently, the extended loop of our predicted Bacteroides 4.5S structure might represent a molecular compromise to retain SRPreceptor interaction in the absence of a GGAA tetraloop, though it remains unclear if the Bacteroides 4.5S adopts a different central structure that may fulfill a similar function.
Our structural compendium of Bacteroides ncRNA families was deposited in the Rfam database and will be useful for the community as a resource for future mechanistic studies. For example, certain RBPs such as FinO-like proteins (Olejniczak & Storz, 2017), exhibit high affinities toward RNA stem-loop structures. Although no FinO domain-containing homolog is present in Bacteroides, similar structural preferences can be envisaged for other RBPs.
In fact, both cold-shock and RRM-1 domains tend to bind singlestranded stretches within target RNA (Maris et al., 2005;Phadtare & Severinov, 2010). The presented folding predictions could thus be interrogated in the future to search for a common structural code for RBP ligands.

| Conclusion and perspective
Studying RNA processes in bacteria has had a tremendous impact on modern science, ranging from basic research to biotechnology and medicine (Pickar-Oliver & Gersbach, 2019;Serganov & Patel, 2007;. It is important, however, to emphasize that our current knowledge of bacterial RNA biology is strongly biased toward that of Gammaproteobacteria and Bacilli, whose species have been harnessed as model Gram-negative or Grampositive organisms, respectively, for decades. In contrast, due to technical hurdles associated with experimentation with obligate anaerobic species, as of now we only have a vague idea of the RNA complement of most gut microbiota species. Mapping and functionally analyzing this "RNA microbiome" bears great potential to uncover novel functions and mechanisms employed by bacterial RNA molecules and the proteins they interact with. While experimental toolkits are being developed to study RNA-related processes in different representatives of the microbiota Ponath et al., 2021;Ryan et al., 2020b), informatic approaches, such as comparative genomics, offer an entry point to explore RNAs expressed by human microbiotas (Fremin & Bhatt, 2021;Weinberg et al., 2010). In the present study, we have leveraged in silico tools and combined them with experimental approaches to improve our understanding of the RNA biology of the gut microbiota member B. thetaiotaomicron. This provided insights into potential RBPs, ncRNA structure, and function in Bacteroides spp. Collectively, our findings imply that much is still to be learned from the RNA biology of obligate anaerobes. Even GibS-arguably the best-characterized Bacteroidetes sRNA to date (Ryan et al., 2020a)-confronts us with many open questions regarding its mechanistic function and physiological role. However, the present study may serve as starting ground for future targeted investigations, to functionally dissect and-eventually-exploit the RNA of these predominant gut bacteria.

| Bacterial strains, plasmids, and growth conditions
B. thetaiotaomicron VPI-5482 is referred to as wild type throughout this study and was grown as previously described (Ryan et al., 2020a). All strains, plasmids, and oligonucleotides used in this study are indicated in Supplementary Table S1. Growth curves were generated on a Biotek Epoch2 microplate reader, inoculating fresh tryptone yeast extract glucose (TYG) medium with 1:100 dilutions of overnight cultures of the respective strain.

| Phylogenetic tree visualization
The bacterial phylogenetic tree shown in Figure 1a was obtained from GTDB release 95 (Parks et al., 2020) and pruned with the APE R package (Paradis & Schliep, 2019). For simplicity, we kept the historical assignation of B. dorei and B. vulgatus to the Bacteroides, rather than the Phocaeicola genus of GTDB r95.

| In silico search for RNA-binding domaincontaining proteins
Hidden Markov models (HMMs) from Pfam A (release 32) were downloaded from the Pfam ftp server, and used with hmmer (ver-  Heterologous expression of BT_1884 in Salmonella enterica serovar Typhimurium was achieved by transforming plasmid AWP-044 into the ΔcspCE Salmonella background (JVS-5084; Michaux et al., 2017). Cloning of AWP-044 was done as in (Michaux et al., 2017).

| Generation of BT_1884 mutant, complementation, and overexpression strains
Briefly, the coding sequence of BT_1884 was amplified with oligonucleotides AWO-654 and AWO-655 and the amplicon ligated in the XbaI site of pBAD33.

| RNA extraction, northern blotting, and quantitative real-time PCR (qRT-PCR)
B. thetaiotaomicron was grown overnight in 5 ml TYG medium and total RNA was extracted from stationary phase cultures (OD ≈ 4.8) as previously described (Ryan et al., 2020a). For northern blotting, DNase I-digested (NEB) total RNA (5 μg/lane) was denatured for 5 min at 95℃ followed by 5 min incubation on ice. Samples were run on a TBE 6% (vol/vol) polyacrylamide (PAA) 7 M urea gel at 300 V, 150 mA for ~2 hr and blotted onto nylon membranes (Sigma #15356) at 50 V, 4℃ for 1 hr. After UV cross-linking the RNA to the membrane, blots were probed with 32 P-labeled gene-specific oligonucleotides (Supplementary Table S1) in Hybri-Quick buffer (Carl Roth AG) at 42℃ and exposed as required on phosphor screens.

| RNA stability assay
To monitor cellular half-lives of B. thetaiotaomicron ncRNAs in different BT_1884 backgrounds, we applied rifampicin-mediated inhibition of de novo transcription. Briefly, B. thetaiotaomicron strains AWS-001 (WT), AWS-186 (ΔBT_1884), and AWS-192 (BT_1884 ++ ) were inoculated from single colonies and grown for ~16 hr in liquid TYG to an OD of ~4.8 (the condition where we observed altered steady-state levels; see Figure 1d) and then treated with rifampicin (final concentration: 500 μg/ml) to halt transcription. RNA samples were drawn at the indicated time points after rifampicin addition and RNA decay was assessed by northern blot analysis.

| Polymyxin B sensitivity assay
Polymyxin B survival assays were performed as in Michaux et al.

| Swimming assay
The motility of Salmonella strains was assessed via swimming assays as previously described (Michaux et al., 2017). Twenty milliliters of soft LB-agar (0.3% agar; prewarmed to 60℃) were poured per Petri dish on the lab bench and dried with an open lid for 15 min. l-arabinose (0.2% final concentration) was supplemented to the overnight media and agar plates to induce the expression of CSPs in the complementation strains. Then, 10 µl of each bacterial overnight culture were spotted on the center of a plate, followed by incubation for 5.5 hr at 37℃ and measurement of swimming distances.

| Interspecies sRNA structuredness comparison
We define the structuredness of an sRNA as the deviation (Z score) of its predicted MFE from the distribution of the MFEs of 1,000 sequences of the same length as the sRNA and having the same dinucleotide frequency as the respective genome. For Figure 2b, we computed MFE values with Vienna RNAfold (Lorenz et al., 2011) and generated the 1,000 background sequences with esl-shuffle (-L length -N 1000 -d options, with length being the sRNA length) from the hmmer3 suite (Eddy, 2011).
Reannotation was based on a keyword search for the term "Bacteroides" in the Rfam database with subsequent hits filtered for "Bacterial small-signal recognition particle RNA." The resulting sequence table file was downloaded and filtered for species of the Bacteroidetes/Chlorobi group (Figure 1a), retrieving strain CTnDOT (Jeters et al., 2009;Waters & Salyers, 2012). However,

RteR was not included in our original annotation of sRNAs in B.
thetaiotaomicron strain VPI-5482 (Ryan et al., 2020a). A BLAST search for the exc gene suggested BT_0105 (annotated as a type IA DNA topoisomerase) within the conjugative transposon CTn1 as a putative exc homolog. By manually inspecting our previous RNAseq data (Ryan et al., 2020a) mapping to this locus, we identified a lowly expressed, ~75 nt-long transcript derived from a region downstream of BT_0105, associated with a canonical σ ABfr promoter (Bayley et al., 2000), and not harboring any potential open reading frame (Stothard, 2000). It was suggested previously that the secondary structure of RteR is important for its function as a repressor of conjugative transfer (Waters & Salyers, 2012). Based on the shared genomic organization and similar RNA secondary structure (Waters & Salyers, 2012), we propose this CTn1-encoded RNA as a functional homolog of RteR.

| Analysis of ncRNA conservation
A custom genome database was constructed using bacterial genomes belonging to class Bacteroidia (Taxonomy ID: 200643) and selected from the ENA (https://www.ebi.ac.uk/genom es/bacte ria.html, accessed 1/12/2017). An iterative search of each ncRNA candidate using nhmmer (Wheeler & Eddy, 2013) was performed as previously described (Lindgreen et al., 2014). With each iteration, nhmmer was set with the flags, -popen 0.4999 -E 0.001 -incE 0.001. Additionally, hits with E values ≤0.001 were required to have almost full-length alignments with no more than 10% missing sequence at either end. The resultant alignment served as input for hmmbuild and was utilized in the next iteration. Alignments were subsequently manually inspected using the RALEE editor (Griffiths-Jones, 2005).
Dephosphorylation of 50 pmol of the above RNA was carried out using 25 U of calf intestinal alkaline phosphatase (NEB) in a 50 µl volume and incubated at 37℃ for 1 hr. The dephosphorylated RNA was extracted using phenol, chloroform, isoamylalcohol (25:24:1) and precipitated as described above. Twenty picomoles of the RNA were 5′ end-labeled using 32 P-γATP (20 µCi) and polynucleotide kinase (NEB) at 37°C for 1 hr in a reaction volume of 20 µl. The labeled RNA was then columnpurified (G-50, GE Healthcare) and gel-extracted as above.

| Structure probing
Structure probings were performed in 10 µl reaction volumes using radiolabeled RNA (0.2 pmol) that was first denatured at 95℃ for 1 min and then chilled on ice for 5 min. Following the addition of 1 µg yeast RNA and 10x structure buffer (Ambion), the reactions were incubated for 10 min at 37℃. To these reactions, either 2 µl of fresh lead (II) acetate (25 mM, Fluka), 2 µl of RNase T1 (0.01 U/µl, Ambion), or 2 µl RNase III (NEB) and 1 mM dithiothreitol were added and incubated at 37℃ for 45 s, 3 min or 10 min, respectively. The reactions were stopped by addition of 12 µl loading buffer II (Ambion) and placed on ice. Alkaline hydrolysis ladders were prepared by incubating 0.4 pmol labelled RNA with 9 µl of 1× alkaline hydrolysis buffer (Ambion). RNase T1 ladders were similarly prepared in 8 µl of 1× sequencing buffer (Ambion) and incubated at 95℃ for 1 min prior to the addition of 1 µl RNase T1 (0.1 U/µl). Both reactions were incubated for 5 min at 37℃ and stopped as above. Prior to loading on an 8% (vol/vol) PAA 7 M urea sequencing gel, all samples were denatured at 95℃ for 3 min.

| Curation of ncRNA families
The curation of our RNA families largely followed the protocol described in Barquist et al. (2016). Initial sequences were collected using nhmmer as described above in "Analysis of ncRNA conservation" then fed to the WAR webserver. The resulting alignments were manually evaluated and edited for consistency using the RALEE emacs RNA editor mode (Griffiths-Jones, 2005) by at least four independent editors, and further evaluated using the R-scape webserver (Rivas et al., 2017). Details of editing are provided in Supplementary   Table S2, "Consensus structure curation". These alignments and structures were further revised in light of structure probing data as described in the main text, where relevant, and deposited on the Rfam database (Supplementary Table S3).

| sRNA synteny analysis
Comparison plots of 5,000 nucleotides flanking the respective sRNA's transcriptional start site were generated through Easyfig (Sullivan et al., 2011). The plots were then manually curated to include gene functional annotation from NCBI (Tatusova et al., 2016).

| sRNA-target co-occurrence analysis
The nucleotide sequences of GibS, BT_0771, BT_1871, and BT_3893 were searched with blastn (7 nt word size) (Altschul et al., 1997) against the refseq_genomes database filtered for Bacteroides (taxid 816). Hits with an E value of >0.01 or query cover of <75% were discarded. The results of the co-occurrence analysis (Figure 5b) were visualized with the UpSetPlot Python package (https://github.com/ jnoth man/UpSet Plot).

ACK N OWLED G M ENTS
We thank Charlotte Michaux for advice on Salmonella motility assays, Jörg Vogel for sharing Salmonella strains, and Nancy Ontiveros and Anton Petrov for assistance with depositing alignments in the Rfam database. Research in the Westermann lab is supported by the German Research Foundation (DFG): Individual Research Grant We6689/1-1.
Open Access funding enabled and organized by Projekt DEAL.

DATA AVA I L A B I L I T Y S TAT E M E N T
The results of the integrated structure and sequence analysis of B. thetaiotaomicron ncRNAs were deposited in the Rfam database (RF04177 to RF04184) and are publicly available as of release 14.6.
The existing family RF01693 was updated with a description indicating it may be the Bacteroidetes 6S, though the underlying alignment has not been updated as of publication. The original Stockholm format alignments underlying the conclusions in this manuscript have been deposited on Zenodo with https://doi.org/10.5281/zenodo.5156251 (https://doi.org/10.5281/zenodo.5156250).