Transcriptomic analysis of staphylococcal sRNAs: insights into species-specific adaption and the evolution of pathogenesis

Next-generation sequencing technologies have dramatically increased the rate at which new genomes are sequenced. Accordingly, automated annotation programs have become adept at identifying and annotating protein coding regions, as well as common and conserved RNAs. Additionally, RNAseq techniques have advanced our ability to identify and annotate regulatory RNAs (sRNAs), which remain significantly understudied. Recently, our group catalogued and annotated all previously known and newly identified sRNAs in several Staphylococcus aureus strains. These complete annotation files now serve as tools to compare the sRNA content of S. aureus with other bacterial strains to investigate the conservation of their sRNomes. Accordingly, in this study we performed RNAseq on two staphylococcal species, Staphylococcus epidermidis and Staphylococcus carnosus, identifying 118 and 89 sRNAs in these organisms, respectively. The sRNA contents of all three species were then compared to elucidate their common and species-specific sRNA content, identifying a core set of between 36 and 53 sRNAs encoded in each organism. In addition, we determined that S. aureus has the largest set of unique sRNAs (137) while S. epidermidis has the fewest (25). Finally, we identify a highly conserved sequence and structural motif differentially represented within, yet common to, both S. aureus and S. epidermidis. Collectively, in this study, we uncover the sRNome common to three staphylococcal species, shedding light on sRNAs that are likely to be involved in basic physiological processes common to the genus. More significantly, we have identified species-specific sRNAs that are likely to influence the individual lifestyle and behaviour of these diverse staphylococcal strains.


Introduction
The wide availability of sequenced genomes and the decreasing cost of producing such data have revolutionized the way molecular biology research is performed (Dark, 2013). With the increasing knowledge base of genomic information published each year there is an escalating demand for automated pipelines to identify and annotate genes within sequence data (Richardson & Watson, 2013). To highlight the vast amount of genetic information available, at the time of writing this manuscript, a total of 5443 completed prokaryotic genomes were available in the NCBI Genome database (http://www.ncbi.nlm.nih.gov/genome), with a further 65 259 partially completed genomes. Furthermore, the rate of publication continues to increase exponentially each year for studies on such topics (Tatusova et al., 2015). Traditionally, the pipelines used for de novo genome assembly involve prediction of protein-coding genes, rRNAs and tRNAs, followed by comparison with a reference genome to assign ORF function (Richardson & Watson, 2013). However, many drawbacks exist to such approaches, not the least of which is a lack of efficient detection for small, regulatory RNAs (sRNAs), some of which can also encode small peptides (<50 aa).
The versatile genus Staphylococcus encompasses a diverse set of organisms that range from highly pathogenic to food-grade species. Staphylococci live on the mucous membranes of virtually all animals, as well as in aged meat products. Staphylococcus carnosus is an avirulent, coagulase-negative member of the staphylococci with the highest G+C content, and is commonly used as a starter culture for fermented sausages (Rosenstein et al., 2009; Schleifer & Fischer, 1982; Wagner et al., 1998). The genome of S. carnosus, an organism often regarded as an ancient and genetically simple species, generally has a lack of mobile genetic elements, especially in comparison with the other staphylococci. In contrast, Staphylococcus epidermidis is a coagulase-negative, opportunistic pathogen that is found as a part of the normal human flora of the skin and nares (Otto, 2009). S. epidermidis infections often occur through indwelling devices such as catheters, but are rarely life threatening or invasive (Otto, 2009). Staphylococcus aureus, also a normal part of the human flora, is a coagulase-positive member of the staphylococci, and is one of the leading causes of human infectious disease and death. S. aureus causes a wide variety of infections, ranging from minor cellulitis to life-threatening sepsis, and is capable of infecting all organ systems (Archer, 1998; Lowy, 1998). Compounding its extensive pathogenicity is the widespread prevalence of antibiotic-resistant isolates, which severely limits the number of viable treatment options (Lowy, 2003).
Collectively, the diversity of lifestyles and evolutionary relationships between the staphylococci (Fig. 1) make this a model genus to ask how regulatory molecules change and adapt across species, and how they develop specialized, niche-specific functions within a given organism. As such, in this study we identified and annotated the sRNA content of both S. epidermidis and S. carnosus using next-generation sequencing technologies coupled with comparative genomics. These newly annotated sRNAs were analysed for homology to each other, and to those recently curated by our group for S. aureus (Carroll et al., 2016), to identify conserved and unique elements for each species. In total, we identified 118 sRNAs in S. epidermidis and 89 in S. carnosus, compared with 303 in S. aureus (Carroll et al., 2016). A comparison of these datasets revealed that each genome contains between 36 and 53 sRNAs that are common to all three organisms. Finally, we uncovered the presence of several highly homologous sRNAs in S. epidermidis and S. aureus that share conserved sequences, and appear to retain common structural motifs. Collectively, our work shines a light on these complex and largely overlooked regulators, providing insight into staphylococcal speciation, and the evolution of pathogenesis within this genus.

Impact Statement
Staphylococcus aureus is a leading cause of nosocomial infections and exhibits profound levels of antimicrobial resistance. The importance of this pathogen has been well established, forming the subject of extensive research, but a comprehensive understanding of the regulatory processes governing its virulence has yet to be elucidated. Recently, our group has investigated the role of regulatory RNAs (sRNAs) by cataloguing and annotating them in S. aureus genomes. The study presented here continues this line of research by performing transcriptomic analyses with two closely related species, Staphylococcus epidermidis and Staphylococcus carnosus, to annotate, for the first time, their genomes for sRNAs. The sRNAs of all three organisms were then compared to determine the common and species-specific sRNA content of each genome. In addition, we identified a subset of sRNAs shared between S. aureus and S. epidermidis that demonstrate high sequence and structural conservation. This study provides a platform to guide studies on sRNAs that are important for the general physiology of staphylococci (shared sRNAs) as well as the unique lifestyles of each organism (species-specific sRNAs).
overnight. Synchronous cultures were achieved as outlined by us previously (Kolar et al., 2011) before being grown for 3 h to the exponential phase.
RNAseq. Transcriptomic experiments using an Ion Torrent Personal Genome Machine (PGM) system (Ion Torrent) were performed as described by us previously (Carroll et al., 2014). Briefly, total RNA was isolated from exponentially growing cultures using an RNeasy kit (Qiagen), with DNA removed using a TURBO DNA-free kit (Ambion). Next, RNA integrity was confirmed utilizing an Agilent 2100 Bioanalyzer system in combination with a RNA 6000 Nano Kit (Agilent). To remove rRNA from samples, a Ribo-Zero rRNA Removal Kit (Bacteria) (Epicentre) and MICROBExpress Bacterial mRNA Enrichment Kit (Ambion) were used in a sequential approach; complete removal of rRNA species was confirmed using an Agilent RNA 6000 Nano Kit. cDNA libraries were constructed from the enriched RNA with an Ion Total RNA-seq Kit v2 (Ion Torrent), before cDNA fragments were amplified onto Ion Sphere Particles (ISPs) using an Ion PGM Template OT2 200 Kit (Ion Torrent) and an Ion OneTouch 2 system (Ion Torrent). Template-positive ISPs were subsequently loaded onto Ion 318 v2 chips (Ion Torrent) and sequencing runs were performed utilizing an Ion PGM Sequencing 200 Kit v2 (Ion Torrent). After completion of each run, data were imported to the CLC Genomics Workbench software (CLC bio; Qiagen) and aligned to the publicly available S. epidermidis RP62a (NCBI accession number: NC_002976.3) and S. carnosus TM300 (NCBI accession number: AM295250.1) genomes. The addition of novel annotations to the S. epidermidis and S. carnosus genomes was performed according to guidelines outlined by us previously (Carroll et al., 2016; Weiss et al., 2015). Updated annotation files including novel sRNA transcripts for S. epidermidis RP62a and S. carnosus TM300 were deposited to Figshare (Data citation 1). The annotation files containing sRNA annotations were used to generate expression values calculated as RPKM (reads per kilobase material per million reads) in CLC Genomics Workbench. All downstream bioinformatic analyses (e.g. BLAST searches investigating sRNA similarities between different species) were also performed with CLC Genomics Workbench software. RNA structure predictions were performed using the mfold web server (Zuker, 2003).
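The RPKM normalisation mentioned above can be illustrated with a minimal sketch. It uses the commonly quoted definition (read count scaled by feature length in kilobases and by total mapped reads in millions); the exact computation inside CLC Genomics Workbench may differ in detail, and the function name and example numbers below are hypothetical.

```python
def rpkm(read_count: int, length_nt: int, total_mapped_reads: int) -> float:
    """Reads per kilobase per million mapped reads (generic definition).

    This is an illustrative helper, not the CLC Genomics Workbench routine.
    """
    if length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # reads / (length / 1e3) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (length_nt * total_mapped_reads)


# Hypothetical example: 500 reads on a 200 nt sRNA in a 2-million-read library.
example_value = rpkm(read_count=500, length_nt=200, total_mapped_reads=2_000_000)
print(f"RPKM = {example_value:.1f}")
```

Because the length term is in the denominator, short transcripts such as sRNAs reach high RPKM values with comparatively few reads, which is worth bearing in mind when comparing them with long mRNAs.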
Northern blots. To confirm the presence of novel transcripts identified by RNAseq, we performed Northern blot analysis for selected sRNA candidates. Northern blots were performed as outlined previously (Caswell et al., 2012), as follows. RNA from exponentially growing cultures was isolated and DNA-depleted as described for RNAseq samples. RNA was electrophoretically separated in a 10 % polyacrylamide gel [1× TBE (Tris/borate/EDTA) buffer, 7 M urea] and transferred to an Amersham Hybond N+ membrane (GE Healthcare) by electroblotting. Samples were crosslinked to membranes via UV radiation, followed by prehybridization in ULTRAhyb-Oligo buffer (Ambion) for 1 h at 43 °C in a rotating oven. Next, [γ-32P]-ATP end-labelled oligonucleotides specific for each target RNA sequence (Table S1, available in the online Supplementary Material) were added to membranes and hybridized overnight at 43 °C. The following day membranes were washed with 2×, 1× and 0.5× SSC (saline and sodium citrate) buffer for 30 min at 43 °C. Finally, membranes were exposed to X-ray film to detect radiolabelled and specifically bound probes.

Results
Annotation of sRNAs in the S. epidermidis RP62a and S. carnosus TM300 genomes
The goal of this study was to gain insight into the impact of sRNAs on staphylococcal species-specific adaptation. A set of organisms was chosen to represent the diverse lifestyles of staphylococcal species: S. aureus USA300-Houston, an epidemic community-associated methicillin-resistant strain isolated from the wrist abscess of a 36-year-old, HIV-positive, intravenous drug user; S. epidermidis RP62a, a methicillin-resistant strain isolated from a patient suffering from intravascular catheter-associated sepsis; and S. carnosus TM300, originally isolated from dry sausage in 1982 in Germany (Highlander et al., 2007; Gill et al., 2005; Schleifer & Fischer, 1982; Rosenstein et al., 2009). Importantly, these organisms are intermediately and distantly related species (Fig. 1a), representing the highly virulent (S. aureus), the mildly virulent (S. epidermidis) and the avirulent (S. carnosus). As such, they have the potential to provide significant insight into those sRNAs that are core to the staphylococci, as well as those that influence species-specific adaptation.
Recently we re-annotated the genome of S. aureus (Carroll et al., 2016) to include all sRNAs from the literature, as well as several novel transcripts identified by our group using next-generation sequencing approaches (Fig. 1b). As such, we used a similar RNAseq-based approach to re-annotate the genomes of S. epidermidis RP62a and S. carnosus TM300.
To our knowledge, no sRNAs have been identified or studied in either S. epidermidis or S. carnosus to date, and neither published genome has any sRNAs currently annotated.
Given the absence of any information regarding the sRNAs of these two species, a transcriptomic approach was used to identify sRNAs in these genomes. Initially, each RNAseq was performed on cultures grown to the mid-logarithmic phase, with all reads generated aligned to the published genomes of S. epidermidis RP62a and S. carnosus TM300 (Gill et al., 2005;Schleifer & Fischer, 1982;Rosenstein et al., 2009). Files were then reviewed for the presence of sRNA reads using criteria defined by us previously for S. aureus and Acinetobacter baumannii (Carroll et al., 2016;Weiss et al., 2015), as: antisense to previously annotated protein coding genes (Fig. S1a), in intergenic regions (Fig. S1b) or that showed differential expression from annotated genes with which they overlapped (Fig. S1c).
The first genome-wide identification of sRNAs in S. epidermidis and S. carnosus
In total, 118 and 89 sRNAs were identified in S. epidermidis RP62a and S. carnosus TM300, respectively (Tables 1 and 2). The sRNAs in each organism are distributed across their respective chromosomes, with the exception of a general lack of sRNAs in regions encoding prophages. The lack of sRNAs residing in these regions is perhaps to be expected, as these are relatively recent evolutionary events that have not yet been homogenized into the rest of the genome. To facilitate the addition of novel sRNA annotations in the future, an annotation system was created that does not relate to function, but instead acts only as an identifier (as described by us for S. aureus) (Carroll et al., 2016). As such, sRNAs from S. epidermidis were denoted as SERPs001-SERPs118, referring to their total number, for ease of sequential incorporation of new sRNA annotations in the genome. Similarly, in S. carnosus sRNAs were denoted as SCAs001-SCAs089. Newly annotated genes were given the gene names jointly annotated epidermidis loci (jaeL) 1-118 and jointly annotated carnosus loci (jacL) 1-89 for S. epidermidis and S. carnosus, respectively. To confirm the size and expression of sRNAs discovered in S. epidermidis and S. carnosus, several representative transcripts were chosen for Northern blot validation (Fig. 2). Each of the sRNAs analysed produced a single, probe-specific band at the size suggested by RNAseq, and as annotated herein. These findings suggest that the methods used by our group to identify and annotate novel sRNAs are both robust and reproducible.
Defining the core staphylococcal sRNA content
Given that a primary goal of this study was to better understand the sRNAs that are specific to each species, and that may contribute to their individual lifestyles, we first set out to elucidate the shared sRNA content of the staphylococci (Fig. 3 and Table S2). An sRNA in one genome was considered homologous to another gene if BLAST searches returned an E-value ≤10⁻¹⁰ in a region that had been annotated. As such, we queried all sRNAs from each organism in a nucleotide BLAST search against the genomes of the other two staphylococcal species to gain a comprehensive overview of the shared and unique sRNAs encoded by each genome.
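The homology criterion above can be sketched as a simple filter over BLAST hits. The authors ran their searches in CLC Genomics Workbench; the sketch below instead assumes the hits have been exported in the standard 12-column tabular layout of NCBI BLAST+ (query ID in column 1, subject ID in column 2, E-value in column 11). The function name, identifiers and hit values are hypothetical, and the additional requirement that a hit falls in an annotated region is not checked here.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-10  # threshold stated in the text


def homologous_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Map each query sRNA to the set of subjects with E-value <= cutoff."""
    hits = defaultdict(set)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= cutoff:
            hits[query].add(subject)
    return dict(hits)


# Invented example rows (not real results): one query with two significant
# hits, and one query whose only hit fails the cutoff.
example_lines = [
    "sRNA_A\thit_1\t85.0\t100\t15\t0\t1\t100\t1\t100\t2e-25\t110",
    "sRNA_A\thit_2\t84.0\t100\t16\t0\t1\t100\t1\t100\t5e-22\t100",
    "sRNA_B\thit_3\t70.0\t40\t12\t0\t1\t40\t1\t40\t3e-04\t30",
]
print(homologous_hits(example_lines))
```

Storing a set of subjects per query makes one-to-many relationships explicit, which matters because a single sRNA can match several sRNAs in another genome.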
A confounding issue to this approach, however, is that there does not appear to be a 1 : 1 ratio of sRNAs from one organism to another. For example, a number of sRNAs from S. aureus have significant sequence homology to several sRNAs from S. epidermidis (described in more detail below). Indeed, this is not a lone occurrence, as each organism comparison results in several such relationships. Accordingly, the unique and shared sRNA content of the staphylococci can only be specifically calculated from one genome to another, rather than across the genus as a whole. Such analyses are visually represented in Fig. 3(a) where links represent a homologous relationship between sRNAs of S. aureus and S. epidermidis (blue) or S. carnosus (red). A single sRNA exists in S. epidermidis and S. carnosus that shares homology to each other but has no relationship to any in S. aureus (black link). The relative (Fig. 3b) and absolute (Fig. 3c) number of shared sRNAs by genome vary significantly. At first glance it is readily apparent that nearly two-thirds of the sRNAs (187 of 303) previously identified in S. aureus are unique to this organism (Fig. 3b, c). S. carnosus has the next highest number of unique sRNAs, 41 of 89 (Fig. 3b, c). The high percentage of unique sRNAs in S. carnosus (~46 %) is perhaps to be expected, as it is the most distantly related of the three organisms in this study. In contrast, S. epidermidis has the fewest unique sRNAs, at 25 of 118 (~21 %) (Fig. 3c), meaning that nearly 79 % of its sRNA content is shared with S. aureus and/or S. carnosus (Fig. 3b, c). Collectively, we identified 53 core and 187 unique sRNAs in S. aureus, 36 core and 25 unique sRNAs in S. epidermidis and 39 core and 41 unique sRNAs in S. carnosus (Fig. 3b, c). The conservation of sequence and expression suggests that the core sRNAs may be involved in more central and conserved processes, such as metabolism. Conversely, unique sRNAs may represent elements involved in individual, species-specific adaptation, which, in the case of S. aureus, suggests virulence processes.
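The proportions quoted above follow directly from the per-genome totals. The short sketch below simply recomputes them from the counts given in the text; the dictionary and function names are illustrative.

```python
# (total sRNAs, unique sRNAs) per species, as quoted in the text above.
SRNA_COUNTS = {
    "S. aureus": (303, 187),
    "S. epidermidis": (118, 25),
    "S. carnosus": (89, 41),
}


def unique_percent(total: int, unique: int) -> float:
    """Percentage of a genome's sRNAs with no homologue in the other species."""
    return 100.0 * unique / total


for species, (total, unique) in SRNA_COUNTS.items():
    pct = unique_percent(total, unique)
    print(f"{species}: {unique}/{total} unique ({pct:.1f} %), "
          f"{100 - pct:.1f} % shared")
```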
A consideration with these data is that ours is the first study to evaluate S. epidermidis and S. carnosus sRNAs, which are derived from a single transcriptomic experiment. Conversely, studies by many groups, using a wealth of different approaches, have contributed to the 303 S. aureus sRNAs identified thus far. This is placed in context when one considers that the S. aureus sRNA content is greater than that from S. epidermidis (118 in total) and S. carnosus (89 in total) combined. As such, the possibility remains that several other sRNAs exist in these latter two species, but are not expressed under the conditions tested in our study. Accordingly, all sRNAs from S. aureus that showed significant sequence homology (E-value ≤10⁻¹⁰) to regions in the S. epidermidis or S. carnosus chromosomes were identified and denoted (Fig. 4a, Table S2). These regions were not annotated as sRNAs in the newly generated genome annotations, but their locations have been recorded (Table S2). While these loci did not show any transcriptional activity in S. epidermidis or S. carnosus in our study, they do share high sequence homology to known sRNAs of S. aureus, and thus may be expressed under different conditions not examined within this study. These transcriptionally inactive regions are linked to their homologous sRNA in S. aureus using blue and red links (S. epidermidis and S. carnosus, respectively) as before (Fig. 4a). When one factors these homologous, transcriptionally inactive regions into the shared and unique calculations, a very different picture appears (Fig. 4b, c). Specifically, the number of shared sRNAs increases greatly, elevating the putative S. aureus core-sRNA content from 53 to 87, whilst at the same time decreasing the number of unique sRNAs from 187 to 137.

ORF prediction and conservation
The genomes of S. aureus USA300, S. epidermidis RP62a and S. carnosus TM300 have previously been annotated for standard genomic features, including origin of replication, tRNAs, rRNAs and protein-coding genes. During the automated annotation process, ORFs smaller than 50 codons in length are generally dismissed, but the importance of small peptides (those smaller than 50 aa) encoded by small ORFs is becoming increasingly recognized (Hobbs et al., 2011; Storz et al., 2014). As such, we examined the predicted ORF content of the newly annotated transcripts, as our annotation process does not exclude potential protein-coding genes (Tables S3 and S4). In S. epidermidis, only a single newly annotated transcript had a predicted ORF of 50 codons or longer, whilst 111 had predicted ORFs between five and 50 codons, and six had no predicted ORFs of five or more codons. Similarly, in S. carnosus six newly annotated transcripts had predicted ORFs greater than 50 codons, 73 had ORFs between five and 50 codons long and 11 sRNAs had no identifiable ORFs of five or more codons. Importantly, none of the predicted ORFs within each of the organisms examined had any significant homology to any protein with known function aside from the S. aureus δ-hemolysin. Furthermore, the predicted ORFs from all three organisms also have very little similarity to each other, suggesting that these may not be translated (Tables S5 and S6). S. epidermidis and S. carnosus have a similar number of predicted ORFs per sRNA (3.3 ORFs and 3.9 ORFs per sequence, respectively) whereas the S. aureus sRNAs contain a much higher number of predicted ORFs (11.2 ORFs per sequence). This discrepancy is probably due to a difference in the average size of annotated sRNAs, as S. aureus has an average sRNA size of 506 nt compared with S. epidermidis and S. carnosus with 190 and 217 nt, respectively. As a note, the algorithm used to predict potential ORFs can predict more than one ORF per sRNA but does not evaluate the presence or absence of a ribosomal binding site. As such, the presence of an ORF does not provide any information on the likelihood of translation.
… carnosus. Total RNA was isolated from S. epidermidis RP62a (a) and S. carnosus TM300 (b) cultures grown to the mid-logarithmic phase. Samples were analysed using DNA probes specific to each transcript. Size markers, and the RNA probed for, are denoted on each gel.
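The kind of ORF screen described in this section can be approximated with a minimal sketch. It is not the algorithm used in the study: it assumes ATG as the only start codon, scans only the three reading frames of the annotated strand, counts codons from the start codon up to (but excluding) the stop codon, and, like the approach described above, ignores ribosomal binding sites. The function name and toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=5, start_codons=("ATG",)):
    """Return (start, end, n_codons) for ORFs in the three forward frames.

    Coordinates are 0-based and end-exclusive; `end` includes the stop codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # codons before the stop
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence: one 6-codon ORF (ATG + 5 x GCT) followed by a stop codon.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(toy))
```

Because longer transcripts offer more positions at which a start and an in-frame stop can co-occur, a screen of this type naturally reports more ORFs per sequence as average sRNA length increases.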

An interspecies conserved and recurring sRNA structural motif
Initial investigations into the overall conservation of sRNA content in the staphylococci revealed the presence of a number of S. epidermidis elements with homology to sRNAs from S. aureus (Fig. 3a). Twenty-one sRNAs from S. epidermidis and three from S. aureus demonstrate a higher than random level of homology as first identified by BLAST analysis (Fig. 3a), and confirmed by sequence alignments (Figs 5a, 6a and S2). The sRNAs identified in S. epidermidis have a significantly higher level of nucleotide identity to each other (as determined by pairwise comparisons) than they have to the sRNAs of S. aureus, or than the S. aureus sRNAs have to each other (Figs S2-S4). Furthermore, while sequence conservation does exist between SAUSA300s206 and the other 23 sRNAs identified, it is the most divergent sequence (Figs 6a, S2 and S3).
The 21 highly related sRNA genes in S. epidermidis have a higher relative G+C content, ranging from 32.6 % for SERPs014 to 44.2 % for SERPs106, than the relative G+C content of the S. epidermidis genome (32.2 %). They also span a range of sizes from 98 bp (SERPs103) to 217 bp (SERPs099), with this variation seemingly attributable to differences in their 5′ regions (Fig. 5a). Conversely, each sequence shares a key region of high conservation that extends approximately from the middle of the sequence to its 3′ end. This common region demonstrates nucleotide-level conservation of 71.4-100 %, not including the 3 bp insertion found in SERPs013. Using this information, we generated a consensus sequence (SERPsCon) that reflects the nucleotide identity of >71 % of the sequences in the conserved region.
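Both quantities discussed above, G+C content and a threshold-based consensus, are straightforward to compute from sequences. The sketch below is a generic illustration rather than the CLC Genomics Workbench procedure used to build SERPsCon: it assumes pre-aligned sequences of equal length, calls a column only when its most common base exceeds the chosen fraction, and writes N elsewhere. Function names and toy sequences are hypothetical.

```python
from collections import Counter


def gc_percent(seq: str) -> float:
    """Percentage of G and C residues in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def consensus(aligned, min_fraction=0.71, gap="-", unknown="N"):
    """Column-wise consensus of equal-length aligned sequences.

    A base is reported only if it is the most common symbol in the column,
    is not a gap, and occurs in more than `min_fraction` of the sequences.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        conserved = base != gap and count / len(aligned) > min_fraction
        out.append(base if conserved else unknown)
    return "".join(out)


# Toy alignment: the third column is conserved in 3 of 4 sequences (75 %).
toy_alignment = ["GAAGA", "GAAGA", "GAAGA", "GATGA"]
print(consensus(toy_alignment), gc_percent("GGCCAATT"))
```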
The SERPsCon sequence and each S. epidermidis sRNA were subjected to secondary structure prediction using the mfold software (Fig. 5b, c, respectively) (Zuker, 2003). The predicted SERPsCon structure includes a stem with two single-stranded regions, and a terminal loop (bracketed) near the 5′ end of the molecule that includes 28 of the 38 residues conserved in all 21 sequences (Fig. 5a, b). The terminal single-stranded region of the SERPsCon structure within the bracket has a 10 nt sequence that is variable only at the ninth residue, and defined by the sequence motif 5′-GAAGACUAYA (Fig. 5b). Furthermore, mfold secondary structure predictions performed on the 21 S. epidermidis sRNAs suggest the sequence homology extends to structural conservation. The secondary structure predictions suggest that in all 21 of these elements, the region corresponding to SERPsCon (Fig. 5c, red regions) includes an extended stem-loop structure that is identical (for 17 of the 21) to the motif defined in SERPsCon (5′-GAAGACUAYA). The remaining four sRNAs have the same sequence motif at the terminus of a stem, although the optimal structure, as predicted by mfold, suggests less single-strandedness. The conserved region and terminal loop do not appear to be related to any known RNA families or motifs as determined by an Rfam analysis (http://rfam.xfam.org/) (Nawrocki et al., 2015), and thus may constitute a new regulatory RNA family.
… SAUSA300s288 in relation to SERPsCon and the other S. epidermidis sRNAs (Fig. 6c). The predicted secondary structure of SAUSA300s288 has the highest level of structure and sequence conservation, with nine residues in the terminal loop structure, seven of which are perfectly conserved in relation to SERPsCon (Fig. 5b). SAUSA300s205 contains a 59 nt insert within the region corresponding to SERPsCon that necessarily shifts the structure, resulting in a slightly lower level of sequence conservation in the terminal loop (six of nine residues) (Fig. 6c).
Folding predictions of SAUSA300s206 suggest very little, if any, structural conservation, mirroring its lack of sequence similarity with the other sRNAs (Fig. 6a, c). However, as perhaps is expected, mfold analysis of RC-SAUSA300s206 suggests structural conservation including the terminal loop (six of nine residues) (Fig. 6c).
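The sequence side of this comparison, locating the 5′-GAAGACUAYA motif in a transcript or in its reverse complement, can be sketched as follows. The IUPAC code Y denotes a pyrimidine (C or U). The sketch inspects sequence only and says nothing about whether the motif sits in a terminal loop, which requires structure prediction such as mfold; function names and the toy sequences are hypothetical.

```python
import re

# 5'-GAAGACUAYA, with Y (pyrimidine) expanded to C or U.
MOTIF = re.compile(r"GAAGACUA[CU]A")
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    """Upper-case the sequence and convert DNA (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq: str) -> str:
    """Reverse complement, returned as RNA."""
    return to_rna(seq).translate(_COMPLEMENT)[::-1]


def motif_positions(seq: str):
    """0-based start positions of the motif in the given orientation."""
    return [m.start() for m in MOTIF.finditer(to_rna(seq))]


# Toy antisense example: the motif is absent in the native orientation
# but appears once the sequence is reverse complemented.
toy_antisense = "UGUAGUCUUC"
print(motif_positions(toy_antisense))
print(motif_positions(reverse_complement(toy_antisense)))
```

Checking both orientations mirrors the comparison between a native sRNA sequence and its reverse complement described above.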
The antisense nature of SAUSA300s206 in comparison with SAUSA300s205 and SAUSA300s288 hints at the possibility of an interaction between SAUSA300s206 and the other two sRNAs. To evaluate this potential, we queried the SAUSA300s206 sequence against the target sequences SAUSA300s205 and SAUSA300s288 using RNA-RNA interaction prediction software, IntaRNA (http://rna.informatik.uni-freiburg.de/) (Busch et al., 2008; Wright et al., 2014). Perhaps unsurprisingly, the predicted areas of interaction between SAUSA300s206 and both of the other sRNAs are extensive, and have a very low free energy (−181.7 and −95.3 kcal mol⁻¹ for SAUSA300s205 and SAUSA300s288, respectively), thus making these interactions energetically favourable.
Functional prediction of S. aureus-specific sRNAs
A major goal of this study was to differentiate sRNA content between the staphylococci (Table S2), and to garner a better understanding of the potential physiological role for unique elements, particularly in the context of S. aureus pathogenesis. As such, the complete set of S. aureus-specific sRNAs (Table S7) was subjected to target prediction using Tar… (Kery et al., 2014). The resulting list of putative targets (Table S8) was subjected to ontological classification, to identify those that are known virulence factors. Of note, 85 of the 137 (62 %) sRNAs unique to S. aureus were found to have the capacity to interact with at least one virulence-related transcript. Interestingly, the gene with the highest number of predicted sRNA regulators (10 different sRNAs) was splA, which encodes a well-characterized serine protease (Stec-Niemczyk et al., 2009). This is particularly compelling as S. aureus proteases have a major role in pathogenesis via the global modulation of virulence determinant stability (Kolar et al., 2013). As such, this clearly suggests potential for sRNA-based regulation of the infectious process in S. aureus. Ultimately, each of the predictions generated requires further experimental verification to assess specific functional roles. However, we suggest that the data presented herein represent an important first step in exploring the influence of sRNAs in the staphylococci, and their impact on species-specific adaptation.
… pathogenicity islands (red), prophages (orange) and other genomic islands (yellow)], sRNAs encoded on the forward strand and, innermost, the reverse strand. The inner links connect sRNAs that have sequence conservation. Red and blue links show homologous sRNAs between S. aureus and either S. carnosus or S. epidermidis, respectively; and the black link indicates a single homologous sRNA shared between S. epidermidis and S. carnosus but with no relation to any in S. aureus. (b) Pie charts representing the portion of sRNA content that is shared with each of the species in this study. The total sRNA content of each genome is indicated. (c) Numbers used to generate images in (b). Shown is the number of sRNAs shared between a given species pairing (upper section of each cell) as well as the number of sRNAs unique to a given species pairing (lower section of cells). For example, S. aureus has 105 sRNAs in common with S. epidermidis, but only 52 of these 105 are unique to S. aureus and S. epidermidis (i.e. not found in S. carnosus). When viewing these data, an organism-specific point-of-view must be employed to understand the differences in numbers from similar comparisons. Specifically, the numbers are different for S. aureus vs. S. epidermidis (105 and 52 sRNAs shared and specific, respectively) compared with S. epidermidis vs. S. aureus (92 and 56 sRNAs shared and specific, respectively) because S. aureus has 52 sRNAs that are homologous to 56 sRNAs in S. epidermidis.

Discussion
The advent of next-generation sequencing technologies has resulted in a vast amount of genomic and transcriptomic data available for all domains of life. This flood of data has resulted in the need for automated annotation software (Dark, 2013; Richardson & Watson, 2013). While automated annotation has become fairly robust for protein-coding regions, tRNAs and rRNAs, the ability to accurately predict the presence of other non-coding RNAs lags behind, which necessitates the manual curation of such genes (Sridhar & Gunasekaran, 2013). Collectively, sRNAs are of growing interest, as the diverse roles they play in regulating carbon metabolism, virulence gene expression, iron acquisition and many other cellular processes become increasingly apparent (Hoe et al., 2013; Beisel & Storz, 2010; Murphy et al., 2014; Caron et al., 2010; Geissmann et al., 2006; Harris et al., 2013; Papenfort & Vanderpool, 2015; Oliva et al., 2015). The inability to efficiently identify and annotate these elements hinders research on sRNAs and creates a need for transcriptomic-based approaches to supplement automated annotation software pipelines. To this end, our group has begun manually cataloguing and curating these molecules into their respective genomes within the staphylococci.
Fig. 5. Sequence and structural conservation of several highly related and newly identified S. epidermidis sRNAs. (a) A sequence alignment of 21 newly identified sRNA genes from S. epidermidis, with a particular focus on the most conserved region within each. Within the zoomed in region, conservation at the nucleotide level is shown, with a consensus sequence generated from the alignment presented (SERPsCon). The level of conservation of each nucleotide is indicated by colour, and the number of sequences containing the conserved residue from all 21 sRNAs. Purple, conservation in all 21 sequences; dark green, 20/21; yellow, 19/21; orange, 18/21; and white, not conserved, and not included in the consensus sequence. Alignment, conservation analysis and consensus sequence generation were performed using the CLC Genomics Workbench software. (b) RNA secondary structure prediction for the consensus sequence generated in (a), with each residue colour coded to its level of conservation, as detailed in (a). RNA secondary structure predictions were generated using the mfold software. (c) RNA secondary structure predictions for each of the 21 sRNAs from the alignment. The most highly conserved region of each [from the zoomed in area in (a)] is highlighted in red. RNA secondary predictions were again generated using the mfold software.
Fig. 6. The S. epidermidis sequence and structural motif is conserved in homologous sRNAs in S. aureus. (a) Sequence alignment of three sRNA genes (SAUSA300s205, SAUSA300s206 and SAUSA300s288) from S. aureus and the consensus sequence generated (SERPsCon) in Fig. 4(a). Sequence annotations are shown on the left and total sequence length on the right. Zoomed-in areas show nucleotide conservation amongst the four sequences, with the conservation for each residue indicated by colour. Purple is 100 % conservation, green is 75 % and yellow is 50 % or below. (b) As in (a), but containing the reverse complement region of SAUSA300s206 (RC-SAUSA300s206) instead of its native orientation. (c) Secondary structure predictions for each S. aureus sRNA as well as RC-SAUSA300s206. Regions sharing a high level of homology to SERPsCon as determined in (a) (SAUSA300s206) or (b) (the rest) are highlighted in each structure prediction. RNA secondary structure predictions were generated using the mfold software.

In the present work, we have identified and annotated sRNAs in the genomes of both S. epidermidis RP62a and S. carnosus TM300 using RNAseq methodologies. The total sRNA contents of S. epidermidis and S. carnosus were compared with those from our previous work in S. aureus, generating a comprehensive comparison of the shared and unique sRNA content of these common staphylococci. In so doing, we
identified and annotated 118 and 89 novel sRNAs in S. epidermidis and S. carnosus, respectively. The sRNA content of these two genomes initially appears strikingly small compared with S. aureus (303 annotated sRNAs). The difference in the number of sRNAs between these organisms is probably not due to differences in genome size (2 872 769, 2 616 530 and 2 566 424 bp for S. aureus, S. epidermidis and S. carnosus, respectively), but rather an artifact of the overall number of conditions tested for sRNA expression within each species. For S. epidermidis and S. carnosus, our study is the first assessment of their sRNA content and is based on a single growth condition (mid-logarithmic phase, TSB at 37 °C), whereas the sRNAs for S. aureus are derived from a wealth of different studies and experimental conditions (Pichon & Felden, 2005; Marchais et al., 2009; Geissmann et al., 2009; Abu-Qatouseh et al., 2010; Bohn et al., 2010; Beaume et al., 2010; Nielsen et al., 2011; Xue et al., 2014; Anderson et al., 2006, 2010; Olson et al., 2011; Howden et al., 2013; Carroll et al., 2016). The discrepancy in the number of studies that have examined the sRNA content of these three organisms also underlies the very different proportion of sRNAs common to the staphylococci in each genome. For example, considering only the transcriptionally active sRNA comparisons, S. aureus has a common sRNA set of 53 (~17.5 % of its annotated sRNAs) while S. epidermidis and S. carnosus have 36 (~30.5 %) and 39 (~43.8 %), respectively. The sRNAs represented in all three genomes probably have similar roles within the cell, and are speculatively involved in evolutionarily conserved processes such as basic metabolism and maintenance of cellular homeostasis. While the number of sRNAs shared by S. aureus increases to 87 (~28.7 %) if the homologous, but transcriptionally inactive, regions of S. epidermidis and S. carnosus are included, this is still a smaller proportion than that of S. epidermidis, and considerably smaller than that of S. carnosus (~30.5 % and ~43.8 %, respectively). One could hypothesize that these transcriptionally inactive homologues may be involved in conserved processes that are perhaps unnecessary under the conditions tested. Conversely, and of some interest, several regions within the S. aureus genome show high sequence similarity to newly annotated sRNAs from S. epidermidis and S. carnosus, despite being transcriptionally silent themselves (data not shown). The presence of such regions suggests either that an evolutionary event has silenced expression from these loci or, perhaps more likely, that we have yet to elucidate the permissive conditions for their expression in S. aureus. As such, a need exists for further research into lifestyle-specific and pathophysiologically relevant transcriptomic conditions and effects within the staphylococci.
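The proportions quoted above follow directly from the reported counts, and the arithmetic can be checked with a short sketch. The counts are those given in the text; the variable names and the `percent` helper are introduced here purely for illustration.

```python
# Counts as reported in the text.
annotated = {"S. aureus": 303, "S. epidermidis": 118, "S. carnosus": 89}
shared_active = {"S. aureus": 53, "S. epidermidis": 36, "S. carnosus": 39}


def percent(part, total):
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100 * part / total, 1)


# Proportion of each species' annotated sRNAs that fall in the shared set.
for species, total in annotated.items():
    print(species, percent(shared_active[species], total))

# S. aureus when homologous but transcriptionally inactive regions are included.
print("S. aureus (incl. inactive homologues)", percent(87, annotated["S. aureus"]))
```

The S. aureus proportion is lowest mainly because its denominator (303 annotated sRNAs) reflects many more studies and growth conditions than the single condition sampled for the other two species.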
The presence of a set of highly conserved sRNAs from S. epidermidis and S. aureus is seemingly quite unusual. The high level of sequence similarity within these sRNAs also results in a conserved structural motif that takes the form of a stem and multi-loop region, ending in a terminal hairpin with an unpaired, conserved 9-10 nt motif. Conservation of the multi-loop stem and terminal loop would suggest a common function for these sRNAs as a group and/or for the region of homology. Several possibilities for general function present themselves with such sequence and structure conservation. For example, it is possible that these structures act to bind and sequester proteins, as is the case for the CsrB/C sRNAs. CsrB/C sRNAs were originally identified in Escherichia coli as binding to and sequestering the CsrA protein through a conserved, repeated RNA motif, ultimately affecting carbon utilization and virulence gene expression (Liu et al., 1997; Jonas & Melefors, 2009). A second scenario, which has been demonstrated for several of the Rsa sRNAs in S. aureus (first characterized for their UCCC motif), is that the terminal hairpin serves to bind conserved regions within a target RNA, and the surrounding, less conserved regions confer target specificity (Geissmann et al., 2009). More work is necessary to elucidate the function of each of these individual sRNAs as well as the conserved domain that characterizes them. Curiously, the homology searches also identified SAUSA300s206 within this group, although further in silico analysis demonstrated that SAUSA300s206 has high sequence complementarity to SAUSA300s205 and SAUSA300s288 as well as SERPsCon. The presence of high levels of sequence complementarity raises the question: is SAUSA300s206 a regulator of SAUSA300s205 and SAUSA300s288? Regulation of one sRNA by another (so-called anti-sRNAs) is not unprecedented in the literature. In E. coli the molecular mechanism for interaction of two such anti-sRNAs, AsxR and AgvB, with their targets has recently been elucidated (Tree et al., 2014). AsxR binds the sRNA FnrS, which normally represses the expression of a heme oxygenase, ChuS; thus, AsxR acts to enhance expression of ChuS (Tree et al., 2014). AgvB, in turn, binds the sRNA GcvB, relieving the GcvB-dependent repression of DppA expression (Tree et al., 2014). In direct parallel to this, our group recently identified a set of highly transcribed, highly homologous sRNAs in A. baumannii, termed Group 1 sRNAs, for which there appears to be an anti-sRNA, ABUWs043 (Weiss et al., 2015). ABUWs043 is encoded in an antisense fashion to ABUWs042, and thus may regulate ABUWs042 through several means, including promoter interference and/or complementary binding (Weiss et al., 2015). Importantly, ABUWs043 has a high level of sequence complementarity to the rest of the Group 1 sRNAs (21 such elements exist in the A. baumannii genome), albeit lower than that found for ABUWs042, suggesting ABUWs043 may regulate the rest of the Group 1 sRNAs in an anti-sRNA fashion (Weiss et al., 2015). SAUSA300s206 shares many characteristics with these confirmed and putative anti-sRNAs, but ultimately more work must be done to characterize its function within S. aureus. Finally, and perhaps most intriguingly, no anti-sRNA sharing homology with SAUSA300s206 has been identified in S. epidermidis. The possibility that such an sRNA exists cannot be excluded, although the prospect that this is an S. aureus-specific adaptation is a fascinating point of evolution. Regardless, a better understanding of the function of these 24 sRNAs may shed light on basic physiological and regulatory differences between S. aureus and S. epidermidis, and further our understanding of the staphylococci in general.
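The distinction drawn above between sequence similarity and sequence complementarity, and the use of the reverse complement of SAUSA300s206 in Fig. 6(b), can be illustrated with a minimal sketch. It is not the in silico analysis performed in the study: it assumes ungapped, equal-length DNA strings, whereas a real comparison would use a gapped alignment tool, and the function names and toy sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def identity(a, b):
    """Fraction of matching positions between two equal-length, non-empty sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def complementarity(srna, candidate):
    """Identity between `srna` and the reverse complement of `candidate`."""
    return identity(srna.upper(), reverse_complement(candidate))


# Toy sequences (hypothetical, not taken from the study).
target = "GGCGAGACTC"
candidate_anti = "GAGTCTCGCC"

print(identity(target, candidate_anti))         # similarity in the native orientation
print(complementarity(target, candidate_anti))  # similarity after reverse complementing
```

A candidate anti-sRNA would be expected to score low in its native orientation and high once reverse complemented, which is the pattern described for SAUSA300s206 relative to SAUSA300s205, SAUSA300s288 and SERPsCon.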