Snapshots of a shrinking partner: Genome reduction in Serratia symbiotica

Genome reduction is pervasive among maternally inherited endosymbiotic organisms, from bacteriocyte- to gut-associated ones. This genome erosion is a step-wise process in which once free-living organisms evolve to become obligate associates, thereby losing non-essential or redundant genes/functions. Serratia symbiotica (Gammaproteobacteria), a secondary endosymbiont present in many aphids (Hemiptera: Aphididae), displays various characteristics that make it a good model organism for studying genome reduction. While some strains are facultative, others have established co-obligate associations with their respective aphid host and its primary endosymbiont (Buchnera). Furthermore, the different strains have genomes of contrasting sizes and features, as well as strikingly disparate cell shapes, sizes, and tissue tropism. Finally, genomes from closely related free-living Serratia marcescens are also available. In this study, we describe in detail the genome reduction process (from free-living bacterium to reduced obligate endosymbiont) undergone by S. symbiotica, and relate it to the stage of integration into the symbiotic system that each strain has reached. We establish that the genome reduction patterns observed in S. symbiotica follow those of other dwindling genomes, making it a good model for studying the genome reduction process within a single bacterial taxon evolving in a similar biological niche (aphid-Buchnera).

Obligate microbial symbionts (whether primary, secondary, tertiary, or other) are present in a variety of eukaryotic organisms, such as leeches (Annelida: Hirudinida) 1, gutless oligochaetes (Annelida: Oligochaeta) 2, and insects (Arthropoda: Insecta) 3 (see ref. 4). They can produce essential nutrients that their hosts can neither synthesise nor obtain from their diet 5-8, making them essential for the correct development and survival of their partners. Facultative symbionts, on the other hand, are dispensable, although under certain environmental challenges or in certain niches they can endow the host with desirable traits, ranging from defence against parasitoids or fungal parasites to survival after heat stress (reviewed in refs 9 and 10). Moreover, these facultative endosymbionts can even affect their host's performance on a given food source (e.g. a plant) 11-13.

Results and Discussion
S. symbiotica strains and their shrinking genomes. Generally, "ancient" obligate endosymbionts have highly reduced genomes, as small as 112 kilobase pairs (hereafter kbp) 45. Conversely, more "recently" derived endosymbionts (including facultative ones) tend to have larger genomes, up to the 4.5 Mbp genome of S. glossinidius (reviewed in refs 46 and 47). Accordingly, the genomes of the different S. symbiotica strains fall along this spectrum, from the large 3.58 Mbp genome of the facultative SAf to the small 0.65 Mbp genome of the co-obligate STs (Fig. 1). Similarly to other large endosymbiotic genomes 23-25,48,49, those of SAf, SAp, and SCt display a large enrichment of MEs, both in diversity and in number (Fig. 1; Supplementary Table S1 and Dataset S1). While Db11 holds nine insertion sequence (hereafter IS) elements and one TnTIR transposon, SAf, SAp, and SCt all hold over one hundred IS elements, are enriched in TnTIR transposons, and have gained group II intron mobile elements. Interestingly, the composition of IS families (the most common type of ME in these genomes) seems to be lineage-specific. While IS3 and IS256 are the most prevalent in SAf and SAp (both facultative endosymbionts from Aphidinae aphids), IS481 and IS5 are the most common in SCt (co-obligate endosymbiont from C. tujafilina [Lachninae]). Conversely, the smaller genomes of SCc and STs lack any trace of MEs, consistent with similar-sized endosymbiotic genomes (see ref. 3).
Figure 1. Serratia genomes are depicted as circular plots, arranged from largest (leftmost) to smallest (rightmost). From outermost to innermost, the rings of each genome plot show features on the direct strand, features on the reverse strand, and RNA genes. Inside the circles, coloured lines connect same-family IS elements scattered throughout the genome, following the colour code at the bottom of the image. The grey bars above the genome plots describe the lifestyle and genome reduction stage. Beneath each genome plot, the strain alias and, in parentheses, the host are shown. Below these are a table of the genomic features of each strain and pie charts of the relative abundance of IS-family elements, with the two most abundant named. At the bottom is the colour code for the different IS elements.
It is important to note that, given the highly fragmented assembly of SAp, the absolute counts of long repetitive mobile elements such as Tn3, IS3, and group II introns might be underestimated. Following the trend of many other dwindling genomes 3, all S. symbiotica strains have a lower GC content than their free-living relative Db11. While GC content is very similar among SAf, SAp, and SCt (52.1%), it drops markedly in SCc (29.2%) and even more so in STs (20.9%). Additionally, while pseudogenes are greatly enriched in SAf (126), SAp (550), SCt (916), and SCc (110), the small genome of STs is almost devoid of these gene remnants. This genetic erosion comes along with a decrease in coding density. Whereas SAf shows only a small decrease when compared to Db11 (87.9% to 78.2%), SAp, SCt, and SCc exhibit a marked drop (56.8%, 53.4%, and 39.0%, respectively), mainly due to increased pseudogenisation and "junk" DNA. In contrast, the highly reduced STs shows a high coding density (77.5%). This difference between SCc and STs is mainly due to the large amount of "junk" DNA in SCc's genome, which makes up almost half of it 39. Finally, we also found a gradual loss (from free-living Db11 to co-obligate intracellular STs) of RNA features (rRNAs, tRNAs, and other non-coding RNAs [hereafter ncRNAs]), reflecting the strains' different levels of genomic erosion.
As previously observed in other endosymbionts, genome erosion comes with a "disturbance" of the organism's functional profile when compared to free-living relatives 14,18. Accordingly, prior analyses have shown that while the functional profiles of free-living Serratia strains are very stable, a clear shift is evident in SCc and SAp 50. Through a similar analysis of all five currently available S. symbiotica strains, we have determined that while the recently derived SAf, SAp, and SCt cluster together and are most similar to Db11, the highly reduced SCc and STs form a cluster that diverges from the rest of the Serratia strains (Fig. 2A). These two S. symbiotica clusters differ mainly in the relative frequency of MEs (category X) and of translation-related genes (category J). The former reflects the ME enrichment of SAf, SAp, and SCt, while the latter reflects the common tendency of highly reduced endosymbionts to retain housekeeping genes (e.g. category J includes all ribosomal proteins) (see ref. 3).
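The GC content and coding-density percentages discussed above are simple ratios over a genome sequence and its annotation. The following is a minimal Python sketch, not the authors' pipeline, assuming a linear sequence string and a list of 1-based, inclusive CDS coordinates; CDSs spanning the origin of a circular chromosome are not handled, and pseudogenes are assumed to be excluded from the CDS list (the text attributes part of the density drop to pseudogenisation). Overlapping CDSs are merged so that shared bases are counted only once.

```python
def gc_content(seq: str) -> float:
    """Percentage of G+C among the unambiguous A/C/G/T bases of `seq`."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


def coding_density(genome_length: int, cds: list[tuple[int, int]]) -> float:
    """Percentage of the genome covered by CDSs.

    `cds` holds (start, end) pairs, 1-based and inclusive, with start <= end.
    Overlapping or adjacent intervals are merged before summing their lengths.
    """
    covered = 0
    run_start = run_end = None
    for start, end in sorted(cds):
        if run_end is not None and start <= run_end + 1:
            run_end = max(run_end, end)  # extend the current merged run
            continue
        if run_end is not None:
            covered += run_end - run_start + 1  # close the previous run
        run_start, run_end = start, end
    if run_end is not None:
        covered += run_end - run_start + 1
    return 100.0 * covered / genome_length


# Toy usage with made-up coordinates (not data from this study):
print(gc_content("ATGCGCAT"))                              # 50.0
print(coding_density(100, [(1, 30), (25, 60), (81, 90)]))  # 70.0
```

Merging before summing matters for compact genomes, where overlapping genes would otherwise inflate the coding density.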
In the early stages of an endosymbiont's genome reduction, enrichment in MEs can lead to genome rearrangements 18,38. Generally, these rearrangements become fixed in the endosymbiotic lineage once the MEs have been lost, as reflected by the general genome-wide synteny displayed by Buchnera 15,51, Blochmannia 26, and Blattabacterium 52. Nonetheless, some endosymbionts such as Portiera have been found to present lineage-specific genome rearrangements, putatively mediated by large repetitive intergenic regions 53. Whereas free-living Serratia strains display general genome-wide synteny 50, S. symbiotica genomes display various rearrangements when compared to Db11, and even among each other (Fig. 2B,C). Interestingly, while the less-reduced genome of SAf is the most similar to Db11 in terms of rearrangements, the drastically reduced genome of STs has accumulated the highest number of rearrangements. Also, the genomes of SCc and STs, which both lack MEs, display no synteny between them. These observations suggest that all S. symbiotica lineages diverged before the loss of MEs, allowing a great number of lineage-specific reorganisations.
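MGR searches for a scenario with the minimal number of genome rearrangements (such as inversions) between marker orders. A much cruder but intuitive measure of how scrambled two marker orders are is the breakpoint count: the number of adjacent marker pairs in one genome that are not adjacent, in either orientation, in the other. The sketch below computes it for two linear genomes given as signed permutations of the same single-copy markers. It illustrates the concept only and is not the method used in this study; the marker orders in the example are invented.

```python
def adjacencies(order: list[int]) -> list[tuple[int, int]]:
    """Consecutive marker pairs of a linear genome, capped with sentinels 0 and n+1."""
    framed = [0, *order, len(order) + 1]
    return list(zip(framed, framed[1:]))


def breakpoints(a: list[int], b: list[int]) -> int:
    """Adjacencies of `a` that are not conserved in `b`.

    Markers are signed integers 1..n (sign = strand). An adjacency (x, y) is
    conserved if `b` contains (x, y) or, read on the other strand, (-y, -x).
    """
    n = len(a)
    expected = list(range(1, n + 1))
    if sorted(map(abs, a)) != expected or sorted(map(abs, b)) != expected:
        raise ValueError("a and b must be signed permutations of the same markers 1..n")
    conserved = set()
    for x, y in adjacencies(b):
        conserved.add((x, y))
        conserved.add((-y, -x))
    return sum(1 for pair in adjacencies(a) if pair not in conserved)


reference = [1, 2, 3, 4, 5]
print(breakpoints(reference, [1, -3, -2, 4, 5]))  # one inversion -> 2
print(breakpoints(reference, [1, 2, 3, 4, 5]))    # identical orders -> 0
```

A single inversion typically creates two breakpoints, so breakpoint counts should not be read as a number of rearrangement events.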

Erosion of essential amino acid biosynthetic routes. A general feature of endosymbiotic genomes is the loss of non-essential genes, leading to highly reduced genomes with a genetic repertoire specialised in the symbiotic function (reviewed in ref. 3). In aphids, Buchnera, the primary obligate endosymbiont, is mainly in charge of producing essential amino acids (hereafter EAAs) for its host. Therefore, co-existing symbionts are expected to show degraded biosynthetic routes for these compounds. Analysing these routes in S. symbiotica makes the gradual degradation of genes and operon attenuators involved in EAA synthesis immediately evident (Fig. 3). The recently derived SAf retains intact routes for most EAAs, with the notable exceptions of lysine and methionine. As described in a previous study, there is a marked difference in the retention of leucine, arginine, and histidine biosynthetic genes even between the closely related facultative SAp and co-obligate SCt 38. Finally, comparing SCc and STs against the other S. symbiotica strains and against each other shows that both have become highly dependent on Buchnera for the supply of EAAs, the main difference between them being that STs has purged the remaining pseudogenes.
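Fig. 3 summarises, for each strain, which genes of each EAA route are intact, pseudogenised, or absent. As a hedged illustration of how such per-gene calls can be collapsed into a per-route state, the sketch below uses a hypothetical three-level encoding and made-up calls for the leucine route (leuA-leuD); the states "complete", "partial", and "lost" are an illustrative simplification, not necessarily the categories used in Fig. 3.

```python
from collections import Counter

# Hypothetical per-gene calls; not the encoding used by the authors.
INTACT, PSEUDO, ABSENT = "intact", "pseudo", "absent"


def summarise_route(calls: dict[str, str]) -> tuple[str, Counter]:
    """Collapse per-gene calls for one biosynthetic route into a single state.

    complete: every gene is intact
    lost:     no intact gene remains (pseudogenes may still be present)
    partial:  anything in between
    """
    if not calls:
        raise ValueError("a route needs at least one gene")
    counts = Counter(calls.values())
    unexpected = set(counts) - {INTACT, PSEUDO, ABSENT}
    if unexpected:
        raise ValueError(f"unexpected calls: {sorted(unexpected)}")
    if counts[INTACT] == len(calls):
        state = "complete"
    elif counts[INTACT] == 0:
        state = "lost"
    else:
        state = "partial"
    return state, counts


# Made-up calls for an imaginary strain, only to show the bookkeeping.
state, counts = summarise_route(
    {"leuA": INTACT, "leuB": PSEUDO, "leuC": ABSENT, "leuD": ABSENT}
)
print(state, counts[PSEUDO], counts[ABSENT])  # partial 1 2
```

Keeping the pseudogene count separate from the absent count distinguishes two routes in the same "lost" state, one still carrying pseudogenes and one from which the remnants have been purged, which is the difference drawn above between SCc and STs.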
Decay of RNA features and the loss of regulation. Typically, highly reduced endosymbionts retain only a small number of ncRNAs and other RNA features 54 (see Supplementary Figs S1 and S2). By annotating these features in the genomes of S. symbiotica and Db11, we explored the erosion of RNA features (Fig. 4: top panel). We found that the recently derived endosymbionts SAf, SAp, and SCt still retain many of these features, although differentially. This suggests that genetic drift drives the loss of these features in the early stages of genome reduction. As expected, these three genomes have acquired ME-related ncRNAs, all belonging to the large class of self-splicing group II introns (RF00029, RF01999, RF02001, RF02003, RF02005, RF02012). In the intermediate and late stages of drastic genome reduction, in which SCc and STs respectively find themselves, most RNA features have been lost. The features conserved across S. symbiotica are the 4.5S RNA component of the signal recognition particle (SRP) (ffs), the M1 RNA component of RNase P (rnpB), the tmRNA (ssrA), the tpke11 small RNA (of unknown function), the leader sequence of the rnc-era transcription unit (coding for ribonuclease III and the GTPase Era), and the alpha operon leader (the operon codes for the 30S ribosomal proteins S13, S11, and S4, the 50S ribosomal protein L17, and the DNA-directed RNA polymerase subunit alpha). Interestingly, the first three are also retained in other small genomes but cannot be identified in some tiny genomes (Supplementary Fig. S1), hinting that they are essential functions retained until the last stages of genome reduction. Since most of these RNA features are involved in regulating gene expression (small antisense RNAs, riboswitches, and leader sequences [including amino acid operon attenuators]), their loss would reflect a general trend towards the loss of gene regulation in endosymbiotic genomes through the erosion of RNA features.
Regarding tRNAs, we observed a drastic reduction in tRNA-gene number, particularly marked in SCc and STs (Fig. 4: bottom panel). As in other reduced endosymbionts (see Supplementary Fig. S2), these losses mainly affect redundancy rather than variety. In contrast to the other S. symbiotica genomes, we were unable to detect a tRNA with aminoacyl charging potential for glutamate in SCc. This is similar to what is observed in other tiny genomes, where tRNAs with certain aminoacyl charging potentials are absent (Supplementary Fig. S2). However, the presence of a tRNA Glu on a yet-unidentified plasmid cannot be ruled out. Also, the selenocysteine tRNA has already been lost in the early co-obligate SCt, consistent with the loss of other selenocysteine-related genes, and it is also absent from the smaller SCc and STs. Notably, in SAp one of the tRNA Met copies carries a mutation in its anticodon (CAT → AAT), which could theoretically lead to the ATT codon being recognised as coding for methionine. Finally, the tRNAs for formyl-methionine (tRNA fMet), which carries the initiator methionine, and for lysidine-modified isoleucine (tRNA kIle) are conserved even in the two smallest S. symbiotica genomes. This follows the trend observed in other reduced genomes (Supplementary Fig. S2) and points towards the essential nature of these tRNAs.
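The predicted effect of the SAp anticodon mutation follows from base-pairing: the codon read by a tRNA is the reverse complement of its anticodon. The minimal sketch below, our own illustration rather than part of the annotation pipeline, derives it under strict Watson-Crick pairing, deliberately ignoring wobble pairing and base modifications (e.g. the lysidine of tRNA kIle, which makes a CAU anticodon read AUA rather than AUG). It also contrasts tRNA-gene number (redundancy) with the number of distinct anticodons (variety); the anticodon lists are hypothetical.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def codon_read_by(anticodon: str) -> str:
    """Codon (5'->3', DNA alphabet) pairing with a 5'->3' anticodon.

    Strict Watson-Crick pairing only: wobble and modified bases (such as the
    lysidine of tRNA kIle) are ignored, so this is a first approximation.
    """
    dna = anticodon.upper().replace("U", "T")
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def redundancy_and_variety(anticodons: list[str]) -> tuple[int, int]:
    """Return (number of tRNA genes, number of distinct anticodons)."""
    normalised = [a.upper().replace("U", "T") for a in anticodons]
    return len(normalised), len(set(normalised))


print(codon_read_by("CAT"))  # ATG, as for the usual CAT tRNA Met anticodon
print(codon_read_by("AAT"))  # ATT, the mutated SAp copy
# Hypothetical tRNA sets: dropping duplicate genes lowers only the first number.
print(redundancy_and_variety(["CAT", "CAT", "GCC", "GCC", "TTC"]))  # (5, 3)
print(redundancy_and_variety(["CAT", "GCC", "TTC"]))                # (3, 3)
```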
Informational machinery. Comparing the informational machinery (ribosome-, transcription-, translation-, and DNA replication/repair-related genes) across S. symbiotica strains reveals both high preservation and gradual deterioration, depending on the category. Ribosomal and tRNA aminoacylation genes are almost perfectly preserved (Fig. 5: top). Marked differences include multiple copies of the three rRNA genes in SAf, SAp, and SCt, and the absence of two ribosomal protein genes (rpsI and rplM) and of the prolyl-tRNA synthetase gene (proS) in STs. While the retention of a single copy of the rRNA genes reflects the tendency of endosymbiotic and other reduced genomes to eliminate redundancy 55, the loss of rpsI (coding for the 30S ribosomal protein S9) reflects the loss of a non-essential gene: in Escherichia coli, a null mutant of rpsI has been experimentally shown to grow, albeit with a slow-growth phenotype 56,57. Most intriguing are the losses of rplM (coding for the 50S ribosomal protein L13) and proS. The former has been described as essential in E. coli 57, and its loss could be related to the loss of rpsI, which forms an operon together with rplM. The latter could reflect a putative functional replacement of ProS activity by another, non-specific aminoacyl-tRNA synthetase. Such a phenomenon has been observed for the prolyl-tRNA synthetase of Deinococcus radiodurans, which is able to charge tRNACys with cysteine 58. Non-specific aminoacyl-tRNA synthetases have also been observed in archaeal organisms (reviewed in ref. 59), suggesting that this may be a common mechanism to cope with the lack of a specific aminoacyl-tRNA synthetase.
Both rRNAs and tRNAs undergo a series of modifications required to produce their mature forms (reviewed in refs 60 and 61). Analysing the genes involved in rRNA and tRNA modification, we observed that while the recently derived SAf, SAp, and SCt hold a rather complete set (with particularly marked losses of 23S rRNA methyltransferases), the highly reduced SCc and STs retain only a small fraction of these genes (Fig. 5: middle). With the notable exceptions of fmt, tilS, trmD, tsaB, tsaC, tsaD, and tadA (all retained in the small SCc and STs), individual knockout mutants of all rRNA and tRNA modification-related genes in E. coli (except miaE, which is not present in this organism) have shown them to be non-essential 62-65. The fmt and tilS genes code for the proteins responsible, respectively, for attaching a formyl group to the free amino group of methionyl-tRNA fMet (for the initiator methionine) 66 and for modifying the wobble base of the CAU anticodon of tRNA kIle 67,68. The retention of these two genes thereby ensures both the correct formylation of the initiator methionine in proteins (the formyl group being removed post-translationally) and the accurate [...] 69. Interestingly, the tsaE gene (coding for an ATPase), which has been found to be non-essential in E. coli under anaerobic conditions 70, is missing from STs; t6A biosynthesis would thus be either impaired or carried out through an unknown mechanism. Finally, tadA, which codes for a tRNA-specific adenosine deaminase essential for viability in E. coli 71, is retained even in the small STs.
Regarding DNA replication and repair, gene losses are particularly marked in the most genomically reduced symbionts, SCc and STs, and mostly affect DNA repair-related genes (Fig. 5: bottom left). This is also observed in other reduced endosymbionts (see ref. 47) and is possibly related to the triggering of more drastic genome erosion (reviewed in ref. 3). DNA replication-related losses affect the non-essential holE gene of DNA polymerase III, the PriA-dependent primosome (an elementary DNA-dependent one is retained [lacking the auxiliary Hup proteins]), and the gyrA subunit of DNA gyrase. The latter, although identified as essential in E. coli 63, has also been found to be missing from tiny genomes 46,47, suggesting that its function could be taken over by an alternative enzyme or that it is actually non-essential in some endosymbiotic organisms.
Transcription- and translation-related genes are highly retained in all S. symbiotica genomes (Fig. 5: bottom right). Gene losses mainly affect the sigma factors, with STs retaining only rpoD and rpoH, coding for σ70 and σ32, respectively. While the former is generally preserved in endosymbionts 47, the latter is missing from endosymbionts such as Blattabacterium and Nasuia. σ32 is required for the normal expression of heat shock genes and for the heat shock response, through the regulation of heat shock protein synthesis 72; its retention or loss could thus be specific to certain endosymbiotic systems.
Dwindling genes: stripping proteins down to the bones. While manually curating the annotation of the SCt, SCc, and STs endosymbionts 38,40, we noted that some genes (atpC, cysJ, deaD, dnaX, ftsN, hscA, metG, pcnB, rnr, and tolC) seemed to be shorter in STs, and sometimes consistently shrunken across S. symbiotica, compared to those of free-living E. coli and even Db11. However, while these genes showed truncated or missing domains, they displayed a high degree of sequence conservation when compared to Db11. A thorough examination of these shrunken genes revealed experimental evidence, mainly from E. coli, that truncated versions of these proteins are able to function with few or no obvious phenotypic consequences (details recorded in the annotation files available from the INSDC). Particularly evident is the loss of non-essential domains in six proteins: AceF, DnaX, FtsK, FtsN, MetG, and Rnr (Fig. 6). The AceF protein (E2 component of the pyruvate dehydrogenase complex) has lost one or two biotin/lipoyl domains (PF00364) in all S. symbiotica strains, with STs retaining only one. In E. coli, in vitro deletion of biotin/lipoyl domains has shown that a single domain suffices for enzyme activity and protein function 73. The tau subunit of DNA polymerase III is coded by the dnaX gene; however, an alternative isoform, the gamma subunit, is produced through programmed ribosomal frameshifting, which leads to a premature stop codon in the −1 frame at codon 430 74. In vitro experiments with the shorter isoform, which lacks tau domains 4 and 5 (PF12168 and PF12170), indicate that gamma is sufficient for replication 75. The most drastic gene diminutions are observed in ftsK and ftsN (whose products are involved in cell division), of which SCc and STs preserve only a very small portion. Independent in vivo experiments with E. coli mutants coding only for truncated FtsK (amino acids 1-200) 76 or FtsN (amino acids 1-119) 77 have corroborated that these tiny versions are sufficient for cell division, although short to long filamentous cells were observed. Regarding MetG (methionyl-tRNA synthetase), both SCc and STs lack the C-terminal putative tRNA-binding domain (PF01588). Genetic complementation studies and the characterisation of C-terminally truncated enzymes in E. coli established that MetG can be reduced to 547 residues without significant effect on either the activity or the stability of the enzyme 78. Finally, a deletion of the C-terminal basic domain of Rnr (ribonuclease R) can be observed only in STs. This could increase the activity of the enzyme, since purified truncated Rnr proteins from mutant E. coli lacking the 83 C-terminal residues displayed higher affinity and circa 2-fold higher activity than full-length wild-type Rnr (on poly(A), A17, and A4 substrates) 79. By aligning these putatively functional proteins against those of other small and tiny genomes, we confirmed that most of these gene diminutions are common among such organisms (Supplementary Dataset S1). This suggests that selection might favour gene diminution (the retention of only the essential domains of the encoded protein) while selective constraints on non-essential gene regions are relaxed, thus further contributing to genome reduction.
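The domain comparisons above amount to asking which reference domains are still covered by the aligned portion of a shorter homologue. The sketch below is a hypothetical illustration: domain names and coordinates are invented, the aligned span would in practice come from a pairwise or multiple alignment (for example one built with MAFFT), and the 50% coverage threshold for calling a domain retained is an arbitrary assumption.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    name: str
    start: int  # 1-based, inclusive, on the reference protein
    end: int


def classify_domains(
    domains: list[Domain], aligned_start: int, aligned_end: int, min_fraction: float = 0.5
) -> tuple[list[str], list[str]]:
    """Split reference domains into (retained, lost) for a truncated homologue.

    `aligned_start`/`aligned_end` give the reference span covered by the
    truncated protein in an alignment (1-based, inclusive). A domain is
    retained if at least `min_fraction` of its residues fall in that span.
    """
    retained, lost = [], []
    for d in domains:
        overlap = min(d.end, aligned_end) - max(d.start, aligned_start) + 1
        fraction = max(0, overlap) / (d.end - d.start + 1)
        if fraction >= min_fraction:
            retained.append(d.name)
        else:
            lost.append(d.name)
    return retained, lost


# Invented 600-residue reference with three domains; the homologue aligns to 1-450.
reference = [
    Domain("N-terminal", 10, 120),
    Domain("central", 200, 330),
    Domain("C-terminal", 520, 600),
]
print(classify_domains(reference, 1, 450))  # (['N-terminal', 'central'], ['C-terminal'])
```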

Conclusion
The S. symbiotica strains analysed here have all been evolving in a similar environment, the aphid-Buchnera symbiotic system. We have established that S. symbiotica strains can be placed along the genome reduction spectrum, from a free-living bacterium to a drastically reduced endosymbiont, thus providing "snapshots" of the genome reduction process. SAf would represent the very first stages of genome reduction: it has not yet lost its ability to grow in axenic culture, and has undergone only mild genome shrinkage and few rearrangements compared to the free-living Db11. SAp and SCt would be a stage further down the path, having a more reduced genome than SAf and showing a massive enrichment in both pseudogenes and MEs. However, SCt has already made the transition to a co-obligate endosymbiont, and thus shows more drastic gene losses in the EAA biosynthetic pathways. SCc and STs are at more advanced stages of genome reduction and integration into their symbiotic systems, having established a series of metabolic dependencies on, and complementation with, Buchnera for the synthesis of several essential compounds. Nonetheless, SCc differs greatly from STs in genome size, which can be explained by the former being at an early phase of advanced genome erosion, thus still retaining several pseudogenes and "junk" DNA. Both SCc and STs display drastic genome-wide gene loss, particularly in their ncRNA repertoire and informational machinery. Comparing these S. symbiotica strains allowed us to point to essential retained functions, which, unsurprisingly, are shared with other highly reduced endosymbionts. The detailed study of protein diminution in S. symbiotica revealed a common tendency of endosymbionts to lose non-essential protein domains, constituting an additional route towards genome reduction.
We expect that further study of this particular endosymbiont of aphids will continue to provide important clues into the intriguing process of genome reduction.
Scientific Reports | 6:32590 | DOI: 10.1038/srep32590

Methods
Annotation of protein-coding genes. Protein-coding genes not found in the annotation of a given S. symbiotica genome were searched for with the online version of tblastn 80, using the S. marcescens or E. coli protein as query. All positive hits with an e-value ≤10⁻³ were then manually curated. Domains within proteins were annotated using the InterProScan 81 webserver and through alignments against E. coli proteins using MAFFT v7.220 82. Circular representations of gene presence/absence were produced using circos v0.67 83 and edited in Inkscape v0.91. COG categories were assigned to proteins using blastx and ad hoc Perl scripts that select the best non-overlapping hits with an e-value threshold of ≤10⁻³. Absolute counts of COG categories were then converted to relative frequencies per organism. Finally, to analyse the disturbance of S. symbiotica's functional profiles relative to that of free-living Db11, we subtracted Db11's relative frequency for each COG category from every value in the same row. COG categories were visualised using R and the gplots library, followed by manual editing in Inkscape.
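The COG bookkeeping described above can be sketched in a few lines of Python. This is a hypothetical re-implementation, not the authors' Perl scripts: it assumes that "best non-overlapping hits" means greedily keeping hits in order of increasing e-value and discarding any hit that overlaps an already-kept one on the query protein, and the hits and profiles in the example are invented. The last function performs the Db11 subtraction, so a positive value means a category is relatively more frequent than in Db11.

```python
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    category: str  # COG functional category letter, e.g. "J" or "X"
    qstart: int    # hit start on the query protein (1-based, inclusive)
    qend: int
    evalue: float


def best_non_overlapping(hits: list[Hit], max_evalue: float = 1e-3) -> list[Hit]:
    """Greedily keep the best hits that pass the e-value cut-off and do not overlap."""
    kept: list[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue > max_evalue:
            break  # hits are sorted, so all remaining ones fail the cut-off too
        if all(hit.qend < k.qstart or hit.qstart > k.qend for k in kept):
            kept.append(hit)
    return kept


def relative_profile(categories: list[str]) -> dict[str, float]:
    """Convert absolute category counts into relative frequencies."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def disturbance(profiles: dict[str, dict[str, float]], reference: str = "Db11"):
    """Subtract the reference organism's frequency from every organism, per category."""
    categories = sorted({c for p in profiles.values() for c in p})
    ref = profiles[reference]
    return {
        org: {c: p.get(c, 0.0) - ref.get(c, 0.0) for c in categories}
        for org, p in profiles.items()
    }


hits = [Hit("J", 1, 100, 1e-50), Hit("K", 50, 150, 1e-20),
        Hit("L", 120, 200, 1e-10), Hit("X", 1, 50, 1e-2)]
print([h.category for h in best_non_overlapping(hits)])  # ['J', 'L']
profiles = {"Db11": relative_profile(["J", "K", "L", "X"]),
            "STs": relative_profile(["J", "J", "J", "L"])}
print(disturbance(profiles)["STs"])  # {'J': 0.5, 'K': -0.25, 'L': 0.0, 'X': -0.25}
```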
Annotation of RNA features. tRNA features were annotated using tRNAscan-SE v1.3.1 84 (-B option for the bacterial model) and TFAM v1.4, followed by manual curation. All other RNA features were searched for using Infernal v1.1.1 85 (--cut_tc --mid) against the Rfam v12.0 database 86. All hits with an e-value ≤10⁻³ were considered and manually curated. Visual displays were produced using R and the gplots library, followed by manual editing in Inkscape. The plain-text source files used to plot the RNA features are available at https://dx.doi.org/10.6084/m9.figshare.3413932.v1.
Rearrangement analysis. Single-copy proteins shared among the S. symbiotica strains and Db11 were computed as in ref. 40. Briefly, we used OrthoMCL v2.0.9 87 to build orthologous groups of proteins, followed by manual curation aimed at joining rapidly evolving proteins such as outer membrane proteins. These proteins were then used as rearrangement markers to calculate a minimal rearrangement phylogeny in MGR v2.0.3 88 (no heuristics). Scaffolds of the unfinished genomes (SAf, SAp, and SCt) were arranged so as to minimise the distance to Db11. The tree was visualised in FigTree v1.4.1, and the rearrangement graphic was produced in R using the genoPlotR library 89. All graphics were edited in Inkscape.
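Selecting rearrangement markers starts from orthologous groups that contain exactly one protein from every genome compared. The sketch below assumes a plain-text groups file with one group per line, written as "GROUP_ID: taxon|protein taxon|protein ..." (an assumed layout that should be checked against the actual OrthoMCL output); the taxon labels and protein IDs in the example are invented, and the manual curation step described above is not reproduced.

```python
from collections import Counter


def parse_groups(lines: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Parse 'GROUP_ID: taxon|protein taxon|protein ...' lines (assumed layout)."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, _, members = line.partition(":")
        parsed = []
        for member in members.split():
            taxon, sep, protein = member.partition("|")
            if not sep:
                raise ValueError(f"member without 'taxon|protein' form: {member!r}")
            parsed.append((taxon, protein))
        groups[group_id.strip()] = parsed
    return groups


def single_copy_shared(groups, taxa: list[str]) -> dict[str, dict[str, str]]:
    """Groups with exactly one protein from each taxon in `taxa` and no other taxa."""
    wanted = set(taxa)
    selected = {}
    for group_id, members in groups.items():
        per_taxon = Counter(taxon for taxon, _ in members)
        if set(per_taxon) == wanted and all(n == 1 for n in per_taxon.values()):
            selected[group_id] = dict(members)
    return selected


example = [
    "G1: Db11|p1 SAf|q1 STs|r1",
    "G2: Db11|p2 Db11|p3 SAf|q2 STs|r2",  # paralogue in Db11 -> excluded
    "G3: Db11|p4 SAf|q4",                 # missing from STs -> excluded
]
markers = single_copy_shared(parse_groups(example), ["Db11", "SAf", "STs"])
print(markers)  # {'G1': {'Db11': 'p1', 'SAf': 'q1', 'STs': 'r1'}}
```

The order and strand of these markers along each genome would then form the input to a rearrangement analysis.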