The Prophage and Plasmid Mobilome as a Likely Driver of Mycobacterium abscessus Diversity

Mycobacterium abscessus is an important emerging pathogen that is challenging to treat with current antibiotic regimens. There is substantial genomic variation in M. abscessus clinical isolates, but little is known about how this influences pathogenicity and in vivo growth.

IMPORTANCE Mycobacterium abscessus is an important emerging pathogen that is challenging to treat with current antibiotic regimens. There is substantial genomic variation in M. abscessus clinical isolates, but little is known about how this influences pathogenicity and in vivo growth. Much of the genomic variation is likely due to the large and varied mobilome, especially a large and diverse array of prophages and plasmids. The prophages are unrelated to previously characterized phages of mycobacteria and code for a diverse array of genes implicated in both viral defense and in vivo growth. Prophage-encoded polymorphic toxin proteins secreted via the type VII secretion system are common and highly varied and likely contribute to strain-specific pathogenesis.
KEYWORDS Mycobacterium abscessus, prophages, bacteriophages, plasmids
Bacteriophages are characteristically specific for their bacterial hosts, with preferences that rarely traverse genus boundaries and are sometimes constrained to only a subset of isolates within a bacterial species (1). Phage specificity is determined by numerous factors, including receptor accessibility, restriction-modification, CRISPR-Cas, and abortive-infection systems, many of which can be expressed from prophages or plasmids (2-6). Because prophages and plasmids are highly mobile, these are key contributors to variations in phage infection among otherwise closely related bacterial strains. For using phages therapeutically to control bacterial infections, this specificity is a double-edged sword; it facilitates targeting of particular pathogens without gross microbiome disturbance, but constrains the range of bacterial isolates sensitive to any particular phage (7).
A large collection of mycobacteriophages has been isolated on Mycobacterium smegmatis and genomically characterized (8). They are genetically diverse and are currently grouped into 29 clusters (A to Z, AA to AC) according to overall sequence member (these are assigned to clusters rather than classed as singletons, as there are relatives in the large number of M. abscessus genomes in public databases) (Fig. 1E). The M. abscessus prophages are at least as diverse as, if not more diverse than, an equivalent number of M. smegmatis phages (10). Genomic maps of prophages prophiGD21-3 and prophiGD54-2 (Fig. 2) illustrate some of the interesting and unusual genomic features of these prophages, and detailed genomes of prophages are shown at https://phagesdb.org/documents/categories/14/. ProphiGD21-3, a member of cluster MabG (Fig. 2A), is organized with most of the genes rightward-transcribed, with the notable exceptions of a cassette adjacent to attR containing a polymorphic toxin (PT), a corresponding immunity protein (27), and an ESAT-6-like WXG-100 protein (Fig. 2A). The PT contains an N-terminal WXG-100 motif and is likely exported by the host type VII secretion system. The PT contains a C-terminal domain related to the tuberculosis necrotizing toxin (TNT), which facilitates immune evasion by Mycobacterium tuberculosis (28), thus implicating this prophage in the success of M. abscessus in vivo. These PT-Imm cassettes are common in the prophages but highly varied, as discussed in detail below. A second feature of note is genes 20 and 23, which are predicted to be expressed early in lytic growth and code for proteins with motifs common to cysteine dioxygenases and phosphoadenosine phosphosulphate (PAPS) reductases, respectively. It is unusual for these to be phage encoded, but PAPS reductase-like proteins are similar to DndC, which participates in phosphorothioate DNA modifications that are common in M. abscessus (29,30); gp20 is also implicated in cysteine metabolism.
ProphiGD54-2 (cluster MabI) is organized similarly to cluster M mycobacteriophages (24). It integrates with a serine integrase and codes for an array of 21 tRNA genes and a tmRNA, as well as a release factor (Fig. 2B), suggesting substantial translational reprogramming during lytic growth. However, like prophiGD21-3 (Fig. 2A), prophiGD54-2 also codes for a PT-Imm cassette, although it is located proximal to attL (Fig. 2B). The PT also contains an N-terminal WXG-100 motif and has a C-terminal motif distantly related to the AvrE family of secreted effectors; the Imm protein is a predicted LpqN-like lipoprotein and is likely to be cell wall associated.
Diversity of M. abscessus plasmids. Although plasmids are not as prevalent as prophages in these clinical isolates and are only present in ~50% of the strains, they are also quite diverse (Table 2, Fig. 1F). There are eight clusters (pA to pH) and nine singletons, each without close relatives in this data set, of which three (pGD58, pGD104, and pGD21-1) are large and are not fully assembled (Fig. 1F). The smallest are cluster pA plasmids (9.5 kbp), but the cluster pH and singleton pGD104 plasmids are over 90 kbp. All are present at low copy number, typically fewer than 5 copies/cell on average (Table 2; Fig. 1F). Comparison of these plasmids with the extant publicly available (~1,500) M. abscessus sequences shows that although some plasmid-borne genes are prevalent, there are few examples of near-full-length sequence matches. Notable exceptions are M. abscessus subsp. bolletii plasmid 2 (31) and M. abscessus pJCM30620 (32), which are similar to pGD58 (each with 99% identity spanning 92% coverage), and Mycobacterium sp. djl-10 plasmid djl-10_3 (accession number CP016643.1), which is very similar to pGD25-3. Detailed genome maps of the plasmids are available at https://phagesdb.org/documents/categories/14/.
e Plasmid copy numbers are calculated as the fold-difference between the average number of sequence reads mapping to the plasmid relative to the corresponding genome. If there is more than one plasmid, the average is reported. f Plasmids are predicted to be mobilizable if they code for a conjugative type relaxase, and conjugative if they contain an ESX operon. g Plasmid pATCC19977-1 is the same as the previously reported plasmid in this strain (20).
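The copy number estimate described in footnote e can be illustrated with a minimal Python sketch. It assumes that "average number of sequence reads" means mean per-base read depth, and the function names, read counts, and replicon lengths are hypothetical values chosen only for illustration; none of them come from Table 2.

```python
def mean_depth(mapped_reads: int, read_length: int, replicon_length: int) -> float:
    """Approximate mean per-base read depth for one replicon (assumed definition)."""
    if replicon_length <= 0:
        raise ValueError("replicon_length must be positive")
    return mapped_reads * read_length / replicon_length


def plasmid_copy_number(plasmid_depth: float, chromosome_depth: float) -> float:
    """Fold-difference of plasmid depth relative to chromosome depth."""
    if chromosome_depth <= 0:
        raise ValueError("chromosome_depth must be positive")
    return plasmid_depth / chromosome_depth


# Hypothetical example: a 5-Mbp chromosome and a 25-kbp plasmid, 300-base reads.
chromosome_depth = mean_depth(1_000_000, 300, 5_000_000)
plasmid_depth = mean_depth(15_000, 300, 25_000)
copy_number = plasmid_copy_number(plasmid_depth, chromosome_depth)
```

Normalizing by replicon length before taking the ratio keeps a long plasmid from appearing more abundant simply because it recruits more reads.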
Plasmids in different clusters share fewer than 35% of their genes, but most code for one of four sequence types of a RepA replication protein, the exceptions being the large plasmids pGD58 (cluster pH), pGD21-1, and pGD104, for which no RepA was identified. RepA sequences of pA, pC, pD, pE, and pF plasmids are sufficiently similar (>64% pairwise amino acid [aa] identity) that they likely form a single incompatibility (Inc) group (IncMabI) (Fig. 1F). Clusters pB, pG, and singletons pATCC19977, pGD13, pGD22-1, pGD51, and pGD52 have a second group of related RepA proteins (>75% pairwise aa identity), potentially forming a second Inc group (IncMabII), and although 10 strains have two plasmids, none have two plasmids of the same Inc type. GD25 has three plasmids and singleton pGD25-3 likely represents a third Inc group (IncMabIII), although it shares 78% aa identity with IncMabII plasmid pGD25-2 (MabG). GD21 has two plasmids, pGD21-1 and -2, the latter of which is linear and represents a fourth Inc group (IncMabIV) (Fig. 1F).
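As an illustration of how pairwise RepA identities could be turned into candidate groups, the following Python sketch applies single-linkage grouping above an identity cutoff. This is an assumption for illustration, not the procedure used in this study: the protein names and identity values are hypothetical, and the pGD25-3 case above (78% identity to an IncMabII RepA, yet assigned to a separate group) shows that a simple cutoff does not reproduce the proposed Inc groups on its own.

```python
from itertools import combinations


def group_by_identity(names, identity, threshold):
    """Single-linkage grouping: link two proteins when identity exceeds threshold.

    identity maps (name_a, name_b) pairs to percent identity; missing pairs
    are treated as 0% (an assumption of this sketch).
    """
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        pct = identity.get((a, b), identity.get((b, a), 0.0))
        if pct > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical RepA proteins and identities, for illustration only.
rep_names = ["repA_1", "repA_2", "repA_3", "repA_4"]
rep_identity = {
    ("repA_1", "repA_2"): 80.0,
    ("repA_2", "repA_3"): 70.0,
    ("repA_3", "repA_4"): 30.0,
}
candidate_groups = group_by_identity(rep_names, rep_identity, threshold=64.0)
```

Single linkage joins repA_1 and repA_3 through repA_2 even though no direct value is given for that pair, which is one reason such groupings would still need manual review.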
Prophage locations and prophage integration. All of the prophages are chromosomally integrated, and many are expected to impact host physiology; no plasmidial prophages were identified (33). They are inserted at 18 different positions and are distributed broadly around the M. abscessus genome (Fig. 3A); the number and variety of attB sites is greater than those used by 1,800 sequenced phages of M. smegmatis (34). Phages in most of the clusters use a tyrosine integrase (Int-Y), with the exceptions of clusters MabI and MabJ, which both use serine integrases (Int-S) (Fig. 1E). Of the 15 attB sites used by Int-Y, 10 overlap host tRNA genes, a common organization for these integration systems (35); however, 5 do not (Fig. 3A). MabG and MabM phages use an attB site (attB-11) located within the host tmRNA gene (Fig. 3B). The common core sequences (shared by attB and attP) are typically 25 to 76 bp for the tRNA-attB sites (Table S1), with the phage-derived sequences reconstructing the 3′ end of the host tRNA gene (e.g., attB-1; Fig. 3B); the exceptions are the MabA1 phages that unusually reconstruct the 5′ end of the tRNA-Met gene at attB (e.g., attB-5; Fig. 3B). For all of these, no host genes are lost upon integration, although the tRNA-Met gene (MAB_t5028) must be expressed from a phage promoter following MabA1 phage integration at attB-5 (Fig. 3A and B). Int-Y phages integrating at the four non-tRNA attB sites (attB-14, attB-12, attB-13, and attB-8) typically have shorter core sequences (3 to 25 bp), but the consequences of integration are more complex (Fig. 3B). At attB-14, the integration is intergenic (MAB_4442-4443) and flanking gene expression is likely unaltered following integration. However, at attB-12 and attB-13, the common core overlaps the ribosome-binding site and translational start site of MAB_3824 and MAB_3947 (fatty acyl-CoA reductase), respectively, such that transcription of these genes must originate within the prophages (Fig. 3B).
The attB-8 site is within the 3′ end of MAB_2979, with the crossover site positioned seven codons from the translation stop codon (Fig. 3B), and, although integration results in replacement of the C-terminal seven amino acids with eight prophage-derived residues, the protein likely retains functionality (Fig. S1).
Three attB sites (attB-9, attB-7, and attB-17) are used by Int-S systems and have characteristically short common core sequences (5 to 8 bp) (36,37). All integrate within open reading frames which they disrupt, as described similarly for Bxb1 integration into the groEL1 gene of M. smegmatis (36,38). The cluster MabJ phages integrate at attB-7 located within MAB_2445, which encodes an AraC-like regulator, with potential for wholesale changes in host gene expression. attB-9 and attB-17 are both used by MabI phages and are located within MAB_3230 and MAB_3265, respectively (Fig. 3B). MAB_3230 contains a SnoaL_4 domain and is related to an oxidoreductase of Streptomyces (39). MAB_3265 encodes a dienelactone hydrolase family protein, although its specific role is not known.
Superinfection immunity and integration-dependent immunity. There is considerable variation in the sequences of immunity repressors, including within clusters of otherwise closely related prophages (Fig. 3C, Table S1). At least 30 distinct immunity groups are predicted, reflecting a broad capacity to influence phage infection profiles by repressor-mediated superinfection immunity (Fig. 3C, Table S2). With the exceptions of the cluster MabI and MabJ phages, the repressors are divergently transcribed from putative cro-like genes and closely linked to the int gene; they vary considerably in length and sequence, but most contain putative DNA-binding motifs. In the cluster MabJ phages, the repressor is distal from the integrase, reflecting the organization common to cluster A mycobacteriophages. The repressor location in the cluster MabI phages is unclear.
Nine of these prophages, corresponding to six attB sites (attB-1, attB-2, attB-8, attB-12, attB-13, and attB-14), use integration-dependent immunity systems (40). These systems are unusual in that attP is located within the repressor gene such that the viral- and prophage-encoded gene products differ at their C termini. The virally encoded repressor gene product typically has a C-terminal ssrA-like degradation tag and does not confer immunity, and integration is required for removal of the degradation tag and expression of a functional repressor (40). Clusters MabB, MabC, MabH, MabK, MabN, and MabO all have attP within their repressor genes, and integrative recombination leads to a gene product that is 20 to 35 residues shorter, truncated at its C terminus by a translation stop codon at attL (Fig. 3D, Fig. S1). For the MabC, MabK, MabN, and MabO phages (using attB-1, attB-12, attB-13, and attB-14), the stop codon is formed by juxtaposition of the first base of bacterial sequence to the phage sequence at attL. In the MabB phages, seven amino acids are added from the bacterially derived sequence (Fig. S1). For MabB and MabK phages, the attB site overlaps the 3′ ends of tRNA genes such that the tRNA is transcribed toward the truncated repressor (Fig. 3D).
Prophage-encoded polymorphic toxin-Imm systems. The presence of PT-Imm cassettes in prophiGD21-3 and prophiGD54-2 was noted above (Fig. 2), but related cassettes are prevalent in these prophage genomes. Prophages in 14 clusters code for a remarkably diverse set of PT-Imm systems, all implicated in bacterial virulence (41) (Fig. 4A, Table S3). These systems code for a large (~50 kDa) member of the polymorphic toxin (PT) family, and an immunity protein (Imm) that protects from toxicity (41). All of the prophage-encoded toxins include an N-terminal WXG-100 motif targeting the PT for export by the type VII secretion system (TSS), together with a small ESAT6-like protein with a WXG-100 motif that likely forms a heterodimer to promote PT export (Fig. 4A). The variation among the prophage-encoded PTs is considerable, including at least 10 different sequence groups of the PTs, with additional diversity among their C-terminal regions. For example, prophages prophiGD57-1, prophiGD08-3, prophiGD21-3, prophiGD43A-5, prophiGD43A-6, prophiGD05-3, and prophiGD03-1 code for related PTs, but the C-terminal regions code for different motifs, including Tox-REase-5, tuberculosis necrotizing toxin (TNT), Endo-NS2, and Ntox-15 motifs (Fig. 4A, a to g). The putative Imm proteins immediately downstream of the PTs are also highly diverse, are predicted to interact directly with the toxin (42), and likely coevolve with the PT C-terminal domain (Fig. 4A). Thus, although there are seven different configurations with a toxin related to that in prophiGD57-1 (Fig. 4A), the four different putative Imm proteins correspond to the C-terminal variation of the toxin (Fig. 4A). We note that several of the putative Imm proteins are predicted lipoproteins (Fig. 4A, i, k, and s). Secretion of the PT likely utilizes either the M. abscessus Esx-3 or Esx-4 type VII secretion systems, both of which are important for growth in vivo (43,44).
These prophage-encoded PT-Imm systems are predicted to contribute to M. abscessus in vivo growth and infection via multiple mechanisms. All of these PT-Imm systems are encoded close to the attachment junctions and adjacent to bacterial genes (Fig. 4A), a common location for prophage-expressed genes among mycobacteriophages (13,15). Transcriptome sequencing (RNAseq) shows that most prophage genes are transcriptionally silent, but the PT-Imm systems are expressed in several lysogens with transcription initiation originating from prophage promoters (Fig. 4B). The Imm genes are expressed at higher levels than the PT genes, presumably to optimize immunity from the PT prior to export (Fig. 4B). This is in contrast to the MuF-related toxins within the virion structural genes of several Escherichia coli phages, which are secreted by type VI systems (27,45). We note, however, that PT-Imm expression is not observed in all lysogenic strains, as shown by prophiGD21-1, in which the repressor is the sole lysogenically expressed gene product (Fig. 4B). It is plausible that some PT-Imm systems are expressed only in host cells.
Prophage-encoded toxin-antitoxin systems. Prophages can encode and express multiple functions other than repressor-mediated immunity that prevent phage infection, often with considerable specificity and against genomically unrelated phages (3,5). Among these are toxin-antitoxin (TA) systems, several of which are located in att-linked defense loci of mycobacteriophages and are prophage expressed (3). Nineteen M. abscessus prophages code for at least nine different TA systems, although only two (in prophiGD12-2 and prophiGD04-1) are proximal to an attachment site (Fig. 5A). The others are located within early lytic genes but are often transcribed on the opposite strand (e.g., prophiGD79-1, prophiGD91-4, prophiGD43A-5, and prophiGD12-2) (Fig. 5A). RNAseq of several lysogens carrying MabA1 phages shows that the TA pair is strongly transcribed, in contrast to the flanking phage genes (Fig. 5B). These genes are thus implicated in influencing bacterial physiology and likely promote defense against viral infection.
FIG 4 Legend (Continued) these genes are close to either an attR (a to h, l to u) or attL (i to k) attachment junction (designated according to the attB site used; see Fig. 2A, Table S3) and phage genes are shown as colored boxes above or below genome rulers reflecting rightward and leftward transcription, respectively; black arrows indicate a host gene adjacent to the attachment, designated with the corresponding gene number in M. abscessus ATCC 19977. The genomes are aligned by the 5′ end of the toxin gene (a to h, l to u) where transcribed leftward inside attR, and similarly for the three configurations adjacent to attL (i to k), where the genes are transcribed rightward. Genes are colored according to their designated assignment into groups of related proteins (phamilies). All of the polymorphic toxin genes have an N-terminal WXG-100 (WXG) motif common to the type VII secretion system but have variable C termini. A schematic representation is shown in the box at top right indicating the organization of the polymorphic toxin domains and the proposed interaction between the toxin and a protective immunity protein. (B) Expression of the PT-Imm loci. RNAseq profiles for prophiGD43A-5, prophiGD43A-2, prophiGD43A-3, and prophiGD43A-6 show lysogenic expression of the PT-Imm loci; most of the rest of the prophages are transcriptionally silent. RNA was prepared from M. abscessus strain GD43, and only sequence reads mapping uniquely are shown. Also shown is a profile of the entire prophiGD21-1 prophage, in which only the repressor is expressed. RNA was prepared from M. abscessus strain GD21 and RNAseq reads mapping to forward (red) and reverse (purple) strands are shown. Pairwise nucleotide sequence similarity is displayed by spectrum color shading between the genomes, with violet as most similar and red as least similar. Genes are shown as boxes either above or below the genome, indicating rightward and leftward transcription, respectively. Gene boxes are colored according to gene phamilies in which they are assigned.
Potential roles of M. abscessus plasmids. The M. abscessus plasmid repertoire is diverse and replete with functions predicted to influence bacterial physiology, including antibiotic resistance, phage defense, and virulence. Most of the plasmids are likely mobilizable and code for conjugative-type relaxases, perhaps using the host TSS systems for mobilization that are implicated in distributive conjugation (46). We note that clusters pC, pD, and pE plasmids also code for several proteins with WXG-100 domains that are likely also exported through the TSS system, as well as toxin-antitoxin and abi systems (47) implicated in viral defense (Fig. 5C). Abi genes (47) are present in clusters pA, pB, pC, pD, pF, and the singletons pATCC19977, pGD25-3, and pGD104, and TA systems are in plasmids in clusters pD, pE, pH, and singletons pGD104 and pGD21-2. However, we note that of the 28 strains that are not infected by phages, 19 are plasmid free, and the overall phage susceptibility profiles are likely determined by complex combinations of prophage, plasmid, and bacterially encoded functions (17). There are also a variety of genes associated with transport systems, including the MmpL proteins (coded by pB plasmids), MFS-like transporters, and several metal resistance and iron regulators. These strains are resistant to many different antibiotics and the plasmids are strongly implicated in these resistance phenotypes.
The large (>92 kbp) plasmids (pGD58, pGD104, pGD21-1, and pGD21-2; Table 2) are notable in that they have large (25 to 30 kbp) ESX regions coding for type VII secretion systems that are implicated in conjugative plasmid transfer (Fig. 5D); these ESX systems are similar to that in M. bolletii 50594 plasmid 2, designated ESX-P cluster 3 (48). Related plasmids are reported to be quite widespread (49), but are not highly prevalent in M. abscessus strains; pGD58 and pGD104 each have only ~20 closely related plasmids in over 1,500 sequenced M. abscessus strains (50). The three strains carrying these large plasmids are all M. abscessus subsp. massiliense, two of which have smooth morphotypes, suggesting that abundant surface GPLs do not interfere with plasmid transfer by conjugation.

DISCUSSION
M. abscessus is an important emergent pathogen and widespread antibiotic resistance presents substantial clinical challenges. Elucidating its pathogenic capacity is complicated by its genetic variability, much of which could be driven by its expansive mobilome of prophages and plasmids, many of which code for genes predicted to influence survival and growth in vivo as well as antibiotic- and phage-resistance profiles (Fig. 4, Fig. 5). Defining these strain differences and their pathogenic behaviors is of considerable importance (17). Most studies of M. abscessus have focused on type strains such as ATCC 19977, but this strain is poorly representative of the pathogenic potential and physiology of most clinical strains, whose mobilomes are revealed to be highly diverse, with individual strains having different properties depending on the variety of prophages and plasmids they carry. Understanding clinical responses to M. abscessus infection will require a much broader understanding of these strain differences and their phenotypic consequences.
The widespread antibiotic resistance of M. abscessus clinical isolates is a substantial impediment to genetic manipulation, as it greatly limits the use of selectable markers for transformation. The diverse prophage and plasmid repertoires offer a multitude of opportunities for advancing the genetic systems. For example, the numerous superinfection immunity systems are a resource for use as genetically selectable markers that circumvent the use of antibiotics (51). Several of the prophages have been propagated lytically and it is likely that many more can be (17,52). For each of these, a cloned repressor gene can be adapted as a selectable gene using lytic phage derivatives as selective agents. We note that for the integration-dependent immunity systems (40), it is critical that the truncated-but-active prophage-encoded repressor be used, not the inactive virally encoded form.
There are relatively few plasmid replicons available for vector development for M. abscessus. The plasmids described here represent at least four incompatibility groups (Fig. 1F), each of which could be used to develop low-copy-number extrachromosomal vectors for combinatorial use. There is also considerable potential for construction of additional integration-proficient plasmid vectors taking advantage of the abundance of newly identified attB sites (Fig. 3A). We note that the commonly used integrative vectors based on mycobacteriophage L5 (53) use a conserved attB site overlapping the M. abscessus tRNA-Gly gene (t5027), which is not occupied by any of the prophages described here (Fig. 3B), and therefore should be broadly applicable.

MATERIALS AND METHODS
Bacterial strains and media. M. smegmatis mc²155 was grown as previously described (14). M. abscessus strains were grown in 10 ml of 7H9 medium with oleic acid-albumin-dextrose-catalase (OADC) and 1 mM CaCl2 for ~72 h at 37°C with shaking. For some M. abscessus strains, several individual isolates were recovered either at different times or as different morphotypes, including strains GD54, GD35, and GD64, which were designated GD54H, GD35B, and GD64A, respectively. For some isolates, both rough and smooth colony morphotypes were recovered, and designated accordingly (e.g., GD68A, GD68B). GD43A and GD43B carry different numbers of prophages and are therefore treated as separate strains. Bacterial DNA was prepared from 1 ml of log-phase culture using standard phenol-chloroform-isoamyl alcohol extraction and ethanol precipitation. Phage DNAs were isolated using similar methods as reported previously (9).
Genomics. Genomic DNAs were prepared for sequencing using NEB Ultra II FS kits and then pooled and run on an Illumina MiSeq using v3 reagent kits to generate 300-base paired-end reads. In some cases, Oxford Nanopore sequencing libraries were also constructed from genomic DNA using Rapid Sequencing Barcoding kits, and then pooled and run on a MinION device using FLO-MIN106D flowcells. Illumina reads for each strain were trimmed and quality-controlled using Skewer (54). Trimmed Illumina reads were then assembled using Unicycler (55), incorporating Nanopore reads when available.
In the case of complete genomes, assemblies were viewed, stitched, corrected, and finalized using Consed version 29 (56,57). GraphMap (58) was used to align long Nanopore reads to provisional assemblies and resolve repetitive regions. The first base and orientation of each complete circular chromosome was chosen to match those of the ATCC 19977 strain and/or to align with the first base of the dnaA gene.
Prophage and plasmid identification. Prophages were detected initially by searches using PHASTER (20) followed by careful manual inspection. PHASTER often identifies potential regions with prophages but does not accurately identify attachment junctions. Precise prophage positions were determined by genome comparisons with strains lacking those prophages, and identifying the short repeated sequences corresponding to the common core at the attL and attR sites. Related copies of prophages were identified by extensive sequence searches and genome comparisons. Each prophage sequence was extracted, including the common core sequence at both ends of the prophage genome. Prophages were designated according to the strain in which they reside, i.e., prophiGDXX-1, with suffixes used to denote multiple prophages in the same genome.
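The search for the short repeated sequence flanking a prophage can be sketched in Python as follows. This is a simplified illustration, not the procedure used here, which relied on manual inspection and comparison with strains lacking each prophage. The function names, the window size, and the toy sequence are all hypothetical; the sketch simply reports the longest exact repeat shared by windows around two approximate prophage boundaries as a candidate common core.

```python
def longest_common_substring(a: str, b: str):
    """Return (length, start_in_a, start_in_b) of the longest exact shared substring."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[0]:
                    best = (curr[j], i - curr[j], j - curr[j])
        prev = curr
    return best


def find_core_repeat(genome: str, left_guess: int, right_guess: int, window: int = 200):
    """Find a candidate common core near approximate attL and attR positions.

    Returns (core_sequence, attL_core_start, attR_core_start), 0-based.
    """
    if right_guess - left_guess <= 2 * window:
        raise ValueError("boundary windows would overlap; reduce window")
    left_start = max(0, left_guess - window)
    right_start = max(0, right_guess - window)
    left = genome[left_start:left_guess + window]
    right = genome[right_start:right_guess + window]
    length, i, j = longest_common_substring(left, right)
    core = genome[left_start + i:left_start + i + length]
    return core, left_start + i, right_start + j


# Toy lysogen: host flank, core, prophage body, core, host flank (all hypothetical).
toy_core = "GGATCCGGTACCGAGCTCAA"
toy_genome = "A" * 30 + toy_core + "T" * 100 + toy_core + "C" * 30
result = find_core_repeat(toy_genome, left_guess=40, right_guess=160, window=25)
```

On real genomes, short chance repeats are common, so any candidate found this way would still need to be checked against the empty attB site of a prophage-free strain.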
Potential plasmids were identified primarily as small circularized contigs in genome assemblies, although one linear plasmid was also identified. These contigs were manually inspected to ensure they were valid, complete, and not contaminants. Complete circular plasmids were oriented and cut so that base 1 was the first base of a predicted repA gene whenever possible.
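Setting base 1 of a circular plasmid to the first base of repA amounts to rotating the sequence, with a reverse complement first if the gene lies on the opposite strand. The Python sketch below illustrates this under stated assumptions: coordinates are 0-based, the function names are hypothetical, and the short sequences are toy examples rather than real plasmids.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (unambiguous bases only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def reorient_circular(seq: str, gene_start: int, strand: str = "+") -> str:
    """Rotate a circular sequence so that the gene's first base is at index 0.

    gene_start is the 0-based index, on the input strand, of the first base
    of the gene (for a minus-strand gene this is its highest coordinate).
    """
    if not 0 <= gene_start < len(seq):
        raise ValueError("gene_start is outside the sequence")
    if strand == "-":
        # After reverse complementing, position p maps to len(seq) - 1 - p.
        seq = reverse_complement(seq)
        gene_start = len(seq) - 1 - gene_start
    elif strand != "+":
        raise ValueError("strand must be '+' or '-'")
    return seq[gene_start:] + seq[:gene_start]


# Toy examples: a plus-strand start at index 3, and a minus-strand ATG ending at index 5.
plus_example = reorient_circular("ATGCCCAAGT", 3)
minus_example = reorient_circular("AAACATGG", 5, strand="-")
```

Because the molecule is circular, rotation changes only the coordinate origin and not the sequence content, which makes plasmids with the same replicon easier to align and compare.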
RNAseq. Total RNA was isolated from logarithmically growing M. abscessus cells. DNA was removed using a Turbo-DNase-Free kit (Ambion) according to the manufacturer's instructions, and rRNA was depleted using QIAseq FastSelect (Qiagen). The libraries were constructed using the NEBNext Ultra RNA library kit (New England BioLabs) and verified using a BioAnalyzer. The libraries were multiplexed, and four were run on an Illumina MiSeq for each run. Analysis of the data was as described previously (62). Only unique reads were mapped to each genome set. All RNAseq data have been deposited in the Gene Expression Omnibus (GEO) repository (GSE161710).
Data availability. The data that support the RNAseq findings of this study have been deposited in Gene Expression Omnibus (GEO) with number GSE161710. The completed and WGS genome sequencing data for M. abscessus clinical isolates, including plasmids and prophages, are available in GenBank, and a complete list of accession and project numbers is provided in the accompanying manuscript (17).

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. FIG S1, PDF file, 0.1 MB.