Not all Pseudomonas aeruginosa are equal: strains from industrial sources possess uniquely large multireplicon genomes

Pseudomonas aeruginosa is a highly versatile, antibiotic-resistant Gram-negative bacterium known for causing opportunistic infections and contamination of industrial products. Despite extensive genomic analysis of clinical P. aeruginosa strains, no genomes exist for preservative-tolerant industrial strains. A unique collection of 69 industrial isolates was assembled and compared to clinical and environmental strains; 16 genetically distinct industrial strains were subjected to array tube genotyping, multilocus sequence typing and whole-genome sequencing. The industrial strains possessed high preservative tolerance and were dispersed widely across P. aeruginosa as a species, but recurrence of strains from the same lineage within specific industrial products and locations was identified. The industrial P. aeruginosa genomes (mean=7.0 Mb) were significantly larger than those of previously sequenced environmental (mean=6.5 Mb; n=19) and clinical (mean=6.6 Mb; n=66) strains. Complete sequencing of the P. aeruginosa industrial strain RW109, which encoded the largest genome (7.75 Mb), revealed a multireplicon structure including a megaplasmid (555 265 bp) and large plasmid (151 612 bp). The RW109 megaplasmid represented an emerging plasmid family conserved in seven industrial and two clinical P. aeruginosa strains, and associated with extremely stress-resilient phenotypes, including antimicrobial resistance and solvent tolerance. Here, by defining the detailed phylogenomics of P. aeruginosa industrial strains, we show that they uniquely possess multireplicon, megaplasmid-bearing genomes, and significantly greater genomic content worthy of further study.


Content
Title Page Methods Method S1 Preservative susceptibility testing 3 Method S2 Extraction of DNA from P. aeruginosa 3 Method S3 Complete genome sequencing and annotation of P. aeruginosa RW109 3 Method S4 Assigning functional groups to the P. aeruginosa RW109 genome sequence 4 Method S5 Kyoto Encyclopedia of Genes and Genomes (KEGG) functional module assignment 5 Method S6 KEGG enrichment analyses in relation to preservative tolerance 5 Results Result S1 The P. aeruginosa RW109 megaplasmid copy number. 6 Result S2 KEGG functional module enrichment and P. aeruginosa preservative tolerance 6 References 7-9 Tables  Table S1 Collection of industrial and other P. aeruginosa strains assembled and analysed in this study 10-16  Figure S1 Clustered RAPD-PCR profiles of 69 industrial P. aeruginosa isolates 25 Figure S2 Swimming, swarming and twitching motilities of the P. aeruginosa strains. 26 Figure S3 Biofilm formation by P. aeruginosa panel and industrial strains 27 Figure S4 The complete multireplicon genome of industrial P. aeruginosa strain RW109 28 Figure S5 KEGG functional module enrichment analysis in relation to P. aeruginosa tolerance of BIT, MIT and CITMIT 29 Figure S6 KEGG functional module enrichment analysis in relation to P. aeruginosa tolerance of PHE and CHX 30 Bioinformatics (CLIMB) consortium [3]. The following protocol was used to create FASTQ DNA sequence files from the raw data PacBio files. From each sequencing run, three bax.h5 files and one bas.h5 file (a pointer file to the three bax.h5 files) resulted from each SMRT cell. The bax.h5 files contained the base call information from the sequencing run, and both sets from the two SMRT cells were converted into two separate binary format (bam) files using the bax2bam tool (v0.0.8, PacBio).
Assembled contigs were checked for overlapping ends and trimmed where necessary using Circulator (EggNOG) database (v4.5.1) . Functional orthologs were assigned using the EggNOG HMMER3 [9,10] based homology search to the optimised bacterial database and the COG categories and accession numbers were extracted. COG categories were divided into three well characterised functional classes' information storage and processing, cellular processes and signalling and metabolism. A poorly characterised functional class was also used where the COG category was unknown.
RW109 genomic islands (GIs) were predicted using Islandviewer (v4.0) [11] with the Prokka generated GeneBank file with default settings applied. The results from IslandViewer prediction methods SIGI-HMM and IslandPath-DIMOB was used to identify the total number of GIs within the RW109 whole genome sequence. Prophage sequences within the RW109 genome were predicted using PHAge Search Tool Enhanced Release (PHASTER) (v1.0) [12]. Comparisons against the PHASTER databases and feature identifications were carried out with the Prokka generated GeneBank file. Phage sequence regions were given a PHASTER score and were identified as being complete if the score was > 90, questionable with a score of 70-90 and incomplete with a score < 70. The ABRicate tool (v0.5-dev, https://github.com/tseemann/abricate.git) via the command line was used to screen the Prokka annotated nucleotide sequence of RW109 to identify antimicrobial and virulence genes using the Comprehensive Antibiotic Resistance Database (CARD) 2013) [13]. A ≥80% cut off was used for both coverage and identity ABRicate scores.

Method S5. Kyoto Encyclopedia of Genes and Genomes (KEGG) functional module assignment.
KEGG Orthology (KO) terms were assigned to the RW109 translated amino acid sequences from Prokka predicted CDS, using the KEGG Automatic Annotation Server (KAAS) [14] through the NCBI BLAST single-directional best hit search method, against the prokaryotes organism list. The Metabolic and Physiological Potential Evaluator (MAPLE) tool (v2.1.0) [15], was subsequently used to map the groups of KO-assigned CDS to KEGG defined modules. These modules are a collection of functional units linked to specific metabolic abilities and phenotypic features, and identified with M numbers. KEGG modules are grouped into pathway modules, structural complexes, functional sets and signature modules [15].

Supplementary Results
Result S1. The P. aeruginosa RW109 megaplasmid copy number. Mapping of short reads to the complete RW109 sequence using the EDGE software [16] derived fold coverage metrics of 36 ± 9.9, 67 ± 25 and 100 ± 30 (standard deviation) for the main chromosome, megaplasmid and large plasmid respectively. The copy number of the megaplasmid was estimated to be less than 2 since the sequence coverage was 1.8-fold greater than that of the main chromosome. The Inc-P2 plasmids that were phylogenetically closely related to the P. aeruginosa RW109 megaplasmid (pJB37 and p0Z176; Figure   4) are also predicted to be low copy number [17,18]. The RW109 large plasmid ( Figure S4) was also likely a low copy number given that plasmid sequence coverage was 2.7-fold greater than that for the main chromosome.
Result S2. KEGG functional module enrichment and P. aeruginosa preservative tolerance. KEGG functional module pathway analysis was carried out on the four P. aeruginosa isolates with the highest MICs and four with the lowest MICs for the preservatives BIT, MIT, CITMIT, PHE and CHX (Figure 1; Table S4 and S5). The modules were grouped by category and the number of complete modules for each category were compared for the isolates with high MIC versus the isolates with low MIC for each preservative. Overall, there were minimal differences in the numbers of complete modules assigned to the categories between P. aeruginosa with high and low preservative MICs, although significant differences were identified for a small number of categories ( Figure S5 and S6). Isolates with higher BIT and MIT MICs had a significantly higher number of modules assigned to the drug resistance category ( Figure S5). Interestingly this was due to the addition of the complete multidrug resistance efflux pump BpeEF-OprC module in the isolates RW176 and RW146, which had high BIT and MIT MICs.
P. aeruginosa isolates with low MICs for BIT and MIT were found to have a significantly higher number of complete modules assigned to central carbohydrate metabolism when compared to those with higher MICs for these preservatives ( Figure S5). Isolates with high CITMIT ( Figure S5) and PHE ( Figure   S6) MICs had significantly more modules categorised as two component regulatory systems. A higher number of modules were identified in the bacterial secretion system category for isolates with high BIT and MIT MICs ( Figure S5). Isolates with high CHX MICs were found to have significantly higher modules identified in the central carbohydrate metabolism category ( Figure S6). Detailed functional analysis of the implicated pathways and screening of additional P. aeruginosa strains is required to expand on these interesting preliminary findings.        Footnotes: CF, cystic fibrosis; CLIN, clinical; ENV, environmental; IND, industrial; HC, household cleaner; PC, personal care; MWF, metal working fluid; TC, timber care, '-' designates non motile, and light blue shading indicates highly motile. a Strains exhibiting an atypical swimming motility are shown in Figure S2B; b A strain exhibiting typical swarming with a diameter of approximately 60 mm is shown in Figure S2A Figure S1. Clustered RAPD-PCR profiles of 69 industrial P. aeruginosa isolates. A Pearson correlation similarity coefficient was used to construct a UPGMA dendrogram of the RAPD-PCR profiles. The PCR fingerprint profile of each isolate is shown to the right of the dendrogram, along with the isolate number in the RW collection (ID#; See Table S1 100  95  90  85  80  75  70  65  60  55  50  45  40  35

Italy
Indonesia Malaysia  Figure S2. Swimming, swarming and twitching motilities of the P. aeruginosa strains.
Representative images of the levels of swimming (0.3% LB agar), swarming (0.5% LB and BSM-G agar) and twitching (1% LB agar) motility exhibited are shown in panel A. Results were recorded after 16-18 hours incubation at 30°C (swarming) or 37°C (swimming and twitching). The unusual but consistent swimming motility phenotype of the 3 industrial strains from the same location are shown in panel B. Numerical assessment of the phenotypes is provided in Table S5.