PorinPredict: In Silico Identification of OprD Loss from WGS Data for Improved Genotype-Phenotype Predictions of P. aeruginosa Carbapenem Resistance

ABSTRACT The increasing integration of genomics into routine clinical diagnostics requires reliable computational tools to identify determinants of antimicrobial resistance (AMR) from whole-genome sequencing data. Here, we developed PorinPredict, a bioinformatic tool that predicts defects of the Pseudomonas aeruginosa outer membrane porin OprD, which are strongly associated with reduced carbapenem susceptibility. PorinPredict relies on a database of intact OprD variants and reports inactivating mutations in the coding or promoter region. PorinPredict was validated against 987 carbapenemase-negative P. aeruginosa genomes, of which OprD loss was predicted for 454 out of 522 (87.0%) meropenem-nonsusceptible and 46 out of 465 (9.9%) meropenem-susceptible isolates. OprD loss was also found to be common among carbapenemase-producing isolates, resulting in even further increased MICs. Chromosomal mutations in quinolone resistance-determining regions and OprD loss commonly co-occurred, likely reflecting the restricted use of carbapenems for multidrug-resistant infections as recommended in antimicrobial stewardship programs. In combination with available AMR gene detection tools, PorinPredict provides a robust and standardized approach to link P. aeruginosa phenotypes to genotypes. IMPORTANCE Pseudomonas aeruginosa is a major cause of multidrug-resistant nosocomial infections. The emergence and spread of clones exhibiting resistance to carbapenems, a class of critical last-line antibiotics, is therefore closely monitored. Carbapenem resistance is frequently mediated by chromosomal mutations that lead to a defective outer membrane porin OprD. Here, we determined the genetic diversity of OprD variants across the P. aeruginosa population and developed PorinPredict, a bioinformatic tool that enables the prediction of OprD loss from whole-genome sequencing data. We show a high correlation between predicted OprD loss and meropenem nonsusceptibility irrespective of the presence of carbapenemases, which are a second widespread determinant of carbapenem resistance. Isolates with resistance determinants to other antibiotics were disproportionally affected by OprD loss, possibly due to an increased exposure to carbapenems. Integration of PorinPredict into genomic surveillance platforms will facilitate a better understanding of the clinical impact of OprD modifications and transmission dynamics of resistant clones.

C arbapenems are last-line antibiotics reserved for the treatment of severe bacterial infections. The emergence of carbapenem-resistant (CR) pathogens has thus created a major challenge for public health. Of particular concern is the spread of carbapenem-resistant Acinetobacter baumannii (CRAB), Enterobacteriaceae (CRE; including Klebsiella pneumoniae, Escherichia coli, and Enterobacter spp.), and Pseudomonas aeruginosa (CRPsA), which are major causes of nosocomial and multidrug-resistant infections and associated with high mortality rates (1).
In carbapenemase-negative CRPsA, carbapenem resistance is primarily associated with the inactivation of the OprD outer membrane porin (11,13). OprD, a monomeric 18stranded b-barrel, is responsible for the passive uptake of basic amino acids and small peptides and serves as a diffusion channel for carbapenems such as meropenem and imipenem, which structurally resemble dipeptides (14,15). OprD defects can be mediated by indels, recombination events such as the insertion of insertion sequence (IS) elements, nonsense mutations, start-or stop-loss mutations, promoter disruptions, or amino acid substitutions affecting the porin's pore width, charge, or structural integrity (7,11,16). Other genetic determinants contributing to decreased carbapenem susceptibility include the overexpression of multidrug efflux pumps and the overexpression of the intrinsic beta-lactamase AmpC (17,18).
Pathogen surveillance has greatly benefited from the recent integration of genomics: the timely and accurate identification of pandemic clones and transmission events is critical to guide public health responses (19). Whole-genome sequencing has also become an integral part of antimicrobial resistance (AMR) surveillance, complementing phenotypic antimicrobial resistance testing by providing valuable information on resistance mechanisms and evolutionary events that led to their acquisition (20). Whereas reliable databases and computational tools are available for the detection of carbapenemase genes from whole-genome sequencing data (summarized by Hendriksen et al. [21]), there is a lack of robust and standardized approaches for the detection of porin loss-mediated carbapenem resistance. The identification of the various possible mutations leading to OprD deficiency currently relies on technical expertise and laborious curation and is thus often omitted from routine genotype-phenotype analyses. To address this, we investigated the genetic diversity of OprD in 2,088 P. aeruginosa genomes and compiled a database of intact variants. This was integrated into PorinPredict, a tool that predicts OprD porin loss from whole-genome sequencing data. PorinPredict was validated using 1,124 additional genomes (including 987 carbapenemase-negative genomes) and facilitated considerably improved genotype-phenotype predictions for P. aeruginosa meropenem resistance.
In the 605 genomes of the NCBI data set with a presumptively inactivated OprD coding sequence, the open reading frame contained premature stop codons (n = 150), FIG 1 Distribution of OprD variants and predicted OprD loss across the P. aeruginosa phylogeny. Assemblies of 2,088 isolates (NCBI data set) were included in the analysis. Presence of OprD variants (exact amino acid sequence match) is labeled according to the legend. Putative OprD loss is shown in black bars. Intact nonexact matches (n = 51) and putative OprD losses due to promoter disruptions (n = 12) are not indicated. Isolate affiliation to clades (shaded in gray) and high-risk clones (outermost ring) is annotated. Phylogenetic distances were estimated using MASH and visualized in iTOL (46). frameshift mutations (n = 297), other indels or truncations (n = 155), start-loss mutations (n = 2), or was not detected (n = 1). Disrupted promoter regions (but intact OprD coding sequences) were identified in nine assemblies, which all contained contig breaks within 200 bp upstream of oprD. In all nine cases, sequences matching with various IS elements (IS1394, ISPa82, ISPpu1, ISPpu21, ISPsme1, ISPsp2, ISPst7, or IS5 family) were identified between the contig break and oprD (Table S3). IS elements typically occur in multiple copies in a genome, preventing a resolved assembly of their surrounding region from short-read sequencing data.
The high proportion (28.9%) of genomes with an inactivated OprD coding sequence in this data set likely reflects an overrepresentation of sequenced clinical isolates which had been exposed to antibiotics. Of the isolates, 391 (18.7%) belonged to the high-risk clones sequence type 111 (ST111), ST175, ST233, ST235, ST244, ST277, ST298, ST308, ST357, or ST654, which are known for their increased virulence and multidrug resistance, often associated with various carbapenemases and extended-spectrum beta-lactamases (12). Overall, 239 (61.1%) of all high-risk clone isolates were predicted to be OprD deficient, compared to 375 of 1,697 (22.1%) isolates not belonging to high-risk clones (Fig. 1).
Evaluation of PorinPredict: concordance of predicted OprD loss and carbapenem resistance. PorinPredict detects putative OprD defects by querying genomes against a database consisting of the 15 identified intact OprD variants and against an oprD promoter database. Truncated or absent oprD, indels, premature stop codons, stop-loss and start-loss mutations, and disrupted promoter sequences are reported as putative OprD porin loss. Missense mutations are reported but not per se classified as porin loss.
To evaluate PorinPredict regarding genotype-phenotype associations, 1,124 genomes of P. aeruginosa isolates with known phenotypic meropenem susceptibility (validation data set) were screened for a putative OprD loss and the presence of carbapenemase-encoding genes. OprD loss was common and strongly associated with meropenem resistance: OprD loss was predicted for 556 of 653 (85.  Table S2). Carbapenemase-encoding genes were identified in 132 (20.2%) of the nonsusceptible isolates, often (n = 102, 83.1%) in combination with an inactivated OprD. Midlevel resistance conferred by carbapenemases may predispose for OprD loss, synergistically decreasing carbapenem susceptibility: isolates with concomitant OprD loss and carbapenemase presence showed a median meropenem MIC of .32 mg/L (interquartile range [IQR], .8 to 128) compared to 12 mg/L (IQR, 4 to .32) among isolates with a carbapenemase alone (Fig. 3b). A similar synergy of carbapenemases and outer membrane porin modifications has been described in carbapenem-resistant K. pneumoniae (25).
Possible reasons for inconsistent genotype-phenotype associations include mixed bacterial cultures, fragmented or absent (carbapenemase) gene assemblies from shortread data, alternative carbapenem-resistance mechanisms such as mutations affecting the functionality and overexpression of intrinsic beta-lactamases or efflux-pumps, technical errors during the susceptibility testing or sequencing, interlaboratory differences in MIC readings, or transcription errors.
Overall, PorinPredict improved the accuracy of genotype-phenotype predictions for meropenem nonsusceptibility from 53.1% (when predicted based on the presence of carbapenemase-encoding genes alone; sensitivity and specificity of 20.2% and 98.9%, respectively) to 89.6% (when predicted based on presence of carbapenemase-encoding genes and/or porin loss; sensitivity and specificity of 89.1% and 89.4%, respectively). Among the 79 University Hospital Basel (USB) isolates, for which MIC measurements were performed as part of this study, the accuracy was 96.2% (sensitivity and specificity of 95.8% and 96.4%, respectively). Very major discrepancies (resistant isolates predicted to be susceptible) were found for 1 of 21 (4.7%) and 42 of 430 (9.8%) isolates of the USB and PATRIC data sets, respectively. Considering amino acid substitutions at critical sites as inactivating mutations would further improve accuracies and discrepancy rates for genotype-phenotype predictions.
Role of regulators affecting MexAB and AmpC expression. Carbapenem susceptibility can be affected by additional genetic factors, including the expression levels of the intrinsic genes blaPDC (AmpC beta-lactamase) and mexA and mexB (MexAB multidrug efflux pump) (17,18). Here, loss (truncations, premature stop codons, frameshifts, or stop loss) of the MexAB repressors MexR, NalC, and NalD was predominantly found among isolates that simultaneously harbored primary carbapenem resistance determinants (OprD loss, OprD missense mutations, or presence of carbapenemase genes) ( Table 3). MexR-, NalC-, and NalD-deficient isolates exhibited 1.5-to 4-fold-increased meropenem MICs compared to those with intact repressor genes. In the absence of carbapenemase genes or OprD loss, these isolates were predominantly classified as susceptible. MexR loss was, however, detected in 11 out of the 43 isolates with very major genotype-phenotype discrepancies, sometimes (n = 4) coinciding with OprD missense mutations (Table S2). Loss of the AmpC repressor AmpR (in combination with an intact blaPDC) was detected in 20 isolates, which often (n = 11; 55.0%) coharbored primary carbapenem resistance determinants. The nine isolates with AmpR loss alone showed no increased MICs (median, 0.5 mg/L; IQR, 0.5 to 1 mg/L). Missense mutations in repressor genes or changes in their binding sites may also affect expression levels but were not further investigated here. Overall, our data suggest that the derepression of efflux pumps and intrinsic beta-lactamase genes plays a minor role in meropenem resistance of clinical P. aeruginosa isolates. Similar results were recently reported for other carbapenems (11).
Identical disruptive mutations were occasionally identified among closely related isolates, pointing to the spread of OprD-deficient clones. A phylogenetic cluster of 18 isolates within ST277 carried the frameshift mutation c.380_381delTG (p.E127fs) within OprD_6 (Table S1). These isolates were obtained between 1997 and 2019 at various geographic locations in Brazil (n = 16) and the United States (n = 2), corresponding with reports of a Brazilian endemic ST277 clone carrying this stable OprD-inactivating mutation (26). A subclone of the high-risk clone ST175 carrying a characteristic premature stop codon (Q142*) in OprD_1 and detected in Spanish and French hospitals over at least a decade (27,28) was represented in the validation data set (17 isolates) (Table S2). Compensatory mutations and coacquisition of AMR determinants. The presence of relatively conserved OprD throughout the P. aeruginosa population suggests an important metabolic role of this porin, with OprD loss possibly leading to fitness defects. To investigate potential compensatory mutations acquired by OprD-deficient isolates, we performed a kmer-based genome-wide association study (using DBGWAS) for a subset of 700 isolates with (n = 206) versus without (n = 494) predicted porin loss (NCBI data set; Table S1). While mutations in oprD itself were not captured by DBGWAS, three genetic determinants reached genome-wide significance (q [Benjamini-Hochberg adjusted P] , 0.05) for associations with OprD-loss, (i) the chromosomal mutation gyrA T83I (q min = 6.6E-18) and (ii) parC S87L (q min = 1.6E-7) (both contributing to ciprofloxacin resistance), and (iii) a component (q min = 1.4E-5) representing intI1, sul1 (sulfonamide resistance), qacED1 (antiseptic resistance), and an acetyltransferase-encoding gene, which are typically colocated on a class I integron (Table S4). Rather than representing compensatory mutations or gene acquisitions, these results likely reflect an accumulation of antimicrobial resistance mechanisms during patient treatment due to the use of carbapenems as a last-resort antibiotic for multidrug-resistant clones, as commonly recommended in antibiotic stewardship programs. In the entire NCBI data set of 2,088 isolates, 65.3% of the OprD-deficient isolates (n = 614) carried a mutation linked to quinolone resistance (gyrA T83I, gyrA D87G, gyrA D87H, gyrA D87N, parC S87L, or parE A473V) compared to 18.3% (P , 0.001; odds ratio [OR], 8.4; 95% confidence interval [CI, 6.8 to 10.4]) among the isolates with an intact OprD (n = 1,474). The sul1 gene was found in 47.7% of the OprD-deficient isolates, compared to 10.3% (P , 0.001; OR, 7.9; 95% [CI, 6.3 to 10.0]) of the isolates with intact OprD.
Conclusion. In conclusion, we developed, evaluated, and validated PorinPredict, a much-needed tool that predicts OprD-inactivating mutations in P. aeruginosa. Among carbapenemase-negative isolates, predicted OprD loss was overall highly consistent with phenotypic meropenem nonsusceptibility. Independent prospective validation is, however, necessary to provide evidence suitable for in vitro diagnostic regulations (IVDR). Although relatively rare, the effects of OprD missense mutations often remain difficult to interpret. The increasing availability of whole-genome sequencing and associated susceptibility data will enable an extension of the database covering additional variants. Given the increased application of whole-genome sequencing in clinical laboratories, PorinPredict will be valuable for detecting and tracking resistance within patients and communities.
Development of PorinPredict. PorinPredict was designed to detect mutations leading to a putative OprD loss or OprD amino acid substitutions from preassembled genomes. PorinPredict uses OprD nucleotide and protein sequence databases in combination with BLASTn 2.5.01 (37) and DIAMOND 2.0.7 (32), respectively, to assess OprD integrity. BLASTn 2.5.01 (37) is used with a 95% sequence identity threshold and the options "-max_target_seqs 10 -evalue 1E-20 -culling_limit 1." The BLASTn hit with the lowest E value is evaluated. DIAMOND is run in BLASTx mode with sequence identity and coverage thresholds of 95% and 60%, Prediction of P. aeruginosa OprD Loss from WGS Data Microbiology Spectrum respectively, and the options "--query-gencode 11 -b 6 --max-target-seqs 1 --sensitive --masking 0 -c1." Amino acid substitutions are identified using the R package Biostrings (v3.14) (38). PorinPredict additionally assesses the integrity of the oprD promoter region by screening input assemblies against a database consisting of oprD promoter regions from five isolates representing the distinct P. aeruginosa clades (A, B, C1, C2, and C1-2 intermediate; defined according to Ozer et al. [39]). These sequences consist of the 200-bp region upstream of oprD comprising the ArgR binding site, 235 and 210 region, and Shine-Dalgarno sequence (40) and 10 bp of the N-terminal oprD sequence. For the promoter screening, BLASTn 2.5.01 (37) is used as described above. BLASTn and DIAMOND results are filtered and summarized in an interpretable output table.
PorinPredict reports (i) exact matches (same length, no amino acid substitutions) to intact OprD variants in the database; (ii) nonexact matches, i.e., intact same-length variants with amino acid substitutions (missense mutations); (iii) mutations leading to a presumptive OprD loss, including premature stop codons (nonsense mutations), truncations, frameshift mutations and other indels, loss of stop codons (stop-loss), loss of start codons (start-loss), and absence of oprD in the assembly (no hit); and (iv) putatively disrupted oprD promoter regions (BLASTn matches with ,98% coverage). Evaluation of PorinPredict. PorinPredict 1.0.0 was evaluated using a second, independent data set consisting of 1,124 P. aeruginosa genome assemblies that had associated meropenem susceptibility data available and passed the above-defined CheckM and QUAST quality criteria. This collection ("validation data set" in Table  S2) included 79 assemblies from isolates of the University Hospital Basel (USB) sequenced as part of this study and 1,045 assemblies accessed from PATRIC (41). For short-read sequencing of in-house isolates, genomic DNA was extracted using the DNeasy blood and tissue kit (Qiagen) and sequenced on the Illumina NextSeq 500 or Illumina MiSeq platforms. Draft genomes were assembled using Unicycler version 0.3.0b (42) after read trimming with Trimmomatic version 0.38 (43).
Statistical tests. Frequency counts were compared using a two-tailed Fisher's exact test. P values of ,0.05 were considered to reflect statistical significance.
Data and software availability. Sequencing data generated as part of this study are available under BioProject accession no. PRJEB54973. Accession numbers of included assemblies are listed in Table S1 (NCBI data set) and Table S2 (validation data set) in the supplemental material. PorinPredict is available under the terms of the GNU General Public License 3.0 at https://github.com/MBiggel/PorinPredict/.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, XLSX file, 0.5 MB. SUPPLEMENTAL FILE 2, PDF file, 0.3 MB. assistance with sequencing. We thank Alfredo Mari for extensive helpful discussions on the implementation of PorinPredict.