The promiscuous and highly mobile resistome of Acinetobacter baumannii

Antimicrobial resistance (AR) is a major global threat to public health. Understanding the population dynamics of AR is critical to restrain and control this issue. However, no study has provided a global picture of the whole resistome of Acinetobacter baumannii , a very important nosocomial pathogen. Here we analyse 1450+ genomes (covering >40 countries and >4 decades) to infer the global population dynamics of the resistome of this species. We show that gene flow and horizontal transfer have driven the dissemination of AR genes in A. baumannii . We found considerable variation in AR gene content across lineages. Although the individual AR gene histories have been affected by recombination, the AR gene content has been shaped by the phylogeny. Furthermore, many AR genes have been transferred to other well-known pathogens, such as Pseudomonas aeruginosa or Klebsiella pneumoniae . Despite using this massive data set, we were not able to sample the whole diversity of AR genes, which suggests that this species has an open resistome. Our results highlight the high mobilization risk of AR genes between important pathogens. On a broader perspective, this study gives a framework for an emerging perspective (resistome-centric) on the genomic epidemiology (and surveillance) of bacterial pathogens.


INTRODUCTION
Antimicrobial resistance is a major global menace to humans all over the world. In this regard, the ESKAPE (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa and Enterobacter species) group is a significant cause of deaths and burden disease in many countries [1]. This group of bacteria can easily acquire antimicrobial resistance genes (ARGs). In 2017 the World Health Organization issued a list of bacterial pathogens for which novel drugs are immediately required [2]. At the top of this list, with the highest priority status (priority 1: critical), was carbapenem-resistant A. baumannii. Notably, most A. baumannii hospital infections are caused by multidrug-resistant (MDR) isolates.
The resistome is the set of ARGs present in a given species or a particular environment [3]. Over the last decade, the resistome of some species and some ecological communities have been studied to an unprecedented detail due to the extensive amount of genetic information provided by genome sequencing. For instance, metagenomics and functional genomics studies have shown that ARGs are frequently found in many different environments [4][5][6][7][8]. These range from human-made environments, such as hospitals [5], to pristine and even isolated environments, such as isolated caves [7]. Furthermore, many ARGs are globally distributed and are rather diverse.
Genome sequencing has also been tremendously useful to better understand the biology of A. baumannii. Studies analysing the pangenome of this species have unveiled important aspects of its genetic variation [9] and its defence mechanisms [10,11]. Additionally, population genomics and genomic surveillance studies have been very useful to study the geographical and OPEN ACCESS temporal spreading of important lineages in this and other species [12][13][14][15][16][17]. However, most of these studies have used the core genome as the framework to understand the dynamics of the ARGs. Indeed, ARGs are often just mapped onto the core phylogeny. This strategy can be useful for rather clonal species, such as Staphylococcus aureus. Nonetheless, in species with high rates of gene content variation, such as A. baumannii [9], this strategy is bound to be problematic. Furthermore, ARGs are commonly found within mobile genetic elements. Hence, ARGs can be easily moved horizontally within and between species. In this regard, the potential transfer of ARGs between pathogens from different genera is especially concerning in hospital settings [6]. Therefore, to properly track the transmission dynamics of ARGs, population genomics studies focusing on the ARGs per se are required.
Over the last decade, and due to its clinical relevance, a lot of knowledge has been gained for A. baumannii through the genome sequencing of many hundred isolates [18]. However, despite the rapid accumulation of genomes, there has been a few studies that have tried to use all this information to analyse diverse aspects of A. baummannii [10,19]. For instance, Mangas and colleagues focused on the distribution of CRISPR/Cas and plasmids in different lineages of the species [10], whereas Hamidian and Nigro focused on the distribution of some (but not all) the ARGs in A. baummannii [19]. Importantly, none of those studies paid attention to only including high-quality genomes. This is very relevant as this species has a very fluid genome with many mobile genetic elements; due to this, draft genomes often do not capture the whole repertoire of genes (ARGs included) present in a given isolate. Thus, to comprehensively study the population dynamics of the resistome of this species, high-quality genomes are required. Here we produce one of the most extensive views of the resistome of A. baumannii only including high-quality genomes. Our data set has almost 1500 high-quality genomes, covers 42 countries and four decades, and contains 149 lineages (sequence types). Our results showed that this is a very dynamic resistome, showing high levels of gene flow within the global population of A. baumannii. Furthermore, many ARGs are exchanged with other distantly related bacterial pathogens.

Genomes and quality check
Genomes sequences were downloaded in early March 2020 from the RefSeq NCBI Reference Sequence Database. Only genomes with reliable quality genome assembly status were considered; thus, only genomes showing an assembly level of 'complete genome' , 'chromosome' and 'scaffold' were included in the final data set. We also used CheckM v1.0.11 [20] to evaluate the completeness and contamination of the genomes and only genomes showing 95 % (or higher) of completeness and equal or less than 5 % of contamination were considered for downstream analyses. We downloaded a total of 1554 but 82 genomes were discarded (these are listed in Table S3), as they did not pass the quality criteria or were not A. baumannii as per their ANI value (see below). These genomes come from 287 different Bioprojects (see Table S1). For consistency all the genomes were re-annotated with prokka v1.13 [21] and the final list of the genomes included is provided in Table S1. We also run a maximum-likelihood (ML) non-recombinant core phylogeny as we have done previously in a genomic epidemiology study of A. baumannii [22]. First, we defined the core genes and excluded all the core genes that had recombination signals (recombination was detected as stated below). A total of 50 non-recombinant genes were concatenated. On the concatenated alignment an ML phylogeny was run by means of RAxML v8.2.12 [23].

Average nucleotide identity analysis and ST assignation
To make sure that all genomes belong to A. baumannii and to evaluate how similar they were, we conducted an average nucleotide identity (ANI) analysis using OrthoANI v 0.9 [24]. Using pairwise comparison, every genome was compared to the rest of the genomes and to the type strain ATCC 19606 (Biosample accession number SAMN13045090). We kept only the isolates that had an ANI value of 95 % or higher versus the type strain. We downloaded the allelic variants for profiles of all the ST described thus far for both the Oxford and the Pasteur schemes from the PubMLST database [25], accessed in late

Impact Statement
Although Acinetobacter baumannii is a very important pathogen for which many genomes have been sequenced, there has been no effort made to understand the global resistome of this pathogen from a population genomics point of view. We conducted a population genomics analysis of almost 1500 A. baumannii genomes to analyse the transmission dynamics of the resistance genes in this species. Our data showed that gene flow and horizontal transfer have been the driving factors in the dissemination of AR genes in A. baumannii. There was substantial variation in resistance genes between and within the different lineages. Notably, many antibiotic resistance genes have been interchanged with other important pathogens, such as Klebsiella pneumoniae and Pseudomonas aeruginosa. Finally, our results implied that A. baumannii has an open resistome. Our study reveals the high mobilization of antibiotic resistance genes within the global population of A. baumannii and even with other common pathogens. Above all, this study embodies the idea of a resistome-centred genomic epidemiology of bacterial pathogens.
March 2020. Then, we used blastn (requiring 100 % identity) to assign an allelic profile to each genome considering both schemes. Given that the Oxford MLST scheme has some issues with paralogy in the locus gdhB, we processed the data as in [25] to discard the paralogous genes causing issues.

Antibiotic resistance gene prediction and pangenome analysis
We employed the CARD database [26], accessed in late March 2020, to infer the ARGs in all the A. baumannii genomes. We used the Resistance Gene Identifier tool from CARD and only 'Perfect' and 'Strict' cases were considered but we also required a ≥90 % coverage between the query and the target. Antibiotic drug classes were also determined through CARD. Then, the ARGs were assigned to the different types of genome categories as follows: core genome, genes in 100 % of the isolates; soft core genome, genes between 95 % and less than 100 % of the isolates; shell genome, genes in less than 95 % but greater than 15 % of the isolates; and cloud genomes, genes in less than 15 % of the isolates. The ARGs were grouped into ARG families using the CARD classification.

Recombination analysis, horizontal gene transfer with other species and accumulation curve analysis
Recombination analyses were run on all the ARG family groups that had at least three sequences and which shared at least 80 % of coverage among all the sequences. We used the PHIpack programme [27] that implements the 'Pairwise Homoplasy Index' test (PHI test) to detect recombination signals within the ARG families. To infer recent events of HGT between the ARGs in A. baumannii and other bacteria, we used blastn (with an e-value of 1e-50) to search these ARGs against the 'nt' database from the NCBI. Of note, as we wanted only very recent HGT events, we considered only those hits that were 100 % identical (both in the coverage and in the % of identity) to the query. As query could have identical hits from several species, we chose that species that had the highest number of identical hits. We used the Vegan R library version 2.5-7 [28] to run a rarefaction curve of the number of ARGs as a function of the number of genomes. We employed the 'specaccum' function for this, setting the method 'rarefraction' . To establish whether the resistome was closed or open, we used the Heaps' law model; to that end we used the 'heaps' function in micropan version 2.2 [29], setting 100 000 permutations. The visualization tool Circos [30] was employed to illustrate the HGT cases between the ARGs in A. baumannii and the other three major bacteria contributors.

The vast accessory resistome of A. baumannii
To have the most comprehensive picture of the resistome of A. baumannii, we included as many genomes as possible; however, we did pay attention not to include lower-quality data. Thus, only genomes showing a high-quality assembly were downloaded (see Methods). Furthermore, we included only complete and uncontaminated genomes (completeness ≥95 % and contamination ≤5 %, see Methods). We also corroborated that all the genomes belonged to A. baumannii via an ANI analysis; where only isolates having ≥95 % identity with the type strain ATCC19606 were considered. We kept a total of 1472 genomes for downstream analyses (listed in Table S1). This is a considerably extensive data set for this species: it covered isolates from 42 different countries (Fig. 1a), a period of time spanning 76 years (four decades without outliers), and 149 sequence types (Fig. 1b). Importantly, this data set not was not biassed towards very closely related lineages, as the median ANI value (against the type strain) was 97.88 and the third quartile of all the ANI values was 97.93. Then, we used the Comprehensive Antibiotic Resistance Database [26] to catalogue and quantify the ARGs in the 1472 genomes. We found that the average number of ARGs per genome was 29.38; the frequency distribution histogram of ARGs per genome is presented in Fig. 2a. The ARGs were classified into 199 ARG families and these covered a wide range of drug classes (see Table 1 and S2). We noted that no single group was present in all the genomes (see Table 1). Twelve ARG families were present in more than 95 % of the genomes. These gene families correspond to the RND efflux pumps, well-known intrinsic resistance genes, and were present in most countries and in almost all the STs (see Table 1). We also found that 26 ARG families (13 %) were present in less than 95 % of the genomes but in more than 15 % of the genomes (see Table 1). Fig. S1 gives the distribution of the ARG families (grouped by type of drug class) on the core phylogeny, showing that many ARG families have undergone non-vertical transmission. Remarkably, almost all of these 38 most frequent ARG families were affected by recombination or horizontal gene transfer (see Table 1). However, the vast majority of the ARG families (161 groups, 81 %) were contained in less than 15 % of the genomes (seeTable S2). These ARG families were present in few countries and a few STs (see Table S2). Taken together, these results show that a considerable amount of ARGs is found within these genomes. Nonetheless, these ARGs do not belong to the core genome and are not strictly vertically transmitted. Notably, most of them are just present in less than 15 % of the strains.

High gene flow within the species and HGT with other pathogens
The fact that most of the ARGs are just present in some of the genomes implies that ARGs are frequently lost and gained. In accordance with this, we recently showed that gene turnover was of paramount importance in a recently emerged lineage   Information for the most frequent ARGs is provided. These ARGs were present in more than 100 genomes. The number of genomes, countries and ST in which every ARG was present is reported. In bold are those ARGs that had recombination signals and the superscript H means that identical allelic variants were found in other bacteria. of this species [9]. To further explore this, we analyse ARG content variation within individual STs; we only considered those STs that had ten or more genomes per ST. This approach allowed us to analyse ARG variation over short timescales. If ARGs were to be only disseminated by clonal expansions (without HGT or gene loss), the amount of ARGs per genome would be the same for a given ST. Contrary to this, Fig. 2b clearly shows that there is a huge variation, not only within STs but also between STs (Kruskal-Wallis test, P-value<2.2e-16). While the ST that had most ARGs was ST229 with an average number of 32 ARGs per genome, the ST with the lowest mean number was ST447 (see Fig. 2b) with 20 ARGs per genome. We also conducted the same analysis but considering the Pasteur MLST scheme instead of the Oxford one (see Fig. S2); very similar patterns were observed, with a large variation both within and between STs (Kruskal-Wallis test, P-value<2.0e-16). This high variation in ARG content within STs implies that, even over short timescales, acquisitions and losses of ARGs are very common. Then we went on to look if any of these ARG families have been horizontally transferred very recently (see Methods). We noted that 78 ARG families (39%) had identical allelic variants in other bacteria (see Tables 1 and S2).
Notably, most the HGT events were located in other nosocomial pathogens (see Table S2, Fig. 3a); Klebsiella pneumoniae and Pseudomonas aeruginosa are two of the most striking cases. Collectively, these results show that HGT (and gene loss) are of paramount importance not only within the species but also with other (nosocomial) bacterial species.

An open resistome partially structured by the phylogeny
Although ARGs are subject to loss and gain processes, the presence of the ARGs could be structured by the phylogeny. This is because HGT between closely related lineages is likely to be more successful and also the more distantly related two lineages are, the higher the likelihood of gene loss. Thus, to explore this, we conducted pairwise comparisons of all the isolates correlating how similar they were in terms of their ARGs versus their ANI values. We used ANI as a proxy for the phylogenetic relationship of the isolates. To establish how similar the ARG profiles between the isolates were, we used the Jaccard index. A perfect correlation would imply that the gains and losses of ARGs are shaped by phylogeny. Although not very strong, Spearman's correlation was significant (r 2 =0.57, P-value<2.2e-16). Thus, there is a correlation between ANI values and similarity in the ARG profiles, which implied that ARG content variation correlated with the phylogeny to some extent.
Finally, we wanted to establish if we were able to sample the whole diversity of ARGs in this species. Of note, given the very extensive nature of this data set in temporal and geographical terms, we would expect so. We ran an accumulation curve analysis to evaluate this (see Methods and Fig. 3b). This analysis showed that we have sampled much of the ARG family diversity. However, the curve did not level off and the slope is still very steep in the last section of the curve. Thus, the discovery rate of ARGs is still high after almost 1500 genomes and a lot more ARGs remain to be found.
To be completely sure about the openness of the resistome, we determine the value of the α parameter considering a Heaps' law model. Under this model an α value equal or lower than 1 denotes that the data are compatible with an open resistome; the value of the α parameter was 0.999 consistent with an open resistome. Taken together, these results imply that although gene flow is common within ARGs, ARG content variation within the isolates is structured to some extent by the phylogeny. Furthermore, this seems to be an open resistome.

DISCUSSION
Metagenomics studies over the last decade have been of paramount importance to characterize the resistome dynamics in different environments. However, reliable population genomic studies are still very much required to properly describe the transmission dynamics of ARGs both within and between bacterial species. A clear understanding of these transmission dynamics is essential for the effective use of antibiotics. Notably, this will provide useful knowledge to inform public health actions and infection control teams to establish better treatment options. However, there has not been a single study analysing the whole resistome of A. baumannii on a global scale. Here we gathered a very extensive -in terms of geographic and temporal components -dataset for this important nosocomial pathogen. Our analyses underscore a highly mobile and even promiscuous resistome. We found a considerable variation in ARGs profiles among the different lineages (STs). There are at least two contrasting forces that affects the distribution of the ARGs in A. baumannii populations. On the one hand, because most of these isolates are within clinical settings (under constant use of antibiotics) this will select for the presence of (at least some) ARGs. On the other hand, A. baumannii has defence mechanisms, for instance CRISPR/Cas systems or restrictionmodification systems, that can impair the introduction of the mobile genetic elements harbouring the ARGs [10,11]. In this regard, a recent study has shown that a group of A. baumannii having functional CRISPR/Cas systems have considerably fewer plasmids [10].
From a clinical standpoint, it is worth paying attention to the fact that many of the ARGs exchanged confer resistance to critically important antimicrobials for human health. For example, we found several ARGs conferring resistance against Aminoglycosides, which are frequently used to treat aerobic, Gram-negative infections in many parts of the world. We also detected transferable ARGs against Cephalosporins, employed to treat a wide variety of infections from both Gram-negative and Gram-positive bacteria. In connection with the previous point, our results have clear implications for the treatment against A. baumannii infections. For instance, we noted that several oxacillinases (OXAs), which confer resistance to Carbapenems (one of the last resort options to treat A. baumannii infections), were found in many STs and were even found in other pathogens. In this respect, it could happen that within the same locale (hospital) one would find several lineages of A. baumannii, some resistant and some susceptible. In this scenario, the resistant lineages can transfer the genetic resistance mechanism to the susceptible lineages; thus, rendering ineffective the potential antibiotic treatment. Importantly, this would not be limited just to lineages from A. baumannii but it could involve other pathogens, such as K. pneumoniae. Of note, the co-existence of several A. baumannii in a hospital is not unlikely. We have recently shown that different lineages of A. baumannii can co-exist in individual hospital settings [22,31].
In some important bacterial pathogens, such as K. pneumoniae, MDR isolates (showing many ARGs) are frequently found in high-risk clones. Contrary to that, we found that in A. baumannii ARGs are well dispersed among the global population and clonal expansion does not seem to be the major force dispersing the ARGs. In this regard, previous studies have shown that gene-content variation is of paramount importance for this pathogen [9]. Importantly, even very recently emerged ARGs seem to have experienced HGT between different lineages of this species [16]. One of the most relevant findings of our study, from a clinical microbiology point of view, is that A. baumannii readily exchanges ARGs with other highly critical pathogens such as K. pneumoniae or P. aeruginosa. In this respect, a recent study also found many instances of HGT across MDR bacteria from different genera in a single hospital [32]. Hence, infection control and detection strategies considering this pathogen should be focused on the ARGs, rather than on particular lineages, to prevent the transmission of such genes. We think that the risk of ARGs across important nosocomial pathogens should be considered of paramount importance not only from a microbial genomics point of view but also from an infection control perspective. Finally, and related to the high amount of HGT with other pathogens, we observed that, notwithstanding the vast data set we used, our accumulation curve analysis implies that we did not sample the full diversity of ARGs within A. baumannii. Therefore, very likely this species has an open resistome; given the tendency of this species to acquire ARGs from other bacteria, an open resistome is not actually unexpected. Considering this, we have previously shown that this species has a very dynamic genome that easily acquires and losses genes [9], and where MLST schemes do not seem to work properly [33].
Our study has several limitations. First, it is not properly global, as we did not sample evenly the different regions of the world. Thus, our findings should be taken with caution. However, our study is by far one of the most extensive studies of the resistome of this species, not only geographically and temporally but also in terms of lineage diversity. Another limitation is that we underestimated the rate of HGT with other bacteria. This is because we only considered very recent HGT events, i.e. identical gene sequences in different bacteria. Clearly, the rate of HGT is bound to be considerably higher, if HGT events of different ages are to be included in the estimation. Another limitation of our study is that our analyses were based on a genetic definition of antibiotic resistance [34]. We acknowledged that further studies considering also microbiological and clinical definitions of antibiotic resistance are required to fully grasp the complex issue of antimicrobial drug resistance. Finally, future resistome studies might consider including not only A. baumannii but the whole set of closely related species conforming the Acinetobacter calcoaceticus-Acinetobacter baumannii complex (ACB complex). This is because this complex shows convoluted evolutionary relationships often prone to taxonomic misclassification of isolates [35]. Furthermore, several ARGs first described in A. baumannii have also been found in other species from the ACB complex. Thus, the inclusion of the different species within the ACB complex would allow for a more extensive portrayal of the underlaying population structure where the ARGs reside in.
Understanding the population dynamics of the ARGs is crucial for their tracking and surveillance, not only in the clinic but also in non-clinical settings [36]. In this regard, we recently show that environmental isolates of A. baumannii can be an important source of ARGs [37]. Without a doubt, the understanding of the transmission dynamics of ARGs can be used to implement ad hoc infection control actions within different health systems all over the world. Thus, similar studies are required for many other members of the ESKAPE group, or any other relevant human pathogen for that matter, to properly tackle the major problem of antimicrobial drug resistance.