CRISPR-based subtyping to track the evolutionary history of a global clone of Acinetobacter baumannii

Acinetobacter baumannii global clone 1 (GC1) is the second most common clone in the global population of A. baumannii isolates and a key cause of hospital-acquired infections. In this study, comparative analysis of the clustered regularly interspaced short palindromic repeats (CRISPR)-based sequence types (CST) was performed to determine the genetic relatedness and track patterns of descent among 187 GC1 isolates, as a complement to the evolutionary inferences from their multilocus sequence types and genome-wide single nucleotide polymorphism (SNP)-based phylogeny. The CST2 cluster, CST2 and all the CSTs descending from CST2, corresponded to GC1 lineage 1. This cluster included 143 of the 187 isolates showing a prevalent geographical distribution worldwide. A well-demarcated group of 13 CSTs, accounting for 33 of the 187 isolates, corresponded to GC1 lineage 2. All the CSTs of this group were characterized by the absence of spacer Ab-18. Many of the GC1 lineage 2 isolates had an epidemiological link to the Middle East and/or were obtained in military healthcare facilities. GC1 lineage 3 was a novel lineage that has so far been limited to Afghanistan, Pakistan and India. Diversification of A. baumannii GC1 into lineages and clades has probably been related to a dynamic expansion after passing a migration bottleneck to enter the hospital environment. We conclude that CRISPR-based subtyping is a convenient method to trace the evolutionary history of particular bacterial clones, such as A. baumannii GC1.


Introduction
Acinetobacter baumannii is a glucose-non-fermenting Gram-negative bacterium that has been widely implicated in hospital-acquired infections (Nowak and Paluchowska, 2016). Although it has been found in a variety of environmental samples, the natural habitat of A. baumannii is still not known. This opportunistic pathogen has remarkable abilities to endure desiccation and starvation, acquire resistance to different classes of antibiotics, and disseminate in and between medical facilities (Chapartegui-González et al., 2018; Hamidian and Nigro, 2019). Worryingly, carbapenem-resistant A. baumannii has recently been labelled as "Priority 1: CRITICAL" in the World Health Organization list of antimicrobial-resistant bacteria to guide research and development of new effective antibiotic treatments (Tacconelli et al., 2018). Extensive use of ventilators during the recent COVID-19 pandemic has increased the risk of hospital-acquired ventilator-associated secondary infections, for which A. baumannii has been a main cause (Lescure et al., 2020; Chen et al., 2020).
The exposed global population of A. baumannii, largely biased by clinical isolates, is dominated by a few highly successful clones, including global clones (GC) 1 and 2, also known as international clones I and II (Hamidian and Nigro, 2019; Higgins et al., 2017). GC1 is currently the second most common A. baumannii clone, with a widespread geographical distribution in more than 30 countries (Karah et al., 2012). Strains belonging to GC1 have shown a steady increase in the rates and ranges of their antimicrobial resistance over the past five decades (Holt et al., 2016). The oldest known GC1 isolate, HK302, was collected in 1977 in Switzerland (Krizova and Nemec, 2010). HK302 was multidrug-resistant, but susceptible to imipenem, and was associated with an outbreak of nosocomial infections (Devaud et al., 1982). Consistently, time-stamped phylogenetic analysis of the whole genomes of 44 GC1 strains estimated that the most recent common ancestor of GC1 emerged around 1960 and then diverged into two phylogenetically distinct lineages (Holt et al., 2016). The two GC1 lineages, L1 and L2, have since diversified into multiple clades, accumulating different resistance determinants through the acquisition of plasmids and transposons and/or chromosomal mutations (Douraghi et al., 2020; Hamidian et al., 2019).
According to the Pasteur scheme for multilocus sequence typing (MLST), GC1 corresponds to clonal complex (CC) 1 (Diancourt et al., 2010). CC1 was initially composed of only five sequence types (STs), namely ST1, ST7, ST8, ST19 and ST20. The neat demarcation of these five STs from other non-CC1 STs endorsed the potential of this scheme to identify isolates belonging to CC1, and subsequently to GC1 (Diancourt et al., 2010). In addition, several programs to extract the MLST allelic profiles from assembled bacterial genomes are openly accessible online, providing a valuable tool for rapid assignment of STs to isolates subjected to whole genome sequencing (Larsen et al., 2012).
We have previously reported that CRISPR-cas subtype I-Fb, standing for clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (cas) genes of the subtype I-Fb, is a conserved genetic element in the genome of GC1 (Karah et al., 2015). The occurrence of CRISPR-cas subtype I-Fb at the same locus on the chromosome of 106 GC1 isolates confirmed our results and indicated that this element was most likely acquired by a common ancestor of GC1 before diversification into intra-clonal lineages (Alvarez et al., 2020). The CRISPR-cas subtype I-Fb locus, located at position 1,057,691 to 1,069,768 of the genome of A. baumannii strain AYE (GenBank accession number: CU459141), included six genes, encoding the Cas machinery, and an array of spacers (Karah et al., 2015). Upon the entry of a foreign element, the Cas machinery takes up a short sequence, called a proto-spacer, from the invasive DNA and integrates it into the leading end of the array, where the adjacent direct repeat is duplicated and the integrated sequence becomes a new spacer flanked by two direct repeats (Barrangou and Marraffini, 2014). Interestingly, the spacer denoted Ab-1 was present at the trailer end of the CRISPR arrays in all the GC1 isolates, suggesting it as a genomic indicator of GC1 (Karah et al., 2015). However, Ab-1 was also present in a few non-GC1 isolates, such as NIPH 201 (APQV00000000.1) and Naval-82 (AMSW00000000.1), belonging to ST38 and ST428, respectively.
CRISPR-Cas systems were suggested to have an important role in the genomic changes of the Acinetobacter genus, particularly in controlling the transfer of conjugative elements (Touchon et al., 2014; Mangas et al., 2019). Due to their dynamic nature, comparative analysis of the CRISPR arrays of spacers provided a valuable secondary technique for subtyping isolates belonging to particular clones of A. baumannii, including GC1 (Karah et al., 2015). For instance, analysis of a few spacers at the leading end of the CRISPR arrays was very informative in revealing diversity among local GC1 isolates (Hauck et al., 2012). A close clonal relationship was detected between twelve ST409 isolates collected in Greece between 2018 and 2019 and two ST1 isolates collected in Norway in 2011 and 2013 (Galani et al., 2020). Interestingly, one of these two Norwegian isolates had a history of import from Greece (Karah et al., 2015). In this study, the CRISPR-based subtyping approach was used to investigate the genetic relatedness among a collection of 187 clinical isolates belonging to A. baumannii GC1, and to provide new insights into the evolutionary path of this eminent clone.

In-silico plot to search for A. baumannii GC1 genomes
The Nucleotide Basic Local Alignment Search Tool (BLASTn) algorithm (https://blast.ncbi.nlm.nih.gov/Blast.cgi; Altschul et al., 1990) was used to screen for A. baumannii genomes carrying Ab-1 at the trailer end of their CRISPR arrays of spacers. A query of 88 bp, including the nucleotide sequence of spacer Ab-1 and the two surrounding direct repeats, was used to search for similarities against the "RefSeq Genome Database (refseq_genomes)" database. "Acinetobacter baumannii (taxid:470)" was used as the target organism. The BLASTn search was run under default parameters except for using 500, instead of 100, as the "max target sequences". Epidemiological data (year of isolation, country of isolation, and type of sample) were retrieved, from the online records or relevant literature, for all the hits identified by BLASTn. The nucleotide sequences of the corresponding whole genomes (complete or contigs) were downloaded to a local drive as FASTA files.
The online service "Sequence query - Acinetobacter baumannii MLST (Pasteur)", hosted at the Acinetobacter baumannii MLST website (https://pubmlst.org/abaumannii/), was used to determine the ST of the isolates according to the Institut Pasteur's MLST scheme (Diancourt et al., 2010). goeBURST and PHYLOViZ were used to generate and visualize a minimum spanning tree based on the allelic profiles of the whole A. baumannii MLST dataset (Francisco et al., 2009; Francisco et al., 2012). The n Locus Variant (nLV) Graph service was applied to display all possible links. We used one LV as the maximum number of differences between nodes (Ribeiro-Gonçalves et al., 2016). Accordingly, STs were grouped into CCs if they shared 6/7 of their MLST alleles with at least one other ST in the group (Feil et al., 2004).
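The grouping rule above (an ST joins a CC when it shares 6 of 7 alleles with at least one member) can be illustrated with a minimal Python sketch. This is not the goeBURST implementation; it is a plain single-linkage grouping written for illustration, and the ST labels and allelic profiles are hypothetical placeholders rather than entries from the MLST database.

```python
from itertools import combinations


def locus_differences(profile_a, profile_b):
    """Count the loci at which two allelic profiles carry different alleles."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def group_into_complexes(profiles, max_diff=1):
    """Single-linkage grouping: two STs end up in the same group when they
    are connected by a chain of pairs differing at no more than `max_diff` loci."""
    parent = {st: st for st in profiles}

    def find(st):
        # Follow parent links to the group representative (with path halving).
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    for st_a, st_b in combinations(profiles, 2):
        if locus_differences(profiles[st_a], profiles[st_b]) <= max_diff:
            parent[find(st_a)] = find(st_b)

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), []).append(st)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical seven-locus profiles (illustration only).
toy_profiles = {
    "ST_A": (1, 1, 1, 1, 5, 1, 1),
    "ST_B": (1, 1, 1, 1, 5, 1, 2),  # single-locus variant of ST_A
    "ST_C": (1, 1, 2, 1, 5, 1, 2),  # single-locus variant of ST_B only
    "ST_D": (3, 3, 2, 2, 3, 1, 3),  # unrelated profile
}
complexes = group_into_complexes(toy_profiles)
print(complexes)
```

In this toy input, ST_C differs from ST_A at two loci but is still placed in the same group because it is a single-locus variant of ST_B, which is how chains of single-locus variants can join otherwise distinct complexes into one large complex.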

Single-nucleotide polymorphism (SNP)-based phylogenetic analysis
CSI Phylogeny 1.4 (Call SNPs and Infer Phylogeny) was used to generate a genome-wide SNP-based phylogenetic tree for all the GC1 isolates (Kaas et al., 2014). The SNPs were called, filtered, site validated, concatenated, and aligned following the default parameters of the web-based service (https://cge.cbs.dtu.dk/services/CSIPhylogeny/). The genomic sequence of strain DSM30011, isolated before 1944, was used as a reference genome (GenBank accession no. JJOC02000000; Repizo et al., 2017). The reference genome was excluded from the phylogenetic tree. FigTree (http://tree.bio.ed.ac.uk/software/figtree/) was used to provide a high-quality graphical view of the generated phylogenetic tree.

CRISPR-based subtyping
The CRISPRCasFinder platform was used to detect and retrieve the nucleotide sequence of the CRISPR arrays of spacers (https://crisprcas.i2bc.paris-saclay.fr/CrisprCasFinder/Index; Couvin et al., 2018). Each spacer with a newly defined sequence was assigned a new consecutive number, and each array with a newly defined assortment of spacers was given a new CRISPR-based sequence type (CST), as previously described (Karah et al., 2015). Then, a binary file was manually created to visualize and compare the presence (red rectangle) or absence (empty) of spacers in each CST and to trace the ancestry of CRISPR arrays in GC1.
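The numbering scheme described above can be sketched in Python as follows. The study built its binary file manually, so this is only an illustrative automation under stated assumptions: the function names are hypothetical, the spacer "sequences" are placeholder strings rather than real nucleotide sequences, and numbering starts at 1 instead of continuing the published Ab and CST numbering.

```python
def assign_spacer_numbers(arrays, spacer_ids=None):
    """Give each previously unseen spacer sequence the next consecutive number."""
    spacer_ids = dict(spacer_ids or {})
    next_id = max(spacer_ids.values(), default=0) + 1
    numbered = {}
    for isolate, spacers in arrays.items():
        ids = []
        for seq in spacers:
            if seq not in spacer_ids:
                spacer_ids[seq] = next_id
                next_id += 1
            ids.append(spacer_ids[seq])
        numbered[isolate] = tuple(ids)
    return numbered, spacer_ids


def assign_csts(numbered, cst_ids=None):
    """Give each previously unseen assortment of spacers the next CST number."""
    cst_ids = dict(cst_ids or {})
    next_cst = max(cst_ids.values(), default=0) + 1
    isolate_cst = {}
    for isolate, array in numbered.items():
        if array not in cst_ids:
            cst_ids[array] = next_cst
            next_cst += 1
        isolate_cst[isolate] = cst_ids[array]
    return isolate_cst, cst_ids


def presence_matrix(cst_ids):
    """Binary presence (1) / absence (0) of every spacer in every CST."""
    columns = sorted({spacer for array in cst_ids for spacer in array})
    rows = {cst: [int(s in array) for s in columns] for array, cst in cst_ids.items()}
    return columns, rows


# Placeholder spacer strings for three hypothetical isolates (illustration only).
toy_arrays = {
    "isolate_1": ["toy_seq_a", "toy_seq_b", "toy_seq_c"],
    "isolate_2": ["toy_seq_a", "toy_seq_b", "toy_seq_c"],
    "isolate_3": ["toy_seq_a", "toy_seq_c", "toy_seq_d"],
}
numbered, spacer_ids = assign_spacer_numbers(toy_arrays)
isolate_cst, cst_ids = assign_csts(numbered)
columns, rows = presence_matrix(cst_ids)
print(isolate_cst)
print(columns, rows)
```

Passing existing `spacer_ids` and `cst_ids` dictionaries to the two functions would let new isolates extend an established numbering rather than restart it.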

Identification of A. baumannii GC1 genomes
Our BLASTn search (as of June 11, 2020) yielded a total of 260 hits with query coverage of 100% and nucleotide identity of 94.62% to 100% (Supplementary Table S1). Two hits represented a duplicate of the same CRISPR array on two different contigs (NZ_ASFN01000020.1 and NZ_ASFN01000054.1) in the genomic record of isolate TG22148 (Supplementary File S1). One of these two hits was omitted. Seven additional hits were excluded since they represented a repetitive genomic record of the same isolate (for instance, both JABU00000000.1 and JPHW00000000.1 corresponded to isolate MRSN 58). On the other hand, isolates recovered from the same patient or obtained during an outbreak caused by one strain were not excluded.
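The exclusion of repeated hits can be sketched as a simple one-record-per-isolate filter. The accessions and isolate names below are the two examples given in the text; the record layout and the choice to keep the first hit encountered are assumptions made for illustration, since the text does not state which of the duplicate records was retained.

```python
def one_record_per_isolate(hits):
    """Keep the first hit seen for each isolate name and drop later repeats."""
    kept = {}
    for hit in hits:
        kept.setdefault(hit["isolate"], hit)
    return list(kept.values())


# Hypothetical record layout; accessions and isolate names are from the text.
hits = [
    {"isolate": "TG22148", "accession": "NZ_ASFN01000020.1"},
    {"isolate": "TG22148", "accession": "NZ_ASFN01000054.1"},
    {"isolate": "MRSN 58", "accession": "JABU00000000.1"},
    {"isolate": "MRSN 58", "accession": "JPHW00000000.1"},
]
unique_hits = one_record_per_isolate(hits)
print([hit["accession"] for hit in unique_hits])
```

Because the filter keys on the isolate name, distinct isolates from the same patient or the same outbreak are retained, in keeping with the inclusion rule stated above.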

Updates on CC1
CC1 is currently part of a large complex consisting of several other CCs and hundreds of STs (Fig. 1B). Although we could not spot a strong founder for the large complex, clonal expansion of an overlooked ST is still the rational theory behind the formation of this complex (Feil et al., 2004). Alternatively, the occurrence of STs that might accidentally link different CCs is probable due to the large size of the MLST dataset. In order to make an outline for CC1, we had to make artificial borders and break the ties between ST94/ST325, ST94/ST495, and ST174/ST325 (Fig. 1B). Accordingly, ST325 and ST495 and their subsequent connections were subjectively considered to be outside CC1. Based on this outline, CC1 has so far included a total of 70 STs (Supplementary Table S2). As expected, ST1 has retained a robust central position, supporting the proposal that CC1 has emerged as a clonal expansion of ST1 (Diancourt et al., 2010). In other words, the most recent common ancestor of CC1, corresponding to GC1, most likely belonged to ST1 (Feil et al., 2004). Gradually, allelic changes have accumulated and descendants with new STs have emerged. Then, some of the descendant STs, such as ST20, ST81, ST19, ST493, and ST594, have become the founders of additional expansions and diversifications in CC1 (Fig. 1B).

Genome-wide SNP-based phylogenetic analysis
The phylogenetic tree of GC1 demonstrated the emergence of around ten clades and a few subclades of isolates (Supplementary Fig. S1). Isolates belonging to lineage 1, according to previous studies (Alvarez et al., 2020; Hamidian et al., 2019; Holt et al., 2016), were scattered among several clades, making it difficult to delineate this lineage. Most of these isolates were supported by short internal branches. On the other hand, isolates from lineage 2, according to Douraghi et al., 2020 and Hamidian et al., 2019, were all assembled on a well-delineated bush characterized by relatively long branches. Although the branch length in our tree was not proportional to time and only reflected the amount of evolutionary divergence (the number of nucleotide substitutions) that has occurred along that branch, we could infer that lineage 1 was older than lineage 2, as reported by other studies (Holt et al., 2016). Our analysis did not exclude SNPs found in the accessory genomes nor those probably acquired via a homologous recombination event. Yet, our results were largely consistent with previous phylogenetic studies where strict parameters were used (Hamidian et al., 2019; Alvarez et al., 2020).

CRISPR-based subtyping of GC1
Full sequences were available for the CRISPR arrays of only 187 of the 209 GC1 isolates. Different compositions of the arrays enabled us to assign these 187 GC1 isolates into 45 CSTs, including 35 novel CSTs (Supplementary Tables S1, S3, and S4). The novel CSTs, CST41 to CST75, were designated according to the current numbering system for CSTs in A. baumannii (Karah et al., 2015). The arrays mainly differed by the acquisition of new spacers at the leading ends and/or by internal duplications or deletions of vertically inherited spacers, as described in other bacterial species (Kupczok et al., 2015). The arrays ranged in size between 15 spacers (CST66) and 110 spacers (CST63). CST1 was the most common subtype, with 65 isolates recovered between 2002 and 2016 from 13 countries in North America, Europe, Asia, Africa and Australia (Supplementary Table S1). The year and/or country of isolation were not available for 4 isolates.
Twenty-one isolates belonged to CST2, the second main subtype. CST2 was collected between 1984 and 2019 from different parts of the world, with the exception of Australia. Importantly, deletion of spacer Ab-40 was the only difference between CST2 and CST1. Each of the remaining CSTs was composed of 1 to 15 isolates. However, the size of some subtypes (for example, CST8 and CST13) was augmented by the presence of epidemiologically related isolates (Higgins et al., 2016; Lesho et al., 2013). We could not make a definite assignment for the remaining 22/209 GC1 isolates because the sequence of their arrays of spacers was divided over ≥2 contigs and some parts were missing or overlapping (Supplementary Table S1). Nonetheless, potential relationships were inferred from the available sequences. For example, isolates MRSN960 (NZ_VHDR01000016.1) and MRSN489678 (NZ_VHEN01000006.1) would perfectly fit into CST2 and CST65, respectively. Isolates ACB5 (NZ_OEON01000001.1) and aba_5m (NZ_CABEFJ010000030.1) belonged to a novel CST, which would be distinguished from CST61 by the deletion of 3 consecutive spacers (Supplementary File S3).
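Comparisons of this kind, in which two subtypes differ by the loss of one spacer or of a run of consecutive spacers, can be sketched in Python. The function names and the toy spacer labels are hypothetical and do not reproduce the actual composition of any CST; the sketch also assumes that each spacer occurs only once per array, which does not hold for arrays carrying internal duplications.

```python
def compare_arrays(reference, other):
    """Return (lost, gained): spacers missing from, or new in, `other`
    relative to `reference`, each in array order."""
    reference_set, other_set = set(reference), set(other)
    lost = [spacer for spacer in reference if spacer not in other_set]
    gained = [spacer for spacer in other if spacer not in reference_set]
    return lost, gained


def is_single_block(reference, lost):
    """True when the lost spacers occupy consecutive positions in `reference`,
    as expected for a single deletion event."""
    if not lost:
        return False
    positions = [reference.index(spacer) for spacer in lost]
    return positions == list(range(positions[0], positions[0] + len(positions)))


# Toy arrays with placeholder labels (illustration only).
toy_reference = ["sp1", "sp2", "sp3", "sp4", "sp5", "sp6", "sp7"]
toy_variant = ["sp1", "sp2", "sp6", "sp7", "sp9"]

lost, gained = compare_arrays(toy_reference, toy_variant)
print(lost, gained, is_single_block(toy_reference, lost))
```

A variant whose lost spacers form one block and that has gained nothing is a candidate direct descendant of the reference by a single deletion, which is the kind of relationship inferred here between subtypes.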

Post-migration diversification
The common ancestor of CRISPR arrays in GC1, designated CST0 GC1, consisted of 54 spacers (Ab-1 to Ab-54), as shown in Table S3. We propose that the most recent common ancestor of GC1 entered the hospital ecosystem at this point. It is tempting to propose that this entry happened on only one occasion and that GC1 has since then been persisting in the hospital environment. Prior to this point, this ancestor was challenged by foreign DNA (plasmids or viruses) coexisting in the same unknown pre-hospital niche. Spacers Ab-55 to Ab-106 and Ab-877 to Ab-1024 were acquired after this point and could accordingly provide insights into the pool(s) of DNA inhabiting the hospital environment. CST0 GC1, and correspondingly the most recent common ancestor of GC1, has been evolving through two major pathways, designated 1 and 2 (Fig. 2). Deletion of Ab-15 and Ab-16 and deletion of Ab-18 are the genetic markers of pathways 1 and 2, respectively. A. baumannii isolate AB307-0294 (CP001172.2) has been standing alone with a secluded subtype (CST10) characterized by the deletion of Ab-23 to Ab-31. The exceptional position of AB307-0294 was also reported by other studies (Holt et al., 2016).
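The marker deletions described above lend themselves to a simple rule-based check, sketched here in Python. The function name is hypothetical, the toy arrays are simplified (the ancestral set minus the named deletions only, not the full composition of any real CST), and returning "undetermined" when neither or both markers are missing is an assumption made for this sketch.

```python
# Ancestral spacer set as stated in the text: Ab-1 to Ab-54.
ANCESTRAL = [f"Ab-{i}" for i in range(1, 55)]


def classify_pathway(spacers):
    """Assign an array to pathway 1 or 2 from the marker deletions named in the text."""
    present = set(spacers)
    lost_15_16 = "Ab-15" not in present and "Ab-16" not in present
    lost_18 = "Ab-18" not in present
    if lost_15_16 and not lost_18:
        return "pathway 1"
    if lost_18 and not lost_15_16:
        return "pathway 2"
    # Assumption of this sketch: anything else is left unassigned.
    return "undetermined"


def without(spacers, removed):
    """Return a copy of `spacers` lacking the spacers listed in `removed`."""
    removed = set(removed)
    return [spacer for spacer in spacers if spacer not in removed]


# Simplified toy arrays (illustration only).
toy_pathway_1 = without(ANCESTRAL, ["Ab-15", "Ab-16"])
toy_pathway_2 = without(ANCESTRAL, ["Ab-18"])
toy_secluded = without(ANCESTRAL, [f"Ab-{i}" for i in range(23, 32)])  # Ab-23 to Ab-31

for array in (toy_pathway_1, toy_pathway_2, toy_secluded):
    print(classify_pathway(array))
```

The third toy array mirrors the deletion of Ab-23 to Ab-31 reported for the secluded subtype and falls outside both pathways, consistent with its standalone position.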
In order to survive in and between hospitals and succeed in infecting and spreading among patients, bacteria need to overcome a number of stressful conditions caused by the regular use of disinfectants, human immune mechanisms, and antibiotic treatments. Going through such harsh selective pressures, which can be described as a migration bottleneck, has shaped the clinical population of A. baumannii. The A. baumannii clinical isolates appear to have a constrained diversity in comparison with their environmental counterparts, e.g. isolates from soil (Furlan et al., 2018). To our knowledge, only one isolate from GC1 has so far been reported from a hospital-unrelated environmental sample (Rafei et al., 2015). The whole genome of this isolate is being sequenced (according to personal communication with Dr. Rayane Rafei).
Once the bottleneck is passed, the newcomers gain a growth advantage, which could explain the predominance of a few clones, such as GC1, in the current population of healthcare-associated A. baumannii (Karah et al., 2012). Further adaptability to the new environment has probably been a key factor behind the persistence, expansion, and widespread dissemination of GC1, as reported for other bacterial species (Martínez and Baquero, 2002). The post-migration GC1 lineages and clades are expected to be very homogeneous in their core genomes (Antunes et al., 2014). Intra-clonal genomic deviations will be mainly related to genetic determinants providing better adaptation to the hospital environment, including the accumulation of horizontally acquired antimicrobial resistance genes or having distinct outer surface molecules (Holt et al., 2016).

GC1 lineage 1
The main branch in pathway 1 corresponded to GC1 lineage 1 (Holt et al., 2016). The most recent common ancestor of this lineage belonged to CST2 (Fig. 2), descending from CST0 GC1 by the acquisition of spacer Ab-55 and the occurrence of a single nucleotide polymorphism (C to A substitution) in the direct repeat separating spacers Ab-22 and Ab-23 (Supplementary File S4). CST2 has so far been identified in the Netherlands 1984, Czech Republic 1994, Greece 2002, USA 2008-2010, Spain 2010, Honduras 2012 and 2016, Turkey 2013, Pakistan 2015, Iraq 2016, South Africa 2017, and India 2019, indicating that this antecedent subtype is still alive. Comparison of the year of isolation among the earliest isolates in GC1 lineage 1 was consistent with our proposal that CST2 was born first. In agreement, most of the CST2 isolates had very short internal branches and were mainly positioned at the base of the SNP-based phylogenetic tree (Supplementary Fig. S1).
Although CST2 has subsequently evolved toward a variety of CSTs, the emergence of CST1 was probably the most significant step in the dissemination of GC1 lineage 1. CST1 was also the founding node of several following subtypes (CST4, CST6, CST67, CST68, CST71, CST73, CST43, CST8, and CST44).We have previously reported that several isolates from CST1 were obtained or had a history of import from Iraq (Karah et al., 2015).Importantly, CST1, CST4 and CST67 have been involved in the dissemination of carbapenem-resistant isolates in India and Tanzania (Jones et al., 2014;Kumburu et al., 2019).The dissemination of carbapenem-resistant isolates belonging to CST8 has recently been reported in Greek hospitals (Galani et al., 2020).

GC1 lineage 3
The second main branch in pathway 1 was characterized by the acquisition of Ab-935 to Ab-940. Subtypes CST57, CST58, CST59, CST63, CST60, and CST66 have descended from this branch, hereby designated as GC1 lineage 3. GC1 lineage 3 included a total of eight isolates obtained between 2013 and 2019 from Afghanistan, Pakistan (Karah et al., 2020), and India. Isolates belonging to this lineage formed a demarcated cluster with long internal branches and an outstanding position in the genome-wide SNP-based phylogenetic tree (Fig. S1). Further analysis is needed to confirm the identity and characterize the phenotypic and genotypic features of this lineage. A number of phage DNA elements were detected as a potential source of the proto-spacers for Ab-935 to Ab-940 (Supplementary Table S5). However, we were not able to infer whether these spacers were acquired due to multiple independent interactions with several phage or plasmid DNA molecules or following a single contact with some yet unknown phage/plasmid.

GC1 lineage 2
GC1 lineage 2 had a total of 35 isolates showing a widespread geographical dissemination between the United States (2003-2012), Germany (2003-2013), Australia (2008), Norway (2009-2010), the United Kingdom (2011), Iran (2012-2013), Ukraine (2014-2016), and Georgia (2018). Many of these isolates were obtained from military healthcare facilities and others had a history of import from Iraq or Afghanistan (Chan et al., 2015; Douraghi et al., 2020; Farlow et al., 2019; Huang et al., 2010; Kovalchuk et al., 2018; Lesho et al., 2013). In accord, three isolates obtained from one military hospital in France in 2009 were also linked to CST13 (Hauck et al., 2012; Karah et al., 2015). However, we could not make a precise assignment for the French isolates since only the leader end of the CRISPR arrays was available for comparison. Two of the CST13 and ST94 isolates were collected in Norway with a probable import from India (Karah et al., 2015). There were no epidemiological data to link the last two isolates with military hospitals or war zones (Karah et al., 2011).
The isolates from Iran belonged to CST50 (n = 7) and CST51 (n = 2) and were all assigned to ST328 (Douraghi et al., 2020). Both CST50 and CST51 descended from CST13 according to our CST-based analysis (Fig. 2). However, the SNP-based phylogenetic tree demonstrated that the Iranian isolates were more closely related to the isolates of CST52, CST53, and CST54 than to the CST13 isolates (Fig. S1). The MLST results also demonstrated that ST328 was more closely related to ST81 (1 locus variant) than to ST19, ST94, or ST315 (≥ 2 locus variants). In fact, the minimum spanning tree proposed that ST328 evolved from ST81 (Fig. 1B).
Searching for other isolates that might belong to GC1 lineage 2, we found a few articles demonstrating the occurrence of ST19 in Bulgaria in 2002 (from a military medical center), Saudi Arabia in 2006, and Croatia in 2009 (Aly et al., 2016; Diancourt et al., 2010; Dobrewski et al., 2006; Vranić-Ladavac et al., 2014). ST315 was reported in Belgium in 2010 (De Vos et al., 2016). The ST315 isolate had a history of import from Tunisia and was collected at a military hospital in Brussels. The occurrence of ST94 was also reported in the Kurdistan region of Iraq in 2012 (Ganjo et al., 2016). The A. baumannii MLST database included another isolate (Kh_4) that belonged to ST94 and was collected in Iraq in 2019 (https://pubmlst.org/bigsdb?page=info&db=pubmlst_abaumannii_isolates&id=4149).

Conclusion
CRISPR-based subtyping is suggested as a powerful and practical method to detect and track the patterns of descent among isolates of particular bacterial clones, such as A. baumannii GC1. Our study demonstrated that the most recent common ancestor of the currently known population of GC1 carried an array of 54 spacers. Passing through a migration bottleneck, this ancestor has managed to establish itself as a permanent resident of the hospital environment. Since then, it has been following two main pathways of evolution, through which several lineages and clades have emerged. We found that the most recent common ancestor of GC1 lineage 1 belonged to CST2, an early but still active subtype. GC1 lineage 2 included a well-demarcated group of isolates, which was mostly linked to military hospitals and war zones. We identified a novel lineage, designated GC1 lineage 3, which was distributed in the region of Pakistan, Afghanistan, and India. The occurrence of some spacers can be used to trace particular lineages, clades, or single CSTs. For instance, spacer Ab-55 is a hallmark of GC1 lineage 1 while the absence of spacer Ab-18 points toward GC1 lineage 2.

Fig. 1. Minimum spanning tree generated by goeBURST based on the allelic profiles of the whole Acinetobacter baumannii MLST dataset (A) and detailed snapshot on GC1 (B). Query-positive clonal complexes (CC) and sequence types (ST) were highlighted in yellow. CC1 and the non-CC1 STs were marked by blue and red rings, respectively.

Fig. 2. Graphic tree of the evolutionary history of Acinetobacter baumannii GC1, inferred from the composition of their CRISPR arrays. Patterns of descent were established for 48 CRISPR-based sequence types (CSTs), including CST6, CST9 and CST12, retrieved from a previous study (Karah et al., 2015, the green stars referred to isolates from this study). L1, highlighted in blue, and L2, in yellow, were used to label isolates assigned to GC1 lineage 1 and lineage 2, respectively, as described by previous studies (Alvarez et al., 2020; Douraghi et al., 2020; Hamidian et al., 2019; Holt et al., 2016). ND, standing for not determined, was used to label isolate AB307-0294.

Table 1
Multilocus sequence types of 209 query-positive isolates belonging to clonal complex 1 and their geographical distribution.