Cross-transmission Is Not the Source of New Mycobacterium abscessus Infections in a Multicenter Cohort of Cystic Fibrosis Patients

Abstract
Background. Mycobacterium abscessus is an extensively drug-resistant pathogen that causes pulmonary disease, particularly in cystic fibrosis (CF) patients. Identifying direct patient-to-patient transmission of M. abscessus is critically important in directing an infection control policy for the management of risk in CF patients. A variety of clinical labs have used molecular epidemiology to investigate transmission. However, there is still conflicting evidence as to how M. abscessus is acquired and whether cross-transmission occurs. Recently, labs have applied whole-genome sequencing (WGS) to investigate this further and, in this study, we investigated whether WGS can reliably identify cross-transmission in M. abscessus.
Methods. We retrospectively sequenced the whole genomes of 145 M. abscessus isolates from 62 patients, seen at 4 hospitals in 2 countries over 16 years.
Results. We have shown that a comparison of a fixed number of core single nucleotide variants alone cannot be used to infer cross-transmission in M. abscessus but does provide enough information to replace multiple existing molecular assays. We detected 1 episode of possible direct patient-to-patient transmission in a sibling pair. We found that patients acquired unique M. abscessus strains even after spending considerable time on the same wards with other M. abscessus-positive patients.
Conclusions. This novel analysis has demonstrated that the majority of patients in this study have not acquired M. abscessus through direct patient-to-patient transmission or a common reservoir. Tracking transmission using WGS will only realize its full potential with proper environmental screening, as well as patient sampling.

Mycobacterium abscessus (recently renamed as Mycobacteroides abscessus) [1] is a group of 3 closely related subspecies: M. abscessus subsp. abscessus, M. abscessus subsp. massiliense, and M. abscessus subsp. bolletii [1,2]. These rapidly growing, nontuberculous mycobacteria cause chronic pulmonary disease, particularly in patients with cystic fibrosis (CF) and other chronic lung diseases. Mycobacterium abscessus is an important pathogen that has emerged in the CF patient population and that has been associated with poor clinical outcomes, especially following lung transplantation [3-5]. This is due, at least in part, to the extensive antibiotic resistance that makes infections with this organism difficult to treat [2,6]. CF patients infected with M. abscessus are frequently not listed for transplant; therefore, the acquisition of this pathogen is considered to be a serious complication in this group.
The epidemiology of M. abscessus strains has been studied using Variable Nucleotide Tandem Repeats (VNTR) and Multi Locus Sequence Typing (MLST) [7]. The clustering of globally spread sequence types was confirmed with whole-genome sequencing (WGS), which has provided greater resolution of how the various lineages are related and has been used to predict possible transmission routes [8,9]. The dominant route of transmission of M. abscessus remains contested [10,11], with evidence for and against patient-to-patient transmission being the common route [8,12-14]. Mycobacterium abscessus is ubiquitous in the environment, with its niche hypothesized to be free-living amoebae [15,16], but due to the difficulties in isolating the organism, little has been done to track environment-to-patient acquisition. The confirmation of direct patient-to-patient transmission is important, as it influences the management of high-risk patients and it could increase the effectiveness of infection control interventions by directing the use of limited resources.
In this retrospective study, we assessed the utility of using WGS to characterize subspecies, antimicrobial resistance (AMR) profiles, and typing of M. abscessus isolates. We also used the data to investigate the scale of patient-to-patient transmission and whether identification of single nucleotide variants (SNVs) by WGS can confirm transmission. To do this, we sequenced the genomes of 145 M. abscessus clinical isolates from a well-characterized cohort of 62 patients from 4 hospitals in 2 countries over 16 years.

Patients and Sample Collection
We collected 33 M. abscessus isolates from 30 patients at Hospital de la Santa Creu I Sant Pau (bcn_hsp), Hospital Clínic (bcn_hcl), and Hospital Vall d'Hebron (bcn_hvh) in Barcelona, Spain, and 112 isolates from 32 patients from Great Ormond Street Hospital (GOSH) in London, United Kingdom (Table 1). At GOSH, CF patients were screened for nontuberculous mycobacterial infections when attending clinics, as part of their routine management. In addition, CF patients and other patients at all hospitals included in this study were screened for nontuberculous mycobacterial infections when they presented with suggestive clinical symptoms or exacerbations. Demographic and patient location data were obtained from the patient administration system, and microbiological data were obtained from the laboratory information management system, using Structured Query Language (SQL) and Excel spreadsheets. Additional sources of information included CF and transplant databases. American Thoracic Society consensus guidelines were used to verify evidence of nontuberculous mycobacterial infections [17]. All investigations were performed in accordance with the hospitals' research governance policies and procedures.

DNA Extraction, Whole-genome Sequencing, and Multi Locus Sequence Typing
Information on DNA extraction, whole-genome sequencing, and MLST is included in the Supplementary Methods.

Read Mapping and Variant Calling
Sequenced reads for all samples were first mapped to M. abscessus subsp. abscessus ATCC 19977 using BBMap v37.90 (Joint Genome Institute). SNVs were called against the reference genome using freebayes v1.2.0 [18], and variants were filtered to only include those at sites with a mapping quality >30, a base quality >30, and at least 5 supporting reads, where the variant was present on at least 2 forward and reverse strand reads and present at the 5' and 3' end of at least 2 reads.
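The filtering criteria above can be sketched as a simple predicate. This is a minimal illustration, not the authors' pipeline: the `VariantCall` record and its field names are hypothetical and are not freebayes output fields, and the strand and read-end criteria are implemented under one reading of the text (at least 2 supporting reads on each strand, and at least 2 reads carrying the variant towards each read end).

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical per-site summary; field names are illustrative only."""
    mapping_quality: float
    base_quality: float
    supporting_reads: int
    forward_reads: int   # supporting reads on the forward strand
    reverse_reads: int   # supporting reads on the reverse strand
    reads_5prime: int    # supporting reads with the variant near the 5' end
    reads_3prime: int    # supporting reads with the variant near the 3' end


def passes_filters(v, min_mq=30, min_bq=30, min_reads=5, min_strand=2, min_end=2):
    """Return True if a variant meets every threshold described in the text."""
    return (
        v.mapping_quality > min_mq          # mapping quality strictly above 30
        and v.base_quality > min_bq         # base quality strictly above 30
        and v.supporting_reads >= min_reads
        and v.forward_reads >= min_strand   # assumed: >= 2 reads on each strand
        and v.reverse_reads >= min_strand
        and v.reads_5prime >= min_end       # assumed: >= 2 reads towards each end
        and v.reads_3prime >= min_end
    )


calls = [
    VariantCall(60, 35, 8, 4, 4, 2, 3),  # well supported on both strands
    VariantCall(60, 35, 8, 7, 1, 2, 3),  # strand bias: only 1 reverse read
]
kept = [c for c in calls if passes_filters(c)]
```

In practice the same thresholds would be applied to the fields of a VCF file; the mapping from these hypothetical fields to real VCF annotations is left open here.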

Phylogenetic Analysis
Potential regions of recombination were identified from the consensus genome sequences using Gubbins v2.3.1 [19]. Regions within the genome with low coverage (<5x) were masked on a per sample basis and regions with low coverage across 75% of samples were masked across the entire data set. A maximum likelihood tree was inferred from all samples using RAxML v8.2.4 [20], using a General Time Reversible (GTRCAT) model with 99 bootstraps. Subspecies were identified for each sample based on their position upon this tree.
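The two masking rules above can be sketched as follows. This is an illustrative sketch only: the input format (a per-sample list of read depths, one value per reference position) and the function names are assumptions, and "low coverage across 75% of samples" is interpreted here as at least 75% of samples having depth below 5x at that position.

```python
def build_masks(depth, min_depth=5, sample_fraction=0.75):
    """Return (per-sample masks, data set-wide mask) from per-position depths.

    depth: dict mapping sample name -> list of read depths per position.
    """
    samples = list(depth)
    n_sites = len(depth[samples[0]])
    # Per-sample rule: mask any position with depth below min_depth.
    per_sample = {s: [d < min_depth for d in depth[s]] for s in samples}
    # Data set-wide rule: mask positions that are low in enough samples.
    global_mask = []
    for i in range(n_sites):
        n_low = sum(per_sample[s][i] for s in samples)
        global_mask.append(n_low / len(samples) >= sample_fraction)
    return per_sample, global_mask


def apply_masks(seq, sample_mask, global_mask, mask_char="N"):
    """Replace masked positions of a consensus sequence with mask_char."""
    return "".join(
        mask_char if (m or g) else base
        for base, m, g in zip(seq, sample_mask, global_mask)
    )


depth = {
    "a": [2, 10, 10],
    "b": [1, 10, 10],
    "c": [0, 3, 10],
    "d": [20, 10, 10],
}
per_sample, global_mask = build_masks(depth)
masked_d = apply_masks("ACG", per_sample["d"], global_mask)
masked_c = apply_masks("ACG", per_sample["c"], global_mask)
```

Sample "d" is well covered at the first position, but the position is still masked for it because 3 of the 4 samples fall below the depth threshold there.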
Separate subtrees were also inferred for M. abscessus subsp. massiliense sequences, as well as for M. abscessus subsp. abscessus ST-1 and ST-26 sequences. All samples in each subtree were mapped against a suitable reference. Mycobacterium abscessus subsp. massiliense str. GO 06 was used as the reference sequence for the study massiliense sequences, and the de novo assembly of the earliest ST-26 study sequence (ldn_gos_2_520) was used as a reference for the other ST-26 samples. Mycobacterium abscessus subsp. abscessus ATCC 19977 was again used as the reference for ST-1 sequences, as it is the same sequence type. All subtrees were generated using the same method outlined above, apart from the ST-26 subtree, which did not use Gubbins: instead, variants were filtered out if 3 SNVs were found within a 100 bp window.
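The SNV density filter used for the ST-26 subtree, as a stand-in for recombination detection, might look like the sketch below. The function name is hypothetical, and two details are assumptions not stated in the text: every SNV belonging to a dense run is removed, and "within a 100 bp window" is taken to mean that the first and last of the 3 SNVs are fewer than 100 bp apart.

```python
def filter_dense_snvs(positions, cluster_size=3, window=100):
    """Drop SNVs that fall in runs of `cluster_size` SNVs within `window` bp."""
    pos = sorted(positions)
    drop = set()
    # Slide over consecutive groups of `cluster_size` SNVs.
    for i in range(len(pos) - cluster_size + 1):
        j = i + cluster_size - 1
        if pos[j] - pos[i] < window:
            drop.update(pos[i:j + 1])
    return [p for p in pos if p not in drop]


# Three SNVs at 10, 50 and 90 span 80 bp and are removed as a dense run.
filtered = filter_dense_snvs([10, 50, 90, 500, 1000, 1050])
```

Evenly spaced variants, such as positions 0, 100, and 200, are retained because no 3 of them fall inside a single window.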

Sequence Clusters
Sequence clusters to infer possible transmission were generated using 3 different methods on each subtree. First, we used an SNV threshold that was based on the upper bounds of all within-patient diversity and was applied to complete linkage hierarchical clustering based on a pairwise SNV matrix. Second, we assigned clusters using the R package rPinecone, as it incorporates SNV thresholds and root-to-tip distances and has been useful when applied to clonal populations [21]. Last, we also used hierarchical Bayesian Analysis of Population Structure (hierBAPS) [22] to assign clusters; however, because hierBAPS assigns every sample to a sequence cluster, we found it was not appropriate for this study question. We made the assumption that any strains taken from different patients that fell within the same sequence cluster constituted possible transmission events.
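The first clustering method can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names and the toy sequences are hypothetical, the "upper bound" of within-patient diversity is taken here to be the maximum within-patient pairwise distance, and masked or gapped positions are assumed to be ignored when counting SNVs.

```python
from itertools import combinations


def snv_distance(a, b, ignore="N-"):
    """Count differing positions, skipping masked or gapped sites."""
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in ignore and y not in ignore
    )


def within_patient_upper_bound(seqs, patient_of):
    """Assumed threshold: the largest distance between same-patient isolates."""
    dists = [
        snv_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
        if patient_of(a) == patient_of(b)
    ]
    return max(dists, default=0)


def complete_linkage_clusters(seqs, threshold):
    """Agglomerative clustering; merge while the maximum pairwise
    distance between two clusters stays at or below the threshold."""
    names = list(seqs)
    dist = {
        frozenset((a, b)): snv_distance(seqs[a], seqs[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(c1, c2), i, j)
            for (i, c1), (j, c2) in combinations(enumerate(clusters), 2)
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Toy alignment; isolate names encode a hypothetical patient identifier.
seqs = {
    "p1_a": "AAAAAAAAAA",
    "p1_b": "AAAAAAAAAT",
    "p2_a": "AAAAAAAATT",
    "p3_a": "CCCCCAAAAA",
}
threshold = within_patient_upper_bound(seqs, lambda name: name.split("_")[0])
clusters = complete_linkage_clusters(seqs, threshold)
```

With complete linkage, isolate `p2_a` is excluded from the `p1` cluster at a threshold of 1 SNV even though it is 1 SNV from `p1_b`, because it is 2 SNVs from `p1_a`; every member of a cluster must be within the threshold of every other member.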

De Novo Assembly
All samples underwent de novo assembly of bacterial genomes using St. Petersburg genome Assembler (SPAdes) and pilon, wrapped in the Unicycler v0.4.4 package [23]. Assembled contigs were annotated using prokka v1.13 [24] and comparison of the accessory genome was generated using roary v3.12.0 [25]. To generate a list of genes that could be used to differentiate isolates, we filtered the annotated genes to remove coding sequences greater than 8000 bp and less than 250 bp, as well as those only present in a single sample and those present in every sample.
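The gene filtering step above can be sketched as below. The input structures (a mapping of gene name to coding sequence length, and a mapping of gene name to the set of samples carrying it) and all gene and sample names are hypothetical; they are not the roary output format.

```python
def filter_genes(gene_lengths, presence, n_samples, min_len=250, max_len=8000):
    """Keep genes that can discriminate between isolates.

    gene_lengths: dict gene -> coding sequence length in bp.
    presence: dict gene -> set of sample names carrying the gene.
    """
    keep = []
    for gene, length in gene_lengths.items():
        n_present = len(presence.get(gene, ()))
        if length < min_len or length > max_len:
            continue  # outside the 250-8000 bp range
        if n_present <= 1 or n_present >= n_samples:
            continue  # singleton genes and core genes are uninformative
        keep.append(gene)
    return sorted(keep)


gene_lengths = {"geneA": 900, "geneB": 120, "geneC": 9500, "geneD": 1500, "geneE": 600}
presence = {
    "geneA": {"s1", "s2"},
    "geneB": {"s1", "s2"},
    "geneC": {"s1", "s3"},
    "geneD": {"s1", "s2", "s3"},  # present in every sample
    "geneE": {"s3"},              # present in a single sample
}
informative = filter_genes(gene_lengths, presence, n_samples=3)
```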

Possible Transmission Within Mycobacterium abscessus Clusters
To confirm possible transmission between patients, we required their isolate genomes to be clustered together by 2 independent methods and epidemiological evidence that both patients were at the same hospital during the same time period. Using WGS data, we inferred a phylogenetic tree from a reference genome SNV matrix for all patients (Figure 1). We observed 2 low-variant clusters of isolates that corresponded to ST-1 and ST-26 Pasteur MLST profiles (VNTR II and I, respectively), as well as other closely related M. abscessus subsp. massiliense isolates between patients. We used an SNV matrix from mapping against a reference (M. abscessus subsp. abscessus ATCC 19977), as well as hierBAPS and rPinecone, to predict sequence clusters. The sequence clusters generated from the single-reference SNV matrix provided no further information than the MLST profiles and, in many cases, provided spurious findings with large groups of isolates clustered with no epidemiological link (Supplementary Figure 1). This included large sequence clusters relating to a single MLST type, which included isolates from different hospitals and countries.
When mapping to a single reference genome, no single SNV cut-off or model could exclude unrelated isolates from sequence clusters, because pairwise SNV distances varied greatly between both subspecies and specific lineages (Figure 2).
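The confirmation rule stated at the start of this section (agreement between 2 clustering methods plus an epidemiological link) can be expressed as a short check. This is a sketch under stated assumptions: the data structures, function names, patient identifiers, and dates are all hypothetical, and "same hospital during the same time period" is simplified to overlapping date ranges at the same hospital.

```python
from datetime import date


def stays_overlap(stays_a, stays_b):
    """True if any two stays share a hospital and overlapping dates.

    Each stay is a (hospital, start_date, end_date) tuple.
    """
    return any(
        ha == hb and sa <= eb and sb <= ea
        for ha, sa, ea in stays_a
        for hb, sb, eb in stays_b
    )


def possible_transmission(pair, cluster_calls, stays, min_methods=2):
    """Require agreement between clustering methods and an epidemiological link.

    cluster_calls: dict method name -> list of sets of patient identifiers.
    stays: dict patient identifier -> list of stays.
    """
    a, b = pair
    agreeing = sum(
        1 for clusters in cluster_calls.values()
        if any(a in c and b in c for c in clusters)
    )
    return agreeing >= min_methods and stays_overlap(stays[a], stays[b])


cluster_calls = {
    "snv_threshold": [{"pt1", "pt2"}, {"pt3", "pt4"}],
    "rpinecone": [{"pt1", "pt2"}, {"pt3", "pt4"}],
}
stays = {
    "pt1": [("hospA", date(2010, 1, 1), date(2010, 1, 10))],
    "pt2": [("hospA", date(2010, 1, 5), date(2010, 1, 20))],
    "pt3": [("hospA", date(2005, 3, 1), date(2005, 3, 5))],
    "pt4": [("hospA", date(2014, 6, 1), date(2014, 6, 3))],
}
linked = possible_transmission(("pt1", "pt2"), cluster_calls, stays)
unlinked = possible_transmission(("pt3", "pt4"), cluster_calls, stays)
```

The second pair clusters genomically by both methods but never overlaps in time, so it is not counted as possible transmission under this rule.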

Subtree Sequence Clusters
The variation in the scale of diversity within subspecies and sequence types hampered efforts to capture possible transmission events. In order to improve the accuracy of sequence clustering, multiple subtrees were made for closely related isolates using a more suitable reference sequence. We separated M. abscessus subsp. abscessus and M. abscessus subsp. massiliense isolates, as well as further subtrees for ST-1 (VNTR II), ST-26 (VNTR I), and ST-23/ST-48 (VNTR III) isolates. We also integrated the presence of accessory genes when interrogating possible sequence clusters for transmission (Figures 3, 4, and 5). Sequence clusters were assigned for each subtree using both a single SNV threshold (Supplementary Figure 2) and rPinecone. Overall, we found that predicting transmission from the subtrees reduced the number of different patients clustered together from 46 to 19 and reduced the number of possible sequence clusters suggesting patient-to-patient transmission from 11 to 7.
A total of 18 sequence clusters (I-XVIII) were identified (listed in Supplementary Table 1): 15 of these were within the subtrees (I-XV), and 7 clusters contained samples from more than 1 patient (IV, V, VI, VIII, XIV, XVI, and XVII). We found no sequence clusters that contained samples from both the United Kingdom and Spain. We found no evidence of transmission between patients within ST-26 (Figure 3). Within ST-1, 4 clusters (IV, V, VI, and VIII) containing samples from more than 1 patient were found. Of these, 3 clusters (IV, V, and VI) contained isolates from 9 patients from multiple hospitals within Barcelona. Only 2 of these patients were in hospital during the same time period (cluster VI: bcn_hcl_009 and bcn_hvh_30), but they were treated in different hospitals. Cluster VIII suggested transmission between 2 patients (ldn_gos_18 and ldn_gos_19) who were siblings and were previously assumed to have been infected either through direct transmission or a common reservoir (Figure 4) [13]. A single cluster (XIV) containing samples from 2 patients (ldn_gos_46 and ldn_gos_7) was found among ST-23 isolates. However, the 2 strains were isolated from samples taken 9 years apart (Figure 5). Patient ldn_gos_7 was already positive for M. abscessus on her first admission to GOSH and the 2 patients were present at the lung function lab within a month of each other on 2 occasions, but were never in the same location on the same day and were never admitted to the same ward.
All samples found within their respective clusters also contained similar accessory gene profiles, with the median shared percentage of accessory genes within a sequence cluster being 89% (IQR 79-94%), compared to 18% (IQR 12-37%) for isolates not in the same sequence cluster.
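A comparison of accessory gene profiles of this kind can be sketched as follows. The text does not define how the shared percentage was calculated, so this example assumes one plausible definition: genes present in both isolates as a percentage of genes present in either isolate. The gene names and the helper `median_shared_percent` are hypothetical.

```python
from itertools import combinations
from statistics import median


def shared_accessory_percent(genes_a, genes_b):
    """Assumed definition: shared genes as a percentage of the union."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


def median_shared_percent(profiles):
    """Median shared percentage over all isolate pairs in one cluster."""
    values = [
        shared_accessory_percent(profiles[x], profiles[y])
        for x, y in combinations(profiles, 2)
    ]
    return median(values)


profiles = {
    "iso1": {"g1", "g2", "g3", "g4"},
    "iso2": {"g1", "g2", "g3", "g5"},
    "iso3": {"g1", "g2", "g3", "g4"},
}
pair_value = shared_accessory_percent(profiles["iso1"], profiles["iso2"])
cluster_median = median_shared_percent(profiles)
```

Other definitions, such as using the smaller profile as the denominator, would give different percentages, so values from this sketch are not directly comparable with the figures reported above.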
For the 32 GOSH CF patients included in the study, 16 became infected with M. abscessus after their first visit to the clinic (Table  1); however, transmission could only be confirmed by both WGS and epidemiological data in 1 case (ldn_gos_19), thus suggesting a different route of acquisition for the rest of these patients.

DISCUSSION
This study has shown that WGS of M. abscessus isolates can determine subspecies, identify previously reported AMR-associated mutations, and provide common typing definitions in a single workflow. This single method can replace the multiple existing molecular assays used in clinical microbiology laboratories to provide the same information and could be used to predict novel resistance variants [26]. We used the WGS data to investigate the likelihood of cross-transmission and found 43 (69%) patients had unique isolates that did not cluster with other patients. We identified 7 sequence clusters from the remaining 19 patients, but only 1 pair of patients (ldn_gos_18 and ldn_gos_19) had a plausible epidemiological link to support possible patient-to-patient transmission, as they were siblings. All other patients with genetically similar strains were either isolated in different countries or different hospitals or were isolated from samples that were taken years apart, making direct transmission of these strains extremely unlikely.
Every M. abscessus isolate from a GOSH patient was sequenced, so the data set generated represents a complete picture of M. abscessus infection in this hospital, which is vital for inferring transmission. Most of these patients were only attending clinics at GOSH; therefore, this study has captured all of their M. abscessus isolates and they are unlikely to have been in contact with M. abscessus-positive patients at other hospitals (Table 1). Therefore, if direct patient-to-patient transmission was occurring frequently, we would expect to see evidence of it here. In contrast to this, we found that the majority of patients in this study had unique strains and the majority of sequence clusters were multiple isolates from the same patients. This study confirms previous findings that, despite many M. abscessus-negative patients spending considerable time on the same wards as patients with ongoing M. abscessus infections, they did not subsequently acquire genetically similar isolates [13,14,27]. We have, therefore, found that a fixed number of SNVs cannot be reliably used to infer cross-transmission across all M. abscessus isolates, as there seem to be irreconcilable differences in the substitution rate between both subspecies and dominant clones. These difficulties are similar to those seen in Legionella pneumophila outbreaks, where the majority of cases can belong to only a few sequence types [25]. Legionella pneumophila can also display different scales of genetic diversity within different sequences or genotypes, indicating that a single SNV threshold cut-off will not provide sufficient discriminatory power [26]. When using WGS to infer relatedness in M. abscessus, there has previously been an attempt to find an absolute threshold that can rule strains in or out of a transmission event. This threshold has previously been placed at below 25-30 SNVs [14,28,29].
From our findings, we would advocate using a suitable, genetically similar reference sequence when carrying out core genome SNV calling, especially for the dominant clones, such as ST-1 and ST-26. There is a large amount of variation within the genomes of M. abscessus [30], so the use of a single reference, such as M. abscessus subsp. abscessus ATCC 19977, will mask many differences between strains and generate spurious clusters of genetically similar sequences. Where a suitable reference is not available, we recommend using a high-quality draft de novo assembly of the first isolated sample to compare other isolates against, as in the example of the ST-26 samples in this study (Figure 3).
In addition to conducting the core genome SNV analysis, we also found that the integration of accessory genome information is a useful indicator of relatedness within M. abscessus isolates that can be used to further interrogate assigned sequence clusters. Generally, there was good concordance between the proportion of putative genes shared and the SNV distance between 2 samples. This was helped by using closely related reference sequences to map sequence reads against. We have seen in this study and previously [31] that there is diversity in the accessory genome profiles (as well as in the number of SNPs and AMR-associated mutations) taken from multiple samples from the same patient on the same day. However, we have always found interpatient diversity to be greater than that seen within the same patient. This would suggest that any direct transmission between patients of even minority populations would still be identified by WGS and, taken together, the data suggest that person-to-person transmission of M. abscessus in pediatric patients in our institution is very uncommon. In this study, we have an example of 2 patients with transmission predicted by genomic epidemiology (ldn_gos_7 and ldn_gos_46) that had attended a lung function laboratory on 3 occasions within a month of each other. In this case, the only way transmission could have occurred would be if ldn_gos_7, who was already infected, contaminated the environment and the infection was then transmitted to ldn_gos_46. The predominant view [8] that human-to-human transmission occurs via contamination of fomites by respiratory secretions could explain this, although no other instances of this appear to have occurred, despite numerous other CF patients attending the unit over many years. What is harder to explain is that, for this to be the case, the interval between exposure and culture positivity was 9 years. It could be that M. abscessus remains present but undetectable by conventional methods for this time period or, intriguingly, could cause a latent infection, as occurs with Mycobacterium tuberculosis. To the best of our knowledge, this has never been a demonstrated part of the pathogenesis of M. abscessus infection, and may be worthy of further investigation.
In agreement with previous studies, we found an international distribution of dominant M. abscessus clones [8]. We found WGS to be useful to confirm whether different patients' strains are unrelated, even within the dominant clones, but it has been far more difficult to reach definite conclusions about cross-transmission. Without environmental samples, we cannot rule out the possibility of intermediate sources of infection; therefore, WGS as a tool for tracking cross-transmission in M. abscessus will only realize its full potential with the proper screening of environmental sources, alongside longitudinal patient sampling.

Supplementary Data
Supplementary materials are available at Clinical Infectious Diseases online. Consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author.