Skin Metagenomic Sequence Analysis of Early Candida auris Outbreaks in U.S. Nursing Homes

ABSTRACT Candida auris is a human fungal pathogen classified as an urgent threat to the delivery of health care due to its extensive antimicrobial resistance and the high mortality rates associated with invasive infections. Global outbreaks have occurred in health care facilities, particularly, long-term care hospitals and nursing homes. Skin is the primary site of colonization for C. auris. To accelerate research studies, we developed microbiome sequencing protocols, including amplicon and metagenomic sequencing, directly from patient samples at health care facilities with ongoing C. auris outbreaks. We characterized the skin mycobiome with a database optimized to classify Candida species and C. auris to the clade level. While Malassezia species were the predominant skin-associated fungi, nursing home residents also harbored Candida species, including C. albicans, and C. parapsilosis. Amplicon sequencing was concordant with culturing studies to identify C. auris-colonized patients and provided further resolution that distinct clades of C. auris are colonizing facilities in New York and Illinois. Shotgun metagenomic sequencing from a clinical sample with a high fungal bioburden generated a skin-associated profile of the C. auris genome. Future larger scale clinical studies are warranted to more systematically investigate the effects of commensal microbes and patient risk factors on the colonization and transmission of C. auris. IMPORTANCECandida auris is a human pathogen of high concern due to its extensive antifungal drug resistance and high mortality rates associated with invasive infections. Candida auris skin colonization and persistence on environmental surfaces make this pathogen difficult to control once it enters a health care facility. Residents in long-term care hospitals and nursing homes are especially vulnerable. In this study, we developed microbiome sequencing protocols directly from surveillance samples, including amplicon and metagenomic sequencing, demonstrating concordance between sequencing results and culturing.

microbial pathogens, and the only fungal species, rated by the U.S. Centers for Disease Control and Prevention (CDC) at the highest level of urgent threat in the 2019 antibiotic resistance report (3). Over one million invasive fungal infections occur globally, increasingly from multidrug-resistant C. auris (4).
C. auris was first identified in 2009 from a clinical culture of a Japanese patient's external ear canal (5), and in the ensuing decade, four main independent clades of C. auris have been identified from four distinct regions, clade I from South Asia, clade II from East Asia, clade III from Africa, and clade IV from South America (6)(7)(8)(9)(10). C. auris isolates are commonly azole resistant, with some strains resistant to all three classes of antifungals (azoles, amphotericin B, and echinocandins) (2,11). As the global emergence of invasive infections was recognized, both the CDC (12) and Public Health England (13) issued multiple clinical alerts.
Approximately half of the C. auris clinical cases in the United States have been in the bloodstream, but skin sites (axilla [Ax] and groin [G]) and anterior nares (N) are considered the primary sites of asymptomatic colonization (14). C. auris colonizes and is shed from human skin, persists on environmental surfaces (15,16), and is resistant to certain standard cleaning products, enabling its spread through health care settings. Identifying C. auris-colonized patients is a crucial component of infection control since these patients are at increased risk for developing a subsequent bloodstream infection, and they pose a risk for transmission to other patients in the ward.
Outbreaks reported in multiple countries, including American hospitals and nursing homes (7,(17)(18)(19), have proven difficult to control. Long-term acute care facilities and nursing homes with ventilator beds are particularly vulnerable to C. auris outbreaks since residents stay for long periods of time and have complex care needs. Furthermore, these facilities often do not have the infection control resources or structural protections (e.g., single-occupancy rooms) that are available in short-stay acute care hospitals. These factors underlie the increased risk to residents of health care-associated microbial infections, often by multidrug-resistant pathogens (20).
Human skin of young adult healthy volunteers is inhabited by a diverse community of bacteria (21), fungi (22), and viruses (23). As the first line of defense, the skin harbors trillions of commensal microbes essential for colonization resistance (24,25). Disruption of the commensal microbial community might provide opportunities for pathogens to colonize and infect the host (26). Amplicon and shotgun metagenomic sequencing have enabled the characterization of microbial communities. Specifically, amplifying and sequencing the bacterial 16S rRNA gene or the fungal internal transcribed spacer 1 (ITS1) region directly from a clinical or environmental sample enables the characterization of these communities, typically at the genus level. Achieving higher level resolution from the 16S rRNA gene or ITS1 region is dependent on the microbial genus and the sequencing strategy but requires custom analysis with curated reference genomes (22,27). Shotgun metagenomic sequencing enables further investigation of a community's functional potential and may achieve species or strain level resolution, but analysis is limited for rare constituents and for uncultivated strains lacking reference genomes.
A collaboration between the CDC and the National Institutes of Health (NIH) aimed to develop microbiome sequencing protocols, including amplicon and metagenomic sequencing, directly from patient samples at health care facilities with ongoing C. auris outbreaks. First, we modified laboratory protocols to obtain skin-associated microbiome DNA directly from surveillance samples, obtained as part of an ongoing point prevalence study. After DNA sequencing, we implemented a strategy to classify Candida ITS1 sequences to the species level. In addition, by leveraging high-confidence differences in the ITS1 region, we accurately classified the clade of C. auris directly from amplicon sequencing data. The skin ITS1-based mycobiome of patients from multiple health care facilities, including early C. auris outbreaks, showed elevated skin colonization by Candida species, including C. auris and C. albicans, concordant with Candida culturing results. We also demonstrated the possibility of metagenomic sequencing directly from surveillance samples to characterize C. auris colonizing the skin of a patient with high fungal abundance.

RESULTS
When patients colonized with Candida auris were identified in Illinois and New York health care facilities in 2017, local health departments launched point prevalence surveillance surveys to identify any additional residents colonized by C. auris. C. auris surveillance was a combined axilla/groin swab, but anterior nares was also considered a possible reservoir for these studies. These clinical samples were cultured on selective media, and individual colonies were confirmed as C. auris by matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) (15). To generalize our study on the skin microbiome of patients colonized with C. auris, we sequenced eight or nine patient samples from each of two ventilator-capable skilled nursing facilities (vSNF). These facilities provide enhanced care, including mechanical ventilation, to patients receiving treatment for acute and chronic health conditions. Facilities in New York (region A) and Illinois (region B) had C. auris positive colonization rates of 87.5% and 11.1%, respectively. We obtained similar clinical samples from 10 control patients in health care facilities in California serving a similar demographic, but with no history of C. auris.
To classify the ITS1 reads derived from amplicon sequencing, we first improved a custom ITS1 database by incorporating ITS1 sequences and their annotations from ITSoneDB (22,28). Next, we downloaded the complete ITS1 sequences of 14 common and clinically relevant Candida species from public databases. Single nucleotide variants (SNVs) within the ITS1 sequences were used to distinguish C. auris clades I/III, II, and IV (Fig. 1A).
The skin microbiomes of nursing home residents taken as controls from the unaffected California facility were dominated by Malassezia species. Overall, Malassezia spp. constituted .80% of the fungal relative abundance in 22 of 25 samples; 7 of 25 samples harbored .5% Candida spp., and 5 of these 7 were samples from the groin (Fig. 1B). Three groin samples had high representation of non-Malassezia fungi: two were predominated by C. albicans and one with Diutina mesorugosa (previously classified as Candida rugosa). No C. auris positive samples or ITS1 reads mapping to C. auris were identified from these samples, consistent with surveillance culturing.
In contrast, the skin of residents in region A was dominated by Candida species, with four samples dominated by C. albicans, two samples by C. auris, one sample by a combination of C. albicans and C. auris, and one sample by C. dubliniensis (Fig. 1C). Reads matching C. auris were obtained in seven of the eight residents sampled, ranging from less than 1% to greater than 50% relative abundance ( Fig. 1C and Table 1). In all cases, sequencing results were consistent with culturing results. Residents in region B with a low C. auris positivity rate exhibited similar skin fungal profiles as the control facility, with six out of nine samples dominated by Malassezia spp., two samples by C. albicans, and one sample by C. parapsilosis (Fig. 1C). Again, sequencing results recapitulated culturing results, identifying one of the nine residents as colonized with C. auris ( Fig. 1C and Table 1). Clade identity derived from ITS1 sequencing confirmed that residents in region A carried clade I, while residents in region B carried clade IV, matching genomic sequencing of isolates from these areas ( Table 2) (14).
The bacterial communities of control samples from axilla and nares were dominated by skin commensals, predominantly Corynebacterium, Cutibacterium, and Staphylococcus, and five out of nine groin samples were dominated by Proteobacteria ( Fig. 2A). Similarly, bacterial communities on the skin of region B residents were dominated by Corynebacterium and Staphylococcus, with some representation by Proteobacteria (Fig. 2B). However, the skin of region A residents with high C. auris positivity was colonized by species more commonly considered health care-associated pathogens, including Acinetobacter, Escherichia/Shigella, Klebsiella, Morganella, Proteus, and Pseudomonas (Fig. 2B).
In order to explore the possibility of performing metagenomic sequencing directly from surveillance samples and also to investigate potential adaptation to human skin by C. auris, we performed shotgun metagenomic sequencing of four additional samples, all PCR positive for C. auris, including samples from additional skilled nursing homes in regions A and B. The microbiomes of all four metagenomic samples were highly dysbiotic, dominated by hospital-associated bacterial species like Acinetobacter baumannii, Klebsiella pneumoniae, and Enterococcus faecalis, similar to what was seen in region A samples by V1-V3 sequencing (data not shown).
For healthy volunteers, fungi typically make up ,5% of the microbiome at the sampled body sites (22). Similarly, for these samples, fungi represented less than 5% of the microbial community, with a C. auris relative abundance of 0.25 to 0.73% with one exception. One sample (B12_Ax/G; SRA accession no. SRX5795138) had a high fungal burden and high overall C. auris relative abundance (14.24% of reads). We mapped these reads to a high-quality reference genome derived form a 2019 Illinois region clade IV isolate (ACUZR; BioSample accession no. SAMN18141223), which was reported to be susceptible to the commonly used antifungal agents fluconazole, amphotericin B, and micafungin (29). Read coverage of the reference was ;70Â, allowing us to confidently call variants between the patient metagenome and our reference. We detected 49 variants, 42 of which were predicted to be in noncoding regions. The seven variants in coding regions (six missense and one synonymous) were all found in genes encoding hypothetical proteins.

DISCUSSION
Over the last decade, Candida auris has rapidly emerged as a serious threat for vulnerable patients with immunodeficiency and/or other preexisting conditions, especially in long-term care facilities, such as long-term post-acute care hospitals and nursing homes, with recent clusters also occurring in acute care facilities (30,31). Surveillance for C. auris   (22), with little if any Candida spp. However, patients in long-term post-acute care hospitals and skilled nursing facilities tend to harbor more Candida spp., especially in facilities with C. auris outbreaks (e.g., region A), raising the possibilities that cocolonization by multiple Candida species may be an underlying risk factor for C. auris colonization or that a skin microbiome disruption leading to Candida dominance may precede C. auris colonization. A larger study of surveillance samples from facilities with C. auris in the NY region also cultured multiple Candida species from skin, including C. albicans, C. glabrata, C. tropicalis, and C. parapsilosis (19). Skin bacterial communities of healthy adults are dominated by Corynebacterium, Propionibacterium, Staphylococcus, and Streptococcus (23), with very little if any Proteobacteria. However, patients in these ventilated skilled nursing facilities tend to harbor more Proteobacteria species that are potential pathogens associated with nosocomial transmission, especially in the facility with more severe C. auris outbreaks (e.g., region A).
A limitation of this study is that samples provided for sequencing were unlinked to individual clinical metadata. As such, we are unable to generate or test hypotheses linking antimicrobial receipt, underlying disease state, or other factors to C. auris burden. Exposure to antimicrobials was associated with an increased odds of C. auris colonization in a recent study of long-term care patients (32).
Amplicon sequencing provides cross validation for culturing and insight into the skin fungal and bacterial community profiles that cannot be easily generated by culturing. Amplicon sequencing may be more comprehensive in detecting the full constituency of the fungal community rather than pursuing multiple culture conditions ( Table 1). We were able to classify human skin-associated ITS1 reads to the genus level for most fungi, to the species level for most Candida reads, and even to the clade level for C. auris. The potential clade V C. auris, identified from Iran (33), has an ITS1 sequence that is distinguishable from other clades, upholding the potential for ITS1 sequencing to classify C. auris reads to the clade level. Metagenomic sequencing opens up another level of possibility for integrating surveillance data with host microbial profiles and functional relevance of commensal microbes. We are also aware of the challenges posed by the low biomass of skin microbiome samples, so recovery of sufficient number of fungal reads may pose future challenges. From the clinical sample with sufficient C. auris bioburden, 49 high-coverage sequence variants were identified. Most variants (42/49) were predicted to be in noncoding regions of the genome. The remaining seven variants were in genes encoding hypothetical proteins, highlighting the need for improved fungal annotation. Future larger scale clinical studies are warranted to more systematically investigate the effects of commensal microbes and patient risk factors on the colonization and transmission of this global public health threat.

MATERIALS AND METHODS
Point prevalence survey of the health care facilities. The ventilator-capable skilled nursing facilities in regions A and B had already reported positive cases of Candida auris when the surveys were performed. Samples used in this study were collected as part of a point prevalence study. Sharing leftover clinical samples with NIH colleagues for protocol development and microbial analysis was reviewed and approved by the Centers for Disease Control and Prevention Institutional Review Board. Samples were collected in BD ESwabs (BD Biosciences) in Amies medium, and an aliquot was stored at 280°C. Body sites sampled include combined axilla and inguinal crease (groin) and anterior nares.
Nursing homes in a region with no reported cases of C. auris served as a control. Swabs of axilla, inguinal crease (groin), and nares were collected for skin microbiome sequencing by a standard method (23), and samples were stored at 280°C before processing.
DNA extraction, library preparation, and amplicon and metagenomic sequencing. (i) DNA extraction. For samples collected with BD ESwabs, 100 ml of each specimen in Amies medium was transferred to a 2.0-ml tube containing 1-mm glass beads (MPBio Matrix C), 300 ml yeast cell lysis buffer (Lucigen), and 10,000 (10K) units of Readylyse (Lucigen). For the samples collected directly in yeast cell lysis buffer by the standard method, 1-mm glass beads and 10K units of Readylyse were added. After incubation at 37°C for 1 h, samples were mechanically disrupted using a Tissuelyser (Qiagen) for 2 min at 30 Hz, cooled for 2 min, and disrupted for an additional 2 min at 30 Hz. Samples were then incubated for 30 min at 65°C for complete lysis and MasterPure Complete reagent (Lucigen) was added. The resulting supernatant was processed using the Purelink Genomic DNA kit (Life Technologies). DNA was eluted in DNA-free PCR water (Qiagen).
Custom fungal ITS1 reference database update. All fungal ITS1 sequences from ITSoneDB (version 1.131) were downloaded, with both ENA (European Nucleotide Archive) and HMM (hidden Markov model) annotations (28). Additional C. auris sequences were added to our curated ITS1 database (22), to include all four clades of C. auris. The curated ITS1 database was subjected to a search against the ITSoneDB sequences with the basic local alignment tool (BLAST), and the entries that had matches (alignment length of $50 bp and percent identity of $70%) to both the ENA and HMM annotations of the same record were kept (n = 20,168). The rationale was that, if our curated ITS1 sequence matches both the ENA and HMM annotations of the same ITSoneDB record, this sequence is most likely an authentic ITS1 sequence. ITS1 entries that had a match (alignment length of $50 bp and percent identity of $70%) to an ITSoneDB ENA annotation with the same genus name were also kept (n = 337), taking into account teleomorphic and anamorphic names of the same genus. A total of 20,482 out of 23,458 sequences (87%) from the curated Findley et al. ITS1 database (22) were kept. Most of the dropped sequences represented the ITS2 region. In addition, ITS1 sequences of .1 kb were removed and replaced with the longer sequence of ITSoneDB ENA and HMM annotations of #1 kb. The remaining ITSoneDB sequences (n = 35,188) were clustered with CD-HIT (36) at 95% identity, resulting in 5,547 sequences, which were added to the final database. Taxonomy strings from the NCBI GenBank annotations of the ITSoneDB entries were parsed to match the laboratory ITS1 sequence format. Finally, laboratory Malassezia sequences from several species derived from Sanger sequencing were clustered at 95% identity and added to the final database, which included 25,312 sequences.
Candida species database and clade level variation for C. auris. Sequences containing complete ITS1 regions from 14 common and clinically relevant Candida species were downloaded from NCBI: Candida albicans, Candida auris, Candida dubliniensis, Candida glabrata, Candida haemulonii (Candida haemulonis), Candida krusei (Pichia kudriavzevii/Issatchenkia orientalis), Candida lusitaniae (Clavispora lusitaniae), Candida metapsilosis, Candida orthopsilosis, Candida parapsilosis, Candida pseudohaemulonii, Candida rugosa, Candida sake, and Candida tropicalis. Candida duobushaemulonii has the same ITS1 sequences as Candida haemulonii and is therefore not included as an independent species in the database. Strict ITS1 sequences were derived from the ENA and HMM annotations of the 14 species in ITSoneDB. NCBI sequences were subjected to a search with BLAST against the strict ITS1 sequences, and flanking sequences were trimmed. The strict ITS1 sequences were clustered with CD-HIT at 100% and combined into the final Candida species database. To validate the Candida species database, first reference sequences in the database were mapped to the database itself. Then, additional testing was performed by introducing random mutations throughout the ITS1 sequences (1% error rate) to simulate sequencing errors and mapped against the original database. In both scenarios, classification accuracy was 100%. A fungal mock community (FMC, courtesy of G. Giannoukos, Broad Institute) was also used to verify the classification accuracy of the Candida species database. The three Candida species (C. albicans, C. tropicalis, and C. lusitaniae) in the FMC were correctly identified using the database across 13 FMC samples. SNVs within the ITS1 sequences were used to distinguish C. auris clades I/III, II, and IV (Fig. 1A). See "Data availability" below for access to database files and scripts.
Metagenomic sequencing. Libraries were generated from genomic DNA with Nextera XT library preparation kit (Illumina) per the manufacturer's recommendations. Sequencing was performed as previously described (23). Paired-end 151-bp reads with a target of 15 million to 50 million clusters per sample were sequenced on an Illumina NovaSeq 6000 instrument.
Amplicon and metagenomic sequencing data analysis. ITS1 MiSeq sequences were demultiplexed, trimmed of barcodes and primers, trimmed to 200 bp, and subsampled to 5,000 reads per sample. Samples without at least 1,000 reads were not included in the analysis. Reads were first subjected to a search using BLAST against the updated fungal ITS1 database described above (E-value cutoff, 1e26), and reads classified in the Candida genus were further subjected to a BLAST search against the Candida species database (evalue cutoff, 1e26). At the genus level, reads that had a match with at least 60% coverage and 70% identity against a reference were classified. At the species level, reads that had a match with at least 50 bp and 95% identity against a reference were classified. At the C. auris clade level, reads that had a match with at least 70 bp and 95% identity were classified.
16S V1-3 MiSeq sequences were processed using the mothur pipeline (37), version 1.39.1, as previously described (38,39). Briefly, 16S sequences were preprocessed to remove primers and barcodes, aligned to the Silva bacteria rRNA database (version 128) and subsampled to 5,000 sequences per sample. Samples without at least 5,000 reads were not included in the analysis. Sequences were preclustered, and chimeras from PCR artifacts were identified and removed using VSEARCH in mothur (40). Remaining sequences were classified to the genus level using RDP training set 16.
For the metagenomic sequencing data, adapters were trimmed using cutadapt (https://github.com/ marcelm/cutadapt), low-quality reads were removed using prinseq-lite (-lc_method entropy -lc_threshold 70 -min_len 50 -min_qual_mean 20 -ns_max_n 5 -min_gc 10 -max_gc 90) (41) and host (human; GRCh38) reads were removed. Reads were classified using the Clinical Pathoscope pipeline (42) with default parameters. Total fungal reads and C. auris reads were summed for each patient sample to obtain the fungal and C. auris relative abundance, respectively. SNVs were called against a haploid 12.5 Mb C. auris clade IV genome (ACUZR; BioSample accession no. SAMN18141223) using Snippy, version 4.4.1 (with the BWA-MEM algorithm, version 0.7.17, and FreeBayes software, version 1.3.1-dirty; https://github.com/tseemann/snippy). Paired-end reads were used by Snippy with default parameters, except -minfrac 0.01. Gene annotation was transferred from the annotated Candida auris strain B11243 (PYGM00000000.1) to ACUZR. Briefly, the B11243 assembled contigs were individually aligned to the ACUZR finished chromosomes, and annotation was transferred using the Import Features function in the sequence alignment window of the GenBank submission software Sequin (version 15.10). Annotation was validated in Sequin and manually inspected and corrected where applicable. A custom Perl script yielded the genetic consequences of the identified variants.
Data availability. Database files (ITS and Candida-specific), scripts and a description of their usage can be found at https://github.com/skinmicrobiome/fungal_toolkit_XH. Data are available under NCBI BioProject accession no. PRJNA657014. For metagenomic reads, the human reads were removed (mapped against the GRCh38 human genome build), and only reads that positively map to a microbial genome in the reference genomes (as in reference 39) were retained for analysis and deposited in the SRA: SRX5795136 to SRX5795139.

ACKNOWLEDGMENTS
This work was supported by the Intramural Research Programs of the National Institutes of Health (NIH) National Institute of Arthritis and Musculoskeletal and Skin Diseases and National Human Genome Research Institute.
The computational resources of the NIH High-Performance Computation Biowulf Cluster (http://hpc.nih.gov) were used for this study.
The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of CDC.