Genomic characterization of the C. tuberculostearicum species complex, a prominent member of the human skin microbiome

ABSTRACT Corynebacterium is a predominant genus in the skin microbiome, yet its genetic diversity on the skin is incompletely characterized and underrepresented in public databases. We investigated the distribution of Corynebacterium species on the skin and expanded the existing genome reference catalog. We used extant V1-V3 16S rRNA gene sequencing data from 14 body sites of 23 healthy volunteers to characterize Corynebacterium diversity across human skin. Corynebacterium tuberculostearicum, recently proposed to belong to a species complex, is frequently found on human skin. We identified two distinct C. tuberculostearicum ribotypes (A and B) that can be distinguished by variation in the 16S rRNA V1-V3 sequence: ribotype A is distributed across all skin sites, while B is found primarily on the feet. We performed whole genome sequencing of 40 C. tuberculostearicum isolates cultured from the skin of five individuals across seven skin sites. We generated five closed C. tuberculostearicum genomes and determined that they are largely syntenic and carry a diversity of methylation patterns, plasmids, and CRISPR/Cas systems. The pangenome of C. tuberculostearicum is open with a core genome size of 1,806 genes and a pangenome size of 5,451 genes. This expanded pangenome enabled the mapping of 24% more C. tuberculostearicum reads from skin metagenomes. We demonstrated differential growth phenotypes of C. tuberculostearicum ribotypes A and B on rich and skin-like media, suggesting functional differences. Finally, while the genomes from this study fall within the C. tuberculostearicum species complex, we propose that ribotype B isolates constitute a putative new species. IMPORTANCE Amplicon sequencing data combined with isolate whole genome sequencing have expanded our understanding of Corynebacterium on the skin. Healthy human skin is colonized by a diverse collection of Corynebacterium species, but Corynebacterium tuberculostearicum predominates on many skin sites. Our work supports the emerging idea that C. tuberculostearicum is a species complex encompassing several distinct species. We produced a collection of genomes that help define this complex, including a potentially new species we term Corynebacterium hallux based on a preference for sites on the feet, whole-genome average nucleotide identity, pangenomic analysis, and growth in skin-like media. This isolate collection and high-quality genome resource set the stage for developing engineered strains for both basic and translational clinical studies.

microbiome.While Corynebacterium has been positively correlated with the resolution of dysbiosis associated with eczema flares (4), the importance of the Corynebacte rium spp. is less defined for skin disease severity in primary immune deficient patients (5,6).Corynebacterium spp.are predominant members of the human aerodigestive tract microbiome (nares, oral cavity, and respiratory tract) (3) and participate in microbemicrobe interactions with members of the nasal microbiome (7,8).Corynebacterium has been shown to engage with the host immune system, specifically Corynebacte rium accolens-promoted IL23-dependent inflammation in mice on a high-fat diet (9).Corynebacterium bovis and Corynebacterium mastiditis have been shown to predominate the microbiome of an ADAM10-deficient mouse model (10) as well as an ADAM17-deficient mouse model of eczema (11).Finally, Corynebacterium tuberculostearicum (ATCC 35533) has been shown to induce inflammation in human epidermal keratinocyte cell cultures (12).These studies establish Corynebacterium spp. as key members of the skin microbiome capable of both microbe-microbe and microbe-host interactions.
A critical resource for understanding the biology of Corynebacterium on the skin is a robust collection of complete reference genomes, including isolates collected from a variety of individuals and body sites.Previously published genome collections from skin-or nares-resident species include Staphylococcus epidermidis (13), Cutibacterium acnes (14), and the recent comparative analysis of Dolosigranulum pigrum (15).Of note, while emerging bioinformatic methods and pipelines are now being employed to extract nearly complete genomes (metagenome assembled genomes [MAGs]) from metage nomic assemblies of skin samples (16), MAGs are not yet a substitute for genomes from cultured isolates to understand strain level or pangenomic diversity.In addition to functional prediction, comparative genomics are increasingly being used to augment conventional microbiological methods to define or redefine taxonomic boundaries (17,18), as well as describe the full extent of diversity within these boundaries (19).A pangenome, which encompasses the complete set of genes present within a set of genome sequences, enables the characterization of gene-level heterogeneity within a taxonomic group.The pangenome is commonly subdivided into the "core" genome, referring to genes present in all strains, and the "accessory" or "dispensable" genome, referring to those present in only one or some isolates.(The accessory pangenome can be further subdivided to reflect a wider range of gene uniqueness, e.g., singletons.)Thorough characterization of taxa is limited by the availability of representative and high-quality genome assemblies.Unfortunately, with the exceptions of clinically relevant Corynebacterium spp.(e.g., Corynebacterium diphtheriae, Corynebacterium striatum, and Corynebacterium pseudotuberculosis), the genus is inadequately sequenced, with 75% of species having fewer than six genomes.This includes common skin-associated species like C. tuberculostearicum with just five unique isolate genomes, only two of which are from the skin.
C. tuberulostearicum was first validated as a species in 2004 (20) and named for its production of tuberculostearic acid.It was described as a Gram-positive rod that was non-motile, lipophilic, oxidase-negative, and catalase-positive.Like other lipophilic Corynebacterium spp., C. tuberulostearicum must obtain lipids from external sources like sebum and the stratum corneum.Recently, C. tuberculostearicum was proposed as a member of a species complex that includes Corynebacterium kefirresidentii as well as other potentially new species (21,22).
In this study, we first characterized the distribution of Corynebacterium spp.across 14 skin sites from 23 healthy volunteers (HVs) using extant 16S rRNA amplicon sequenc ing data.We identify C. tuberculostearicum as the predominant skin Corynebacterium species.We sequenced 23 distinct C. tuberculostearicum species complex strains (n = 40 genomes before dereplication), resulting in a fourfold increase in the number of publicly available, unique genomes.In addition to short-read assemblies, we generated five complete genomes which, along with the type strain (DSM44922), demonstrate that C. tuberculostearicum genomes are largely syntenic and carry a number of methylation systems as well as a CRISPR/Cas system.Genes from the C. tuberculostearicum species complex genomes in our collection fall into 5,451 gene clusters comprising the species pangenome.This expanded pangenome, as compared to existing public references, improved the mapping of metagenomic reads from unrelated HVs.In addition, we identified a distinct clade of C. tuberculostearicum species complex genomes that is highly enriched on the feet and may represent a new species, tentatively designated Corynebacterium hallux.

Corynebacterium spp. are prominent members of the healthy skin micro biome
To explore the tropism of Corynebacterium, we surveyed the microbial diversity of healthy human skin using existing 16S rRNA V1-V3 amplicon sequencing data (5,23).Clinical samples were obtained from 23 HVs across 14 body sites: sebaceous (back, Ba; occiput, Oc; external auditory canal, Ea; retroauricular crease, Ra; manubrium, Mb; glabella, Gb), moist (inguinal crease, Ic; antecubital crease, Ac), dry (hypothenar palm, Hp; volar forearm, Vf ), foot (toe nail, Tn; toe web, Tw; plantar heel, Ph), and (N)ares.An average of 10,000 sequences per sample were analyzed which yielded a total of 8,334 amplicon sequence variants (ASVs), or unique 16S rRNA gene signatures.After rarefying the data set to an even depth, 5,967 ASVs remained.As expected, the top three ASV's taxonomic assignments, present in 94% of skin samples, were Cutibacterium (41% of reads, ASV1 corresponds to C. acnes), Staphylococcus (9% of reads, ASV2 corresponds to S. epidermidis), and Corynebacterium (9% of reads, ASV3 corresponds to C. tuberculos tearicum).The genus Corynebacterium was present in 96% of the skin sites sequenced, averaging 17% of reads.With a preference for moist over sebaceous skin sites (Fig. S1), Corynebacterium thrives in the humid, temperate environments of the feet and nares.

C. tuberculostearicum is the most common skin Corynebacterium
A variety of marker gene approaches have been employed to determine the phyloge netic relationships between Corynebacterium species including combinations of 16S rRNA, rpoB, rpoC, and gyrA genes (for review see reference 24).In general, it is difficult to accurately classify Corynebacterium at the species level using amplicon data and standard reference databases.The Human Oral Microbiome Database (3) is a curated database that includes a training set with a supraspecies taxonomic level enabling the assignment of sequences to multiple species where ambiguity exists.In our case, >99.5% of sequences classified as C. tuberculostearicum using the Refseq classification were also classified as C. tuberculostearicum (part of the accolens/macginleyi/tuberculostearicum superspecies) by expanded Human Oral Microbiome Database (eHOMD).
While a variation in species composition was observed between individuals, some sites and habitats displayed species enrichment at specific locations across multiple individuals (Fig. 1a).We observed that C. accolens was enriched in the nares, with a prevalence of 74% across nares samples and constituted an average of 33%-41% of Corynebacterium reads.Corynebacterium afermentans were enriched across feet sites, where they were present in 54% of samples and comprised an average of 17% of Corynebacterium reads.Most notably, however, we found that C. tuberculostearicum was present in 94% of skin sites and was often the most abundant Corynebacterium.C. tuberculostearicum reads represented 67% of corynebacterial reads in the feet, 47% in dry sites, 58% in sebaceous sites, and 46% in the nares.
The majority of reads assigned to C. tuberculostearicum belonged to two predominant 16S rRNA sequence variants, ASV3 and ASV13 which differed by a single nucleotide polymorphism (SNP) and a single-base insertion/deletion (indel).ASV3 constituted 83% of C. tuberculostearicum-classified reads (compared to <8% for all other ASVs of this species) and showed a cosmopolitan distribution across body sites (Fig. 1b).Found in 100% of HVs (N = 23) and 87% (254/293) of skin samples, ASV3 was predom inant and prevalent across human skin.Relative abundance analysis revealed ASV3 abundance >85% within all habitats except foot sites, where it made up 66% of C. tuberculostearicum-classified reads.As of this writing, the existing C. tuberculostearicum National Center for Biotechnology Information (NCBI) reference genomes containing complete V1-V3 sequences are all ASV3 as are 100% of 16S rRNA gene C. tuberculosteari cum references in the SILVA reference database.
In contrast to cosmopolitan ASV3, ASV13 was enriched primarily on feet, constituting 28% of C. tuberculostearicum-classified reads from the Ph, Tw, and Tn sites (8%, 9%, and 70%, respectively).In 9 of 23 HVs, ASV13 constituted over 90% of C. tuberculostearicummapped reads within a single foot site (Fig. S2); notably, much of this enrichment was observed in Tn sites.In addition, we observed that some individuals exhibited within-site predominance by other less common ASVs, with some individuals colonized by a single non-dominant ASV across multiple body sites.In HV 12, for example, 52%-100% of C. tuberculostearicum-mapped reads in each body site excluding the toenail are classified as ASV39.We also noted that, while sites on the feet (Ph, Tn, and Tw) were often colonized by multiple ASVs, other body sites tended to be colonized by a single ASV.
We searched the SILVA database (v138.1)for perfect matches to the ASV13 sequence and found 152 matches, all associated with uncultured Corynebacterium.The majority of them were from our own full-length 16S rRNA gene sequencing of skin microbiome samples (1).This observation, combined with the fact that all the existing C. tuberculos tearicum reference genomes had a more common ASV3 sequence variant, led us to hypothesize that the ASV13 sequence, which we hereafter refer to as ribotype B, could be associated with an unrecognized species or subspecies.For the purposes of the current work, we will use the term C. tuberculostearicum species complex (21,22) to refer to all C. tuberculostearicum-like isolates found on the skin.Additionally, we will refer to the predominant ASV3 OTU as ribotype A.

Expanding the C. tuberculostearicum complex reference catalog
Prior to this study, only five C. tuberculostearicum isolates had been sequenced and submitted to NCBI.Only two of those were from human skin and neither was a closed genome.To expand the C. tuberculostearicum complex reference genome catalog, we sequenced isolates from five different HVs (Table S1).To enrich for the previously unsequenced ribotype B Corynebacterium, we screened our collection of skin-associated isolates with full-length 16S rRNA gene sequences.Of the 45 C. tuberculostearicum isolates screened, we identified eight ribotype B isolates for further study.In total, we shotgun sequenced 40 isolates in the C. tuberculostearicum complex-30 from ribotype A, 8 from ribotype B, and 2 from other ASVs.Initial genome clustering using mash indicated that some of the isolates we sequenced were closely related.Therefore, we used dRep (25) to identify groups of highly similar genomes (ANImf > 99.5%) and chose the best representative genome for each genome set based on sequence assembly statistics: maximal N50, minimal number of contigs, and maximal genome size.In cases of comparable assembly quality, genomes were selected to increase body site represen tation.Applying these dereplication criteria resulted in 23 strain-level genomes that differ by 1,000 s of single nucleotide variants and are distributed across three ribotypes: 18 from ribotype A, 4 from ribotype B, and 1 from ASV30.

Whole-genome features of five complete C. tuberculostearicum complex genomes
In addition to a paucity of C. tuberculostearicum reference genomes at the time of this study, the ones that did exist were not associated with publications describing their general features.To address this, we selected five of the dereplicated genomes, three ribotype A and two ribotype B, for long-read sequencing on the PacBio platform.The subsequent finished or complete C. tuberculostearicum complex genomes revealed four copies of the 16S rRNA gene in each genome.For each genome, we performed a multi-sequence alignment containing each V1-V3 region copy along with the predomi nant ribotype A sequence first identified in our amplicon sequencing data set.Within a genome, copies of the V1-V3 region are almost entirely identical across alignment, with ribotype A C. tuberculostearicum species complex genomes carrying four copies of ASV3.One exception was a single nucleotide variant identified in one copy of 16S rRNA 5′ region of CTNIH12 (Fig. S3).Notably, no variation was found in the ribotype B genomes, which both carried four identical gene copies marked by the two characteristic sequence variants as identified in the amplicon sequencing data set.The within-genome homogeneity of 16S rRNA genes confirmed its usefulness as a marker.These complete C. tuberculostearicum complex genomes also enabled us to directly compare the type strain (DSM 44922/FDAARGOS_1117; human bone marrow) to our ribotype A and B isolates without the ambiguity introduced by unfinished genomes.Fig. S4 shows that the five PacBio genomes from this study were largely co-linear, with >80% of the genome in large syntenic blocks, with the type strain DSM 44922.All five of the genomes in this study had a 440 kb region that was reorganized relative to the type strain.This region, comprising around 17.6% of the genome, encoded 392 genes (387 coding).The breakpoints for inversions or translocated blocks in the reference were marked by mobile element families (e.g., IS3, IS256, IS481, and IS6) that suggest a mechanism for this rearrangement.
We extracted the methylation profiles from the PacBio reads of our five genomes (Table S2).The most common methylation pattern, found in all five genomes, was N6-methyladenine modification (m6A) of GATC motifs (~16,000 sites/genome; 99% methylated), typically associated with the Dam methylase.A second motif AAAAC was also found to be methylated (m6A) in all five genomes (~75% methylated).In addition to these two common methylation patterns, ribotype A isolate CTNIH10 had two additional methylated motifs present in hundreds of copies (GGCANNNNNATC and GATDNNNNTGCC).CTNIH20 had an additional three methylated motifs present at 520-1,626 sites/genome.Finally, CTNIH23 had evidence of an additional five methylation motifs across the genome that were all >98% methylated and present at 225-2,159 sites/ genome.Methylation systems are important for horizontal gene transfer (HGT), phage resistance, and potential recombinant engineering of these strains.
The presence of CRISPR-Cas as well as other phage defense systems poses additional barriers to HGT.We detected an eight-gene Type I-E Cas gene cluster and two large repeat arrays (24 and 19 spacers) in the CTNIH20 ribotype B genome, but not in any of the other full-length genomes from this study.Additional CRISPR-Cas systems were detected in the short-read assemblies of CTNIH9 (ribotype A; Type I-E Cas gene cluster, eight spacer CRISPR) and CTNIH22 (ribotype B; Type I-E Cas gene cluster, 12 spacer CRISPR).Prior to this, the only public C. tuberculostearicum species complex reference genome with a CRISPR-Cas system was strain SK141 (ACVP01).A variety of other defense systems including restriction modification systems were also identified using the DefenseFinder tool (Table S2) (26,27).
Plasmids are important for the mobilization of virulence factors, antibiotic resist ance genes, and as tools for recombinant engineering.At the time of this study, a single 4.2 kb plasmid was deposited in the public databases associated with the C. tuberculostearicum species complex, p1B146 (NC_014912) (28).Across the five long-read genomes sequenced here, we detected five plasmids, none of which aligned to p1B146.Two plasmids, pCT3-020e and pCT4-9116 from CTNIH23 and CTNIH12, had the same backbone as the C. diphtheriae plasmid pNG2 (ORF9-traA-ORF11-parAB-repA) but lacked the erythromycin-resistance cassette.CTNIH20 carries three plasmids ranging in size from 21.2 to 27.3 kb.All three carried a traA/recD2 ortholog encoding a relaxase/helicase but are otherwise unrelated.While most of the proteins on these three plasmids were annotated as hypothetical, pCT1-afe7 carried an ebrB efflux pump and pCT1-0563 carried an stp (spectinomycin/tetracycline) efflux pump predicted to be involved in resistance to dyes and antibiotics.All five plasmids were characterized by the presence of a TraA/ RecD2 encoding gene, suggesting a common mobility mechanism.Furthermore, we found fragments of these plasmids in many of the contig-level genomes.For instance, the C. tuberculostearicum CIP 102622 genome (JAEHFL01) carried both the stp gene and a nearby transcription factor (>99.6% identity) on a 9.8 kb contig, showing the value of these plasmid references for identifying HGT elements.

Taxonomic structure of the skin-associated C. tuberculostearicum species complex
While 16S rRNA amplicon sequencing enabled us to group C. tuberculostearicum species complex isolates into two predominant ASVs, the dereplicated genomes enable further high-resolution taxonomic analysis of this species complex.We used GET_HOMOLOGUES to extract core genes and build a phylogenetic tree based on core genome SNPs (Fig. 2).We noted additional taxonomic structure, particularly amongst ribotype A isolates.The five public reference genomes were in the ribotype A-dominated portion of the tree as expected based on their 16S rRNA gene sequence.
While there was a good correlation between 16S rRNA ASVs and the core SNP tree, we noted a single isolate, CTNIH19, which carried a ribotype A allele but localized with ribotype B isolates on the tree.CTNIH19 was isolated from the inguinal crease and is the most basal member of this clade.Work by Cappelli and colleagues recently defined a number of new Corynebacterium species, and CTNIH19 was >99.9% identical to a species they designated Corynebacterium curieae (29).We calculated the average nucleotide identity (ANI) across isolates using pyANI (Fig. S5) and determined that ribotype B isolates share ANI > 97% with themselves and <94% with other C. tuberculostearicum complex genomes, including C. curieae.When whole-genome ANI is used to define a putative new species, 95% identity is a generally accepted species boundary (18,30).We also submitted our ribotype B isolate genomes to the DSMZ type strain genome server (TYGS) (31) to obtain a taxonomic/nomenclature assignment.TYGS predicted that ribotype B genomes belong to a new species in both the whole genome and 16S rDNA trees.The closest TYGS references were C. tuberculostearicum DSM 44922 and C. kefirresidentii.C. keffiresidentii was first described in 2017 (32) after isolation from kefir grains but has not been accepted as an official species yet.Three of our isolates (CTNIH2, CTNIH6, and CTNIH14) from three different HVs were 98% identical to the C. kefirresidentii reference, calling into question whether kefir is the only natural host for this bacterium (21,22).We are proposing that the isolates belonging to clade ribotype B be designated C. hallux, given their association with feet and, particularly, the toenail.

Pangenome of the skin-derived C. tuberculostearicum complex
We performed a pangenomic analysis to describe the coding diversity of the C. tuberculostearicum species complex (Fig. 3).We generated an anvi'o (33) pangenomic map to illustrate genomic variation across the combined set (N = 28) of NCBI reference genomes and our dereplicated genomes.(Fig. 3a).Pangenome openness was estimated for the complex using the Heap's law model (Fig. 3b) as proposed by Tettelin et al. (34).The model indicated an open pangenome (0.30 ± 0.01, γ > 0), predicting that the C. tuberculostearicum complex pangenome would increase with more genomes analyzed.Despite a lower degree of pangenome-wide diversity, the strict set (N = 20) of genomes representing C. tuberculostearicum species also demonstrated an open pangenome (0.25 ± 0.01, γ > 0).With our additional 23 genomes, the species complex pangenome size increases from 3,080 genes to 5,451 genes, resulting in an expansion of the non-core, or accessory genome by over 300% (Fig. 3c).We performed a functional characterization of 23 lab-sequenced and 5 NCBI-derived C. tuberculostearicum species complex genomes using the eggNOG-mapper annotation tool (Fig. S6), which returned annotations for 83.2% of orthologous gene clusters (of which 21% are annotated COG category "S, " Function Unknown).Interestingly, among the non-core genes, inorganic ion metabolism and transport-related genes were among the most abundant.We performed a principal components analysis (PCA) of gene presence/absence data describing our 23 genomes  (belonging to all genomes), accessory (belonging to two or more genomes), and singleton (belonging to only one genome) genes.The expanded pangenome contains 5,451 genes using 90% sequence identity as a cutoff parameter.and two skin-derived reference sequences (Fig. 4a).We observed site-specific clustering of genomes isolated from the feet and moist environments.In agreement with the core phylogenetic clustering, we also observed distinct clustering of ribotype B isolates away from other foot-derived C. tuberculostearicum complex genomes.In addition, we identified 11 genes that were unique to and carried by every member of a ribotype (A = 2 and B = 9; Supplementary Table.S3), four of which we were able to assign functional annotation using the UniprotKB sequence similarity search tool, including a bacteriocin and ferric uptake protein.

Improved metagenomic read mapping using an expanded C. tuberculosteari cum pangenome
We tested whether the expanded genomic reference set could improve the rate of C. tuberculostearicum read mapping in a set of metagenomic data sets from 12 HVs at six body sites (2).Reads were mapped with bowtie2 (35) against a genome database consisting of the five unique NCBI references or a database of the NCBI references plus the dereplicated genomes from this study.Overall, 27% more reads were assigned to C. tuberculostearicum using the expanded genome set as compared to NCBI references alone.While the five HVs with isolate genomes in the expanded database showed slightly better classification, median improvement of 32%, over those without isolates in the database (median = 24%), the difference was not statistically significant, showing the broad utility of these genomes.Furthermore, when broken down by body site, toenail (Tn) sites showed the largest improvement in C. tuberculostearicum read assignment (72%) while nares, which only contributed a single genome to the expanded database, improved by 55% (Fig. 4b).To control for spurious read mapping to repetitive elements or other assembly artifacts, these analyses were repeated using only the predicted gene catalogs, rather than the whole genomes, and very similar improvements in C. tuber culostearicum read mapping were observed, 25% median improvement and a similar site-dependence.In addition to improved read mapping, read classification also supports the niche specificity of C. tuberculostearicum species complex members.Fig. 4c shows that within the species complex, sites on the feet (e.g., toenail) are enriched in C. hallux as was seen in the analysis of amplicon data (Fig. 1b), while C. kefirresidentii is most abundant in the nares.

Growth of skin-derived C. tuberculostearicum in sweat media
Members of the C. tuberculostearicum species complex are widely distributed across the skin's microenvironments.Differences in body site physiology and nutrient composition inherent to each niche may provide selective growth advantages (and disadvantages) to a subset of strains.We performed a pilot experiment to investigate differential growth phenotypes of C. tuberculostearicum species complex ribotypes in skin-like media (Fig. 5).Corynebacterium are often cultured on brain-heart infusion (BHI) media plates supple mented with 1% Tween-80 (BHI + 1% Tween80), so this medium was used as a positive control for growth in liquid medium (Fig. 5b).Isolates were cultured on two types of medium consisting of a complex mixture of amino acids, lipids, and other metabolites that mimic human eccrine sweat, with one medium supplemented to include a sebumlike synthetic lipid mixture.Both simulated sweat media were supplemented with 0.1% Tween-80.A collection of eight skin-derived Corynebacterium strains consisting of four ribotype A and four ribotype B strains were grown for 20 hours in triplicate and in two separate experiments for each strain and medium condition (N = 6; Fig. 5b-d).In all three growth conditions, ribotype B isolates demonstrated a lower mean OD 600 over time than ribotype A isolates (Fig. 5a).Using analysis of variance (ANOVA) and the Tukey method, we determined that the area-under-the-curve (AUC) difference between the two ribotypes is statistically significant (P < 0.0001) for all media conditions.This pattern was particularly pronounced in the BHI + 1% Tween80 and Sweat media + 0.1% Tween80 conditions.We observed that the addition of synthetic lipid mixture to the eccrine sweat-like medium attenuated, however, still maintained the growth difference between ribotype B and other strains, suggesting lipid-limited growth for members of ribotype B.

DISCUSSION
In this study, we investigated the genomic diversity of the predominant yet undersequenced Corynebacterium genus.Our survey of microbial diversity across human skin revealed niche-specific enrichment of Corynebacterium species and identified C. tuberculostearicum as a predominant and widespread species on human skin.Our analysis of existing amplicon sequencing data identified a site-specific novel 16S rRNA gene ribotype which led to an expanded sequencing of the C. tuberculostearicum species complex.In total, we sequenced 23 distinct isolates belonging to the C. tuberculosteari cum species complex including C. tuberculostearicum (n = 15), C. kefirresidentii (n = 3), C. curieae (n = 1), and a novel species we are calling C. hallux (n = 4).Discovery of C. kefirresidentii on human skin and nares suggests that humans are a natural host for this species.
C. hallux is likely a new species of skin-associated Corynebacterium and merits further work to formally name it.It was cultured from three different HVs, detected by amplicon sequencing in most HVs, represented in the recently published SMGC (SMGC_122) (16), and detected in public 16S rRNA gene database entries associated with the skin.In our HVs, it was enriched in sites on the feet, particularly the toenail and toe web.Microbial communities on the feet are highly diverse and relatively unstable (2) subject to temperature fluctuations and invasion by environmental microorganisms.
This study helps to resolve the diversity of C. tuberculostearicum species complex strains and provides an important genetic resource for future study.Our whole-genome sequencing effort uncovers substantial genetic and functional diversity across the complex and improves read-mapping overall by >24%, which will in turn bolster future sequencing efforts and further characterization of Corynebacterium across human skin.While this work has greatly expanded the non-core genome, a significant proportion of these genes are putative or lack functional annotation.Overall, we did not detect pronounced differences between ribotype B and other strains' genes that would explain the observed patterns in skin site distribution and growth on synthetic media.
Eleven genes (ribotype A = 2 and B = 9) perfectly segregated the two ribotypes; however, the roles of these genes remain unclear due to the limitations of annotation tools.Nevertheless, we identified two examples of genes with the potential to affect within-niche competition.One of the genes specific to ribotype B shared sequence similarity with a Lactococcin 972 family bacteriocin.The bactericidal activity of ribotype B against closely related strains could contribute to patterns of within-site dominance as observed between ribotypes (Fig. S2).Bactericidal peptides have recently gained interest as a possible therapeutic intervention for gastrointestinal disease (36).Furthermore, Corynebacterium was shown to be enriched in a recent study (37) of post-operative, healing wounds, suggesting an opportunity for biotherapeutic applications.We also identified a ribotype B-unique copy of a gene encoding ferrous iron transport protein B, a major regulator of bacterial iron uptake.Iron is an essential nutrient for survival, requiring the development of highly efficient sequestering mechanisms by pathogenic and avirulent bacteria alike (38,39).Under conditions of limited nutrient bioavailability, enhanced ferric uptake may prove to be a determining factor of intraspecies competi tion.
On both rich and skin-like media, we observed that ribotype B isolates grew less robustly compared to other strains.Thus, the topographical distribution of ribotypes across the skin's surface may emerge from selective growth advantages including different nutritional requirements or nutrient acquisition mechanisms between strains.The mechanism(s) of this variability may have important clinical implications.For example, characterizing the nutritional limits for sustained growth could inform the development of prebiotic therapeutics to maintain healthy microbial composition within a given microenvironment.Moreover, some strains may perform anti-virulent functions within their respective niches.Work from Ramsey et al. (8) supports the idea that the pathobiont Staphylococcus aureus shifts from virulence to commensalism in the presence of commensal Corynebacterium spp. in the skin/nares.Thus, identifying genomic functions that distinguish C. tuberculostearicum on healthy vs diseased skin could provide the impetus for engineering site-specific, microbe-based drug delivery systems.Understanding the roles and requirements of host-associated microbial communities in maintaining skin health will provide insight into the emergence of skin disorders in addition to novel therapeutic interventions to combat them.

Subject recruitment and sampling
Adult male and female HVs 18-40 years of age were recruited from the Wash ington, DC, metropolitan region.This natural history study was approved by the Institutional Review Board of the National Human Genome Research Institute (clinical trials.gov/NCT00605878)and the National Institute of Arthritis and Musculoskeletal and Skin Diseases (https://clinicaltrials.gov/ct2/show/NCT02471352), and all subjects provided written informed consent prior to participation.Sampling was performed as described previously (23).
Full-length 16S rRNA gene copies were extracted from each PacBio complete genome.Briefly, reference Corynebacterium 16S rRNA sequences were downloaded from the RDP database (good quality, >1,200 nt) and used as a BLAST database to identify the coordinates of the four copies in the genome.To detect intragenomic variation in the 16S rRNA gene, all copies within each genome were compared against each other using the EMBL-EBI Multiple Sequence Alignment Tool (MUSCLE).Whole genome alignments were generated in Mauve v 2.4.0.

Phylogenetic analysis
Publicly available genomes were downloaded from NCBI including C. tuberculostearicum (CP068156, CP06979, CP065972, ACVP01, and JAEHFL01), C. kefirresidentii (CP067012, JAHXPF01), C. curieae (JAKMUU01), and C. accolens (ACGD01).GET_HOMOLOGUES (v09212021) was used to cluster protein sequences from 29 genomes (28 C. tuberculos tearicum and 1 C. accolens) into orthologous groups and generate a core gene alignment.Prokka GBK files were used as input for clustering.The OrthoMCL (v1.4) option was used to group sequences utilizing the Markov Clustering Algorithm with a minimum coverage value of 90% in blast pairwise protein alignments.A strict core consensus genome was generated by calculating the intersection of single-copy genes present in all 29 genomes.The accompanying GET_PHYLOMARKERS (v.2.2.9.1) pipeline was used to identify markers for phylogenetic inference.IQTREE (v 2.1.2) was used to generate a maximum-likelihood phylogenetic tree from marker gene cluster alignments with 1,000 bootstrap replicates and a mean branch support value cutoff of 0.7.The top-scoring tree was visualized and annotated using the web-based program interactive Tree of Life (iTOL v6).The ANI matrix for all sequences was plotted and annotated using the package heatmap.2/R.

Pangenome calculation
Three C. tuberculostearicum pangenomes (for all reference sequences, skin-derived reference sequences and lab sequences, and all sequences) were calculated from Prokka-derived GFF3 files using Panaroo on sensitive mode with a sequence iden tity threshold of 90% and otherwise default parameters.The resultant gene pres ence/absence tables were used for downstream analysis.

Pangenome visualization
A pangenomic map was created using anvi'o (v.7.1) with imported Prokka gene calling information and annotations (GBK format) for 23 lab-sequenced and 5 NCBI reference C. tuberculostearicum genomes.Strains were annotated with sample metadata including skin site, general skin habitat, HV ID, as well as phylogenetic grouping from GET_HOMO LOGUES analysis.ANIb of aligned regions was calculated within anvi'o using pyANI.In addition, eggNOGG (v.2.1.7)gene annotations were used for gene cluster annotation with the NCBI COG Database (2020) and visualized using ggplot2/R.Core, accessory, and singleton gene counts were derived from gene presence/absence tables for (1) all reference sequences and (2) all sequences.Counts were visualized as pie charts.
A gene rarefaction curve for the C. tuberculostearicum pangenome was found by applying the Vegan/R specaccum() function to the gene presence/absence table, with a random order of additions of genomes permuted 1,000 times.A Heap's law power law model was fitted to the curve using the nls function in stats/R to calculate constants K and α.The curve was visualized using ggplot2/R.
A PCA was performed on the gene presence/absence table using the prcomp function in stats(v 3.6.2)/R.The resultant object was visualized using ggplot2/R.

Unique genes
Scoary (v.1.6.16)was used to identify genes unique to ribotype A and B for 23 labsequenced C. tuberculostearicum complex isolates.Gene sequences were queried using the UniProtKB online sequence similarity BLAST tool (https://www.uniprot.org/blast).

Differential growth experiments
C. tuberculostearicum liquid cultures were pelleted, washed, and diluted 10-fold in diH2O to an OD 600 of ~0.1.Differential media were inoculated with diluted culture at a concentration of 100:1 and plated in triplicate across a 96-well microplate.Bacterial growth was recorded using the Epoch 2 Microplate.OD 600 readings were taken at 30-minute intervals throughout a 24-hour time span.The experiment was performed in duplicate.OD 600 measurements were exported, corrected via blank subtraction, and plotted using ggplot2/R.The package growthcurver/R was used to calculate empirical

FIG 2 A
FIG 2A maximum-likelihood phylogenetic tree of C. tuberculostearicum species complex genomes from this study and publicly available, calculated from 1,315 core gene cluster alignments.Bootstrap values (located along internal nodes) were calculated from 1,000 replicates.Clustering was generated using GET_HOMOLOGUES OrthoMCL v1.4 option with minimum coverage 90% in BLAST pairwise alignments.The tree was rooted on outgroup C. accolens ATCC 49725.On the right of tree, boxes depict site (body site locations defined in Fig.1) and individual (HV) from which each isolate was cultured.Sites are colored by niche type, with moist in shades of green; feet in shades of orange; dry in pink; sebaceous in lavender; and nares in blue.Individuals are randomly but consistently colored.
Genomic rings are annotated by skin site and HV metadata and ordered by pyANI ANIb.Genome margins are manually adjusted for clarity.(b) Heap's Law estimates of pangenome openness for C. tuberculostearicum complex (N = 28) and C. tuberculostearicum species (N = 20) genomes.Rarefaction curves show the total number of genes accumulated with the addition of new genome sequences in random order with 1,000 permutations.Shaded regions represent the 95% CI.A Heap's law model was fit to the resultant complex and species curves to calculate γ values (0.30 ± 0.01 and 0.25 ± 0.01, respectively).(c) Number of core

FIG 4 C
FIG 4 C. tuberculostearicum complex pangenome clustering and improved metagenomic read mapping.(a) PCA of orthologous gene clustering.The gene presence/absence data for 25 genomes (including two NCBI references, shown in gray) was analyzed using PCA.Ribotype B genomes are shown as circles; other genomes are triangles.(b) Improvement in shotgun metagenomic read mapping with a 28 member C. tuberculostearicum database as compared to the five member NCBI database.Percent increase in mapped C. tuberculostearicum reads by body site.Each point is an HV.Triangles mark HVs that contributed one or more isolates to the expanded mapping database.(c) Relative abundance of C. tuberculostericum species complex members, including the newly proposed species C. hallux, based on bracken-corrected kraken2 analysis.

10 FIG 5
FIG 5 Growth phenotypes of select C. tuberculostearicum complex strains in synthetic sweat media.(a) Empirical area under curve comparison of C. tuberculostearicum species complex strains from ribotype A and ribotype B, with biological replicates grouped by color.Strains were grown in BHI + 1% Tween; Sweat media + 0.1% Tween80; Sweat media + 0.1% Tween80 + synthetic lipid mixture.Medium composition is described in further detail in methods.(b-d) Selected growth curves from a representative experiment plotted with standard error.Ribotype B isolates are shown in shades of blue.Ribotype A isolates are shown in shades of red.