Abstract
Aspergillus fumigatus is an environmental saprobe and opportunistic human fungal pathogen. Despite an estimated annual occurrence of more than 300,000 cases of invasive disease worldwide, a comprehensive survey of the genomic diversity present in A. fumigatus—including the relationship between clinical and environmental isolates and how this genetic diversity contributes to virulence and antifungal drug resistance—has been lacking. In this study we define the pan-genome of A. fumigatus using a collection of 300 globally sampled genomes (83 clinical and 217 environmental isolates). We found that 7,563 of the 10,907 unique orthogroups (69%) are core and present in all isolates and the remaining 3,344 show presence/absence of variation, representing 16–22% of the genome of each isolate. Using this large genomic dataset of environmental and clinical samples, we found an enrichment for clinical isolates in a genetic cluster whose genomes also contain more accessory genes, including genes coding for transmembrane transporters and proteins with iron-binding activity, and genes involved in both carbohydrate and amino-acid metabolism. Finally, we leverage the power of genome-wide association studies to identify genomic variation associated with clinical isolates and triazole resistance as well as characterize genetic variation in known virulence factors. This characterization of the genomic diversity of A. fumigatus allows us to move away from a single reference genome that does not necessarily represent the species as a whole and better understand its pathogenic versatility, ultimately leading to better management of these infections.
This is a preview of subscription content, access via your institution
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
$29.99 / 30 days
cancel any time
Subscribe to this journal
Receive 12 digital issues and online access to articles
$119.00 per year
only $9.92 per issue
Rent or buy this article
Prices vary by article type
from$1.95
to$39.95
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
Data availability
Raw FASTQ files for the isolates sequenced in this study were uploaded to the NCBI Sequence Read Archive and are publicly available under BioProject PRJNA697844. The accession numbers for the publicly available sequence data are listed in Extended Data Fig. 1. Annotated genome assemblies for sequence data generated in this study and for 64 isolates sequenced by us in a previous study20 were submitted to NCBI GenBank and are available under the NCBI BioSample numbers listed in Extended Data Fig. 1. Datasets from FungiDB, release 39, are available at https://fungidb.org/fungidb/app/downloads/release-39/. The NCBI RefSeq non-redundant protein database v.22.01.08 is accessible at https://ftp.ncbi.nlm.nih.gov/blast/db/cloud/2018-01-22/. Source data are provided with this paper.
References
Latgé, J. P. and Chamilos, G. Aspergillus fumigatus and Aspergillosis in 2019. Clin. Microbiol. Rev. https://doi.org/10.1128/CMR.00140-18 (2019).
Invasive Aspergillosis. LIFE http://www.life-worldwide.org/fungal-diseases/invasive-aspergillosis (2020).
Harrison, N. et al. Incidence and characteristics of invasive fungal diseases in allogeneic hematopoietic stem cell transplant recipients: a retrospective cohort study. BMC Infect. Dis. 15, 584 (2015).
Kuster, S. et al. Incidence and outcome of invasive fungal diseases after allogeneic hematopoietic stem cell transplantation: a Swiss transplant cohort study. Transpl. Infect. Dis. 20, e12981 (2018).
Heo, S. T. et al. Changes in in vitro susceptibility patterns of Aspergillus to triazoles and correlation with aspergillosis outcome in a tertiary care cancer center, 1999–2015. Clin. Infect. Dis. 65, 216–225 (2017).
Lestrade, P. P. et al. Voriconazole resistance and mortality in invasive aspergillosis: a multicenter retrospective cohort study. Clin. Infect. Dis. 68, 1463–1471 (2019).
Snelders, E. et al. Emergence of azole resistance in Aspergillus fumigatus and spread of a single resistance mechanism. PLoS Med. 5, 1629–1637 (2008).
Rizzetto, L. et al. Strain dependent variation of immune responses to A. fumigatus: definition of pathogenic species. PLoS ONE 8, 2–14 (2013).
Kowalski, C. H. et al. Heterogeneity among isolates reveals that fitness in low oxygen correlates with Aspergillus fumigatus virulence. mBio 7, e01515-16 (2016).
Alshareef, F. & Robson, G. D. Genetic and virulence variation in an environmental population of the opportunistic pathogen Aspergillus fumigatus. Microbiology 160, 742–751 (2014).
Knox, B. P. et al. Characterization of Aspergillus fumigatus isolates from air and surfaces of the International Space Station. mSphere 1, e00227-16.
Ries, L. N. A. et al. Nutritional heterogeneity among Aspergillus fumigatus strains has consequences for virulence in a strain- and host-dependent manner. Front. Microbiol. 10, 854 (2019).
Steenwyk, J. L. et al. Variation among biosynthetic gene clusters, secondary metabolite profiles, and cards of virulence across Aspergillus species. Genetics 216, 481–497 (2020).
Dos Santos, R. A. C. et al. Genomic and phenotypic heterogeneity of clinical isolates of the human pathogens Aspergillus fumigatus, Aspergillus lentulus, and Aspergillus fumigatiaffinis. Front. Genet. 11, 459 (2020).
Fedorova, N. D. et al. Genomic islands in the pathogenic filamentous fungus Aspergillus fumigatus. PLoS Genet. 4, e1000046 (2008).
Abdolrasouli, A. et al. Genomic context of azole-resistance mutations in Aspergillus fumigatus using whole-genome sequencing. mBio 6, e00536 (2015).
Garcia-Rubio, R. et al. Genome-wide comparative analysis of Aspergillus fumigatus strains: the reference genome as a matter of concern. Genes 9, 363 (2018).
Fan, Y., Wang Y. and Xu, J. Comparative genome sequence analyses of geographic samples of Aspergillus fumigatus—relevance for amphotericin B resistance. Microorganisms 8, 1673 (2020).
Puértolas-Balint, F. et al. Revealing the virulence potential of clinical and environmental Aspergillus fumigatus isolates using whole-genome sequencing. Front. Microbiol. 10, 1970 (2019).
Barber, A. E. et al. Effects of agricultural fungicide use on Aspergillus fumigatus abundance, antifungal susceptibility, and population structure. mBio 11, e02213-20 (2020).
Arendrup, M. C. et al. Method for the determination of broth dilution minimum inhibitory concentrations of antifungal agents for conidia forming moulds. E.DEF 9.3.2. EUCAST https://www.eucast.org/fileadmin/src/media/PDFs/EUCAST_files/AFST/Files/EUCAST_E_Def_9.3.2_Mould_testing_definitive_revised_2020.pdf (2020).
Jombart, T., Devillard, S. & Balloux, F. Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genet. 11, 94 (2010).
Nierman, W. C. et al. Genomic sequence of the pathogenic and allergenic filamentous fungus Aspergillus fumigatus. Nature 438, 1151–1156 (2005).
Steenwyk, J. L. et al. Genomic and phenotypic analysis of COVID-19-associated pulmonary aspergillosis isolates of Aspergillus fumigatus. Microbiol. Spectr. 9, e0001021 (2021).
Abad, A. et al. What makes Aspergillus fumigatus a successful pathogen? Genes and molecules involved in invasive aspergillosis. Rev. Iberoam. Micol. 27, 155–82 (2010).
Bignell, E., et al. Secondary metabolite arsenal of an opportunistic pathogenic fungus. Philos. Trans. R Soc. B 371, 20160023 (2016).
Kjaerbolling, I. et al. Linking secondary metabolites to gene clusters through genome sequencing of six diverse Aspergillus species. Proc. Natl Acad. Sci. USA 115, E753–E761 (2018).
Urban, M. et al. PHI-base: the pathogen-host interactions database. Nucleic Acids Res. 48, D613–D620 (2020).
O’Hanlon, K. A. et al. Targeted disruption of nonribosomal peptide synthetase pes3 augments the virulence of Aspergillus fumigatus. Infect. Immun. 79, 3978–3992 (2011).
Sugui, J. A. et al. Genes differentially expressed in conidia and hyphae of Aspergillus fumigatus upon exposure to human neutrophils. PLoS ONE 3, e2655 (2008).
Willger, S. D. et al. A sterol-regulatory element binding protein is required for cell polarity, hypoxia adaptation, azole drug resistance, and virulence in Aspergillus fumigatus. PLoS Pathog. 4, e1000200 (2008).
Blatzer, M., et al. SREBP coordinates iron and ergosterol homeostasis to mediate triazole drug and hypoxia responses in the human fungal pathogen Aspergillus fumigatus. PLoS Genet. 7, e1002374 (2011).
Bertuzzi, M. et al. The pH-responsive PacC transcription factor of Aspergillus fumigatus governs epithelial entry and tissue invasion during pulmonary aspergillosis. PLoS Pathog. 10, e1004413 (2014).
Bignell, E. et al. The Aspergillus pH-responsive transcription factor PacC regulates virulence. Mol. Microbiol. 55, 1072–1084 (2005).
Pongpom, M. et al. Divergent targets of Aspergillus fumigatus AcuK and AcuM transcription factors during growth in vitro versus invasive disease. Infect. Immun. 83, 923–933 (2015).
Camps, S. M. T. et al. Molecular epidemiology of Aspergillus fumigatus isolates harboring the TR34/L98H azole resistance mechanism. J. Clin. Microbiol. 50, 2674–2680 (2012).
McCarthy, C. G. P. & Fitzpatrick, D. A. Pan-genome analyses of model fungal species. Microb. Genom. 5, e000243 (2019).
Lind, A. L. et al. Drivers of genetic diversity in secondary metabolic gene clusters within a fungal species. PLoS Biol. 15, e2003583 (2017).
Rybak, J. Mutations in hmg1, challenging the paradigm of clinical triazole resistance in Aspergillus fumigatus. mBio 10, e00437-19 (2019).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (2012).
Varga, J. Mating type gene homologues in Aspergillus fumigatus. Microbiology 149, 816–819 (2003).
Paoletti, M. et al. Evidence for sexuality in the opportunistic fungal pathogen Aspergillus fumigatus. Curr. Biol. 15, 1242–1248 (2005).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Peng, Y., et al. IDBA—a practical iterative de Bruijn graph de novo assembler. In Proc. Research in Computational Molecular Biology. RECOMB 2010. (ed. Berger, B.) 426–440 (Springer, 2010).
Gurevich, A. et al. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075 (2013).
Palmer, J & Stajich J. Funannotate v.1.5.3. Zenodo https://zenodo.org/record/2604804 (2019).
Smit, A., Hubley, R. & Green, P. RepeatMasker Open-4.0. (2013–2015); http://www.repeatmasker.org
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA 6, 11 (2015).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
Ter-Hovhannisyan, V. et al. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res. 18, 1979–1990 (2008).
Stanke, M. et al. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
Lowe, T. M. & Chan, P. P. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 44, W54–W57 (2016).
Finn, R. D. et al. Pfam: clans, web tools and services. Nucleic Acids Res. 34, D247–D251 (2006).
Rawlings, N. D., Barrett, A. J. & Bateman, A. MEROPS: the database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res. 40, D343–D350 (2012).
Zhang, H. et al. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 46, W95–W101 (2018).
Simao, F. A. et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Aramaki, T. et al. KofamKOALA: KEGG ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 36, 2251–2252 (2020).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinform. 5, 113 (2004).
Suyama, M., Torrents, D. & Bork, P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–W612 (2006).
Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
Hoang, D. T. et al. UFBoot2: improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522 (2018).
Didelot, X. & Wilson, D. J. ClonalFrameML: efficient inference of recombination in whole bacterial genomes. PLoS Comput. Biol. 11, e1004041 (2015).
Schliep, K. P. phangorn: Phylogenetic analysis in R. Bioinformatics 27, 592–593 (2011).
Yu, G. et al. Two methods for mapping and visualizing associated data on phylogeny using Ggtree. Mol. Biol. Evol. 35, 3041–3043 (2018).
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Whelan, F. J., Rusilowicz, M. and McInerney, J. O. Coinfinder: detecting significant associations and dissociations in pangenomes. Microb. Genom. 6, e000338 (2020).
Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).
Brynildsrud, O. et al. Rapid scoring of genes in microbial pan-genome-wide association studies with Scoary. Genome Biol. 17, 238 (2016).
Acknowledgements
This project was supported by The Federal Ministry for Education and Science (Bundesministerium für Bildung und Forschung) within the framework of InfectControl 2020 projects FINAR and FINAR 2.0 (grant nos 03ZZ0809 and 03ZZ0834 to O.K.). The NRZMyk is supported by the Robert Koch Institute with funds provided by the German Ministry of Health (grant no. 1369-240 to O.K.). A.E.B. and G.P. are supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany´s Excellence Strategy (EXC 2051, Project ID 390713860). G.P. thanks the Deutsche Forschungsgemeinschaft (DFG) CRC/Transregio 124 ‘Pathogenic fungi and their human host: Networks of interaction’, subproject INF (project number 210879364) for intellectual input. The authors thank M. Blango for thoughtful discussions on this manuscript.
Author information
Authors and Affiliations
Contributions
A.E.B., G.P. and O.K. conceptualized and designed the study. The experimental work was performed by A.E.B. and G.W. A.E.B., B.S., G.P., K.K., J.L., O.K., T.S.-O. and G.W. analysed the data and interpreted results. A.E.B. wrote the primary manuscript and all of the listed authors were involved in the editing and review of the manuscript. O.K. acquired the primary funding for this work.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Microbiology thanks Nancy Keller and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 The pan-genome of A. fumigatus.
(A) Most frequently occurring Pfam domains among the core and accessory genomes. Values represent the total sum of domain-containing proteins among all 300 genomes. (B) Conservation of Af923 genes in the A. fumigatus pan-genome, arranged by chromosomal location in Af293. Each gene in Af293 is represented by a uniform-sized band that is coloured according to its prevalence among the 300 isolates analysed. Genes not in Af293 and their relative frequency are depicted at the bottom.
Extended Data Fig. 2 Nucleotide diversity (π) of A. fumigatus isolates from the environment, invasive disease and chronic disease.
π was calculated using 5 kb sliding windows across with genome with a 500 bp step size. Due to the underrepresentation of isolates from chronic disease in the dataset, isolates from the environment and invasive disease were downsampled to match the number of isolates from chronic disease (n = 19 isolates per group). The bold line in the box-and-whisker plot indicates the 50th percentile, and the box extends from the 25th to the 75th percentiles. The whiskers denote the lowest and highest values within 1.5 interquartile range. Statistical significance determined by two-sided Mann–Whitney U test with Bonferroni correction. *** represents P < 0.001. Exact P-values are: chronic vs. environmental: P = 97e-39; chronic vs invasive: P = 1.6e-78; invasive vs. environmental: P = 5.6e-14.
Extended Data Fig. 3 Phylogenies constructed from the genomes of 300 A. fumigatus using de novo assembled genomes and reference-base analyses.
(a-b) Core genome phylogeny built from nucleotide coding sequence of 5,380 single-copy orthologous genes shared by all 300 A. fumigatus isolates, A. oerlinghausenensis and A. fischeri (alignment length = 9,178,893 bp). Panel a shows the phylogeny rooted with A. fischeri and depicts the scaled relationship between the two outgroups and the A. fumigatus samples. Panel b depicts this phylogeny unrooted and with outgroups removed for comparison to the other phylogenies. (c) Phylogeny from concatenated SNVs following read alignment to Af293 and variant calling (n = 341,031 base pair). Genomic positions with zero coverage in any sample were removed from the alignment. (d) SNV-based phylogeny constructed from 4-fold degenerate (neutral) loci (n = 35,052 base pair).
Extended Data Fig. 4 Discriminant analysis of principle components of 300 A. fumigatus isolates.
(a-c) Number of clusters vs. Bayesian information criteria (BIC) was used to assess the best supported number of genetic clusters for the dataset. The input for analysis was either (a) a non-gapped core nucleotide alignment from 5,830 single-copy orthologous genes (b) a gene count matrix from orthogroup-based clustering of pansequences or (c) or whole-genome SNV data with of positions with zero genomic coverage in any isolate in the dataset excluded.
Extended Data Fig. 5 GWAS for variants associated with clinical isolates and triazole resistance.
(a & c) Q–Q plots for association with isolate source (a; clinical vs. environmental) and triazole resistance (c; resistance to one or more triazole vs. susceptible to all examined; c). Four software were utilized: EMMAX (top, left), GEMMA (top, right), treeWAS (bottom, left) and ECAT (bottom, right). The resulting Q–Q plots were used to identify the tool that produced outputs where the expected p-value distribution (x-axis) best matched the observed p-values (y-axis). (b) Venn diagram showing the gene overlap between GWAS for all clinical strains relative to environmental and significant genes specific to acute and chronic disease. (d) Venn diagram showing the gene overlap for association with triazole resistance when minor allele frequencies (MAF) of 0.01 and MAF 0.05 were used.
Supplementary information
Supplementary Information
Supplementary Tables 1–3.
Supplementary Data 1
Metadata, reference-guided and de novo assembly genome statistics for the 300 A. fumigatus samples analysed in this study.
Supplementary Data 2
Enrichment for GO annotations, Pfam and InterPro domains in the core and accessory genomes of A. fumigatus. Significance was determined using a one-sided Fisher’s exact test with Bonferroni‘s correction (P < 0.05).
Supplementary Data 3
Prevalence of pan-genes, including Af293 genes, across the 300 genomes analysed in this study.
Supplementary Data 4
Variation in copy number and high-impact variants in the Pfam domains and GO annotations between the genetic clusters. The values indicate the mean copy number of a given Pfam domain or GO annotation in the genomes of the genetic cluster or the mean fraction of annotation/domain-containing genes with high-impact variants in each cluster. Significance was determined using a one-way ANOVA with Bonferroni‘s correction (P < 0.05).
Supplementary Data 5
Genetic variation in 360 virulence-associated genes across the 300 A. fumigatus samples analysed in this study.
Supplementary Data 6
Variant-based GWAS and pan-GWAS for genomic variation associated with clinical isolates and triazole resistance. For the isolate-source pan-GWAS, log2-transformed odds ratios greater than zero represent genes significantly enriched among clinical genomes, whereas values less than zero represent genes significantly enriched among environmental genomes. For the triazole-resistance pan-GWAS, log2-transformed odds ratios greater than zero represent genes significantly enriched among resistant isolates, whereas values less than zero represent genes significantly enriched among susceptible isolates. Significance for variant-based GWAS was determined using the EMMAX software, which uses a linear mixed model. The resulting P values were FDR-corrected using the Benjamini–Hochberg procedure. Significance for pan-GWAS was determined using Scoary, which utilizes a two-tailed binomial test. The resulting P values were FDR-corrected using Bonferroni’s correction. FDR-corrected P < 0.05 was used as the cutoff for significance of both variant-based GWAS and pan-GWAS.
Source data
Source Data Fig. 1
Presence/absence gene matrix, pan-gene protein lengths, pan-gene dN/dS values, pan-gene InterPro domains and accessory-gene co-association modules.
Source Data Fig. 2
Newick phylogeny and genome metadata.
Source Data Fig. 3
Whole-genome sequencing, cyp51a and cyp51b phylogenetic network files and cyp51a and cyp51b dN/dS values.
Source Data Fig. 4
Accessory-gene counts for each isolate, Newick phylogeny and scale GO abundance matrix.
Source Data Fig. 5
Virulence-associated gene matrix.
Source Data Extended Data Fig. 1
Total Pfam domain counts for core and accessory genes, chromosome plot source data.
Source Data Extended Data Fig. 2
Pi values for each 5 kb sliding window.
Source Data Extended Data Fig. 3
Newick phylognies of core nucleotide sequence, neutral loci and genome-wide SNV data.
Source Data Extended Data Fig. 4
Input datasets for discriminant analysis of principle-component clustering.
Source Data Extended Data Fig. 5
P values for Q–Q plots from ECAT, EMMAX, GEMMA and treeWAS for clinical isolates and triazole resistance.
Rights and permissions
About this article
Cite this article
Barber, A.E., Sae-Ong, T., Kang, K. et al. Aspergillus fumigatus pan-genome analysis identifies genetic variants associated with human infection. Nat Microbiol 6, 1526–1536 (2021). https://doi.org/10.1038/s41564-021-00993-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41564-021-00993-x
This article is cited by
-
Genomic diversity of the pathogenic fungus Aspergillus fumigatus in Japan reveals the complex genomic basis of azole resistance
Communications Biology (2024)
-
Breaking the mould: rethinking ‘wild type’ in fungal pathogens
Nature Reviews Microbiology (2024)
-
Comparative genomic study of the Penicillium genus elucidates a diverse pangenome and 15 lateral gene transfer events
IMA Fungus (2023)
-
The genomes of Scedosporium between environmental challenges and opportunism
IMA Fungus (2023)
-
Combined pangenomics and transcriptomics reveals core and redundant virulence processes in a rapidly evolving fungal plant pathogen
BMC Biology (2023)