Overview. A total of 661 samples belonging to 34 species and 9 families of foregut-fermenting ruminant (thereafter ruminant, n = 468), foregut-fermenting pseudoruminant (thereafter pseudoruminant, n = 17), and hindgut fermenters (n = 176) were examined (Fig. 1a-b, Table S2). Many of the samples belong to previously unsampled/rarely sampled animal families (e.g. Caviidae, Trichechidae) and species (capybara, mara, manatee, markhor, chamois, takin). The dataset also provides a high level of replication for a variety of animals (229 cattle, 138 horses, 96 goats, 71 sheep, and 23 white-tail deer, among others) (Fig. 1b), locations (418 samples from USA, 74 from Egypt, 38 from Italy, 35 from New Zealand, 31 from Germany, and 25 from Nepal, among others) (Fig. 1a, Table S2), and domestication status (564 domesticated, 97 undomesticated) (Fig. 1b, Table S2), allowing for robust statistical analysis.
A total of 8.73 million Illumina sequences of the hypervariable region 2 of the large ribosomal subunit (D2 LSU) were obtained. Rarefaction curve (Fig. S1) and coverage estimates (Table S3) demonstrated that the absolute majority of genus-level diversity was captured. The overall composition of the dataset showed a high genus-level phylogenetic diversity, with representatives of 19 out of the 20, and 10 out of the 11, currently described genera, and yet-uncultured candidate genera, respectively, identified (Fig. 1c, d, S2, Table S3). Ubiquity (number of samples in which a taxon is identified) and relative abundance (percentage of sequences belonging to a specific taxon) of different genera were largely correlated (R2 = 0.71, Fig. S3).
To confirm that these patterns were not a function of the primer pair, or sequencing technology (Illumina) employed, we assessed the reproducibility of the observed patterns by conducting a parallel sequencing effort on a subset of 61 samples using a different set of primers targeting the entire D1/D2 LSU region (~ 700 bp D1/D2), and a different sequencing technology (SMRT PacBio). A highly similar community composition was observed when comparing datasets generated from the same sample using Illumina versus SMRT technologies,as evident by small Euclidean distances on CCA ordination plot between each pair of Illumina versus PacBio sequenced sample (Fig. S4b-d)), Ordination-based community structure analysis indicated that the sequencing method employed had no significant effect on the AGF community structure (Canonical correspondence analysis ANOVA p-value = 0.305) (Fig. S4).
Expanding Neocallimastigomycota diversity. Interestingly, 996,374 sequences (11.4% of the total) were not affiliated with any of the 20 currently recognized AGF genera or 11 candidate genera. Detailed phylogenetic analysis grouped these unaffiliated sequences into 56 novel genera, designated NY1-NY56 (Fig. 2a, Table S3), hence expanding AGF genus-level diversity by a factor of 2.75. In general, relative abundance of sequences affiliated with novel genera was higher in ruminants (Wilcoxon test adjusted p-value < 2x10− 16), as well as in pseudoruminants (Wilcoxon test adjusted p-value = 0.02) compared to hindgut fermenters (Fig. 2b-d, Table S4). On the other hand, there was no significant difference in relative abundance of novel genera based on domestication status (Wilcoxon test adjusted p-value = 0.69) (Fig. 2e, Table S4).
A closer look at the patterns of distribution of novel genera identified three important trends. First, the proportion of sequences belonging to novel genera in previously well-sampled animals (cattle, sheep, goats, horses, and donkey) was significantly smaller (Wilcoxon test adjusted p-value = 2.3x10− 10) (Fig. 2f, Table S4) than in rarely sampled or previously unsampled hosts (e.g. buffalo, bison, deer, elephant, mara, capybara, manatee, among others), highlighting the importance of sampling hitherto unsampled or rarely sampled animals as a yet-unexplored reservoir for AGF diversity. Second, some novel genera were extremely rare and often identified solely in few sample replicates of a well-sampled animal (e.g. NY42, NY9, NY53, and NY17, in only 5, 2, 1, and 1 cattle samples, respectively), highlighting the importance of replicate sampling for accurate assessment of hosts’ novel pangenomic diversity (Fig. 2g). Finally, 5 of the 56 novel genera were never identified in > 0.1% abundance in any sample, and 16 of the 56 never exceeded 1%, a pattern that highlights the value of deep sequencing to access perpetually rare members of the AGF community (Fig. 2h, Table S5).
Phylogenetically, 32 of the novel lineages identified clustered within the 4 recently proposed families in the Neocallimastigomycota [41], with 13, 7, 9, and 3 genera clustering with the families Neocallimastigaceae, Caecomycetaceae, Anaeromycetaceae, and Piromycetaceae, respectively). Another 17 novel genera formed additional 4 well-supported family-level clusters with orphan cultured genera (5, 4, 5, and 3 novel genera forming family-level clusters with the orphan genera Joblinomyces, Buwchfawromyces-Tahromyces, Aklioshbomyces, and Khoyollomyces, respectively). The remaining 7 novel genera were not affiliated with known cultured or uncultured genera and potentially formed novel family-level lineage(s) within the Neocallimastigomycota (Fig. 2a).
Confirmation of the occurrence of such an unexpectedly large number of novel AGF genera and simultaneous recovery of full-length sequence representatives (~ 700 bp covering the D1/D2 regions) was achieved by examining the SMRT-PacBio output generated from a subset (61 samples) of the total dataset, as described above. A total of 49 of the 56 novel genera were identified in the PacBio dataset (Table S6). No additional new genera were found using this supplementary sequencing approach. Further, comparing SMRT- versus Illumina-generated tree topologies, revealed nearly identical topologies, phylogenetic distinction, and putative family-level assignments for all novel genera identified (Fig. S5, Table S7).
Stochastic and deterministic processes play an important role in shaping AGF community. Normalized stochasticity ratios (NST) calculated based on two β-diversity indices (abundance-based Bray-Curtis index, and incidence-based Jaccard index) suggested that both stochastic and deterministic processes contribute to shaping AGF community assembly (Fig. 3a-h, Table S8). However, significant differences in the relative importance of these processes were observed across datasets regardless of the β-diversity index used. Specifically, hindgut fermenters and pseudoruminants exhibited significantly lower levels of stochasticity when compared to ruminants (Fig. 3a, e, Wilcoxon adjusted p-value < 2x10-16). This was also reflected at the animal family level (Fig. 3b, f), as well as at the animal species level (Fig. 3c, g). On the other hand, NST values were highly similar for domesticated versus non-domesticated animals (Fig. 3d, h). To further quantify the contribution of specific deterministic (homogenous and heterogenous selection) and stochastic (homogenizing dispersal, dispersal limitation, and drift) processes in shaping the AGF community assembly, we used a two-step null-model-based quantitative framework that makes use of both taxonomic (RCBray) and phylogenetic (βNRI) β-diversity metrics [33, 34]. Results (Fig. 3i) confirmed a lower overall level of stochasticity in hindgut fermenters, similar to the patterns observed using NST values. More importantly, the results indicate that homogenous selection (i.e., selection of specific taxa based on distinct difference between examined niches) represents the sole (99.8%) deterministic process shaping community assembly across all datasets (Fig. 3i). Within stochastic processes, drift played the most important role in shaping community assembly (83.4% of all stochastic processes), followed by homogenizing dispersal (16.6% of all stochastic processes), with negligible contribution of dispersal limitation. As such, homogenous selection, drift and homogenizing dispersal collectively represented the absolute (> 99%) drivers of AGF community assembly, albeit with different relative importance of the three processes in datasets belonging to different animal species, family, gut type, or lifestyle (Fig. 3i).
Community structure analysis reveals a strong pattern of fungal-host phylosymbiosis.
Assessment of alpha diversity patterns indicated that gut type, animal family, animal species, but not domestication status, significantly affected alpha diversity (Fig S6). Hindgut fermenters harbored a significantly less diverse community compared to ruminants. Within ruminants, no significant differences in alpha diversity levels were observed across various families (Cervidae and Bovidae) or species (deer, goat, cattle, and sheep) (Fig. S6).
Patterns of AGF community structure were assessed using ordination plots (PCoA, NMDS, and RDA) constructed using dissimilarity matrix-based (Bray-Curtis) and phylogenetic similarity-based (unweighted and weighted Unifrac) beta diversity indices (PCoA, and NMDS), or genera abundance data (RDA). The results demonstrated that host-associated factors (gut type, animal family, animal species) play a more important role in shaping the AGF community structure when compared to domestication status, with samples broadly clustering by the animal species (Fig. 4a). PERMANOVA results demonstrated that, regardless of the beta diversity measure, all factors significantly explained diversity (F statistic p-value = 0.001), with animal species explaining the most variance (14.7–21% depending on the index used), followed by animal family (5.4–7.2%), and animal gut type (4 -5.4%). Host domestication status only explained 0.4–0.5% of variance and was not found to be significant with unweighted Unifrac (F statistic p-value = 0.143) (Fig. 4b).
Due to the inherent sensitivity of PERMANOVA to heterogeneity of variance among groups [42], we used three additional matrix comparison-based methods: multiple regression of matrices (MRM), Mantel tests for matrices correlations, and Procrustes rotation [43, 44], to confirm the role of host-related factors in shaping AGF community. Results of matrices correlation using each of the three methods, and regardless of the index used, confirmed the importance of animal host species, family, and gut type in explaining the AGF community structure (Fig. S7). Further, we permuted the MRM analysis (100 times), where one individual per animal species was randomly selected for each permutation. Permutation analysis (Fig. 4c) yielded similar results to those obtained from the entire dataset (Fig. S7b), demonstrating that the obtained results are not affected by community composition variation among hosts of the same animal species.
Collectively, our results suggest a pattern of phylosymbiosis, with closely related host species harboring similar AGF communities [45]. To confirm the significant association between the host animal and the AGF community, we employed PACo (Procrustes Application to Cophylogenetic) analysis with subsampling one individual per host species (n = 100 subsamples), and compared the distribution of PACo Procrustes residuals of the sum of squared differences within and between animal species (Fig. 5a), animal families (Fig. 5b), and animal gut types (Fig. 5c). Within each animal species, family, or gut type, the variation in PACo residuals were minimal, where 90% of the residuals within animal species ranged from 0.0056 (buffalo) to 0.029 (elephant), within animal family ranged between 0.0048 (Giraffidae) to 0.029 (Elephantidae), and within gut type ranged between 0.007 (foregut) to 0.051 (hindgut). On the other hand, PACo residuals differed significantly between datasets (Wilcoxon two-sided adjusted p-value < 0.01) when animals belonged to different families, or different gut types were examined (Fig. 5a-c, Table S9). These results indicated a strong cophylogenetic signal that was robust to intra-animal species microbiome variation.
Identifying specific genus-host associations. Global phylogenetic signal statistics (Abouheif’s Cmean, Moran’s I, and Pagel’s Lambda) identified 37 genera with significant correlations to the host phylogenetic tree (p-value < 0.05 with at least one statistic) (Table S10). In addition to global phylogenetic signal statistics, we calculated local indicator of phylogenetic association (LIPA) values for correlations between specific genera abundances and specific hosts. Of the above 37 genera, 34 showed significant associations with at least one animal host (LIPA values ≥ 0.2), with 17 showing strong associations (LIPA values ≥ 1) with specific animal species, and 10 showing strong associations (LIPA values ≥ 1) with certain animal families (Fig. 5d). A distinct pattern of strength of association was observed: All hindgut fermenters exhibited a strong association with a few AGF genera: Horses, Przewalski’s horses, and zebras with the genus Khoyollomyces, mules with the uncultured genus AL3, Orpinomyces, and Caecomyces, donkeys with Piromyces, elephants with Piromyces, Caecomyces, and Orpinomyces, rhinoceros with NY20, manatees with NY54 and Paucimyces, and mara with NY1 and Orpinomyces. Members of the animal family Equidae mostly showed association with the phylogenetically related genera Khoyollomyces and the uncultured genus AL3, suggesting a broader family-level association between both host and fungal families (Fig. 5d, Table S11). On the other hand, a much smaller number of strong host-AGF associations were observed in ruminants (5/22 animal species: NY19 in bison, RH2 in oryx, AL8 in buffalo, NY9, SK3, and Caecomyces in yak, and Neocallimastix in elk) (Fig. 5d, Table S11). However, this lack of strong LIPA signal was countered by the identification of multiple intermediate and weak cophylogenetic signals (LIPA values 0.2-1, yellow in Fig. 5d) per animal species. It therefore appears that an ensemble of genera, rather than a single genus, is mostly responsible for the phylosymbiosis signal observed in ruminants. Indeed, DPCoA ordination biplot showed a clear separation of the hindgut families Equidae, and Rhinocerotidae, from the ruminant families Bovidae, Cervidae, and Giraffidae, with the pseudoruminant family Camelidae occupying an intermediate position. This confirmed the patterns suggested by LIPA values, with 14 genera contributing to the foregut community as opposed to only 9 for hindgut fermenters (Fig. S8).
Phylogenomic and molecular clock analyses correlate fungal-host preferences to co-evolutionary dynamics. The observed patterns of fungal- animal host preferences could reflect co-evolutionary symbiosis (i.e., a deep, intimate co-evolutionary process between animal hosts and AGF taxa). Alternatively, the observed preferences could represent a post-evolutionary environmental filtering process, where prevalent differences in in-situ conditions (e.g., pH, retention time, redox potential, feed chemistry) select for adapted taxa from the environment regardless of the partners’ evolutionary history [46]. To address both possibilities, we generated new transcriptomic datasets for 20 AGF strains representing 13 genera, and combined these with 32 previously published AGF transcriptomes [35–40]. We then used the expanded dataset (52 taxa, 14 genera) to resolve the evolutionary history of various AGF genera and estimate their divergence time. In general, most genera with preference to hindgut fermenters occupied an early-diverging position in the Neocallimastigomycota tree, and a broad concordance between their estimated divergence estimate and that of their preferred host family was observed. The genus Khoyollomyces, showing preference to horses and zebras (family Equidae), represented the deepest and earliest branching Neocallimastigomycota lineage, with a divergence time estimate of 67 − 50 Mya (Fig. 6). Such estimate is in agreement with that of the Equidae ~ 56 Mya [47, 48]. As well, while the genera AL3 and NY54 are uncultured, and hence not included in the timing analysis, their well-supported association with Khoyollomyces in LSU trees (Fig. 2a and S5) strongly suggests a similar early divergent origin. This is in agreement with the early evolution of the families of their hindgut preferred hosts: mules (family Equidae) for AL3, and manatee (family Trichechidae, evolving ~ 55 Mya [48]) for NY54). Similarly, the genus Piromyces, with a preference to elephants (family Elephantidae) and donkeys (Equidae), also evolved early (55 − 41 Mya), in accordance with the divergence time estimates for families Equidae and Elephantidae (~ 55 Mya) [47, 48]. Finally, the early divergence time estimate of Paucimyces (50 − 38 Mya) is again in agreement with its preference to the hindgut family Trichechidae (Manatee) [48].
Contrasting with the basal origins of AGF genera associated with hindgut fermenters, the majority of AGF genera showing strong, intermediate, or weak association with ruminants appear to have more recent evolutionary divergence time estimates. These include many of the currently most abundant and ecologically successful genera, e.g. Orpinomyces (24–32 Mya), Neocallimastix (28–37 Mya), Anaeromyces (19–25 Mya), and Cyllamyces (20–26 Mya). Such timing is in agreement with estimates for the rapid diversification and evolution of the foregut fermenting high ruminant families Bovidae, Cervidae, and Giraffidae (18–23 Mya) [30, 49], following the establishment and enlargement of the functional rumen [30].
While these results suggest the central role played by co-evolutionary phylosymbiosis in shaping AGF community, timing estimates for a few genera showed a clear discourse between evolutionary history and current distribution patterns. Such discourse suggests a time-agnostic post-evolutionary environmental filtering process. The late-evolving genera Orpinomyces (24–32 Mya) and Caecomyces (20–26 Mya) were widely distributed and demonstrated intermediate and strong preferences not only to ruminants, but also to hindgut fermenters (Fig. 5d), suggesting their capacity to colonize hindgut-fermenting hosts, the existence of which has preceded their own evolution. Collectively, these results argue for a major role for co-evolutionary phylosymbiosis and a minor role for post-evolutionary environmental filtering in shaping the AGF community in mammals.