Can bacterial indicators of a grassy woodland restoration inform ecosystem assessment and microbiota-mediated human health?

Understanding how microbial communities change with environmental degradation and restoration may offer new insights into the understudied ecology that connects humans, microbiota, and the natural world. Immunomodulatory microbial diversity and ‘Old Friends’ are thought to be supplemented from biodiverse natural environments, yet deficient in anthropogenically disturbed or degraded environments. However, few studies have compared the microbiomes of natural vs. human-altered environments and there is little knowledge of which microbial taxa are representative of ecological restoration—i.e. the assisted recovery of degraded ecosystems typically towards a more natural, biodiverse state. Here we use novel bootstrap-style resampling of site-level soil bacterial 16S rRNA gene environmental DNA data to identify genus-level indicators of restoration from a 10-year grassy eucalypt woodland restoration chronosequence at Mt Bold, South Australia. We found two key indicator groups emerged: ‘opportunistic taxa’ that decreased in relative abundance with restoration and more stable and specialist, ‘niche-adapted taxa’ that increased. We validated these results, finding seven of the top ten opportunists and eight of the top ten niche-adapted taxa displayed consistent differential abundance patterns between human-altered vs. natural samples elsewhere across Australia. Extending this, we propose a two-dimensional mapping for ecosystem condition based on the proportions of these divergent indicator groups. We also show that restoring a more biodiverse ecosystem at Mt Bold has increased the potentially immuneboosting environmental microbial diversity. Furthermore, environmental opportunists including the pathogencontaining genera Bacillus, Clostridium, Enterobacter, Legionella and Pseudomonas associated with disturbed ecosystems. Our approach is generalizable with potential to inform DNA-based methods for ecosystem assessment and help target environmental interventions that may promote microbiota-mediated human health gains.


Introduction
People are losing contact with nature due to rapid urbanization (Rydin et al., 2012), biodiversity loss (Mace et al., 2018), and environmental degradation (Navarro et al., 2017). At the same time, rates of immune-related disease are increasing (von Hertzen et al., 2011). Many immunologists and medical researchers now believe these trends are linked (WHO and SCBD, 2015). Microbial diversity and perhaps key species (microbial 'Old Friends') from biodiverse and natural environments are thought to play a critical beneficial role in the development and maintenance of human immune fitness (Rook, 2013;von Hertzen et al., 2011). Environmental microbiota might supplement our own protective human microbiota (e.g. in the skin, airway, gut), participate in immune signalling (triggering defence or tolerance responses), and help build immune memory (Flandroy et al., 2018;Rook, 2013). These interactions underpin many aspects of our health, and can help regulate both infectious and non-infectious disease (Ichinohe et al., 2011;Ottman et al., 2018;Stein et al., 2016).
Meanwhile, human societies unwittingly affect landscape-scale drivers of environmental microbiota. For example, land use and management, biomass production, plant species and diversity, and soil quality each contribute to the development of characteristic plant and soil microbiota (Bulgarelli et al., 2013;Delgado-Baquerizo et al., 2017;Delgado-Baquerizo et al., 2018;Gellie et al., 2017a;Turner et al., T 2013). Differing land use and management can also contribute distinct microbial signatures to airborne microbiota (i.e. aerobiology) dispersed among exposed neighbouring human populations (Bowers et al., 2013;Bowers et al., 2011;Mhuireach et al., 2016). With estimates of up to 47% of the global ice-free land area considered degraded and up to 76% impacted by human activities (Navarro et al., 2017), there is considerable scope for the anthropogenic modification of environmental microbiota to represent a potentially important but largely unappreciated feedback mechanism, linking ecosystems and human health. Equally, restoring biodiverse ecosystems where people live may offer promise to improve immune fitness and human health (Mills et al., 2017;Robinson et al., 2018). Ecological restoration-i.e. the assisted recovery of an ecosystem towards an ecological reference state (SER International Science and Policy Working Group, 2004)-also has key objectives to support native biodiversity, ecosystem functions, genetic resources, resilience against environmental stressors, productive potential and sustainable use.
However, large knowledge gaps remain concerning the immunomodulatory potential of environmental microbiomes. There are few studies that compare the microbiomes (i.e. genetic material of microbiota) of natural vs. human-altered outdoor environments from a human health perspective. Microbiomes of indoor only environments have received greater attention (Gilbert and Stephens, 2018). Suspected Old Friends include environmental microorganisms that we have coevolved with, and for which we needed to develop tolerance (e.g. Salmonella, helminths, Mycobacterium vaccae; Rook, 2012). However the membership and modes of action of microbial Old Friends remain largely unknown (Rook, 2013), in part due to the uncharacterized 'microbial dark matter' that dominates environmental microbiomes (Lloyd et al., 2018). We do not yet know if it is microbial diversity alone, microbial diversity with Old Friends, or possibly some other functional component(s) of environmental microbiomes that may provide protective immune-training or other physiological benefits. However, if we can identify indicator taxa for natural and biodiverse environments, these might represent possible new candidate taxa to investigate further for immunomodulatory potential and to otherwise help inform the assessment, design and restoration of new nature-based public health interventions.
To progress, there is a need to build the knowledge base of microbiomes that associate with different environment types. Soil environmental DNA (eDNA) may provide a useful medium for tracking changes in ecosystem condition (Gellie et al., 2017a;Yan et al., 2018) because soils can retain a biological imprint from recent land use and management (Janzen, 2016). With the reducing costs of DNA sequencing technology, soil bacterial 16S rRNA marker gene survey data are increasingly used to help characterize soil microbiota. Also, soil-associated microbes often represent a significant component of the aerobiology derived from environments (Polymenakou, 2012), which provides ambient biological connectivity to exposed human populations. Soils have also been highlighted elsewhere due to their often high microbial diversity and immunomodulatory potential (Liddicoat et al., 2018;von Hertzen and Haahtela, 2006).
The task of exploring potential human exposures to microbiota from different characteristic environments suggests the need for microbiota profiling methods that reflect a human-scale or site-level exposure. To estimate the potential accumulative human-microbial exposure from environmental samples alone, without human test subjects, it may be necessary to consider multiple samples for each site. Such an approach would provide a broader picture of the microbial diversity, composition, and key taxa within each particular environment type. For example, we might expect the overall alpha diversity of microbiota encountered by a person interacting with a biodiverse site (represented by a number of samples) to be greater than the maximum alpha diversity of individual microbiota samples considered in isolation, due to natural heterogeneity and particular taxa not occurring everywhere. However, customary approaches to microbiome data analysis contain unexplored uncertainty and have potential to overlook less abundant taxa. It is common to rarefy marker gene survey data to a minimum sequence read depth in order to normalize sampling effort (Weiss et al., 2017), however this approach has been criticized for discarding valuable information about microbial communities (McMurdie and Holmes, 2014). These issues point to the opportunity for employing general purpose bootstrap resampling techniques on merged samples from across a site, combined with rarefying to normalize sampling effort, as a means to preserve survey data and obtain a deeper understanding of site-level (multiple sample) microbiota characteristics. Through bootstrap-style resampling we can also better understand the uncertainty or distribution of equally likely outcomes for the analyses undertaken.
Here we sought to answer the following research questions: (1) Can we employ innovative bootstrap-style resampling methods for site-level analysis of microbiome survey data to improve understanding of different environment types, potential human-environmental microbiome exposures, and their uncertainty? (2) What are the key bacterial genera (i.e. indicator taxa) that show directional trends where restoration has been undertaken, and are there generalizable patterns in contrasting human-altered vs. natural land uses at a continental scale? (3) Are there trends in human-associated genera (e.g. commensals, potential pathogens, potential Old Friends) with restoration? Our findings highlight characteristics of the trending microbial taxa from a localized restoration study site that display generalizability to microbiota changes between human-altered and natural samples elsewhere across Australia.

Overview
We first consider existing soil microbiome data from a historic chronosequence of restoration of a Eucalyptus leucoxylon-dominated grassy woodland vegetation community at Mt Bold, South Australia (Gellie et al., 2017a(Gellie et al., , 2017b(Gellie et al., , 2017c. Grassy woodland communities typically comprise groundcover of grasses, herbs and shrubs between scattered medium to large trees. Bacterial 16S rRNA marker gene survey data from sites spanning a gradient from cleared land, various revegetation ages (6, 7, 8, 10 years), and three reference patches of native vegetation (Remnants A, B, C) provide a rare dataset to investigate microbial signatures of ecological restoration. We extend the broad community and phylum-level trends described in Gellie et al. (2017a) (Web Appendix, Figs. S1-S2) to highlight genus-level indicators. To identify key trending taxa with restoration, and their uncertainty, we propose an innovative merged-sample bootstrap resampling framework (described below) based on triplicate 16S microbiome samples that characterize each site, as an alternative to conventional one-off rarefying of individual samples. We then looked for support for the generalizability of patterns identified from Mt Bold within publiclyavailable Australia-wide soil microbiome data from the Biomes of Australian Soil Environments (BASE) dataset (https://data. bioplatforms.com/organization/bpa-base; Bissett et al., 2016).

Study data
Methods for sample collection, soil chemical and physical analysis, eDNA extraction, sequencing of the bacterial 16S rRNA gene, and bioinformatic analyses to derive operational taxonomic unit (OTU) abundance tables and taxonomic classifications are described in Gellie et al. (2017a) and Bissett et al. (2016). Briefly, OTUs were based on clustered sequences of ≥97% similarity and derived separately for the Mt Bold and BASE datasets. In both cases, open reference OTU picking and sequence abundance assignment workflows were used, with taxonomy mapped to the Greengenes (13-5) reference database, as described in Bissett et al. (2016).
From Mt Bold, we used the three replicates of 16S data collected for each restoration treatment (or sample type). We focussed our analyses on surface soil (0-10 cm; n = 24), although subsurface (20-30 cm; n = 24) data are also included in Web Appendix plots (described later) to provide confidence in our observed trends. From BASE, after excluding the Mt Bold samples, we included surface soil 16S data using the following selection steps. We excluded BASE data from off-shore islands and external territories, and filtered samples with extreme, outlying soil conditions to avoid undue influence from data associated with atypical and limiting conditions for soil biology. Specifically, we excluded samples that were strongly acid (pH H2O < 4.5), strongly alkaline (pH H2O > 9), saline (electrical conductivity > 2 dS/m), and very high clay content (> 50%). From available 16S data, we assigned samples with the following land uses to classes of either 'natural' (including national park, nature conservation, strict nature reserves, wilderness area), or 'human-altered' (including cereals-wheat, cotton, irrigated seasonal horticulture, pasture legume/grass mixtures, rehabilitation -i.e. once degraded land that is being restored towards a reference state, sown grasses, sugar, tree fruits-apple). These assignments were guided by interpreting Australian land use classification guidelines (ABARES, 2016). Due to ambiguity, we excluded land uses of grazing native vegetation, native/exotic pasture mosaic, natural feature protection, other conserved area, protected landscape, reserve, and residual native cover. Lastly, we did not wish to compare human-altered and natural samples separated by large geographic distances, so we used nearest neighbour calculations based on latitude and longitude to exclude samples with > 5 degrees of geographic separation from their nearest complementary sample type (i.e. each natural sample is ≤5 degrees from a human-altered sample, and vice versa), to arrive at the final human-altered (n = 78) and natural (n = 139) samples (locations shown in Web Appendix, Fig. S3). The nearest neighbour filtering eliminated the sample type of wilderness area.
Within the respective Mt Bold and Australia-wide BASE microbiome data we only used taxa assigned as Bacteria, and excluded all taxa assigned as chloroplast or mitochondria and any taxa not assigned at the bacterial phylum level. The two datasets were filtered separately to remove taxa with < 100 sequence reads across all samples (Gellie et al., 2017a), or that did not occur in at least two samples.

Data analysis
We used R software (R Core Team, 2018) for the majority of analyses, with purpose-built scripts (see Web Appendix) employing the microbiome data analysis framework of the R phyloseq package (McMurdie and Holmes, 2013). Phyloseq uses microbiome data objects that facilitate linked analysis of OTU abundance, taxonomy and sample contextual data. As described below, we used both rarefied and nonrarefied OTU abundance data, reflecting different input and sample normalization requirements for particular analyses. Briefly, rarefied OTU abundance data were used in visualising beta diversity and permutational multivariate analysis of variance (PERMANOVA; Anderson, 2017) testing of differences in microbiota groups, while non-rarefied OTU abundances were used when comparing OTU relative abundance (%) data and for differential abundance testing using the R DESeq2 package (Love et al., 2014). We implemented the new merged-sample bootstrap resampling (described below) to examine mean responses and uncertainty in OTU-and functional-alpha diversity, and associations between OTU relative abundance and restoration at Mt Bold. We visualized the sequence depth of samples using rarefaction curves (Web Appendix, Fig. S4). OTU alpha diversity in each sample was estimated using the exponential transform of Shannon Index values to derive the effective number of OTUs (Jost, 2006). We visualized differences between sample microbiota (beta diversity) using non-metric multidimensional scaling (NMDS) ordination of Bray-Curtis distances based on rarefied OTU abundances, at the minimum sequence read depth of the samples concerned. PERMANOVA was used to examine the statistical significance of compositional differences between rarefied OTU abundances of sample groups, implemented using the adonis() function, followed by the betadisper() function to test for homogeneity of group dispersions, where both functions are from the R vegan package (Oksanen et al., 2018). Distance-based redundancy analysis was used in preliminary exploration of relationships between microbiota and soil conditions at Mt Bold. Further analyses are described in more detail below.

Merged-sample bootstrap resampling
To preserve microbiome survey data while examining site-level (i.e. multiple sample) characteristics and their uncertainty, we implemented the following resampling framework (also see Fig. 1): 1. Prepare OTU abundance data in non-rarefied form with varying sequence depths across multiple sites. Here the full OTU table is expressed in a single phyloseq microbiome data object with the multiple samples per site given a respective site label in the phyloseq sample contextual data table. In our case, the Mt Bold data have triplicate samples at each site (i.e. treatment or sample type). 2. Set seed for pseudo-random number generator. 3. Rarefy all samples to an even sampling depth; equal to the minimum sequence depth across all samples, using sampling with replacement. 4. With the rarefied data from step 3, merge samples (i.e. group by site label) and sum taxa counts by site. Transfer results to a new sitelevel phyloseq microbiome data object. 5. Set seed for pseudo-random number generator. 6. Rarefy the merged data from step 4 to an even sampling depth, using the same minimum sequence depth value from step 3, again using sampling with replacement. 7. Evaluate the measure of interest based on the merged and rarefied site-level data from step 6 (e.g. alpha diversity, evaluate taxa relative abundance and correlation with revegetation age) and store results as an element of an output list. 8. Repeat steps 2 to 7, up to the number of predefined iterations (e.g. B = 100).
Sampling with replacement was chosen for consistency with general purpose bootstrap methods, and particularly in step 3 to avoid generating repeated data for the sample with the minimum sequence size. We did, however, trial combinations of sampling with-and withoutreplacement and achieved similar results (Web Appendix, Fig. S5). We note that two rounds of rarefying (as above) achieved normalization of sampling effort that contributes to site-level data and enables the merged-sample bootstrap density distributions (for equally likely sitelevel outcomes) to be plotted on the same scale as data derived from one-off rarefied samples.

Assessing trends: Mt Bold restoration
We examined trends in taxa relative abundance with restoration by considering metrics for effect size and significance. In a strict sense, the restoration chronosequence represents an ordinal independent variable. Using ordinal regression analysis of variance (AOV) (Gertheiss, 2015) for each taxon, we tested the null hypothesis of no ordinal trend by revegetation age, assuming ordinal values (cleared = 1; 6 years = 2; 7 years = 3; 8 years = 4; 10 years = 5; Remnants A, B, C = 6). However, treating the data in this way does not provide information on effect size. So, we assessed effect size or strength of relationship (Shinichi and Innes, 2007) by assuming pseudo-continuous values for the missing revegetation ages (cleared = 0 years; Remnants A, B, C = 20 years) to calculate the Pearson correlation coefficient with each distribution of taxon relative abundance. These metrics were calculated using the merged-sample bootstrap resampling framework, as above. The top trending taxa were identified based on the magnitude and sign (positive = increasing, negative = decreasing) of the mean correlation coefficient of taxon OTU relative abundance with revegetation age from the merged-sample bootstrap results.
To minimize spurious results, taxa were excluded if they did not appear in at least 30% of the resamples, or if they were not represented across at least three different sample types. Significant P-values from the ordinal regression AOV were adjusted using the Benjamini-Hochberg method to control for false discovery rate at an alpha level of 0.05. For the trend analyses, we focussed on genera, except for unclassified genera which were aggregated to the next available classified taxonomic group. We also defined taxa that were missing in cleared or remnant samples using a threshold of 95% absence from the merged-sample bootstrap results. We tested for differences in the distribution of OTU alpha diversity, derived from merged-sample bootstrap resampling, among Mt Bold samples grouped by revegetation age using the 95% confidence interval (CI) for the mean difference between groups, based on the bootstrap results.

Analysing community shifts: Mt Bold vs. Australia-wide
DESeq2 differential abundance testing was used to estimate fold changes in OTU abundance between the Australia-wide human-altered and natural groups. The DESeq2 algorithm internally adjusts for testing 1. Prepare OTU abundance data in non-rarefied form. Replicate samples are annotated to facilitate merging of groups, as below, e.g. samples X1, X2, X3 are from site X.
3. Rarefy to minimum sequence depth (D min ), sampling with replacement.
2. Set seed for pseudo-random number generator. 7. Evaluate measure of interest (e.g. alpha diversity) and save results.

Repeat
s p e t s

Analyze overall results
Merged-sample bootstrap resampling C. Liddicoat, et al. Environment International 129 (2019) 105-117 multiple hypotheses by applying the Benjamini-Hochberg method to control for false discoveries. To test whether microbiota shifts observed at Mt Bold might be generalizable, we examined the increasing and decreasing taxa most correlated with restoration to see if similar patterns of differential abundance were found across the Australia-wide samples. We plotted correlation with revegetation age against the log2fold-change results from differential abundance testing and highlighted two quadrants where similar patterns were evident. In the first quadrant, defined by a positive correlation with revegetation age and positive log2-fold-change between human-altered and natural groups, taxa are associated with a shift towards more natural or restored ecosystems. Conversely, in the second quadrant (negative correlation and negative log2-fold-change), taxa are associated with more altered or unnatural ecosystems. Plots of taxa correlation with revegetation age (Mt Bold) against log2-fold-change from human-altered to natural for Australiawide data were developed for the top-trending taxa identified from Mt Bold and for selected human-associated bacterial genera identified in the literature (Baumgardner, 2012;Berg et al., 2005;Bultman et al., 2013;Jeffery and van der Putten, 2011).
To further validate results for the top-trending taxa from Mt Bold in the Australia-wide samples, we compared taxon relative abundances between human-altered and natural samples and performed one-sided Wilcoxon rank-sum tests to detect significant differences between the groups in the direction predicted by the respective trend seen at Mt Bold.

Functional assignments and case study comparisons
To explore functional similarities and differences between sample types we used PICRUSt v1.1.1, which infers functional gene abundance from 16S data (Langille et al., 2013), as implemented in the R themetagenomics package (https://github.com/eesi/themetagenomics; Woloszynek et al., 2017). Greengenes sequence identifiers were needed to make PICRUSt functional assignments, so we used VSEARCH (Rognes et al., 2016) to seek matches for our study OTUs (represented by fastaformat sequences) with the closest available Greengenes (13-5) fasta sequences (see Web Appendix, Methods S1). Greengenes sequences were substituted for study OTUs for use in PICRUSt only where ≥97% sequence similarity was achieved and where the Nearest Sequenced Taxon Index (NSTI) was ≤0.15. Beyond this NSTI threshold, PICRUSt user documentation suggests functional predictions will be of low quality (http://picrust.github.io/picrust/tutorials/quality_control. html). We recorded the loss of functional representation for our study OTUs where they could not be aligned to representative Greengenes sequences, and then where function could not be reliably inferred due to large phylogenetic distance (NSTI > 0.15) between representative sequences and available functionally described sequences in the PI-CRUSt reference genome database. When using PICRUSt we followed the developers' recommendation to employ 16S copy number correction, which aims to account for different gene copy numbers in different organisms. We expressed functional data using the Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology option in PICRUSt. KEGG orthologous gene data, or orthologs, refer to equivalent genes and gene products (e.g. RNA and proteins) that arise in different organisms, and are organized under categories including: metabolism, genetic information processing, environmental information processing, cellular processes and human diseases (Kanehisa et al., 2012).
For comparison to the Mt Bold sites that are each represented by triplicate samples, functional data were prepared for three randomly selected samples from all the Australia-wide natural and human-altered sites that contained at least three samples. Functional alpha diversity was estimated using both merged-sample bootstrap resampling and individual rarefied sample data (based on the minimum sequence depth across Mt Bold and Australia-wide samples). We also visualized bacterial functional profiles corresponding to rarefied OTU abundance data using a heat map. Additional subsamples, representing the top 30 increasing taxa (isolated from the 10 year samples), and the top 30 decreasing taxa (isolated from the cleared samples) were included to characterize functional trends associated with the Mt Bold restoration. KEGG function relative abundance data from the three representative samples per site for all Mt Bold and Australia-wide samples were averaged to indicate site-level functional profiles. Lastly, for display in the functional heat map, we ran row (i.e. sample-wise) and column (i.e. function-wise) normalization by subtracting the mean from the observed values and then dividing by the standard deviation.

Shifting soil microbiota with restoration vs. Australia-wide patterns
Soil bacterial communities from the Mt Bold restoration chronosequence displayed compositional shifts that were consistent with a transition between the broader groups of human-altered and natural microbiota from elsewhere in Australia (Fig. 2). The Australia-wide, human-altered and natural rarefied OTU abundance data (sequence depth 6377) showed significantly different compositional centroids (PERMANOVA based on the Bray-Curtis distance matrix: F = 22.7, P = 0.001), not due to differences in beta dispersions of the two groups (F = 3.7, P = 0.062).
We identified key trending taxa including bacterial genera and unclassified groups associated with restoration at Mt Bold (Web Appendix, Fig. S6, Table S1). To provide visual confirmation of trending taxa identified using the merged-sample bootstrap resampling framework and to compare with conventional rarefied data, we plotted OTU relative abundance against revegetation age for the top 10 increasing and top 10 decreasing taxa associated with restoration at Mt Bold (Web Appendix, Figs. S8-S9). We did not observe patterns that might indicate undue bias from soil conditions on restoration treatments or microbiome samples at Mt Bold (Web Appendix, Fig. S10). Eight out of the top 10 increasing taxa associated with restoration at Mt Bold (comprising DA101,Candidatus Xiphinematobacter,Bradyrhizobium,Candidatus Solibacter,Candidatus Koribacter,unclassified (family: Rhodospirillaceae), Rhodopila, and unclassified (family: [Leptospirillaceae])), and seven out of the top 10 decreasing taxa (comprising Bacillus, unclassified (order: Ellin5290), Sporosarcina, unclassified (family: Ellin5301), Ammoniphilus, Flavisolibacter, unclassified (class: C0119)), showed consistent trends when comparing OTU differential abundance from the human-altered to natural samples elsewhere in Australia (Table 1; Web Appendix, Figs. S11-S12). The Australia-wide human-altered and natural samples separate into two distinguishable clusters when mapped in two dimensions corresponding to the cumulative OTU relative abundance of the top 10 increasing and top 10 decreasing taxa that trend with restoration at Mt Bold ( Fig. 3; PERMANOVA on Euclidean distances: F = 119.33, P = 0.001; although dispersions of the two groups were different: F = 9.14, P = 0.003). Eight taxa were missing from cleared samples, increased in OTU relative abundance with restoration, and were found in remnants (Web Appendix, Table S2); while 14 taxa were found in cleared samples, decreased with restoration, and were missing in remnants at Mt Bold (Web Appendix, Table S3).
A range of taxa displayed outlying differential abundance in the Australia-wide human-altered vs. natural samples (Web Appendix, Figs. S13-S14; the top 30 increasing and top 30 decreasing taxa based on fold change between human-altered to natural samples are listed in Web Appendix, Table S4). However, the top-trending taxa with restoration at Mt Bold show consistent differential abundance patterns in the Australia-wide data (Web Appendix, Fig. S13, Fig. S15). This association is strongest for the top few taxa most correlated with revegetation age at Mt Bold, and declined as more taxa were considered (Web Appendix, Fig. S15).

Patterns in human-associated bacteria
Human-associated genera that increased with restoration at Mt Bold and had higher differential abundance in natural samples included Actinomadura, Burkholderia, and Mycobacterium. Genera that decreased with restoration and had higher differential abundance in human-altered samples included Achromobacter, Bacillus,Bacteroides,Chryseobacterium,Clostridium,Enterobacter,Flavobacterium,Legionella,Pseudomonas,Rhodococcus,Sphingobacterium and Streptomyces (Fig. 4; Web Appendix, Table S5, Figs. S16-S17).

Alpha diversity, functional diversity, and functional clustering
There was no apparent trend in OTU alpha diversity with restoration at Mt Bold when only considering the rarefied OTU abundance data. However, a different signal emerged when using site-level density distributions from the merged-sample bootstrap results (Web Appendix, Fig. S18a). We observed a significant rising trend in OTU alpha diversity with restoration at Mt Bold as determined from 95% CIs of differences between mean bootstrap results across groupings of revegetation age (Fig. 5). Estimated functional alpha diversity was, however, highest for cleared samples and remained steady or declined with revegetation age, although only 20-30% of OTUs were represented when inferring functions (Web Appendix, Fig. S18b-c). We observed no apparent trends in either OTU alpha diversity or functional alpha diversity for the Australia-wide human-altered vs. natural case study samples considering both rarefied OTU abundance data and the merged-sample bootstrap density plots (Web Appendix, Fig. S19); nor was there any relationship apparent between OTU alpha diversity and functional alpha diversity in the Australia-wide data. Only 10-30% of OTUs were represented when inferring functions (Web Appendix, Fig.  S19c).
Functional profiles for the top 30 increasing taxa with restoration at Mt Bold clustered together with the majority of other Mt Bold samples, except for the cleared samples whose mean functional profile was most different to the other samples considered (Fig. 6). The top 30 decreasing taxa with restoration at Mt Bold clustered with human-altered samples from annual croplands of cotton, sugar and wheat; although this cluster also contained samples from two national parks. The majority of four national parks clustered together with human-altered samples from perennial land uses including horticulture (apples) and pastures.

Bacterial indicators of restoration
Our results broadly indicate a core microbiome that shifts with ecological restoration. We demonstrate support for the relative abundance patterns in the majority of top 10 increasing and top 10 decreasing, and human-associated taxa identified from Mt Bold, by showing consistent differential abundance patterns within human-altered and natural sites across Australia. We might expect restoration to bring increased stability, perenniality and diversity of aboveground plants, and increased rhizosphere interactions involving different plant species and combinations of root exudates. Soils under restoration would also accumulate organic matter including plant debris and fungal hyphae networks due to reduced disturbance. With increasing complexity and stability of microbial habitats and feedstocks (Adams and Wall, 2000), it is unsurprising that soil microbiota would shift, in comparison to highly disturbed or regularly cleared land. In disturbed environments, where microbial habitats and feedstocks may undergo large fluctuations and possibly collapse, there is potential for fastgrowing, adaptable environmental opportunists to thrive as vacant ecological niches arise. Broad shifts in soil microbiota composition with restoration have been previously identified (Gellie et al., 2017a), however, here we reveal in fine taxonomic resolution the key genera that are increasing and decreasing in these shifting communities.
The eight taxa we observed from the top 10 increasing with restoration at Mt Bold, that also showed higher differential abundance in natural samples Australia-wide, include K-selected organisms associated with stable, late-successional ecosystems (MacArthur and Wilson, 1967) and organisms with particular resource requirements. Fig. 2. Surface soil bacterial communities from the Mt Bold restoration chronosequence (large filled circles of red, greens, and blues; n = 24) traverse the broader groupings of natural (cyan, n = 139) and human-altered (pale red, n = 78) bacterial communities from elsewhere in Australia, as stylized in the inset. Microbiota visualization is based on rarefied (sequence depth = 6377) bacterial 16S OTU abundance of taxa aggregated at the genus or next available classified level, using NMDS ordination of Bray-Curtis distances. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 1
Summary statistics for the top 10 increasing and top 10 decreasing taxa associated with restoration at Mt Bold, with corresponding differential abundance data in human-altered samples (n = 78) vs.  DA101 is one of the most abundant bacteria found in soil, particularly in grasslands (Brewer et al., 2016). Candidatus Xiphinematobacter species are maternally-transmitted endosymbionts of Xiphinema americanum-group nematodes, which have cosmopolitan distribution (Archidona-Yuste et al., 2016) and field data suggest these nematodes are K-selected organisms with a long lifespan and low reproduction rate (Jaffee et al., 1987). Bradyrhizobium is another ubiquitous and abundant genus of bacteria that dominates forest soils (VanInsberghe et al., 2015) and tends not to overlap where DA101 dominates (Brewer et al., 2016). Genomic investigation of Candidatus Solibacter and Candidatus Koribacter suggest that these organisms are able to decompose complex substrates such as plant litter in soils and are slow-growing with a Kselected lifestyle (Ward et al., 2009). The family Rhodospirillaceae contains genera mostly with a preferred photoheterotrophic growth mode under anoxic conditions in light, or they grow chemotrophically in the dark (Garrity et al., 2005). Rhodopila is an acid-loving member of the Rhodospirillaceae family. Photoheterotrophs cannot use carbon dioxide as their sole carbon source, so they also use organic compounds from the environment. The character of unclassified taxa from the family Leptospirillaceae is uncertain; this family appears to be renamed as Nitrospiraceae, and contains iron-oxidising Leptospirillum (Hippe, 2000) and nitrate-oxidising Nitrospira (Watson et al., 1986). The seven taxa from the top 10 decreasing with restoration at Mt Bold, that also showed higher differential abundance in human-altered samples Australia-wide, include fast-growing and opportunistic species. Bacillus species are ubiquitous in nature, are often fast-growing, live in aerobic or anaerobic conditions, and are capable of forming resistant endospores to survive stressful environmental conditions for long periods of time. They include medically significant B. anthracis (anthrax) and B. cereus associated with food spoilage and poisoning, as well as many normally harmless species. Unclassified taxa in the order Fig. 3. Two-dimensional mapping of ecosystem condition based on soil bacterial 16S data. Natural (cyan crosses, n = 139) and human-altered (pale red squares, n = 78) samples from across Australia form distinguishable groups when characterized by the cumulative OTU relative abundance of the top 10 increasing (y-axis) and top 10 decreasing (x-axis) bacterial taxa that trend with restoration at Mt Bold. Kernel density contours emphasize the weight of samples for each group. Mean site-level data for Mt Bold (annotated circles; n = 8) indicate a shift in key microbiota from disturbance-adapted opportunistic taxa (lower left of plot) to mature niche-adapted taxa (upper right of plot). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) Fig. 4. Mean correlation of human-associated taxa OTU relative abundance with revegetation age from Mt Bold (x-axis) plotted against the mean differential OTU abundance between human-altered and natural samples elsewhere in Australia. The quadrant labelled 'Natural' corresponds to taxa that increase with restoration and have higher differential abundance in natural sites, while the quadrant labelled 'Disturbed' corresponds to taxa that decrease with restoration and have higher differential abundance in human-altered sites. Ellin5290, and in the family Ellin5301, are all from the phylum Gemmatimonadetes which display cosmopolitan distribution and versatile, generalist ecological strategies, adapting to a wide variety of environments including low moisture conditions (DeBruyn et al., 2011). With similar features to Bacillus, Sporosarcina are facultatively anaerobic or strictly aerobic heterotrophs, capable of forming endospores that can persist in the environment (Yoon et al., 2001). Ammoniphilus species are aerobic, spore-forming bacteria, dedicated to the use of oxalate (a common constituent of fresh plant tissues; Libert and Franceschi, 1987) for carbon and energy (De Vos et al., 2009). Flavisolibacter species are aerobic, non-motile, non-spore-forming rods, which appear to be environmental opportunists, having been isolated from various soils, fresh water, and a biofilm coating parts of an automotive air conditioning system (Kim et al., 2018). No information was available on the character of unclassified taxa from the class C0119 (phylum Chloroflexi).
We suggest that microbial indicators of ecosystem condition should be detectable across a range of environments and provide a meaningful association with a limited range of organisms. Also, we expect that the choice of taxonomic rank for such indicators will represent a trade-off between sensitivity to detect effects, and specificity to particular organisms. Our bacterial 16S data are appropriate for genus-level observations (Fox et al., 1992) and reflect a common and cost-effective mode of environmental microbiome survey data. Using correlation coefficients as a measure of effect size (so that our results emphasize the tightness of relationship with restoration) enabled us to identify trends in abundant and rare taxa alike. This approach acknowledges that rare taxa can play important ecological roles (Hol et al., 2010). Our twodimensional mapping of soil microbiome data, with axes highlighting proportions of opportunistic vs. mature niche-adapted taxa, may have use for ecosystem monitoring and management, as discussed further below.

Patterns in alpha diversity and function
We observed an increasing trend in alpha diversity with restoration at Mt Bold facilitated by the merged-sample bootstrap resampling approach. However, microbiome data relating to restoration treatments were not available for any other samples considered in our study. Considering just the one-off microbiome survey data for natural vs. human-altered samples, we found no generalizable patterns in OTU alpha diversity or functional alpha diversity between the two groups. In other words, the variability in OTU and functional alpha diversity was largely driven by site-specific factors rather than whether the samples were classed as natural or human-altered. On the other hand, coherent patterns in functional profiles did emerge. The top 30 increasing taxa clustered with the majority of Mt Bold samples, perhaps reflecting local adaptation. Perennial agriculture and the majority of natural samples clustered together, possibly reflecting generally more stable and mature soil ecosystems. Samples from annual crops (i.e. the most disturbed soils) also clustered together. The functional data also suggest not all natural samples behave the same, as two national park samples clustered with annual crop samples. Fig. 6. Inferred microbial function relative abundance based on site (row) and function (column) z-scores, derived from rarefied OTU abundance data (sequence depth = 16,704). The method used to infer gene functions could only represent 20-30% of sequences from the rarefied sample microbiota data. Row groupings highlight sites with similar functional profiles based on second-level branching of the row dendrogram. Row side colors indicate sample types. This plot illustrates similarities in functional profiles-i.e. columns display variation in 5434 orthologous genes-however it is beyond the scope of this study to examine these functions in detail. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Implications for ecosystem restoration and management
We propose a two-dimensional index of ecosystem condition based on soil bacterial 16S survey data, where soil eDNA samples might be used to map study sites into zones ranging from degraded ecosystems, through intermediate stages of restoration, to mature restored and reference ecosystems. Such an index could have value in tracking the progress of restoration activities, assessing the condition of land, and prioritising areas for restoration investments. The axes of Fig. 3 reflect shifting proportions of often fast-growing, environmental generalists and opportunists (x-axis) vs. more stable niche-holders adapted to mature and reference ecosystems (y-axis). We offer a first-cut approach relevant to our temperate grassy woodland ecosystem which may be improved by considering trending taxa from a wider diversity of environments. For example, similar microbiome datasets could be collated from restoration sites (e.g. chronosequences like Mt Bold) across more dissimilar environments with the objective to build a universal indicator set. Alternatively, locally representative trajectories of restoration or land improvement, for a given soil type and land use, could provide a localized frame of reference for bacterial shifts in particular ecosystems (e.g. low to medium to high sustainable production agricultural land), against which neighbouring land could be evaluated.
Interestingly, we observed some natural environments can have characteristics similar to disturbed and developing ecosystems, while a minority of human-altered environments can resemble mature reference ecosystems. Presumably, human-altered environments that are close to mimicking natural systems would include stable, perennial land cover, as suggested by the functional clustering of many natural samples with perennial horticulture and pastures. On the other hand, natural environments with low in-situ build-up of organic matter (e.g. due to sandy or infertile soils) and therefore stunted soil biological activity and buffering capacity, might cause the soil microbiota to resemble a developing or degraded ecosystem. We acknowledge that soil microbiota may not always be representative of current land use and management. It is possible for soils to have legacy influences, and soil microbiota may undergo multi-decadal shifts as microbial habitats and feedstocks move towards new equilibria following land use change (Bell and Lawrence, 2009). Issues such as grazing pressure and isolation can also cause native vegetation-based ecosystems to not necessarily function in a natural or reference state.
Biological indicators are used elsewhere in ecosystem monitoring (Stanford and Spacie, 1994) due to the ability of organisms to reflect cumulative or integrative responses across physical and chemical parameters that may undergo short-term fluctuations. That is, physicochemical monitoring often cannot fully represent the condition of ecosystems without expensive multi-parameter, fine temporal resolution monitoring. Additionally, soil inoculation can be a powerful tool to help facilitate restoration of disturbed terrestrial ecosystems and steer plant community development (Wubs et al., 2016). Therefore identifying beneficial microbial signatures may help in targeting soil microbiota for inoculations or recognizing sites with inherent remediation potential.

Implications for microbiota-mediated human health
Our results suggest that there is potential for disturbed soils to harbour environmental opportunists with potential pathogenic character. We found a number of often fast-growing, environmental opportunists that were associated with human-altered soil samples and inversely correlated with restoration. Notable environmental taxa with importance for human health in this category included the genera Bacillus, Clostridium, Enterobacter, Legionella and Pseudomonas which include opportunistic pathogens, often associated with nosocomial infections. Through direct contact or wind-blown aerobiology these soilborne bacteria may impact susceptible individuals, neighbouring buildings, and even highly sterilized (i.e. ecologically vacant) health care facilities such as hospitals. Aerobiological access is likely via high traffic areas for patients, staff and visitors. Such a prospect is of particular concern where environmental and/or exposed human microbiomes suffer declining microbial diversity, thus increasing the potential for pathogenic microbes from the environment to overpower the defence mechanisms of resident environmental and/or exposed human microbiota. That is, pathogenicity should be considered in the context of the host-microbe system, as discussed later. To illustrate this possibility, environmental sources were suspected as the most likely origin in a recent Escherichia coli O55:H7 outbreak in Dorset England (Public Health England, 2017). Strains of E. coli can become naturalized to live in soils (Ishii and Sadowsky, 2008). It is also concerning that environmental opportunists such as Klebsiella pneumoniae may be implicated in spreading anti-microbial resistance genes to clinically-important pathogens (Wyres and Holt, 2018). However, it may be possible to reduce the threat from opportunistic potential environmental pathogens through ecological restoration and preserving and enhancing natural and biodiverse vegetation in urban areas. It may also be possible to reduce the risk of hospital-acquired infections, which may in part be due to airborne opportunistic microbes from environmental sources. Although not all natural areas will be the same, nor have the same immune-priming attributes, we expect that the maintenance of naturebased microbial diversity, together with more slow-growing microbiota adapted to restored and reference ecosystems, will have population health benefits through building immune fitness and suppressing opportunistic pathogens.
Our analyses show that presumptions should not be made about the level of microbial diversity, nor functional diversity, for natural sites in comparison to human-altered sites, nor generalizations about sites of the same class, based on single-time-point eDNA surveys. Variation in 16S OTU alpha diversity and functional alpha diversity appear to be driven by site-specific factors, overshadowing any effect linked to the natural or human-altered classification. On the other hand, where we had eDNA survey data characterizing restoration from cleared (humanaltered) land towards a natural state at the single location of Mt Bold, we saw an increase in microbial diversity (although not functional diversity). We did not have further data available to apply our new methods more widely and test whether restoring native biodiversity might lead to increased microbial diversity within particular locations elsewhere. In short, our study provides early supporting evidence that restoring a more biodiverse ecosystem may boost the microbial diversity within that location, which should then become available for beneficial immunomodulation (Mills et al., 2017).
It is beyond the scope of our study to identify microbial Old Friends, however we highlight example taxa that associate with ecological restoration and natural reference soil microbiomes vs. human-altered or disturbed soil microbiomes (Table 1, Fig. 4; Web Appendix, Tables S1-S5). These results offer potential targets for subsequent research into possible immunomodulatory capabilities. From the human-associated taxa that associated with natural samples and restoration, we observed the often slow-growing Mycobacterium. This genus includes the common non-pathogenic soil-dwelling M. vaccae which has been associated with reduced anxiety-like behaviour in mice (Matthews and Jenks, 2013). Although, Mycobacterium also contains pathogenic species associated with infrequent but serious diseases including tuberculosis (M. tuberculosis) and leprosy (M. leprae). We note that new conceptualisations of what makes a pathogen are relevant to this discussion. Casadevall and Pirofski (2000) suggest the definition of a pathogen should be based on the potential for host damage arising from the host-microbe relationship, and not attributes of the microbe alone. Examples where pathogenicity depends on both the microbe and host environment include cell-to-cell signalling, termed quorum sensing, where high cell densities of genetically-related bacteria can produce a positive feedback and self-promotion of growth factors and higher virulence (Rumbaugh et al., 2012). Conversely, through ecological mechanisms, greater microbial diversity in soils can resist the invasion and establishment of potentially pathogenic species (van Elsas et al., 2012). In earlier work (Liddicoat et al., 2016), we sought to unite the Biodiversity Hypothesis and microbial Old Friends concepts, suggesting that optimum beneficial immunomodulation from a particular Old Friend is likely to occur in the context of microbial diversity, required to keep any potential pathogenic behaviour in check.

Limitations
Our analysis of bacterial indicators reflected a limited microbiome dataset associated with restoration towards a grassy woodland ecosystem. Therefore, the top trending taxa we identified may not be applicable for very different environments. Also, as we analysed genera, our study cannot inform the implications for particular and often rare taxa. Instead, our study helps to build understanding of the distribution patterns of larger groups of related organisms that may share similar ecological niches. It is speculative to suggest that increased abundance of genera such as Bacillus, Clostridium, Enterobacter, Legionella and Pseudomonas may contribute to increased rates of human disease. Even so, it is informative to appreciate that increased exposure to these taxa is likely in disturbed environments, and as a consequence there may be an increased likelihood of health impacts in susceptible individuals due to their generally fast-growing and opportunistic nature (Benenson and APHA, 1995;Ristuccia and Cunha, 1985).
Our study lacked data on the actual biomass of environmental microbiota, as well as actual human exposures. Also, amplicon-based OTU abundance data are subject to taxon-specific biases (e.g. during DNA extraction and polymerase chain reaction amplification of DNA). However, despite such biases, using relative sequence abundance information, as we have done, has potential to provide more accurate insights to actual biomass proportions compared to alternative analyses using presence-absence only data (Deagle et al., 2018). Although eDNA analyses do not permit true insight into actual bacterial exposures or biomass, the increasing knowledge we gain into microbiota diversity and the relative abundance of key taxa in different environments is relevant to human exposures and microbial ecological interactions between environmental and host microbiotas.
We experienced limitations using PICRUSt v1.1.1 to infer functional profiles for the different environmental microbiomes, in the conversion of our study OTUs into Greengenes (13-5) sequence identifiers, and in finding suitable nearest available taxa in the PICRUSt database. We used copy number correction only for the functional analysis within the PICRUSt software as this followed developers' recommendations. However, in the remainder of our microbiome data analyses we did not seek to correct for gene copy numbers, as this is routinely not included elsewhere due to a lack of available knowledge (Louca et al., 2018). Despite these issues, we believe our coherent findings from the analysis of functional differences between the different soil communities has provided useful insight. We acknowledge that attributing functions to 16S data and refining microbiome data analyses methods more broadly represent areas of active research and development.

Conclusions
Using our new merged-sample bootstrap resampling framework, which preserves microbiome data and permits more detailed study of site-level information, we discovered emergent patterns that were not apparent using conventional one-off rarefying of individual samples. Specifically, we found patterns in OTU relative abundance and function for ecologically-relevant and human-associated key trending soil bacteria from a localized restoration chronosequence at Mt Bold (South Australia), which aligned with differences between human-altered and natural soil microbiome samples from across Australia. Our results help build knowledge towards using microbial indicators from soil eDNA to assess, manage and restore ecosystem condition. We also show that environmental opportunists, including potential human pathogens, increase in OTU relative abundance in disturbed ecosystems; a finding that may have important implications for microbiota-mediated human health and possible new ecologically-based health improvement and pathogen control interventions.