From diversity to complexity: Microbial networks in soils

Network analysis has been used for many years in ecological research to analyze organismal associations, for example in food webs, plant-plant or plant-animal interactions. Although network analysis is widely applied in microbial ecology, only recently has it entered the realms of soil microbial ecology, shown by a rapid rise in studies applying co-occurrence analysis to soil microbial communities. While this application offers great potential for deeper insights into the ecological structure of soil microbial ecosystems, it also brings new challenges related to the specific characteristics of soil datasets and the type of ecological questions that can be addressed. In this Perspectives Paper we assess the challenges of applying network analysis to soil microbial ecology due to the small-scale heterogeneity of the soil environment and the nature of soil microbial datasets. We review the different approaches of network construction that are commonly applied to soil microbial datasets and discuss their features and limitations. Using a test dataset of microbial communities from two depths of a forest soil, we demonstrate how different experimental designs and network constructing algorithms affect the structure of the resulting networks, and how this in turn may influence ecological conclusions. We will also reveal how assumptions of the construction method, methods of preparing the dataset, and definitions of thresholds affect the network structure. Finally, we discuss the particular questions in soil microbial ecology that can be approached by analyzing and interpreting specific network properties. Targeting these network properties in a meaningful way will allow applying this technique not in merely descriptive, but in hypothesis-driven research. Analysing microbial networks in soils opens a window to a better understanding of the complexity of microbial communities. However, this approach is unfortunately often used to draw conclusions which are far beyond the scientific evidence it can provide, which has damaged its reputation for soil microbial analysis. In this Perspectives Paper, we would like to sharpen the view for the real potential of microbial co-occurrence analysis in soils, and at the same time raise awareness regarding its limitations and the many ways how it can be misused or misinterpreted.


I. ILLUSTRATION OF THE INCREASE OF THE USE OF NETWORK ANALYSIS IN SOIL RESEARCH
In the last two decades we have observed a significant increase of the use of network analysis in the investigation of microbial systems. We illustrate this increase by showing the number of articles published in the field from 2000 to 2020, see Fig. S1. The figure shows the result of a keyword search in Web of Science, by selecting the research articles with expressions: "microbial network analysis" or "microbial co-occurrence analysis", and filtering the ones applied to soil with expression "soil". We can observe an exponential growth of such articles in the past decades, and the rate of increase is faster in soil research (see inset of Fig. S1). Having only less than 10% from the total until 2010, they have reached almost 35% of all microbial network papers by 2020.

II. SAMPLING SITE AND PROCEDURE
Soil was sampled in a mature beech forest (48 • 07 18.0"N 16 • 02 45.8"E, 540 m altitude) after removal of litter, on the April 12, 2021. The sampling site is located at the research site "Klausen-Leopoldsdorf" in the Vienna Woods, Lower Austria, Austria, which is run by the Austrian Federal Office and Research Centre for Forests. The soil type is classified as a dystric cambisol over sandstone with a loam-loamy clay texture [3]. Soil cores (8 cm diameter, 5 cm height) were taken at two soil depths (0 − 5 and 15 − 20 cm) one on top of the other, at two locations with approximately one meter distance to each other in five plots, respectively. This made up 10 soil cores for each soil depth. The cores were transported to the laboratory and kept at 4 • C overnight. Subsequently, we took two subsamples (core: 3 cm diameter, 5 cm height) from each core except for one core of the lower soil horizon, which contained too many stones to core it. We ended up with 20 sub-cores of the upper soil horizon and 18 for the lower soil horizon. After we sieved the cores (2 mm) and removed roots and FIG. S1: Number of publications from a Citation Report from Web of Science using for the total the expression (((ALL=(microbial network analysis)) OR ALL=(microbial co-occurrence analysis))) and for soil the expression ((((ALL=(microbial network analysis)) OR ALL=(microbial co-occurrence analysis)) )AND ALL=(soil)). Inset: the plot in log-lin scale, to show the exponential growth. stones, we weighed in 400 mg of each of the 38 samples in Lysis Matrix E tubes (MP Biomedicals) and stored them at −80 • until DNA extraction.

III. DNA EXTRACTION AND SEQUENCING
Collected soils ( 400 mg of soil) underwent bead-beating using Lysis Matrix E tubes (MP Biomedicals) on a PowerLyser (MP Biomedicals) for 30 seconds. Following bead-beading, DNA was extracted from the soil samples at the Joint Microbiome Facility (JMF) [5] using the MPBio Soil DNA extraction kit (MP Biomedicals). Extracted DNA was assessed for quantity and quality prior to PCR amplification using the 515F/806R primers for the 16S SSU rRNA marker gene. Amplified material was sequenced on the Illumina MiSeq platform (Illumina).
The obtained number of reads for the samples originating from lower soil layer were similar to the ones from upper soil layer. The sparsity of the dataset is shown by the mean-variance relationship of the sequence reads in Fig. S3. For the network construction we do not use taxa with low read numbers (these taxa also correspond to taxa with low prevalence), see Fig. S2 and S3. This criteria leaves 283 ASVs in the upper and 304 ASVs lower layer. After trimming we leave only the ASVs with more than 100 total reads in each dataset (this also corresponds to an average of 5 reads per sample). The taxonomic composition at the Phylum level of datasets are shown in Fig. S4, the main differences among the subsets are the higher fraction of 16S rRNA attributed to Actinobacteria and Bacteroidetes in the upper soil layer, and higher fraction attributed to Chloroflexi in the lower soil layer. We show the overlap between the detected taxa in both layers in Fig S5, the two datasets share 121 ASVs.

B. Influence of zeros
As we write in the main text and show in Fig. S3, the high-throughput sequencing datasets have a large number of zeros. These zeros can have different origins and therefore may distort the network construction. In other words to infer associations on the basis of these zeros can be misleading. At the same time the presence of zeros results in strong pairwise correlations for rare taxa as shown in Fig. S6 (a, b). Fig. S6 (a, b) represent the edges detected for the non-filtered dataset, after an edge threshold of 0.7 for pairs of taxa with given prevalence in the dataset. Notice the grid of 20x20 samples (circles) for the upper layer, 18x18 for the lower layer. The figure shows that a higher number of edges is established between taxa with low prevalence. Filtering out the taxa with low prevalence, or taxa with low total read numbers strongly changes this pattern as shown in Fig. S6 (c, d). In general we see a lower number of edges established, and most of them are for prevalent taxa.

C. Constructed networks
The steps for the network construction are described in the main text. Here we reproduce the networks showing the phylogenetic classifications of nodes in Fig. S7, S8.

D. Non-metric multidimensional scaling
Here we show results for a traditional method of ordination -non-metric multidimensional scaling (NMDS), with the objective to compare its results to network analysis. The most commonly ordination is used to compare differences between samples, see Fig. S9. In an analogous manner, it can be applied to explore differences among taxa. Even if the latter is rarely done in microbial ecology, we have used it to be able to realize the comparison with the taxa-centered view offered by network analysis, see