Community Structure of Arbuscular Mycorrhizal Fungi in Soils of Switchgrass Harvested for Bioenergy

We assessed the different species of beneficial fungi living in agricultural fields of switchgrass, a large grass grown for biofuels, using high-resolution DNA sequencing. Contrary to our expectations, the fungi were not greatly affected by fertilization. However, we found a positive relationship between plant productivity and the number of families of beneficial fungi at one site. Furthermore, we sequenced many species that could not be identified with existing reference databases. One group of fungi was highlighted in an earlier study for being widely distributed but of unknown taxonomy. We discovered that this group belonged to a family called Pervetustaceae, which may benefit switchgrass in stressful environments. To produce higher-yielding switchgrass in a more sustainable manner, it could help to study these undescribed fungi and the ways in which they may contribute to greater switchgrass yield in the absence of fertilization.

Dirks & Jackson (2020) 6

Figure S4. Phylogenetic tree of Diversisporales and Glomerales reference taxa and ASVs recovered in this study. The three circles adjacent to the taxon name correspond to site occurrence, with a filled circle indicating presence at that site. From left to right, the circles correspond to Hancock, Oregon, and Rhinelander. Bootstrap support ≥ 0.75 is indicated by grey circles on the branches.

Table S1. Coordinates, texture, and mean physicochemical properties of the surface soils of the Wisconsin Marginal Lands Experiment sites (1).

Table S2. Sequences retained through each of the steps of the DADA2 bioinformatics pipeline.
"CCS" (circular consensus sequences) is the number of sequences that were generated with a minimum of five passes during PacBio sequencing. "Primers" is the number of sequences that contained the AMF-specific pSSU-ITS-pLSU primer sequences. "Filtered" is the number of sequences that had a quality score greater than or equal to three, expected error less than or equal to two, and sequence length between 1000 and 1600 base pairs. "Denoised" is the number of sequences that were inferred as ASVs according to the PacBio error-learning algorithm of DADA2. "Non-chimeric" is the number of sequences remaining after removing chimeric ones.
Finally, "Retained" is the fraction of sequences retained at the end of the pipeline (non-chimeric sequences divided by circular consensus sequences). The bottom two rows of the
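
The "Filtered" and "Retained" definitions in the Table S2 caption can be illustrated with a minimal Python sketch. It is not the DADA2 implementation: the function names are hypothetical, the quality threshold is assumed to apply to every base in a read, and expected error is assumed to be the sum of per-base error probabilities derived from Phred scores.

```python
# Illustrative sketch only; DADA2 itself is an R package and is not called here.
# Assumptions: Phred-scaled qualities, the minimum quality applies to every base,
# and expected error is the sum of per-base error probabilities.

def expected_errors(phred_scores):
    """Sum of per-base error probabilities, p = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_filter(phred_scores, min_q=3, max_ee=2.0, min_len=1000, max_len=1600):
    """Apply the three "Filtered" criteria from the Table S2 caption."""
    n = len(phred_scores)
    return (
        min_len <= n <= max_len                     # length between 1000 and 1600 bp
        and min(phred_scores) >= min_q              # no base below the quality threshold
        and expected_errors(phred_scores) <= max_ee # expected error at most two
    )


def retained_fraction(n_ccs, n_nonchimeric):
    """"Retained": non-chimeric sequences divided by circular consensus sequences."""
    return n_nonchimeric / n_ccs
```

Under these assumptions, a 1200-base read with a uniform quality of 40 has an expected error of about 0.12 and passes, whereas the same read at a uniform quality of 20 has an expected error of about 12 and is discarded.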

Note S1
PacBio is criticized for low sequencing depth and a high error rate, which may deter its broader use in the study of AMF (2). While PacBio has a lower sequencing depth than Illumina sequencing technologies and therefore may not describe microbial communities as fully, this is not a problem for relatively low-diversity groups like Glomeromycotina (3)(4). Furthermore, sequencing depth may be a moot point with increased access to PacBio Sequel 2, which generates up to 4 million reads compared to Sequel's 500,000. In regard to high error rates, the circularization and multiple sequencing passes of individual DNA molecules result in an error rate comparable to, or even less than, other leading platforms (5). In conjunction with the use of error-learning bioinformatics algorithms like DADA2, the average number of erroneous bases is less than one for every 2000 nucleotides, resulting in single-nucleotide resolution for medium-length amplicons like those employed in this study (6). The length of amplicons generated with PacBio has the additional benefit of primarily capturing living organisms in the community, as the majority of relic DNA from dead organisms consists of fragments < 200 bases in length (7).

Note S2
In our curation of a Glomeromycotina LSU database for phylogenetics, we discovered 351 sequences that were derived from type cultures but not labelled with the "type_material" source key in GenBank. Most undesignated type-material sequences were discovered by searching for accessions referenced in primary literature describing new species of Glomeromycotina fungi; a number of others were found by searching for sequences not labelled as type material but containing the query "epitype", "holotype", "isotype", or "sp. nov.".
Although of the highest importance, these sequences were therefore not retrievable as type material via Entrez Direct (EDirect), the suite of command-line utilities for interfacing with National Center for Biotechnology Information databases (8). Given that GenBank contains numerous misidentified DNA sequences, a label to distinguish confidently identified sequences from those that might be erroneous is a necessity for sequence-based identification (9). Barring contamination, sequences from type material are always correct according to the logic of taxonomy. Thus, it is essential that GenBank sequences from type material are accurately labelled as such so that they can be discerned from all other sequences, whose identities are hypotheses or best guesses of varying reliability. We notified staff at NCBI and authors of these sequences, who are working to correct the metadata of these accessions.
For public sequence repositories to serve the wider research community, mycologists must take care to ensure and submit high-quality metadata (9). Poorly annotated sequences propagate through UNITE and SILVA, resulting in erroneous and imprecise taxonomic assignments of environmental amplicon sequences. As one example, four pSSU-ITS-pLSU type-material sequences were generated and deposited in GenBank for the description of Diversispora jakucsiae: KJ850181-KJ850185 (10). In GenBank, these accessions are labelled as Diversispora to Pervetustaceae in our study were identified as such when referencing UNITE. Incomplete database curation prevents full insight into community composition and hinders a broader understanding of AMF ecology and biogeography from environmental sequences.
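
The keyword search described in this note can be sketched in Python as a screen over GenBank flat-file text that has already been downloaded. This is a hypothetical illustration, not the procedure actually used: the function names and the placeholder accessions in the comment are invented for the example, and the screen only checks whether the text "/type_material=" appears anywhere in a record. Any hit is a candidate that still requires manual verification against the primary literature.

```python
# Hypothetical sketch: flag records that mention a type-related keyword
# but carry no /type_material qualifier. Operates on plain flat-file text
# supplied by the caller; it does not query NCBI.

TYPE_KEYWORDS = ("epitype", "holotype", "isotype", "sp. nov.")


def has_type_material_qualifier(record_text):
    """True if the record text contains a /type_material qualifier."""
    return "/type_material=" in record_text


def mentions_type_keyword(record_text):
    """True if any of the four keywords from Note S2 occurs in the record."""
    lowered = record_text.lower()
    return any(keyword in lowered for keyword in TYPE_KEYWORDS)


def candidate_unlabelled_types(records):
    """Return sorted accessions that mention a keyword but lack the qualifier.

    `records` maps an accession string to its flat-file text, for example
    {"XX000001": "...", "XX000002": "..."} (placeholder accessions).
    """
    return sorted(
        accession
        for accession, text in records.items()
        if mentions_type_keyword(text) and not has_type_material_qualifier(text)
    )
```

A substring screen of this kind is deliberately permissive: it will also flag records that mention a keyword only in passing, which is why the output is treated as a list of candidates rather than confirmed type material.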