Integration of Phenotypes in Microbiome Networks for Designing Synthetic Communities: a Study of Mycobiomes in the Grafted Tomato System

ABSTRACT Understanding factors influencing microbial interactions, and designing methods to identify key taxa that are candidates for synthetic communities, or SynComs, are complex challenges for achieving microbiome-based agriculture. Here, we study how grafting and the choice of rootstock influences root-associated fungal communities in a grafted tomato system. We studied three tomato rootstocks (BHN589, RST-04-106, and Maxifort) grafted to a BHN589 scion and profiled the fungal communities in the endosphere and rhizosphere by sequencing the internal transcribed spacer (ITS2). The data provided evidence for a rootstock effect (explaining ~2% of the total captured variation, P < 0.01) on the fungal community. Moreover, the most productive rootstock, Maxifort, supported greater fungal species richness than the other rootstocks or controls. We then constructed a phenotype-operational taxonomic unit (OTU) network analysis (PhONA) using an integrated machine learning and network analysis approach based on fungal OTUs and associated tomato yield as the phenotype. PhONA provides a graphical framework to select a testable and manageable number of OTUs to support microbiome-enhanced agriculture. We identified differentially abundant OTUs specific to each rootstock in both endosphere and rhizosphere compartments. Subsequent analyses using PhONA identified OTUs that were directly associated with tomato fruit yield and others that were indirectly linked to yield through their links to these OTUs. Fungal OTUs that are directly or indirectly linked with tomato yield may represent candidates for synthetic communities to be explored in agricultural systems. IMPORTANCE The realized benefits of microbiome analyses for plant health and disease management are often limited by the lack of methods to select manageable and testable synthetic microbiomes. We evaluated the composition and diversity of root-associated fungal communities from grafted tomatoes. We then constructed a phenotype-OTU network analysis (PhONA) using these linear and network models. By incorporating yield data in the network, PhONA identified OTUs that were directly predictive of tomato yield and others that were indirectly linked to yield through their links to these OTUs. Follow-up functional studies of taxa associated with effective rootstocks, identified using approaches such as PhONA, could support the design of synthetic fungal communities for microbiome-based crop production and disease management. The PhONA framework is flexible for incorporation of other phenotypic data, and the underlying models can readily be generalized to accommodate other microbiome or ’omics data.

I nteractions are key to defining system behaviors, structures, and outcomes. In microbial systems, interactions among organisms define their distribution, assemblies, and ecosystem functions. In addition to microbe-microbe interactions, microbes interact with their hosts and are essential to host health and performance (1)(2)(3)(4)(5)(6)(7)(8). In agriculture, plantmicrobe interactions improve plant productivity by providing access to nutrients (9)(10)(11), reducing infection by plant pathogens (5,12), triggering plant growth-promoting factors (13,14), and enhancing plant resistance (15,16) and tolerance to abiotic stresses (17)(18)(19)(20). Although the importance of microbes and host-microbe interactions to host health and ecological processes is well known, interaction-based approaches to manage crop production remain a scientific frontier. Past attempts to translate information about microbial interactions to design biocontrol agents or biofertilizers have often had limited efficacy and durability (21,22). Most microbial inocula have been applied as single species, often selected based on pairwise relations of microbes with a pathogen or the host. Interactions among microbes as well as with the host are important, and the net outcome of these complex interactions defines host health and ecosystem functions (23). Thus, it is critical to understand the ecology of microbes selected for biological applications, and systems approaches centered on host-microbe interaction can help guide the selection of microbes for synthetic communities (24,25).
Among the tools to better understand microbial interactions, network models of microbial communities, and studies of network structures and key groups, have proven popular for generating hypotheses about how to engineer microbial consortia. In such network models of microbiomes, a node represents an operational taxonomic unit (OTU), and a link exists between two OTUs if their sequence proportions are significantly associated across samples. When evaluated with other conventional measures of microbial community structures, such as diversity indices, network models can be used to identify hub taxa that may be key to maintaining microbial assemblages and diversity (26) or to evaluate changes in community complexity and interactions in response to experimental treatments (27). Microbiome networks are useful for describing general community structures and their key properties and are often the most practical option when additional information about species interactions is missing or the goal is to compare across studies with different types of data (28,29). The utility of network analysis for identifying candidate assemblages for biocontrol can be enhanced by incorporating nodes that represent other additional types of features (30,31). For instance, a novel association of host metadata with the microbiome was revealed in an integrated microbiome-metadata network (32), where a feature strongly associated with hub microbes can serve as a marker to measure host performance. In agriculture, plant yield or other phenotypic traits can be integrated in microbiome networks. Because such models include host phenotypes, they facilitate finding candidate sets of OTUs that may directly or indirectly affect host phenotypic traits. Visualization of networks is often valuable for this purpose, but the real value of phenotype-based network models is their ability to infer potential candidate taxa or consortia. The hypothesized beneficial sets of OTUs may represent targets for pure culturing efforts, or if cultures exist, the sets can be further evaluated in laboratory or field studies.
In our current study, building on previously described agricultural experiments in grafted tomato systems (33,34), we characterized the root-associated fungal (RAF) communities and implemented an interaction-based approach to select potential candidate fungi that are predictive of tomato yield and/or that are in significant association with other fungal taxa. The new phenotype-OTU network analysis (PhONA) is a method for network-visualization and a framework to support the selection of candidate taxa and to integrate system traits (such as host yield) in microbe-microbe association networks. PhONA first identifies OTUs predictive of phenotype using lasso regression and then uses the predictive OTUs from lasso regression to build a reduced generalized linear model (GLM). PhONA then combines the GLM results indicating positive or negative associations of the predicted OTUs with the host phenotype as well as with other OTUs in a network model (Fig. 1). Due to the large number of OTUs compared to the sample size, lasso regression was used because it is suited for minimizing overfitting when applied with a relatively small sample size (35) and has been implemented in microbiome studies (36,37).
While phenotype-based network models have the potential to identify key taxa, application of such models should be integrated with findings from other community analyses so that the inference about key taxa is biologically and ecologically meaningful. For example, plant microbiome studies indicate that a small but consistent proportion of variation in microbial communities is often explained by the host genotype (38)(39)(40)(41)(42)(43)(44), indicating the potential for genotype-based modulation of microbial communities in crop production on a broader scale. These results support the idea of host-specific microbial community selection (45). Many such microbes may be taxa that are evolutionarily essential for the survival and function of plants (46,47). In addition, the extent of host genotype filtering of microbes differs across the rhizosphere, rhizoplane, and endosphere and varies from one host species to another (48)(49)(50). Results that indicate microbial filtering by different crop hosts, plant compartments, geographic locations, and environmental factors (51,52) are promising for designing experiments to minimize the search space, or necessary sample numbers, to identify candidate taxa for synthetic communities. For instance, factors that explain great variation in microbial community composition but that are outside the control of management can be treated as blocks in experimental designs, so that host-or compartment-specific effects on the microbial community can be searched to identify the most desirable candidate taxa. Synthetic communities can potentially be matched to the environments in which they are most useful (53).
Phenotype-based selection of microbial consortia is promising as an effective approach to select representative microbial taxa and could support the design of microbiome-based products. Changes in abundance (54), successive selection over multiple generations (3), or analyses of binary host-microbe relationships (55) are some of the recent phenotypebased applications to select candidate taxa for biological applications. Despite the importance of biological test-based approaches, difficulty in culturing all the microbes makes computational approaches instrumental to define microbe-microbe and host-microbiome associations and to identify the biological and ecological key taxa. Tools to describe the community structures based on the cooccurrence matrix or covariance structures (56,57) are more common, whereas tools to integrate host phenotype or environmental factors are at an earlier phase of development. Relatively small sample sizes combined with a great number of features may limit applications of the recent graph-based approaches. Such methods allow measurement of direct associations via conditional dependence structures and offer options to include environmental and phenotypic information in the model (58). CoNet (59) and FlashWeave (58) allow representation of the phenotype or an environmental variable as an extra node or a column in the adjacency matrix, and the same statistical method can be applied to define the associations among microbes and between microbes and phenotypes (taxa and metadata). PhONA is generic, as it allows the user to select data structure-specific models for microbe-microbe and microbe-phenotype associations. In the current case study, we used lasso regression to identify the subset of OTUs and then fitted them using GLMs to predict OTU-phenotype associations, whereas the OTU-OTU associations were defined using sparse correlations for compositional data (SparCC). Additionally, we contrasted the RAF community's diversity and interactions among the rootstocks and the controls for endosphere and rhizosphere compartments. Based on our yield data, rootstock vigor, and previous studies of microbial interactions (27), we expected a greater number of fungal OTUs and of microbial associations for more productive rootstocks. Moreover, in our previous studies of bacterial communities in the tomato rhizobiome, we observed compartment-specific (endosphere versus rhizosphere) effects of grafting and rootstocks on bacterial community composition and diversity (34) and expected similar effects on RAF diversity and composition. All the code and vignettes for PhONA are available at https://ravinpoudel.github.io/PhONA/index.html and archived at Zenodo (doi: 10.5281/zenodo.6600986).
Effects of grafting and rootstock on a-diversity. There was strong evidence for a rootstock effect on OTU richness (F 1,3 = 8.05, P = 0.008) and Shannon entropy (F 1,3 = 3.39, P = 0.02) of tomato RAF communities. Mean species richness was higher in both the endosphere (P = 0.02) and rhizosphere (P = 0.002) of one of the hybrid rootstocks, Maxifort, compared to the nongrafted control (Fig. S2B). Shannon entropy followed trends similar to richness trends with a higher estimate for Maxifort; however, there was evidence for higher Shannon entropy in Maxifort only for the rhizosphere (P = 0.004), but not for the endosphere (P = 0.5) (Fig. S2A). Both species richness and Shannon entropy were higher (P , 0.001) in the rhizosphere than in the endosphere across all the rootstock genotypes ( Fig. S2).
Effects of grafting and rootstock on RAF composition. Based on previous studies of the plant genotype effect on the rhizobiome (34), we expected a significant rootstock effect on community composition. Rootstock explained 2% of the variation in the RAF community composition (permutational multivariate analysis of variance [PERMANOVA]; P , 0.05), whereas compartment, study site, and year explained a greater proportion of the variation than rootstock ( Fig. S3 and Table S2). Endosphere-rhizosphere compartments accounted for 9.17% of the variation. Study site and interannual variation explained 8.44% and 5.33% of the total variation, respectively (Table S2). There was strong evidence (P , 0.001) for the interaction of rootstocks with study sites but little evidence (P = 0.1) for the interaction with year, suggesting more consistent shaping of fungal communities by rootstocks in a particular location across the study years.
Comparison of differentially abundant OTUs (DAOTUs). The analysis of differential abundance found effects of rootstock genotype at the individual OTU level. While analyses of alpha diversity indicated higher diversity in the rhizosphere than in the endosphere, we observed the opposite in the analysis of DAOTUs, with nearly twice as many DAOTUs in the endosphere (n = 134, i.e., 10.2% of the total OTUs) as in the rhizosphere (n = 87 i.e., 6.6% of total OTUs) ( Fig. 2 and Fig. S4). Comparison across rootstocks indicated a greater number of DAOTUs in Maxifort (n = 82) than in RST-04-106 (n = 75) and the self-graft control (n = 64). Compared to the hybrid rootstocks, the number of depleted taxa was greater in the self-graft control (n = 38). Among the enriched OTUs in Maxifort, 25 belonged to Basidiomycota, 25 to Ascomycota, and 11 to Glomeromycota, whereas 3 basidiomycete, 4 ascomycete, and 2 Mortierellomycota OTUs were depleted in Maxifort. In RST-04-106, enriched taxa included 23 OTUs in Basidiomycota, 25 OTUs in Ascomycota, and five OTUs in Mortierellomycota, whereas the depleted OTUs included 2 in Mortierellomycota and 3 in Ascomycota. Comparing the self-and nongraft controls, 9 OTUs in Ascomycota, 3 OTUs in Basidiomycota, and 5 OTUs in Mortierellomycota were enriched in the self-graft treatment, whereas 17 OTUs in Basidiomycota, 10 OTUs in Ascomycota, 4 OTUs in Mortierellomycota, and 4 OTUs in Glomeromycota were depleted in the self-graft treatment.
Network analysis/general network structures. Fungal community complexity, defined in terms of mean node degree and community structures/motifs, varied among the rootstocks in both the endosphere and the rhizosphere, with a greater mean node degree in one of the hybrid rootstocks, Maxifort, compared to both controls and RST-04-106 (Fig. 3, Fig. S5 and 6, and Table 1). Complexity was higher in the rhizosphere than in the endosphere compartment (Fig. 3, Fig. S5 and 6, and Table 1). In addition to the total number of links, the link type (either positive or negative) differed among the rootstocks in both compartments (Table 1), with a higher ratio of negative to positive links in Maxifort in both the endosphere and the rhizosphere compartments. Rhizosphere fungal communities had a higher ratio of negative to positive links than those in the endosphere, for all rootstocks. Although we observed rootstock-specific or compartment-specific effects on the node degree and ratio of negative to positive links, the numbers of modules defined using a simulated annealing (SA) algorithm were similar in both the endosphere and rhizosphere compartments and across the rootstocks (Table 1). Our analyses of node types divided the observed nodes in the association network into four categories: peripherals, module hubs, network hubs, and connectors. More taxa in the rhizosphere were identified as key nodes than in the endosphere ( Fig. 4 and 5).
Lasso regression, GLM, and PhONA. Using lasso regression and GLM models, we identified the OTUs predictive of tomato yield in each compartment in each rootstock. The number of predictive OTUs identified using the varImp function was about the same across the rootstock treatments in both the compartments. However, not all the predicted OTUs were associated with other OTUs in the network models. The Maxifort rhizosphere had the highest number of OTUs (10) associated with other OTUs in the network models. Only a subset of the entire microbiome was predictive of the yield, among which only a few microbes were associated with other microbes in the network models.

DISCUSSION
This study demonstrated the effect of rootstocks on RAF community composition and structure. General diversity-based analyses indicated a rootstock effect. The integrated host phenotype and OTU network in the PhONA identified potential candidate taxa for each rootstock and community structures in the endosphere and rhizosphere compartments. In contrast to network models that portray only microbe-microbe interactions, PhONA integrates the results of GLM models of microbe association with phenotypic traits to support inferences about candidate taxa and predictive microbiome analyses. Thus, candidate taxa can be selected not only because they have a direct association with the host response variable(s), but also because they are indirectly associated with the host response variable through their associations with community members that have direct associations with host traits. For instance, a node that has a FIG 2 Legend (Continued) using OTU counts from self-grafts and nongrafts as controls. All the tests were adjusted to control the false-discovery rate (FDR, P , 0.01) using the Benjamini-Hochberg method. Each point represents an OTU labeled at the genus level and colored based on phylum. Phyla with less than 1% relative abundance are labeled as "other." The position along the x axis represents the abundance fold change contrast with controls (except for the self-graft versus nongraft comparison, where the nongraft treatment was used as a control for the contrast).

Phenotype-Based Networks for Microbial Consortium Design
Applied and Environmental Microbiology positive association with a system phenotype node (in our case, yield) might have negative or positive associations with other OTUs. Such OTUs with indirect positive associations with the desired phenotype might also be included in biofertilizer consortia. PhONA provides data-driven selection of a consortium based on association with the  Phenotype-Based Networks for Microbial Consortium Design Applied and Environmental Microbiology phenotype of interest. Applying PhONA for disease phenotypes or pathogen resistance phenotypes could be useful for designing rational biocontrol consortia. We also observed some OTUs with direct negative associations with the yield node. Efforts to control taxa negatively associated with desired phenotypes, as well as the taxa that have positive associations with these taxa, might contribute to maximizing yield. Although we did not observe any disease symptoms in our experiments, OTUs negatively associated with the yield node might represent a case of asymptomatic negative microbial effects on yield. Efforts to explore negatively associated OTUs might provide opportunities to minimize asymptomatic yield loss in crops.
The main goal of PhONA is to provide a systems framework to generate hypotheses about the role of microbiome components in host function and performance and to support the potential for mechanistic/predictive models to better understand host-microbiome interactions. In planta experiments with fungal cultures are essential to test the hypotheses generated by these models, to help to differentiate between associations that are based on consistent biological interactions and not simply based on shared (or opposing) environmental niche preferences. It is important to be cautious in attributing biological interactions to the key structures in network attributes because the links in the PhONA may or may not depict biological interactions. That is, many links may represent only correlative relationships and not causal ones (30). For instance, the hub node in the network is often regarded as a key node, but the high number of links with the hub node in the association network could be due to shared niches, biological interactions, or a mixture thereof. If the associations are mostly due to shared niches, removing such a

Phenotype-Based Networks for Microbial Consortium Design
Applied and Environmental Microbiology hub node will have a more limited effect, whereas removal of a hub node involved in many biological interactions could lead to significant effects on the microbial community. The information gained through network analyses can be used in conjunction with prior knowledge of the observed organisms to select the taxa for consortia. For example, in the case of the nongraft treatment (Fig. 3), we observed many OTUs assigned to the potentially plant growth-promoting genus Mortierella. One of the Mortierella nodes was positively associated with yield, and another Mortierella module was positively associated with a node directly linked to yield. Taken together, these observations suggest that many Mortierella species may contribute to plant yield through direct or indirect mechanisms. In the same network, the plant pathogen Thanatephorus was a connector and was negatively associated with two modules, whereas many OTUs within these two modules were positively associated with each other. If the goal is to suppress this pathogen, including organisms from these two modules that are negatively associated with Thanatephorus may aid in the design of biocontrols. Network structure or prior biological knowledge about key nodes can guide consortium design but should be considered as a hypothesis-generating tool that requires further experimentation.
RAF community composition, diversity, and interactions differed between the endosphere and rhizosphere compartments. Although the endosphere and the rhizosphere are physically adjacent, they are distinct in community composition and diversity. Compartment specificity in community composition and diversity has been reported for other plant species, in both natural and agricultural settings (38, 60, 61). Usually, bulk soil is considered a source of plant-associated microbial communities, a subset of

Phenotype-Based Networks for Microbial Consortium Design
Applied and Environmental Microbiology which is selected for in the rhizosphere (38,62), mainly as a function of root exudates and rhizodeposits (50,(63)(64)(65)(66). Selection of the rhizosphere microbiome could be specific (e.g., antagonistic to plant pathogens) (67,68) or more general with less influence of host genotype. In comparison, the endosphere of host plants often supports lower microbial diversity than the rhizosphere does (61,62), as host tissues and defense systems act as biotic filters (2). As a result, the microbiome is more specialized in the endosphere than in the rhizosphere. RAF compartment specificity may also be an important consideration for microbe-based disease management strategies-especially for the management of pathogens or pests that are compartment specific, such as endoparasites and ectoparasites. RAF community composition and diversity also differed among the rootstock genotypes, and the observed effects were due to the plant genotypes rather than grafting. Plant genotypes can structure root-associated fungal communities (38,61,69). The commercial rootstocks in our study have been bred to provide resistance against specific soilborne pathogens and pests. Small host genotypic differences could alter the physiological and immunological responses in the root systems, thereby selecting genotype-specific RAF communities (70). For example, some root exudates and metabolites could be specific to a plant genotype (71)(72)(73) and provide specific control of microbial communities (74)(75)(76). In some cases, the host genotype effect can be directly attributed to root anatomy (68,76,77). Efficient root types and architectures are desired agronomic traits to cope with biotic and abiotic stresses (78), and root systems vary among and within plant species (76). Moreover, the effect of plant genotypes on microbial communities in the root system may be linked to the flow of nutrients between the above-ground scion and below-ground rootstocks, where vigorous rootstock genotypes could drive greater resources to the microbial communities by supporting larger scion biomass. In such a positive nutrient feedback between the scion and rootstock, rootstock genotype appears to be a more critical driver than scion genotype (79). Rootstock genotypes supporting higher yield and biomass may support higher microbial diversity by excreting a greater volume of photosynthates as root exudates and metabolites. Although we did not evaluate root exudates, and used the same scion across the study, our study is consistent with higher yield and biomass (as for the Maxifort rootstock) being associated with higher fungal diversity. Additionally, we observed an effect of rootstock on the RAF community composition. Collectively, the results support our expectations of rootstock-specific control of the RAF community.
Our definition of complexity is based on interactions in networks, using a definition similar to that used in other microbiome network analyses (27,41,80). However, a greater number of interactions and complex network structures/motifs would tend to be observed whenever more nodes exist in these association networks, an inherent relationship not always considered in studies of complexity in microbiomes. The higher number of OTUs associated with Maxifort would tend to result in higher complexity than to rootstocks with fewer OTUs. Another potential measure of complexity is network density, the proportion of links observed in a network relative to the total number of possible links. For all the rootstocks we studied, network density was similar (0.04) in both compartments, indicating similar community complexity. Statistical methods comparable to rarefaction, designed to equalize the number of nodes across networks or methods to balance OTU richness for sampling efforts (81), will be a valuable future effort for understanding how network complexity responds to treatments and for making comparisons across studies. In addition, methods to optimize and automate the selection of association thresholds to define the pairwise relationships in a microbiome network are a gap and an opportunity for improving microbiome network analyses. For graphics in the figures in this analysis, we selected a level of association such that an interpretable number of OTUs were depicted for visual consideration. Studies directly applied to identify potential microbial assemblages for agricultural applications could benefit from exploring results of a range of thresholds.
Our study indicates a rootstock genotype-specific effect on RAF diversity, composition, and interactions and also demonstrates integration of system phenotypes such as plant yield in a network-based model to support selection of candidate taxa for biological use. However, in sequence-based studies such as ours, the biological and functional significance of the candidate OTUs remains unknown. Follow-up experiments with fungal cultures will be necessary to determine the biological roles of the candidate OTUs and to differentiate causal associations from correlations based on niche preference. Similarly, further development of PhONA to incorporate temporal microbiome data and Bayesian learning and inference methodologies (82,83) has the potential to support causal inference, including an understanding of directionality in microbiome networks. PhONA utilizes a lasso regression and GLM to link OTUs with a system phenotype, although many other models, such as random forest and other machine learning approaches (84), could also be used. Given the nature of microbiome data, having a high number of features (p) and relatively small number of samples (n), other models to address the n Â p problem can improve PhONA predictions. Rather than pure prediction, our methods aim to find the key predictors and use them in the GLM model for evaluating associations with a phenotypic response such as plant yield. PhONA focused on finding the attributable predictors/ OTUs that are key to biological interventions, which are missed in approaches that are focused purely on prediction (85). Small sample size was a limitation in our current study, reflecting the challenge of processing a large number of plant replicates, and we did not validate the results from our model by splitting data into training and test sets. A rigorous model validation step would improve the accuracy of PhONA. As lab-based technologies and computational resources become less expensive, studies with large sample sizes are becoming more practical and, when combined with an analytical framework such as PhONA, microbial community analyses can go beyond simple analyses of diversity to help make microbiome-based agriculture a reality.

MATERIALS AND METHODS
Experimental plots, rootstocks, and study sites. We studied grafted tomato plants in high tunnels in an experimental design similar to that described in detail by Poudel et al. (34). Tomato plants were grafted following a tube-grafting protocol described in Meyer et al. (33). Our study included three rootstocks (BHN589, RST-04-106, and Maxifort) in four graft treatments: (i) nongrafted BHN589 plants, (ii) self-grafted BHN589 plants (plants grafted to their own rootstock), (iii) BHN589 grafted to RST-04-106, and (iv) BHN589 grafted to Maxifort. We chose BHN589 as the scion primarily based on its popularity due to high yield and high-quality fruit with a long shelf life. For rootstocks, we selected Maxifort because it is a productive and popular rootstock and RST-04-106 as a new rootstock variety based on tomato breeders' recommendations.
Our study included two sites: Olathe Horticulture Research and Extension Center (OHREC) and Common Harvest Farm, a farm managed by a collaborating farmer. For more information about the sites, see Table S1. At each study site, the four graft treatments were assigned to four plots per block in a randomized complete block design. Each plot consisted of 5 to 8 plants, and one middle plant per plot was sampled during the peak growth stage. There were six blocks at OHREC and four blocks at Common Harvest Farm, such that for each year, each graft treatment was replicated 10 times. The experiment was repeated for 2 years (2014 and 2015) with a similar design, with the blocks and rootstocks randomly and independently assigned each year.
Sample preparation, DNA extraction, and amplicon generation. To compare the fungal communities, we selected a center plant from each plot and carefully dug the whole plant out such that the majority of the roots remained intact. Endosphere and rhizosphere samples were prepared as previously described (34), and the total genomic DNA was extracted using a DNA extraction kit (UltraClean soil DNA isolation kit; MoBio, Carlsbad, CA, USA) as per the manufacturer's protocol, with slight modification during the homogenization step (34). To recover high genetic diversity, we opted for the two-step PCR approach. The primary PCR amplicons were generated in 50-mL reactions under the following conditions: 1 mM forward and reverse primers, 10 ng template DNA, 200 mM each dioxynucleotide, 1.5 mM MgCl 2 , 10 mL 5Â Phusion Green HF buffer (Finnzymes, Vantaa, Finland), 24.5 mL molecular biology-grade water, and 1 unit (0.5 mL) Phusion Hot Start II DNA polymerase (Finnzymes). PCR cycle parameters consisted of a 98°C initial denaturing step for 30 s, followed by 30 cycles at 98°C for 10 s, 57°C annealing temperature for 30 s, and a 72°C extension step for 30 s, followed by a final extension step at 72°C for 10 min. All samples were PCR-amplified in triplicate to minimize stochasticity, pooled, and cleaned using Diffinity RapidTip (Diffinity Genomics, West Chester, PA, USA). In this PCR, we amplified the entire internal transcribed spacer (ITS) region of fungal rRNA genes using primers ITS1F-CTTGGTCATTTAGAGGAAGTAA and ITS4-TCCTCCGCTTATTGATATGC (see, e.g., 86). The average amplicon length of the ITS region in fungi is about 600 bp and could not reliably be fully covered with the Illumina MiSeq platform (v.3 chemistry) in a single read. Thus, in the following nested PCR, only ITS2 of the ITS region was amplified using fITS7-ITS4 primers (87) incorporating unique molecular identifier tags (MIDs) at the 59 end of the reverse primer (ITS4). For the nested PCR, we used similar reagents and PCR conditions as in the primary PCR, with some modifications: the number of PCR cycles was reduced to 10, total reaction volume was reduced to 25 mL, and 5 mL of cleaned PCR product from the first PCR Phenotype-Based Networks for Microbial Consortium Design Applied and Environmental Microbiology amplification was used as the DNA template. The nested PCR was also run in triplicate, pooled by experimental unit, and cleaned with an Agencourt AmPure cleanup kit using a SPRIplate 96-ring magnet (Beckman Coulter, Beverly, MA, USA) as per the manufacturer's protocol. Then, 200 ng of cleaned, barcoded amplicons was combined per experimental unit, and the final pool was cleaned again using an Agencourt AmPure cleanup kit as described above. MiSeq adaptors were ligated to the library and paired-end sequenced on a MiSeq personal sequencing system (Illumina, San Diego, CA, USA) using a MiSeq reagent kit V3 with 600 cycles. The endosphere and the rhizosphere amplicon libraries were sequenced separately in two runs. Adaptor ligation and sequencing were performed at the Integrated Genomics Facility at Kansas State University. All sequence data generated in this study were deposited in the NCBI Sequence Read Archive depository (BioProject no. PRJNA496268).
Bioinformatics and OTU designation. The sequence library of fastq files was curated using the mothur pipeline (version 1.33.3; 88) following steps modified from the MiSeq standard operating protocol (SOP; www.mothur.org/wiki/MiSeq_SOP). Briefly, the forward and reverse reads were assembled into contigs using the default alignment algorithm. Any sequences shorter than 250 bp or containing an ambiguous base call or more than 8 homopolymers or missing MIDs were removed from the library. Barcoded sequences were assigned to experimental units, and the data for endosphere and rhizosphere libraries were merged and processed together for the remaining steps in the mothur pipeline. The pairwise distance matrix based on the filtered sequences was created, and sequence data were clustered into OTUs at 97% sequence similarity using the nearest neighbor joining algorithm. The clustered OTUs were assigned to a putative taxonomic identity using a Bayesian classifier (89)  Statistical analyses. We evaluated the network of associations among fungal OTUs with network models to better understand the community composition and the interactions therein. The observed OTU database was divided into eight subsets, each combination of the four rootstocks and two compartments (endosphere and rhizosphere), such that we constructed eight networks in total. In our network models, a node represents an OTU and a link exists between a pair of OTUs if there is evidence (P , 0.05) that their frequencies are correlated (positively or negatively) across samples. Reducing false associations due to compositional bias in network modeling of microbiome data is important for clearer interpretation (57). Thus, we used a sparse correlations for compositional data (SparCC) method to evaluate the pairwise associations (57), designed to minimize the compositional bias effect due to normalization. In our analyses, associations were defined in 20 iterations, and the significance of a pairwise association was determined from 500 bootstrapped data sets. Once the matrix defining all the pairwise associations was derived, we selected only those OTUs for which the absolute value of at least one association was greater than 0.5 (and P , 0.05) in the network analyses for each of the rootstock genotypes.
To identify the OTUs associated with tomato yield in each rootstock, a regression-based model was fitted to the observed data. Marketable tomato yield data (Tables S6 and S7) were the response variable, and fungal OTUs were potential predictors. We used the caret package (91) to evaluate the lasso regression and selected OTUs using varImp functions. Lasso regression used an L1 regularization approach to shrink the less important variables' coefficients to zero and to reduce the number of variables in the model. In lasso regression, lambda determines the penalty of regularization, and its value can range from zero to infinity; when it is zero, the results are similar to the least square lines. A grid-based approach was used to tune the lambda parameter using repeated (500 iterations) 5-fold cross-validation, and the value of lambda with lowest variance was selected. Only the OTUs with nonzero coefficients were selected, based on the lasso-regression model, to build the reduced GLM model, and the association type of each OTU with phenotype was estimated. Given the small sample size, we did not evaluate the model performance by splitting the data into training and test cases, although this would be a valuable step in future studies with larger sample sizes. PhONA then integrates the results from the GLM model for yield with the OTU-OTU association network. We plotted the resulting network using the igraph package (92) in R. To evaluate the role of nodes in the network, we placed each node in one of four categories-peripherals, module hubs, network hubs, and connectors-based on the within-module degree and among-module connectivity (30,93). The role analyses consider the presence or absence of links in the network and do not account for the link types (i.e., positive or negative associations).
To evaluate the effects of rootstocks on fungal diversity, Shannon entropy and species richness were evaluated using the vegan package (94) wrapped by the phyloseq package (95) in R (96). Differences in diversity across the rootstocks were compared using a mixed model analysis of variance (ANOVA) in the lme4 package in R (97). Rootstock treatment and compartment were treated as fixed factors, while study site and sampling year were treated as random factors, with study site used as a blocking factor. Changes in fungal community composition across the samples were estimated based on a Bray-Curtis dissimilarity distance matrix and visualized in nonmetric multidimensional scaling (NMDS) plots. The contribution of factors to the observed variation in fungal composition was estimated in a permutational multivariate analysis of variance (PERMANOVA, using 1,000 permutations) using the adonis function in the vegan package (94). To identify OTUs that were sensitive to the rootstock treatments, the observed frequency (proportion) of each OTU was evaluated by fitting a generalized linear model (GLM) with negative binomial distribution, to identify depleted or enriched OTUs (differentially abundant OTUs [DAOTUs]). Likelihood ratio tests and contrast analyses (between the hybrid rootstock and controls) were performed for the fitted GLM to identify the DAOTUs. We used OTU frequencies from self-grafts and nongrafts as controls, in comparisons with other rootstocks, using contrasts. All tests were adjusted to control the false-discovery rate (FDR, P , 0.01) using the Benjamini-Hochberg method (98). A differential abundance test was performed within the controls (self-graft versus nongraft) to identify the OTUs responsive to the grafting procedure itself.
Data availability. Sequencing data for this project are available at NCBI under BioProject no. PRJNA496268. All the code and vignettes for PhONA are available at https://ravinpoudel.github.io/PhONA/ index.html and archived at Zenodo (doi: 10.5281/zenodo.6600986).

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, PDF file, 3.5 MB. SUPPLEMENTAL FILE 2, XLSX file, 0.2 MB.