Modelling signaling networks underlying plant defence
Windram and Denby

Transcriptional reprogramming plays a significant role in governing plant responses to pathogens. The underlying regulatory networks are complex and dynamic, responding to numerous input signals. Most network modelling studies to date have used large-scale expression data sets from public repositories, but defence network models with predictive ability have also been inferred from single time series data sets, and sophisticated biological insights have been generated from focused experiments containing multiple network perturbations. Using multiple network inference methods, or combining network inference with additional data such as promoter motifs, can enhance the ability of a model to predict gene function or regulatory relationships. Network topology can highlight key signaling components and provide a systems-level understanding of plant defence.


Introduction
Computational and mathematical modelling of biological data is not a new approach in plant science. For example, many multi-component modelling strategies are used in agriculture to support decision-making and predict crop yields, and evolution and ecology are two fields where modelling has had a significant impact for many decades. However, only relatively recently have modelling approaches been used to understand the molecular signaling pathways underlying plant defence responses, and in many cases the significance, or accuracy, of insights from the models has not been tested. In this review we highlight recent advances in modelling of plant defence signaling and the types of questions that can be addressed (Table 1). Where appropriate, we outline techniques that have been applied to other areas of plant science, typically abiotic stress responses. For greater detail on specific modelling methods, see other reviews [1,2].

Predicting defence gene function
Modelling is often used to predict new functions for genes or gene products. Networks can be constructed linking genes on the basis of co-expression across a set of transcriptome data [3], and a guilt-by-association strategy can then be used to predict gene function. This strategy assumes that genes closely associated with a gene of known function may share that function, and it can be complemented by prioritising hub genes of unknown function for experimental validation [4] (see Box 1 for an explanation of network terms). The data sets used for such networks can be condition-independent (i.e. with no selection for data relevant to the biological process being investigated) or condition-dependent. A recent genome-wide co-expression network in Arabidopsis was generated using nearly 900 microarray data sets and includes over 18 000 genes [5]. Several network modules were induced both in response to biotic stress and by a particular hormone treatment, suggesting a role for that hormone in the defence regulation of those modules, and two modules were specifically repressed in the presence of Pseudomonas syringae effector proteins. Genes of unknown function within these modules are potentially novel players in the plant defence response. Functional association networks extend the co-expression concept by incorporating multiple large-scale data sets to enhance their predictive ability. An early plant functional association network, AraNet [6], used transcriptome data, experimentally determined protein-protein interactions and protein sequence information, as well as a variety of gene-gene association data inferred from other organisms, including mouse, yeast and human. This network successfully predicted roles for novel genes in seed pigmentation, drought tolerance and lateral root formation.
Co-expression analysis is a popular method for gene discovery, given its ease of implementation and its ability to use gene expression data from a wide range of studies, such as those in publicly available microarray compendia. Two recent studies of this kind have attempted to predict genes with a role in the plant defence response [7,8]. Both used large collections of expression data from pathogen infections of Arabidopsis to infer co-expression networks, with Tully et al. [8] combining co-expression network inference with a motif discovery tool, tailor-made to handle large groups of genes, to predict causality within the network. Both studies suggested that hub genes and nodes with high betweenness centrality (Box 1) play important roles in the plant immune response. Betweenness centrality measures how important a node is in linking otherwise poorly connected parts of a network; such information bridges are crucial to information flow within the network [9,10].
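Degree (used to define hubs) and betweenness centrality can both be computed directly from an adjacency list. The sketch below uses Brandes' algorithm for an unweighted, undirected graph and returns unnormalised betweenness; the toy network of two tightly connected modules joined by a single bridging node `x` is hypothetical, chosen so that the bridge has low degree but the highest betweenness. It is not necessarily the exact computation used in [7,8].

```python
from collections import deque


def degree(adj):
    """Number of neighbours of each node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted, undirected graph (unnormalised)."""
    bc = dict.fromkeys(adj, 0.0)
    for source in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from source
        dist = dict.fromkeys(adj, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                    # breadth-first search records shortest-path predecessors
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while order:                    # accumulate dependencies in reverse BFS order
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                bc[w] += delta[w]
    # Each undirected path was counted once from each end.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical network: module {a, b, c} and module {d, e, f} bridged by node x.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"},
    "x": {"c", "d"},
    "d": {"x", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
deg = degree(adj)
bc = betweenness(adj)
```

Node `x` has only two neighbours, yet every shortest path between the two modules passes through it, so it scores highest.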
A remarkable gene discovery rate (for key regulators of abiotic stress responses) was obtained by combining co-expression analysis with a gene expression diversity measure [11]. Using a compendium of expression data following multiple abiotic stress treatments, genes were scored on how varied their differential expression was across stresses and on how reproducible their expression was within independent experiments of the same stress. A set of high-scoring regulatory proteins (for example, transcription factors (TFs), kinases and phosphatases) was selected and incorporated into a co-expression network. Within this network, modules (Box 1) were ranked on their expression diversity and the presence of known stress regulators, and individual genes within these modules were prioritised according to the availability of homozygous T-DNA knockout lines. Impressively, this ranking-and-modelling approach correctly predicted phenotypes for 62% of the 42 regulators, whereas phenotypic predictions based on the gene's score alone had a success rate of 36%, revealing the power of the guilt-by-association network approach.
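The two criteria described for [11], variation between stresses and reproducibility within a stress, can be combined into a single score in many ways. The sketch below uses one hypothetical combination (the spread of per-stress mean log2 fold changes minus the average spread between repeat experiments of the same stress); it illustrates the idea only and is not the scoring function used in that study. Gene names and values are invented.

```python
from statistics import mean, pstdev


def diversity_score(fold_changes):
    """Hypothetical score: between-stress spread of mean responses minus within-stress spread.

    fold_changes maps each stress to log2 fold changes from independent experiments.
    """
    stress_means = [mean(reps) for reps in fold_changes.values()]
    between = pstdev(stress_means)                                 # how differently the gene responds to each stress
    within = mean(pstdev(reps) for reps in fold_changes.values())  # disagreement between repeat experiments
    return between - within


genes = {
    # Strong, stress-specific and reproducible response.
    "regA": {"cold": [2.0, 2.2], "drought": [-1.0, -1.2], "salt": [0.1, 0.0]},
    # Responses that flip sign between repeat experiments.
    "regB": {"cold": [2.0, -2.0], "drought": [1.0, -1.0], "salt": [0.5, -0.5]},
}
ranking = sorted(genes, key=lambda g: diversity_score(genes[g]), reverse=True)
```

`regA` ranks first because its responses differ between stresses but agree between repeats, while `regB` averages to no response at all and is penalised for its inconsistency.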
Condition-dependent approaches rely on sufficient data sets being available but may be more feasible for non-model organisms. Four data sets analysing the citrus transcriptome after Candidatus Liberibacter asiaticus (Las) infection were used by Zheng and Zhao [12] to construct a co-expression network. Although predictions from this analysis were not experimentally tested, many of the hub genes were orthologues of known defence regulators in Arabidopsis, providing some validation for the network. The majority of condition-dependent co-expression studies focus on experiments from a single type of treatment, whereas in natural environments plants are regularly exposed to multiple stresses simultaneously. A novel combinatorial biotic and abiotic stress study revealed that up to 60% of differential gene expression in dual stress treatments was not observed in single stress treatments [13]. This work highlights the need to broaden our experimental horizons, and/or predictive modelling abilities, to ensure we are capturing and inferring biological meaning with relevance in the real world.

Table 1 Types of network modelling strategies used to study the plant defence response. We indicate the biological question(s) that can be addressed by each approach, potential advantages and disadvantages of the different methodologies and an example.

Inferring causal regulatory relationships
Although challenging due to the number of potential interactions, genome-wide transcriptome data have been used to infer regulatory networks that predict specific causal relationships between genes. Carrera et al. [14] used a large collection of expression data sets covering multiple treatments, tissues and mutant genotypes to infer regulatory relationships between Arabidopsis TFs and their target genes. These causal relationships were captured as ordinary differential equations with parameters inferred from the data, subsequently enabling simulations of the effect of perturbing TF expression on the expression of the network [15]. Computational design demonstrated that the expression of the network could be made to resemble viral infection by perturbing a smaller, and different, set of TFs than those differentially expressed during viral infection. This novel approach can be used to investigate plasticity within the Arabidopsis transcriptional network and is of obvious value to synthetic biology strategies for enhancing disease resistance.
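The idea of encoding TF-target relationships as ordinary differential equations and then simulating a perturbation can be illustrated with a deliberately simple linear model, assumed here for clarity: each target's rate of change is a weighted sum of TF levels minus first-order decay, and TF levels are held fixed as inputs. The weights, decay rate and gene names are hypothetical, and the published model [14] was parameterised from data and may differ in form. Integration uses a plain forward Euler scheme.

```python
def simulate(weights, tf_levels, decay=1.0, dt=0.01, steps=2000):
    """Forward-Euler integration of dx_g/dt = sum_j w_gj * tf_j - decay * x_g, starting from x_g = 0."""
    x = {gene: 0.0 for gene in weights}
    for _ in range(steps):
        x = {
            gene: x[gene] + dt * (sum(w * tf_levels[tf] for tf, w in inputs.items()) - decay * x[gene])
            for gene, inputs in weights.items()
        }
    return x


# Hypothetical network: TF_A activates target1; TF_B represses target1 and activates target2.
weights = {"target1": {"TF_A": 2.0, "TF_B": -1.0}, "target2": {"TF_B": 0.5}}
wild_type = simulate(weights, {"TF_A": 1.0, "TF_B": 1.0})
tf_b_knockout = simulate(weights, {"TF_A": 1.0, "TF_B": 0.0})  # in silico perturbation of one TF
```

With 2,000 steps of size 0.01 the simulation runs to t = 20, by which point this linear system is essentially at steady state (weighted input divided by decay): knocking out `TF_B` relieves repression of `target1`, doubling it from 1.0 to 2.0, and abolishes `target2`.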
Box 1
Biological networks tend to be interpreted and visualized using techniques borrowed from graph theory. A network graph consists of nodes linked by edges (I), where nodes represent components of the biological network and edges indicate relationships between them. In many cases nodes represent a gene or a product of its transcription or translation. Nodes may be linked by undirected edges (I), which could represent, for example, a co-expression relationship between genes, functional relatedness or physical interactions between proteins. Edges can also be directed (II), for instance indicating a transcription factor regulating a target gene; such edges are also known as causal relationships. These relationships can be defined further by introducing signed directed edges indicating positive or negative regulation (III).

[Box 1 figure not reproduced: panels (I) to (III) illustrate a node and the edge types; panels IV and V show example network topologies]

Nodes in networks have different topological properties. Panel IV shows two network modules (red and green); the nodes in each module are more highly connected to one another than to the network as a whole. Connectivity can dictate the role a node plays in a network, with highly connected nodes often called hubs. Nodes with high out-degree, such as nodes 1 and 2, regulate a relatively high number of targets and thus are likely to have greater control over the network than node 4. Conversely, node 3 has a high in-degree, being regulated by a large number of other nodes. Node 5 also has low connectivity but sits at the top of a network hierarchy, controlling node 4 through nodes 1 and 6, so it has the potential to profoundly influence network function. Node 7 is another important type of node: in this network it has high betweenness centrality because it acts as an information bridge linking the two poorly connected modules. It must also be remembered that biological networks are not static entities. Genes (including regulators) exhibit tissue-, condition- and time-specific expression, so networks can be rewired to different topological states in response to different inputs such as environmental signals. For example, a network may be structured as in IV under condition 1 but as in V under condition 2 (with changes in both the presence of nodes and edges). The topology influences how information is processed and how robust the network is to perturbation, ultimately dictating its response.

Incorporation of cis-regulatory elements can enhance the regulatory predictions made by genome-wide network models [16]. Subnetworks containing genes whose promoters are enriched for specific cis elements were extracted from a generic genome-wide co-expression network constructed from a variety of expression profiling data [17] (Figure 1a). Modules within these subnetworks were predicted to be regulated by a single TF binding to the enriched motif. As knowledge of cis elements grows [18,19], such approaches are likely to gain predictive ability. Genome-wide DNase I footprinting (DNase-seq) coupled with motif discovery within protected regions identifies candidate TF-DNA interactions that can be used to construct regulatory networks. This technique was used to investigate transcriptional reprogramming after heat treatment of seedlings and demonstrated that the response to heat was mediated by rewiring of the TF-TF network and a net loss of interactions [20]. The loss of interactions occurred across a broad range of TFs rather than being concentrated within selected network modules, as had been expected.
Interestingly, regions of the genome with increased accessibility after heat shock were concentrated in distal intergenic regions, whereas regions with decreased accessibility were focused around the transcription start site. However, regions with extreme accessibility after heat shock were mostly found in the coding regions of genes, and such genes (including known key regulators of the heat shock response) had relatively open promoters under control conditions and were generally highly expressed after treatment [20]. This highlights the need to consider such regions in the context of gene regulation during plant stress responses. A major challenge of inferring transcriptional networks is the sheer number of possible regulatory interactions. Approaches such as DNase-seq can substantially reduce the computational burden of network inference by providing prior information on TF binding that can be weighted appropriately.
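The cis-element enrichment step used to extract motif-defined subnetworks [17] is commonly framed as a one-sided hypergeometric test: how surprising is it that this many module genes carry the motif, given how common the motif is genome-wide? The sketch below assumes that framing and is not necessarily the statistic used in [17]. All counts are hypothetical.

```python
from math import comb


def motif_enrichment_p(k, n_module, n_motif, n_genome):
    """One-sided hypergeometric P(X >= k): the chance that at least k of n_module genes drawn
    from n_genome carry the motif, when n_motif genes in the genome carry it."""
    upper = min(n_motif, n_module)
    hits = sum(comb(n_motif, i) * comb(n_genome - n_motif, n_module - i) for i in range(k, upper + 1))
    return hits / comb(n_genome, n_module)


# Hypothetical counts: 8 of 20 module genes carry a motif found in 50 of 1,000 promoters.
p_value = motif_enrichment_p(8, 20, 50, 1000)
```

With an expected count of one motif-bearing gene (20 × 50 / 1,000), observing eight gives a very small p-value. In a genome-wide scan, such p-values would also need correcting for the number of motifs and modules tested.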
There has been increasing interest in using multiple inference methods to reconstruct regulatory networks. Marbach et al. [21] showed that integrating predictions from multiple network inference methods into a consensus model gave superior predictive capability compared with networks generated by individual methods. However, while this held true for prokaryote transcriptome networks, both individual and consensus methods underperformed when inferring networks for a simple eukaryote. The authors suggested that this is due to the greater complexity of regulation in eukaryotes and to weaker correlation between regulator and target mRNA levels resulting from post-transcriptional regulation. Nevertheless, Vermeirssen et al. [22] recently created an ensemble network for plant abiotic stress responses, using transcriptome data and three network algorithms, that had impressive predictive capability. Specifically, experimental validation demonstrated that the ensemble approach could predict regulatory interactions with 52% precision (true positives/predicted) and 49% recall (true positives/experimentally validated) for 289 validated interactions.
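One common way to build a consensus from several inference methods is to average each edge's rank across methods. The sketch below assumes that scheme (an edge a method did not report gets a rank one past that method's last edge), which is not necessarily the integration used in [21] or [22]; precision and recall follow the definitions given above. TF and gene names are hypothetical.

```python
from statistics import mean


def consensus(rankings):
    """Order edges by mean rank across methods; each ranking lists (TF, target) edges best-first.
    An edge a method did not report receives a rank one past that method's last edge."""
    edges = set().union(*rankings)

    def mean_rank(edge):
        return mean(r.index(edge) if edge in r else len(r) for r in rankings)

    return sorted(edges, key=lambda e: (mean_rank(e), e))  # ties broken alphabetically


def precision_recall(predicted, validated):
    """Precision = TP / predicted; recall = TP / experimentally validated."""
    tp = len(set(predicted) & set(validated))
    return tp / len(predicted), tp / len(validated)


# Hypothetical ranked edge lists from three inference methods.
method_1 = [("TF1", "g1"), ("TF1", "g2"), ("TF2", "g3")]
method_2 = [("TF1", "g2"), ("TF1", "g1"), ("TF3", "g4")]
method_3 = [("TF1", "g1"), ("TF2", "g3"), ("TF1", "g2")]
cons = consensus([method_1, method_2, method_3])

validated = {("TF1", "g1"), ("TF2", "g3"), ("TF5", "g9")}
precision, recall = precision_recall(cons[:3], validated)
```

Taking the top three consensus edges recovers two of the three validated interactions, giving a precision and recall of 2/3 each.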
Time series expression data (preferably high-resolution) can also be used to generate causal regulatory networks and specific regulatory predictions (Figure 1b). A time series of this kind from Arabidopsis infected with the fungal pathogen Botrytis cinerea was used to construct a directed network using a dynamic Bayesian approach with putative regulator and target nodes [23]. Although the network nodes represented mean expression profiles of co-expressed gene clusters, integrating cluster membership and downstream target gene promoter motifs enabled specific regulatory predictions to be made. The strength of this approach is that it uses a single time series data set and hence, by not relying on large collections of data, is suitable for non-model organisms. A focused expression data set (571 genes in 22 immune-response Arabidopsis mutants) was also used by Sato et al. [24] to construct a regulatory network in which each node represents a mutant genotype and edges reflect a shared regulatory influence on the expression of defence-related genes. Although this network does not predict causality or enable discovery of novel defence-related genes, it did highlight the preponderance of negative interactions between signaling sectors and predicted many known regulatory relationships. Furthermore, unlike most current modelling of plant defence signaling, the network components do not need to be transcriptionally regulated; it is their effects on expression that are assessed.

Organising principles of the plant defence response
Several recent studies have addressed questions about the organising principles of signaling networks during plant defence rather than attempting to predict specific gene-gene regulatory mechanisms. Initial work by Tsuda et al. [25] used defence signaling mutants to determine the contribution of four different signaling sectors to defence outputs (pattern-triggered immunity (PTI), effector-triggered immunity (ETI) and defence against necrotrophic pathogens). This approach demonstrated a positive contribution of every signaling sector tested (salicylic acid (SA), jasmonic acid (JA), ethylene (ET) and phytoalexin-deficient 4 (PAD4)) to the plant immune response, challenging two dogmas: that JA and SA interact antagonistically, and that JA and SA signaling are more important for defence against necrotrophic and biotrophic pathogens, respectively. The modelling also predicted mainly synergistic relationships between signaling sectors during PTI and compensatory relationships during ETI. Recently, an extension of this analysis revealed how robustness (resistance to perturbation) is conferred and balanced against tunability (adaptation to different pathogens) [26]. Elicitors were used to stimulate the defence network in the same signaling mutants, and the flow of information through the network was assessed by expression of signaling-sector marker genes at two time points. Network output was the impact on subsequent pathogen infection. A multiple regression model used network stimulation and flow as explanatory variables for output and highlighted novel regulatory interactions (Figure 1c), some of which were experimentally verified. JA signaling positively affected SA signaling, and the PAD4 sector directly affected resistance, whereas the effect of SA was indirect, via PAD4. Most strikingly, the ET sector exhibited only negative interactions with other sectors, and reciprocal inhibition of the ET and JA sectors was essential for network robustness.
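The multiple regression step can be illustrated with ordinary least squares on a small, noise-free synthetic data set. The explanatory variables here (marker-gene expression for two hypothetical sectors, labelled `sa_marker` and `pad4_marker`) and the output (log pathogen growth) are assumptions chosen for illustration; the model in [26] included more sectors and both stimulation and flow terms. The solver uses the normal equations with Gauss-Jordan elimination so that it needs only the standard library.

```python
def ols(X, y):
    """Least-squares coefficients solving (X^T X) b = X^T y by Gauss-Jordan elimination.
    Each row of X starts with 1.0 so that b[0] is the intercept."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data: columns are intercept, sa_marker, pad4_marker; y is log pathogen growth.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
y = [3.0, 2.0, 1.0, 0.0, -1.0]  # generated exactly as 3 - 1*sa_marker - 2*pad4_marker
coefficients = ols(X, y)
```

The fit recovers the generating coefficients. A negative coefficient means that higher marker expression is associated with less pathogen growth, that is, with greater resistance.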
This work has given us greater understanding of signaling mechanisms and dynamics within the defence network and paves the way for predicting resistance phenotype from a minimal set of input information. The beauty of this approach is that it uses complex rather than large-scale data, an elegant example of how multifactorial experimental design is crucial for the predictive ability and power of a model.
A novel machine learning approach that uses network structure to guide classification, the network-guided forest (NGF), has also been used to probe the topology of PTI and ETI networks [27] (Figure 1d). A challenge with machine learning approaches is the requirement for gold-standard training data, and the strength of the method is largely determined by the quality and applicability of these data. Machine learning has been used in the past to investigate plant stress but did not always yield highly predictive network models [28]. The NGF approach used a TF-focused functional interaction network, combining protein-protein interactions, protein-DNA interactions, co-expression and co-chromatin modification data, as the classification gold standard to learn the PTI and ETI networks from appropriate expression data. Both network nodes and edges could contribute to the classifier, and the degree of each contribution was scored. Approximately 50% of the top-ranked regulators in these networks had known roles in immunity, suggesting accurate prediction. Crucially, a large number of network modules were involved in determining the outcome (PTI, ETI or control), and interactions with significant influence were enriched for intra-module interactions, reminiscent of earlier predictions [24]. As observed previously [25], network topologies suggested that PTI network components function synergistically, potentially driving an effective immune response whilst ensuring a threshold for activation (for example, by requiring consensus input from multiple signals), whereas ETI network components are more sparsely connected, with compensatory relationships. This makes the ETI response robust to perturbation, perhaps necessary given the presence of multiple pathogen effectors in a cell.
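The distinction between synergistic and compensatory relationships can be made concrete for two sectors with a standard two-factor interaction term, computed from immunity levels in wild type, each single mutant and the double mutant. This definition is an assumption for illustration; the published analyses covered four sectors, and the immunity values below are invented.

```python
def sector_interaction(wild_type, mut_a, mut_b, double):
    """Two-factor interaction: immunity with both sectors present minus the sum of what each
    sector contributes alone (relative to the double mutant).
    Equivalent to wild_type - mut_a - mut_b + double."""
    a_alone = mut_b - double  # the mut_b plant retains only sector A
    b_alone = mut_a - double  # the mut_a plant retains only sector B
    return (wild_type - double) - (a_alone + b_alone)


def classify(wild_type, mut_a, mut_b, double, tolerance=1e-9):
    """Label a pair of sectors from immunity levels (hypothetical units)."""
    ab = sector_interaction(wild_type, mut_a, mut_b, double)
    if ab > tolerance:
        return "synergistic"   # together the sectors give more immunity than their separate effects summed
    if ab < -tolerance:
        return "compensatory"  # either sector alone restores most immunity, so losing one costs little
    return "additive"


pti_like = classify(10, 1, 1, 0)   # each single mutant loses almost all immunity
eti_like = classify(10, 9, 9, 0)   # each single mutant keeps 90% of immunity
additive = classify(10, 5, 5, 0)
```

In the compensatory case only the double mutant is strongly affected, the kind of robustness to single perturbations attributed above to ETI.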

Harnessing the power of network inference for crop improvement
There is a growing realization in the plant systems biology community of the need to develop suitable methodologies to efficiently transfer knowledge gained from high-throughput studies to the field. The guilt-by-association approach has shown remarkable potential for identifying functionally related genes, not only in Arabidopsis [6] but also in rice [29]. RiceNet combined the power of orthologue prediction (via AraNet) with additional rice-specific data to produce a functional association network with greater predictive ability for rice. Researchers queried RiceNet using a set of genes known to play a role in defence against Xanthomonas oryzae mediated by the resistance gene Xa21. After further prioritisation of proteins that interact with a component of the Xa21 interactome, three out of five genes tested exhibited a role in Xa21-mediated resistance. Recently, Lee and colleagues [30] have updated the Arabidopsis functional relevance network with additional data, new data types (Arabidopsis high-throughput protein-protein interaction data) and new methodology (gene co-citation). Furthermore, the network has been extended to enable queries (via orthology) for 27 non-model plant species. This updated network (AraNet v2) appears to have significantly better predictive power for non-model plants than the original AraNet, facilitating gene discovery in plants of agricultural significance. There are a number of ways to generate functional networks for crop species, including orthology, co-expression data from the crop, and genomic context similarity [31]. Orthology can be problematic in plants, but incorporating at least some expression data from the crop can increase the value of such networks.
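Transferring a functional network to a crop via orthology amounts to projecting each reference-species edge onto the crop genes orthologous to its two endpoints. The sketch below assumes a precomputed orthologue table, which may map one Arabidopsis gene to several crop paralogues or to none. Gene identifiers are hypothetical, and the sketch does not reproduce how RiceNet or AraNet v2 were built.

```python
from itertools import product


def transfer_edges(reference_edges, orthologues):
    """Project undirected reference-species edges onto crop genes via an orthologue table."""
    crop_edges = set()
    for a, b in reference_edges:
        for crop_a, crop_b in product(orthologues.get(a, []), orthologues.get(b, [])):
            if crop_a != crop_b:
                crop_edges.add(tuple(sorted((crop_a, crop_b))))  # store undirected edges canonically
    return crop_edges


reference_edges = [("At1", "At2"), ("At2", "At3")]
orthologues = {"At1": ["Os1"], "At2": ["Os2a", "Os2b"]}  # At3 has no crop orthologue
crop_edges = transfer_edges(reference_edges, orthologues)
```

One reference edge becomes two crop edges because `At2` has two paralogues, and the second edge is lost because `At3` has no orthologue. Edges supported only by orthology could then be reweighted using crop expression data, in line with the suggestion above that crop-specific data add value.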
As seen in this review, the majority of defence modelling to date has focused on transcriptional networks; the pertinence of transcriptional regulators to crop breeding is illustrated by a comparative network study of maize and its ancestor teosinte [32]. Co-expression networks generated from profiling multiple maize and teosinte genotypes indicated that extensive transcriptional network rewiring has driven the astounding phenotypic changes during domestication of this globally important crop.

Conclusion
We have highlighted recent developments in modelling of the plant defence response. Studies in this area so far have concentrated on transcriptional regulation, using modelling methods to predict gene function, identify specific regulatory relationships and uncover the importance of network topology in governing the immune response. Moving forward, there is a need to integrate transcriptional models with non-transcriptional signaling events, perhaps with more mechanistic models. We also need to generate data and methodology to drive network modelling in non-model plants. Transcriptional rewiring has played a substantial role in the evolution of modern crop cultivars, and the importance of the transcriptome in plant defence suggests that similar manipulations could lead to enhanced resistance in the field.
screens for novel Arabidopsis thaliana abiotic stress genes. Plant Biotechnol J 2015, 13:501-513. This paper reveals how simple co-expression networks, when handled correctly, can be used to identify key components in the network with a high degree of confidence.