Updated Rice Kinase Database RKD 2.0: enabling transcriptome and functional analysis of rice kinase genes

Background Protein kinases catalyze the transfer of a phosphate moiety from a phosphate donor to the substrate molecule, thus playing critical roles in cell signaling and metabolism. Although plant genomes contain more than 1000 genes that encode kinases, knowledge is limited about the function of each of these kinases. A major obstacle that hinders progress towards kinase characterization is functional redundancy. To address this challenge, we previously developed the rice kinase database (RKD) that integrated omics-scale data within a phylogenetics context. Results An updated version of rice kinase database (RKD) that contains metadata derived from NCBI GEO expression datasets has been developed. RKD 2.0 facilitates in-depth transcriptomic analyses of kinase-encoding genes in diverse rice tissues and in response to biotic and abiotic stresses and hormone treatments. We identified 261 kinases specifically expressed in particular tissues, 130 that are significantly up- regulated in response to biotic stress, 296 in response to abiotic stress, and 260 in response to hormones. Based on this update and Pearson correlation coefficient (PCC) analysis, we estimated that 19 out of 26 genes characterized through loss-of-function studies confer dominant functions. These were selected because they either had paralogous members with PCC values of <0.5 or had no paralog. Conclusion Compared with the previous version of RKD, RKD 2.0 enables more effective estimations of functional redundancy or dominance because it uses comprehensive expression profiles rather than individual profiles. The integrated analysis of RKD with PCC establishes a single platform for researchers to select rice kinases for functional analyses. Electronic supplementary material The online version of this article (doi:10.1186/s12284-016-0106-5) contains supplementary material, which is available to authorized users.


Background
Protein kinases are involved in diverse cellular and biological processes. Elucidation of their roles is limited; functions for only 61 kinases in rice (Oryza sativa) have been ascribed (Yamamoto et al. 2012). This subset represents approximately 4.1 % of all kinase genes in that genome. Since sequencing of the entire rice genome was completed in 2005 (IRGSP 2005), diverse omics datasets have been generated that include genome-wide expression analysis using microarrays as well as RNA-seq, re-sequencing of 3000 rice accessions, protein-protein interactomes, gene regulatory networks, and metabolic pathways. Because these databases use very different platforms for formatting, it has been a challenge to integrate publicly available datasets to facilitate functional genomics studies (Chandran and Jung 2014;Alexandrov et al. 2015).
Functional redundancy is a significant obstacle to the identification of gene functions. Completed whole-genome sequences enable researchers to estimate the extent of functional redundancy due to gene duplications. From 20 to 60 % of the rice genome is tandemly or segmentally duplicated (Itoh et al. 2007;Lin et al. 2008). In addition, 21,403 proteins in rice have been assigned to 3856 paralogous protein family groups, thus demonstrating that more than 50 % of all rice genes annotated to non-transposable element (non-TE) gene models have functional redundancy because of their paralogous relationships (Lin et al. 2008). However, not all paralogous genes have such redundancy. Knock-out mutant analysis has identified light responsedefective phenotypes in T-DNA insertional mutant lines for nine gene family members that are predominantly expressed under high illumination ). In addition, 79 of 127 ubiquitously expressed genes which have been functionally characterized through loss-offunction studies belong to gene families (Jung et al. 2015). In both cases, functional dominance of the characterized genes within a particular family has been readily estimated through phylogenomics analyses that integrate metadata from diverse expression datasets within a phylogenetics context (Jung et al. 2015).
To date, six phylogenomics databases have been developed for rice. These include kinase, glycosyltransferase, glycoside hydrolase, transcription factor, transporter, and cytochrome p450 databases (Jung et al. 2015). Rice kinase database (RKD) was the first of these to be established, and it has provided a basic framework for all others (Dardick et al. 2007;Jung et al. 2010). Application of RKD in facilitating functional genomic studies was previously demonstrated. For example, phylogenomics approach played a major role in the identification of a set of MAPK, MAPKK, and MAPKKK genes that are part of same signaling cascades, and in predicting the regulatory model of a light-inducible kinase gene (Jung et al. 2010).
One unique feature of RKD compared with other databases is an option to integrate protein-protein interaction network data based on yeast two-hybrid or tandem affinity purification tagging analyses (Dardick et al. 2007). A large set of this integrated transcriptomic data, retrieved from the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO; www.ncbi.nlm.nih.gov/ geo/. accessed on November 21, 2015), is a main provider of diverse biological information about interesting kinase genes for further functional studies Jung et al. 2010). The current format of those databases for integrating transcriptome data has proven cumbersome because users must mine data from a list of datasets within NCBI GEO. However, Genevestigator, RiceXPro, and Rice Oligonucleotide Array Database (ROAD; www.ricearray.org/, accessed on December 2, 2015) have recently integrated these meta-expression data through classification and reconstruction to enhance accessibility (Zimmermann et al. 2008;Cao et al. 2012;Sato et al. 2013).
Here, we describe RKD 2.0, an updated RKD that integrates metadata from genes expressed in particular tissues and in response to abiotic/biotic stresses and hormone treatments. Using these data, a genome-wide meta-analysis of expression patterns for all predicted rice kinases have been performed and identified kinaseencoding genes with distinct expression patterns. Also, steps involved in the construction and application of RKD 2.0 has been discussed.

Results and discussion
Updated features of RKD 2.0 The earlier version, RKD, presented phylogenomics data for 1508 kinases comprising 65 subfamilies, and a group of kinase genes that could not be assigned to any of those subfamilies (Dardick et al. 2007;Jung et al. 2010). Because that version utilized all gene models (splicing variants), the integrated transcriptome information showed redundant expression data for multiple models produced from a particular locus. However, for the current version, we selected one representative model per locus based on the information available from rice genome annotation project (RGAP) database ) that could provide a simplified and comprehensive visualization of the integrated data. Likewise, instead of taking individual datasets from NCBI GEO (Barrett et al. 2011), Affymetrixbased anatomical metadata (ROAD; http://www.ricear ray.org/expression/experiment_search.php, accessed on December 2, 2015), used in ROAD (Cao et al. 2012) have been integrated within the phylogenic tree context. Furthermore, meta-analysis with diverse datasets of genes that are expressed in response to biotic and abiotic stresses as well as hormones treatment was performed. Unlike the average normalized intensity data used for an anatomical meta-expression database, log 2 fold-change data for treatment versus control have been introduced. Differential expression in response to five pathogens (Magnaporthe grisea, MG; Magnaporthe oryzae, MO; rice stripe virus, RSV; Xanthomonas oryzae pv. oryzae, Xoo; and the brown plant hopper, BPH), in addition to five abiotic stresses (drought, salt, cold, heat, and submergence) and six hormones (abscisic acid, ABA; brassinolide, BL; gibberellic acid, GA; indole-3-acetic acid, IAA; trans-zeatin, tZ; and jasmonic acid, JA) were also analyzed. The results are presented in Additional files 1, 2, 3, 4 and 5: Tables S1-S5. The expression data used for metaanalysis from anatomical tissues and in stress responses were based on Affymetrix array data downloaded from the NCBI GEO (Cao et al. 2012;Zimmermann et al. 2008). For hormone responses, those expression data were based on Agilent 44 K arrays (GSE39429) (NCBI GEO; http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE39429, accessed on November 21, 2015) (Sato et al. 2013). As an alternative platform for meta-analysis of microarray data, RNASeq data of anatomical and abiotic stress treated samples were integrated to the database (Secco et al. 2013;Wang et al. 2015). All of these updated   (Yamamoto et al. 2012), 61 RGAP loci related to kinases that have been functionally characterized were retrieved. Among them, function of a subset of kinase genes were related to more than one trait. Thus, 61 kinases are associated with 124 traits. Although these genes were broadly classified as conferring morphological traits (35 genes), physiological traits (27), stress responses resistance (59), or other biological functions (3) ( Table 1), 63 of them are associated with multiple functions. Among those for morphological traits, 10 genes each target for culm leaves and dwarfism. Another three genes each are related to panicle/flower formation and seeds, four to roots and five to seedling development. The major physiological effects of rice mutants lead to sterility (6 genes), indicating that the genes in this category are important for fertilization and embryo development. The physiological-trait classification also includes four genes related to source activity, six for flowering, two for eating quality, and one for seed production. In that group, mutations of three genes are attributed to germination dormancy. In addition to the panicle-dependent effect, physiological effects can be observed in the roots (two genes) and culm leaves (one gene). Mutation of one gene causes lethality. The stress-response classification is subdivided into genes that function in biotic stresses, abiotic stresses, and resistance to mechanical stresses such as lodging. With regard to biotic stresses, 15 genes have roles in blast resistance, eight including Xa21 gene involved in bacterial blight resistance (Song et al. 1995) , function of a gene associated with insect resistance and one in resistance to other diseases. For abiotic stresses, eight genes are Fig. 1 Heatmap analysis of kinase genes showing anatomical tissue-preferred expression pattern. Gene expression pattern reveals five tissuepreferential groups and the total genes in each group is shown in parenthesis. Affymetrix dataset includes the gene expression profiles from both indica and japonica subspecies plant samples. Sample profiling from subspecies are represented using black and grey bars for indica and japonica respectively. Log2-transformed intensity values were taken for generating heatmap and the scale is given above the heatmap. Genes with red arrow pointing the heatmap are the known genes associated with drought, 15 with salinity, six with chilling, and another five with other types of stresses. Analysis of previously characterized genes indicates that kinases plays diverse role in physiology and growth of rice plants.
Functional assignment of rice kinase genes using expression data To assess the biological functions of rice kinases, metaexpression profiles generated from a large collection of microarray-based expression datasets available in NCBI GEO have been used. Global analysis with anatomical data suggested that 55 kinases are predominantly expressed in the roots, 41 in the leaves/shoots, 58 in the callus/panicles, 36 in anthers/pollen and 71 genes are ubiquitously expressed ( Fig. 1; Additional file 1: Table S1). Of these, functions for 16 genes have been elucidated through genetics and molecular studies (Table 1). For example, LOC_Os05g40770 is categorized under root preferred featured anatomical expression group in the current investigation. Previous study revealed that OsRPK1/ LOC_Os05g40770 is expressed in the root tips and encodes a Ca 2 + -independent leucine-rich-repeat (LRR) receptor-like kinase (RLK). Transgenic rice plants overexpressing OsRPK1 showed undeveloped adventitious and lateral roots (Zou et al. 2014). Biotic stress-related meta-expression data revealed significant upregulation in response to MG (29 kinases), MO (21 kinases), RSV (13 kinases), Xoo (18 kinases), and BPH (49 kinases) ( Fig. 2; Additional file 2: Table S2). Of these, functions have been elucidated for eight genes (Table 1). OsACDR1/LOC_Os03g06410, for which present meta-analysis revealed induction under BPH infection, encodes a putative Raf-like mitogen-activated protein kinase kinase kinase (MAPKKK) and plays a positive regulatory role against fungal infection by modulating defense related gene expression (Kim et al. 2009).
Abiotic stress-related meta-expression data showed that 43 kinases are induced by drought or salt stress, 166 by low temperature, 44 by heat, and 43 by submergence ( Fig. 3; Additional file 3: Table S3). Of these, functions of 23 kinase genes have been elucidated (Table 1). They include calcium-dependent protein kinase OsCDPK7/ Fig. 2 Heatmap analysis of expression patterns for kinase genes related to biotic-stress responses. Biotic stress responsive kinase genes heatmap is generated using log2-transformed fold change values under four stress conditions. Number of stress responsive genes in each condition is shown inside parenthesis after the applied biotic stress. Genes with red arrow pointing the heatmap are the known genes LOC_Os04g49510 which acts as positive regulator for stress conditions and confers cold, salt and drought resistance on overexpression (Saijo et al. 2000). Interestingly, we also noted the cold depend induction of OsCDPK7 in the abiotic stress dataset based analysis and provides a hint about more potential candidates in the featured groups. Furthermore, the genes that are differentially expressed (log2 fold change >2 and P < 0.05) in root under heavy metal treatment were analyzed to reveal the potential kinase genes involved in soil toxicity. It was revealed that 21 genes were upregulated whereas expressions of 92 genes were declined under arsenic treatment in soil. Similarly, 44 and 79 genes are induced and declined by cadmium treatment, respectively. More numbers of kinase genes were responsive to chromium: expressions of 142 genes are induced; and that of 206 genes were reduced. In case of lead treatment in soil, 36 genes are upregulated and 54 genes are downregulated (Additional file 4: Table S4).
The meta-expression data for hormone responses indicated that 67 kinases are up-regulated upon ABA treatment, 27 by IAA, 32 by tZ, and 134 by JA treatment ( Fig. 4; Additional file 5: Table S5). Among them, functions of 18 genes have been characterized previously.
Based on the featured expression data, it is evident that these 947 kinases may serve as potential primary targets for further functional studies, with 261 related to anatomy; 130, to biotic stresses; 296, to abiotic stresses; and 260, to hormone responses.

Evaluation of functional redundancy or functional dominance for rice kinases with known functions via PCC analysis
Regarding morphological or physiological traits, the roles played by 26 kinases have been identified through loss-of-function studies. Pearson correlation coefficient (PCC) can be used to measure the linear dependence between two genes, with a value of '1' representing a perfect, positive correlation. Among paralogous kinase members, PCC values are useful indicators when evaluating functional similarities. PCC values for one to three family members closely linked to each kinase have been generated based on anatomical meta-expression Fig. 3 Heatmap analysis of kinase genes related to abiotic-stress responses. Abiotic stress responsive kinase genes heatmap is generated using log2-transformed fold change values under four stress conditions. Number of stress responsive genes in each condition is shown inside parenthesis after the applied abiotic stress. Genes with red arrow pointing the heatmap are the known genes data. These data, summarized in Table 2 and Fig. 5, indicated that 10 kinases exist independently from a clade and another nine have no closely linked members with PCC values >0.5. This explains functional dominancy for the 73 % of kinases with known functions. Therefore, the PCC approach can provide useful estimates of functional redundancy or dominance among family members. In addition, integrating anatomical expression data within the phylogenetics context serves as an effective platform for examining functional dormancy or redundancy within a gene family. Examples of integrated anatomical expression data for subgroups of three kinase families are shown in Fig. 6. Compared with other members, the dominant expression of OsCDPK1 (LOC_Os03g03660) has a PCC value of 0 to 0.5 (Fig. 6a) and predominant expression of OsBAK1 (LOC_Os08g07760) produces a PCC value below zero (Fig. 6B), while OsABC1-2 (LOC_Os02g36570) has redundant expression patterns with PCC values above 0.5 with LCO_Os09g07660 (Fig. 6c). In the case of OsABC1-2, functional studies have used RNAi to repress the expression of OsABC1-2 to overcome the functional redundancy. The observed phenotype might have been caused by suppression of multiple targets closely linked with LOC_Os09g07660 in that tree. Further investigation is needed to clarify this possibility.

Conclusions
Of the 1508 kinases featured in RKD 2.0, functions for all except 61 kinases require further examination. This new database will enable researchers to select target kinases more effectively for additional characterization. The predominant expression found among paralogous kinases might be a valuable indicator of functional dominance. Based on loss-of-function analyses and application of PCC values between a kinase and its paralog, we were able to explain the dominancy of 19 kinases with PCC values above 0.5 as well as those that form independent clades in a phylogenetic tree. Primary candidates for loss-of-function studies can be chosen by identifying kinase genes with dominant expression and then comparing them with closely linked family members. These metaexpression data also serve as a source of information about the location and/or conditions that can be applied Fig. 4 Heatmap analysis of expression patterns for kinase genes in response to hormone treatments. Hormone responsive kinase genes are generated from Agilent log2-transformed fold change hormone dataset, which includes responses from six major hormone treatments at different time intervals in root and shoot. Group of genes induced under specific hormones treatment along with their total number is indicated in parenthesis. Genes with red arrow pointing the heatmap are the known genes in subsequent assessments as well as situations in which true functional redundancy may exist. Additional datasets, such as predicted protein-protein interaction networks, further enhance database utility by indicating direct targets regulated by kinases. With the use of gene-indexed mutants that cover at least half of the rice genome, accumulation of diverse omics data will further enhance the functionality of such phylogenomics databases.

Collection of microarray data
Anatomy: Expression data for the anatomical tissues/ organs were retrieved from ROAD (Cao et al. 2012).
Abiotic stress: The dataset for abiotic stresses comprised multiple samples from NCBI GEO under the series GSE6901, GSE16108, GSE21651, GSE24048, GSE25176, GSE26280, GSE33204, GSE37940, and  Indicates paralogs of rice kinase genes in gene column which are identified from RKD 2.0 databases c Indicates Pierson correlation coefficient (PCC) value between paralog pairs in gene and paralog columns d Indicates rice kinase genes (marked with red letters) used in Figure 6. Brown box has 0.5 > PCC value and is estimated to have redundant roles among paralogs; green one with negative PCC value and uncolored area with 0 < PCC value < 0.5 are estimated to have predominant roles among paralogs  Fig. 6 Phylogenomics data for three kinase subfamilies that include members with known functions. Integrating anatomical expression data within context of phylogenic tree suggested functional dominancy of OsCDPK1 (LOC_Os03g03660) with PCC value <0.379 (a), OsBAK1 (LOC_Os08g07760) with PC value of −0.241 (b), and functional redundancy of OsABC1-2 (LOC_Os02g36570) with PCC value of 0.794 = (c). Red boxes indicate pairs of kinases used for PCC analysis, gray color indicates the genes without probes in affymetrix array platform, and asterisks (*) indicate the selected kinases GSE38023. Experiment IDs in the Array express dataset (Array Express; https://www.ebi.ac.uk/arrayexpress/. Accessed 2 December 2015) included E-MEXP-2267and E-MEXP-2401. Altogether, 145 individual experiments were parsed for this analysis.
Biotic stress: The GEO series for the biotic stress database involved GSE7256, GSE30941, GSE41798, GSE18361, GSE36272, GSE29967, and GSE11025. In all, 103 individual samples were analyzed under these seven series.
Hormone response: For analyzing hormone-related meta-expression, we relied upon the Agilent 44 K array data GSE39429, as generated by Sato et al. (2013).
Collection of RNA-Seq data RNA-Seq anatomical expression dataset consisting of 27 tissues/organs were retrieved from source data developed by Wang et al. (2015). In addition, spatio-temporal transcriptome data of rice root in response to phosphate starvation and recovery were used (Secco et al. 2013).

PCC analysis
An anatomy dataset was used for estimating PCC values among neighboring genes within a subclade. For each gene, the intensity level of the RGAP gene model probes was fetched and formatted in pairs using in-house scripts. For every pair of genes that showed expression across a range of experiments, a PCC value was calculated using the Microsoft Excel function.

Heatmap analysis
To examine the expression patterns of kinase genes in various rice tissues/organs as well as in response to abiotic/biotic stresses or hormone treatments, we used a meta-expression analysis based on 1150 Affymetrix array data plus GEO series GSE39429, which utilized the Agilent 44 K microarray platform (GEO platform GPL6854). We then uploaded the log 2 normalized intensity data (tab-delimited text format) into Multi Experiment Viewer (MEV; http://www.tm4.org/, accessed on October 20, 2015) and created the desired heatmaps ( Fig. 1; Additional file 1: Table S1). In addition, we developed a meta-expression database for biotic stress responses to MG, MO, RSV, Xoo, and BPH ( Fig. 2; Additional file 2: Table S2), a meta-expression database for responses to drought, salt, cold, heat, and submergence ( Fig. 3; Additional file 3: Table S3), and a meta-expression database for responses to six hormones (ABA, GA, IAA, BL, JA, and cytokinin/trans-Zeatin) ( Fig. 4; Additional file 5: Table S5). Expression data for selected kinases were presented by heatmaps. For other methods used for construction of RKD 2.0, RKD original version retains the details (Dardick et al. 2007).