Co-expression clustering across flower development identifies modules for diverse floral forms in Achimenes (Gesneriaceae)

Background Genetic pathways involved with flower color and shape are thought to play an important role in the development of flowers associated with different pollination syndromes, such as those associated with bee, butterfly, or hummingbird pollination. Because pollination syndromes are complex traits that are orchestrated by multiple genes and pathways, the gene regulatory networks have not been explored. Gene co-expression networks provide a systems level approach to identify important contributors to floral diversification. Methods RNA-sequencing was used to assay gene expression across two stages of flower development (an early bud and an intermediate stage) in 10 species of Achimenes (Gesneriaceae). Two stage-specific co-expression networks were created from 9,503 orthologs and analyzed to identify module hubs and the network periphery. Module association with bee, butterfly, and hummingbird pollination syndromes was tested using phylogenetic mixed models. The relationship between network connectivity and evolutionary rates (dN/dS) was tested using linear models. Results Networks contained 65 and 62 modules that were largely preserved between developmental stages and contained few stage-specific modules. Over a third of the modules in both networks were associated with flower color, shape, and pollination syndrome. Within these modules, several hub nodes were identified that related to the production of anthocyanin and carotenoid pigments and the development of flower shape. Evolutionary rates were decreased in highly connected genes and elevated in peripheral genes. Discussion This study aids in the understanding of the genetic architecture and network properties underlying the development of floral form and provides valuable candidate modules and genes for future studies.

Each stage was sampled with two or three biological replicates, contributing four or six 136 samples for each species. The bud stage was chosen to represent a baseline level of gene 137 expression when floral phenotypes are largely similar between species that could then be 138 compared to the intermediate stage where floral phenotypes diverge and many important 139 pathways involved in flower development and pigmentation show increased expression levels 140 (Roberts and Roalson 2017). All tissues were immediately frozen into liquid nitrogen and stored 141 at -80°C. 142 143 RNA extraction and RNA-seq library construction 144 Total RNA was extracted from frozen tissue using the RNEasy Plant Kit (Qiagen) and 145 ribosomal depleted RNA samples were prepared using the RiboMinus Plant Kit (Thermo Fisher 146 Scientific). Stranded libraries were prepared using the NEBNext Ultra Directional RNA Library 147 Kit (New England Biolabs), barcoded, and pooled based on nanomolar concentrations. Library 148 quality and quantity were assessed using the Qubit (dsDNA HS assay; Thermo Fisher Scientific) 149 and BioAnalyzer 2100 (Agilent). The library pool was sequenced across 4 lanes of an Illumina 150 HiSeq2500 for paired-end 101 bp reads at the Genomics Core Lab at Washington State 151 University, Spokane. All new sequence data were deposited in the NCBI Sequence Read Archive 152 (BioProject: PRJNA435759). 199 The resulting counts for each contig were matched with the inferred orthogroups derived above, 200 creating two separate matrices for the Bud and D stage samples. In the resulting gene expression 201 matrices, rows corresponded to orthogroups and columns corresponded to individual samples 202 (Supplemental Data). Counts for the Bud and D stage samples were separately transformed using 203 variance-stabilizing transformation implemented in the Bioconductor package DESeq2 (Love et 204 al. 2014) and quantile normalized using the Bioconductor package preprocessCore (Bolstad 205 2018). Lastly, counts for both stages were separately corrected for confounding effects, such as 206 batch or species effects, using principal component regression with the 'sva_network' function in 207 the Bioconductor package sva (Parsana et al. 2019). 208 209 Co-expression networks and identification of modules 210 The Bud and D stage RNAseq data was clustered into gene co-expression networks using 211 the R package WGCNA (Langfelder and Horvath 2008). Prior to network construction, 212 orthogroups were filtered to remove any with >50% missing values or <0.3 expression variance. 213 In total, 9503 orthogroups common to both Bud and D stage data were used to construct separate 214 networks for each stage. Parameters for the Bud stage network were as follows: power = 18, 215 networkType = "signed", corType = "bicor", maxPOutliers = 0.05, TOMType = "signed", 216 deepSplit = 2, mergeCutHeight = 0.25. For the D stage network, the following parameters were 217 used: power = 16, networkType = "signed", corType = "bicor", maxPOutliers = 0.05, TOMType 218 = "signed", deepSplit = 2, mergeCutHeight = 0.25. Soft-thresholding powers for each network 219 ("power" parameter) were chosen as the lowest value such that the scale free topology model fit 220 (R^2) was 0.9 (Supplemental Data). Signed networks consider positively correlated nodes ≥ 221 connected, while unsigned networks consider both positively and negatively correlated nodes 222 connected (van Dam et al. 2018). We used signed networks here ("networkType" and 223 "TOMType" parameters) because this type is considered more robust to biological functions 224 with more specific expression patterns (Mason et al. 2009;Song et al. 2012). Unsigned networks 225 can capture only strong correlations and many negative regulatory relationships in biological 226 systems are weak or moderate (Ritchie et al. 2009). Biweight midcorrelation ("corType" 227 parameter) was used as this method is often more robust than Pearson correlation and more 228 powerful than Spearman correlation (Song et al. 2012). All other WGCNA parameters were kept 229 at their default values.

230
Module eigengenes and orthogroup connectivity were calculated in each network 231 separately using the 'moduleEigengenes' and 'intramodularConnectivity' functions in WGCNA, 232 respectively. The module eigengene is defined as the first principal component of a module and 233 represents the gene expression profile of a sample within a module (Langfelder and Horvath 234 2008). Connectivity refers to the sum of connection strengths a node has to other network nodes.

235
Module hub genes (defined as the most highly connected nodes within a module) were 236 identified in both networks based on module membership (kME) scores > 0.9, calculated using 237 the 'signedKME' function of WGCNA. kME is defined as the correlation between the 238 expression levels of a gene and the module eigengene (Horvath and Dong 2008). Additionally, 239 using connectivity measures, we defined peripheral genes in each network as the lowest 10% of 240 connected nodes. 241 We tested whether Bud stage modules were preserved in the D stage and vice versa using 242 module preservation statistics in WGCNA (Langfelder et al. 2011). Median Rank and Zsummary 243 statistics were computed using the 'modulePreservation' function of WGCNA, using 200 244 permutations (Langfelder et al. 2011). Median rank statistics were compared to the "gold" 245 module, which consists of 1000 randomly selected orthogroups. Modules with a lower median 246 rank exhibit stronger preservation than modules with a higher median rank. Zsummary scores > 247 10 indicate strong evidence of preservation and scores < 2 indicate no evidence of preservation. We previously inferred a phylogeny for the 12 sampled species using 1306 single-copy 251 orthologs identified from the same transcriptome dataset used here (Roberts and Roalson 2018). 252 For comparative analyses of module-trait relationships, we randomly sampled 50 single-copy 253 ortholog gene trees and rescaled branch lengths to be proportional to time (ultrametric) using the 254 'chronos' function in the R package ape (Paradis et al. 2004). Bootstrap support was 100 for 255 nearly every branch in the Roberts and Roalson (2018) phylogeny, therefore we chose to use 256 randomly sampled ortholog trees to account for phylogenetic uncertainty.  259 We correlated external traits (e.g., red flower color) with the module eigengenes using a 260 phylogenetic Markov Chain Monte Carlo method, implemented in the R packages MCMCglmm 261 (Hadfield 2010) and mulTree (Guillerme and Healy 2014). These analyses were performed 262 individually for each module in both the Bud and D stage networks. Three floral traits and 263 pollination syndrome were coded: primary flower color (purple, red, white, yellow), flower 264 shape (funnelform, salverform, tubular), corolla spur (absent, present), and syndrome (bee, 265 butterfly, hummingbird). We fit a multivariate mixed model with a non-informative prior, where 266 the shape and scale parameters equal 0.001, and residual variance was fixed. Analyses were 267 performed over the 50 randomly selected ortholog trees to account for phylogenetic uncertainty. 268 A phylogenetic model was used to account for any interspecific non-dependency in the dataset 269 by using a random effects structure that incorporated a phylogenetic tree model. Floral traits 270 were set as categorical response variables and the module eigengenes were set as the predictor 271 variable. For each tree, two MCMC chains were run for 250k generations and discarding the first 272 50k as burn-in. This process was run individually over each module in both the Bud and D 273 stages. We checked for convergence between model chains using the Gelman-Rubin statistic 274 (Gelman and Rubin 1992). Potential scale reduction values were all less than 1.1 and effective 275 sample sizes for all fixed effects were greater than 400. We considered fixed effects to be 276 statistically significant when the probabilities in the 95% credible region did not include zero 277 (Supplemental Figures 3-6).
279 Evolutionary rate analysis 280 Codon alignments of the filtered orthogroups (n=9503) were produced using PAL2NAL 281 v14 (Suyama et al. 2006). Subsequent alignments and corresponding gene trees were used as 282 input to the codeml package in PAML v4.9 (Yang 2007) to estimate d N /d S (omega, ω). d N /d S for 283 each orthogroup was estimated using a one-ratio model (model = 0, NSsites = 0), providing a 284 single estimate for each orthogroup to match other single orthogroup metrics, such as 285 connectivity. If d N /d S values were exceptionally large (= 999) because of zero synonymous 286 differences, those values were removed from the dataset.

287
First, we performed linear regression (LM) to test the effect of orthogroup connectivity, 288 average expression levels, and their interactions (explanatory variables) on the estimated d N /d S 289 values (main effect), LM = d N /d S ~ connectivity * expression. This was performed using the lm 290 function in R with 10000 bootstrap pseudoreplicates. Second, we ran two-sample t-tests with 291 10000 permutations to test whether hub nodes and periphery nodes in each network had lower 292 (for hubs) or higher (for periphery) d N /d S values than all background nodes. Third, we fit a LM 293 with 10000 bootstrap pseudoreplicates to test whether modules associated with a pollination 294 syndrome (bee, butterfly, or hummingbird; see Results below) had increased d N /d S values, LM = 295 d N /d S ~ syndrome. For all LM and permutation analyses, d N /d S , connectivity, and expression 296 values were log transformed to reach a normal distribution before being processed. Analyses 297 were performed separately for each network using the separately estimated connectivity and 298 expression levels.   Twelve transcriptomes were sequenced (72 libraries) and assembled in 12 species after 318 sampling across two floral developmental stages (Supplemental Table 1). The transcriptomes had 399 Likewise, weak evidence for trait association was considered when the MCMCglmm results 400 were significant, but only a single species or no species sharing the significant traits had elevated 401 (> 0.1) or decreased (< -1.0) eigengene expression. WGCNA does not use trait information when 402 inferring modules, so modules that correlated strongly with any floral trait were inferred without 403 a priori knowledge.  5-6). There 413 were 3 (4.6%), 7 (10.8%), and 13 (20%) modules associated with bee, butterfly, and 414 hummingbird pollination syndrome in the Bud stage network, while 1 (1.5%), 8 (12.9%), and 12 415 (19.4%) modules were associated with the same syndromes in the D stage. Modules were never 416 associated with more than one syndrome. Butterfly and hummingbird syndromes were never 417 correlated to the same modules and were often correlated in opposite directions (i.e., butterfly 418 was positive correlated and hummingbird was negative correlated).

419
Among the 3 modules that were correlated to the bee pollination syndrome in the Bud 420 stage were many orthogroups involved in hormone signaling pathways (particularly ethylene and 421 abscisic acid, n = 14), cell wall and lipid biosynthesis (n = 7), photomorphogenesis (n = 2), 422 nitrogen compound metabolism (n = 4), and the transport of potassium and monocarboxylic 423 acids (n = 5) (Table 4). In contrast, the 1 module correlated to the bee pollination syndrome in 424 the D stage was enriched for different functions from the Bud stage, including the positive 425 regulation of molecular functions and response to biotic stimuli (n = 7), phospholipid catalysis (n 426 = 2), xyloglucan biosynthesis (n = 2), mRNA stability and catalysis (n = 5), and chromatin 427 modification (n = 4) ( Table 5).  (Table 5). 436 Thirteen modules were correlated to the hummingbird pollination syndrome in the Bud 437 stage and the orthogroups were enriched for numerous functions, including many developmental 438 processes (Table 4). These modules were enriched for floral whorl development (n = 12), RNA 439 and mRNA modifications (n = 19), cell growth and division (n = 54), organelle assembly (n = 3), 440 transportation of carbohydrates and potassium (n = 22), and responses to external stimuli (n = 441 17) ( Table 4). Many of the same enrichments were seen in the 12 modules correlated to the 442 hummingbird pollination syndrome in the D stage. These modules had enrichments for floral 443 whorl development (n = 14), cell polarity and proliferation (n = 8), adaxial/abaxial specification 444 (n = 8), organ formation (n = 3), the regulation of growth (n = 26), and endosperm development 445 (n = 6). Additionally, these modules had enrichment for RNA and mRNA modification (n = 79), 446 transcription (n = 3), protein-DNA complex formation (n = 7), signaling (n = 25), and UV 447 response (n = 9) ( Table 5).  (Figures 5-6). Red and yellow are associated with hummingbird 452 pollination, purple with butterfly pollination, and white with bee pollination. In the Bud stage, 453 there were 14 (21.5%), 11 (16.9%), and 1 (1.5%) module associated with red, purple, or yellow 454 flower color. In the D stage, no modules were associated with yellow, but 8 (12.9%) and 9 455 (14.5%) modules were associated with red or purple flower color. In both networks, no modules 456 were corelated to white flower color. Similar to syndrome, purple and red flower color were 457 never associated with the same modules and were always correlated in opposite directions.

458
The 14 modules associated with red flowers in the Bud stage overlapped significantly 459 with the modules associated with the hummingbird pollination syndrome in the Bud stage and 460 contained the same functional enrichments (Table 4; Supplemental Table 8). However, in the 8 D 461 stage modules associated with red flowers, the orthogroups were enriched for functions involved 462 in flower morphogenesis (n = 8), cell wall modification (n = 3), positive regulation of gene 463 expression (n = 14), and histone modification (n = 7), among others (Supplemental Table 8).

497
Among the 5 modules associated with tubular flowers in the Bud stage were orthogroups 498 enriched for functions in the positive regulation of transcription and organelle assembly (n = 13), 499 RNA processing (n = 5), xylem development (n = 4), metabolism (n = 12), and many 500 biosynthetic processes (Supplemental Table 8). In the 5 modules associated with tubular flowers 501 in the D stage were enrichments for floral whorl development (n = 6), positive regulation of gene 502 expression (n = 15), mRNA modification (n = 5), and signaling (n = 12) (Supplemental Table 8).

503
Corolla spurs-The presence of corolla spurs is found only on butterfly pollinated 504 species, two of which were included: A. grandiflora and A. patens (Figure 1; Table 1). There was 505 expression in 5 (7.7%) modules associated with spurs in the Bud stage network, while only 1 506 (1.6%) module was associated in the D stage network (Figures 3-4). Within the 5 modules 507 associated with corolla spurs in the Bud stage were orthogroups related to miRNA processing (n 508 = 4), mRNA splicing (n = 9), the regulation of signal transduction pathways (n = 16), and pollen 509 tube growth (n = 6) (Supplemental Table 8). In the single module associated with corolla spurs in 510 the D stage (ME53) were very few orthogroups involved in microtubule organization (n = 2), 511 stamen development (n = 1), the positive regulation of cell division (n = 1), flavonoid 512 biosynthesis (n = 1), and indole acetic acid metabolism (n = 1) (Supplemental Table 8).

513
Non-associated-There were 13 and 9 modules not associated with any traits in the Bud 514 and D stage networks, respectively (Figures 5-6). Three of these modules in the Bud stage were 515 not-preserved in the D stage (modules ME46, ME44, and ME41; Figures 5-6; Supplemental 516 Table 5). The 9 non-associated modules in the D stage were also not associated with any traits in 517 the Bud stage (Figures 5-6) and the orthogroups had overlapping enrichments for many essential 518 processes (Supplemental Table 8). These modules were enriched for orthogroups involved in the 519 development of the inflorescence, stamens, and floral organs, biosynthesis of vitamins and 520 nucleotides, photosynthesis and photosynthetic electron transport, metabolism of phosphate 521 compounds, and cell surface receptor signaling (Supplemental Table 8). Numerous orthogroups 522 were also involved in the positive regulation of reproduction, the negative regulation of growth, 523 and the regulation of heterochronic development, histone methylation, and gibberellic acid 524 mediated signaling (Supplemental Table 8). 525 526 Evolutionary rates correlate to network location 527 We ran several analyses to explore how several network variables may affect 528 evolutionary rates of the orthogroups within each network (as measured by d N /d S ). First, we 529 tested the effects of orthogroup connectivity, expression levels, and their interaction on the d N /d S 530 values (Supplemental Table 7  Second, we tested whether the nodes we defined as hubs in each network had lower d N /d S 541 values, suggestive of increased evolutionary constraint (Supplemental Table 9). Hubs are 542 considered to functionally significant and may control large portions of the network. We found 543 that the hubs in the Bud stage network had lower d N /d S than all background genes (Two-sample 544 t-test, t(679.31) = 7.0934, p = 1.65e-12, Cohen's d = 0.383; Permutation t-test. mean diff. = 545 0.266, p = 0.0001) (Supplemental Figure 8). Similarly, hubs in the D stage network also had 546 lower d N /d S than all background genes (Two-sample t-test, t(755.91) = 4.1918, p = 1.55e-05, 547 Cohen's d = 0.206; Permutation t-test, mean diff. = 0.143, p = 0.0001) (Supplemental Figure 8). 548 Third, we tested whether the most peripheral nodes (the lowest 10% connected nodes) 549 had higher d N /d S values, which may suggest relaxed evolutionary constraint (Supplemental Table  550 Figure 8). 556 Lastly, we tested whether modules correlated with bee, butterfly, or hummingbird 557 pollination syndromes showed increased evolutionary rates over the non-associated modules.  Table 9). Only bee syndrome associated modules had 2 561 marginally lower d N /d S values in both networks (Bud stage, LM, t = -3.58, p = 0.000346; D 562 stage, LM, t = -2.508, p = 0.0122). Butterfly associated modules had marginally lower d N /d S only 563 in the D stage (LM, t = -2.548, p = 0.0108), while hummingbird associated modules did not have 564 any effect in either network. Here we report on patterns of gene co-expression across two stages of flower 568 development in 12 species of Achimenes, Eucodonia, and Gesneria. Our comparative and 569 phylogenetic context allowed us to reveal gene co-expression patterns across a group of closely 570 related plant species that correlated to diverse pollination syndromes and floral traits, including 571 flower color, flower shape, and corolla spurs. Our results suggest floral forms that are associated 572 with bee, butterfly, and hummingbird pollinator groups are correlated with many co-expression 573 modules. We found that nearly a third of modules in each network (23 and 21 modules in the 574 Bud and D stages, respectively) had evidence for association with the three pollination 575 syndromes (Figures 5-6). In our analysis, large fractions of our 9503 orthogroups set could be 576 partitioned into modules (Figure 3), network hubs and peripheral nodes identified (Supplemental 577 Tables 6-7), and several orthogroups were identified for their potential importance in floral 578 phenotypic differentiation. Lastly, orthogroup position in each network (measured by 579 connectivity) had an association with evolutionary rate (d N /d S ) (Figure 7). These analyses 580 provide a first overview of floral transcriptional architecture across members of the same genus 581 that display diverse floral forms.  Hub nodes (genes) are considered to be conserved, highly connected nodes that are 594 central to network architecture and may directly orchestrate numerous biological processes 595 (Ravasz et al. 2002;Hu et al. 2016;Mähler et al. 2017). Our analyses identified over 600 gene 596 nodes as hubs within each network, the majority of which were only hubs within a specific floral 597 stage. These hub nodes tended to have lower d N /d S estimates, suggesting that they not prone to 598 rapid evolutionary change (Supplemental Table 9). Unsurprisingly, hub nodes in the Bud stage Floral forms corresponding to bee, butterfly, and hummingbird pollination have evolved 905 multiple times across the Neotropical plant genus Achimenes. Genome-wide gene expression 906 estimates were taken from flowers in 12 species across two development stages in order to 907 construct, analyze, and compare stage-specific gene co-expression networks. We hypothesized 908 that numerous modules in each network would correlate to these pollination syndromes and that 909 central genes in the network may be candidates for involvement in the development of important 910 floral traits, such as flower color and shape. We found that nearly a third of modules were 911 correlated to the pollination syndromes and many more were correlated to different flower colors 912 and shapes. The outgroup species, E. verticillata (bee pollinated) and G. cuneifolia 913 (hummingbird pollinated), displayed correlation patterns similar to the ingroup species that 914 shared floral traits, suggesting that some co-expression patterns might be shared across 915 evolutionary distances. Several of the hub genes in the networks were homologs of the 916 anthocyanin and carotenoid biosynthetic pathways, important for the production of floral 917 pigments that attract different pollinators. A negative relationship between network connectivity 918 and d N /d S corroborates the findings in model systems that more centrally located nodes (likely 919 hubs) are under increased evolutionary constraint. We found that the less connected genes  Flower color, Flower shape, and Spurs) that were tested for an association to module eigengenes using MCMCglmm. The eigengene expression is indicated for each module in which all three replicates from an individual species had > 1.0 or < -1.0 expression. An association was considered strong when both MCMCglmm results were significant and the eigengene expression was > 1.0 or < -1.0 in multiple species that shared the significant trait.
An association was considered weak when MCMCglmm results were significant but the eigengene expression was > 1.0 or < -1.0 in a single species or no species that shared the significant trait. Eigengene expression for species that share traits to those found significant by MCMCglmm are in bold.

Figure 6
Summary of D stage network module association to floral traits and pollination syndrome Each row represents an individual module in the D stage network and shows the total size (column All), the number of hubs (column Hubs), and whether the module is preserved in the Bud stage. Each column in the central table represents the floral traits (Pollination Syndrome, Flower color, Flower shape, and Spurs) that were tested for an association to module eigengenes. The eigengene expression is indicated for each module in which all three replicates from an individual species had > 1.0 or < -1.0 expression. An association was considered strong when both MCMCglmm results were significant and the eigengene expression was > 1.0 or < -1.0 in multiple species that shared the significant trait. An association was considered weak when MCMCglmm results were significant but the eigengene expression was > 1.0 or < -1.0 in a single species or no species that shared the significant trait. Eigengene expression for species that share traits to those found significant by MCMCglmm are in bold.