Transcription through the eye of a needle: daily and annual cycles of gene expression variation in Douglas-fir needles

Background Perennial growth in plants is the product of interdependent cycles of daily and annual stimuli that induce cycles of growth and dormancy. In conifers, needles are the key perennial organ that integrates daily and seasonal signals from light, temperature, and water availability. To understand the relationship between seasonal rhythms and seasonal gene expression responses in conifers, we examined diurnal and circannual needle mRNA accumulation in Douglas-fir (Pseudotsuga menziesii) needles at diurnal and circannual scales. Using mRNA sequencing, we sampled 6.1×109 microreads from 19 trees and constructed a de novo pan-transcriptome reference that includes 173,882 tree-derived transcripts. Using this reference, we mapped RNA-Seq reads from 179 samples that capture daily, seasonal, and annual variation. Results We identified 12,042 diurnally-cyclic transcripts, 9,299 of which showed homology to annotated genes from other plant genomes, including angiosperm core clock genes. Annual analysis revealed 21,225 an-nually-cyclic transcripts, 17,335 of which showed homology to annotated genes from other plant genomes. The timing of maximum gene expression is associated with light quality at diurnal and photoperiod at annual scales, with two-thirds of transcripts reaching maximum expression +/− 2 hours from sunrise and sunset, and half reaching maximum expression +/− 20 days from winter and summer solstices. Comparison to published microarray-based gene expression studies in spruce (Picea) show that the rank order of expression for 760 putatively orthologous genes was significantly preserved, highlighting the generality of our findings. Conclusions This finding highlights the extensive annual and seasonal transcriptome variability demonstrated in conifer needles. At these temporal scales, 29% of expressed transcripts showed a significant diurnal rhythm, and 58.7% showed a significant circannual rhythm. Remarkably, thousands of genes reach their annual peak activity during winter dormancy, a time of metabolic stasis. Photoperiod appears to be a dominant driver of annual transcription patterns in Douglas-fir, and these results may be general for predicting rhythmic transcription patterns in emerging gymnosperm models.


BACKGROUND
The sensing of daily, seasonal, and annual environmental variation in land plants is accomplished using a diverse array of organs, transcriptional regulators that drive oscillatory functions, and pathways that refine, optimize, and entrain rhythms to rhythmic environmental stimuli. Circadian patterns are ubiquitous in photosynthetic [1] [2][3] [4] and nonphotosynthetic [5] [6] organisms, and they are essential for coordinating external signals for optimally timing transcriptional activity to match the demands of growth and phenology with resource availability. The genetic basis for diurnal responses in model plants have been intensively studied using mutant screens [4][7] [8] and global transcription [7], and these studies have identified components of the central oscillator ("core clock") and genes that are targets of rhythmic activation and repression. Variability in core clock genes alters the timing of the clock and clock-dependent pathways, so subtle changes in genes controlling clock rhythm may contribute to local adaptation on a short time scale, and evolutionary divergence on a longer time scale [1] [9].
In temperate zone trees, circadian rhythms are superimposed on longer annual cycles involving transitions between active growth --the time when light energy is captured and converted into growth --and dormancy, the time when growth potential is arrested to protect cells from seasonal stresses of cold temperatures, freezing and desiccation [10] [11] [12]. Accurately timed transitions between growth and dormancy are essential for adapting to variable environments [13] and they depend on: (1) reliable environmental cues that forecast future change; (2) diverse sensory organs; and (3) competing biochemical networks that integrate sensory signals and shift responses from one 'module' (growth; dormancy; senescence) to the next. Photoperiod and light quality are known to be important cues for initiating seasonal growth rhythms and establishing the onset of dormancy for many trees [11] [12] [14] [15], so pathways involved in light capture and photoperception are expected to show annual rhythmic variation. As dormancy is established, trees sense and respond to cold temperatures (chilling units) by increasing cold hardiness and increasing resistance to desiccation. After chilling thresholds are met, trees respond to warming temperatures (forcing units) by releasing tissues from dormancy, remaining in a 'stand-by' state until requirements for initiating growth (heat, water availability, light quality [10] [14] [16]) are met. Genes implicated in dormancy and resumption of spring growth should also show annual cyclic variation, and these include components of the circadian clock and photoperiod-responsive genes [10] [11] [14] [15], and pathways involved in temperature and water perception [15][17] [18], hormone regulation and cell growth [10] [19], and glucan hydrolysis [19].
The organs responsible for sensing and integrating external stimuli --leaves, shoots, and roots --are common to perennial plants, but signal perception and integration during the dormant season is likely accomplished by different means in conifer trees with perennial leaves ("needles"), versus trees with deciduous leaves. Perennial needles confer advantages by preserving annual investments in carbon fixation, conducting photosynthesis year round [20] [21], offering alternative mechanisms for preventing winter embolisms [22], and providing an environmentally-responsive sensor that adds to budand stem-associated signals during dormancy. Perennial needles also come with fitness trade-offs because they can be damaged by cold during entry into dormancy and during de-acclimation in spring [21] [23], and by photosystem excitation that can lead to the formation of reactive oxygen species during dormancy [21] [24]. Given the complexity involved in orchestrating growth and stasis over annual cycles, annual gene expression variation in conifer needles should show high complexity, especially when compared with the annual leaves characteristic of model trees (Populus L.) and herbs (Arabidopsis Heynh. in Holl & Heynh).
To gain an understanding of the complexity of circadian and circannual rhythms of gene expression in conifers, we examined cyclic gene expression in needles from Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco) at daily and annual scales. Douglasfir is related to model conifers like Norway spruce (Picea abies L.) and Loblolly pine (Pinus taeda L.) through divergence in the Early Cretaceous ~130 MYA [25], and to angiosperms like Populus and Arabidopsis through a more ancient divergence ~300 MYA. Douglas-fir has an expansive native range in North America [26], and it is noteworthy among conifers for its significant population variation in needle cold hardiness, phenology, and growth traits [27] [28]. Some of the strongest associations between provenance source and quantitative traits in Douglas-fir are exhibited by needle traits related to annual cycles of freeze-avoidance, such as spring and fall needle cold hardiness, and rhythmic cues that define the onset of winter like the first winter freeze and variability in the frost free period [23][29] [30].
In this study, we use next-generation mRNA sequencing to produce individual de novo needle transcriptomes from 19 Douglas-fir trees, and use the resulting "pan-transcriptome" as a reference for mapping RNA-seq reads from experiments evaluating diurnal variation over two daily cycles and circannual variation over one annual cycle. We specifically searched for transcripts exhibiting rhythmic expression [31][32] to define the timing of maximum expression ("phase") and amplitude. Our results provide the first characterization of year-round transcriptome-wide activity in leaves from a perennial plant, and they identify a core set of transcripts that show evidence for significant cycling on seasonal and annual scales. Our results show congruence with microarray-based studies of seasonal gene expression in Sitka spruce [33], and they underscore the potential for using gene expression information from Douglas-fir to predict cyclic transcriptome responses, and gene expression patterns responding to different environmental cues in temperate-zone conifers.

Plant materials and sample information.
Trees used for annual analysis are from a larger reciprocal translocation study [18][34] that includes multiple sources of Pseudotsuga menziesii var. menziesii from the Pacific Northwest of North America. Trees were chosen to maximize differences in source climatic and phenology [28]; included are families from regions that derive from cold/wet sources (47 Fig. 1). Needles were collected between 11am and 1 pm (ZT = 05:00 to 07:00) from 13 individual trees. Environmental data (sunrise; sunset; day length; cumulative weekly precipitation; minimum and maximum daily temperature) was collected over the duration of this experiment (File S1; Fig. 1). Sampling intervals and RNA-seq sample sizes for the annual study are summarized in Notes S1. Trees used for diurnal analysis derive from the warm/dry region (43.3° N, -123.1° W, 429 m elevation) and were grown in Corvallis, Oregon, USA (42.3° N, -122.9° W, 74 m elevation; supplemental notes S1). Needles were collected from 6 individual three-year old trees three half-sib families, two sibs per family). Needles were collected at 4 hour intervals, starting at 2 AM, for a total of 48 hours in early fall (September 7,8). For this experiment, sunrise (ZT0) occurred at 06:44 AM, and sunset occurred at 19:34 PM, giving a 12:50 photoperiod. Sampling intervals and RNA-seq sample sizes for the diurnal study are summarized in Notes S1.
Indexed Illumina RNA-seq libraries used 2 µg total RNA and TruSeq chemistry (Illumina Inc., San Diego, CA, USA), modified for strand-specific sampling [36]. In this protocol, first-strand synthesis products were desalted to remove unincorporated dNTPs (Sephadex G-25; Sigma-Aldrich, St. Louis, MO, USA), and reconstituted in dNTP-free secondstrand synthesis buffer with second strand enzyme mix (New England Biolabs) and a dUTP/dNTP mixture (Thermo-Scientific, Waltham, MA, USA) to incorporate dUTP into the second strand. All other steps follow Illumina protocols, except that uracilcontaining strands were degraded using a uracilspecific excision reagent mixture (37°C for 15 minutes; New England Biolabs) prior to PCR. Amplified libraries were quantified and pooled at equimolar 6-plex representation at 10 nM. Sequencing was performed at Oregon State University's Center for Genome Research and Biocomputing  see Notes S1).
Transcriptome assemblies and annotation. Microreads from individual trees were combined over all time periods to create 19 single-tree source files. Reads were quality trimmed using Trimmomatic v.0.30 [37] (using options -phred33 LEAD-ING:20 TRAILING:20 SLIDINGWIN-DOW: 5:20), and 19 individual transcriptome assemblies were de novo assembled using Trinity v.r2013_08_14 [38] using a minimum size of 200 bp. To create a pan-transcriptome reference, the longest transcripts from each component in singletree de novo assemblies were identified and combined into a single file, and transcripts smaller than 300 bp were removed. This combined file was sorted by sequence length and then clustered using USEARCH v.7.0.1001 [39]. The usearch64 -cluster_smallmem command was used with a sequence identity threshold of 90% (-id 0.9 flag) and with the -strand both flag to combine forward and reverse transcripts into the same clusters. A table of the number of input bases from each library, transcripts assembled for each individual, and the combined clustered reference assembly is provided in Notes S1. Individual assemblies and the pan-transcriptome reference are available for download at the TreeGenes Forest Tree Genome database web site under the link "Pseudotsuga menziesii Transcriptome" [40].
To annotate plant-derived transcripts, we used BLASTX and TBLASTX [41] to identify transcript matches from the NCBI NR database (minimum identity; expect < 1e -10 ), and BLASTX to identify matches to the Mercator plant metabolic function database [42] [43][44]. We used LASTZ [45] to identify chloroplast and mitochondrial transcripts using the Pseudotsuga sinensis chloroplast (NC_016064.1) and Loblolly Pine draft mitochondrial [46] genomes as references. We used BLAT [47] to search for conservation between Douglas-fir transcripts and conifer genome references, the Loblolly pine reference genome [48](Pinus taeda version 1.0), and a pre-publication draft for the Doug-las-fir genome [49](version 0.5). LASTZ and BLAT searches used a match criterion of >80% identity with contiguous hits > 50 bp. Annotations are available at at the TreeGenes Forest Tree Genome database web site under the link "Pseudotsuga menziesii Transcriptome" [40].
Detecting diurnal and annual cyclic transcriptome variation. RNA-seq reads from individual tree samples were aligned using BowTie 2.2.3 [50] with the following call: bowtie2 --end-to- This allowed 15 consecutive seed extension attempts before the aligner moved on (-D 15), a maximum of 2 attempts to re-seed reads with repetitive seeds (-R 2), and a 22 bp seed (-L 22) with zero allowed mismatches in this seed. The function to determine the interval between seed substrings during multi-seed alignment was set to f(x)=1+1.15* qrt(x), where x is read length (-i S,1,1.15); based on 101 bp read lengths, this resulted in an interval of 13 bp. For this experiment, transcripts showing a median < 5 counts were considered background and were excluded from subsequent analyses.
For transcripts exceeding the detection threshold, the 72 diurnal samples (six individuals; 12 time points) were collected into one table and transcript counts were normalized using DESeq [51][52] [53]. Annual samples were similarly tabulated, median filtered, and DESeq normalized. After normalization, we computed family means by averaging reads from half-siblings (trees 44 and 90 = family A; trees 43, 46, and 49 = family B). For the annual study, counts were linearly interpolated to emulate equally -spaced sample intervals. The two processed count tables (diurnal, annual) were passed through JTKcycle [31] to identify statistically significant rhythmic transcription (p<0.05), after application of a false-discovery rate correction of q < 0.01 [54]. We used JTK-cycle to identify the phase (time point at which the underlying curve reaches its maximum value) for each transcript, with phases measured in hours after 12:00 AM for the diurnal study, or Julian days (days after January 1) for the annual study. Summaries of cyclic properties (phase; amplitude; period) are provided in supplementary file S2.
Defining relationship between transcriptional phases and solar and weather factors, and enrichment/depletion tests by season. To evaluate the relationship between the timing of maximum annual gene expression ("phase") and environmental variables, annual transcriptional phases were sorted into two week bins, starting on the first sample date (27-October-2010), and continuing until the last sample date. Counts of transcripts reaching maximum expression within two week bins were tallied, and counts were compared to environmental variable summaries for the same intervals; mean photoperiod (hours), the sum of precipitation (mm), and the mean low and high temperatures (°C). The number of genes reaching phase per two-week bin was modelled as a function of solar and environmental factors using linear and polynomial fits (degree = 2).
To evaluate Mercator metabolic terms for enrichment or depletion, we binned significantly cyclic transcripts (e.g., q < 0.01 from JTK-cycle) from diurnal and annual experiments into four temporal "phase bins" of equal time duration. For the diurnal experiment, phase bins were six hours in length, with the first bin approximately centered on "sunrise" (4:01am -10:00am), and subsequent bins defined as "midday" (10:01am -16:00pm), "sunset" (16:01pm -22:00pm), and "night" (22:01pm -04:00am). Annual bins were 91 or 92 days in length, with the first bin approximately centered on the winter solstice or "short photoperiod" (5-Nov to 4-Feb), and subsequent bins defined as "spring photoperiod" (5-Feb to 5-May), "long photoperiod" (6-May to 5-Aug), and "fall photoperiod" (6-Aug to 4-Nov). Enrichment tests for Mercator pathway terms were performed using term lists for transcripts identified as significantly cyclic and identified to a function (all Mercator bins except 35.2, which is "not assigned.unknown undefined"). This resulted in lists of 7,415 annotated transcripts for the diurnal experiment (sunrise bin = 1,916; midday = 1333; sunset = 2,685; night = 1,481), and 13,514 annotated transcripts for the annual experiment (short photoperiod bin = 5,796; spring photoperiod = 1,466; long photoperiod = 5,073; fall photoperiod = 1,179). Enrichment/ depletion tests were performed using a one-tailed Fisher's exact test and the program Mefisto [55]; adjustments for two-tailed tests were made by multiplying P-values by 2, as recommended by Rivals et al. [56], and a false-discovery rate correction of 1% (q < 0.01; [54]) was applied using the p.adjust function in the R library stats.
Comparing annual transcriptome expression variation to other conifers -We compared our annual RNA-seq expression data from a subset of dates (17-August to 7-October) with a previously published microarray-based study of needle gene expression in Sitka spruce (Picea sitchensis) conducted over a similar time and season interval in British Columbia, Canada (30-August to 18-October) [33]. Putative orthologs of Douglas-fir and Sitka spruce transcripts were identified using reciprocal best BLASTN matches between the Douglasfir pan-transcriptome reference and 18,237 clone sequences used in the spruce 21.8k microarray [33] We evaluated the conservation of expression patterns between studies by computing the ratio of fall:summer gene expression using transcripts showing significant expression differences (q<0.05) in microarray data [33] : versus transcripts showing significant circannual rhythm (q<0.01. from JTK-cycle) in RNA-seq data: For the 760 transcripts meeting these criteria, expression ratios were ranked from high-to-low and ranks were compared by Kendall's tau ( ) with the cor.test function in R. In this comparison, is bounded by +1 and −1, with the bounds representing perfect preservation of ranked gene expression ratios in the same (+1) or opposite (-1) direction, and 0 representing random ordering of gene expression ratios between experiments.

RESULTS
Defining the needle 'pan-transcriptome' of Douglas-fir -In this study, needle tissue was sampled by mRNA-Seq to evaluate diurnal and circannual variation in global transcription (Fig 1A, Fig 1B; supplemental notes S1). Needles were sampled for RNA at different time intervals to evaluate two transcriptome responses: (1) diurnal responses, using a sampling interval of four hours across two days (12 time points); and (2) circannual responses, using a sampling interval of approximately 3 -4 weeks across a complete year (16 time points). This sampling scheme resulted in a data set that included 19 trees and 179 individual RNA-seq libraries to evaluate different aspects of temporal needle gene expression.
Individual tree mRNA-seq libraries from needles yielded 94.0 -573.8 million reads, and individual de novo assemblies using Trinity produced 47,976 -126,355 components 200 bp or larger (Table S2). The number of assembled Trinity transcriptome sequences and cumulative sequence length were positively and significantly correlated with the number of input reads (r 2 > 0.92; Notes S1). Across all assemblies, the majority of Trinity components (85.7%) showed a single subsequence, while the maximum number of sequences per component in any library was 227. A 'pan-transcriptome' reference was created using the longest sequence from each component in single-tree de novo assemblies, followed by clustering to reduce allelic and redundant sequences to one representative sequence. This step reduced the pool of transcripts from 1.66 million sequences from 19 individual-tree assemblies, to a pan-transcriptome with 199,471 sequences.
Multiple sources of evidence were used to characterize plant-derived transcripts for homologies and putative functions (Notes S1  19,000 were positively identified by BLAST as derived from foliar metaflora/metafauna present on Douglas-fir needles, or contaminants from the sampling/library construction process. A final list of 6,589 transcripts could not be identified using BLAST searches or searches of either draft gymnosperm genome assembly; due to their uncertain origin, these transcripts were omitted from subsequent analyses. Diurnal transcriptome variation in Douglas-fir tracks daily light/dark transitions -Experiments to detect diurnal cyclic transcriptome variation included six individuals (two sibs per family; three families) collected over twelve 4-hour intervals (Notes S1). After mapping reads from individual diurnal libraries to the Douglas-fir pantranscriptome reference, 41,382 transcripts met our threshold for analysis of diurnal cycling (median mapped reads > 5; Table 1, supplemental file S2). Following DESeq count normalization, JTK-cycle identified 15,487 transcripts as showing significant cyclic diurnal expression at a false-discovery rate of 5% (q < 0.05). The nonparametric test used in JTKcycle can identify transcripts as significantly cyclic even if they show minute amplitudes, such as those that might result from circadian fluctuations in total RNA levels [57] [58]. Since the magnitude of daily RNA fluctuation is unknown for conifer needles, we adopted a more stringent false-discovery rate of 1% (q < 0.01) to identify 'high-confidence" cyclic patterns; this reduced the number of diurnal transcripts to 12,042.
The distribution of expression phase times ( Fig. 2A) across all high-confidence diurnal transcripts shows a pronounced bimodal pattern, with the highest proportion of genes reaching maximum transcript accumulation near sunrise (ZT0 = clock hour 06:44 AM) and before sunset (ZT = 12:50, or clock hour 19:34 PM). This pattern has been shown in diverse plants and animals [5] [7], and is explained as "expression anticipation" for the transition from dark to light in the morning, and light to dark in the evening. Our high-confidence diurnal transcripts include homologues to many of the known core clock genes from angiosperm models [4][7] (Fig. 3; Table 2). We compared the timing of maximum expression for representative clock and seasonal genes in Douglas-fir to values reported for A rabidopsis [7] in controlled chamber experiments. We found a high degree of concordance in the phase for many of these genes ( Table 2), which is striking given the different methods used and organismal divergence involved in this comparison. Transcripts encoding homologs of circadian clock-associated1 (CCA1), cryptochrome1 (CRY1), constitutive photomorphogenic 1 (Table 2). A smaller number of genes -specifically, flowering locus T (FT), late elongated hypocotyl (LHY), and Zeitlupe (ZTL) -showed pronounced differences in phase from A rabidopsis. These differences may be due to incorrect assessments of orthology in gene families, but in the case   of FT, it is likely due to divergent gene functions in gymnosperms and angiosperms, as has been previously suggested [14] [59]. A noteworthy finding is that a large proportion of high-confidence diurnal cycling transcripts (38.4%; N = 4,627) have no homology to known proteins in Mercator and are annotated as "not assigned.unknown" (Fig. 2A) 3 Profiles identified as not significantly rhythmic by JTK_CYCLE are noted "n.s."  (1,481 transcripts). Across daily bins, we found evidence for overrepresentation or underrepresentation in 15 Mercator metabolic pathways (Fig. 4A; supplementary file S3).
Annual transcriptome variation in Douglas-fir tracks annual variation in photoperiod -Experiments to detect circannual variation included 5 individuals collected at 16 time points over 12 months, and the data were median filtered, normalized by DESeq, averaged by family, and then linearly interpolated to even sampling dates (Notes S1). After mapping reads from individual annual libraries to the pan-transcriptome reference, 36,145 transcripts met our threshold for analysis (Table 1). Following DESeq count normalization, JTK_CYCLE identified 24,688 transcripts showing significant circannual expression at a false-discovery rate of 5% (q < 0.05). After imposing a more stringent falsediscovery correction (q < 0.01), the list contained 21,225 high-confidence circannual transcripts (Table 1; supplemental File S3).
The distribution of circannual expression phases (Fig. 2B) for high-coverage transcripts also shows a pronounced bimodal pattern, with the majority of transcripts reaching maximum expression in one of two seasons: (a) December through January, coinciding with winter dormancy, maximum freeze tolerance, reduced metabolic activity, and short photoperiods (day length < 10.5 hours); or (b) June through July, coinciding with maximum shoot and radial growth, high metabolic activity, and long photoperiods (day length < 14.5 hours). Example transcripts showing estimated phases for each month of the year are provided in Fig. 5, and details for these transcripts are provided in Notes 1. Included are transcripts homologous to genes known to play a role in winter adaptation and cold tolerance in A rabidopsis (Fig. 5A, inducer of CBP expression, ICE1), cold acclimation in potato (Fig 5B  stearoyl-ACP Δ9 desaturase gene; [22][60]), and proteins involved in winter photoprotection in coni-fers (Fig. 5C, early light inducible protein, ELIP1; [24]). Transcripts homologous to genes recently identified as showing evidence of convergent adaptation to cold climates in conifers [61] are also shown, including a mitochondrial transcription termination factor (Fig. 5F), a FMN-linked oxidoreductase (Fig. 5L), and a DNAJ-like heat shock protein (Fig. 5N). Transcripts homologous to genes related to QTLs strongly associated with drought in Arabidopsis [62] are also shown, including a cysteine proteinase (Fig. 5G), a pectin acetylesterase (Fig. 5H) and a chaperonin (Fig. 5K). As was observed with diurnal transcripts, 36.3% (N = 7,711) of the high-confidence annual cycling genes are "not assigned.unknown" in Mercator (Fig. 2B), and they show no homology to proteins or RNAs in GenBank. One example of a transcript with unknown function that is highly-expressed in spring is shown in Fig. 5D.
We examined the relationship between the timing of maximum annual accumulation for transcription and four environmental inputs; mean photoperiod, biweekly cumulative precipitation, and bi-weekly mean temperature (maximum, minimum). Annual cyclic expression showed a strong significant association with photoperiod (r 2 = 0.715, F 24,2 = 27.62, P < 0.0001; Fig. 6A), and root mean square error was lower for a second-order polynomial fit than a linear fit. Annual cyclic expression also showed a significant but weaker association with bi-weekly cumulative precipitation (r 2 = 0.368, F 24,2 = 13.39, P < 0.0011; Fig. 6B), and a lower root mean square error for linear versus a second-order polynomial fit. Annual cyclic expression showed non-significant associations with bi-weekly minimum and maximum temperatures (Fig. 6C-6D). Photoperiod and weather have been implicated to be major drivers in seasonal gene expression in different plant tissues [12] [33]. Our results suggest that photoperiod is the dominant driver of the timing of cyclic gene expression maxima in Douglas-fir needles, but that precipitation and water availability is also positively associated with higher seasonal transcriptional activity.
Mercator terms were associated with 13,513 of the 21,225 high-confidence transcripts (excluding "not assigned.unknown"), and these were analyzed for evidence of over-representation or underrepresentation. Transcriptional phases were divided into seasonal bins by photoperiod (defined in Methods); bins included 'short' photoperiod (5,796 transcripts), 'spring' photoperiod (1,466 transcripts); 'long' photoperiod (5,073 transcripts); and 'fall' photoperiod (1,179 transcripts). Across seasonal bins, we found evidence for over-or underrepresentation in 48 Mercator metabolic pathways ( Fig. 4B; supplemental file S4). Pathways showing an over-representation of phase values in short photoperiod days (winter) were related primarily to biotic stress, signaling, protein degradation and post- Gene expression was estimated by RNA-seq from needle samples collected from two families of trees at c. three week intervals for one year. For each transcript, the Douglas-fir transcript name and best BLAST match to Arabidopsis (prefix 'at') are provided; genes lacking a BLAST match to the NCBI NR database are identified as 'unknown'. Lines connect the mean gene expression (DESeqnormalized counts per transcript) for each family, and error bars represent the SD for all replicates.
translational modification, and RNA transcription in diverse regulatory genes such as MAP kinases, RNA helicases, and proteins playing a role in hormone signaling and transduction (e.g., auxin; DUF26). Pathways showing over-representation of phases on long photoperiod days (summer) were related primarily to organelle regulation (pentatricopeptide and tetratricopeptide repeat protein families), photosynthesis-related metabolism (photosynthesis light reaction; carbonic anhydrase; metal binding/storage), ribosomal protein synthesis (nuclear and organellar), redox.thioredoxin, and stress from heat.   in British Columbia, Canada [33]. From these studies, we identified 11,582 transcripts that showed reciprocal best BLASTN matches between the Douglas-fir transcriptome and Sitka spruce clones used in microarray design (supplemental file S5). Out of this total, 760 transcripts showed evidence of significant seasonal change in Sitka Spruce (q<0.05; two-fold change [33]), and cyclic behavior in Douglas-fir (q<0.01, JTK-cycle). The rank order of fall:late summer gene expression ratios for these 760 transcripts was more preserved than would be expected if gene order was random (Kendall's = 0.2901; z = 11.989; p < 2.2e-16), indicating that these experiments reveal similar patterns of fall season gene expression for a common set of genes, even though the comparisons are based on different evolutionary lineages (Picea vs. Pseudotsuga), methodologies (microarray vs. RNA-seq), garden latitudes (42.3° N vs. 54° N), and sample dates. The general conservation of transcriptional patterns between these unrelated conifers highlights the utility of results from Douglas-fir as a tool for predicting annual expression trends and expression phase for homologous genes from other temperate zone conifers, such as spruce and pine [61].

DISCUSSION
Perennial, evergreen needles are one of the key features that distinguish nearly 700 species of conifers from the tree model Populus and other deciduous angiosperm trees. Over a calendar year, persistent leaves undergo transcriptional and metabolic shifts that allow for high photosynthetic rates when conditions are favorable (even during winter), while providing conservative protection from the damaging effects of winter cold during dormancy, and drought during annual dry seasons [20]. By defining the complexity and contribution of gene expression over the complete growth-dormancy cycle, circannual studies like this provide a foundation for identifying associations between the synchrony of transcriptional change to seasonal changes and changes in phenology, and they offer a source of evidence for identifying genes/pathways that may contribute to adaptive responses in forest trees to climate change.
Our results show that the timing of maximum transcript accumulation in diurnally and circanually rhythmic transcripts from conifer needles is associated with the timing and quality of light at both temporal scales. At a daily scale, 12,042, or 29%, of expressed diurnal transcripts showed a significant diurnal rhythm, with two-thirds of these diurnal transcripts achieving maximum mRNA accumulation within +/-2 hours of sunrise or sunset. At an annual scale, 21,225, or 58.7%, of expressed annual transcripts showed significant circannual rhythms, with nearly half of all expressed transcripts achieving maximum mRNA accumulation within +/-20 days of the shortest or longest photoperiod. The influence of light as an entraining force of circadian rhythms is well-known in plants [1] [8]; our analysis expands the list of known circadian genes in angiosperms to their lesser-studied sister lineage, the gymnosperms.
In contrast to daily cycles, the role of photoperiod on the timing of global transcript accumulation in perennial leaves is less well understood. Photoperiod is known to play a crucial role in the timing of the onset of dormancy, bud break and flowering in many plants [10][11] [14]; our study shows that this 'photoperiodic effect' is a dominant pattern in the annual leaf transcriptome. Given the reliability of light as an entraining force for forecasting seasonal change, this is likely to be a common (if not universal) feature of temperate-zone conifer gene transcription. This hypothesis could be tested by evaluating cyclic transcriptome variation in trees grown in reciprocal gardens with fixed day length and contrasting temperature or precipitation regimes. In this type of comparison, genes showing light-dependent transcription should show a constant phase, irrespective of influences from the local climate. In contrast, genes responding to other exogenous cues like temperature and precipitation should show a shift in expression phase. These types of 'needle-specific' contrasts cannot be made with deciduous tree leaves, but equivalent comparisons could be made using other seasonallyresponsive tissues (cambium; buds) from trees that are known to be responsive to photoperiod (e.g., Populus [9][12) and temperature (e.g., apple or pear [63]).
The entrainment of diurnal and circannual gene expression by light quality or day length in Douglas-fir is intuitive, but the dramatic accumulation of transcripts in winter presents a paradox; why does transcript accumulation peak for such a high proportion of the cyclic transcriptome during dormancy, when metabolic activity is reduced and growth is arrested? In western Oregon, Douglas-fir remains photosynthetically active during the winter [20], but it undergoes physiological changes that result in maximum cold hardiness between November to early December [23]; this timing coincides with the increase in genes achieving maximum transcript accumulation (e.g., Figure 2B). Transcription for many genes and pathways are known to increase under short days and cold temperatures, including those that show associations to winter cold-acclimation responses [24] [22]. Direct comparisons to leaf transcriptomes cannot be made with winterdeciduous angiosperms, but the list of Douglasfir cyclic transcripts reaching maximum transcription in winter includes genes that have been implicated in cold acclimation of Populus buds (e.g., early light-inducible proteins (ELIPs), Crepeat binding factors, fatty acid desaturases, major carbohydrate enzymes, LEA-like proteins, and heat shock proteins [17]. Short day-induced transcriptional upregulation in needles appears extensive, and further effort is required to determine whether this massive increase in transcript accumulation is a response to the proximal demands of growth cessation, dormancy, cold hardiness, or more distant processes involved in breaking dormancy.
Our study highlights a related paradox, which is that the physiologically complex events occurring during the resumption of spring growth (increased photosynthetic rate; onset of radial and shoot growth; flowering) occur without a large, coordinated transcriptional response from cyclic genes in needles. This suggests that the transcriptional activity required to end dormancy and resume spring growth involves a comparatively small number of cyclic genes and pathways, that the activity is partitioned to other tissues and tissue-specific cyclic genes, or that the coordinated response occurs far in advance of the actual event of spring growth. The veracity of these alternatives could be readily evaluated using comparative time-series gene expression (or proteomic) experiments using different tissues from individual plants over the dormancy-growth transition.
The daily and circannual transcriptional patterns for Douglas-fir are available for examination [64], and these will be useful for predicting the timing of transcription in genomically-complex gymnosperm models such as spruce [65] [66] and pine [48]. The combination of large genome size (~20 Gbp), high transcriptional complexity [67] [68], genic redundancy, and divergence from angiosperm models has made it difficult to infer gene function in conifers based on homology alone, although geneticenvironmental and genetic-phenotypic associations are being investigated in many conifer species [61] [69]. Conifers lack tractable models for reverse genetic manipulation, so context-specific knowledge offered by diurnal, seasonal, and circannual gene expression studies can provide additional clues to the timing of transcript accumulation, and the seasonal context by which genes and pathways are up-and downregulated. Accurate estimates of phase, and changes in phase under different experimental conditions, will provide a means to characterize and define regulatory relationships in pathways critical to a variety of functions. A clearer understanding of these complex circannual responses will emerge as the developing Douglas-fir genome [70] and other conifer genomes are integrated with daily, seasonal, circannual, and tissue-specific transcriptomic studies. This knowledge will set the stage for understanding how conifer needles have responded to climate changes during their 300+ million year independent history from flowering plants, and it may offer novel avenues for predicting adaptive responses under different future climate alternatives.

DECLARATIONS
Availability of data and material. The sequences, reference transcriptome and mapped count data are available under NCBI BioProject Umbrella PRJNA362352; this includes diurnal sequences and counts (PRJNA263611), annual diurnal sequences and counts (PRJNA243096), and reference transcriptome (PRJNA356432). Mercator and GO annotations are available from the TreeGenes website (http://treegenesdb.org/ftp/Transcriptome_Data/ transcriptome/Psme/ Cyclic_Transcriptome_Assembly_v2.04_%5bFS%5d/), and data sets analyzed for this published article are available as supplementary files.
Competing interests. The author s declare that they have no competing interests. The findings mention of trade names or commercial products does not constitute endorsement or recommendation for use by the U.S. Government.
Funding. This project was suppor ted by the US Department of Agriculture National Institute of Food and Agriculture (Plant Genome, Genetics and Breeding Program #2010-65300-20166) and the US Forest Service Pacific Northwest Research Station.