Simplification of circadian rhythm measurement using species-independent time-indicated genes

Abstract The circadian rhythm varies among species, and the distribution of common circadian rhythm-related genes in plants is not yet clearly understood. In this study, we analyzed the transcriptome data from plants of three different species (Solanum lycopersicum, Arabidopsis thaliana, and Lactuca sativa) and their circadian rhythms. Homology of the gene sequences was analyzed. Thirty genes containing time information were found to be common among the three species studied and were used to measure the circadian rhythm. Because 22 of these 30 genes were associated with photosynthesis, we suggest that light control could be used to regulate the circadian rhythm. Currently, a high-cost transcriptome analysis is required for the measurement of circadian rhythm; however, our results showed that it was possible to reduce the number of target genes to 30 and, hence, to reduce the cost of the analysis. Our findings will enable easier estimation of circadian time, which, in turn, will facilitate environmental control for plant growth through better control of circadian time, thus facilitating better crop management practice.


Introduction
Almost all organisms have an approximately 24-h biological rhythm, the so-called circadian rhythm that controls various biological phenomena, such as the opening and closing of stomata, flower initiation, and gene expression [1]. Moreover, circadian rhythms affect plant metabolic pathways by controlling the underlying plant hormones [2]. Crop quality strongly depends on the metabolites produced by complex metabolic pathways; thus, a circadian rhythm is an important determinant of crop quality parameters, such as sugar concentration and nutrient content [3]. Therefore, circadian rhythms may be used for controlling plant growth.
Previous studies have shown that a circadian rhythm can be controlled by controlling the external environment [4,5]. However, circadian rhythms have not been fully elucidated, and a circadian rhythmbased technology of wide applicability in agriculture has not been developed yet. One form of control over the circadian rhythm involves clock genes, which have been found to occur in Arabidopsis thaliana.
The circadian rhythm cycle is formed mainly by the transfer feedback (negative feedback loop) of CCA1/LHY and TOC1, which show cyclic expression fluctuation for 24 h. [6]. Many genes exhibit a 24-h cycle because they are regulated by the expression cycle of CCA1/LHY and TOC1. However, CCA1 and LHY are not common to all species. For example, tomato (Solanum lycopersicum) and barley (Hordeum vulgare) have the LHY gene, but not the CCA1 gene [7]. In addition, the central clock genes are not highly conserved. Therefore, when attempting to control the circadian rhythm, rather than modifying genes, it is more effective to control the cycle generated by the circadian rhythm. For example, lettuce growth rate has been shown to increase by synchronizing its photoperiod with the circadian rhythm cycle [8], a phenomenon called circadian resonance [8,9].
Chlorophyll fluorescence may be used as a method for gathering biological information, such as photosynthesis potential and plant susceptibility to stress [10], and can be measured to assess biological rhythms [11,12]. However, this method does not detect the clock-gene behavior. In contrast, the molecular timetable method (MTM) has been effectively used to calculate the circadian rhythms in animals from their gene expression patterns [13]. In this method, the transcriptome data are acquired continuously in chronological order, and the expression levels at the same sampling time-points are arranged for each expression pattern. Additionally, circadian time is measured from the peak (molecular peak time) of the scatter plot. In previous studies, we demonstrated that MTM can also be used in plants cultivated under different conditions [14][15][16], and, although expensive, time can be accurately estimated by using hundreds of periodic genes in the time-series transcriptome analysis. Nonetheless, the circadian time estimation cannot be performed frequently and is difficult to use in industrial, agricultural, or other applications because only a high-cost method that involves the acquisition of transcriptome data is available for the accurate measurement of a circadian rhythm.
In the present study, we aimed to develop a circadian clock technology for use in agriculture. The accurate measurement of a circadian clock requires comprehensive determination of gene expression levels, but the cost is abnormally high for agriculture. Therefore, it is necessary to reduce the cost by conducting a precisely targeted analysis. We focused on the gene expression cycle to identify the circadian rhythmrelated genes common to the three species under study. We described a detailed method for identifying these common genes and characterizing their behavior. By successfully estimating the internal circadian time with these common genes, our study demonstrates a method for easily understanding the characteristics of a circadian rhythm, regardless of the species studied. This will be useful for developing highly productive farming strategies.

Plant material
Three plant species were used in these experiments: tomato (Solanum lycopersicum, cultivars 'Taian-kichijitsu' and 'Momotaropeace'), lettuce (Lactuca sativa L. 'Frill Ice', and A. thaliana accession Col-0. Tomato was cultivated in a sunlight-type plant factory (Ehime University, Japan), and tomato seedlings were transplanted into rockwool cubes (Grodan Delta, GRODAN, Roermond, the Netherlands) in August 2013. The rockwool cubes were watered using a nutrient solution [17]. Twenty slabs were set in each line. The sampling of 'Taiankichijitsu' tomato was carried out in January 2014, June 2016, and April 2017. The tomato plants sampled in June 2016 were diseased (probably powdery mildew). The sampling of 'Momotaro-peace' tomato was carried out in November 2017. The temperature, relative humidity, and level of insolation are shown in Supplementary Data 1. The winter, spring, diseased, and autumn tomatoes were sampled in January 2014, June 2016, April 2017, and November 2017, respectively.
Arabidopsis thaliana was seeded on aseptic 40-mm diameter dishes containing 4 mL of Murashige and Skoog medium (2% (w/v) sucrose) and solidified with 2 g L −1 gellan gum. The A. thaliana and lettuce seeds were placed on a urethane sponge containing water and incubated for 1 (lettuce) or 2 (A. thaliana) weeks under fluorescent light [photosynthetic photon flux density (PPFD) = 250-450 μmol m −2 s −1 ] at 22°C with a 12-h light/12-h dark (LD) photoperiod. Thereafter, the seedlings were transplanted and grown in a deep-flow hydroponic system in a fully controlled plant factory.
For transplant, a panel with a planting hole and a root growth area at a water depth of 90 mm was used (885 mm × 590 mm × 30 mm; M Hydroponic Research, Co. Ltd, Aichi, Japan). Seedlings with each urethane sponge were transplanted into the hole, and the panels were installed in the cultivation beds. The pump circulating the solution was run at a flow of 10-15 L min −1 . The cultivation medium was Otsuka House No. 1 and 2 (N:P 2 O 5 :K 2 O:CaO:MgO = 10:8:27:0:4 and 11:0:0:23:0; Otsuka Chemical, Co. Ltd, Osaka, Japan). The pH and electrical conductivity (EC) of the nutrient solution were set at 6.0 and 2.0, respectively. The growing conditions in the plant factory were as follows: temperature 22°C, relative humidity (RH) 50%, and CO 2 concentration 1000 μmol mol −1 . The light environment was either LL (continuous light) or LD (LED; R:G:B = 120:40:40; total PPFD = 180-220 μmol m −2 s −1 ). Illumination was set as LD and was switched to LL 12 days after transplant.

RNA extraction and sequence analysis
Tomato leaves of the fifth branch from the tip were sampled every 2 h for 2 days (winter and diseased tomatoes, n = 1, spring and autumn tomatoes, n = 3), sliced into 1-cm 2 segments, and stored at 0°C in RNAlater solution (Qiagen, Hilden, Germany). The RNA of 'Momotaropeace' tomato was mixed with equal amounts of the RNA extracted from thesamples for further use. Lettuce leaves grown under LL and LD conditions and A. thaliana leaves were sampled every 2 h for 2 days at 13 after transplant (n = 1). The sampled leaves were frozen in liquid nitrogen and stored at −80°C. Total RNA was extracted from each sample using the Agilent Plant RNA Isolation Mini Kit (Agilent Technologies, Santa Clara, CA, USA), according to the manufacturer's instructions.
Total RNA extracted was detected using a Bioanalyzer (Agilent Technologies). In the process of preparing the RNA sequencing library, the total RNA extracted was mixed with ERCC RNA Spike-In control mixes (Life Technologies, Carlsbad, CA, USA). Poly(A) RNA was isolated using oligo(dT)25 Dynabeads (Invitrogen, Carlsbad, CA, USA), according to the method of Wang et al. [18]. The sequencing libraries of lettuce, A. thaliana, winter tomato, and LL tomato were prepared according to the method of Nagano et al. [19], and the sequences were analyzed using a HiSeq 2000 sequencer (single-end, 50-bp reads; Illumina, San Diego, CA, USA). The sequencing libraries of spring and autumn tomatoes were prepared according to the method of Kamitani et al. [20], and the sequences were analyzed using a HiSeq 2500 sequencer (single-end, 50-bp reads; Illumina). The reads obtained were registered in the DNA Databank of Japan (DDBJ) database (http:// trace.ddbj.nig.ac.jp/DRASearch) under the following accession numbers: DRA004542 (LL lettuce), DRA004561 (LD lettuce), DRA005748 (A. thaliana), DRA003530 (winter tomato), DRA003528 (LL tomato), DRA004568 (diseased tomato), DRA007095 (spring tomato), and DRA007076 (autumn tomato).

Transcriptome analysis
Quality control of the clean reads obtained was performed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The reference sequences for the mapping of tomato and lettuce were obtained from the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/) database. The reference sequences for A. thaliana were obtained from The Arabidopsis Information Resource (TAIR; https://www.arabidopsis.org/) database. Mapping was performed by using Bowtie 2 (ver.2.3) [21] and quantification by using the RNA-Seq by Expectation Maximization software (RSEM; http://deweylab.biostat.wisc.edu/rsem/) [22]. Both software programs were run with default settings. The expression data were normalized to reads per kilo-base million (RPKM) values.

Periodicity analysis and circadian time estimation
We used the MTM [13,14] to detect the periodicity genes or contigs of tomato, lettuce, and A. thaliana. In this method, the time-series gene expression data were fitted to the cosine curve of the 24-h cycle, and the periodically expressed genes were selected from the correlation. The genes with a coefficient of determination (R 2 ) value ≥0.7 were selected as periodically expressed genes (time-indicated genes, TiGs). The selected TiGs were arranged for each expression peak-time, and the expression level of each gene was extracted at each sampling time-point to prepare a scatter plot. The peak of this scatter plot was the circadian time at the sampling time-point. The accuracy of these time estimates was evaluated by correlation between the cosine curve of the scatter plot obtained and the same length of time as the natural period of the target plant. A heatmap was generated using z-score normalization values transformed from RPKM values and R software.

Search for common genes
Gene homology was determined using the basic local alignment search tool (BLAST) analysis (tBLASTx). The homology between tomato and A. thaliana was determined using PSI-BLAST, which was executed with descriptions set to 5000, interactions set to 2000, and alignments set to 1000. Other BLAST parameters were set to default values. The gene with the highest e-value was selected as the homologous gene. All homologous TiGs were linked to A. thaliana using BLAST (including PSI-BLAST), and common genes were selected.

Results
The result of quality check showed that the Q30 quality scores of all sample reads were ≥80%. However, this result was excluded from the analysis because the final sample on the last day of the June tomato sampling had a low-quality sequence read. The clean reads (approximately 90% or more for A. thaliana, 80% or more for tomato, and 90% or more for lettuce) were mapped to the reference sequences for each obtained sequence. Each normalized expression level data is provided as supplemental data (Supplementary data 2-8). Lettuce showed 1710 and 1095 contigs (TiGs; herein, we include contigs as genes) with R 2 ≥ 0.7 in LL and LD conditions, respectively. Of these, 614 TiGs overlapped (Fig. 1). Arabidopsis thaliana showed 923 TiGs with R 2 ≥ 0.7 in LL condition. The winter and diseased tomatoes showed 2518 and 628 TiGs with R 2 ≥ 0.7, respectively. In lettuce, 614 TiGs were homologous with 556 genes of winter tomato and 439 genes of A. thaliana, according to the BLAST results. In winter tomato, 556 genes were homologous with 402 genes of A. thaliana, according to the PSI-BLAST results (Fig. 1A). In A. thaliana, 439 genes were homologous with 402 genes of winter tomato, according to the PSI-BLAST results, and 923 TiGs were homologous with 818 tomato genes (number of genes duplicated in winter and diseased tomatoes) and 751 TiGs (contigs) of lettuce (Fig. 1B). In winter tomato, 2518 TiGs were homologous with 1538 contigs of lettuce and 1798 genes of A. thaliana (Fig. 1C). Lastly, in diseased tomato, 628 TiGs were homologous with 443 contigs of lettuce and 482 genes of A. thaliana (Fig. 1D).
These homologous TiGs appeared to convey time information; however, whether they indicated circadian rhythm was unclear. Therefore, we selected them to estimate circadian time. A clear scatter diagram was obtained with the base TiGs (e.g., 614 lettuce genes listed in Supplementary Data 1), which rendered an average correlation value with the cosine curve at each sampling time-point of approximately 0.8 and allowed accurate estimation of the circadian time involved (Fig. 1). However, the scatter plots of homologous genes showed blurred curves. Moreover, to estimate circadian time, the average correlation value with the cosine curve for all plant species combined was approximately 0.5, which permitted a certain degree of accuracy (Supplementary Data 9). At the time estimated from the homologous genes, the estimation error was within the 2-h sampling interval for all except seven (of 325) time points. (Fig. 2).
The ability to estimate circadian time with homologous TiGs across species implies that they might encode information indicating the internal time derived from the circadian rhythm. Furthermore, these genes appeared to be common to all three species studied. Based on the results of the MTM (R 2 ≥ 0.7), we selected 30 genes from the three plant species studied (Table 1). These 30 genes did not include CCA1, nor did they include PRRs or CO (genes related to clock function), but did include TOC1 and GI. Additionally, according to the information available in the TAIR database, the products of 22 of these 30 genes were localized in the chloroplast. In contrast, only five were localized in the nucleus, where the circadian clock was formed.
We annotated these 30 genes by function using GO terms based on the accession numbers of A. thaliana. The 30 genes were classified into three broad groups: carbohydrate metabolism, light response-related, Numbers under the plant name indicate the number of homologous genes. All of these results showed a R 2 > 0.7 for the cosine curve as fitted using the molecular timetable method (MTM). and others (Fig. 3). The categories of these groups indicated that these 30 genes were related to light. However, there was no common characteristic sequence in the upstream regions and sequences of these common 30 genes.
We analyzed the expression fluctuation of these 30 genes. Although the expression pattern of these 30 genes was slightly different depending on the species, the expression was cyclically fluctuated (Fig. 4). Next, we analyzed the peak times of these 30 genes. The circadian time was calculated from the correlation between the expression pattern of TiGs and the cosine curve. Thus, the period that can be estimated decreases with increasing similarity among the expression patterns of these 30 genes. Further, we confirmed the variation in the expression peak time of these 30 genes. Arabidopsis thaliana showed a distinctly biased expression peak, but the 30 genes showed expression peaks from morning to night in all species (Fig. 5). Finally, we confirmed that circadian time estimation is possible using these 30 genes. Using the selected 30 genes from lettuce LL, lettuce LD, 'Taian-kichijitsu'tomato, diseased 'Taian-kichijitsu' tomato, and A. thaliana, the circadian time was estimated with an accuracy of ± 2 h of the sampling interval (Fig. 6). In A. thaliana and autumn tomato, the estimation accuracy decreased after 24:00 in circadian time. In addition, it was possible to accurately estimate the time even in spring ('Taian-kichijitsu') and autumn ('Momotaro-peace') tomatoes, which were not used for selecting these 30 genes (Fig. 6F, G). Gene description is based on the description of these genes in A. thaliana. The notation of the pathway is based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.

Discussion
In the present study, we focused on the circadian rhythm-related periodicity of gene expression and gene commonality among tomato, lettuce, and A. thaliana. We found 30 common genes that were successfully used to estimate the circadian rhythm duration. We analyzed the homology between the periodicity of gene expression calculated by MTM and genes of other plant species. Since homology analysis was performed twice, the number of final homologous genes was greatly reduced, as compared with the initial number of genes (Fig. 1). Nevertheless, these homologous genes were able to estimate the circadian time to a sufficient level of accuracy. This indicated that we had selected highly conserved genes that contained circadian time information by selecting the initial periodic gene and setting the R 2 of periodicity strictly to ≥0.7.
However, the accuracy of estimation tended to decrease as the circadian time exceeded 24:00 (Fig. 6). This was probably due to the use of LL samples for selecting common genes. It is well known that the attenuation of the amplitude and the deviation of the free-run cycle are caused by exposure to continuous light [4,23]. In addition, diseased tomato plants existed among the samples, and circadian clocks are known to effectively cope with pathogens by periodically expressing the genes of plant hormones and disease response pathways [24,25]. We previously reported that disease response-hormones (salicylic acid, abscisic acid, ethylene, and jasmonic acid) are under the influence of the circadian rhythm [2]. Therefore, we believe that our data from the diseased samples might explain the decline observed in the accuracy of estimation. Research on the effects of infection and disease response on the body clock is underway. Previous studies have reported that plant immunity is closely related to the circadian rhythm [26]. From these studies, we believe that the difference in responsiveness of the circadian clock between opportunistic and severe infections is evident.
The fact that 628 genes in diseased tomato plants showed a circadian cycle indicates that the circadian clock behavior is at the center of daily activities. Circadian clocks do not show periodicity during cell differentiation. Particularly in plants, circadian clocks are reset in the apical tissues of the roots and surrounding cells during lateral root formation [27]. Moreover, the circadian clock is known to form a phase again as it triggers the auxin signal for new cell differentiation [28]. For this reason, our data may suggest that when events, such as disease, occur, the circadian clock is maintained, but with gene expression. It has been reported that is the cycle of Opuntia ficus-indica for change in the expression of CCA1 and TOC1 in circadian clocks differs by as much as 12 h [29]. In addition, Nose et al. [30] reported that circadian clock formation and its expression network are different between angiosperms and gymnosperms Moreover, Hayama et al. [31] have reported a high possibility of the existence of two types of clock in Pharbitis nil. Therefore, CCA1 and TOC1 were considered not necessarily related to each other. Our results also suggest a similar possibility. However, we were unable to find a characteristic common sequence in the upstream regions of these 30 genes.
Even in the tomato depicted in Fig. 1A, the number of common genes decreased to 30, despite the presence of 388 homologous genes. This is probably because we used the data from LL and diseased tomatoes. In addition, Higashi et al. [15] observed that the number of  periodically expressed genes decreased by 80% because of the differences in light conditions (LL or LD), even for the same species. Moreover, we confirmed that the periodic genes were reduced by approximately 80%, regardless of the presence or absence of disease in tomato (data not shown). We believe that the selected 30 genes encode substantial information on circadian rhythms. Nevertheless, CCA1/LHY was not included among those 30 genes.
Plant circadian rhythms are known to differ between shoots and roots [32,33], and CCA1 does not inhibit the expression of other genes in the roots [34]. Further, it has been demonstrated that this difference in circadian rhythms between shoots and roots is synchronized by weak binding between cells [35,36]. Therefore, the importance of clock genes is believed to vary, depending on tissues and cells. In addition, unlike A. thaliana, both, tomato and H. vulgare have LHY, but not CCA1 [7]. In contrast, TOC1 exists in many plant species, including tomato and H. vulgare. These findings support the idea of species-specific importance of different clock genes. In addition, owing to the likely existence of two clocks in Pharbitis nil [31], the circadian clock system of plants might require the presence of multiple clocks. The known clock involving CCA1 and TOC1 cannot clearly explain the circadian rhythm of a plant circadian clock. Therefore, it might be necessary to analyze CCA1 and TOC1 separately to fully understand their influence on the circadian clock.
We found that 22 of the 30 genes were located in chloroplasts. In addition, five genes were localized in the nucleus. Furthermore, we analyzed data under different light conditions; thus, we considered that 22 genes were not affected by a simple light and light cycle. Therefore, among these 22 genes, a gene was thought to play an important role in circadian rhythm. Wilson and Connolly [37] reported that Fe signals from chloroplasts regulate the circadian clock. Among the 22 chloroplast-localized genes, some were related to Fe signaling, but we could not clarify their direct association with signal transduction. On the other hand, Lu et al. [26] summarized the relationship between light, immunity, and circadian rhythm. They separated the relationship between a simple light response and circadian rhythm and discussed the relationship between circadian rhythm and immunity. The 22 genes from the present study might be involved in the physiological functions related to immune response and circadian clock.
We found 30 common genes from the transcriptome data of tomato (including diseased plants), lettuce, and A. thaliana grown under different conditions, including continuous light. We reduced the number of genes to be analyzed to 30, which made the analysis possible on a small scale. Even if we compare the transcriptomes of different species simply by real-time quantitative PCR analysis, this can considerably reduce the cost (by at least 90%) and time required to complete the analysis.

Conclusion
Contrary to the brief summary of the plant circadian rhythm that was thought heretofore, our results indicate a new circadian clock [31], likely comprising of multiple systems. Furthermore, we considered that the 30 common genes that normally show circadian rhythmseven under continuous lightare at, or close to, the core of the plant circadian clock. Therefore, the behavior of these 30 genes is a clear indicator of the persistence of a plant circadian clock and can provide stable support for carrying out transduction studies of other plant circadian clocks. Furthermore, in agriculture, the reduction of the number of genes in this study makes accurate measurement of the circadian clock rhythm more practical. A number of phenotyping techniques have been developed for use in agriculture, and we believe that integrating these techniques with the measurement of the circadian clock rhythm would enable us to accurately determine the state of plant growth.

Author contributions
YT, TH, and HF designed the study. YT performed the data analysis. AN, MH, and AT prepared the RNA-Seq library. AN, AT, and MK devised a method of library construction. YT and HF wrote the manuscript. KT assisted in the sampling of tomato in the sunlight-type plant factory.
All authors discussed the results and implications and commented on the manuscript.

Funding
This work was supported, in part, by Grants-in-Aid for Scientific Research (nos. 25712029 and 16H05011) and Precursory Research for Embryonic Science and Technology (no. JPMJPR15O4) of the Japan Science and Technology Agency (to HF).