Comparative Transcriptome Analysis of Chinary, Assamica and Cambod tea (Camellia sinensis) Types during Development and Seasonal Variation using RNA-seq Technology

Tea quality and yield is influenced by various factors including developmental tissue, seasonal variation and cultivar type. Here, the molecular basis of these factors was investigated in three tea cultivars namely, Him Sphurti (H), TV23 (T), and UPASI-9 (U) using RNA-seq. Seasonal variation in these cultivars was studied during active (A), mid-dormant (MD), dormant (D) and mid-active (MA) stages in two developmental tissues viz. young and old leaf. Development appears to affect gene expression more than the seasonal variation and cultivar types. Further, detailed transcript and metabolite profiling has identified genes such as F3′H, F3′5′H, FLS, DFR, LAR, ANR and ANS of catechin biosynthesis, while MXMT, SAMS, TCS and XDH of caffeine biosynthesis/catabolism as key regulators during development and seasonal variation among three different tea cultivars. In addition, expression analysis of genes related to phytohormones such as ABA, GA, ethylene and auxin has suggested their role in developmental tissues during seasonal variation in tea cultivars. Moreover, differential expression of genes involved in histone and DNA modification further suggests role of epigenetic mechanism in coordinating global gene expression during developmental and seasonal variation in tea. Our findings provide insights into global transcriptional reprogramming associated with development and seasonal variation in tea.

Global expression profiling of three different tea cultivars during development and seasonal variation. We explored the transcriptome of twenty four libraries from 24 samples to identify differentially expressed genes (DEGs) during development and seasonal variations among three tea cultivar. The transcripts with log two-fold and above differential expressions were selected as DEGs. FPKM expression value and differential expression of all unigenes specific to three cultivars: 'H' , 'T' and 'U' during developmental and seasonal variations are listed in Supplementary Table S4. MA-plot representing differences between the two samples while plotting logarithmic fold changes on the y-axis against the logarithmic mean of counts on the x-axis ( Supplementary Fig. S3). Tissues from three cultivars were compared at four stages of season to identified DEGs among cultivars. The number of DEGs varied from 12064 (for TBD_UBD) to 17427 (for TBA-TBA). All three cultivars showed higher number of DEGs at active season (A). A total of 17

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis of differentially expressed genes (DEGs).
We performed GO and KEGG enrichment analysis to assign functional categories to the differentially expressed genes (DEGs) during developmental and seasonal comparisons. In GO enrichment analysis, GO term related to 'biological process' category such as cell proliferation, development growth, regulation of meristem development and maintenance, secondary metabolic processes, hormone metabolic process, regulation of gene expression, epigenetic, photosynthesis, DNA methylation, cell cycle, multi-organism process and response to abiotic stimulus were highly enriched during development and seasonal variations (Supplementary Table S6; Supplementary Fig. S4). On the other hand, GO term related to 'molecular function' such as NADH-dehydrogenase, various aspects of oxidoreductase, hydrolase activity, transporter activity and DNA and protein binding were highly enriched during development and seasonal variations (Supplementary Table S6; Supplementary Fig. S4).
In KEGG pathway enrichment analysis 55, 61, and 58 pathways corresponding to 3670, 5679 and 5520 DEGs were significantly enriched in 'H' , 'T' and 'U' , respectively (P ≤ 0.05; Supplementary Table S7). The genes that encode enzymes involved in various metabolic pathways such as pentose and glucuronate interconversion pathway, galactose metabolism were highly enriched during seasonal variation in all three cultivars. However, starch and sucrose metabolic pathway was additionally enriched during different developmental stages in 'H' and 'U' cultivars. Pathways involved in secondary metabolites biosynthesis such as flavonoids and phenylpropanoids, Transcriptome and metabolite analysis identified molecular response of key secondary metabolic pathways during development and seasonal variation in three tea cultivars. Catechin biosynthetic pathway. The flavonoid biosynthetic pathway of C. sinensis has been studied at physiological, biochemical and genetic levels [16][17][18] . Although, several independent studies have reported the expression profile of various genes of these pathways individually or in limited number in response to leaf age and different external cues in single varieties [19][20][21][22][23][24][25] . Also, these previous studies were limited either by gene studied, tissues, variety and/or season. In view of this, expression profile of key regulatory gene of flavonoid pathway was comprehensively investigated in three tea cultivars during development and seasonal variation. Among the cultivars eight gene namely, flavanone 3-hydroxylase (F3H), flavonoid 3′ -hydroxylase (F3′H), flavonoid 3′ ,5′ -hydroxylase (F3′ 5′ H), flavonol synthase (FLS), dihydroflavonol 4-reductase (DFR), leucoanthocyanidin reductase (LAR), flavone synthase (FNS) and anthocyanidin reductase (ANR) were differentially expressed. In particular, the expression of F3′ H, F3′ 5′ H, FLS, DFR, LAR and ANR was found to be highest in 'U' and 'T' cultivars whereas F3H and FNS was highest in 'H' cultivar. In addition, expression of phenylalanine ammonia-lyase (PAL), chalcone isomerase (CHI), 4-coumarate CoA ligase (4CL), cinnamate 4-hydroxylase (C4H), chalcone synthase (CHS) and F3H gene was not affected at different season in 'H' cultivar. Results also showed minimum variation in gene expression as well as metabolite content in 'H' cultivar across the season and tissue as compared to 'U' and 'T' cultivars ( Fig. 3A; Supplementary Table S8A). Further, expression analysis indicated that almost all known genes related to flavonoid biosynthesis were differentially expressed during development. The expressions of all genes were observed to be higher in young tissue during development at four different stages of season except FLS, whose expression was observed in old leaf tissue ( Fig. 3A; Supplementary Table S8A). However, minimum difference in expression was observed during development at 'D' season. Moreover, while comparing seasons, the tissues obtained during ' A' season showed maximum relative expression of most of the regulatory genes except F3H, 4CL, C4H, CHS and CHI, intended to decrease in subsequent seasons and observed minimum expression for these genes during 'D' season which subsequently increased during 'MA' season. (Fig. 3A; Supplementary Table S8A). Previously, variation in catechins concentration has been reported during one or two seasons in tea 19 . Here, influence of cultivar type, development stage and season on catechins content was identified by analysing young and old leaf tissue of three tea cultivars during four different seasons using HPLC. Six characteristic tea catechins namely; catechin (C), epicatechin (EC), gallocatechin (GC), epigallocatechin (EGC), epicatechingallate (ECG), epigallocatechingallate (EGCG) showed significant differences in their content among three tea cultivars. The highest total catechins (TC) content was recorded in 'U' (26.9%) followed by 'T' (23.9%) and 'H' (19.8%) ( Table 2). The difference was also observed in two different tissues. TC was higher in young compared to old leaf tissue. Data on TC during seasonal variation has documented its highest content during active season and lowest during dormant season. Among the individual catechins, all six catechins were recorded in higher amount in young tissue during ' A' season. EGCG was recorded in highest amount followed by ECG, while GC was lowest in all three cultivars irrespective of tissue and season. The order of catechins was EGCG > ECG > EC > EGC in 'H' cultivar whereas in 'T' and 'U' cultivar was EGCG > ECG > EGC > EC (Table 2). Hence, cultivar-type, their tissue development stage and seasons have strong influence on the catechins level in tea.  Table S8A). The expression of TCS, MXMT and SAMS was observed to be higher in young tissue during development. While during seasonal variation the expressions of these genes were highest during ' A' season and intended to decrease during 'MD' and 'D' and subsequently increased during MA season. In addition, gene encoding xanthine dehydrogenase (XDH), an enzyme involved in xanthine (precursor of caffeine) catabolism showed higher expression in old tissue during development; while during seasonal variation highest expression was observed at 'D' and 'MD' seasons. Interestingly, expression of XDH gene was not affected by cultivars ( Fig. 3B; Supplementary Table S8A).
Among the cultivars caffeine content was highest in 'H' (4.4%) followed by 'T' (3.6%) and 'U' (3.4%). However, lowest caffeine content was observed in old tissue of 'U' (0.75%) cultivar at 'D' season ( Table 2). Caffeine content was found to be higher in young tissue during development in all three cultivars irrespective of seasons (Table 2). Caffeine content was recorded maximum during ' A' season and minimum during 'D' season in all three tea cultivars (Table 2).
Further, correlation was identified for total catechin (TC) and caffeine content to expression of important genes of catechin and caffeine biosynthesis mentioned above in three tea cultivars during development and seasonal variation. Relative expressions of PAL, 4CL, C4H, CHS, CHI, F3′ H, F3′ 5′ H, LAR, DFR, and ANS gene were positively correlated with TC during ' A' and 'MA' season (Table 3). Moreover, expression of 4CL, C4H, CHS, FNS, LAR, DFR and ANS was also positively correlated with TC content during 'MD' season (Table 3). While during 'D' season the expression of only PAL and FNS genes were positively correlated with TC content (Table 3). Moreover, in caffeine biosynthesis pathway significant positive correlation was observed for SAMS and TCS gene expression with caffeine content during ' A' , 'MD' and 'MA' seasons (Table 3). However, the expression of XDH showed negative correlation with caffeine content during ' A' , 'MD' and 'MA' season (Table 3).
Data on catechin and caffeine contents were analyzed by two-way ANOVA followed by Duncun's test to explore the effects of cultivar, season, tissue, cultivar*season, cultivar*tissue, season*tissue and cultivar*season *tissue in tea plants. Significant differences in individual and total catechin and caffeine concentrations were identified in three tea cultivars during development and seasonal variation (Table 4). Results indicated that different tissue, cultivars and seasons have effect on catechin and caffeine accumulation in tea plants. Transcriptional response of phytohormone biosynthesis and signal transduction related genes during development and seasonal variation. The DEGs annotated to phytohormone metabolism in KEGG enrichment analysis were further investigated to identify transcriptional regulation of various phytohormones during development and seasonal variations in three tea cultivars. The enrichment of carotenoid pathway, associated with ABA metabolism was observed (Supplementary Table S7). The ABA biosynthetic pathway genes such as 9-cisepoxycarotenoid dioxygenase (NCED) and zeaxanthin epoxidase showed higher expression in old tissue compared to young tissue during development. While during seasonal variations, expression of these genes were observed to be higher at 'D' and 'MD' seasons in 'H' and 'T' cultivars respectively. Conversely, in the 'U' cultivar expression of NCED gene was higher during ' A' and 'MA' seasons ( Fig. 4A; Supplementary Table S9). However the transcripts of genes encoding ABA8′ -hydroxylase, a key enzyme for ABA degradation 26 , showed higher expression during 'D' . Similarly differential expression of gibberellin metabolism-related genes was observed by analyzing the diterpenoid biosynthesis pathway (Supplementary Table S7). The expression of transcripts annotated as GA20 oxidase (GA20ox), a key enzyme of active gibberellins' biosynthesis was higher in young tissue compared to old tissue during ' A' season in all three tea cultivars. However, the highest expression of this gene was noticed during 'MD' season only. Conversely, expression of the gene encoding gibberellin 2-beta-hydroxylase, an enzyme involved in gibberellin inactivation showed higher expression in old tissue compared to young tissue during 'D' and 'MD' seasons (Fig. 4A).
The cysteine and methionine metabolic pathway, related to ethylene biosynthesis was found to be significantly enriched in KEGG analysis (Supplementary Table S7). Genes related to ethylene biosynthesis such as S-adenosyl-L-methionine synthetase (SAMS), ACC synthase (ACS) and ACC oxidase (ACO) were differentially expressed. The expressions of ACS and ACO genes were higher in old tissue during 'D' and 'MD' seasons in 'T' and 'U' cultivar. However in 'H' cultivar, expression of these genes was higher in young tissue during ' A' and 'MA' season (Fig. 4A). Moreover, the expression SAMS was higher in young tissue during ' A' and 'MA' season. Further, differential expressions of genes involved in other phytohormone pathways were also investigated. Indole-3-pyruvate monooxygenase gene or YUCCA, which catalyzes the rate limiting step in the auxin biosynthesis pathway 27   Conversely, this gene was up-regulated in 'U' cultivar. Jasmonate-O-methyltransferase, whose product deactivates jasmonate, was up-regulated in young tissue at ' A' season in all three cultivars. However, an increase in expression of this gene was also observed during 'MD' season in 'H' and 'U' cultivars ( Fig. 4A; Supplementary Table S9).
DEGs were also annotated in plant hormone signal transduction pathways (Supplementary Table S7), among which ABA, gibberellins, ethylene, auxin and brassinosteroid were further analyzed ( Fig. 4B; Supplementary Table S9). The relative expression of genes involved in ABA signaling pathway such as abscisic acid Sample GC mg/g CV% C mg/g CV% EC mg/g CV% ECG mg/g CV% EGC mg/g CV% EGCG mg/g CV% TC mg/g CV% Caffeine mg/g CV%  Table 3. Correlation between expression of catechins and caffeine biosynthesis-related genes and total catechin and caffeine content among three tea cultivars during development and seasonal variation. Asterisks indicate that the genes from catechin and caffeine pathways are significantly correlated with the catechin and caffeine content (*P ≤ 0.05 and 0.01, **P ≤ 0.001).
Scientific Transcriptional response of epigenetic regulation related genes during development and seasonal variation. The expression of chromatin remodeling components during dormancy and in subsequent growth reactivation has been conducted in tea (Fig. 5). Dormancy was observed to be inducing the expression of different histone deacetylases, SUVR3 and SWI/SNF-like homolog. Also, higher expression of methyltransferase gene and lower expression of DEMETER-like as well as ROS genes was observed during 'D' and 'MD' seasons in old tissue of three tea cultivars (Fig. 5).

qRT-PCR validation of differentially expressed transcripts.
To authenticate the reproducibility as well as accuracy of differential gene expression identified through RNA-seq and computational analysis, 10 genes related to phenylpropanoid, flavonoid and caffeine biosynthesis pathways having differential expression were selected for qRT-PCR. Among the chosen transcripts the expression pattern of 9 genes obtained through qRT-PCR and RNA-seq was in accordance with each other. Bar-graph was plotted by comparing the change in log2 fold values calculated by transcriptome analysis and qRT-PCR (Fig. 6). However, one transcript (LAR) did not match precisely with its RNA-seq value (Fig. 6). Such differences in expression pattern observed by RNA seq and qRT-PCR methods have also been reported by other groups as well 1,28,29 .

Discussion
Development (apical bud growth), seasonal variation as well as cultivars are amongst the critical determinants of tea quality and yield. Due to the significance of development, tea plants have evolved sophisticated system to detect environmental conditions and to regulate developmental programs. Recently, Paul et al. 9 have reported the expression profile in assamica type of tea clone during active growth and dormancy. However, the molecular regulations related to tea yield and quality owing to the influence of development, seasonal variations and genetic background remains largely unexplored. Therefore, present investigation targets at generating knowledge about  gene networks and molecular regulation of processes that are influenced by development and seasonal variation in three tea cultivars grown in Kangra valley of Himachal Pradesh, India. Global transcriptional reprogramming is considered as the important molecular response of the plants to adapt to different developmental and season variations. To understand the molecular response, RNA-seq analysis of tea cultivars was performed and investigated the transcriptional differences during development and seasonal variation in three tea cultivars. Identification of several genes and transcript isoforms in tea demonstrated the power of deep sequencing technology. Large number of transcripts exhibited a developmental stage, season and/ or cultivar specific response. Differences in number of DEGs in 'T' and 'U' and 'H' cultivars during development and seasonal variation has indicated different response of these cultivars towards such variations. The results suggest that major transition in transcriptome of three tea cultivars occurred during development at ' A' season, while it remained more or less quiescent at 'D' season. Across seasonal variations, less number of DEGs during 'D' season has indicated the down-regulatory effect of dormancy on transcript expression. Such down regulatory effects on transcript number and expression during dormancy has also been reported in assamica type of tea clone 9 . Overall analysis of RNA-seq data revealed a complex transcriptional network governing developmental and seasonal variation in tea.
Catechins are not only important for tea quality but also related to the growth and metabolism of tea plant. The amount of catechins present in the tea shoots gives an indication of the potential of a cultivar to produce good quality tea. Recently, RNA-Seq technology was used to identify key genes involved in the regulation of the flavonoid biosynthesis pathway in tea 11,30 . However, only few reports are available on the changes of genes related to catechin biosynthesis in response to environmental stresses 31,32 . Here, comparative transcriptome analysis data documented that the expression levels of F3′ H, F3′ 5′ H, FLS, DFR, LAR and ANR were highest in 'T' and 'U' cultivars and supported the consistent changes in EGCG, ECG, and EGC along with TC levels. Results have suggested the key regulatory role of these genes in controlling the levels of catechins in different cultivars. Moreover, higher catechins are reported in larger-leaf tea species 10 . The leaf size of cultivars used in the present study were large, medium and small in size in 'U' , 'T' and 'H' cultivar respectively. In agreement with this, highest TC was recorded in 'U' cultivar followed by 'T' and 'H' cultivars which further suggested the potential of 'U' cultivar to produce better quality tea compared to 'T' and 'H' cultivars. Comparatively stable gene expression and metabolite content across the season and tissue further suggested 'H' cultivar to be more tolerant to season and tissue. Higher cold tolerance has also been reported for chinary tea cultivar in a previous study 33 . An increased expression of FLS with decrease in TC content in old leaf tissue during development has indicated FLS as negative regulator of catechin biosynthesis in tea. Opposite relation in the expression of FLS and catechin biosynthesis has been reported in previous studies as well 31,34 . A significant higher TC in young tissue during development signifies the importance of young tissue (two leaf and a bud) in producing good quality tea. Together gene expression and content analysis across the season indicated that dormancy has reduced the quality of tea leaves by reducing the accumulation of major catechins. Moreover, higher TC during ' A' season emphasizes the importance of season in tea quality. Finding has suggested the key role of transcriptional regulation on catechins content during distinct stages of activity-dormancy cycle in different tissues and cultivars.
Caffeine is important bioactive ingredient synthesized in young leaves of C. sinensis plants. Previously, caffeine content and expression of TCS of caffeine biosynthesis pathway has been reported during development and seasonal variations in Kangra jat tea cultivar 4 . In present study, SAMS, MXMT, TCS genes were upregulated while XDH was downregulated with increase in caffeine content in young tissue and during ' A' season in all three cultivars. Results suggested key regulator role of SAMS, MXMT, TCS and XHD genes in caffeine biosynthesis/ catabolism in leaves of different tea cultivars.
Transcriptome analysis has identified transcriptional regulation of ABA and GA levels during distinct stages of activity-dormancy cycle in three tea cultivars. ABA has been proposed to promote and maintain bud dormancy in woody plants [35][36][37] . It has been suggested that ABA levels are maintained by a delicate balance between its biosynthesis and catabolism, rather than simply by biosynthesis alone. High levels of ABA are maintained when both ABA biosynthesis and catabolism are active 26 . Further, ABA biosynthesis-related genes have been shown to upregulate whereas the genes responsible for ABA catalyses were downregulated during dormancy 12,15 . In this study, relative expression of both ABA biosynthesis (NCED and zeaxanthin epoxidase) and catabolism (ABA8′ -hydroxylase) genes were found to be increased on the onset of dormancy. However, higher expression of NCED and zeaxanthin epoxidase than ABA8′ -hydroxylase may describe the dormancy induced ABA accumulation. Conversely, lower expression of NCED and zeaxanthin during dormancy release (at MA season) further suggested reduced biosynthesis of ABA. Moreover, changes in hormonal levels could also change the hormone sensitivity in the pathway which could be another crucial manifestation in the activity-dormancy transitions of plant. Further, during dormancy higher expressions of ABA response genes as well as ABA biosynthetic genes was observed. Expression regulation has suggested the significant role of ABA in dormancy induction and maintenance in tea.
On the other hand, modulation of gibberellin (GA) biosynthesis and signaling could play a key role during the activity-dormancy cycle. Previous reports have shown an increase in GA levels during delayed dormancy in hybrid aspen 38 . Also, decrease in GA20ox expression and increase in GA2ox expression has been reported towards endodormancy release in Japanese pear. Upon dormancy release, higher expression of GA20ox and lower expression of GA2ox has been reported in hybrid aspen 12 . A significant decrease in GA levels has been reported in tea during dormancy 39 . Regulation of GA metabolism related genes in tea cultivars during dormancy and in subsequent dormancy release (MA season) has emphasized that this could be due to downregulation of GA20 OX expression and upregulation GA2 OX during dormancy and upregulation of GA20 OX and downregulation of GA2 OX during dormancy release. Hence, GA20 OX and GA2 OX are suggested to be the key targets for modulating GA levels in tea plant. DELLA protein has been reported to play an important role in maintenance of dormancy in leafy spurge 40 and paoenia ostii 41 . Therefore, higher expression of DELLA protein during onset of dormancy has suggested its role in dormancy maintenance in different tea cultivars. Moreover, the gene expression data presented here, as well as hormonal measurements during dormancy in tea from earlier studies, indicate that ABA 42 and GA 39 metabolism display opposite responses during dormancy in tea. Beside transcriptional regulation, epigenetic modifications such as chromatin remodeling, histone modification and DNA methylation have been known to play an important role in regulating gene expression in perennial plants during development and seasonal dormancy 14 . However, involvement of such epigenetic regulatory mechanisms is so far unknown in tea plant. There is growing evidence that genome-wide epigenetic regulation of gene expression is involved in dormancy regulation 14,43 . Higher DNA methylation and lower H4 acetylation levels were shown to induce dormancy in Castanea sativa 12,44 . Further, Bertoni et al. 45 study has strengthen the importance of epigenetic regulation during dormancy by reporting histone deacetylases and histone lysine methyltransferase (SUVR3) upregulation in hybrid aspen. In poplar too, SWI2/SNF2-like genes responsible for chromatin modification have displayed upregulation during dormancy 46 , while its expression was reported to be significantly downregulated during dormancy release in leafy spurge 47 . In pursuance, the expression of chromatin remodeling components has been observed to be inducing the expression of different histone deacetylases, SUVR3 and SWI/SNF-like homolog during dormancy. The transcript abundance of histone deacetylases and SWI/SNF-like homolog during 'D' and 'MD' seasons and in old leaf tissue has suggested their role in facilitating the compaction of chromatin and suppression of gene expressions.
Apart from chromatin remodeling, de novo DNA methylation is also known to be involved in dormancy. DNA methylation can be precisely estimated by the transcript level of two genes such as DNA methyltransfereases and DNA glycosylases. Higher DNA methylation level was reported in Castanea sativa during bud rest period 41 and in strawberry during induction of dormancy 48 . Expression of cytocine-5 methyltransferase too was reported higher in Japanese pears during endodormancy 15 . DEMETER-like and repressor of silencing (ROS) gene which encode DNA glycosylase are known to demethylate and activate genes contributing in dormancy release 12 . Moreover, Bertoni et al. 45 reported that DEMETER-like DNA glycosylases are downregulated during dormancy. In agreement with these findings, higher expression of methyltransferase gene and lower expression of DEMETER-like as well as ROS genes observed during 'D' and 'MD' seasons in old tissue of three tea cultivars has suggested that higher DNA methylation could be one of the factors responsible for initiating dormancy in tea plant. Thus it appears that modulating expression of genes governing DNA methylation could represent a potent strategy for overcoming dormancy in tea.
Further, transcriptome analysis has revealed that expression of several genes related to phytohormones and histone/DNA modification was modulated during seasonal active-dormancy cycle in tea (Fig. 7). Thus, transcriptional control of phytohormones and histone/DNA modification are seem to be associated with transitions of active-dormancy cycle. Conclusion Present study provides a global and a comparative survey of transcriptomes of three tea cultivars and thus may serve as an available genetic diversity resource for the tea plant. Study has generated gene expression profiles for two tissues at different seasons in tea plant. Data has also identified regulation of catechins and caffeine pathways in three tea cultivars during development and seasonal variation. Regulation of phytohormone metabolism and signaling at the transcriptional level was found to be important during development and seasonal variation in tea. Data has further revealed that along with transcriptional regulation, epigenetic control could play a key role in regulation of development and seasonal variation in three different tea cultivars.
cDNA library preparation and illumina sequencing. Total   were further processed to sequence and base-calling as well as quality value calculations were performed using Illumina data processing pipeline (RTA version 1.9).
De novo assembly and homology search. Pair end (PE) sequence reads were generated with 72 bases length using CASAVA 1.8 package tool provided by Illumina. Reads' quality was assessed using FilteR tool 28 . High quality reads were used for de novo assembly till scaffold level using SOAPdenovo-trans tool for three tea cultivars separately 49 . K-mer size of 61 was selected for each of the assembly as it achieved the best balance between the numbers of transcripts produced, average length of transcripts, N50 length value and average coverage of total assembly. For more effective assembly, gap filling was carried out by GapCloser to complete scaffolds using the average insert length of 300 bases with paired-end information 50 . Sequence redundancy was removed employing CD-HIT-EST (similarity cut-off ≥ 90%) 51 and TGICL-CAP3 (identity ≥ 90%) tools 52 . Further, dissimilar sequence clustering (DS) was performed using BLASTX hits 28 . A transcript sequence with highest bit score and longest sequence length was selected as unigene representative from each DS cluster.

Annotation and differential expression analysis. Annotations for Gene Ontology (GO), Kyoto
Encyclopedia of Genes and Genomes (KEGG) and Enzyme Commission Codes (EC) for assembled transcripts were carried out using Annot8r annotation tool against UniProt database 53 . Based on highest bit score and E-value, top-hits were selected. For expression measurement as FPKM (Fragments per kilobase of transcript per million mapped) by RSEM 54 , reads from all the conditions were mapped back to assembled transcripts using Bowtie tool for each cultivar individually 55 .
For differential expression (DE) calculation, mapped read count for each tea transcript, according to each conditions were used. DE was carried out using EdgeR package in R 56 . EdgeR estimates the mean and variance of raw read counts under a negative binomial distribution and use exact test to identify differentially expressed transcripts. MA plot was also plotted for DE during various conditions. MA-plot is a scatter plot of logarithmic fold changes (on the y-axis) versus the mean of normalized counts (on the x-axis).
AgriGO tool 57 based singular enrichment analysis was carried out to identify the enriched Gene Ontology terms in all comparative conditions with DE genes at significance level of 0.05. To counterbalance the problem of multiple comparisons, hyper-geometric statistical test was applied with Bonferroni correction method. Similarly, KEGG pathways based enrichment analysis was also performed as described earlier 58 . Where, hyper-geometric test was applied with adjust P-values for multiple comparisons with BH method 59 in R package 60 . Further, differentially enriched pathways with p-value ≤ 0.05 and correction value/FDR (False discovery rate) ≤ 0.05 were selected as significant.
Quantitative real-time polymerase chain reaction (qRT-PCR) analysis. For the validation of reliability of RNA-Seq data, the relative expression levels of 10 selected genes were measured using qRT-PCR. First strand cDNA was synthesized from1μ g of DNaseI-treated total RNA using high capacity cDNA reverse transcription kit (Applied Biosystems, USA). Primer express 3.0 software (Applied Biosystems, USA) was used to design gene specific primers for qRT-PCR (Supplementary Table S10). Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) from tea was used as reference gene. Further, qRT amplification was performed in triplicate on a Step One real-time PCR machine (Applied Biosystems, USA) using SYBR Green qPCR Master Mix (Thermo Scientific, USA). The conditions for qRT-PCR were 94 °C for 4 min followed by 40 cycles of 94 °C for 30 s, 58-61 °C for 30 s and 72 °C for 30 s (data collection) with a final melting curve analysis. The relative expression of each gene was calculated using Ct method, a comparative 2^ddCT method 61 .
Quantification of catechins and caffeine. Extraction and quantification of six tea catechins namely catechin (C), epi-catechin (EC), gallocatechin (GC), epi-gallocatechin (EGC), epi-catechin gallate (ECG), epigallocatechingallate (EGCG) and caffeine was carried out using the method of Joshi et al. 62 . Identification and quantification of C, GC, EC, ECG, EGC, EGCG and caffeine was performed by matching retention time, co-injections and spectral matching with authentic standards. Mean area of three replicate injections was considered for finding the concentrations in samples.

Statistical analysis of catechins and caffeine content. Different catechins and caffeine content
was expressed as the mean ± standard deviation (SD) of three technical replicates. The data were analyzed by two-way ANOVA followed by Duncan's multiple range test at P ≤ 0.05. Further, statistical significance of differences between samples was assessed via t-test and f-test. The correlation was analyzed via Pearson correlation at P ≤ 0.05.