RNA-Seq Using Two Populations Reveals Genes and Alleles Controlling Wood Traits and Growth in Eucalyptus nitens

Eucalyptus nitens is a perennial forest tree species grown mainly for kraft pulp production in many parts of the world. Kraft pulp yield (KPY) is a key determinant of plantation profitability and increasing the KPY of trees grown in plantations is a major breeding objective. To speed up the breeding process, molecular markers that can predict KPY are desirable. To achieve this goal, we carried out RNA-Seq studies on trees at extremes of KPY in two different trials to identify genes and alleles whose expression correlated with KPY. KPY is positively correlated with growth measured as diameter at breast height (DBH) in both trials. In total, six RNA bulks from two treatments were sequenced on an Illumina HiSeq platform. At 5% false discovery rate level, 3953 transcripts showed differential expression in the same direction in both trials; 2551 (65%) were down-regulated and 1402 (35%) were up-regulated in low KPY samples. The genes up-regulated in low KPY trees were largely involved in biotic and abiotic stress response reflecting the low growth among low KPY trees. Genes down-regulated in low KPY trees mainly belonged to gene categories involved in wood formation and growth. Differential allelic expression was observed in 2103 SNPs (in 1068 genes) and of these 640 SNPs (30%) occurred in 313 unique genes that were also differentially expressed. These SNPs may represent the cis-acting regulatory variants that influence total gene expression. In addition we also identified 196 genes which had Ka/Ks ratios greater than 1.5, suggesting that these genes are under positive selection. Candidate genes and alleles identified in this study will provide a valuable resource for future association studies aimed at identifying molecular markers for KPY and growth.


Introduction
Eucalyptus nitens (shining gum) is a perennial forest tree species grown mainly for kraft pulp production (KPY) in many parts of the world [1]. KPY is considered a key determinant of plantation profitability [2] and consequently increased KPY is a major objective of breeding programs [3]. In forest tree species, markerassisted selection (MAS) is particularly attractive because conventional selection is impeded by long generation times and long delays until the full expression of mature traits [4]. A common feature of most agronomic traits in trees is that they are complex, and likely to be controlled by variation in many genes. Currently, there are two approaches being explored in trees for applying markers in breeding for improvement of complex traits. In the first approach, known as Genomic Selection (GS), large numbers of random markers are used for predicting phenotypes from genotypes [5]. In the second approach, markers potentially controlling the trait occurring within candidate genes are identified using association genetics in candidate genes. These associated markers are then used to predict traits as in GS with random markers [6]. The discovery of high quality candidate genes is therefore a crucial step in the discovery of polymorphisms associated with complex traits such as growth and pulp yield.
Recent developments in sequencing technologies are making it possible to identify large numbers of high quality candidate genes by exploring gene expression at the whole transcriptome level. RNA sequencing (RNA-Seq) uses next generation sequencing technologies to sequence complementary DNA (cDNA), and the resulting sequencing reads are either assembled de novo or mapped on to a reference genome, if available. Differential gene expression can be examined by comparing the number of reads mapping to genes in samples derived from different conditions. Such RNA-Seq experiments are new to forest tree species and only a few studies have been published to date [7][8][9][10]. In addition to the identification of differentially expressed genes, RNA-Seq can also be used to identify differentially expressed alleles [7]. Until recently, microarrays were predominantly used to explore differential expression in large numbers of genes [11]. However RNA-Seq is replacing microarrays to overcome some of the limitations in microarray studies including increased false positives due to hybridization signals [12] particularly from transcripts of low abundance [13]. RNA-Seq is also useful for discovering new transcripts, while microarrays can only detect transcripts that correspond to existing genomic sequence information.
In this study, we used RNA-Seq to identify candidate genes and alleles that may influence wood and growth traits by comparing gene expression in cambial tissue between low and high KPY trees. Cambial tissue is widely used in forest tree species to study patterns of expression of genes involved in wood (xylem) development. KPY is a wood quality trait of forest trees and is influenced by the cellulose and lignin content of xylem. Several studies have shown that virtually all cellulose and lignin biosynthetic genes are expressed in cambial tissue. Therefore cambial tissue is widely used as a key organ to identify genes relating to pulp yield in a number of studies [14][15][16][17]. In this study, we identified several genes and alleles affecting wood and growth traits which were consistent between two populations. The functional variants showing differential allelic expression identified in this study are useful for future association studies to identify markers for KPY and growth traits.

Plant material and RNA extraction
Plant material from two trials of E. nitens at Meunna (241.08uS, 145.47uE) and Florentine (242.54uS, 146.51uE) in Tasmania, Australia were used in this study. Meunna and Florentine are approximately 350 kilometres apart and located at an altitude of 297 m and 266 m above mean sea level, respectively. The annual rainfall of Meunna and Florentine are 1007 mm and 1225 mm, respectively. The two trials were established in 1993 to study the performance of 420 E. nitens families, each represented by two-tree plots in each of five replicates. Cambial scrapings for RNA extraction were collected from 44 trees, 22 each from high pulp yield and low pulp yield extremes in the Meunna trial (March 2011) and 66 trees, 33 each from high pulp yield and low pulp yield extremes, in the Florentine trial (May 2012). Scrapings were immediately frozen on dry ice then stored at 280uC. Total RNA was isolated from the cambial scrapings following a modified CTAB method as described in [18]. RNA samples were then treated with TURBO DNA-free Kit (Cat No. AM1907, Ambion) to remove contaminating DNA from RNA preparations and to remove the DNAse from the samples. Concentrations of RNA samples were measured using a QUBIT fluorometer and all the samples were normalized to 100 ng/ul. An equimolar concentration of total RNA from trees in each category (high and low pulp yield) was pooled into three bulks of seven to eight trees each in Meunna and 11 trees each in Florentine and quality checked using an Agilent 2100 Bioanalyser. These three bulks from each treatment were used as biological replicates in differential gene expression analyses.

cDNA Sequencing
In total, six RNA bulks from two treatments (three from high and three from low pulp yield) from each trial were sequenced (paired end) at the Australian Genome Research Facility using the Illumina HiSeq platform (HiSeq 2000). Raw sequence reads were obtained using the Illumina CASAVA pipeline version 1.8.2.

RNA sequence reads mapping and transcript assembly
Adapter sequences from all raw sequence reads were removed using CLC Genomics Workbench v6.0.4 (CLC Inc, Aarhus, Denmark) and sequence reads having a quality score less than 20 were discarded using the Popoolation package [19]. Quality trimmed sequencing reads from all 6 libraries in each trial were pooled and mapped to the Eucalyptus grandis reference genome (http://www.phytozome.net/eucalyptus.php) with TopHat v2.0.9 [20] which uses Bowtie v0.12.7 [21] as an alignment engine. Tophat was run with the default parameters. To determine and exclude ambiguous reads mapping to multiple transcripts we used Tophat's default option (-g value: 20 multi-hits). Since we do not have a reference genome sequence for E. nitens, we used the publicly available E. grandis reference genome sequence for mapping the sequencing reads. A binary sequence alignment file (BAM) produced by TopHat and a FASTA file of E. grandis genome sequence was used to generate transcript annotations in GTF format using Cufflinks v1.1.0 [22]. Cufflinks was run with default parameters without supplying any annotation file. BEDtools v2.18.1 [23] was used to estimate the counts of reads in individual bulks that are mapping to different gene products in the GTF annotation file using the BAM file from each library. Raw read sequences and the read counts data are deposited in NCBI's Gene Expression Omnibus and are accessible through GEO series accession number GSE56592.

Differential gene expression (DGE) analyses
The count files generated using BEDtools for individual bulks were used to find significant differences in transcript abundance between low and high KPY samples using edgeR [24]. EdgeR identifies differentially expressed transcripts based on the assumption that the number of reads produced by each transcript is proportional to its abundance. edgeR measures transcript abundance in counts per million (CPM). As there were three biological replicates each for low and high pulp yield samples in each trial, edgeR observes the differences in the CPMs for each gene across the replicates and uses these variance estimates to calculate the statistical significance (p-values) of observed differential expression. Transcripts with very low expression were filtered before DE analysis based on an expression cut-off of 1 CPM in at least three libraries. For the library sizes in this study, one CPM would correspond to ,50 read counts for the Florentine trial and ,70 read counts for the Meunna trial. Benjamini and Hochberg's algorithm [25] was used to control false discovery rate (FDR) due to multiple testing in differential expression analysis.
A web-based tool High-Throughput GOMiner [26] was used to categorise the differentially expressed genes based on their function. To identify Arabidopsis homologs for gene models predicted from transcriptome mapping, BEDTools was used to extract sequences of all genes from the E. grandis reference genome sequence using gene coordinates from the gene annotation (GTF) file produced using the 'Cufflinks' package. The extracted gene sequences were BLAST searched with the Arabidopsis protein database. The identified Arabidopsis homologs were used in GO enrichment tests based on Biological processes.

SNP discovery and differential allelic expression (DAE) analysis
To identify SNPs from the RNA-Seq data, BAM files generated from TopHat were used in SAMtools to produce an mpileup file. Reads from the three biological replicates from each treatment were combined to increase coverage and confidence of the SNP calls. The mpileup file was used in VarScan [27] to call SNPs. The following parameters were used in SNP calling: minimum read depth (50), minimum supporting reads (20), minimum base quality (20), minimum variant allele frequency (0.01), P-value threshold for calling variants (0.05). We used these stringent parameters compared to the less stringent default parameters to avoid false positives in SNP calling. We also tested for differences in the frequency of the alleles at each SNP between low and high pulp yield samples in each trial. A chi-square test was performed to estimate the significance of allele frequency differences.
A GO analysis was conducted using High-Throughput GOMiner to categorise the genes that had differentially expressed alleles based on their function. Separate analysis was performed for genes that showed both DGE and DAE and genes that showed only DAE.

Identification of genes under positive selection based on Ka/Ks ratios
We used Popoolation to annotate synonymous (SS) and nonsynonymous (NS) substitutions using an mpileup file containing reads merged from all the six bulks from each trial and a coding sequence (CDS) gene annotation file of E. grandis. A minimum allele count of 4, minimum coverage of 20 reads, maximum coverage of 2000 reads and a minimum phred quality of 20 was used to identify SNPs. The identified nonsynonymous and synonymous SNPs were used to estimate Ka/Ks ratios (ratio of number of nonsynonymous substitutions per nonsynonymous site to the number of synonymous substitutions per synonymous site). The nonsynonymous and synonymous SNPs were normalized by their respective lengths estimated with the Popoolation package. A constant 1 is added to the number of SNPs to enable comparisons with genes containing no SNPs, as suggested by [28]. Genes with Ka/Ks ratio of more than 1.5 were considered genes under positive selection. We also conducted GO enrichment tests to identify the biological processes associated with the genes showing positive selection signatures.

Sequencing output
RNA samples of E. nitens trees representing the KPY extremes were collected in two trials, Meunna and Florentine. Distributions of pulp yield for the collected samples from both the trials are shown in Fig. 1. A positive correlation between KPY and diameter at breast height (DBH) was observed for samples in both Meunna and Florentine ( Figure S1).
From both the Meunna and Florentine trials, six RNA bulk libraries from two treatments (three from high and three from low pulp yield) were paired-end sequenced on one lane of an Illumina HiSeq flowcell. In Meunna this yielded a total of 430 million reads, with individual library yields ranging from 49 to 78 million reads. In Florentine, this yielded a total of 286 million reads, with individual libraries yielding 43 to 53 million reads. These reads were mapped to the E. grandis reference genome using Bowtie and TopHat software packages. Sequencing reads from three bulks within a treatment were used as biological replicates in differential gene expression analyses.

Differential gene expression analysis
To identify the candidate genes controlling KPY and growth traits, we performed differential gene expression (DGE) analysis using edgeR on the E. nitens transcripts which had a minimum of one counts per million (CPM) in at least three libraries. The downregulated genes in low KPY (low DBH) samples are primarily involved in growth and cell wall formation while up-regulated genes in low KPY samples (up-regulated in high KPY samples) are mainly involved in biotic and abiotic stress tolerance reflecting the low growth of the low KPY samples. Several genes putatively involved in wood formation and growth such as alpha and betatubulins, calcium dependent protein kinase, cellulose synthases, cellulases, COBRA-like proteins, 4-coumarate:CoA ligase, FAS-CICLIN-like arabinogalactan protein, MYB domain proteins, protein kinases, SAM-dependant methyltransferases, sucrose synthases, xyloglucan endotransglucosylases were down-regulated in low KPY samples (up-regulated in high KPY samples). On the other hand, biotic and abiotic stress related proteins such as several heat shock proteins, pathogenesis related proteins, senescence related genes, zinc induced facilitator-like proteins and WRKY DNA binding proteins were present among the up-regulated genes in low KPY (low growth) samples.
Overall, 32,903 and 30,570 transcripts were predicted in Meunna and Florentine trials, respectively. After filtering for low expression transcripts, 26,279 and 23,917 transcripts from Meunna and Florentine trials were used in DGE analysis. Log2 fold changes between low and high KPY samples ranged from 2 6.79 to 6.26 in Meunna and from 27.75 to 8.18 in Florentine. To reduce false positives, only transcripts that were differentially expressed at 5% FDR level were declared as DE genes. At 5% FDR level, a total of 6122 and 7240 transcripts (4479 and 5528 unique genes based on E. grandis annotations) showed differential expression between low and high KPY samples in Meunna (Table  S1) and Florentine (Table S2) respectively. Of these, 3615 (59%) transcripts were down-regulated and 2507 (41%) up-regulated in low KPY samples in Meunna. In Florentine, 3674 (51%) were down-regulated and 3566 (49%) were up-regulated in low KPY samples. Heatmaps were generated for both the trials using log2CPM of the top 500 genes that were differentially expressed in both trials (Fig. 2). Within a treatment (e.g low KPY) gene expression was similar among the three replicates while it was distinct between treatments in both the trials. To determine the relationship between gene expression in Florentine and Meunna the data used for drawing the heatmaps was used to generate dendrograms based on hierarchical clustering ( Figure S2). The low and high KPY samples from both the trials were assigned to two separate major groups confirming the similarity between biological replicates within and between trials.
Comparing the DGE between the two trials, 3972 transcripts were significantly differentially expressed in both the trials at 5% FDR level. Of these, only 19 transcripts (0.5%) showed opposite patterns of expression. Of the 3953 transcripts that had gene expression changes in the same direction in both trials, 2551 (65%) were down-regulated and 1402 (35%) were up-regulated in low KPY (low growth) samples (Table S3). Correlation between Log fold changes in Meunna and Florentine for these 3953 transcripts is very high ( Figure S3). Differential expression of the top 25 downregulated genes and top 25 up-regulated genes are shown in Tables 1 and 2, respectively. Transcript gene coordinates and gene identities of all significantly (FDR,0.05) differentially expressed transcripts in both the trials are shown in Table S3.

Gene Ontology (GO) enrichment analysis of differentially expressed genes
We performed gene ontology (GO) enrichment analyses to functional characterisation of genes showing differential expression. The gene ontology analysis by High-Throughput GoMiner revealed differential enrichment of genes into various biological processes. The genes up-regulated in low KPY (low growth) samples were enriched in biotic and abiotic stress responsive processes. On the other hand, most of the down-regulated genes (up-regulated in high KPY and high growth samples) belonged to gene categories involved in wood formation and growth (Table 3). In Meunna, 57 gene categories were enriched among the genes differentially expressed between low and high KPY samples at the 5% FDR level. Of these, 18 were up-regulated and 39 were downregulated in the low KPY samples. Thirty nine gene categories were enriched among the genes differentially expressed between low and high KPY samples in Florentine. Of these, five categories were up-regulated and 34 down-regulated in the low KPY samples. Five gene categories were up-regulated and 26 downregulated in the low KPY samples in both trials, providing more confidence in the enrichment of these gene categories.

SNP identification and Differential allelic expression analysis
We studied differential allelic expression of SNPs from candidate genes to identify potential functional markers. In Meunna, 303,648 and 318,733 SNPs were identified in high and low pulp yield samples, respectively. Of these, 280,610 SNPs were present in both the samples. In Florentine, 139,408 and 149,633 SNPs were identified in high and low pulp yield samples, respectively. A total of 135,886 SNPs were common between the two samples. In total, 114,667 SNPs were common to both Meunna and Florentine. Most of these SNPs (45%) were synonymous and the remainder were non-synonymous, 39UTR, intron and a small proportion of them were 59UTR (Fig. 3).
To identify putatively differentially expressed alleles between low and high pulp yield samples based on allele frequency differences, a chi-square test was performed using 114,667 SNPs that are common to both the trials. Using a conservative Bonferroni corrected P value of 0.0001, we identified 27,708 and 9076 SNPs that were differentially expressed in Meunna and Florentine, respectively ( Table 4, Table S4). Of these, 3390 SNPs showed DAE in both the trials and 2103 (62%) of these had allelic frequencies in the same direction in both the trials indicating the robustness of allelic expression of these SNPs. These 2103 SNPs come from 1068 unique genes (Table S4).
Of the 2103 SNPs showing DAE, 640 SNPs (30%) occurred in 313 unique genes that showed DGE (total gene expression). These SNPs may be the cis-acting regulatory variants that influence total gene expression directly or SNPs in high linkage disequilibrium with the cis-acting polymorphisms. In other words, 313 genes had both differential gene and differential allelic expression between low and high KPY samples. Most of the SNPs (61%) which showed DAE and DGE were synonymous SNPs (Fig. 3), suggesting a common role of synonymous SNPs as cis-acting variants. Genes that showed differential expression at both gene and allele levels included cellulose synthases, COBRA-like proteins, FASCICLIN-like arabinogalactan proteins, protein kinase superfamily protein, S-adenosylmethionine synthetases and beta-tubulins. Among the 640 SNPs that showed both DGE and DAE 389 SNPs were synonymous, 120 were nonsynonymous, 91 were 39UTR, 18 were 59UTR and 10 were intronic SNPs. These intronic SNPs may come from unspliced pre-mRNAs.
Gene Ontology (GO) enrichment analysis of genes having differentially expressed alleles GO enrichment analysis was conducted at two levels: 1) for genes that showed both DGE and DAE and 2) for genes that showed only DAE but no DGE. GO enrichment analysis revealed 24 categories (FDR 5%) for genes that had both DGE and DAE ( Table 5). Most of these gene categories belonged to processes related to cell wall development. A total of 56 gene categories were enriched for genes that had only DAE but no DGE (Table S5). Most of these categories included genes involved in catabolic and metabolic processes (growth) and genes responding to abiotic stress factors.
Genes showing selection signature based on Ka/Ks ratios By comparing the two trials we observed in total 196 genes which had Ka/Ks ratios of more than 1.5 in both the trials strongly suggesting that these genes are under positive selection. Of these 196 genes, 27 genes also showed DGE between low and high KPY (growth) samples in both the trials (Table 6). Also, ten SNPs from seven genes that showed signatures of positive selection in both trials, showed DAE between low and high KPY (growth) samples in both the trials (Table S6). Seven of these SNPs were nonsynonymous, two were synonymous and one was within the 59UTR. None of the genes containing these ten SNPs showed

Discussion
We analysed samples from the extremes of the distribution of KPY in two E. nitens trials which also differed in growth. By examining whole transcriptome data we identified several genes and alleles whose expression is correlated with variation in KPY and/or growth. Most of the genes down-regulated in low KPY (low growth) samples (up-regulated in high KPY samples) were related to cell wall biosynthesis and growth. The down regulation of growth genes in low KPY samples may be due to the positive correlation observed between KPY and growth (DBH, Figure S1). Most of the up-regulated genes in low KPY and low growth samples (the down-regulated genes in high KPY samples) were involved in biotic and abiotic stress tolerance. Numerous studies, particularly in humans, have been reported in which RNA from extreme phenotypes has been sequenced to identify alleles or genes with expression correlated with the trait [29][30][31]. This is one of the first RNA-Seq studies in forest trees that exploits phenotype extremes (low and high KPY/growth) to identify differentially expressed genes and alleles potentially affecting KPY and growth. We identified several genes, including some previously uncharacterized transcripts that are differentially expressed between extreme phenotypes. The main advantage of an RNA-Seq experiment is that in addition to identifying candidate genes, polymorphisms potentially influencing the traits can also be obtained from the same data set. Accordingly, we identified a number of polymorphisms and some of these are potential functional polymorphisms that showed DAE. These variants, particularly the functional variants, can be targeted for application in many downstream analyses including association studies and genomic selection. In addition, we identified putative signatures of positive selection in several genes in this study. Comparison of results from two different trials facilitated the identification genes and SNPs that are consistently differentially expressed across environments.

Cell wall-related genes down-regulated in low KPY samples
We identified more than 6000 (23%) and 7000 (30%) genes that are differentially expressed between low and high KPY samples in the Meunna and Florentine trials, respectively. In spite of different site conditions and time of sample collection, around 4000 genes showed consistent patterns of gene expression across both the trials, suggesting the expression of these genes is relatively stable in different environments. About 2500 genes were down-regulated in low KPY (growth) samples and most of them are related to cell wall biosynthesis and growth. These biologically relevant genes are good candidate genes for KPY and growth and other related wood traits. Genes down-regulated in low KPY (up-regulated in high KPY) samples include cell wall-related genes (cellulose synthases, PAL, SAMS, laccases, cinnamate-4-hydroxylase, COBRA-like protein, FASCICLIN-like arabinogalactan proteins, expansions, pectin-lyase like, plant invertase/pectin methylesterase inhibitor superfamily), glycosyl related genes (glycosyl hydrolase, UDP-Glycosyltransferase superfamily protein), and transcription factors (NAC, MYB). Genes that are down-regulated in low KPY samples in our study have also been found to be preferentially expressed in xylem tissues in several other studies. This includes microarraybased studies in tree species which compared different tissue types such as xylem vs phloem [32], shoot apical meristem vs mature xylem [33] and leaves vs xylem [34].
All of the genes that are up-regulated in low KPY (growth) samples belong to categories such as biotic and abiotic stress response, defense response and apoptosis. Since low KPY trees were generally smaller, this suggests that these trees experienced environmental stress, most likely due to competition effects in the trials. A transcriptome study in Arabidopsis thaliana revealed intra-specific competition resulted in activation of genes related to biotic and abiotic stresses [35]. Slow growing trees have been observed to have lower KPY in other tree species including E. globulus [36] and Populus tremuloides [37]. In a study involving E. globulus and E. nitens trees, Downes et al. [38] showed that irrigated trees had higher KPY compared to trees grown in rain-fed conditions. This suggests that trees with lower growth due to environmental factors, particularly water availability, are directing proportionally less carbon into cellulose.

Prevalence of cis-acting polymorphisms
Thirty percent of SNPs with DAE (640) occurred in 313 genes that had DGE between high and low KPY trees. It is likely that some of these SNPs may be cis-acting regulatory variants controlling the expression of the gene in which they occur. Because there are more than one SNP from a gene in many instances, some of the SNPs in some genes will be in high linkage disequilibrium with the true cis-acting SNP. The remaining 1463 SNPs showed DAE but no DGE. Some of these variants may be trans-acting variants or coding variants in transcription factors that affect their binding affinities to target genes [39]. Cis-acting variants that are present within genes influence traits through their effects on gene expression while trans-acting variants affect transcript levels in target genes by interacting with cis-regulatory sequences [40]. While studying regulatory pathways that affect  Table 4. Differential allelic expression between low and high KPY samples in two populations. hematopoietic stem cell function using recombinant inbred mouse stains, Bystrykh et al. [41] showed strong association of the controlling locus with mRNA expression levels for cis-acting QTLs. In a similar study, investigating two tissues of rat recombinant inbred lines important to pathogenesis of the metabolic syndrome, Hubner et al. [42] observed 85-100% of eQTLs were regulated in cis in both the tissues. Trans-acting polymorphisms are difficult to identify compared to cis-acting polymorphisms for two reasons [43]. Unlike cis variants, trans variants can be anywhere in the genome relative to the target gene. Also, the effects of trans variants on gene expression are generally smaller than the effects produced by cis variants. In this study, most of the genes (95%) that showed both DGE and DAE showed down regulation in low KPY samples at the gene level. That is, most of the cell wall-related genes had both differential total gene expression and differential allelic expression suggesting that these variants which showed DAE may be the cisacting variants influencing gene expression. However, most of the growth and stress responsive genes had only differential total gene expression possibly controlled by trans-acting polymorphisms.

Synonymous SNPs are not always ''silent''
There was a greater tendency for synonymous, rather than nonsynonymous, SNPs to be associated with genes that exhibited DAE and both DAE and DGE (see Fig. 3). This is in line with the expectation that nonsynonymous SNPs are more likely to affect phenotype by altering the amino acid structure, while synonymous SNPs are more likely to influence the trait through their effects on gene expression [6,44]. Synonymous SNPs can affect RNA secondary structure and cause allelic imbalance that could alter the expression of a gene. For example, a synonymous SNP in the corneodesmosin gene induced allele-specific gene expression and led to increased mRNA stability in a psoriasis study across diverse ethnic groups [45]. A synonymous SNP in EniCOBL4A gene was associated with cellulose content by affecting allelic expression [44]. In addition to this, synonymous SNPs can also affect protein expression at the post-transcriptional level [46]. These results suggest that synonymous and other silent polymorphisms are also important in affecting the phenotype and focussing only on nonsynonymous SNPs in molecular studies will result in many functional variants being overlooked.

Detection of signatures of positive selection at apoptosis and defense related genes
Higher Ka/Ks ratios could be due to lower constraints on nonsynonymous mutations in some genes, or through enrichment of nonsynonymous mutations by positive selection [47]. As observed in many studies, most of the genes in this study were under purifying selection based on low Ka/Ks ratios. However, 196 genes (0.9% of total genes) showed signatures of positive selection by having Ka/Ks ratios greater than 1.5. Interestingly, based on GO enrichment analysis, all the gene categories are related to apoptosis and defense response. In an Eucalyptus camaldulensis transcriptomics study only 2% of the genes showed signatures of positive selection and most of them are related to apoptosis and cell death [7]. These consistent results across two eucalypt species suggest that apoptosis and stress-related genes are more rapidly evolving. Apoptosis, a process of programmed cell death, is important for plant development and defense [48]. Similar results were also found in other studies. In rice, an overrepresentation of genes involved in defense response and apopstosis in eQTLs were observed [49]. Also, a study comparing the genomes of humans and chimpanzees to identify positively selected genes [50] reported an enrichment of immunity, defense and apoptosis related genes among the positively selected genes. Similarly, in fish, genes related to immune response and defense response were overrepresented in the positively selected gene list [51]. This rapid evolution of apoptosis genes could be due to the following reasons. First, many apoptosis genes may be newly evolved genes and thus still evolving rapidly under the action of natural selection. Second, because apoptosis related genes are involved in immune and defense response, these genes are rapidly evolving to adapt to new pathogens [52] as shown in the following examples. Bishop et al. [53] showed an excess of nonsynonymous compared to synonymous rates in plant class I chitinase in the genus Arabis. Plant chitinases confer resistance to diseases by degrading chitin, a component of fungal cell walls. Likewise, in wheat, signatures of diversifying selection were observed at the Pm3 locus, which confers resistance to wheat powdery mildew, through an excess of nonsynonymous to synonymous nucleotide divergence [54]. The genes showing signatures of positive selection in this study could be valuable targets for selecting candidate SNPs for growth and survival traits in a range of Eucalyptus species as consistent results were obtained across two Eucalyptus species. However, results from this study need to be treated cautiously as pooled samples are used for detecting the positive selection signatures. These results need to be verified by sequencing of individual samples.

Conclusions
By conducting RNA-Seq analysis in two trials we identified a number of candidate genes and alleles whose expression is correlated with KPY and growth traits in E. nitens. Most of the down-regulated genes in low KPY samples are cell wall-related genes, suggesting that the identified candidate genes are biologically relevant. A number of potential functional polymorphisms were also identified that showed DAE. We detected positive selection signatures in numerous genes that are consistent with the results from RNA-Seq study in E. camaldulensis. The genes and alleles identified in this study form a valuable resource for association and genomic selection studies.