Pinus massoniana Introgression Hybrids Display Differential Expression of Reproductive Genes

Pinus massoniana and P. hwangshanensis are two conifer species located in southern China, which are of both economic and ornamental value. Around the middle and lower reaches of the Yangtze River, P. massoniana occurs mainly at altitudes below 700 m, while P. hwangshanensis can be found above 900 m. At altitudes where the distribution of both pines overlaps, a natural introgression hybrid exists, which we will further refer to as the Z pine. This pine has a morphological character that shares attributes of both P. massoniana and P. hwangshanensis. However, compared to the other two pines, its reproductive structure, the pinecone, has an ultra-low ripening rate with seeds that germinate poorly. In this study, we aimed to find the reason for the impaired cone maturation by comparing transcriptome libraries of P. massoniana and Z pine cones at seven successive growth stages. After sequencing and assembly, we obtained unigenes and then annotated them against NCBI’s non-redundant nucleotide and protein sequences, Swiss-Prot, Clusters of Orthologous Groups, Gene Ontology and KEGG Orthology databases. Gene expression levels were estimated and differentially expressed genes (DEGs) of the two pines were mined and analyzed. We found that several of them indeed relate to reproductive process. At every growth stage, these genes are expressed at a higher level in P. massoniana than in the Z pine. These data provide insight into understanding which molecular mechanisms are altered between P. massoniana and the Z pine that might cause changes in the reproductive process.


Introduction
Gymnosperms have their own unique way of reproduction. Microspores grow into pollen carrying sperm cells, while megaspores develop into the megagametophyte. The archegonium forms during development of the megagametophyte. Then, pollen enter ovule through micropyle, they move towards the egg by way of extending a pollen tube, after which fertilization occurs. Finally, the embryo is formed and develops into a gymnosperm seed.
Conifers possess a series of properties that makes exploring their molecular biology through a genomics approach challenging, such as a long life cycle, a reproductive process lasting months or even years, a gigantic genome size and so forth [1,2]. We aimed to explore the molecular mechanism of conifer reproduction by generating transcriptome data through RNA-seq, of successive stages of the developing pinecone. RNA-seq technology has advanced to the stage where it is highly efficient, sensitive and accurate.
By studying the expression dynamics of differentially expressed genes (DEGs) in pinecones, we sought to gain insight into the relevant genes that control the reproductive process of conifers. Previous empirical studies have suggested several genes linked to reproduction, such as DAL [3], MADS-box [4,5], MYB [6] and MSI [7] and so forth. However, studies of the determinants of development and regulation of reproduction have concentrated on model angiosperm species so far, while gymnosperms remain largely understudied.
We centered our studies around two types of conifers, naturally occurring in China. First, Pinus massoniana, also known as the masson pine, is an economically important conifer species. It is mainly distributed in various southern Chinese provinces, at altitudes below 700 m around the middle and lower reaches of the Yangtze River. It offers wood and pulp for manufacturing furniture and paper and also supplies natural resin, which can be further manufactured into 'resin,' a crucial product used in the maintenance of instrument strings and as an ingredient in medicine. Furthermore, P. massoniana fulfills a significant ecological role by replacing or compensating natural forest destruction due to its fast growth and abundant biomass.
The second, Pinus hwangshanensis, primarily grows in southeastern China. It grows most abundantly at an altitude above 900 m around the middle and lower reaches of the Yangtze River. Growing at a higher altitude limits its speed of growth, as well as accumulation of biomass and the resin compared to P. massoniana. P. hwangshanensis is often viewed as luxurious and graceful, making it an ideal ornamental tree.
Mountain Lushan (Figure 1a) is located within the distribution area of both conifer species. Its peak has an altitude of 1474 m and it supports both P. massoniana and P. hwangshanensis vegetation at the afore mentioned altitudes. Where P. massoniana and P. hwangshanensis distribution overlaps, a natural introgression hybrid of both species occurs, sharing phenotypic characters of both its parents ( Figure 1b) [8,9]. Due to the hybrid not being named yet, we refer to it as the 'Z pine' in this article. The Z pine has an extremely low germination and ripening rate compared to both P. massoniana and P. hwangshanensis [10]. These characters could indicate that the Z pine displays genetic incompatibilities during fertilization and/or even embryonic development. What causes this phenomenon is still unknown.
In this study, we collected seven successive development stages of open-pollinated cones P. massoniana and the Z pine, respectively. Then we characterized the transcriptome of these cones using Illumina high-throughput sequencing technology and forty two cDNA libraries were constructed. A series of experiments was performed to mine candidate genes, focusing on differential expression patterns between these two species. Moreover, differentially expressed genes related to fertilization and embryonic development were determined and analyzed in both taxa. This study could help explaining the defect of the Z pine of its unusual low ripening rate and germination rate, comparing to P. massoniana and may provide an approach to understanding difference between species and its introgressive hybrid at the transcriptome level.

Sample Collection
Differently staged, openly pollinated cones of P. massoniana and the Z pine were collected on Mt. Lushan, Jiujiang, China (Table 1, Figure 2). Due to complex environment of forest land and wind-

Sample Collection
Differently staged, openly pollinated cones of P. massoniana and the Z pine were collected on Mt. Lushan, Jiujiang, China (Table 1, Figure 2). Due to complex environment of forest land and wind-pollinated way of pine, sampled individuals (especially the Z pine) may possess different level of introgressive background. In order to minimize the different between them, we assigned five maternal trees of each taxa (P. massoniana or the Z pine) in its sample plot. Experience tells us that P. massoniana carries out its pollination at around 10 April while the Z pine at around 20 April at sample plots. Aware of this condition, we conducted our first sample collection on 27 April (Table 1), when both P. massoniana and the Z pine are already pollinated. The cones were packed with aluminum-foil shortly after collection and then immediately submerged in liquid nitrogen, after which they were stored in a −80 • C freezer until RNA extraction. pollinated way of pine, sampled individuals (especially the Z pine) may possess different level of introgressive background. In order to minimize the different between them, we assigned five maternal trees of each taxa (P. massoniana or the Z pine) in its sample plot. Experience tells us that P. massoniana carries out its pollination at around 10 April while the Z pine at around 20 April at sample plots. Aware of this condition, we conducted our first sample collection on 27 April (Table 1), when both P. massoniana and the Z pine are already pollinated. The cones were packed with aluminum-foil shortly after collection and then immediately submerged in liquid nitrogen, after which they were stored in a -80°C freezer until RNA extraction.   Table 1. (Scale = 10 mm.).

RNA Extraction and Sequencing
We randomly collected three to six cones for RNA isolation of each sample code to make sure that these cones and their RNA could be representative. Cones were taken from the -80°C freezer and  Table 1. (Scale = 10 mm.).

RNA Extraction and Sequencing
We randomly collected three to six cones for RNA isolation of each sample code to make sure that these cones and their RNA could be representative. Cones were taken from the −80 • C freezer and briefly re-frozen in liquid nitrogen to further weaken tissue, cut into pieces, after which sections containing ovules (or seeds) were collected and crushed. RNA was extracted from each sample using the Bioteke Plant RNA Extraction Kit (Beijing, China). Three replications of RNA were extracted for each sample code. Purity and quality of the RNA samples was checked respectively by measuring 260 nm/280 nm UV absorption values with a Nanodrop 2000 (Thermo Fisher Scientific, Waltham, MA, After RNA extraction, magnetic oligo (dT) beads were used to purify mRNA, which was then collected using RNeasy RNA reagent. The mRNA was then cut into small fragments using the RNA Fragment Reagent (Illumina, San Diego, CA, USA) and subsequently cleaned using an RNeasy RNA Cleaning Kit (Qiagen, Germany). First-strand cDNA was then synthesized using MMLV reverse transcriptase (Takara, Japan), while second-strand cDNA synthesis was performed using DNA Polymerase I and RNase H. cDNA was finally sequenced on an Illumina Hiseq X Ten (Illumina, USA). The sequencing raw data were submitted to the NCBI Short Reads Archive (SRA) database under the BioProject accession number PRJNA482692.

Differentially Expressed Genes (DEGs) and Gene Expression Pattern Analysis
Calculation of unigene read counts was performed using RNA-Seq by Expectation-Maximization (RSEM) software [22]. RSEM results were transformed into FPKM [23] values (expected number of Fragments Per Kilobase of transcript sequence per Millions base pairs sequenced), commonly used for measuring gene expression levels. DESeq was used to determine differentially expressed genes of different transcript libraries [24]. Differentially expressed genes were assigned based on a threshold value of FDR (false discovery rate) ≤ 0.001 and|log 2 Ratio| ≥ 2. Gene expression patterns of P. massoniana and the Z pine were assembled by Short Time-series Expression Miner (STEM) [25].

Validation by Quantitative Real-Time PCR (qRT-PCR)
Quantitative real-time PCR was applied for validating differentially expressed genes detected by our RNA-seq analysis. Primers were designed using the NCBI Primer-Blast Tool [26] and synthesized by Generay Biotech Co., Ltd. (Shanghai, China). cDNA samples for qRT-PCR were synthesized using the Vazyme HiScript II Q RT SuperMix for qPCR (Nanjing, China). qRT-PCR was carried out using an Applied Biosystems 7500 PCR cycler (Thermo Fisher Scientific Corporation, CA, USA) and Vazyme ChamQ SYBR qPCR Master Mix (Nanjing, China) as reaction reagent kit. Each sample was run in triplicate, with samples having a final volume of 20 µL: containing 10 µL of ChamQ SYBR qPCR Master Mix (2×), 0.4 µL of each primer, 2 µL of cDNA and 7.2 µL of ddH 2 O. The reaction program was according to standard product instructions. An Actin gene that was discovered from RNA-seq data (Unigene69821_All) was utilized as reference gene. The qRT-PCR data was analyzed with the 2 −∆∆Ct method [27].

Illumina Sequencing and Assembly
A total of 160 Gb raw data was obtained. It has an average of 53 million reads per library. We evaluated the quality of the original and clean sequencing data of all samples. The Q30 value has a range of 88.49~92.33 % for the original sequencing data (Table S1) and a range of 91.75~94.31% for the clean data (Table S2), indicating that this data set is ready for further assembly. All transcripts that we obtained from the staged pine cones of our two conifer species were assembled into 93,291 unigenes (see Materials and methods for details, Table 2). About 39.88% of them exceeded 2 kb in length, while 37.02% unigenes have a length from 1 kb to 2 kp and 23.1% were 100 bp to 1 kb ( Figure S1). The average length of unigenes is 1987 nt, while N50 is 2494 nt. 1,2 When all samples are assembled, they would express much higher abundance than single samples, therefore the data of 'All' is usually higher than others.

Functional Annotation of P. massoniana and Z Pine Unigenes
The assembled unigenes were annotated against the SwissProt, NCBI non-redundant protein and nucleotide sequences (Nr and Nt), Kyoto Encyclopedia of Genes and Genomes (KEGG), Cluster of Orthologous Groups of proteins (COG) and Gene Ontology (GO) databases. A total of 86,006 unigenes were annotated in P. massoniana and the Z pine, of which 25,150 unigenes were annotated against all six databases ( Figure S2).
We were able to annotate most genes using the Nr database, using sequences from Picea sitchensis for the bulk of the annotation (30,943), after which Amborella trichopoda (7055) and Indian lotus (4504) provided most annotations, suggesting that these sequenced species are most closely related to P. massoniana and the Z pine ( Figure S3).
We then used data from the COG, GO and KEGG databases for unigene functional prediction. Using the COG database, 30,017 unigenes could be annotated and were classified into 24 functional categories. The 'general function prediction only' was the most abundant, followed by 'transcription' and 'replication, recombination and repair,' 'function unknown' and 'signal transduction mechanisms' (Figure 3).
A total of 229,950 redundant unigenes (with 38,619 nonredundant unigenes) were annotated into 56 sub-categories under three primary GO categories: biological process, cellular component and molecular function (Figure 4). The top three sub-categories were metabolic process (23,854 unigenes), cellular process (22,439 unigenes) and catalytic activity (20,324 unigenes).

Analysis of Expected Number of Fragments per Kilobase of Transcript Sequence per Million Base Pairs Sequenced (FPKM)
FPKM values were calculated using RSEM software. The general density distribution of expression quantity (Figure 6) was analyzed and showed that the average total of expressed mRNAs across all unigenes of P. massoniana and the Z pine varies between species and stages.

GO Classification and KEGG Enrichment Assessment of Differentially Expressed Genes (DEGs) at Successive Pinecone Stages
We determined and compared the number of up-and down-regulated genes between the two pine species at the seven different developmental stages of the pine cones collected (Figure 7 and Figure S4). In comparison group A, C and G, more down-regulated genes were found than upregulated ones, while in group B, D, E and F, there was more up-regulated genes than downregulated ones (Figure 7).
To gain more insight in differential regulation of genes related to the pinecone reproductive process, we performed GO classification of all DEGs at every developmental stage ( Figure S5). The

Analysis of Expected Number of Fragments per Kilobase of Transcript Sequence per Million Base Pairs Sequenced (FPKM)
FPKM values were calculated using RSEM software. The general density distribution of expression quantity (Figure 6) was analyzed and showed that the average total of expressed mRNAs across all unigenes of P. massoniana and the Z pine varies between species and stages. We then determined whether specific cellular processes are differentially affected at each pinecone stage by performing a KEGG pathway enrichment analysis ( Figure S7).

GO Classification and KEGG Enrichment Assessment of Differentially Expressed Genes (DEGs) at Successive Pinecone Stages
We determined and compared the number of up-and down-regulated genes between the two pine species at the seven different developmental stages of the pine cones collected (Figure 7 and Figure S4). In comparison group A, C and G, more down-regulated genes were found than up-regulated ones, while in group B, D, E and F, there was more up-regulated genes than down-regulated ones (Figure 7). To gain more insight in differential regulation of genes related to the pinecone reproductive process, we performed GO classification of all DEGs at every developmental stage ( Figure S5). The number of differentially expressed unigenes classified in the 'reproduction' category under 'biological process' at each stage is listed ( Figure S6). Three groups displayed more up-than down-regulated genes: ZA-MA, ZD-MD and ZF-MF. While the other four groups showed more down-than up-regulated genes: ZB-MB, ZC-MC, ZE-ME and ZG-MG. The ZG-MG group contains 101 down-regulated genes, almost double the number of up-regulated genes (51 genes).
We then determined whether specific cellular processes are differentially affected at each pinecone stage by performing a KEGG pathway enrichment analysis ( Figure S7).

Quantitative Real-Time PCR Validation
We randomly selected five unigenes for validation of the accuracy of our RNA-seq data set using qRT-PCR. The following unigenes were randomly chosen: Unigene12135_All, Unigene31229_All, Unigene5965_All, Unigene69986_All and Unigene71003_All. We tested cDNA derived from four samples: MA, ME, ZA, ZE, which were collected in different years from the two different species. Details of sequences and primers were list on Tables S3 and S4, respectively. Validation results shows a reliable correlation between RNA-seq and qRT-PCR ( Figure S8).

Temporal Gene Expression Profiles of P. massoniana and the Z Pine
We analyzed gene expression dynamics of all unigenes across pinecone developmental stages for both species and clustered these into 49 unique expression profiles. The eighteen most frequently occurring profiles for each species are shown in Figure 8a (Figure 8a). In addition, we carried out a profile comparison between P. massoniana and the Z pine. Every single profile of P. massoniana is listed and similar ones of Z pine are placed on the right of it in Figure 8b.
occurring profiles for each species are shown in Figure 8a. Within P. massoniana, the top five expression profile types are 16, 10, 31, 34 and 40, with respectively 8,416, 7,967, 5,532, 3,291 and 3,092 genes showing these expression dynamics. In the Z pine the most frequently occurring profiles are profile types 40, 10, 27, 29 and 44, represented by 5,610, 4,336, 3,441, 3,371 and 3,361 genes. P. massoniana and the Z pine shared fourteen profile types among their respective top 18 profiles ( Figure  8a). In addition, we carried out a profile comparison between P. massoniana and the Z pine. Every single profile of P. massoniana is listed and similar ones of Z pine are placed on the right of it in Figure  8b.

Reproductive Genes Are Differentially Expressed between P. massoniana and the Z Pine
Next, we aimed to see whether genes related to reproduction might be differentially expressed between P. massoniana and the Z pine, potentially explaining the reproductive problems that the Z pine experiences. We looked for several DEGs involved in processes such as pollen development, pollen exine formation, pollen tube growth and development of the female gametophyte, endosperm, embryo and/or embryo sac according to recent reports (Table S5). Some of these genes show consistently higher expression in P. massoniana than in the Z pine, including: ACA7 (Ca 2+ -ATPase),

Reproductive Genes Are Differentially Expressed between P. massoniana and the Z Pine
Next, we aimed to see whether genes related to reproduction might be differentially expressed between P. massoniana and the Z pine, potentially explaining the reproductive problems that the Z pine experiences. We looked for several DEGs involved in processes such as pollen development, pollen exine formation, pollen tube growth and development of the female gametophyte, endosperm, embryo and/or embryo sac according to recent reports (Table S5). Some of these genes show consistently higher expression in P. massoniana than in the Z pine, including: ACA7 (Ca 2+ -ATPase), MPK4 (mitogen-activated protein kinase), QRT2 (polygalacturonase), TKPR1 (tetraketide alpha-pyrone reductase 1), PI5K (phosphatidylinositol 4-phosphate 5-kinase), PMEs (pectin methylesterase),  pollen grain separation and is also involved in pollen development [30]. TKPR1 takes part in a biosynthetic pathway leading to hydroxylated α-pyrone compounds [31]. SHT encodes an acyltransferase that conjugates spermidine to hydroxycinnamic acids, impacting the composition of the Arabidopsis pollen wall [32,33]. NPG1 in Arabidopsis is specifically required for pollen germination [34] and not for pollen development [35]. A type B PI5K mediates Arabidopsis and Nicotiana pollen tube growth by regulating apical pectin secretion [36]. PMEs and its pro-region adjust cell wall dynamics of growing pollen tubes in Nicotiana tabacum [37]. The exocyst contributes to the morphogenesis of polarized cells in many eukaryotes, for example, SEC8 facilitates the initiation and maintenance of polarized growth of pollen tubes [38]. SWK2 has an essential role in the coordinated mitotic progression of the female gametophyte in Arabidopsis [39]. Absence of CRINKLY4 could cause an inhibition of aleurone, which is in charge of differentiation normal progression over the endosperm surface development [40]. PPR is required for embryo and seed viability in Arabidopsis, its absence leading to embryo abortion [41,42]. EYE controls golgi-localized proteins, that have an important role in cell and organ expansion [43]. EDD1 encodes plastid and mitochondria, functional absence mutation of EDD1 causes embryo lethality [44]. LEA and SERK play key roles during embryogenesis and SERK is essential for embryogenic competence [45,46]. Misexpression of BLH1 leads to a cell-fate switch of synergid to egg cell in the Arabidopsis eostre mutant embryo sac [47].

Gymnosperm Gene Annotation Using Transcriptome Data
Transcriptome analysis based on RNA sequencing is an effective way to explore the huge genomes of plants like gymnosperms. Several RNA sequencing studies related to gymnosperms have previously been reported [48][49][50][51]. Yet until now few studies have focused on the development of Pinus reproductive organs. In Pinus tabuliformis, unusual bisexual cones were found; here, the gene expression pattern of MADS-box transcription factors, FT/TFL1-like and LFY/NDLY genes was compared between unisexual and bisexual cones [52]. In Pinus bungeana, 39.62 Gb of RNA sequencing data was analyzed from two kinds of sexual cones, obtaining 85,305 unigenes, 53,944 (63.23%) of which were annotated in public databases [53].
In this study, we collected a total of 160 Gb of RNA sequence data from P. massoniana and its introgression hybrid at seven different stages of pinecone development. N50 is a key parameter in genome or transcriptome assembly. It is defined as the sequence length of the shortest contig at 50% of the entire genome or transcriptome length. In principle, the higher the N50 value, the better the sequencing quality. We obtained an N50 of 2,494 bp in all-unigene, compared to previously obtained values of (N50 = 551 bp) for Picea abies [48] and (N50 = 1,942 bp) for P. bungeana [53], which means the quality of our sequencing data improves on previously available data.
A total of 30,943 genes (47.05%, rank 1) were annotated to Picea sitchensis through the Nr database, with further annotations being 1,174 (1.79%, rank 7) to Pinus tabuliformis, 1,074 (1.63%, rank 8) to Pinus taeda, 697 (1.06%, rank 12) to Pinus monticola, 401 (0.61%, rank 18) to Pinus radiata and 376 (0.57%, rank 19) to Picea abies; all these species are conifers and belong to the Pinaceae family. Out of these, two (Pinus taeda and Picea abies) had their genomes sequenced [1,54]. The genome sizes of Pinus taeda and Picea abies are 21.6 Gb and 19.7 Gb, respectively. Pines have an estimated genome size ACA7 belongs to the auto-regulated Ca 2+ -ATPase family, which is exclusively detected in developing flowers of Arabidopsis and participates in the regulation of Ca 2+ homeostasis [28]. MPK4 plays an important role in plant growth, development and male fertility [29]. QRT2 is necessary for pollen grain separation and is also involved in pollen development [30]. TKPR1 takes part in a biosynthetic pathway leading to hydroxylated α-pyrone compounds [31]. SHT encodes an acyltransferase that conjugates spermidine to hydroxycinnamic acids, impacting the composition of the Arabidopsis pollen wall [32,33]. NPG1 in Arabidopsis is specifically required for pollen germination [34] and not for pollen development [35]. A type B PI5K mediates Arabidopsis and Nicotiana pollen tube growth by regulating apical pectin secretion [36]. PMEs and its pro-region adjust cell wall dynamics of growing pollen tubes in Nicotiana tabacum [37]. The exocyst contributes to the morphogenesis of polarized cells in many eukaryotes, for example, SEC8 facilitates the initiation and maintenance of polarized growth of pollen tubes [38]. SWK2 has an essential role in the coordinated mitotic progression of the female gametophyte in Arabidopsis [39]. Absence of CRINKLY4 could cause an inhibition of aleurone, which is in charge of differentiation normal progression over the endosperm surface development [40]. PPR is required for embryo and seed viability in Arabidopsis, its absence leading to embryo abortion [41,42]. EYE controls golgi-localized proteins, that have an important role in cell and organ expansion [43]. EDD1 encodes plastid and mitochondria, functional absence mutation of EDD1 causes embryo lethality [44]. LEA and SERK play key roles during embryogenesis and SERK is essential for embryogenic competence [45,46]. Misexpression of BLH1 leads to a cell-fate switch of synergid to egg cell in the Arabidopsis eostre mutant embryo sac [47].

Gymnosperm Gene Annotation Using Transcriptome Data
Transcriptome analysis based on RNA sequencing is an effective way to explore the huge genomes of plants like gymnosperms. Several RNA sequencing studies related to gymnosperms have previously been reported [48][49][50][51]. Yet until now few studies have focused on the development of Pinus reproductive organs. In Pinus tabuliformis, unusual bisexual cones were found; here, the gene expression pattern of MADS-box transcription factors, FT/TFL1-like and LFY/NDLY genes was compared between unisexual and bisexual cones [52]. In Pinus bungeana, 39.62 Gb of RNA sequencing data was analyzed from two kinds of sexual cones, obtaining 85,305 unigenes, 53,944 (63.23%) of which were annotated in public databases [53].
In this study, we collected a total of 160 Gb of RNA sequence data from P. massoniana and its introgression hybrid at seven different stages of pinecone development. N50 is a key parameter in genome or transcriptome assembly. It is defined as the sequence length of the shortest contig at 50% of the entire genome or transcriptome length. In principle, the higher the N50 value, the better the sequencing quality. We obtained an N50 of 2494 bp in all-unigene, compared to previously obtained values of (N50 = 551 bp) for Picea abies [48] and (N50 = 1942 bp) for P. bungeana [53], which means the quality of our sequencing data improves on previously available data.
A total of 30,943 genes (47.05%, rank 1) were annotated to Picea sitchensis through the Nr database, with further annotations being 1174 (1.79%, rank 7) to Pinus tabuliformis, 1074 (1.63%, rank 8) to Pinus taeda, 697 (1.06%, rank 12) to Pinus monticola, 401 (0.61%, rank 18) to Pinus radiata and 376 (0.57%, rank 19) to Picea abies; all these species are conifers and belong to the Pinaceae family. Out of these, two (Pinus taeda and Picea abies) had their genomes sequenced [1,54]. The genome sizes of Pinus taeda and Picea abies are 21.6 Gb and 19.7 Gb, respectively. Pines have an estimated genome size ranging from 18 Gb to 40 Gb [55][56][57]. This indicated that a lot of novel genes in P. massoniana and the Z pine still completely unknown and expect to discover more in the future.

Impact of Introgression in Expression Levels
Introgressive hybridization implies repeated backcrossing of hybrids with parental species [58]. Hybridization between pines exists frequently in nature [59,60]. As the pollen of pines is mainly moved by wind, it could spread to a vast area. In that case, the element consisting of individuals could be with various proportions of parental genomes. Therefore, those differences between individuals could lead to diversity on gene expression, particularly in genes relate to reproduction. In sample collection, we conducted mixture of cones for each sample code and also made a mixture apply to RNA isolation to reduce the possible expression bias through analysis process.

Differential Expression of Reproductive Genes Could Relate to Delayed Maturation of Z Pinecones
Within the Pinus genus, some female cones take 1.5 to 3 years to mature after pollination, while for P. massoniana and the Z pine specifically, it takes around 1.5 years to do that. Around the middle and lower reaches of the Yangtze River, these two pines are often pollinated in April and mature cones emerge in November of the next year, a long time compared to most angiosperms. The structure of a pinecone is rather complex compared to an angiosperm flower. Therefore, more genes relate to reproduction may exist in cone than in flower and more pathways of these genes may occur in this process also.
Genes directly related to reproduction in gymnosperms have only rarely been reported. One of them is MADS genes, which are well studied relatively, for example in Gnetum spp. [61,62], Ginkgo biloba [63], Picea abies [64] and Cryptomeria japonica [4]. LEAFY is also a crucial kind of gene that involve in reproductive process in Welwitschia mirabilis [65] and Pinus caribaea var. Caribaea [66], as well as NEEDLY in Pinus radiata [67]. We collected several such genes from model plants (e.g., Arabidopsis thaliana) and analyzed their expression level in the two pines. We found that ACA7, MPK4, QRT2, TKPR1, PI5K, PMEs, SEC6, SEC15, SWK2, PPR, EMB, LEA, SERK, BLH1 showed a higher expression level in P. massoniana than in the Z pine. This result indicates that the Z pine may have a lowered expression level of genes related to pollen development, pollen exine formation, pollen tube growth and female gametophyte, embryo and/or embryo sac development, compared to P. massoniana. This outcome provides further understanding towards a possible molecular mechanism responsible for the altered reproduction process of the Z pine in comparison to P. massoniana.

Conclusions
P. massoniana and P. hwangshanensis mainly grow in southern China and produce an introgression hybrid, which we here temporarily named 'the Z pine,' on Mt. Lushan, where both species can be found. This Z pine has morphological characters derived from both parent species, yet has an ultra-low germination and ripening rate. In order to understand the molecular mechanism that might be causing this delayed reproduction, we collected cones from P. massoniana and the Z pine of seven successive developmental stages and determined their transcriptome. Herein we might discover differentially expressed genes underlying the observed reproductive delay. We obtained 93,291 unigenes with an average length size of 1987 bp and 2494 bp of N50. We identified significantly differentially expressed genes (DEGs) in all seven cone growth stages. We screened for DEGs related to reproduction, such as pollen tube growth, development of the female gametophyte and embryo and so forth. Several potentially vital genes were identified and the expression levels of the two pines were compared and analyzed. These results may offer insight into the molecular mechanisms of reproductive process between the two pines and several other plants that with similar differential mode.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4907/10/3/230/s1. Table S1: Quality of original sequencing data of P. massoniana and the Z pine; Table S2: Quality of clean data of P. massoniana and the Z pine; Table S3: Sequences of unigenes for qRT-PCR validation; Table S4: Primers for qRT-PCR; Table S5. Genes relate to reproduction process in P. massoniana and the Z pine; Figure S1: Length distribution of all unigene sequences; Figure S2: Venn diagram of annotation against to five databases: NCBI Nr, NCBI Nt, SwissProt, COG and KEGG for all unigenes; Figure S3. The species distribution of P. massoniana and the Z pine unigenes against the NCBI Nr database; Figure S4: Differentially expressed genes analysis between P. massoniana and the Z pine of each stage; Figure S5: Gene ontology (GO) functional classification of Z pine versus P. massoniana DEGs in seven stages; Figure S6: DEGs relate to reproduction in GO classification of Z pine versus P. massoniana in seven stages; Figure S7: KEGG pathway enrichment analysis of Z pine versus P. massoniana in seven stages; Figure S8: Comparison of unigene expression results between RNA sequencing (FPKM) and qRT-PCR.