Recurrent single-gene duplication drives the expansion and expression diversification of the ADH gene family in pear and other Rosaceae species

Background: Alcohol dehydrogenases (ADHs) are essential to plant growth and to the formation of aromatic compounds in fruits. However, the evolutionary history and expression characteristics of ADH genes remain largely unclear in Chinese white pear (Pyrus bretschneideri) and other fruit species from the family Rosaceae.

Results: In this study, 464 ADH genes were identified in eight Rosaceae fruit species, 68 of which were from pear. Based on analyses of phylogeny and conserved motifs, the pear ADH genes were classified into four subgroups (I, II, III, and IV). The chromosomal distribution of the genes was uneven, and numerous clusters of physically linked ADH genes were detected. Frequent single-gene duplication events contributed to the formation of ADH gene clusters and the expansion of the ADH gene family in these eight Rosaceae species. Purifying selection was the major force in ADH gene evolution. The younger genes derived from tandem and proximal duplications evolved faster than those derived from other types of duplication. RNA-sequencing and quantitative real-time PCR analyses revealed that the expression levels of three ADH genes were closely correlated with the content of aromatic compounds during fruit development.

Conclusion: Comprehensive analyses were conducted in eight Rosaceae species and 464 ADH genes were identified. The results of this study provide new insights into the evolution and expression characteristics of ADH family genes in pear and other Rosaceae species.

nonan-3-ol, nonan-3-ol, 3-hexanol, 3,4-diethyl-3-hexanol, and heptanol) during pear fruit development and performed correlation analysis between the expression profiles of ADH genes and content changes in C6-C9 alcohols during the four developmental stages of fruit using the SPSS Statistics tool, with the significance threshold set at P < 0.05.
Stages S1-S4 denote four stages of fruit development: S1 = 15 days after flowering (DAF), S2 = 45 DAF, S3 = 90 DAF, and S4 = 120 DAF.

Pear is one of the most widely grown commercial fruits in the global market and is cultivated in all temperate regions worldwide. The content of aromatic compounds is important to the flavor quality of pear and, hence, improving the aromatic content is a direct method of improving flavor. Fruits often contain more than 1,000 volatile compounds, and the main aromatic components in pear include esters, alcohols, aldehydes, ketones, lactones, and terpenoids [27-30].
Alcohols and aldehydes are primary components of fruit aromas, so these compounds are vital to fruit quality traits. The production of aromatic compounds has been investigated superficially in grape, apple, tomato, apricot, and peach [19, 26, 31-35], but few genome-wide annotation and evolutionary studies of ADH genes have been performed in pear or other Rosaceae fruit species. At present, the genome sequences of eight Rosaceae fruit species have been released: Chinese white pear (Pyrus bretschneideri Rehd.) [36], apple (Malus domestica) [37], peach (Prunus persica) [38], sweet cherry (Prunus avium) [39], black raspberry (Rubus occidentalis) [40], strawberry (Fragaria vesca) [41,42], Japanese apricot (Prunus mume) [43], and European pear (Pyrus communis) [44,45]. These genomic resources lay a foundation for comparative analyses of the ADH gene family among different Rosaceae fruit species. In this study, we identified members of the ADH gene family in pear and the seven other Rosaceae fruit species listed above, and unraveled the evolutionary history of ADH family genes based on comprehensive analyses of phylogeny, conserved domains, selective pressures, syntenic relationships, and gene duplication events. Moreover, we investigated the expression patterns of ADH family genes based on transcriptome data from different pear tissues and quantitative real-time PCR (qRT-PCR) analysis. Several candidate genes closely associated with alcohol and aldehyde biosynthesis were identified by correlating changes in alcohol and aldehyde content with gene expression profiles. The results of this study provide insights into the evolution and functional roles of the ADH gene family.

Results
Non-random chromosomal distribution of ADH genes in pear and seven other Rosaceae species
A Hidden Markov Model (HMM) was used to identify ADH family genes in pear and seven other Rosaceae fruit species. A total of 464 ADH genes were identified, of which 68 were identified in Chinese white pear, 68 in European pear, 82 in apple, 68 in peach, 37 in strawberry, 61 in Japanese apricot, 37 in black raspberry, and 43 in sweet cherry (Fig. 1). Lineage-specific whole-genome duplication (WGD) was found to have occurred in the ancestor of Chinese white pear, European pear, and apple, which may have resulted in the higher number of ADH genes in these three species than in strawberry and black raspberry (Fig. 1). Indeed, we found more ADH genes located in syntenic blocks between apple (11 syntenic pairs) and pear (seven syntenic pairs). However, the number of ADH genes in peach and Japanese apricot is similar to that in apple and pear, although peach and Japanese apricot have not experienced a recent WGD.
To determine the mechanism for the expansion of ADH genes in peach and Japanese apricot in the absence of recent genome duplication, we investigated the chromosomal distribution of ADH genes in each species. The ADH genes are unevenly distributed across chromosomes, and clusters of homologous genes were frequently observed in each investigated species (Fig. 2). In peach, we found several clusters of homologous ADH genes on Chr3, Chr6, and Chr8, with cluster sizes ranging from 3 to 7. A total of 22 genes were identified on Chr1, which is the highest number of ADH genes on any of the 17 chromosomes of apple. In Japanese apricot, eight ADH gene clusters were located on Chr1, Chr2, Chr4, and Chr6 and, as in peach, the size of the gene clusters ranged from 3 to 7. It is worth noting that a strong syntenic relationship was found between two gene clusters located on Chr1 and Chr7 in pear and apple. The proliferation of gene clusters may account for the expansion of the ADH gene family in Rosaceae species.
Single-gene duplication largely contributed to the expansion of the ADH gene family in Rosaceae species.
In addition to WGD, single-gene duplication events including tandem, proximal, transposed, and dispersed duplications also played important roles in the formation of local gene clusters and gene family expansion. We performed genome-wide identification of different modes of gene duplication in each of the eight Rosaceae species. To infer the evolutionary origins of ADH family genes, we searched for different types of duplicated gene pairs that contained ADH genes and classified them into five modes of gene duplication (Fig. 3, Table S2). We found that 90.5% of ADH genes were derived from single-gene duplications in pear, while 86.0% were derived in this way in apple, 98.9% in peach, 96.8% in strawberry, 98.3% in Japanese apricot, 97.7% in sweet cherry, 97.3% in black raspberry, and 96.3% in European pear. In contrast, only 1.1-14.0% of ADH genes were derived from WGD in the eight Rosaceae species and a relatively higher proportion of WGD-derived ADH genes were found in pear (9.5%) and apple (14.0%) due to lineage-specific genome duplication. ADH genes experienced a high frequency of tandem (12.2-20.2%) and proximal duplications (0-14.6%) in each of the investigated species, which contributed to the formation of the ADH gene clusters we have observed. In addition, dispersed duplications account for the highest number of derived genes (53.0-64.04%) in all the species investigated. However, the mechanism underlying dispersed gene duplication remains unclear.
Recurrent tandem and proximal duplication occurred following whole-genome duplication
The Ks (synonymous substitutions per site) value is usually used to estimate the evolutionary dates of genome or gene duplication events [46,47]. Here, the Ks value was estimated for each gene pair (Fig. 4 and Table S3). In pear, two WGD events were detected including an ancient WGD event which corresponds to the paleo-hexaploidization (γ) event shared by core eudicots that took place ~140 Mya (Ks ~1.5-1.8) [37] and the recent WGD that is inferred to have occurred 30-45 Mya (Ks ~0.15-0.3) [36]. The Ks values of the majority of WGD-derived ADH gene pairs ranged from 0.14 to 0.28, suggesting that these genes may descend from the more recent WGD event. It is notable that the Ks values of ADH gene pairs derived from tandem and proximal duplications are much lower than those derived from the WGD, except for pear, suggesting that ADH genes experienced frequent small-scale gene duplications after the ancient or recent genome duplication events and are younger in age. ADH gene pairs derived from transposed duplications have high Ks values and the median Ks distribution is close to that of WGD-derived gene pairs. ADH gene pairs derived from WGD have higher Ks values in peach, sweet cherry, Japanese apricot, black raspberry, and strawberry than in pear and apple, suggesting that these ADH genes have been retained from the ancient eudicot γ duplication event.
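The dating of duplication events from Ks described above relies on a molecular-clock relation, commonly written as T = Ks / (2r), where r is the synonymous substitution rate per site per year. The following is a minimal sketch of that conversion; the function name and the example rate are hypothetical, since the text does not state which rate was applied, and different rates give noticeably different dates.

```python
def ks_to_mya(ks: float, rate: float) -> float:
    """Approximate the age of a duplication in million years (Mya).

    Applies T = Ks / (2 * rate), where `rate` is the number of synonymous
    substitutions per site per year. The rate is supplied by the caller
    and is an assumption, not a value taken from the study.
    """
    if ks < 0 or rate <= 0:
        raise ValueError("ks must be non-negative and rate must be positive")
    return ks / (2.0 * rate) / 1e6


if __name__ == "__main__":
    # Illustrative rate only (hypothetical).
    example_rate = 5e-9
    for ks in (0.15, 0.3):
        print(f"Ks = {ks}: ~{ks_to_mya(ks, example_rate):.0f} Mya")
```

Because the estimate scales inversely with the rate, comparisons of Ks distributions among duplication modes are more robust than the absolute dates.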
ADH genes evolved under strong purifying selection
Ks, Ka, and Ka/Ks values were estimated for each paralogous ADH gene pair in the eight Rosaceae species (Fig. 4 and Table S3). The Ka/Ks ratio has been widely used as an index for measuring the strength and direction of selection pressure. Ka/Ks > 1 indicates positive selection; Ka/Ks = 1 indicates neutral evolution; and Ka/Ks < 1 indicates negative (or purifying) selection [48].
Purifying selection can eliminate deleterious mutations and positive (Darwinian) selection can induce and fix advantageous mutations [49]. The Ka/Ks ratios calculated for all paralogous ADH gene pairs in each species were less than one, indicating that purifying selection was the main force behind ADH family gene evolution in Rosaceae species. ADH genes derived from tandem and proximal duplications showed higher Ka/Ks ratios in the species investigated, suggesting that these genes evolved at a faster rate, which is a feature of new genes. In summary, the results from this and the aforementioned analysis support the hypothesis that more recent tandem and proximal duplications generated new ADH genes that contributed to the formation of homologous ADH gene clusters and drove the expansion of the ADH gene family (Fig. 5 and Table S4).
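The interpretation of Ka/Ks given above can be expressed as a small classification rule. The sketch below is illustrative only: the function name, the tolerance used to treat a ratio as equal to one, and the example values are assumptions rather than part of the pipeline used in the study.

```python
def classify_selection(ka: float, ks: float, tol: float = 1e-9) -> str:
    """Label a gene pair by its Ka/Ks ratio.

    Ka/Ks > 1 -> "positive", Ka/Ks = 1 -> "neutral",
    Ka/Ks < 1 -> "purifying". `tol` is a hypothetical tolerance for
    deciding that a floating-point ratio equals one.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


if __name__ == "__main__":
    # Hypothetical (Ka, Ks) values for illustration, not data from Table S3.
    for ka, ks in [(0.05, 0.25), (0.20, 0.20), (0.60, 0.30)]:
        print(ka, ks, classify_selection(ka, ks))
```

In practice, pairs with very small or saturated Ks values are usually examined separately, because the ratio becomes unstable at both extremes.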

Microsyntenic relationships among orthologous ADH genes from eight Rosaceae species
We identified syntenic blocks among the eight Rosaceae species by performing interspecies syntenic analysis. Nine ADH genes in pear were found to have orthologous syntenic genes in all seven other species, while 16 had orthologous syntenic genes in one of the other seven species (Table S5).
Surprisingly, good collinearity around these nine pear ADH genes was detected among the eight Rosaceae species, even after speciation and long-term evolution, which suggests that these genes originated before the diversification of the Rosaceae species and may have conserved functional roles. For example, the orthologous syntenic gene cluster for pear ADH gene Pbr012701.1 includes MD07G1250800 (apple), PCP001961.1 (European pear), Pav_sc0001102.1_g440.1.mk (sweet cherry), Prupe.2G274500 (peach), Pm019522 (Japanese apricot), Ro07_G12285 (black raspberry), and FvH4_7g27140.1 (strawberry). The genomic regions around Pbr012701.1 also showed strong syntenic relationships with their counterparts in the other seven Rosaceae species (Fig. 6 and Table S6). It is noteworthy that the orientations of the ADH gene and the surrounding genes within 500 kb on Chr5 in Japanese apricot were inverted relative to those in the seven other species, which indicates that a chromosomal inversion occurred after the divergence of Japanese apricot from the other Rosaceae species. In addition, seven ADH genes in pear were found to have no syntenic counterpart in the other seven species, suggesting that these genes were newly duplicated in the pear genome after the divergence of pear and apple.
We found that five of these seven ADH genes were derived from tandem or proximal duplications. This result supports the aforementioned finding that recent small-scale gene duplications have been important in the expansion of the ADH gene family.

Phylogenetic analysis of ADH family genes in pear
Based on phylogenetic analysis and conserved motif analysis, 68 ADH family genes in pear were classified into four subgroups (Fig. 7). Group-I and Group-II both contain 14 genes, Group-III contains 18 genes, and Group-IV contains 22 genes.
The Multiple EM for Motif Elicitation (MEME 5.0.5) motif search tool was used to predict conserved domains in ADH protein sequences (Fig. 2 and Table S7). The types and distribution of conserved motifs in ADH genes were similar within subgroups, supporting the classification results of the phylogenetic analysis. A total of 19 motifs were detected across all the ADH genes, and the number of motifs contained in ADH protein sequences varied among subgroups. The Group IV genes encoded more conserved motifs than other subgroups, while genes of Group III encoded fewer motifs.
Motifs 1, 7, 2, and 9 were detected in almost all of the ADH genes, whereas motif 15 was only detected in group II; motif 17 was only detected in group III; and motif 4 was present only in group IV. Motifs 1, 3, 6, 7, 16, and 19 corresponded to the ADH_N domain and motifs 8, 9, 11, 17, and 18 were identified in the ADH_zinc_N domain. In clade I, motifs 1, 3, and 19 represented the ADH_N domain in all genes, except for Pbr033945.1, and motifs 9, 11, and 18 related to the ADH_zinc_N domain and were found in all ADH genes. In clade II, motifs 1 and 3 were identified in the ADH_N domain and motifs 8 and 9 were identified in the ADH_zinc_N domain. Motifs 7, 6, and 16 were represented in the ADH_N domain and motifs 17 and 19 were represented in the ADH_zinc_N domain and were found only in clade III.
The structures of ADH genes in pear were also compared among the different subgroups (Fig. S1). The number of exons varied from 2 to 13, and 21 genes had annotated untranslated regions. The exon/intron structures varied among subgroups. Group I members contained more exons and introns than other subgroups, whereas group II members contained fewer exons and introns than other subgroups. The exon/intron structures were similar between members of group III and group IV.
We further investigated the genetic features of the 68 ADH genes identified in Chinese white pear, including the CDS length, MW, and PI (Table S8). The lengths of the CDSs ranged from 900 to 2,261 bp, the peptide length of ADH proteins ranged from 312 to 887 amino acids, the PI value ranged from 5.33 to 9.32, and the MW ranged from 32.32 to 69.83 kDa. Different subgroups showed distinct gene features (Fig. S2). For example, group I members had the longest CDSs, whereas group III members had the shortest. The PI values varied greatly within each of the subgroups. Members of group III had the highest average PI values, whereas those of group I had the lowest.
Transcriptome expression profiles and qRT-PCR analysis of ADH genes in different pear tissues
Based on the transcriptome data from pear fruits, pollen, leaves, petals, sepals, ovaries, stems, and buds, we investigated the expression patterns of the ADH gene family (Fig. 8 and Table S9).
Transcripts per million (TPM) values were used to measure the gene expression level. We found that 12 ADH genes showed low or no expression in all investigated pear tissues. We further investigated the expression patterns of paralogous genes that corresponded to each of these 12 ADH genes. The results showed that 5 out of 12 ADH genes have diverged expression patterns compared to their highly expressed paralogous genes, suggesting that these five ADH genes (Pbr024187.
Furthermore, qRT-PCR analysis was performed to verify the RNA-Seq expression profiles of the aforementioned eight ADH genes. The RNA-Seq expression profiles of three ADH genes (Pbr013912.1, Pbr026289.1, and Pbr01252.1) were consistent with the results from the qRT-PCR analysis (Fig. 9A-C). Expression patterns of three ADH genes were similar, with high expression at S1 and a sharp decline at S2, which is consistent with the previous report that ADH activity and/or alcohol contents are high at the early stages of fruit ripening, whereas derivative esters predominate at maturity [32,34].

Discussion
ADH genes have been widely studied in a range of plants and they have been reported to participate in plant growth, development, and stress responses [1-3]. ADHs can catalyze the reciprocal transformation between alcohols and aldehydes and are involved in the production of aromatic compounds during fruit ripening. The number of ADH genes in Chinese white pear, European pear, and apple is over 1.5-fold those in black raspberry, sweet cherry and strawberry. The number found in peach and Japanese apricot was similar to that found in pear and apple and larger than that in black raspberry, sweet cherry, and strawberry.
We reconstructed the phylogenetic tree of ADH family genes in pear using the neighbor-joining method and four distinct subfamilies were determined. The results from the analysis on conserved motifs, gene features, and gene structures of ADH family genes support the classification results obtained by the phylogenetic analysis. We found that the characteristics of ADH genes were similar within each subfamily and varied among different subfamilies.
Different types of gene duplication events including WGD, tandem, proximal, transposed and dispersed duplications are the main driving forces for gene family expansion in eukaryotes [51,52].
WGD events can generate large numbers of duplicate genes in a very short period of time [53]. Pear and apple experienced a recent lineage-specific WGD, whereas strawberry, sweet cherry, Japanese apricot, black raspberry, and peach did not undergo this duplication event [36]. Indeed, the number of WGD-derived ADH gene pairs in pear and apple is far greater than in the other six species. In addition, single-gene duplications were also important to the expansion of the ADH gene family in Rosaceae species. ADH gene clusters occur frequently in Rosaceae species, which can largely be attributed to recent tandem and proximal duplications. The duplicated ADH genes generated by tandem and proximal duplications showed accelerated evolution. Evolutionary analysis suggested that purifying selection was the primary evolutionary force imposed on ADH family genes. This result is consistent with our previous observations on the evolution of the Hsf and F-box gene families [54,55].
We identified nine orthologous syntenic gene clusters among the eight Rosaceae species by searching for interspecies syntenic gene pairs of pear ADH genes. For example, one of the syntenic gene clusters comprised Pbr012701.1, MD07G1250800, PCP001961.1, Pav_sc0001102.1_g440.1.mk, Prupe.2G274500, Pm019522, Ro07_G12285, and FvH4_7g27140.1, and these genes were located in large interspecies synteny blocks. This result suggests that some ancestral ADH genes and their surrounding genes were retained in descendants during long-term evolution after speciation and diversification of the Rosaceae. In addition, seven pear ADH genes were found to have no syntenic counterparts in other Rosaceae species; five of the seven were derived from tandem or proximal duplication and two originated from transposed or genome-wide duplications.
This result implied that local small-scale gene duplications played important roles in generating new genes [56,57].
Previous studies showed that ADH activity and/or alcohol levels are highest at an early stage in fruit ripening, whereas derivative esters are dominant at maturity [33,34]. It has been reported that alcohols play an important role in the process of fruit maturation in tomato, grape, and melon.
Overexpressing
Following duplication, duplicated genes may undergo different evolutionary processes including subfunctionalization, neofunctionalization, conservation, or nonfunctionalization [58]. In this study, we found that the duplicated ADH gene pairs showed divergent expression in different pear tissues, suggesting that subfunctionalization occurred frequently after gene duplication. In addition, we found five ADH genes with little or no expression relative to their paralogous genes in all investigated pear tissues, suggesting that these five ADH genes (Pbr024187.1, Pbr004679.1, Pbr003626.1, Pbr040240.1, and Pbr040236.1) may have undergone nonfunctionalization or pseudogenization. This result provides evidence for the hypothesis that two gene copies derived via gene duplication may evolve toward distinct evolutionary fates, with one of the two copies gradually losing function and undergoing pseudogenization [59].

Conclusion
In summary, a total of 464 ADH genes were identified in eight Rosaceae genomes and 68 of these genes were from Chinese white pear. Based on phylogenetic, gene structure and conserved motif analyses, ADH family genes were divided into four subfamilies (groups I-IV). Single-gene duplication largely contributed to the expansion of the ADH gene family in Rosaceae species, although lineage-specific genome duplication events in the ancestor of pear and apple also contributed to ADH family expansion. Nine orthologous syntenic gene clusters were found among eight Rosaceae species after long-term evolution following Rosaceae diversification, suggesting that ADHs have highly conserved functional roles. Purifying selection was the main evolutionary force imposed on ADH genes. ADH

Furthermore, all candidate ADH protein sequences were analyzed using the Pfam database (https://pfam.xfam.org) to verify the presence of GroES-like and zinc-binding domains. Any protein sequences lacking GroES-like and zinc-binding domains were removed.
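The domain check described above can be sketched as a simple filter over Pfam annotations. This is an illustration under stated assumptions, not the authors' script: it assumes the GroES-like and zinc-binding domains correspond to the Pfam families ADH_N (PF08240) and ADH_zinc_N (PF00107), that a candidate is kept only when both are present, and that domain hits have already been parsed into a dictionary. The function name and protein IDs are hypothetical.

```python
# Assumed Pfam accessions for the two domains named in the text.
ADH_N = "PF08240"       # GroES-like domain (assumption)
ADH_ZINC_N = "PF00107"  # zinc-binding dehydrogenase domain (assumption)
REQUIRED_DOMAINS = frozenset({ADH_N, ADH_ZINC_N})


def filter_candidates(domain_hits):
    """Return the sorted IDs of proteins that carry every required domain.

    `domain_hits` maps a protein ID to an iterable of Pfam accessions
    found in that protein (a hypothetical, pre-parsed input format).
    """
    return sorted(
        protein_id
        for protein_id, hits in domain_hits.items()
        if REQUIRED_DOMAINS <= set(hits)
    )


if __name__ == "__main__":
    # Hypothetical candidates for illustration only.
    hits = {
        "cand_A": ["PF08240", "PF00107"],
        "cand_B": ["PF08240"],
        "cand_C": ["PF00107", "PF08240", "PF00106"],
        "cand_D": [],
    }
    print(filter_candidates(hits))
```

If the intended rule were to discard only proteins lacking both domains, the subset test would be replaced by a check for a non-empty intersection.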

Analysis of conserved motifs and gene features of ADH genes
To identify conserved domains among pear ADH genes, all identified protein sequences were subjected to MEME (Multiple Em for Motif Elicitation; v5.0.5) [63]. The analyses were conducted using default parameters with the following exceptions: the occurrence of motifs was set at 0 or 1 per sequence; the number of motifs was set to 20; the optimum width of motifs was 6-50 residues; and the minimum and maximum numbers of motif sites were set to 2 and 68, respectively. The gene ID, coding sequence (CDS), length of coding sequence, and number of amino acids in the sequence were acquired from the Pear Genome Project (http://peargenome.njau.edu.cn/). The protein isoelectric point (PI) and molecular weight (MW) for all candidate family members were computed using the ExPASy website (http://web.expasy.org/compute_pi/).
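For readers who run MEME from the command line rather than the web server, the motif-search settings listed above might map onto an invocation like the one assembled below. This is a sketch only: the text does not say which interface was used, the file and directory names are hypothetical, and the flag names reflect the MEME command-line interface as commonly documented and should be checked against the installed version.

```python
def build_meme_command(fasta_path: str, out_dir: str, max_sites: int = 68):
    """Assemble an argument list for a MEME run with the stated settings.

    zoops = zero or one occurrence of each motif per sequence.
    The command is only built here, not executed.
    """
    return [
        "meme", fasta_path,
        "-protein",                  # input sequences are proteins
        "-oc", out_dir,              # output directory (overwritten if present)
        "-mod", "zoops",             # 0 or 1 motif occurrence per sequence
        "-nmotifs", "20",            # number of motifs to search for
        "-minw", "6",                # minimum motif width
        "-maxw", "50",               # maximum motif width
        "-minsites", "2",            # minimum number of sites per motif
        "-maxsites", str(max_sites),  # maximum number of sites per motif
    ]


if __name__ == "__main__":
    # Hypothetical file names.
    print(" ".join(build_meme_command("pear_adh_proteins.fa", "meme_out")))
```

The resulting list can be passed to `subprocess.run` once the paths point to real files.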

Chromosomal locations and structures of ADH genes
Information about the chromosomal locations of ADH genes was obtained from genome annotation files and the data were visualized using Circos software [64]. To display the intron-exon distributions, gene structures were drawn with GSDS (Gene Structure Display Server v2.0; http://gsds.cbi.pku.edu.cn/) [65]. The genomic DNA sequences and the CDSs of all the ADH genes were submitted to construct the gene structure map. A phylogenetic tree was constructed from the full-length ADH protein sequences of pear using MEGA (version 7.0) [66] with the neighbor-joining (NJ) method [67], the Poisson model, and 1,000 bootstrap replicates.

Synteny analysis
To identify collinear gene pairs and syntenic blocks in pear and other Rosaceae species, we used DIAMOND software [68] to perform multiple alignments of protein sequences in the eight Rosaceae species (e-value < 10^-5) and then obtained genome annotation files using an in-house Perl script.

Calculation of Ka, Ks and Ka/Ks
The values of Ka (non-synonymous substitutions), Ks (synonymous substitutions), and the Ka/Ks ratio were calculated using the calculate_Ka_Ks_pipeline (https://github.com/qiaoxin/Scripts_for_GB/tree/master/calculate_Ka_Ks_pipeline) [56]. In brief, the coding sequences and gene pairs were prepared. Then the computing_Ka_Ks_pipe.pl script was used to perform multiple alignments automatically using MAFFT software and convert them to AXT format for submission to the

(Table S1) were designed using the Primer-BLAST (https://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi?LINK_LOC=BlastHome) tool. The composition of the PCR mixture was as follows: 0.5 μL of each primer, 5 μL of 2× SYBR Premix Ex Taq, 1 μL of cDNA, and 3 μL of RNase-free water. The qRT-PCR was performed on a LightCycler 480 (Roche). The qRT-PCR program began with 10 min at 95°C, followed by 45 cycles of 95°C for 3 s, 60°C for 10 s, and extension at 72°C for 30 s. Relative expression levels were calculated using the 2^-ΔΔCt method and normalized to the ADH genes.
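The 2^-ΔΔCt calculation mentioned above can be illustrated with a short sketch. It follows the usual form of the method, in which the target gene's Ct is first normalized to a reference gene within each sample and then compared with a calibrator sample. The function name, the choice of reference gene, the choice of calibrator, and the Ct values are all hypothetical, since the surviving text does not identify them clearly.

```python
def relative_expression(
    ct_target_sample: float,
    ct_ref_sample: float,
    ct_target_calibrator: float,
    ct_ref_calibrator: float,
) -> float:
    """Relative expression by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference), computed within each sample
    ddCt = dCt(sample) - dCt(calibrator)
    Assumes roughly 100% amplification efficiency for both primer pairs.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


if __name__ == "__main__":
    # Hypothetical Ct values: a test stage compared with a calibrator stage.
    fold = relative_expression(24.0, 18.0, 26.0, 18.0)
    print(f"fold change relative to calibrator: {fold:.2f}")
```

A value above 1 means the target transcript is more abundant in the sample than in the calibrator; a value below 1 means it is less abundant.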
