Characterization of the Liriodendron chinense Pentatricopeptide Repeat (PPR) Gene Family and Its Role in Osmotic Stress Response

The Pentatricopeptide repeat (PPR) superfamily is a large gene family in plants that regulates organelle RNA metabolism, which is important for plant growth and development. However, a genome-wide analysis of the PPR gene family and its response to abiotic stress has not been reported for the relict woody plant Liriodendron chinense. In this paper, we identified 650 PPR genes from the L. chinense genome. A phylogenetic analysis showed that the LcPPR genes could roughly be divided into the P and PLS subfamilies. We found that 598 LcPPR genes were widely distributed across 19 chromosomes. An intraspecies synteny analysis indicated that duplicated genes from segmental duplication contributed to the expansion of the LcPPR gene family in the L. chinense genome. In addition, we verified the relative expression of Lchi03277, Lchi06624, Lchi18566, and Lchi23489 in the roots, stems, and leaves and found that all four genes had the highest expression in the leaves. By simulating a drought treatment and quantitative reverse transcription PCR (qRT-PCR) analysis, we confirmed the drought-responsive transcriptional changes in four LcPPR genes, two of which responded to drought stress independent of endogenous ABA biosynthesis. Thus, our study provides a comprehensive analysis of the L. chinense PPR gene family. It contributes to research into their roles in this valuable tree species’ growth, development, and stress resistance.


Introduction
The pentatricopeptide repeat (PPR) superfamily is one of the largest gene families in plants. Typically, PPR proteins comprise more than two tandem repeats of conservative motifs containing 30-40 amino acids [1,2]. PPR proteins can be classified into P and PLS subfamilies, based on the members' motif structure. The PLS subfamily possesses P1, P2, L1, L2, S1, S2, SS, E1, or E2 motifs and occasionally DYW, a non-PPR motif, whereas members of subfamily P only possess tandem P motifs [2]. The PLS subfamily protein is unique to terrestrial plants and was first discovered in Physcomitrium patens [2]. According to available research, most PPR proteins have distinct binding sites, leading to minimal functional redundancy across multiple PPR members [3]. Compared with the hundreds of PPR members in the species, the functionally identified PPR proteins remain a minority. Most PPR proteins target mitochondria or chloroplasts; some PPR proteins have been proven to be dual-targeted [4]. PPR proteins primarily control organelle RNA metabolism such as RNA editing [5,6], RNA splicing [7], and RNA translation [8].
PPR proteins play a crucial role in cytoplasmic male sterility [9] and embryo development [10]. Mitochondrial-targeted PPR proteins regulate proper embryonic development The leaves were collected at 0, 12, and 24 h after treatment, with three biological replicates each. For the qRT-PCR tests, all samples were instantly frozen in liquid nitrogen and stored at −80 • C. The growth conditions were 22 • C, with a 16 h of light and 8 h of dark photoperiod. The relative humidity was 75% [39].

Identification of PPR Genes in L. chinense
We used the PPR (PF01535) seed file downloaded from the Pfam (https://pfam.xfam. org/, accessed on 5 May 2022) database and HMMER3.0 software to search for the genome of L. chinense. We also performed a BlastP search for the L. chinense protein. Combining the aforementioned search results, we removed any unnecessary sequences. To confirm the remaining sequences, we employed SMART (http://smart.embl-heidelberg.de/, accessed on 6 May 2022) and CDD search tools (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/ bwrpsb.cgi, accessed on 6 May 2022). After analyzing the sequence using the online HMMER search program (https://www.ebi.ac.uk/Tools/hmmer/, accessed on 8 May 2022), we removed proteins with less than two PPR motifs.

Sequence Analysis of LcPPR and Phylogenetic Tree Construction
We used Weblogo (http://weblogo.berkeley.edu/logo.cgi, accessed on 8 May 2022) to create a motif identification to evaluate the conservatism of the LcPPR motif. We used the TBtools (v1.108) program to display data after retrieving each gene's chromosomal and exon/intron locations from the Liriodendron database [40]. We also used the TBtools (v1.108) program to calculate Ka/Ks values. To investigate the collinearity of the LcPPR genes, we employed a multi-collinearity scanning tool package. Additionally, we used ExPASy ProtParam (https://web.expasy.org/protparam/, accessed on 9 May 2022) to determine the theoretical isoelectric point, molecular weight, and protein hydrophilicity of LcPPR genes. We conducted a multi-alignment of 650 LcPPR proteins using MAFFT software (v7.037). The tree was constructed using the maximum likelihood (ML) method with FastTree software (v2.1.10) and a bootstrap analysis of 1000 replicates. The diagram was drawn and modified via iTOL (https://itol.embl.de/itol.cgi, accessed on 4 April 2023).

Cis-Acting Element Prediction
We extracted a 2500 bp sequence upstream of the initiation codon of the target gene from the Liriodendron genome and used PlantCARE databases (http://bioinformatics.psb. ugent.be/webtools/plantcare/html/, accessed on 20 May 2022) to analyze the sequence for potential cis-acting elements. Finally, the results were visualized using TBtools software (v1.108).

Expression Analysis of LcPPR Genes in L. chinense
We performed tissue-specific and drought stress expression analyses using published transcriptome data from the NCBI Sequence Read Archive (SRA) database with the following accession numbers: PRJNA559687 (bract, petal, sepal, stamen, pistil, leaf, and shoot apex) [36] and PRJNA679101 (drought) [39]. The raw data were analyzed using the same process as Wu et al. [39]. A differential expression analysis of transcriptomic data was performed using the edgeR package with a screening condition of log 2 (fold change) > 1 and FDR < 0.05. We then used TBtools to plot a heatmap to visualize the differential gene expression levels. We conducted a qRT-PCR analysis to validate the transcriptome data and determine the expression pattern for selected genes after ABA, PEG, PEG, and fluridone treatments. The total RNA was isolated using an RNA extraction kit (Promega, Shanghai, China, LS1040), and cDNA was synthesized through a reverse transcription kit (Vazyme, Nanjing, China, R312-02). We used the Livak calculation method to calculate the genes' expression levels based on three technical and biological duplicates [41]. For qPCR data, 2 −∆∆Ct was used for the statistical analysis. We normalized them with the reference genes α-Tubulin and GAPDH as well as 18S rRNA [42]. Table S1 lists the primers for qRT-PCR.
We used IBM SPSS Statistics 26 for the statistical analysis. Data were analyzed using a one-way ANOVA with Tambane's T2 analysis or Tukey's correction for multiple compar-isons. Values were stated as the mean ± standard deviation. In all comparisons, p < 0.05 was considered to be statistically significant. We used GraphPad Prism 8 to visualize our results.

Results
3.1. Genome-Wide Identification of the LcPPR Gene Family in L. chinense Firstly, we identified 705 PPR candidate genes for L. chinense using a similaritybased search. Then, by examining the conserved domain, we confirmed 650 LcPPR genes (Table S2 and Figure 1a). The molecular weight (MW) of the LcPPR proteins varied from 16,623.22 to 219,313.38 Da. The theoretical isoelectric point (pI) ranged from 4.51 to 9.67, and the sequence length spanned from 149 to 1911 amino acids. In addition, 501 LcPPR proteins had GRAVY values less than zero, suggesting that they were hydrophilic proteins. According to the gene structure analysis, 3.5% (23/650) of them had just one exon, whereas 55.4% (360/650), 19.4% (126/650), 6.6% (43/650), and 15.1% (98/650) had two, three, four, and more than five exons, respectively (Figures 1b and S1). Based on the analysis of the repeated motif structure, the LcPPR proteins could be divided into P and PLS subfamilies. The former contained only the P motif (235 members), whereas the latter contained several P1, P2, L1, L2, S1, S2, SS, E1, E2, and DYW motifs arranged in a specific order. The PLS subfamily could be categorized into E1 (16 members), E2 (157 members), E+ (67 members), and DYW (140 members) subgroups, depending on the carbon terminal structural domain ( Figure 1c). The number of repeated motifs per LcPPR protein ranged from 2 to 39. Figure 1d shows the amino acid arrangement of each conserved domain of LcPPR proteins.

Phylogenetic Analysis of LcPPR Proteins
To comprehend the evolutionary relationships between these LcPPR proteins, we constructed a phylogenetic tree using the 650 LcPPR proteins' full-length amino acid sequences. Our results showed that group members of the P and PLS subfamilies were clearly separated ( Figure 2). Notably, seven members (Lchi01299, Lchi17169, Lchi24030, Lchi08060, Lchi17690, Lchi23616, and Lchi11005) of the P subfamily were clustered with members of the PLS subfamily, whereas Lchi26149 and Lchi02855 in the PLS subfamily were clustered with branches of the P subfamily. The clustering of members in each subgroup of the PLS subfamily was not satisfactory, and only PLS subgroup members clustered relatively well.

Chromosomal Distribution and Duplication of LcPPR Genes
We found 598 irregularly and extensively dispersed LcPPR genes throughout 19 pseudo-chromosomes of L. chinense, whereas the remaining 52 members could not be assigned to the pseudo-chromosomes (Table S2). Most LcPPR genes were located on chromosome 4 (55 members), whereas the fewest were found on chromosome 18 (14 members). An intraspecies collinearity analysis showed that 30 pairs of LcPPR genes within collinear blocks existed in the L. chinense genome (Figure 3), indicating that duplicated genes from segmental duplication functioned in expanding the LcPPR gene family. The highest number of gene duplication events occurred on chromosomes 2, 3, and 4. To analyze the selective pressure on the LcPPR gene family, we calculated Ka and Ks distance values for the 30 pairs of LcPPR genes described above. The results revealed 27 collinear gene pairs with Ka/Ks values less than 1 (Table S3).

Expression Patterns of LcPPR Genes in Different Tissues
Expression patterns of genes in different tissues may correlate with gene function. We first mapped the expression patterns of all LcPPR genes in the bract, leaf, shoot apex, sepal, stamen, pistil, and petal based on the transcriptome data and found that the LcPPR genes had a variety of expression patterns ( Figure S2). Using a differential expression analysis, we screened 20 LcPPR genes that had significantly different expression levels in leaves ( Figure S3). Four of them were selected using qRT-PCR to verify their relative expressions in the root, stem, and leaf of L. chinense (Figure 4a). Figure S4 shows the results of the specific qRT-PCR primer detection. The melting curve for the validated genes is shown in Figure S5. All these genes were more highly expressed in the leaves, whereas similar levels were expressed in the roots and stems (Figure 4b). Lchi03277 and Lchi23489 had similar expression levels and expression patterns. Lchi06624 and Lchi18566 were significantly more expressed in the leaves than in the roots and stems. Lchi185666 was expressed at the highest level in the leaves, whereas Lchi06624 was expressed at low levels in various tissues.  [43]; (b) qRT-PCR analysis of LcPPR genes in the root, stem, and leaf of L. chinense. Lchi03277 expression in the root was used as a control to determine the relative expression levels. The y-axis shows the expression level (2 −∆∆Ct ). Asterisks indicate statistically significant differences. * p < 0.05.

Analysis of the Cis-Acting Elements in LcPPR Gene Promoters
The cis-acting elements of a promoter are tightly correlated with stress responses and gene transcription. Therefore, we further analyzed the promoter sequences about 2500 bp upstream of the initiator codon of four LcPPR genes. We found many elements, but focused on the cis-acting elements associated with plant hormones and environmental factors. The promoters of these four LcPPR genes all contained elements mentioned above such as the abscisic acid response element ABRE, the drought stress-related element MYC, MBS, and as-1. They indicated that abscisic acid could regulate these genes and be affected by drought stress (Figure 5 and Table 1).

Expression Analysis of LcPPR Genes under Drought Stresses
Based on the transcriptome data [39], we explored the expression profiles of four LcPPR genes under drought stress. All genes were upregulated to varying degrees after 72 h in response to drought stress, suggesting their involvement in drought resistance in L. chinense (Figure 6a). The expression of each gene at 0 h was used as a control to determine the relative expression level. A one-way ANOVA test was used to analyze the data. Letters a-e were used to label statistical significance. There was no statistical significance between groups containing the same letter (p < 0.05).
To confirm our speculation, we treated the three-month-old plants with ABA, PEG, PEG, and fluridone. The qRT-PCR results showed that all four genes were transcriptionally upregulated in response to either the ABA or PEG treatment (Figure 6b). The transcriptional responses of Lchi03277 and Lchi23489 were partially dependent on endogenous ABA biosynthesis because the fluridone treatment significantly decreased the upregulation of these genes in response to the PEG treatment at 24 h. However, an exogenous ABA treatment could not completely simulate the effects of the PEG treatment for Lchi03277 and Lchi23489. In addition, Lchi06624 appeared to be more sensitive to the ABA than the PEG treatment, and its response to the PEG treatment was independent of endogenous ABA biosynthesis. Meanwhile, Lchi18566 also responded to the PEG treatment independently of ABA.

Discussion
With the completed genome sequencing of more species, the identification of PPR gene family members has significantly progressed. At present, in addition to model plants such as Arabidopsis thaliana [1], Oryza sativa [24], and Populus trichocarpa [31], PPR gene families have been identified in Physcomitrium patens [44], Gossypium [45], Citrullus lanatus [46], Camellia sinensis [47], and other plants. However, little is known about woody plants, especially magnoliids. We identified 650 LcPPR genes in the genome assembly of Liriodendron chinense, a relict woody magnoliid species. The number of PPR genes in L. chinense is higher than in herbaceous plants such as A. thaliana (450 members) [1] and C. lanatus (422 members) [46], but comparable with woody plants such as P. trichocarpa (626 members) [31] and C. sinensis (578 members) [47]. The presence of more PPR genes in woody plants than herbaceous plants indicates that they are more diverse, and may assume more functions in woody plants.
Based on the conserved structural domain alignment analysis, the LcPPR genes could be categorized into two subfamilies, P and PLS, which was also confirmed by the phylogenetic analysis. However, a few members in each subfamily were incorrectly grouped, as indicated by P. trichocarpa and C. sinensis. Consequently, several identical and conserved sites between the P motif and its variants could create ambiguities in the phylogenetic grouping. Based on the analysis of conserved structural domains, the amino acid arrangement of LcPPR proteins' structural domains was highly consistent with other species [2], except that the antepenultimate site in E2 was lysine (V) in L. chinense instead of isoleucine (I) (Figure 1d). This finding indicated that PPR proteins are evolutionarily highly conserved. They may have important functions that could be severely affected by a mutation event.
Previous studies have shown that PPR proteins in angiosperms typically contain about 400-600 amino acid residues, and more than half of the individuals in most species do not have introns [24,31,48]. However, most LcPPR genes contain introns, and only 3.5% of the LcPPR genes are intron-free, with the highest number of cases containing one intron at 55.4%. Meanwhile, the coding sequence lengths showed no significant difference between L. chinense and A. thaliana, P. trichocarpa, and C. sinensis ( Figure S6), indicating that intron variations largely caused the gene differentiation in PPR between different species. According to earlier research, most PPR genes eventually lose their introns due to amplification by retro-transposition events [32]. The gene structure analysis demonstrated that most LcPPR genes were ancestral types, consistent with the phylogenetic position of L. chinense in angiosperms [36].
Gene duplication is one of the main methods by which genes obtain new functions or expression patterns, which can promote the evolution of species [49]. In our study, the LcPPR genes were distributed on all chromosomes, and most were located in intraspecies collinear blocks. In addition, there were 30 duplicated gene pairs within the LcPPR gene family, suggesting that gene duplication caused by segmental genome duplication may be responsible for expanding the number of members in the LcPPR gene family. The results of a selective evolutionary analysis indicated that the LcPPR genes were under strong selective pressure for purification.
Tissue-specific studies of genes can provide a foundation for the functional exploration of genes. We selected four genes that were significantly expressed in the leaves by performing a differential expression analysis with transcriptome data. The qRT-PCR results showed that these genes were mainly expressed in the leaf, whereas the expression levels in the root and stem were similar. This finding may be because their function is mainly in the leaves and, to a lesser extent, in the roots and stems. To further determine these genes' possible functions, we analyzed their promoter regions and identified several cis-acting elements that responded to environmental factors and phytohormones. We noticed that all four genes contained cis-acting elements in response to drought stress and ABA. Therefore, we focused on these genes' expression patterns under drought stress in abiotic stress transcriptome data [39]. We found that all four genes were upregulated after 72 h of drought stress. Our results showed that these genes were indeed altered by drought stress, allowing us to further explore these genes' role in drought stress.
We used a PEG treatment to simulate drought stress and found that all four genes appeared to be upregulated. Among them, Lchi03277 was the most upregulated and affected by drought stress, whereas Lchi06624 was the least affected by drought stress, with only a doubling of the upregulation expression. It is known that drought stress stimulates ABA production in plants [50]. Accumulated ABA can activate downstream genes to improve drought resistance in plants by closing the stomata and other pathways [50,51].
The exogenous application of ABA can promote the overexpression of genes and, consequently, enhance plant tolerance to stress [52]. In our study, the exogenous application of ABA promoted the expression of four LcPPR genes. Lchi06624 and Lchi18566 were more sensitive to the ABA treatment, with more than a four-fold expression. Lchi23489 was the least responsive to exogenous ABA applications, with less than two-fold upregulation. We learned that both ABA and PEG applications could affect the expression of LcPPR genes. Therefore, we simultaneously applied PEG and fluridone to understand whether the LcPPR response to drought stress was related to endogenous ABA. Lchi03277 and Lchi23489 expressions decreased after the fluridone application compared with PEG alone, indicating that these genes' upregulated expression levels were affected by endogenous ABA. Conversely, the expressions of Lchi06624 and Lchi18566 did not significantly change after applying fluridone, suggesting that endogenous ABA did not influence these two genes' response to drought stress. Lchi06624 and Lchi18566 could respond to drought stress through an ABA non-dependent pathway. We may subsequently investigate the function of these two genes in other pathways such as the ROS pathway.
There were shortcomings in our experiments. One was that we could not treat the limited number of experimental materials to elucidate the effects on LcPPR genes under simultaneous drought stress and ABA induction. The second was that the number of genes validated by the experiment was not large enough, and many other interesting genes are waiting for further study.

Conclusions
In this study, we identified 650 LcPPR genes in the L. chinense genome, which could be divided into two subfamilies, P and PLS. Gene duplication, possibly from a WGD event, partially contributed to the expansion of the LcPPR gene family in L. chinense. We used transcriptome data to analyze the expression patterns of LcPPR genes in different tissues and under drought stresses. We then selected four genes for an experimental validation. Two of these genes could respond to drought stress through the ABA-independent pathway, whereas the other two responded through the ABA-dependent pathway. In summary, our results can guide further studies on LcPPR gene functions, especially in response to drought stress in L. chinense, thus promoting resistance breeding of this valuable woody tree.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/genes14061125/s1. Figure S1: Gene structure of LcPPR genes; Figure S2: Expression patterns of LcPPR genes in different tissues; Figure S3: Expression profiles of 20 LcPPR genes with significant expression in leaves; Figure S4: Specific detection of qRT-PCR primers; Figure S5: The melting curve for the validated genes; Figure S6: Average length of CDS Sequences in Liriodendron chinense, Arabidopsis thaliana, Populus trichocarpa, and Camellia sinensis; Table S1: Primers used for qRT-PCR;