Comprehensive analysis of polygalacturonase family highlights candidate genes related to pollen development and male fertility in wheat (Triticum

Background: Polygalacturonase (PG) belongs to a large family of hydrolases with important functions in cell separation during plant growth and development via the degradation of pectin. The specific expression of PG genes in anthers may be significant for male sterility research and hybrid wheat breeding, but it has not been characterized in wheat (Triticum aestivum L.). Results: We systematically studied the PG gene family using the latest published wheat reference genomic information. In total, 113 wheat PG genes were identified and renamed as TaPG01– 113 based on their chromosomal positions. The PG genes are unequally distributed on 21 chromosomes and classified according to six categories from A–F. Analysis of the gene structures and conserved motifs demonstrated that the Class C and D TaPGs have relatively short gene sequences and a small number of introns. Class E TaPGs are the least conserved and lack conserved domain III. Polyploidy and segmental duplications in wheat were mainly responsible for the expansion of the wheat PG gene family. Predictions of cis-elements indicate that TaPGs have a wide range of functions, including the responses to light, hypothermia, anaerobic conditions, and hormonal stimulation, as well as being involved in meristematic tissue expression. RNA-seq showed that TaPGs have specific temporal and spatial expression characteristics. Twelve spike-specific TaPGs were screened using RNA-seq data and verified by qRT-PCR in the sterile and fertile anthers of thermo-sensitive male-sterile wheat. Four important candidate genes were identified as involved in the male fertility determination process. In fertile anthers, TaPG09 may be involved in the separation of pollen. TaPG87 and TaPG95 could play important roles in anther dehiscence. TaPG93 may be related to pollen development and pollen tube elongation. Conclusions: We analyzed the wheat PG gene family and identified four important TaPGs with differential expression levels in the wheat fertility conversion process. Our findings may facilitate functional investigations of the wheat PG gene family and provide new insights into the fertility conversion mechanism in male-sterile wheat. (Ct) values obtained with the Applied Biosystems 7500 Real-Time PCR System. family

the pollen wall can be divided into an outer wall and inner wall, and the main component of the inner wall of pollen is pectin. The pollen tube is an extension of the inner wall of the pollen, and thus the main component of the primary cell wall of the tip of the pollen tube is also pectin [24,25]. Pectin modification and degradation involve a variety of enzymes, including pectin methylesterases (PMEs), pectin acetylesterases (PAEs), PGs, and pectate lyase-like. Galacturonase is demethylated and deacetylated by PME and PAE, respectively, before depolymerization by PG and pectate lyase-like to release oligomeric galacturonic acid and the cell wall then loosens and grows rapidly [26]. The stamen development process involves with a series of cell expansion and cell separation events involving cell wall degradation and relaxation, which is why PG plays a vital role in plant pollen development and male fertility. The pollen-specific PG genes have been studied in tobacco [27], cotton [28], and Arabidopsis [29], but the pollen-specific genes in wheat require further characterization to provide new insights into male sterility in this crop.
KTM3315A is a thermo-sensitive male-sterile wheat material with excellent traits and great potential for hybrid production, which has been utilized greatly by our research team. In a previous study, we found that KTM3315A exhibited male sterility under normal autumn sowing conditions in Shaanxi, China (108°E, 34°15′N), but it was fertile and could be self-pollinated after spring sowing in the same area. Based on microscopic observations, the binucleate stage was identified as a critical period for fertility conversion because the microspore mother cells in the sterile pollen exhibited abnormal characteristics, such as shrinkage and irregularity [30]. Transcriptome analysis demonstrated that this abortive morphology was closely related to the expression of pectinase [30]. Analysis of the anther-specific PG genes involved in the plant fertility conversion process using excellent male-sterile materials allowed us to cultivate more stable male-sterile materials, thereby accelerating the wheat hybridization breeding process.
In this study, we identified 113 members of the PG gene family in wheat, which we systematically characterized by analyzing their evolutionary relationships, gene structures, conserved domains, ciselements, and expression patterns. Based on the expression patterns of PG genes in the sterile and fertile anthers, we identified four TaPGs with differential expression levels in the anthers of male-sterile and fertile wheat as candidate genes involved in pollen development and male fertility. The results obtained in this pioneering study in wheat provide a valuable important reference to facilitate further research into wheat pollen development and male-sterile wheat cultivation.

Identification and annotation of PG gene family members in wheat
The hidden Markov model (HMM) file for glycoside hydrolase family 28 (GH28, EC 3.2.1) and the amino acid sequences for the PGs in Arabidopsis were aligned with the whole amino acid sequences in wheat, and 113 wheat PGs were finally obtained after strict screening and identification (Additional file 1: Table S1). The 113 wheat PGs were unevenly distributed on 21 chromosomes and each chromosome contained at least one PG (Additional file 2: Figure S1). According to their chromosomal locations, the 113 wheat PG genes were renamed as TaPG01 (Table 1). Accurate identification and consistent naming are essential for further research into this gene family.

Evolutionary analysis of wheat PG gene family members
We constructed a phylogenetic tree based on the PG genes from 13 species, including wheat, to determine their evolutionary trends (Fig. 1, Additional file 3: Figure S2). The other 12 species belonged to four categories. The first category contained the model plant Arabidopsis and rice, which is closely related to wheat. The second category contained the forest model plant Populus, spikemoss model plant Selaginella, and fern Physcomitrella. The third category contained the apple, peach, tomato, watermelon, and cucumber. The fourth category contained Brassica napus and soybean. The phylogenetic tree showed that the wheat PGs were more closely related to those in Arabidopsis and rice, and the functions of the wheat PGs were similar to those in these two species.
In order to further refine the evolution and classification of wheat PGs, we used 238 PG proteins from wheat, Arabidopsis, and rice to construct a new phylogenetic tree (Fig. 2), which showed that the 238 PG proteins from these three species clustered into six branches ranging from A-F. Among these six classes, class A contained 12 wheat PGs, and class B to class F contained 16, 13, 28, 30, and 14 wheat PGs, respectively. In some classes, the PGs on the branches of the phylogenetic tree roughly agreed with the species phylogeny. Most of the rice PGs corresponded with three highly homologous wheat PGs in classes A and E, and most of the rice PGs clustered together with those in wheat due to their high homology. However, the topology was slightly more complex for the classification based on B, C, D, and F because not all of the wheat homologues corresponded with rice paralogues, and some specific rice PGs (e.g., LOC_Os08g23790 in class D) clustered with more than nine wheat homologues, which suggests that significant expansion occurred before or after triploidy. In particular, a significant expansion occurred in class F in wheat compared with rice.

Polyploidization and duplications of wheat PGs
The PG gene family in Arabidopsis contains 66 members and that in rice contains 44 members. The number of PGs in wheat is much higher than those in these two plants, as well as in most other plants. Similarly, tetraploid soybean that evolved from ancient tetraploids also has a relatively large number of PGs (111) [5], thereby suggesting that the hexaploidy of wheat has contributed to the high number of PGs. However, after correcting for the ploidy level, we found that the number of wheat PGs is lower than those in rice (113/3 = 37.7 < 44) and Arabidopsis (113/3 = 37.7 < 66) (Fig. 3).
Irrespective of the total number of PGs, the number of PGs in Arabidopsis is almost half that in wheat, instead of the expected ratio of 1:3. In rice, the total number of PGs in each of classes C, D, and F are about one-third of those in wheat, and the numbers in the other classes are close to one-half of those in wheat. The number of PGs in wheat is not as expected, so we cannot exclude the possibility of gene expansions in rice and Arabidopsis after differentiation, but we also suggest that wheat may have lost some PGs after polyploidy.
In order to understand why the number of wheat PGs was not as expected, we analyzed the homologous group in detail ( Table 2, Additional file 4: Table S2). More than half of all the wheat PGs exhibited a triplet phenomenon, i.e., three genes from the three ABD sub-genomes were highly homologous (1:1:1, 69%). This ratio is much higher than the proportion of all triplets in the wheat genome (69% vs 35.8%) [31]. Therefore, the presence of a high proportion of homologous triplets may partially explain the high total number of wheat PGs, while the loss of one homologue (1:1:0/1:0:1/0:1:1) and orphans/singletons can partly explain why the total number of wheat PGs is not three times that in rice. In addition, the 1:n proportion indicates that the expansion of the PG gene family in wheat also led by other gene duplication events.
Our analysis of the duplications in the wheat genome indicated that eight pairs of tandem duplications and 87 pairs of segmental duplications are present among the TaPGs. The nonsynonymous substitution rate (Ka)/synonymous substitution rate (Ks) values for these eight pairs of tandem duplication TaPGs were all calculated as less than 1, thereby indicating that they were under purifying selection (Additional file 5: Table S3). The 93 TaPGs  The MEME tool was used to search for the conserved motifs in the 113 wheat PGs, and 10 conserved motifs were identified (Fig. 6, Additional file 7: Figure S3). None of the PGs contain all 10 motifs. The class A and class C PGs contain the same five motifs, and all of the class D wheat PGs contain the same nine motifs, except for TaPG92. Motif 1 and motif 3 both contain conserved domains I (SPNTDGI) and II (GDDC) at the same time, while motif 1 also contain a portion of conserved domain III (CGPGHGISIGSLG), and motif 2 and motif 6 contain conserved domain IV (RIK). In addition to these four conserved motifs, motif 5 and motif 9 are the most conserved because they are present in most of the PGs, including the class E PGs. The structures of the TaPGs demonstrate that most of the PG genes contain 1-9 introns, where the class C and D PGs comprising exo-PGs have relatively shorter gene sequences and fewer introns, while the class E PGs comprising oligo-PGs generally have longer intron sequences, and the class F PGs typically contain more introns (7-9). The structural characteristics of each class of PGs were obtained by analyzing the motifs and gene structures, but these characteristics also provided a basis for classifying the functions of PG.

GO annotations and prediction of cis-elements for wheat PGs
Searching for annotation information in databases and predicting upstream regulatory factors is highly effective for determining the corresponding functions of genes. We obtained the molecular  Table S5). The cis-acting elements of the TaPGs were predicted and 10 were analyzed further.

Analysis of the expression patterns of TaPGs
A heatmap was generated to predict the related functions based on the specific expression of PGs by using RNA-seq data obtained from Chinese Spring wheat in 15 different developmental stages and tissues ( Fig. 9). Clearly, 21, seven, and 11 TaPGs were significantly upregulated in the anthesis spikes (spike, reproductive, anthesis), spikes wrapped in flag leaves (spike, reproductive, flag leaf stage), and grains at 2 days post-anthesis (grain, reproductive, 2 dpa). Among the 21 TaPGs upregulated in the anthesis spikes, 18 belonged to class D and three to class B. The 11 TaPGs upregulated in spikes during the flag leaf stage belonged to classes B, C, and D, and the TaPGs upregulated in the grain belonged to class F. The differential expression levels clearly demonstrated that various classes of TaPGs exhibited tissue-specific expression patterns. In order to identify the TaPGS that might play important roles in wheat pollen development, we focused on 12 TaPGs expressed specifically in the spikes (TaPG09, 14,18,28,40,41,49,50,54,87,93, and 95).

Identification of pollen-specific TaPGs
Real-time quantitative PCR (qRT-PCR) was performed for 12 of the TaPGs with spike-specific expression using sterile and fertile anthers, as well as the stems, leaves, and grains from thermosensitive male-sterile wheat KTM3315A in order to determine their expression patterns. RT-PCR (Additional file 9: Figure S4), confirmed that the 12 differentially expressed TaPGs had significantly high expression levels in the anthers. The expression levels of seven TaPGs (TaPG 09, 14,40,41,49,50, and 54) exhibited decreasing trends during anther development, but they were upregulated during the uninucleate stage or binucleate stage in fertile anthers. The expression levels of the anthesis spike-specific TaPGs (TaPG18, 28, 87, 93, and 95) were all highest during the trinucleate stage, and their expression levels differed significantly between fertile and sterile anthers (Fig. 10).
However, the expression level of TaPG87 was significantly downregulated during the trinucleate stage in fertile anthers compared with sterile anthers, whereas the opposite pattern was found for the other four TaPGs. These non-uniform expression patterns suggest that TaPGs have different roles during anther development and that their functions are not redundant, although TaPG87 and TaPG95 both belong to class B in this gene family.
After aligning the differentially expressed TaPGs described above, we found that their protein sequences were highly similar to those of seven Arabidopsis proteins (Additional file 10: Table S6).
Four of the five TaPGs with specific expression in the flag leaf stage shared high homology with Arabidopsis thaliana AT3G59850, while TaPG54 shared high homology with Arabidopsis AT2G43870.
AT3G59850 and AT2G43870 are involved in carbohydrate metabolic process, but their specific functions are not known, in the same manner as AT4G18180 which is the homologous genes of TaPG18 and TaPG28 [4]. However, TaPG09 shared highly homology with AT3G07970 (QRT2), which is Discussion PG belongs to the GH28 family and it also known as pectin enzyme or pectin depolymerase. The PG gene family members have close relationships with plant development, where they play various roles depending on their temporal or spatial expression, or by encoding different hydrolysis patterns or binding different substrates. In recent years, the development of genome sequencing has facilitated systematic studies of the evolution, localization, structure, and expression of plant PG genes. In total, 66 and 44 PG genes have been identified in Arabidopsis and rice [3], respectively, as well as 62 and 53 in watermelon and cucumber [36], and 75, 85, 112, and 99 in Populus, apple, soybean, and Brassica rapa [5][6][7]37]. In this study, we identified 113 wheat PG genes and demonstrated that the PG gene family members have complex roles in wheat.
In order to explain the high number of PGs in wheat compared with other plants, we analyzed the homologous groups and duplications of wheat PG genes. The proportions of intact triplets (1:1:1) and segmental duplications were higher than those in the whole genome, thereby demonstrating that genome replication and segmental duplication were responsible for the expansion of the wheat PG gene family. The presence of genome-wide repeats and other events led to the generation of many repetitive genes in the PG gene family throughout the evolution of wheat [4].  Table S2). In addition, based on the chromosomal locations of the 1:1:1 triplets, we found that most of the TaPGs from a homologous group were located in consistent positions on the three chromosomes either on a long arm or short arm. However, some individual cases were exceptions. . In general, a gene encoding a protein containing these four conserved domains is assigned to the PG gene family, but it should be noted that the conservation of domain III is relatively low [3]. Indeed, we found that all of the TaPGs [57]. In addition, BcMF6, BcMF16, and BcMF17 have been identified as PG genes with important roles in pollen development in Chinese cabbage [58][59][60]. In Arabidopsis, three endo-PGs comprising ADPG1, ADPG2, and QRT2 are critical for pollen grain separation and they also contribute to anther dehiscence, but mutants that overexpress QRT2 exhibit male sterility [29].
During pollen development, meiosis by the pollen mother cells requires a separation event to separate the four microspores. Callose deposition and the accumulation of pectin around the pollen mother cells are associated with pollen tetrad formation, so these components can be detected around the pollen mother cells during meiosis. In Arabidopsis qrt1 and qrt2 deficient mutants where the microspore fails to separate, callose can be degraded normally during tetrad pollen formation but pectin is still present after the degradation of callose, thereby demonstrating that QRT1 and QRT2 are required for the degradation of pectin during microspore separation [32]. In our study, we showed that TaPG09 shares high homology with Arabidopsis QRT2 and it belongs to the PG family. Therefore, based on its low expression level in sterile anthers, we suggest that a lack of TaPG09 during pollen development may lead to the failure of pollen separation. TaPG87 and TaPG95 were found to share high homology with ARABIDOPSIS DEHISCENCE ZONE POLYGALACTURONASE2 (ADPG2). Our qRT-PCR results also demonstrated that both genes are expressed in the late stage (trinucleate) in the fertile anthers in the same manner as many of the pollen-specific PGs mentioned above, and thus we suggest that they are involved in the separation of pollen grains and the cracking of anthers, which are the roles of ADPG1 and ADPG2 in Arabidopsis [61]. The downregulated expression of TaPG87 in the fertile anthers was not expected, but this result was verified by RT-PCR (Additional file 9: Figure   S4) and qRT-PCR. Thus, differences in the expression levels of TaPGs may have diverse effects on the regulation of pollen development in the same manner as a moderate level of QRT2 promotes anther development whereas its overexpression can lead to male sterility. Therefore, the specific effects of the regulation of TaPG87 on wheat fertility require further study. Many studies have shown that PGA4 is involved in the pollen development process and pollen tube growth [35, 62, 63]. The proteins that interact with PGA4 include several proteases that hydrolyze substrates, such as β-D-xylosidase, α-Larabinofuranosidase, glycosyl hydrolase, and pectin lyase-like superfamily proteins. These enzymes play important roles in processes such as degradation of the callose wall surrounding the microspores and they are closely related to male fertility in plants [64][65][66][67][68]. TaPG93 shares high homology with PGA4 and it is highly expressed in the trinucleate stage in the fertile anthers, so we suggest that TaPG93 is also involved in pollen tube growth in the same manner as its homologous gene PGA4. In summary, TaPG09, TaPG87, and TaPG95, and TaPG93 may be involved in three key processes in the later stages of male organ development, i.e., pollen separation, anther dehiscence, and pollen tube growth. Thus, the effects of members of the PG gene family on fertility should not be underestimated, and further validation of their functions will greatly advance our understanding of male sterility in wheat.

Conclusions
PGs are pectin hydrolases with important roles in the development of plants and they are involved in a complex range of functions. In particular, it is important to determine the pollen-specific roles of PGs in wheat in order to understand pollen development and the utilization of male sterility and heterosis. In this study, we systematically investigated the wheat PG gene family and identified 113 PGs with uneven distributions on 21 chromosomes based on information in newly published wheat reference genome. According to their physical locations, these genes were renamed as TaPG01-TaPG113. We constructed a phylogenetic tree based on Arabidopsis and rice, and the 113 TaPGs were assigned to six branches comprising classes A-G, where the class E PGs were least conserved and domain III was not conserved. Analysis of homologous groups and duplications demonstrated that genome replication and segmental duplication were mainly responsible for expansion of the PG gene family in wheat. In addition, cis-element analysis indicated that TaPGs can respond to light, anaerobic conditions, low temperature, and other environmental conditions, and the results also provide the basis for understanding the regulation of wheat PGs by hormones. Expression analysis identified 12 spike-specific TaPGs. These spike-specific genes were expressed in the anthers and four may have important roles in pollen development and male fertility. These findings provide an important theoretical basis for further research into male gametogenesis development and the utilization of heterosis in plants.

Plant materials
In this study, we used wheat material KTM3315A, which was selected in 2001 by our research team at the Northwest A&F University, Yangling, Shaanxi, China. KTM3315A is a thermo-sensitive male-sterile wheat material and it has been investigated by our research group because of its good agronomic traits and stable fertility performance. KTM3315A is sterile in low temperature conditions and fertile under high temperature conditions, thereby allowing self-crossing and it is an advantageous material for the utilization of heterosis [30]. Ten pots containing wheat KTM3315A were placed in two artificial climate incubators with parameters comprising 14 h light (day) and 10 h dark (night), and day/night temperatures of 17°C/15°C for the sterile conditions and 22°C/20°C for the fertile conditions. Thus, 20 pots containing KTM3315A were treated at different temperatures until their growth was complete.
The stems, leaves, and anthers were sampled from sterile and fertile plants in three stages (uninucleate, binucleate, and trinucleate) and stored at − 80°C, as well as the fresh grains from fertile plants.

Identification of wheat PGs gene family
The whole protein sequences for wheat and the HMM file for GH28 were downloaded from the Ensembl plant (http://plants.ensembl.org/Triticum_aestivum/Info/Index) [69] and Pfam databases (http://pfam.xfam.org/family/PF00295) [70], respectively. A wheat-specific HMM file was constructed using the alignment of the wheat protein sequences with the GH28 HMM file based on e-values less than 1e-20. The wheat-specific HMM file was then aligned with the wheat protein sequences and the results with e-values less than 0.01 were identified as candidate wheat PGs. To avoid omissions, we aligned the whole wheat protein sequences with the PG sequences from Arabidopsis downloaded from Phytozome [4]. These two methods obtained all of the candidate wheat PGs and the final wheat PGs were identified after screening using CD-search (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi) [71], SMART (http://smart.embl.de/) [72], and pfam (http://pfam.xfam.org/) [70]. The genetic locations of the candidate wheat PGs were obtained from the reference genome annotations. The amino acid lengths, molecular weights, and isoelectric points of the PG proteins were predicted using Expasy (https://www.expasy.org/structural_bioinformatics).

Phylogenetic tree construction for wheat PGs and analysis of gene duplications
In order to determine the evolutionary relationships of the wheat PGs, the amino acid sequences of 66 Arabidopsis PGs, 44 rice PGs, and 113 wheat PGs, as well as the amino acid sequences of PGs from 10 other species (Populus, Selaginella, Physcomitrella, apple, peach, tomato, watermelon, cucumber, Brassica napus, and soybean) were subjected to sequence alignment using CLUSTALW and the phylogenetic tree was constructed with MAGE (Additional file 11: Table S7). The phylogenetic tree was constructed using the neighbor-joining method with 1000 bootstrap replicates. All of the amino acid sequences of the wheat PGs were subjected to self-BLAST together with their chromosomal localizations to analyze the homologous group. Tandem duplications and segmental duplications in the wheat PG gene family were obtained using Mcscanx [73], and the Ka/Ks ratios were calculated for tandem duplicated PGs with the KaKs_Calculator2.0 [74].

Analysis of gene structure and conserved domains in wheat PGs
Based on multiple sequence alignment of the conserved regions, we identified four conserved domains in the PG gene family and they were visualized using GeneDoc software. Ten conserved motifs with lengths of 6-50 amino acids were obtained using MEME. These results were combined with the wheat genome annotation information and the genetic structure and conserved motifs in the wheat PGs were visualized with TB-tools [75].

GO annotation and predictions of cis-element and functions ofTaPGs
The GO annotations for wheat PGs were obtained from the Ensembl Plants database to determine the molecular functions, biological processes, and cellular components associated with wheat PGs. We also investigated the upstream regulatory factors associated with PGs by analyzing their cis-elements.
In order to predict the functions of TaPGs, STRING was used to obtain Arabidopsis PGs with similar sequences to wheat PGs as well as their functions and interactions [77].

Expression analysis ofTaPGs
In order to screen genes based on their differences in expression, RNA-seq data from 15 different tissues (root, stem, leaf, spike, and grain) and developmental stages (seedling, vegetative, and reproductive) in Chinese Spring wheat were used to analyze the expression profile of wheat PG genes, where these data were downloaded from the Wheat Expression Browser database [78,79]. To further verify whether spike-specific TaPGs are associated with pollen development and fertility conversion, relative quantitative analyses were performed for sterile and fertile plants by applying qRT-PCR for three anther development stages (uninucleate, binucleate, and trinucleate) and four aboveground plant parts (anthers, stems, leaves, and grains). Total RNA was extracted using TriGene reagent   Table 2. Homologous group in the wheat PG gene family.
All wheat genes represents the distribution of homologous groups among the entire three genomes in the whole wheat genome according to IWGSC. A, B, and D represents the three wheat genomes.
Other represents conditions not defined above (see Additional file 4: Table S2 for details). n > 1.