Comparative transcriptomic analysis reveals differences in MADS‐box genes of different hypericum in Changbai Mountains

Abstract To explore the differences between the hypericum in the Changbai Mountains, we carried out a transcriptome analysis of two common hypericums in the area, which was Hypericum attenuatum Choisy and Hypericum longistylum Oliv. We screened the MADS‐box genes to analyze divergence time and evolutionary selection expression, and determine their expression levels. The results showed that we detected 9287 differentially expressed genes in the two species, of which shared 6044 genes by the two species. Analysis of the selected MADS genes revealed that the species was in an environment adapted to its natural evolution. The divergence time estimation showed that the segregation of these genes in the two species was related to the changes of external environment and genome replication events. The results of relative expression showed that the later flowering period of Hypericum attenuatum Choisy was related to the higher expression of the SVP (SHORT VEGETATIVE PHASE) and the AGL12 (AGAMOUS LIKE 12), while the lower expression of the FUL (FRUITFULL).

plants has mainly focused on the pharmacology and applications of chemically active ingredients (Li et al., 2012(Li et al., , 2013Wu et al., 2016).
Studies have found that the content of hypericin in the flowers of H. longistylum is higher, and it reaches its peak during the flowering period (Zhang et al., 2011). However, Zhang et al. (2018) and Zhang et al. (2011) found that the flowering time of H. attenuatum was shorter than that of H. longistylum, and the flower size was completely different from that of H. longistylum. Therefore, it is of great significance to study the differences between these two hypericum species to make better use of hypericum plants and extract their chemically active ingredients.
The MADS-box gene family is closely related to the flowering stage of flowering plants. This gene family not only plays an important role in controlling the flowering of plants in the five-round flowering development mechanism but is also related to the regulation of vegetative growth and the regulation process of plant stress resistance (Gramzow & Theissen, 2010;Wang, 2009). The MADS-box gene family is divided into two types, type I and type II (Kerstin et al., 2005;Purugganan et al., 1995). The type I MADS-box gene has an SRF-like MADS region that can encode a highly conserved protein, which usually contains 1-2 exons (Bodt et al., 2003;Elena et al., 2000;Theissen et al., 1996). The type II MADS-box gene contains an MEF2-like MADS region with multiple introns and exons. This region usually includes six introns and seven exons, and the MADS region of type II is more conserved than that of type I (Hileman et al., 2006;Kerstin et al., 2005). In addition, the type II MADS-box gene has a semiconserved K domain, which can be used to help the transcription factor protein dimer and DNA combine into a complex domain and a nonconserved C-terminal domain, which is located between the M region and the K region. Therefore, the type II gene is also called a MIKC-type gene (Hileman et al., 2006;Ma et al., 1991;Purugganan et al., 1995). According to other research, in angiosperms, type II genes are divided into MIKC C -type and MIKC*-type subgroups, MIKC C -type MADS-box genes are also divided into 12 subgroups, and these subgroup genes are all related to the ABCDE model of flower organ identity (Chiruta et al., 2010;Favaro et al., 2003;Smaczniak et al., 2012). Among them, the AGL12 (AGAMOUS LIKE 12) subgroup has not only been verified to have regulate the cell cycle of root development but also be a catalyst of the flowering transition (Tapia-Lopez et al., 2008). The FUL (FRUITFULL) protein in the SQUA subgroup has multiple functions as a meristem characteristic gene, such as regulating the annual life cycle of plants, promoting fruit development, and promoting stem and leaf growth together with the SOC1 protein (Ferrándiz, Gu, et al., 2000;Ferrándiz, Liljegren, & Yanofsky, 2000;Gu et al., 1998). The SVP (SHORT VEGETATIVE PHASE) protein in the STMADS11 subgroup is a repressor protein of flowering transition (Hartmann et al., 2000), which has the function of inhibiting flowering.
With the continuous establishment and improvement of animal and plant genome databases, the use of bioinformatics methods to study the developmental relationship and evolutionary pressure between gene sequences has become increasingly extensive. The nucleotide replacement rate (ω = d N /d S ) is called selection pressure and is the ratio of nonsynonymous (d N ) to synonymous (d S ) substitutions. This ratio can reflect the evolutionary trend of species. The high ratio indicates that the species is in positive selection; The low ratio indicates that it is in purification selection (Dong et al., 2021;Minozzo et al., 2020). By studying these selection pressures, researchers can predict whether a gene has made a new change, and then perform functional validation to determine whether the change has made a new function (Minozzo et al., 2020).
In this study, we selected the leaves of Hypericum plants in the young stage of Changbai Mountain and analyzed their transcriptome data. We screened the transcriptome database, selected the MADSbox gene family among the differentially expressed genes (DEGs), and analyzed its phylogeny and selective evolution to discover the genetic relationship among Hypericum on Changbai Mountain, as well as the selective pressure on plant evolution. This provides theoretical support for the protection and rational utilization of hypericum.

| Establishment of the transcriptome database of Hypericum
RNA sequencing was based on the HiSeq platform, which sequences all mRNAs transcribed from these two species of Hypericum. The sequencing experiment used the Illumina TruSeqTM RNA Sample Prep Kit method for library construction.

| Transcriptome analysis
We performed data quality testing on the original data after sequencing and performed assembly, de-redundancy, and open reading frame prediction on the data that met the requirements. As a result, we performed an orthologous gene search and gene function annotation. The above work of library construction, sequencing, and transcriptome analysis was entrusted to Shanghai Majorbio Company. The transcriptome data of two hypericum species in the Changbai Mountain area were compared by merging three copies.
Taking H. longistylum. as the control, if the gene expression level in H.
attenuatum was significantly greater than that of H. longistylum, the gene was considered to be upregulated; otherwise, it was considered to be downregulated.

| MADS-box gene family screening and ORF prediction
The transcriptome data were collated and MADS genes were screened out (data not provided). Total RNA was extracted by modified Trizol method and reverse transcribed with ReverTra Ace qPCR RT Kit (TOYOBO) (Chang et al., 2014). The differentially expressed MADS-box genes were BLAST aligned in the UniProtKB/SwissProt database on the NCBI website (https://www.ncbi.nlm.nih.gov/), and their ORF (Open Reading Frame) were predicted (https://www.ncbi. nlm.nih.gov/orffi nder/).

| MADS-box gene domain and motif prediction
The online analysis software MEME (http://meme-suite.org/) was used to predict the motifs of the selected gene sequences. The motifs of the same type appeared only 0 or once in a sequence, and the maximum number of motifs of a sequence was set as 7. Used NCBI Domains & Structures (https://www.ncbi.nlm.nih.gov/Struc ture/ cdd/wrpsb.cgi) to make predictions about the structure of the selected genes. Visual analysis was performed using TBtools software.

| Phylogenetic analysis
Forty-two MADS-box family genes of different species were selected and downloaded from UniProtKB/SwissProt database. Each gene was screened with ATG as the initiation codon and TAA, TAG, or TGA as the termination codon (Table 1). According to the downloaded CDSs of the MADS-box gene family of different species, multiple sequence alignments were performed, and a phylogenetic tree was constructed to study the differences among them and the evolutionary relationship between the genes of each species. The multiple alignment of the gene sequence was completed using Clustal X (Larkin et al., 2007) software, and the default parameters were used. MEGA X (Kumar et al., 2016) software was used to calculate the best model based on maximum likelihood method and to construct the phylogenetic tree (Tamura et al., 2004), with the bootstrap value set to 1000 to verify the credibility of the evolutionary tree. The constructed evolutionary tree was made presentable using the iTOL website (https://itol.embl.de/).

| Selective pressure analysis
The codeml program in PAML 4.9 (Yang, 2007) software was used to perform selective pressure analysis on the selected sequences.
Based on the self-built evolutionary tree and the results of multiple alignment of CDSs in this study, we used the codeml program to analyze the proteins encoded by 50 CDSs, and the site models were M0, M1, M2, M3, M7, and M8. Likelihood ratio verification statistics were used to evaluate paired models, M0 versus M3, M1 versus M2, and M7 versus M8, to test whether there was a positive selection site, and a chi-square test was performed on the results. The degree of freedom was the difference between the parameters of the two models. Finally, selection pressure analysis was performed on the selected genes according to the results.

| Estimation of divergence time
Divergence time estimates were made for each type of gene using the MCMCTree program in PAML 4.9 (Kobayashi & Weigel, 2007) software. The MCMCTree program is based on the approximate likelihood method (Reis & Yang, 2011), which estimates the divergence time of the involved species. The age of the fossils in the self-built tree was queried using the TimeTree website (https://www.timet ree. org), where the age unit was 1 = 100 Ma. The stability of the output evolutionary tree file with divergence time was tested by Tracer V1.7.2 (Rambaut et al., 2018) software, and the parameters in PAML software were judged to be reasonable by checking whether its Ess value was >200. Finally, FigTree V1.4.4 software was used for visual analysis.

| qRT-PCR and expression analysis
Upper middle leaves of well-grown plants containing flowering buds were selected at early floral stages (June) of both species to verify the differences in the expression of flowering-related genes between the two species by qRT-PCR. Total RNA was extracted by modified Trizol method and reverse transcribed with ReverTra Ace qPCR RT Kit (TOYOBO) (Chang et al., 2014). The sequences of the designed fluorescence quantitative primers are shown in Table S1.
Using TB Green® Premix Ex Taq™II (TAKARA), we performed qRT-PCR. qRT-PCR analysis of each three biological samples was performed in triplicate.

| Analysis of orthologous gene expression
The Bowtie comparison of the RSEM software was used to perform expression statistics for the results. First, we obtained the number of read counts, compared to each gene for each sample, and then

| Differentially expressed gene analysis and GO, KEGG enrichment analysis
The transcriptome data of the three samples were classified, and a total of 6044 genes were found to be present in all the six samples.
In the three samples of H. longistylum Oliv., there were 7660 genes; a total of 6700 genes were found to be common to H. attenuatum Choisy (Figure 2).
In this study, with FDR (false discovery rates) < 0.05 and |log 2 FC| ≥ 1 as the screening threshold, 9287 significantly DEGs were obtained, TA B L E 1 MADS-box gene sequence accession numbers.  (Figure 3).
Genes could be classified according to the biological processes they participated in, the components that made up cells, and the molecular functions they performed using the GO term. The statistics of GO term annotations were performed on the DEGs in the two groups, and one of the samples was used as a control ( Figure S1).
In this study, GO enrichment analysis was performed on the DEGs obtained ( Figure 4a). From Figure 4a, we can see that the DEGs are significantly (FDR < 0.001) enriched to 54 GO terms: In the biological process category, these differential genes were significantly enriched in 31 GO terms; In the cellularity category, eight GO terms were significantly enriched; In the molecular function category, 15 GO terms were significantly enriched.
As can be seen from KEGG enrichment analysis (Figure 4b), all DEGs were significantly enriched into eight pathways, among which four were highly significantly enriched (FDR < 0.01), respectively: plant hormone signal transduction in environmental information processing and phenylpropanoid in metabolism biosynthesis, ascorbate and aldarate metabolism, and neurotrophin signaling pathway in organismal systems.

| Prediction of MADS-box gene domains and conserved motifs
Among the DEGs, we enriched a total of nine pairs of MADS-box family genes, among which four pairs were significant (data not provided). Each gene corresponded to a sequence in H. longistylum and H. attenuatum, respectively. We made domains and conserved motifs prediction of the amino acid sequences of these four pairs of genes ( Figure 5). The detailed sequences are shown in Table 2.
Through domain's prediction results, we found that these DEGs all contained a K region, and only HlMADS_C gene did not contain MEF2_like region. According to the motifs distribution (Figure 5b), except HlMADS_C and HaMADS_B genes, the other genes contain at least four different motifs, and only A-class genes contain motif 5, while HlMADS_C only contains motif 2 and motif 6.

| Multiple sequence alignment and phylogeny analysis
Clustal X 2.1 software was used to perform multiple sequence align- The sequence after multiple sequence alignment was used to calculate the best model of the paper mulberry tree using MEGA X software. The results showed that the best model was GTR + G + I, which could be used to build a phylogenetic tree and beautify it can be divided into these three branches.
Subsequent LRT (likelihood ratio test) statistics and chi-square tests were used to determine whether the model was significant. Table 4 shows that the M2, M3, and M8 models of the MADS-box gene family are superior to the M1, M0, and M7 models and reach a very significant level (p < .05).

| Analysis of divergence time
The MCMCTree program was used to estimate the divergence time of the three types of genes. Tracer v1.7.2 software found that the Ess values were all >200, indicating that the parameters were reasonable. Finally, the divergence time of each gene was obtained separately, and a tree was constructed (Figure 8)

| Quantitative real-time fluorescence analysis
To further determine the relative expression levels of MADS-box gene in the two plants, quantitative real-time PCR was performed, and the results are shown in Figure 9. As can be seen from the figure, the expression levels of the SVP and AGL12 in H. attenuatum were higher than those in H. longistylum, and reached a significant level (p < .05). And the relative expression levels of the two FUL genes in H. longistylum were higher than those in H. attenuatum, and reached a significant level (p < .05).

F I G U R E 4
Histogram of GO (a) and KEGG (b) enrichment of DEGs. The FDR < 0.001 is marked as ***, the FDR < 0.01 is marked as **, and the FDR < 0.05 is marked as *, and the color gradient on the right represents the FDR size.

| DISCUSS ION
In this study, comparative transcriptome sequencing was used to analyze two species of Hypericum in Changbai Mountains. The results showed that 9287 genes (including 4565 upregulated genes and 4722 downregulated genes) with significant differences were obtained (FDR < 0.05). Among them, we detected 6044 DEGs in all six samples. However, 2301 DEGs (1288 upregulated genes and 1013 downregulated genes) reached the highly significant level (FDR < 0.01). In the subsequent GO enrichment analysis, it F I G U R E 5 Conserved domains (a) and motif (b) of differentially expressed MADS-box genes in two species.
was found that these significantly different genes (FDR < 0.05) were mainly enriched into 54 GO terms. KEGG enrichment analysis found that the genes with significant differences were mainly enriched in plant hormone signal transduction, phenylpropanoid biosynthesis, ascorbate and aldarate metabolism, and neurotrophin signaling pathway.

TA B L E 2
The detailed sequences of the domain and motif.
F I G U R E 6 Partial multiple sequence alignment results of MADS-box genes.
F I G U R E 7 MADS-box gene family phylogenetic tree. The red branches in the figure represent the gene sequences in this study, and the genes in the same background color are grouped into one category and are divided into three categories. MADS-box family genes have been cloned and verified in many plants, and the functions of MADS-box family genes are closely related to the flowering process of plants (Bowman et al., 1991;Thessen et al., 2000;Wang, Song, et al., 2022). As a family gene, to the type of gene function Won et al., 2021). to better understand the functional evolution of this protein (Shen et al., 2021). Therefore, the presence of motif 5 only in group A genes may be the reason why the function of this group of genes is different from the rest of the genes.

MADS-box
Multiple sequence alignment and phylogenetic analysis showed that group A genes were SVP genes, group B and C genes were FUL genes, and group D genes were AGL12 genes. Among them, group C genes are closely related to P. edulis, while group B genes, although classified as FUL genes, are far from each other. The results of selection evolution analysis showed that there was positive selection for MADS-box genes differentially expressed in the two Hypericum species in Changbai Mountain area, and M2 and M8 respectively had six and seven positive selection sites. This means that these genes are being influenced by Darwinian selection evolution (Yang, 2006).
Hypericum species in Changbai Mountains are also in the environment of adaptive protein evolution.
According to the results of divergence time, it can be seen that the flowering genes of H. longistylum and H. attenuatum were separated in the Cenozoic period. The most striking feature of this period is the high level of mammalian and angiosperms (Jiang et al., 2012;Santiago et al., 2020). The Oligocene period covered in this study was dominated by the evolution and spread of modern types of flowering plants, and large-scale temperature changes occurred during this period (Ge et al., 2006;Jia et al., 2004;Ma & Gao, 2003;Zachos et al., 2001 (Wen et al., 2021).
The divergence times in other species of the genus showed that there was a great deal of separation between 24.4 and 42.9 Ma (Kumar et al., 2017;Meseguer et al., 2013). Therefore, it is reasonable to speculate that SVP gene (group A) and FUL gene (group C) were separated during the differentiation of these two species. It was found that Hypericum underwent two genome-wide replication events at 9.16 and 57.96 Ma, respectively (Wen et al., 2021).
Therefore, we speculate that AGL12 gene (group D) was isolated during a genome-wide replication event about 9.16 Ma. However, FUL genes in group B were isolated during a genome-wide replication event about 57.96 Ma.
The analysis of DEGs in different species of the same genus has been involved in many species studies (Guo et al., 2021;Jiang et al., 2021;Zhu, 2020). Studies had shown that high expression of SVP resulted in smaller flowers and also inhibited the flowering process in most dicotyledonous plants (Jaudal et al., 2014). In addition to this, AGL12 suppressed the process of flowering transformation in plants, while FUL promoted the formation of inflorescence meristematic tissue in plants at an early stage of flower development (Kempin et al., 1995;Li et al., 2008;Tapia-Lopez et al., 2008). The results of some other studies showed that both SVP and AGL12 inhibited flowering transition, while FUL promoted the formation of inflorescence meristem at the early stage of flower development (Kempin et al., 1995;Li et al., 2008;Tapia-Lopez et al., 2008). It can be seen from the results of this study that the expression levels of SVP and AGL12 in H. attenuatum were higher than those in H. longistylum, while the expression levels of FUL were lower than those in H. longistylum. Therefore, in H. attenuatum, the flowering transition process and flower meristem formation process development was inhibited at the early stage of flowering time, which was consistent with the phenomenon that the flowering stage of H. attenuatum was later than that of H. longistylum. Some studies had pointed out that the expression and function of the same genes may differ in different species, for example, the presence of high SVP expression in both mango and soybean was found to promote plant flowering Woods et al., 2016). Therefore, the specific functions of the MADS-box family in Hypericum remain to be further investigated subsequently.

| CON CLUS ION
In this study, a total of 9287 DEGs were detected in two Hypericum species in Changbai Mountain, and 6044 genes were expressed in TA B L E 4 MADS-box protein positive selection site. both species. Through subsequent KEGG pathway analysis, it was found that these DEGs were mainly enriched into four pathways.
Four pairs of differentially expressed MADS genes related to flowering were screened out in this study, and it was found that they were influenced by Darwinian selection evolution, and their growth environment was suitable for such selection. The divergence time analysis showed that the divergence of type II MADS genes between the two species may be related to environmental changes and their own genome replication events. The relative expression analysis showed that the high expression of SVP and AGL12, which inhibited flowering, and the low expression of FUL, which promoted flower development, might be internal factors contributing to the later flowering of H. attenuatum than H. longistylum.
The purpose of this study was only to investigate the differences between the MADS-box flowering genes of the two species, and the specific functions of their genes were not verified. Therefore, further studies are needed.

ACK N OWLED G M ENTS
Zuojia Nature Reserve and Jilin Agricultural Science and Technology University are thanked for their support and help in this experiment.
The study was financially supported by the Science and Technology  Table 2.

O RCI D
Xia Yunrui https://orcid.org/0000-0003-1428-1432 F I G U R E 9 Expression analysis graph. The ordinate in the figure is the value of 2 (−ΔΔCt) , and the higher the value, the higher the gene expression. Ha means Hypericum attenuatum, and Hl means Hypericum longistylum.