In situ dissecting the evolution of gene duplication with different histone modification patterns based on high-throughput data analysis in Arabidopsis thaliana

Background Genetic regulation is known to contribute to the divergent expression of duplicate genes; however, little is known about how epigenetic modifications regulate the expression of duplicate genes in plants. Methods The histone modification (HM) profile patterns of different modes of gene duplication, including the whole genome duplication, proximal duplication, tandem duplication and transposed duplication were characterized based on ChIP-chip or ChIP-seq datasets. In this study, 10 distinct HM marks including H2Bub, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K9me2, H3K27me1, H3K27me3, H3K36me3 and H3K14ac were analyzed. Moreover, the features of gene duplication with different HM patterns were characterized based on 88 RNA-seq datasets of Arabidopsis thaliana. Results This study showed that duplicate genes in Arabidopsis have a more similar HM pattern than single-copy genes in both their promoters and protein-coding regions. The evolution of HM marks is found to be coupled with coding sequence divergence and expression divergence after gene duplication. We found that functionally selective constraints may impose on epigenetic evolution after gene duplication. Furthermore, duplicate genes with distinct functions have more divergence in histone modification compared with the ones with the same function, while higher expression divergence is found with mutations of chromatin modifiers. This study shows the role of epigenetic marks in regulating gene expression and functional divergence after gene duplication in plants based on sequencing data.


INTRODUCTION
Gene duplication plays a critical role in the generation of genomic repertoires for subsequent functional innovation (Flagel & Wendel, 2009). Duplicate genes can be generated through different molecular processed modes, such as whole genome duplication (WGD) and segmental duplication (SD) (Freeling, 2009;Kustatscher et al., 2019;Wang, Tan & Paterson, 2013). WGD is induced by abnormal cell division, which is a common phenomenon in the plant kingdom (Otto & Whitton, 2000;Zhao et al., 2015). Single-gene duplications are classified as local duplications (including tandem duplication and proximal duplication), which could generate two nearby duplicate genes, and transposed duplications, which could create a gene duplication with a copy of the transposable element at a new chromosomal position via either DNA or RNA-based mechanisms (Cusack & Wolfe, 2007;Wang, Tan & Paterson, 2013;Wang, Wang & Paterson, 2012).
Following gene duplication, retained paralogs initially have identical sequences and functions but tend to undergo divergence in DNA sequence and expression patterns later (Liao et al., 2014). Growing evidence suggests that duplicated genes are regulated by both genetic and epigenetic regulation (Chen, 2007). A variety of molecular mechanisms of epigenetic regulation has been described, with the most important being histone modification (HM), DNA methylation and histone variants. Histone modification is a covalent post-translational modification to histone proteins, which impact gene expression by altering chromatin conformation or recruiting chromatin modifiers and transcriptional regulators. In most species, histone proteins contain four types: H2A, H2B, H3 and H4, which are subject to post-translational modifications including acetylation, methylation, phosphorylation and ubiquitylation.
Histone modifications may occur in a locus-specific manner. For example, histone H3 is primarily acetylated (ac) at Lysines (K) 9, 14, 18, 23 and 56, methylated (me) at Lysines (K) 4, 9, 27, 36 and 79, and phosphorylated at Ser10, Ser28, Thr3 and Thr11. H3K4me1 is an epigenetic modification to protein Histone H3 and it is a mark that indicates the mono-methylation at the fourth lysine residue of H3. Histone modification is an important type of epigenetic marks which could affect the gene transcriptional activation. Some marks are located in the vicinity of activated genes, such as H3K4me1/2/3, H3K9ac, H3K14ac and H3K36me3; these marks are called active marks. Some marks are located next to repressed genes, such as H3K27me1/3; these marks are called repressive marks. Our previous study has discussed the distribution dynamics and roles of epigenetic marks in the modulation of gene transcription (Wang et al., 2015b).
Several lines of evidence suggest that epigenetic changes play an important part in the initial expression divergence between duplicated gene pairs. Zou et al. (2012) have investigated the HM pattern evolution after gene duplication in yeast and showed that genetic and epigenetic elements, which contribute together to the expression divergence between duplicated gene pairs, have co-evolved since gene duplication. The expression levels of three TaEXPA1 homologs were correlated to the change with H3K9me3, H3K4me3 and H3K9ac in hexaploid wheat (Hu et al., 2013). It has been reported that DNA methylation divergence increases with evolutionary age between the duplicate gene pair in human (Keller & Yi, 2014). Also, DNA methylation is proposed as a potentially important regulatory mechanism for gene expression divergence in plants (Sui et al., 2014;Wang, Marowsky & Fan, 2014). Specifically, exonic methylation divergence correlates more closely with expression divergence than intronic methylation divergence (Wang, Marowsky & Fan, 2014). In our previous study, we found that the DNA methylation of the open reading frame region (ORF methylation) illustrates distinct patterns in different modes of duplication . Furthermore, researchers suggested that duplicated genes with the most divergent ORF methylation and expression patterns tended to have distinct biological functions in cassava (Wang et al., 2015a). It was also reported that the important feature of the changes in ORF methylation of duplicated genes in A. thaliana is to undergo evolutionary transormation (Wang, Marowsky & Fan, 2014;Wang & Adams, 2015). Berke & Snel (2014) have found that H3K27me3 is retained and constrains expression divergence after duplication in A. thaliana genomes, which is linked to lower expression divergence, yet higher coding sequence divergence (Berke, Sanchez-Perez & Snel, 2012). However, it remains to be explored whether and how the comprehensive HM profiling, not just a single chromatin mark, influences the expression of duplicate genes in A. thaliana. In this work, we aim to investigate the potential interplay between the HM profiling and gene duplications in A. thaliana, which can help us to understand the mechanisms underlying gene duplication, evolution, and the roles of HM marks in shaping current genomes .

Identification of duplicate genes with different modes in A. thaliana
The MCScanX-transposed software is commonly used to identify gene duplications that occurred within different areas based on the MCScanX algorithm which can detect multiple gene collinearity within and between related genomes . We applied MCScanX-transposed software with default parameters to identify duplicated genes with different modes, including WGD, proximal, tandem and transposed duplications in the A. thaliana genome. The unidentified genes were considered as singleton genes.

Histone profiling
Public ChIP-chip data (Dataset S2) were originally mapped to the TAIR10 version of the A. thaliana reference genome. A total of 10 distinct HM marks including H2Bub, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K9me2, H3K27me1, H3K27me 3, H3K36me3 and H3K14ac were analyzed (Additional file 4). For each of the ten HM marks in the open reading frame regions (ORF) and upstream regulatory (promoter) regions (500 bp upstream of the transcription start site), we calculated the log2-transformed average enrichment ratio with nucleosome normalizing. The ratios were then transformed to z scores using the formula Z X = (χ − m)/δ, where χ is the ratio value for a gene, m is the mean ratio of all genes, and δ is the standard deviation of this ratio for all genes. The HM divergence of a duplicate gene pair was measured by D HM = 1− r, where r is the Pearson correlation coefficient of the HM profiles for the duplicated gene pair.

Randomized controlled analysis
Randomization tests examine data in a nonparametric way, without the need to make assumptions about object populations. The most important advantage of randomization tests is that they allow users to specify any test statistic of interest and remain statistically valid. To test whether the duplication group of interest is significantly different from others, the randomization test is applied.
For each test, 10,000 independent randomizations are performed for the controlled analysis. Note that, (i) to obtain the randomized control of HM divergence between duplicate pairs, we randomly generated the same number of singleton pairs. (ii) To test the chromosome bias on the HM divergence between duplicate pairs, randomized singleton pairs were chosen where both copies were located on the different chromosomes. Then, the Wilcoxon rank-sum test was used to evaluate the significance of distribution between the randomized control and observed data sets.

Statistical method
To compare the chromosome bias of HM divergence in ORF and promoter regions of different modes of gene duplication, we calculated the D HM and compared the mean values of D HM in each mode. Wilcoxon rank-sum test was applied to calculate the significance. The same method was applied to assess whether the patterns of expression divergence of duplicated genes were consistent across different perturbation conditions or not.

K S and K A calculation
Coding sequence divergence was measured by the rate of synonymous substitutions (K S ) and nonsynonymous substitutions (K A ). K S and K A between duplicated gene pairs were estimated using MCScanX-transposed software with default parameters.

Gene expression analysis
A total of 88 RNA-seq samples (Dataset S1) from 16 publications were divided into three sample conditions based on different perturbation conditions: (i) the normal development group, named as "Normal" containing 18 wild-type samples; (ii) the group of samples with environmental stresses, named as "Stress" containing 20 samples; (iii) the group of samples with mutations on chromatin modifier (CM), named as "CM_mu" containing 50 samples. These RNA-seq datasets were re-analyzed according to the protocol proposed by Trapnell et al. (2012). This protocol used TopHat2, which is a fast splice junction mapper for RNA-Seq reads (Kim et al., 2013) and Cufflinks2, which could assemble transcripts, estimate their abundances, and test for differential expression based on aligned RNA-Seq reads (Trapnell et al., 2010). Here, the raw datasets were downloaded from the SRA database according to the corresponding SRA ID listed in Additional file 1. Then the reads were aligned to TAIR10 using TopHat2. Finally, the gene expressed abundances were calculated using Cufflinks2.

Functional analysis
To understand the biological function of duplicate genes, the Database for Annotation, Visualization and Integrated Discovery (DAVID) tool with version 6.7 (Huang, Sherman & Lempicki, 2009) was applied to identify the enriched Gene Ontology (GO) functional terms based on each mode of the gene duplication. Further, REVIGO, which could summarize the long lists of GO terms by removing redundant terms (Supek et al., 2011), was used to remove redundant GO terms and visualize the GO enrichment results. All genes and its annotated standard GO terms of arabidopsis are listed in Additional file 5.

Identifying different modes of duplicate genes
All A. thaliana genes were classified by four modes of gene duplication including WGD, proximal, tandem and transposed duplication, and singleton genes ("Materials and Methods"). According to the definition, 8,111 duplicate gene pairs were identified including 4,293 (53%) WGD gene pairs, 2,130 (26%) tandem duplicate gene pairs, 784 (10%) proximal duplicate gene pairs and 904 (11%) transposed duplicate gene pairs (Dataset S2). As shown in Fig. 1A, duplicate genes are scattered across the whole genome, whereas only transposed duplicate genes tend to be located in transposable elements (TEs)-enriched region.
We then calculated the enrichment levels of each HM mark for each gene within its ORF and promoter region ("Materials and Methods", Figs. S1 and S2). Comparing with singletons, WGD genes were more likely to be targeted by active marks (including H3K4me1/2/3, H3K9ac, H3K14ac and H3K36me3) and H3K27me1/3, instead of H3K9me2 in both ORF and promoter regions (Wilcoxon rank-sum test: p-value < 0.05 for all cases). Additionally, transposed duplicate genes show a higher level of H3K9me2 than other modes of duplicate genes and singletons (Wilcoxon rank-sum test: p-value < 0.05 for all cases).
Then, we calculated the pairwise Pearson correlations of HM marks for paralog pairs. The result showed that the correlation coefficients between the duplicate gene pairs for each mark are positive, except for the coefficients between WGD pairs for H3K9me2 (Table 1). Moreover, these correlation coefficients in promoter regions are larger than in ORF regions. On the other hand, some histone modification marks showed co-occurrence between duplicate gene pairs, and the overall patterns of histone modification marks between duplicate pairs are consistent (Pearson correlation coefficient: 0.418 < r < 0.991 and p-value < 1.0E−4, Mantel test (Mantel, 1967), Figs. 1B-1F and Fig. S3).

Divergence of histone modification profiles among different modes of duplicate genes
We further examined the divergence of HM profile between paralog pairs and singletons. The divergence of HM profiles was calculated ("Materials and Methods") in both ORF and promoter regions, which were denoted by scores: D HM-O and D HM-P , respectively. To compare the HM divergence between paralog pairs and randomized singleton pairs, we randomly selected 8,111 pairs from singletons with 10,000 repeats. We found that both D HM-O and D HM-P are lower between duplicate pairs than that of randomized pairs (Wilcoxon rank-sum test: p-value < 2.2E−16 for both cases), which is consistent with the previous study in yeast (Zou et al., 2012).
To investigate the HM status of different gene duplication regions, we compared the effects of HM profile on different gene regions of each gene duplication mode. The results showed that the divergence of HM profile both in ORF and promoter regions are distinct among the different modes of gene duplication (Kruskal-Wallis Test, p-value < 2.2E−16; Figs  divergence score than other duplicate modes, meanwhile, tandem and proximal duplicates have lower divergence score. The physical distance between single-gene duplications followed a trend: transposed duplicates > proximal duplicates > tandem duplicates. We assumed that position effects may have an influence on the divergence of HM profiles of genes with different duplication modes. To test this hypothesis, we selected all A. thaliana gene pairs on the same chromosomes and computed the correlation between HM divergences and physical distances of gene pairs both in ORF and promoter regions. Notably, the HM divergence of gene pairs on the same chromosome is not significantly correlated with physical distance (Pearson correlation coefficient: r = 0.066 for ORF regions and r = 0.089 for promoters). However, gene duplicate pairs on the same chromosome share a more similar HM profile than that on different chromosomes in ORF and promoter regions (Wilcoxon rank-sum test: p-value < 2.2E−16 for both cases, Figs. 2C and 2D).
Notably, duplicate pairs tend to be located on the same chromosome comparing with randomized singleton pairs (Chi-squared test: χ 2 = 152.0918, df = 1, p-value < 2.2E−16), which is termed as "chromosome bias". This falls in line with the research in yeast  Here, chromosome bias represents that two genes of a duplicate gene pair tend to be located on the same chromosome. Significance difference between same and different chromosomes for HM distance are indicated with " ÃÃÃ " for P-value < 0.001.  (Zou et al., 2012). We then wanted to know whether chromosome bias gives a possible explanation for less difference of HM divergence between the duplicate gene pairs than that between randomized singleton pairs. To rule out the possibility, the same number of duplicate gene pairs and random pairs were selected in which both copies located on different chromosomes, with a result that HM divergence was lower between paralog pairs than that between randomized singleton pairs (Fig. 2E, Student's t-test: p-value < 0.001 for both ORF and promoter regions).

Histone modification divergence of paralog pairs along with evolutionary time
To explore how HM profile evolves following gene duplication, the relationship of the HM profile of duplicate genes and the divergence of their coding sequence was characterized. In our study, coding sequence divergence was measured by synonymous substitutions (K S ) rate and nonsynonymous substitutions (K A ) rate. The duplicate pairs with K A < 0.5 and K S < 2.0 were selected for further analysis. We related the divergence of HM profile of duplicate genes to K A (Figs. 3A and 3B) and K S (Figs. 3C and 3D) using a scatterplot. We observed that D HM-O and D HMÀP are weakly positively correlated with K S and K A (Pearson correlation coefficient: 0.06 < r < 0.19; Fig. 3E). As shown in Fig. 3E, HM profile of duplicate genes in ORF diverges much quicker than that in promoters regions (Wilcoxon rank-sum test: p-value < 6.808E−10 for all gene modes), and D HMÀO was more correlated with K S and K A (Pearson correlation: r = 0.18 and 0.19) than D HMÀP (Pearson correlation coefficient: r = 0.06 and 0.12). However, there is no significant difference in HM profile divergence between in ORF and promoter regions for randomized singleton genes (Wilcoxon rank-sum test: p-value = 0.8768). The result illustrated that the divergence of HM in ORF regions is more similar to the coding sequences divergence between paralog pairs. We further explored the association of duplication modes with coding sequence divergence D HM-O of WGDs tend to have a lower correlation with K S and K A than single-gene duplications ( Fig. 3F; Fig. S4). Moreover, at similar K S or K A levels, transposed duplicate pairs were more likely to have higher HM divergence than the tandem and proximal duplicate gene pairs (Figs. 3A-3D). The different extent of HM divergence may be explained by the hypothesis that collinear duplicates tend to have similar chromatin environments, while transposed duplications re-locate to new chromosomal positions that usually have different chromatin environment .

Relationship between epigenetic divergence and gene expression divergence between duplication gene pairs
Our recent study revealed that epigenetic marks play a key role in determining transcriptional outcomes (Wang et al., 2015b). We thus asked whether the divergence of epigenetic patterns contributes to the expression divergence of paralogs. Correlation analysis was performed for HM divergence (D HMÀP and D HMÀO ) and expression divergence (E) between paralog pairs, and we found that there is indeed a relation between them (Pearson correlation coefficient: r = 0.265 for D HMÀO and r = 0.113 for D HMÀP , Figs. 4A and 4B). However, it is a weaker correlation between expression divergence and epigenetic divergence in promoter regions than that in ORF regions. Furthermore, the correlation differs among different modes of gene duplication (Pearson correlation coefficient: 0.239 < r < 0.368 for D HMÀO and 0.096 < r < 0.241 for D HMÀP , Fig. S5). For proximal and transposed duplicates, there is a stronger correlation between HM divergence and expression divergence.
A previous study has suggested that histone modification enzymes and environmental stresses influence the expression evolution of duplicate genes in yeast (Zou et al., 2012). To test whether the expression divergence patterns of duplicate genes were consistent across different perturbation conditions in A. thaliana, we divided expression profiles into three types: Normal, Stress and CM_mu. The expression divergence between duplicate pairs under these three types of conditions were denoted as E Normal , E Stress and E CM mu , respectively. Our results showed that the E CM_nu is significantly greater than E Normal and E Stress (Fig. 4C, Wilcoxon rank-sum test: p-value < 2.2E−16). In contrast, there is no significant difference in expression divergence of duplicate genes in "Normal" and "Stress" conditions ( Fig. 4C, Wilcoxon rank-sum test: p-value = 0.6097). As shown in Fig. 4C, different perturbation conditions have a similar effect on WGD and tandem duplicated genes with all duplicate genes but have little or no effect on the proximal and transposed duplicate genes. Taken together, epigenetic related enzymes (chromatin modifiers) may affect the expression evolution of duplicate genes, especially for WGD and tandem duplicated genes.

Functional divergence and epigenetic divergence between duplicate pairs
The previous study suggested that duplicate pairs with different biological functions differ significantly in ORF HM and promoter HM profiles in yeast (Zou et al., 2012). Here, we estimated how different biological functions influence the epigenetic divergence between duplicate pairs in A. thaliana. The biological functional divergence is classified into five models based on the biological functions of the duplicate genes (Fig. 5A). For each duplicate gene pair (using A and B as an example), A and B share the same GO terms (Type I); A and B partially share GO terms (Type II); A and B do not share GO terms, but they both have associated GO terms (Type III), or just A has associated GO terms (Type IV), or both A and B do not have associated GO terms (Type V). Therefore, the extent of functional divergence showed the following trend: Type I ≈ Type V > Type II > Type III ≈ Type IV. The relationship between functional divergence and duplication modes was investigated. We found that the WGD gene pairs were mostly enriched in Type II, and WGD gene pairs have higher functional divergence than single-gene pairs (Fisher test: p-value < 1.439E−14, Fig. 5B). The single-gene duplication shows the following functional divergence trend: Transposed > Proximal > Tandem (Fig. 5B).
We then investigated the relationship between functional divergence and HM divergence. Duplicate genes from Type I and Type V have lower HM divergence levels than Type II, III and IV (Figs. 5C and 5D; Wilcoxon rank-sum test: p-value < 0.005 for all cases). Type V duplicate genes have much lower HM divergence than Type I in both ORF and promoter regions (Wilcoxon rank-sum test: p-value < 0.005 for both cases). Besides, except for Type V, other types have higher HM divergence in promoter regions than in ORF regions. The results suggested duplicate genes with distinct biological processes undergo the different evolutionary rates of HM pattern in their ORF and promoter regions. That is, functional divergence is consistent with HM divergence.
To understand the biological function of duplicate genes, the functions of the duplicate gene with high HM divergence (D HMÀO > 1 or D HM-P > 1) were investigated. Using gene set enrichment analysis, eight functional models were identified: RNA biogenesis, translational regulation, protein catabolism, protein modification, transport, defense response, metabolism and transcriptional regulation (Fig. 6). It has been reported that the duplication of light-harvesting complex (LHC) in plants could diverge and acquire novel functions, such as response to various stress conditions including high light, high salinity, and nutrient limitation (Rochaix & Bassi, 2019). The biological functions associated with nutrient stress response, magnesium ion transport and phosphate ion transport were identified, indicating the validity of the results in this study.

DISCUSSION
This study overviewed the histone modification (HM) evolution of gene duplication, investigated the evolution's effects on multiple biological points including HM pattern, sequence, expression and function in A. thaliana, and highlighted the important role of histone modification in an evolutionary context. To understand the HM of gene duplication comprehensively, we constructed the whole-genome HM profiling of ten common HM marks in five modes of gene duplication. Our study showed that WGD genes were mostly marked by active marks and H3K27me1/3, while transposed duplicate genes  tended to be modified by H3K9me2. Besides, the correlations of HM marks between duplicate pairs illustrated positive relationships significantly, except for H3K9me2 marking WGD (Table 1). H3K9me2 is an HM mark which is strongly associated with transcriptional repression (Wang et al., 2008). Therefore, the active genes are likely to keep the same HM in the evolution of gene duplication. Furthermore, some marks, such as H3K4me2/3 and H3K36me3, showed co-occurrence between duplicate pairs (Figs. 1B-1F), which is consistent with our previous study (Wang et al., 2015b).
To explore whether the HM is different between duplicated gene pairs and non-duplicated genes, we have investigated the divergence of HM profiling. The result showed that duplicate gene pairs shared more similar HM patterns than randomized singleton pairs both in promoter regions and ORF regions. The same result has been found in yeast (Zou et al., 2012), which means the co-evolution between genetic and epigenetic is important in biology. Note that transposed duplicates have much higher HM divergence than other mode duplicates, which may illustrate the major role of transposon element in generating intergenic variation of species (Slotkin & Martienssen, 2007).
Previous studies have demonstrated that the evolution of epigenetic marks is coupled with coding sequence divergence after gene duplication in yeast and rice (Wang, Marowsky & Fan, 2014;Zou et al., 2012). Consistent with these observations, our data reveal further relationships between HM divergence and coding sequence divergence in A. thaliana. As expected, HM divergence is correlated with the sequence divergence between duplicate genes in the promoter and ORF regions ( Figs. 2A-2D). At similar K A or K S levels, transposed duplicate genes tend to have a higher divergence of HM profile between duplicate genes than tandem and proximal duplicate genes. This may be explained by that collinear duplicates are more likely to have the same chromatin environments, while transposed duplicate genes re-locate to new chromosomal locations that often have different chromatin environments .
The expression analysis revealed that HM divergence and expression divergence between duplicate genes are significantly correlated, which illustrates epigenetic marks have an important role in the regulation of gene expression in eukaryotic genomes. Notably, for proximal and transposed duplicates, there is a strong correlation between HM divergence and expression divergence. As a proof of concept, we validated our result using RNA-seq datasets through collecting three types of expression profiles: Normal, Stress, and Mutated (CM_mu). Our results showed that expression divergence between duplicate pairs in "CM_mu" condition is significantly greater than the other two conditions (Wilcoxon rank-sum test: p-value < 2.2E−16). Epigenetic related enzymes may play an important role in the expression evolution of duplicate genes.
Lastly, we investigated the functional divergence of duplicated genes with the epigenetic divergence. Results show that functionally selective constraints may impose on epigenetic evolution after gene duplication. Furthermore, gene duplicate modes showed the following functional divergence: WGD > Transposed > Proximal > Tandem. According to the result, we suppose that functionally selective constraints pose a greater influence on the epigenetic pattern divergence of WGD genes. It has reported that the WGD could cause a diversification of gene repertoires and allowed functional specialization of its copies in vertebrates, which increased the complexity of gene regulation (Marletaz et al., 2018).
The GO-based analysis also found that duplicate genes with high HM divergence are involved in transcriptional/translational regulation and stress defense response, which is consistent with previous studies (Hanada et al., 2008;Maere et al., 2005). In the future, we hope that studying the three-dimensional (3D) regulation of duplicate genes will enhance our understanding of the mechanisms of gene duplication.

CONCLUSIONS
It has been reported that genome duplication contributes to the histone modification status change (Zhang et al., 2019), then the histone modification could regulate the gene expression backforward. This study investigated the characterizations of duplicate genes and described the features of duplicate genes with different histone modifications.
We found that duplicate genes share more similar HM patterns than randomized singleton pairs in their promoters and ORF regions. Although it is not clear to what extent the epigenetic profiles are the cause or effect of duplicate genes, our result suggests that genetic variables could give rise to epigenetic divergence. Besides, this study shows that the correlation between HM divergence and the expression divergence of duplicate pairs is significant. The expression divergence between duplicate genes changes with the mutation of chromatin modifiers. Based on the results, the functional divergence has a role in the epigenetic divergence after gene duplication, and duplicate genes with high HM divergence are involved in multiple biological processes. In summary, this work will shed light on the role of epigenetic factors contributing to the regulatory evolution and divergence of gene duplication.