Comparative methylation and RNA-seq expression analysis in CpG context to identify genes involved in Backfat vs Liver diversification in Nanchukmacdon Pig

Devender Arora, Jong-Eun Park, Dajeong Lim, Bong-Hwan Choi, In-Cheol Cho, Krishnamoorthy Srikanth, Jaebum Kim and Woncheoul Park
Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju 55365, Republic of Korea
Subtropical Livestock Research Institute, National Institute of Animal Science, RDA, Jeju 63242, Korea
Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
Department of Animal Science, Cornell University, Ithaca, NY 14853, United States
Corresponding Author

promoters, usually prone to transcription (Additional file: Figure S1). The DMR study compared tissue-specific methylation levels, and de novo motif analysis for transcription factor binding sites (TFBS) was carried out on the backfat vs liver DMRs using the HOMER software (Table 1) (Additional file: Table S1).

Identification of DEGs, CpG methylation, and Gene ontology
DESeq2, an R package, was used to identify statistically significant differences in gene expression from the read counts produced by featureCounts. The overall relationship between backfat and liver is shown in a volcano plot (Figure 3a

Circos plot
Circos plots of all four conditions were generated using the Circos tool [31]. The outermost ring represents the 18 autosomes of Sus scrofa. The second and fourth rings represent the hypermethylated and upregulated genes identified in the DMRs and DEGs for backfat and liver tissues, respectively. The third and fifth rings represent the downregulated genes in the methylated regions (Figure 4).

Discussion
In the present investigation, we present a comprehensive view of the comparative methylation pattern together with the differentially expressed genes between backfat and liver tissue in the Nanchukmacdon pig, in order to understand the role of genes involved in tissue-specific diversification.
Methylation analysis is one of the most promising recently developed methods for decoding diversification in cross-tissue differentiation patterns, as well as the close relationships among different tissues. Studying these patterns will ultimately help identify markers that can be targeted in a breed to enhance a tissue of interest. Therefore, we profiled DNA methylation and RNA-seq data for the two tissues and integrated the results to identify the genes governing these changes and their involvement in tissue-specific changes driven by methylation. Our approach targeted tissue-specific methylation patterns in the CpG context, DMRs, and gene expression in each tissue. We analyzed hypermethylated regions, motifs, and the role of CpG islands in the DMRs underlying these changes. In parallel, we performed gene expression analysis and, with cutoffs of FDR ≤ 0.05 and |log2FoldChange| ≥ 2, identified genes that are expressed in specific tissue types.
Finally, we integrated all the data to identify genes and regions that are hypermethylated and upregulated, as well as hypermethylated and downregulated, in backfat and liver, that lie within CpG islands, and that play an important role in tissue-specific diversification.
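The integration step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the record layout, the gene names (GENE_A, GENE_B, GENE_C), the coordinates, and the sign conventions (a positive methylation difference meaning hypermethylated in backfat, a positive log2 fold change meaning upregulated) are all assumptions made for illustration, and each DMR is assumed to be already annotated with a gene.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def dmrs_in_islands(dmrs, islands):
    """Keep DMRs that overlap at least one CpG island on the same chromosome."""
    by_chrom = {}
    for chrom, start, end in islands:
        by_chrom.setdefault(chrom, []).append((start, end))
    return [
        d for d in dmrs
        if any(overlaps(d["start"], d["end"], s, e)
               for s, e in by_chrom.get(d["chrom"], []))
    ]


def classify(dmrs, degs):
    """Group genes by (hypermethylated tissue, expression direction).

    degs maps gene -> log2 fold change for genes that already passed the
    DEG cutoffs; DMRs whose gene is not a DEG are skipped.
    """
    groups = {}
    for d in dmrs:
        lfc = degs.get(d["gene"])
        if lfc is None:
            continue
        tissue = "backfat" if d["meth_diff"] > 0 else "liver"  # assumed sign convention
        direction = "up" if lfc > 0 else "down"
        groups.setdefault((tissue, direction), set()).add(d["gene"])
    return groups


# Hypothetical inputs, for illustration only.
islands = [("chr1", 100, 500), ("chr2", 1000, 1500)]
dmrs = [
    {"chrom": "chr1", "start": 450, "end": 600, "meth_diff": 30.0, "gene": "GENE_A"},
    {"chrom": "chr1", "start": 700, "end": 900, "meth_diff": 35.0, "gene": "GENE_B"},
    {"chrom": "chr2", "start": 1200, "end": 1300, "meth_diff": -40.0, "gene": "GENE_C"},
]
degs = {"GENE_A": 2.5, "GENE_B": 4.0, "GENE_C": -3.1}

in_islands = dmrs_in_islands(dmrs, islands)
groups = classify(in_islands, degs)
```

Here GENE_B is dropped because its DMR lies outside every island, which mirrors how restricting to CpG islands narrows the candidate list. The nested scan over islands is quadratic; for genome-scale inputs an interval tree or a sorted sweep would be the more usual choice.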
Subsequently, we performed gene ontology analysis to gain insight into the genes involved in each condition.
In the tissue-specific comparative analysis, we found that cytosine methylation in CpG islands was higher in backfat (77%) than in liver (71%) (Figure 2b) (Additional file: File S1), indicating that methylation occurred predominantly during backfat development. This is complemented by the genes shared between the DEGs and the DMRs in the CpG methylation analysis, since methylation in CpG islands is necessary to control aberration. In our comparative analysis, restricting to genes in CpG islands that were both differentially methylated and differentially expressed limited the total number of genes to 16. Of these, 13 genes were hypermethylated in the liver and 3 were hypermethylated in backfat.
We performed de novo motif analysis on the DMRs and found that the rank 1 motif includes the "TATA box", a promoter sequence that indicates to other molecules where transcription begins and that strongly modulates cell- and tissue-specific RANKL expression and the osteoclastogenesis process [32]. We observed a uniform pattern of motif methylation in the highly conserved regulatory factor X (RFX) gene family, which has been reported to act in the early development and maturation of cells [33] (Table 1). The top identified motifs were of particular interest, as most are actively involved in upstream transcription factor binding and in regulating the cis- and epi-cistrome features that shape the DNA landscape [34]. Similarly, the identified motifs were strongly associated with regulatory transcription factors involved in the differentiation process, consistent with the observation that RAR/RXR-bound regions are enriched in differentiation regions [35].
Restricting to genes in CpG islands that show both differential methylation and differential expression limited the total number of genes to 16. Of these, 13 genes were hypermethylated in the liver and 3 were hypermethylated in backfat. Among the identified genes, SIX2 has already been reported to be involved in the differentiation process [36]. Methylation in CpG islands is necessary to control aberration, and to assess the impact on gene ontology we have

Conclusion
Methylation plays an important role in tissue diversification, and studying gene expression at CpG islands is a promising approach to understanding these mechanisms. In the present investigation, we identified common genes that are highly expressed and differentially methylated and that could be used as potential markers in molecular breeding and for enhancing biologically relevant tissue.

Preparation of gDNA and Total RNA and Sequencing
We collected tissue samples from the backfat and liver of five Nanchukmacdon pigs. Genomic

DMRs and DEGs analysis of WGBS and RNA-seq data
WGBS data were analyzed with the reproducible genomics analysis pipeline PiGx-bsseq to characterize methylation patterns in the identified genes [37]. Reads were first subjected to quality control with Trim Galore [38] and mapped to the Sscrofa11.1 reference genome using Bismark [40]; the alignments were filtered for duplicate reads with samblaster and sorted using SAMtools [39]. The Bismark methylation extractor was run to measure methylation in the CpG, CHH, and CHG contexts. The BAM files were sorted before methylation calling, which had an average conversion rate of >99.4% and applied filters of a minimum coverage of 10 and a mapping quality of at least 10. Because we were interested in the differential patterns between the tissues, we then performed DMR analysis across backfat and liver using methylKit, an R package [41-43].
A logistic regression approach was used to model the log odds of the observed methylation proportion. Regions with a false discovery rate Q ≤ 0.01 and a percent methylation difference larger than 25% were selected and extracted as DMRs. DEGs were identified by setting cutoffs of FDR ≤ 0.05 and a log2FoldChange of ±2 for upregulated and downregulated genes.
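The two sets of cutoffs above can be made concrete with a short Python sketch. It only illustrates the thresholding and is not the methylKit or DESeq2 code used in the study: the record types, field names, and example values are hypothetical, and the sign convention of the methylation difference is an assumption.

```python
from typing import NamedTuple


class Dmr(NamedTuple):
    chrom: str
    start: int
    end: int
    qvalue: float     # FDR-adjusted p-value of the differential methylation test
    meth_diff: float  # percent methylation difference (sign convention assumed)


class GeneResult(NamedTuple):
    gene: str
    log2fc: float
    padj: float       # FDR-adjusted p-value of the expression test


def select_dmrs(regions, q_max=0.01, min_diff=25.0):
    """Keep regions with Q <= q_max and |methylation difference| > min_diff."""
    return [r for r in regions
            if r.qvalue <= q_max and abs(r.meth_diff) > min_diff]


def select_degs(results, fdr_max=0.05, min_abs_lfc=2.0):
    """Split genes passing the FDR and fold-change cutoffs into up and down."""
    up = [g for g in results if g.padj <= fdr_max and g.log2fc >= min_abs_lfc]
    down = [g for g in results if g.padj <= fdr_max and g.log2fc <= -min_abs_lfc]
    return up, down


# Hypothetical records, for illustration only.
regions = [
    Dmr("chr1", 100, 200, 0.001, 30.0),   # passes both cutoffs
    Dmr("chr1", 300, 400, 0.02, 40.0),    # fails Q <= 0.01
    Dmr("chr2", 50, 150, 0.005, -26.0),   # passes, opposite direction
    Dmr("chr2", 500, 600, 0.001, 25.0),   # fails: difference is not larger than 25
]
genes = [
    GeneResult("GENE_A", 2.0, 0.05),
    GeneResult("GENE_B", -3.2, 0.01),
    GeneResult("GENE_C", 1.9, 0.001),
    GeneResult("GENE_D", 4.0, 0.2),
]

kept_dmrs = select_dmrs(regions)
up, down = select_degs(genes)
```

Note the asymmetry taken from the text: the methylation difference must be strictly larger than 25%, whereas the fold-change cutoff is inclusive.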

De novo motif discovery
Hypermethylated regions in backfat and liver were defined by a methylation-difference cutoff of ±25% in the DMRs.
We were interested in the motifs within these methylated regions in the GC-rich CpG islands found near transcription start sites; motif discovery was performed with the findMotifsGenome.pl module of the HOMER software at default parameters [50]. Motifs were ranked by p-value and reported together with the percentage of target and background sequences containing each motif.
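As a rough illustration of how such regions might be handed to HOMER, the Python sketch below splits DMRs at the ±25 cutoff, formats them as six-column BED lines, and assembles a findMotifsGenome.pl call with its three positional arguments (regions file, genome, output directory). Several details are assumptions rather than statements about this study: the input coordinates are taken to be 1-based, a positive difference is taken to mean hypermethylated in backfat, and the genome identifier `susScr11` and all file names are placeholders. The command is only built, not executed.

```python
def split_hyper(dmrs, cutoff=25.0):
    """Split (chrom, start, end, meth_diff) tuples by direction at +/- cutoff."""
    backfat = [(c, s, e) for c, s, e, d in dmrs if d >= cutoff]
    liver = [(c, s, e) for c, s, e, d in dmrs if d <= -cutoff]
    return backfat, liver


def to_bed_lines(regions, label):
    """Format regions with 1-based starts as 6-column BED (0-based start)."""
    lines = []
    for i, (chrom, start, end) in enumerate(regions, start=1):
        # Columns: chrom, start, end, unique ID, unused field, strand.
        lines.append(f"{chrom}\t{start - 1}\t{end}\t{label}_{i}\t.\t+")
    return lines


def homer_command(bed_path, genome, out_dir):
    """Argument list for findMotifsGenome.pl with default parameters."""
    return ["findMotifsGenome.pl", bed_path, genome, out_dir]


# Hypothetical DMRs, for illustration only.
dmrs = [
    ("chr1", 1001, 2000, 31.5),
    ("chr2", 501, 900, -40.0),
    ("chr3", 11, 50, 10.0),  # below the cutoff in either direction
]

backfat, liver = split_hyper(dmrs)
backfat_bed = to_bed_lines(backfat, "backfat")
liver_bed = to_bed_lines(liver, "liver")
cmd = homer_command("backfat_hyper.bed", "susScr11", "motifs_backfat")
```

Writing `backfat_bed` to the named file and passing `cmd` to `subprocess.run` would launch the search, provided HOMER and the chosen genome package are installed.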

Functional enrichment analysis of methylated genes with differentially expressed genes
DEGs located in the backfat and liver methylated regions and passing FDR ≤ 0.05 and |log2FoldChange| ≥ 2 were compiled and submitted to DAVID v6.8 [30] for functional annotation and enrichment analysis. For each list, Gene Ontology (GO) enrichment was performed for the Biological Process, Molecular Function, and Cellular Component categories. The resulting terms were then clustered semantically using the REVIGO server [51], and the clusterProfiler R package [52] was used to summarize the GO terms.

CpG island and methylation pattern analysis
Based on the DMRs, we aimed to identify regions biased towards either backfat or liver by comparing them with CpG island coordinates retrieved from the UCSC Genome Browser [53]. A total of