Article

Mapping Chromatin Occupancy of Ppp1r1b-lncRNA Genome-Wide Using Chromatin Isolation by RNA Purification (ChIRP)-seq

by
John Hojoon Hwang
1,2,3,†,
Xuedong Kang
1,2,†,
Charlotte Wolf
1,4 and
Marlin Touma
1,2,3,5,6,*
1
Neonatal/Congenital Heart Laboratory, Cardiovascular Research Laboratories, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
2
Department of Pediatrics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
3
Children’s Discovery and Innovation Institute, Department of Pediatrics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
4
Medical and Life Science, College of Life Science, University of California Los Angeles, Los Angeles, CA 90095, USA
5
Molecular Biology Institute, College of Life Science, University of California Los Angeles, Los Angeles, CA 90095, USA
6
Eli and Edythe Broad Stem Cell Research Center, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
*
Author to whom correspondence should be addressed.
†
These authors contributed equally to this work.
Cells 2023, 12(24), 2805; https://doi.org/10.3390/cells12242805
Submission received: 25 October 2023 / Revised: 23 November 2023 / Accepted: 3 December 2023 / Published: 8 December 2023
(This article belongs to the Section Cell Methods)

Abstract

Long non-coding RNA (lncRNA)-mediated transcriptional regulation is increasingly recognized as an important gene regulatory mechanism during development and disease. LncRNAs are emerging as critical regulators of chromatin state, yet the nature and extent of their interactions with chromatin remain to be fully revealed. We have previously identified Ppp1r1b-lncRNA as an essential epigenetic regulator of myogenic differentiation in cardiac and skeletal myocytes in mice and humans. We further demonstrated that Ppp1r1b-lncRNA function is mediated by its interaction with the chromatin-modifying complex polycomb repressive complex 2 (PRC2) at the promoters of the myogenic differentiation transcription factors TBX5 and MyoD1. Herein, we employed unbiased chromatin isolation by RNA purification (ChIRP) and high throughput sequencing to map the repertoire of Ppp1r1b-lncRNA chromatin occupancy genome-wide in a mouse muscle myoblast cell line. We uncovered a total of 99,732 true peaks corresponding to Ppp1r1b-lncRNA binding sites at high confidence (p-value < 1E-5 and enrichment score ≥ 10). The Ppp1r1b-lncRNA-binding sites averaged 558 bp in length and were distributed widely within the coding and non-coding regions of the genome. Approximately 46% of these true peaks were mapped to gene elements, of which 1180 were mapped to experimentally validated promoter sequences. Importantly, the promoter-mapped binding sites were enriched in myogenic transcription factors and heart development while exhibiting focal interactions with known motifs of proximal promoters and transcription initiation by RNA Pol-II, including TATA-box, transcription initiator motif, CCAAT-box, and GC-box, supporting a role for Ppp1r1b-lncRNA in transcription initiation of myogenic regulators.
Remarkably, nearly 40% of Ppp1r1b-lncRNA-binding sites mapped to gene introns; these sites were enriched with the Homeobox family of transcription factors and exhibited TA-rich motif sequences, suggesting potential motif-specific Ppp1r1b-lncRNA-bound introns. Lastly, 136,521 enhancer sequences were detected in Ppp1r1b-lncRNA-occupancy sites at high confidence. Among these enhancers, 3390 (12%) exhibited cell type/tissue-specific enrichment in fetal heart and muscles. Together, our findings provide further insights into the genome-wide Ppp1r1b-lncRNA:chromatin interactome that may dictate its function in myogenic differentiation and potentially other cellular and biological processes.

1. Introduction

The majority of the mammalian genome is transcribed to produce RNA transcripts, most of which display no protein-coding potential [1]. Long noncoding RNA (lncRNA) transcripts define an expanding class of non-coding RNA species that are longer than 200 nucleotides and lack functional open reading frames. Like mRNAs, lncRNAs are primarily transcribed by RNA polymerase II (RNA Pol-II), 5′-capped, poly A-tailed, and post-transcriptionally modified mostly by splicing [2,3].
LncRNAs are pervasively transcribed across the genome and have emerged as important transcriptional regulators, affecting all layers of transcriptome regulation, including RNA transcription, splicing, and metabolism [2,3,4,5]. As our understanding of biochemical properties and functional diversity of lncRNA continues to evolve, it is widely accepted that lncRNAs can exert diverse functions that arise from their ability to form complex secondary structures with DNA-, RNA-, and protein-binding properties, leading to complex RNA-DNA, RNA-RNA, or RNA-protein interactions [5,6]. Moreover, a single lncRNA may contain several binding loops that are able to bind to nucleic acids via base pairing or to proteins by certain RNA binding motifs, thus allowing the coordination of signals between different types of macromolecules and chromatin-modifying complexes [5,6]. It has been evident that several lncRNAs, such as HOTAIR (HOX antisense intergenic RNA) and Bvht (Braveheart), can execute their regulatory functions by recruiting chromatin modification complexes and altering the state of chromatin accessibility, leading to transcriptional activation or repression [7,8]. By performing these diverse functions, lncRNAs can influence cellular biology, molecular processes, and tissue homeostasis at multiple levels, including transcriptome regulation, molecular networking, cellular differentiation, and developmental decisions [2,3,4,5,6,7,8].
During development, chromatin state is a key determinant of cellular differentiation, identity, and fate [9,10,11,12]. We have previously identified Ppp1r1b-lncRNA as an essential and functionally conserved epigenetic regulator of myogenic differentiation of cardiac and skeletal myocytes in both mice and humans [13]. Importantly, in response to Ppp1r1b-lncRNA loss, human induced pluripotent stem cells (hiPSCs)-derived cardiac progenitors and skeletal myoblast cell lines failed to produce early markers of myogenic differentiation program upon induction [13]. Cellular differentiation requires the activation of specific transcriptional programs that are governed by cell-specific master regulators and transcription factors [14,15]. We have demonstrated that Ppp1r1b-lncRNA interferes with polycomb repressive complex 2 (PRC2) binding at target promoters of the master transcription factors of myogenic differentiation, TBX5 and MyoD1, leading to decreased enrichment of H3K27me3, a PRC2-catalyzed epigenetic marker of transcriptional repression. In turn, the resulting enhanced chromatin accessibility leads to positive regulation of TBX5 and MyoD1 and induction of myogenic differentiation programs in cardiac and skeletal myocytes. These findings support the key role of Ppp1r1b-lncRNA in modulating chromatin states in a gene-specific manner to promote myogenic differentiation.
Interestingly, while Ppp1r1b-lncRNA was initially thought to act locally on a neighboring protein-coding gene [3], our mechanistic studies, including chromatin isolation by RNA purification-polymerase chain reactions (ChIRP-PCR), revealed that Ppp1r1b-lncRNA executes its function by physically interacting with distantly located transcription factors (TBX5 and MyoD1). In our work presented here, we uncover the full panel of Ppp1r1b-lncRNA-binding sites and explore how the specificity for Ppp1r1b-lncRNA interactions is achieved [13].
We employed a ChIRP strategy followed by single-read high throughput DNA sequencing and subsequent bioinformatics tools to map Ppp1r1b-lncRNA occupancy at the genome scale. By applying a downstream peak-calling pipeline and mapping peaks to gene elements, we revealed genome-wide Ppp1r1b-lncRNA-bound chromatin and gained further insights into the specific motifs that may underlie Ppp1r1b-lncRNA function at proximal promoters or distant enhancers of its putative target genes, including those encoding myogenic differentiation factors, transcription regulators, and chromatin modifiers.

2. Materials and Methods

2.1. ChIRP Assay

2.1.1. Probe Design for ChIRP

The Magna ChIRP RNA interactome kit (EMD Millipore Corp, Burlington, MA, USA) was used. Assays were performed per the manufacturer’s protocol. The capture probe is an antisense-oligo high-affinity probe targeted against a unique Ppp1r1b-lncRNA sequence (Figure 1) that does not overlap with other Ppp1r1b transcripts. It was designed using Stellaris Probe Designer version 1.0 (http://www.singlemoleculefish.com, accessed on 10 January 2018). The probe was compared with the mouse genome using the BLAT tool (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC187518/), and no noticeable homology to non-Ppp1r1b-lncRNA targets was detected. An antisense oligo probe against lacZ RNA was provided by the ChIRP kit and used as a negative control for ChIRP-PCR experiments. Both probes were biotinylated at the 3′ end.

2.1.2. Cell Culture

Mouse myoblasts, C2C12 cell line (ATCC), were cultured in DMEM (Invitrogen) supplemented with 10% Fetal Bovine Serum (FBS) and 1% Pen/Strep (Invitrogen).

2.1.3. Cross-Linking, Sonication, and Hybridization

C2C12 cells were grown to log-phase in tissue culture plates and rinsed once with room temperature phosphate-buffered saline (PBS). Cells were treated with glutaraldehyde for cross-linking, as described previously [13,16,17]. The resulting chromatin was fragmented by sonication. A sample consisting of 2% of the total input chromatin was then removed and served as the sequencing control. A biotinylated complementary antisense oligo probe was hybridized to Ppp1r1b-lncRNA and then isolated using magnetic streptavidin beads. No cross-hybridization with the LacZ probe was detected. The co-purified Ppp1r1b-lncRNA-bound chromatin was then eluted for protein, RNA, and DNA. Using a combination of RNase A and RNase H, the DNA was gently eluted off the beads according to the manufacturer’s instructions and processed into small fragments for library preparation.

2.2. ChIRP-Seq

2.2.1. Library Preparation and High-Throughput Sequencing

The sequencing libraries were constructed from the ChIRP-captured and control “input” DNA fragments. Around 3 ng of Ppp1r1b-lncRNA-ChIRP DNA and 3 ng of control “input” DNA were used for library preparation as per the manufacturer’s protocol. DNA fragments were subjected to DNA-end repair, 3′-dA overhang addition, and adaptor ligation and then amplified using PCR. After size selection (between 100 and 500 bp), qualified Ppp1r1b-lncRNA-ChIRP and control DNA libraries were used for high throughput single-end (SE) sequencing on BGIseq at a read length of 50 bp, generating an average of 38 million raw sequencing reads per sample.

2.2.2. Bioinformatic Analysis Workflow

1. Data filtering: Raw sequencing data were filtered using SOAPnuke 2.1.8 (Short Oligonucleotide Analysis Package) to remove adapter sequences, contamination, and low-quality reads. The following parameters were used for the SOAPnuke filter: -l 5 -q 0.5 -n 0.1 -Q 2 -c 40. Reads were considered “low-quality” if any of the following was true: (1) the proportion of N (undetermined) bases in the whole read was >10%; (2) unknown bases exceeded 10% of the read; or (3) the proportion of bases with a quality score below 20 was >10%.
2. Reads alignment: Clean reads that passed quality check measures were stored in FASTQ format and then aligned to the reference genome GRCm38/mm10 (Genome Reference Consortium Mouse Build 38 Organism: Mus musculus 10) using SOAP aligner SOAP2 (Version: 2.21t) [18]. No more than two mismatches were allowed in read alignment. Base coverage was normalized per million mappable reads. Reads from Ppp1r1b-lncRNA-ChIRP and control samples were aligned separately. The alignment results were then used for peak calling.
3. Peak calling and identifying true peaks: The uniquely mapped clean reads resulting from the alignment step were then used for peak calling. Candidate peaks for each sample were called using the software Model-based Analysis of ChIP-Seq (MACS) v1.4.2 [19]. Peak calling used the parameters -g mm -s 50 -p 1E-5 -m 10 30 --broad -B --trackline, where:
-g: mappable genome size, defined as the genome size that can be sequenced. GRCm38/mm10 = 1.87E9 (G)
-s: size of sequencing tags
-p: p-value cut-off: 1E-5
-m: minimum length of called peak (10) and maximum gap allowed between two peaks (30) to be merged.
14: smallest peak size
—: input file format in BED format
Based on λlocal, MACS workflow uses dynamic Poisson distribution to calculate the p-value of the specific region based on the unique mapped reads. The region is defined as a peak when the p-value < 1E-5 (by default). The MACS-predicted peaks are also assigned enrichment scores. The more enriched these regions are, the more likely they represent true binding sites. In our analysis, MACS-predicted peaks were further filtered to obtain a list of true peaks (true Ppp1r1b-lncRNA-binding sites) at stringent enrichment score values of ≥10.
4. Peak mapping to promoter elements: To identify peaks that overlap with promoters, 25,111 coding and 3077 non-coding promoters for GRCm38/mm10 were downloaded from the Eukaryotic Promoter Database new (EPDnew) (https://epd.epfl.ch/epdnew/documents/MmNC_epdnew_001_pipeline.php, accessed on 1 March 2023) [20,21]. The EPDnew promoters are experimentally validated with next-generation sequencing-based whole-genome TSS mapping protocols, such as Cap Analysis of Gene Expression (CAGE) and Oligocapping, and include TATA-box, initiator motif (IM), CCAAT-box, and other well-established promoter elements. Using the Bedtools “intersect” feature, true peaks (enrichment score ≥ 10) with at least 30% overlap with EPDnew promoters were identified.
5. Peak mapping to putative enhancer elements: Putative enhancer elements were obtained from Enhancer Atlas Browser (Enhancer Atlas 2.0; http://www.enhanceratlas.org/indexv2.php) [22]. The database provides enhancer annotation in nine species: human, mouse, fly, worm, zebrafish, rat, yeast, chicken, and boar. The consensus enhancers were predicted based on multiple high-throughput experimental datasets (e.g., histone modification, CAGE, GRO-seq, transcription factor binding, and DHS). Currently, the updated database contains 6,198,364 enhancers and 7,437,255 enhancer-gene interactions involving 31,375 genes for 241 murine tissue/cell types identified from 5838 datasets such as NCBI GEO datasets, ENCODE project portal at UCSC, Epigenome Roadmap and FANTOM. As in the promoter analysis, we used the Bedtools “intersect” feature to identify true peaks that overlapped with the enhancer elements at confidence score ≥ 1.
6. Peak visualization using UCSC Genome Browser and Broad Institute Integrative Genomics Viewer (IGV): The UCSC Genome Browser, which contains genome references assemblies for multiple species, was used to visualize and download Ppp1r1b-lncRNA-ChIRP derived peaks as well as specific genes and regions genome-wide. After selecting the GRCm38/mm10 genome on the UCSC genome browser, Ppp1r1b-lncRNA-ChIRP, control, and Peak Bed files were uploaded to custom tracks. The distribution of peaks across the genome and within specific regions was shown. IGV was used in a similar fashion for visualization and analysis [23].
7. Motif analysis using MEME-SEA: Motif analysis was performed using Multiple EM for Motif Elicitation (MEME)-Simple Enrichment Analysis (SEA) v5.5.3 [24,25], using the following command line: sea --verbosity 5 --oc --thresh 10.0 --align center --p input_file --m motif_database. MEME works by searching for repeated, ungapped sequence patterns that occur in the DNA or protein sequences provided by the user. The discovered motifs can be compared with databases of known motifs to identify matches to the motifs and display the motifs in various formats. The motif database used in this study is the Universal PBM Resource for Oligonucleotide Binding Evaluation (UniPROBE) database for the murine species, which is generated by universal protein binding microarray (PBM) assays on the in vitro DNA binding specificities of proteins [26].
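
The overlap criterion used in steps 4 and 5 (a peak is retained when at least 30% of it overlaps an annotated element) can be illustrated with a short standard-library Python sketch. This is not the Bedtools implementation. The function names and toy coordinates are hypothetical, and measuring the overlap as a fraction of the peak length (analogous to the `-f` option of `bedtools intersect`) is an assumption made for illustration.

```python
from typing import List, Tuple

# (chromosome, start, end), 0-based half-open coordinates as in BED files
Interval = Tuple[str, int, int]


def overlap_fraction(peak: Interval, feature: Interval) -> float:
    """Return the fraction of the peak's length that is covered by the feature."""
    chrom_p, start_p, end_p = peak
    chrom_f, start_f, end_f = feature
    if chrom_p != chrom_f or end_p <= start_p:
        return 0.0
    shared = min(end_p, end_f) - max(start_p, start_f)
    return max(shared, 0) / (end_p - start_p)


def peaks_overlapping(
    peaks: List[Interval], features: List[Interval], min_fraction: float = 0.30
) -> List[Interval]:
    """Keep peaks for which at least one feature covers >= min_fraction of the peak."""
    return [
        p
        for p in peaks
        if any(overlap_fraction(p, f) >= min_fraction for f in features)
    ]


# Toy coordinates (hypothetical, not taken from the study)
peaks = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 250)]
promoters = [("chr1", 220, 400), ("chr1", 1190, 1300), ("chr2", 0, 60)]
print(peaks_overlapping(peaks, promoters))
```

In this toy input only the first peak passes, because 80 of its 200 bp (40%) lie inside a promoter, whereas the other two peaks share only 10 bp (5%) with their nearest promoter. The nested loop is quadratic and only suitable for small examples; genome-scale data call for sorted or indexed intervals.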

2.2.3. Functional Enrichment of Peaks’ Related Genes

1. Identifying peaks’ related genes: Data were downloaded from the UCSC Genome browser. To identify peaks’ related genes, we applied the following criteria: A. The reads are uniquely mapped to a protein-coding gene. B. The genes must be annotated (gene name present). C. The gene status is known.
2. Gene ontology annotation of peaks’ related genes: Gene Ontology (GO) analysis was used to predict the main biological functions that are enriched in the peaks’ related genes and assign them to specific molecular functions, biological processes, and cellular components [27]. All peaks’ related genes were mapped to GO terms in the database (http://www.geneontology.org). The number of genes for every term was then calculated. Finally, a hypergeometric test was used to find significantly enriched GO terms in the query list of peaks’ related genes. The calculated p-value goes through Bonferroni correction, and a corrected p-value ≤ 0.05 defines the significantly enriched GO terms in the peaks’ related genes.
3. KEGG pathway enrichment: To further understand the biological functions of the peak-related genes in a pathway-based context, KEGG (Kyoto Encyclopedia of Genes and Genomes) was used [28]. This analysis identifies significantly enriched metabolic pathways or signal transduction pathways in peak-related genes compared with the target regions’ background. The analysis process follows the same pipeline as that in GO analysis.
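
The enrichment test described in steps 2 and 3 (a hypergeometric test followed by Bonferroni correction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, the gene counts are toy values, and the one-sided upper tail P(X ≥ k) is assumed as the test statistic.

```python
from math import comb
from typing import List


def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) when n genes are drawn from N background genes, K of which carry the term."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def bonferroni(p_values: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy example: 20 background genes, 5 annotated to a term,
# 4 peak-related genes of which 2 carry the term (hypothetical numbers)
p = hypergeom_upper_tail(N=20, K=5, n=4, k=2)
corrected = bonferroni([p, 0.01, 0.5])
significant = [q <= 0.05 for q in corrected]
print(p, corrected, significant)
```

Exact integer binomial coefficients keep the sketch dependency-free; for real annotation sets a statistics library would be the more practical choice.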

2.3. Statistical Analysis

Quantified results and statistical parameters for each bioinformatic analysis step were presented with their data within the corresponding sections of the text.

3. Results

3.1. Quality Control and Alignment Statistics Results

In this study, we mapped Ppp1r1b-lncRNA chromatin occupancy genome-wide by ChIRP-seq in a mouse myoblast cell line, which expresses endogenous Ppp1r1b-lncRNA [3,13]. Sequencing libraries were prepared from Ppp1r1b-lncRNA-ChIRP and control input DNA fragments and subjected to single-read high throughput sequencing at a read length of 50 bp (Figure 2). An average of 38 million raw sequencing reads per sample were generated. After filtering low-quality reads and removing adaptor sequences, 36,534,935 and 38,429,369 (99.13% and 98.04%) clean reads were obtained from Ppp1r1b-lncRNA-ChIRP and control samples, respectively, to be used for downstream analysis. The sequencing data for each sample are summarized in Table 1.
For read mapping, the clean reads were mapped to the mouse reference genome GRCm38/mm10 using SOAPaligner/soap2 [18]. Only alignments with no more than two mismatches were considered. Strict quality control measures for each sample were applied, achieving a genome mapping rate of 95.21% and 96.97% for Ppp1r1b-lncRNA-ChIRP and control samples, respectively (Supplementary Materials, Table S1).
For peak calling, only the uniquely mapped clean reads, i.e., those that map to a single genomic position, were included. Following this criterion, 31,664,010 and 32,579,793 uniquely mapped reads (86.7% and 84.78%) were obtained from Ppp1r1b-lncRNA-ChIRP and control samples, respectively, to be used for downstream peak calling and subsequent analysis. Alignment statistics results and genome mapping rate for each sample are summarized in Table 2.
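
As a quick consistency check, the unique-mapping rates quoted above can be recomputed from the reported read counts. The sketch below uses only the counts stated in the text; the variable and function names are hypothetical, and the assumption that the percentages are expressed relative to the clean reads is ours.

```python
# Read counts as reported in the text for the two libraries
clean_reads = {"ChIRP": 36_534_935, "input": 38_429_369}
unique_reads = {"ChIRP": 31_664_010, "input": 32_579_793}


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Uniquely mapped reads as a percentage of clean reads, per sample
unique_rate = {s: percent(unique_reads[s], clean_reads[s]) for s in clean_reads}
for sample, rate in unique_rate.items():
    print(f"{sample}: {rate:.2f}% uniquely mapped")
```

Under that assumption the recomputed values agree with the reported 86.7% and 84.78% after rounding.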

3.2. Genome and Gene Depth Distribution Analysis

The uniquely mapped reads that passed quality control measures were then used to estimate the genome depth distribution for each sample separately using BEDTools. The percentage of genome coverage for each sample is shown in Figure 3A,B. Gene depth distribution was also obtained separately for each sample by BEDTools, and only the uniquely mapped reads were used in this analysis. As shown in Figure 3C,D, the average depth of Ppp1r1b-lncRNA-ChIRP reads exhibited differential distribution in relation to genic regions, with increased coverage around the TSS and toward the distal half of genes.
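
The notion of genome coverage used here can be made concrete with a small sketch that merges overlapping read intervals and divides the covered length by the sequence length. This is an illustration only and not the BEDTools implementation; the function names, the single-chromosome simplification, and the toy intervals are assumptions.

```python
from typing import Iterable, Tuple


def merged_length(intervals: Iterable[Tuple[int, int]]) -> int:
    """Total number of bases covered by at least one half-open interval."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block and open a new one
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching interval: extend the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def coverage_fraction(intervals: Iterable[Tuple[int, int]], sequence_length: int) -> float:
    """Fraction of a sequence covered by the intervals."""
    return merged_length(intervals) / sequence_length


# Toy reads on a hypothetical 1000 bp sequence
reads = [(0, 50), (25, 75), (100, 150)]
print(coverage_fraction(reads, 1000))
```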

3.3. Peak Calling

The uniquely mapped reads were then used for genome-wide peak calling using the standard MACS pipeline [19]. MACS-detected peak statistics at a p-value cut-off < 1E-5 are summarized in Table 3. MACS peak calling statistics results include peak location, peak enrichment score, and peak length.
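
The p-value cut-off can be illustrated with a simplified version of the Poisson test against a local background rate. The sketch below is not the MACS code. It assumes that the local rate is the maximum of a genome-wide rate and rates estimated in windows around the candidate region, and it computes the upper tail P(X ≥ k) directly; the function names and read counts are hypothetical.

```python
from math import exp, lgamma, log


def local_lambda(lam_bg: float, *window_lambdas: float) -> float:
    """Local background rate: the largest of the genome-wide and windowed estimates."""
    return max(lam_bg, *window_lambdas)


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed term by term for far-tail accuracy."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    term = exp(-lam + k * log(lam) - lgamma(k + 1))  # P(X = k)
    total = 0.0
    i = k
    while term > 0.0 and (total == 0.0 or term > total * 1e-16):
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return min(total, 1.0)


# Hypothetical candidate region: 30 ChIRP reads, background estimates per window
lam = local_lambda(2.0, 3.5, 5.0, 4.2)
p_value = poisson_upper_tail(30, lam)
print(lam, p_value, p_value < 1e-5)
```

Summing the tail directly, instead of subtracting the cumulative distribution from 1, avoids the loss of precision that would otherwise occur for very small p-values.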
In total, MACS identified 261,455 peaks in the Ppp1r1b-lncRNA-ChIRP sample that passed a p-value < 1E-5 against control input. Ppp1r1b-lncRNA-ChIRP peaks were short and focal, ranging between 165 and 4500 bp in length with a mode of 165 bp and an average peak length of 558 bp (Figure 4A,B). With a genome coverage rate of 5.35%, the peaks were widely distributed in the intergenic (53.4% of all peaks) and the genic (46.4% of all peaks) regions. With reference to the gene elements, 39.6% of all peaks were located in introns, 2.5% mapped to exons, 2.2% mapped to the 2 kb immediately upstream of the TSS (Up2k), and 2.1% mapped to the 2 kb immediately downstream of the TES (Down2k) (Figure 4C).
To enhance the specificity of peaks that represent true Ppp1r1b-lncRNA binding sites, only peaks with enrichment scores equal to or greater than 10 were defined as true peaks and retained for further analysis. Using these parameters, 99,732 true Ppp1r1b-lncRNA-binding sites were identified genome-wide, of which 42,393 (43% of true peaks) were mapped to protein-coding genes, indicating that the ratio of the gene-mapped peaks to the total number of peaks did not change significantly despite using more stringent thresholds for true peak selection. Notably, like the genome-mapped binding sites (Figure 4D,E), the gene-mapped binding sites (Figure 4F,G) were widely spread on all chromosomes and retained similar patterns of distribution at different cut-off values for enrichment scores. Furthermore, the length distributions, as well as enrichment score proportions, of the genome-mapped binding sites (Figure 5A–D) were comparable to those mapped to the genes (Figure 5E–H). The peak calling results were stored in wiggle files and viewed on the UCSC Genome Browser and the Integrative Genomics Viewer (IGV) for peak visualization. Examples of Ppp1r1b-lncRNA-binding sites are presented in Supplementary Materials, Figure S1.
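
The selection of true peaks and the length statistics reported above amount to a simple filter followed by summary statistics. The following sketch shows the idea on toy data; the `Peak` type, the function name, and the example values are hypothetical and do not reproduce the study's peak table.

```python
from statistics import mean, mode
from typing import List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    enrichment: float

    @property
    def length(self) -> int:
        return self.end - self.start


def true_peaks(peaks: List[Peak], min_enrichment: float = 10.0) -> List[Peak]:
    """Keep peaks whose enrichment score is at least min_enrichment."""
    return [p for p in peaks if p.enrichment >= min_enrichment]


# Toy peak table (hypothetical values)
candidates = [
    Peak("chr1", 0, 165, 12.5),
    Peak("chr1", 1000, 1165, 4.0),
    Peak("chr2", 500, 1500, 25.0),
    Peak("chr3", 10, 175, 10.0),
]
kept = true_peaks(candidates)
lengths = [p.length for p in kept]
print(len(kept), mean(lengths), mode(lengths))
```

In the toy table three of the four candidates pass the threshold, including the one whose score is exactly 10, which matches the "equal to or greater than 10" rule stated in the text.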

3.4. Functional Annotation of Peaks’ Related Genes

The peaks’ related genes are candidate Ppp1r1b-lncRNA binding sites, from which we may infer its potential biological impact and mechanisms of function. The true peaks’ related genes, which define Ppp1r1b-lncRNA-binding sites, were listed for functional annotation to characterize the functional properties of these genes and their products using GO analysis. At a global level, Ppp1r1b-lncRNA binding sites were enriched in molecular function in terms of binding, catalytic activity, transcription regulation, and signal transduction. The nucleus, cellular organelles, and cellular membrane were enriched cellular components. Biological regulation, cellular biogenesis, metabolic process, and cellular transport were enriched biological processes (Figure 6A).
To elucidate Ppp1r1b-lncRNA interactions within certain biological contexts or signaling pathways, KEGG pathway analysis was also performed on all genes related to Ppp1r1b-lncRNA binding sites (Figure 6B, Table 4). Pathways of cardiomyopathy, cancer and pluripotency, transcriptional regulation, and developmental pathways were among the top enriched processes. Further, critical signaling pathways of development and myogenesis, including Wnt, Notch, metabolism, and insulin signaling, were among the top enriched signaling pathways.

3.5. Promoter Mapped Peaks

As previously demonstrated [13], Ppp1r1b-lncRNA executes its functions through the interaction with promoters of myogenic transcription factors. Further, Ppp1r1b-lncRNA-ChIRP peaks (binding sites) are mostly narrow, reminiscent of TF binding sites. Based on these observations, we performed independent ChIRP-PCR assays. In addition to the previously known interactions with MyoD1 and Tbx5, we validated four new interactions with myogenic differentiation factors specific to cardiac and skeletal myocytes identified from ChIRP-seq (Figure 6C). We further examined Ppp1r1b-lncRNA binding to promoter regions using EPDnew [20,21], including all peaks in this analysis. In total, 2871 peaks were mapped to experimentally annotated promoter sequences. Of these, 1180 true peaks (enrichment score ≥ 10) were retained as promoter-mapped Ppp1r1b-lncRNA-binding sites, accounting for 2.8% of the true binding sites that mapped to protein-coding genes.
Notably, the promoter-mapped binding sites were significantly enriched with transcription regulators, including those involved in the Wnt signaling pathway (Lef1 and Tcf7), heart muscle development (Gata4 and Mef2c), regulation of transcription by RNA Pol-II (Sox17, SRF, EGR2), and chromatin modification (Smarcb1 and Taf9). Correspondingly, the binding sites that mapped to these genes had high enrichment scores (Table 5). Intriguingly, up to 80% of the identified promoter-mapped binding sites were enriched with one or more of the previously validated sequence elements of proximal promoters (TATA-box, transcription initiator motif, CCAAT-box, and GC-box) with established specificity for transcription initiation by RNA Pol-II (Table 5). Together, these results are consistent with Ppp1r1b-lncRNA function in transcription initiation of myogenic regulators via binding to their promoter elements.
LncRNA binding of transcription factors is mainly governed by their sequence specificity and, therefore, is typically associated with highly localized ChIRP-Seq signals in the genome. Therefore, we furthered our motif analysis using MEME: Simple Enrichment Analysis, including all Ppp1r1b-lncRNA-binding sites. Using this analysis, we identified 310 transcription factors with specific motif sequences. Of these transcription factors, 25% belong to the Homeobox family, such as HOX and LHX, and were enriched with TA-rich motifs, such as TTAATTAAT and TAATTA motifs (Figure 7, Table 6). In addition, a few zinc finger-related transcription factors were enriched with GC sequence repeats (Figure 7, Table 6). Together, these findings suggest novel motif sequences for Ppp1r1b-lncRNA-specific interactions with transcription factors.
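
To make the notion of a TA-rich motif concrete, the sketch below counts occurrences of a short motif on both strands of a sequence. It is a plain string search, not the statistical enrichment analysis performed by SEA; the function names and the example sequence are hypothetical, and only the motif strings come from the text.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def count_sites(seq: str, motif: str) -> int:
    """Count possibly overlapping motif occurrences on either strand."""
    seq, motif = seq.upper(), motif.upper()
    # A palindromic motif equals its reverse complement, so the set avoids double counting
    patterns = {motif, revcomp(motif)}
    hits = 0
    for pattern in patterns:
        pos = seq.find(pattern)
        while pos != -1:
            hits += 1
            pos = seq.find(pattern, pos + 1)
    return hits


# Hypothetical binding-site sequence containing two overlapping TAATTA sites
site = "GGTAATTAATTAACC"
print(count_sites(site, "TAATTA"))
```

An enrichment analysis would additionally compare such counts between binding sites and control sequences and assign a significance value to the difference.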

3.6. Enhancer Mapped Peaks

Cell and tissue specificity are governed by tissue/cell-specific enhancer elements. Using the Enhancer Atlas 2.0 workflow [22], all MACS-derived peaks were mapped to detect the enhancer elements that may be enriched in Ppp1r1b-lncRNA-ChIRP signals and to identify their tissue/cell-specific enrichment. In total, more than 12,000,000 enhancer sequences were mapped to all peaks. By applying stringent filtering, only enhancers that are enriched in the true Ppp1r1b-lncRNA-binding sites at confidence score ≥ 1 and ≥30% overlap with a given enhancer were retained, leading to 136,521 Ppp1r1b-lncRNA-bound consensus enhancers. Among these signals, 12% (16,571 enhancers) showed specific enrichment in cardiac progenitor cells, fetal heart, and limb tissues at high confidence scores (Supplementary Materials, Table S2). These findings correspond to Ppp1r1b-lncRNA-specific cellular function in myogenic differentiation during heart and muscle development. Furthermore, histone structure genes and epigenetic modification processes were enriched among the genes associated with enhancer-mapped Ppp1r1b-lncRNA-binding sites.

4. Discussion

In this study, we applied ChIRP-seq, with reads aligned to the GRCm38/mm10 mouse reference genome, to identify Ppp1r1b-lncRNA chromatin occupancy genome-wide in a mouse myoblast cell line, which expresses Ppp1r1b-lncRNA [3,13]. As described previously [16,17], using glutaraldehyde crosslinking and a Ppp1r1b-lncRNA-targeted high-affinity probe, the lncRNA-bound DNA sequences were recovered and purified. A LacZ probe was used as a negative control, and no cross-hybridization with Ppp1r1b-lncRNA was observed. The purified Ppp1r1b-lncRNA-ChIRP DNA fragments were used to generate the sequencing libraries and subjected to high throughput single-read sequencing. An input DNA sample was subjected to the same sequencing protocol and used as a control to allow interpretation of the results.
We selected MACS, a window-based method [19], for peak calling based on previous knowledge that Ppp1r1b-lncRNA executes its function via the interaction with myogenic transcription factors [13]. MACS has been reported to outperform several other methods in the identification of transcription factor-binding sites, which tend to be focal and narrow [16,19]. In addition, the MACS pipeline is user-friendly and provides important information for each peak, including genomic position and enrichment score.
Using MACS, we identified 244,944 Ppp1r1b-lncRNA-ChIRP peaks in the genome at p < 1E-5 and enrichment score ≥ 1. Ppp1r1b-lncRNA-ChIRP peaks were focal and narrow, averaging 554 bp in length, and typically spanned fewer than 300 nucleotides (mode 165 bp) but occasionally stretched beyond 2 kb (1% of all peaks). We found the peaks mapped to the intergenic regions (53.6%) and gene regions (47.4%) and distributed on all chromosomes (Figure 4C). By applying a more stringent enrichment threshold (score ≥ 10) for true peaks, a total of 99,732 Ppp1r1b-lncRNA binding sites were detected at high confidence, of which 44% mapped to annotated protein-coding genes. Hence, despite applying stringent criteria to define true peaks, the proportion of binding sites that mapped to the protein-coding genes was retained, and the distribution patterns of the peaks on the different chromosomes remained consistent at different cut-off values for peak length and enrichment scores, at both the gene and genome scales.
The enrichment of myogenesis, muscle contraction, and cardiomyopathy terms among the interacting genes reinforces the essential role of Ppp1r1b-lncRNA in myogenic differentiation. Other than myogenic differentiation factors, Wnt signaling, Notch signaling, and multipotency pathways are also critical to the lineage commitment of skeletal muscle and cardiac progenitors. Moreover, we identified enrichment with sarcomere structure genes (Myh7, Tnnt2, and Tcap) (Table 4) and components of the myocyte membrane, including the Dystrophin–Glycoprotein complex (DGC) components (Dmd, Dtna, Dtnb, Sgcd, and Utrn) that play important roles in maintaining the integrity of myocyte cellular membranes in heart and skeletal muscles (Supplementary Materials, Table S3).
The enrichment with ribonucleoproteins and RNA binding proteins (Hnrnpa1, Rbm20, Rbfox1) known to be involved in cardiac and muscle diseases and with chromatin modification genes (Kdm3b, Kdm5c, and Hdac4) suggests that Ppp1r1b-lncRNA roles may span transcriptional/post-transcriptional regulation and chromatin modifications (Supplementary Materials, Table S3). These newly identified candidates at the genomic scale warrant further functional studies. In addition, we observed numerous specific Ppp1r1b-lncRNA-binding sites that mapped to other annotated regulatory non-coding RNA (St7) and micro-RNA (Mir466 and Mir1191) genes of known functions, with signal intensities and enrichment scores comparable to those mapped to the protein-coding genes (Supplementary Materials, Table S3). Thus, the current comprehensive repertoire of Ppp1r1b-lncRNA occupancy provides a rich resource for a more complete understanding of Ppp1r1b-lncRNA function.
Importantly, among the Ppp1r1b-lncRNA-binding sites, we recovered the Ppp1r1b-lncRNA interactions with TBX5 and MyoD1 that were previously confirmed using ChIRP-PCR [13] (Figure 6C), indicating that our criteria for detecting Ppp1r1b-lncRNA-binding sites can identify true signals with potential functional relevance. Intriguingly, we also detected new interactions with other key transcription factors of myogenic differentiation in heart and skeletal muscles and validated these findings independently using ChIRP-PCR as a gold standard (Figure 6C).
By mapping the Ppp1r1b-lncRNA binding sites to the experimentally validated promoters of the EPDnew database, we identified 1180 hits located in experimentally validated promoters within −1000 to +/−200 kb of the TSS of a given gene. These signals predict true promoter-mapped Ppp1r1b-lncRNA-binding sites based on enrichment score ≥ 10. Importantly, most of these promoter occupancy sites were enriched with one or more of the four previously annotated regulatory elements that define proximal promoters with binding affinity to RNA Pol-II (TATA-box, transcription initiator motif, CCAAT-box, and GC-box). This pattern of Ppp1r1b-lncRNA interaction with promoter elements supports the idea that Ppp1r1b-lncRNA may promote transcriptional initiation [29,30] (Table 5). These data also suggest that ChIRP-seq can uncover biologically relevant interactions with precision.
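The overlap between a binding site and a promoter window reported in Table 5 can be reproduced with simple interval arithmetic. The sketch below is illustrative rather than the authors' code; it assumes that the overlap in bp is the length of the shared interval and that the percentage is taken relative to the promoter length. The function names are hypothetical, and the coordinates are those listed in Table 5 for Gata4 and Elf1.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of bases shared by two intervals (0 if they are disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def promoter_overlap(peak, promoter):
    """Return (overlap in bp, overlap as a percentage of promoter length)."""
    shared = overlap_bp(peak[0], peak[1], promoter[0], promoter[1])
    return shared, 100.0 * shared / (promoter[1] - promoter[0])

# (peak start, peak end) and (promoter start, promoter end) from Table 5.
gata4 = promoter_overlap((63_244_293, 63_245_511), (63_245_227, 63_245_287))
elf1 = promoter_overlap((79_479_972, 79_481_156), (79_481_129, 79_481_189))

print(gata4)  # (60, 100.0)
print(elf1)   # (27, 45.0)
```

Under these assumptions the Gata4 promoter window lies entirely inside its peak, whereas the Elf1 peak ends 27 bp into the 60 bp promoter window, matching the tabulated 45%.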
As stated previously, the observed Ppp1r1b-lncRNA-ChIRP peak pattern resembles the ChIP-seq peak pattern of transcription factor binding sites. It also resembles the pattern of HOTAIR-ChIRP, a lncRNA known to recruit PRC2 [16]. It has been postulated that, as for transcription factors, specific DNA motifs may facilitate selective lncRNA interactions, introducing a new class of regulatory elements in the genome that are specifically targeted by lncRNAs. For instance, a GA-rich homopurine motif was previously reported for HOTAIR binding [16]. However, unlike HOTAIR, Ppp1r1b-lncRNA has been shown to interfere with PRC2 binding at the promoter of myogenic transcription factors [13]. Collectively, these different interaction motifs may indicate that Ppp1r1b-lncRNA binding and the resulting function involve a coordinated action of multiple factors. Therefore, identifying motifs that confer specificity for Ppp1r1b-lncRNA interactions with chromatin may help classify, at a mechanistic level, lncRNAs that alter chromatin states in a specific manner (recruiting PRC2 to a promoter vs. inhibiting PRC2 binding at a promoter). Indeed, using MEME-SEA, we identified TA-rich motifs occupying Ppp1r1b-lncRNA-binding sites in the Homeobox family of transcription factors. This finding may suggest other functions for Ppp1r1b-lncRNA and is of particular interest for future studies, since the Homeobox proteins play critical roles in organogenesis and patterning, including cardiogenesis, during development.
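To make the notion of a TA-rich consensus motif concrete, the following sketch scans a sequence for exact matches to an IUPAC consensus string such as those listed in Table 6. It is a deliberate simplification: SEA scores sequences against position weight matrices, whereas this example only tests the consensus on the forward strand. The test sequence is invented, and `find_matches` is a hypothetical helper.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_matches(consensus, sequence):
    """Start positions of all (possibly overlapping) forward-strand matches."""
    core = "".join(IUPAC[base] for base in consensus)
    lookahead = re.compile("(?=" + core + ")")  # zero-width, so hits may overlap
    return [m.start() for m in lookahead.finditer(sequence.upper())]

lhx9_consensus = "CBYATTAATTAATHMCY"         # Lhx9 consensus as listed in Table 6
toy_sequence = "GGCGTATTAATTAATACCTAA"       # invented sequence for illustration

hits = find_matches(lhx9_consensus, toy_sequence)
print(hits)  # [2]
```

A consensus scan of this kind reports only presence or absence; a matrix-based score additionally captures how well each position agrees with the motif.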
Although Ppp1r1b-lncRNA-ChIRP peaks were narrow and focal, they were not restricted to proximal promoters; a large proportion were located within intronic and intergenic regions, suggesting potential enrichment with other regulatory elements, such as enhancers. As distal cis-regulatory elements, enhancers activate the transcription of their target genes in a cell type-specific and tissue-specific manner [31,32]. To date, the EnhancerAtlas 2.0 database [22] is the most comprehensive enhancer database, including 13,494,603 annotated consensus enhancers based on 16,055 datasets in 586 tissue/cell types across nine species. Indeed, we identified significant enrichment with sequence motifs of distal enhancers, with more than 90% possessing cell type/tissue specificity. By narrowing our analysis to cardiac and limb cell types/tissues, we identified 3390 enhancers with a confidence score > 1 that overlapped Ppp1r1b-lncRNA-binding sites by ≥ 30%.
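The two selection criteria applied to enhancers (confidence score and overlap with a binding site) can be expressed as a simple filter. The sketch below is hypothetical: the records are invented, and it assumes that the overlap percentage is measured as the fraction of the enhancer covered by its best-overlapping peak, which the text does not specify.

```python
from dataclasses import dataclass

@dataclass
class Enhancer:
    name: str
    chrom: str
    start: int
    end: int
    score: float  # enhancer confidence score (hypothetical values below)

def fraction_covered(enhancer, peaks):
    """Fraction of the enhancer covered by its single best-overlapping peak."""
    best = 0
    for chrom, start, end in peaks:
        if chrom == enhancer.chrom:
            best = max(best, min(end, enhancer.end) - max(start, enhancer.start))
    return best / (enhancer.end - enhancer.start)

def select_enhancers(enhancers, peaks, min_score=1.0, min_fraction=0.30):
    """Keep enhancers with score > min_score and coverage >= min_fraction."""
    return [e for e in enhancers
            if e.score > min_score and fraction_covered(e, peaks) >= min_fraction]

# Invented example data.
peaks = [("chr1", 100, 400)]
enhancers = [
    Enhancer("A", "chr1", 300, 500, 2.0),    # 50% covered, passes both filters
    Enhancer("B", "chr1", 390, 1390, 3.0),   # only 1% covered
    Enhancer("C", "chr1", 100, 400, 0.5),    # fully covered but low score
    Enhancer("D", "chr2", 100, 400, 5.0),    # different chromosome
]

selected = [e.name for e in select_enhancers(enhancers, peaks)]
print(selected)  # ['A']
```

Choosing the enhancer length as the denominator favors short enhancers that sit inside a peak; using the peak length instead would answer a different question, so the choice should be stated alongside the threshold.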

5. Conclusions

Our study maps Ppp1r1b-lncRNA occupancy at a genome scale. The identified interactions with promoters and enhancers, together with their putative enriched motifs, may dictate Ppp1r1b-lncRNA function in myogenic differentiation and potentially other cellular and biological processes. We acknowledge the limitations of our study. Despite the comprehensive analysis and the new insights, our study remains descriptive, and the biological impact of the newly identified interactions on chromatin state and target gene expression remains to be mechanistically investigated. Pending functional results for selected important candidates, the study will further our understanding of the Ppp1r1b-lncRNA-derived functional regulome.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/2073-4409/12/24/2805/s1, Figure S1: Ppp1r1b-lncRNA Binding Sites to Transcription Factors. A. IGV viewer windows depict Ppp1r1b-lncRNA-sites (peaks) enriched in the promoter, intronic, and the distal 50% regions of genes, including myogenic differentiation transcription factors, Homeobox transcription factors (TA-rich motifs) and zinc fingers (GC-rich motif). Table S1: QC results for each sample; Table S2: Ppp1r1b-lncRNA-bound enhancers. Top 100 Ppp1r1b-lncRNA-enriched enhancers with specificity to fetal heart- or muscle- tissue/cells; Table S3: Examples of Genome-wide Ppp1r1b-lncRNA Interactions. Examples of the Dystrophin-Glycoprotein Complex (DGC) genes, Chromatin Modification genes, RNA-binding protein genes, and noncoding RNA genes are presented.

Author Contributions

Conceptualization, X.K. and M.T.; Methodology, J.H.H., X.K. and M.T.; software, J.H.H. and M.T.; validation, X.K.; formal analysis, J.H.H. and M.T.; investigation, J.H.H., X.K., C.W. and M.T.; resources, M.T.; data curation, J.H.H. and M.T.; writing—original draft preparation, J.H.H. and M.T.; writing—review and editing, J.H.H., C.W. and M.T.; visualization, J.H.H. and M.T.; supervision, M.T.; project administration, M.T.; funding acquisition, J.H.H. and M.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a fellowship grant from the R25 Education Program in Bioinformatics, Omics, and Computational Biology NIH/NICHD “R25Actf HD109116Projectf01A1” for J.H.H. (PIs SUD and JME) and the UCLA CDI “Seed Award”, the UCLA Academic Senate Faculty Research Fund, and the NIH/NHLBI “1R01 HL153853-01” for M.T.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request. The study was registered in the Gene Expression Omnibus (GEO) repository [www.ncbi.nlm.nih.gov/geo] under [Neonatal Heart Maturation (NHM) SuperSeries GSE85728. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85728], accessed on 20 August 2016. Upon acceptance of this manuscript, ChIRP sequencing data will be deposited to the repository and made publicly available. C2C12 Cell Line Source: In this study, we employed a mouse myoblast cell line obtained from ATCC. https://www.atcc.org/products/crl-1772 (CRL-1772 ™). This line is well-established, commercially available, and has been used by researchers for more than a decade [13].

Acknowledgments

We acknowledge the support of (1) Sherin Devaskar and Josephine Enciso. (2) UCLA Children’s Discovery and Innovation Institute. (3) California Center for Rare Disease (CCRD) at the UCLA Institute for Precision Health. (4) Clinical Genomics Center. (5) UCLA Congenital Heart Defects BioCore.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Alexander, R.P.; Fang, G.; Rozowsky, J.; Snyder, M.; Gerstein, M.B. Annotating non-coding regions of the genome. Nat. Rev. Genet. 2010, 11, 559–571.
  2. Touma, M. Genome regulation by long noncoding RNAs in neonatal heart maturation and congenital heart defects. J. Clin. Mol. Med. 2020, 3, 2516–5593.
  3. Touma, M.; Kang, X.; Zhao, Y.; Cass, A.A.; Gao, F.; Biniwale, R.; Coppola, G.; Xiao, X.; Reemtsen, B.; Wang, Y. Decoding the Long Noncoding RNA During Cardiac Maturation: A Roadmap for Functional Discovery. Circ. Cardiovasc. Genet. 2016, 9, 395–407.
  4. Fernandes, J.C.R.; Acuña, S.M.; Aoki, J.I.; Floeter-Winter, L.M.; Muxel, S.M. Long Non-Coding RNAs in the Regulation of Gene Expression: Physiology and Disease. Noncoding RNA 2019, 5, 17.
  5. Mercer, T.R.; Dinger, M.E.; Mattick, J.S. Long non-coding RNAs: Insights into functions. Nat. Rev. Genet. 2009, 10, 155–159.
  6. Mattick, J.S.; Amaral, P.P.; Carninci, P.; Carpenter, S.; Chang, H.Y.; Chen, L.L.; Chen, R.; Dean, C.; Dinger, M.E.; Fitzgerald, K.A.; et al. Long non-coding RNAs: Definitions, functions, challenges, and recommendations. Nat. Rev. Mol. Cell Biol. 2023, 24, 430–447.
  7. Gupta, R.A.; Shah, N.; Wang, K.C.; Kim, J.; Horlings, H.M.; Wong, D.J.; Tsai, M.C.; Hung, T.; Argani, P.; Rinn, J.L.; et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 2010, 464, 1071–1076.
  8. Klattenhoff, C.; Scheuermann, J.C.; Surface, L.E.; Bradley, R.K.; Fields, P.A.; Steinhauser, M.L.; Ding, H.; Butty, V.L.; Torrey, L.; Haas, S.; et al. Braveheart, a long non-coding RNA required for cardiovascular lineage commitment. Cell 2013, 152, 570–583.
  9. Kouzarides, T. Chromatin modifications and their function. Cell 2007, 128, 693–705.
  10. Aloia, L.; Di Stefano, B.; Di Croce, L. Polycomb complexes in stem cells and embryonic development. Development 2013, 140, 2525–2534.
  11. Adam, R.C.; Fuchs, E. The yin and yang of chromatin dynamics in stem cell fate selection. Trends Genet. 2016, 32, 89–100.
  12. Liu, Z.; Chen, O.; Zheng, M.; Wang, L.; Zhou, Y.; Yin, C.; Liu, J.; Qian, L. Re-patterning of H3K27me3, H3K4me3 and DNA methylation during fibroblast conversion into induced cardiomyocytes. Stem Cell Res. 2016, 16, 507–518.
  13. Kang, X.; Zhao, Y.; Van Arsdell, G.; Nelson, S.F.; Touma, M. Ppp1r1b-lncRNA inhibits PRC2 at myogenic regulatory genes to promote cardiac and skeletal muscle development in mouse and human. RNA 2020, 26, 481–491.
  14. Hernández-Hernández, J.M.; García-González, E.G.; Brun, C.E.; Rudnicki, M.A. The myogenic regulatory factors, determinants of muscle development, cell identity and regeneration. Semin. Cell Dev. Biol. 2017, 72, 10–18.
  15. Olson, E.N. Gene regulatory networks in the evolution and development of the heart. Science 2006, 313, 1922–1927.
  16. Chu, C.; Qu, K.; Zhong, F.L.; Artandi, S.E.; Chang, H.Y. Genomic Maps of Long Noncoding RNA Occupancy Reveal Principles of RNA-Chromatin Interactions. Mol. Cell 2011, 44, 667–678.
  17. Chu, C.; Quinn, J.; Chang, H.Y. Chromatin Isolation by RNA Purification (ChIRP). J. Vis. Exp. 2012, 61, 3912.
  18. Li, R.; Yu, C.; Li, Y.; Lam, T.W.; Yiu, S.M.; Kristiansen, K.; Wang, J. SOAP2: An improved ultrafast tool for short read alignment. Bioinformatics 2009, 25, 1966–1967.
  19. Zhang, Y.; Liu, T.; Meyer, C.A.; Eeckhoute, J.; Johnson, D.S.; Bernstein, B.E.; Nusbaum, C.; Myers, R.M.; Brown, M.; Li, W.; et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol. 2008, 9, R137.
  20. Dreos, R.; Ambrosini, G.; Cavin Perier, R.; Bucher, P. EPD and EPDnew, high-quality promoter resources in the next-generation sequencing era. Nucleic Acids Res. 2013, 41, D157–D164.
  21. Meylan, P.; Dreos, R.; Ambrosini, G.; Groux, R.; Bucher, P. EPD in 2020: Enhanced data visualization and extension to ncRNA promoters. Nucleic Acids Res. 2020, 48, D65–D69.
  22. Gao, T.; Qian, J. EnhancerAtlas 2.0: An updated resource with enhancer annotation in 586 tissue/cell types across nine species. Nucleic Acids Res. 2020, 48, D58–D64. Available online: http://www.enhanceratlas.org/indexv2.php (accessed on 30 March 2023).
  23. Robinson, J.T.; Thorvaldsdóttir, H.; Winckler, W.; Guttman, M.; Lander, E.S.; Getz, G.; Mesirov, J.P. Integrative Genomics Viewer. Nat. Biotechnol. 2011, 29, 24–26.
  24. Bailey, T.L.; Boden, M.; Buske, F.A.; Frith, M.; Grant, C.E.; Clementi, L.; Ren, J.; Li, W.W.; Noble, W.S. MEME SUITE: Tools for motif discovery and searching. Nucleic Acids Res. 2009, 37, W202–W208. Available online: http://www.ncbi.nlm.nih.gov/pubmed/19458158 (accessed on 5 April 2023).
  25. Bailey, T.L.; Grant, C.E. SEA: Simple Enrichment Analysis of motifs. bioRxiv 2021.
  26. Hume, M.A.; Barrera, L.A.; Gisselbrecht, S.S.; Bulyk, M.L. UniPROBE, update 2015: New tools and content for the online database of protein-binding microarray data on protein-DNA interactions. Nucleic Acids Res. 2015, 43, D117–D122.
  27. Ye, J.; Zhang, Y.; Cui, H.; Liu, J.; Wu, Y.; Cheng, Y.; Xu, H.; Huang, X.; Li, S.; Zhou, A.; et al. WEGO: A web tool for plotting GO annotations. Nucleic Acids Res. 2006, 34, W293–W397.
  28. Kanehisa, M.; Araki, M.; Goto, S.; Hattori, M.; Hirakawa, M.; Itoh, M.; Katayama, T.; Kawashima, S.; Okuda, S.; Tokimatsu, T.; et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008, 36, D480–D484.
  29. Calhoun, V.C.; Stathopoulos, A.; Levine, M. Promoter-proximal tethering elements regulate enhancer-promoter specificity in the Drosophila antennapedia complex. Proc. Natl. Acad. Sci. USA 2002, 99, 9243–9247.
  30. Yella, V.R.; Bansal, M. DNA structural features of eukaryotic TATA-containing and TATA-less promoters. FEBS Open Bio. 2017, 16, 324–334.
  31. He, Y.; Gorkin, D.U.; Dickel, D.E.; Nery, J.R.; Castanon, R.G.; Lee, A.Y.; Shen, Y.; Visel, A.; Pennacchio, L.A.; Ren, B.; et al. Improved regulatory element prediction based on tissue-specific local epigenomic signatures. Proc. Natl. Acad. Sci. USA 2017, 114, E1633–E1640.
  32. Gorkin, D.U.; Barozzi, I.; Zhao, Y.; Zhang, Y.; Huang, H.; Lee, A.Y.; Li, B.; Chiou, J.; Wildberg, A.; Ding, B.; et al. An atlas of dynamic chromatin landscapes in mouse fetal development. Nature 2020, 583, 744–775.
Figure 1. Ppp1r1b-lncRNA Genomic Position and Probe Design. (A) Schematic illustration of the relative position of lncRNA NONMMUT011874 (Ppp1r1b-lncRNA) in relation to Ppp1r1b gene in mouse genome. Adopted from Touma et al. 2016 [3]. (B) Mouse Ppp1r1b ENSMUSG00000061718 Transcripts. Red arrow and box highlight the unique exon that was targeted to design a Ppp1r1b-lncRNA anti-sense oligo probe for Ppp1r1b-lncRNA.
Figure 2. Schematic Summary of Ppp1r1b-lncRNA-ChIRP Assay and Bioinformatic Pipeline. (A) Chromatin was cross-linked in vivo. Biotinylated tiling probes were hybridized to target lncRNA, and chromatin complexes were purified using magnetic streptavidin beads, followed by stringent washes. The lncRNA-bound DNA or proteins were eluted with a cocktail of RNase A and H. A putative lncRNA binding DNA sequence is highlighted in red. Adopted from Chu et al. 2011. (B) The ChIRP-captured DNA fragments and control input libraries were subjected to next-generation sequencing. After quality control (QC) and alignment to the reference genome, the clean, uniquely mapped reads were used for peak calling and downstream bioinformatic analysis.
Figure 3. Genome and Gene Depth Distribution for Each Sample. (A,B) Composite genome coverage of Ppp1r1b-lncRNA-ChIRP and Control (input) sequencing reads in mouse myoblasts cell line. These figures represent all reads in each sample. (C,D) Composite sequencing reads profiles of Ppp1r1b-lncRNA-ChIRP and Control (input) samples across gene regions in mouse myoblast cell line. These figures only represent the reads mapped to gene elements. The ‘Input’ library was used as a control.
Figure 4. MACS-derived Ppp1r1b-lncRNA-ChIRP Peaks Statistics. (A) Depth Distribution of all peaks called by MACS at p-value < 1E-5. The X-axis indicates the number of reads, and the Y-axis indicates the proportion of peaks in the specific number of reads. (B) Length distribution of all MACS-called peaks. The X-axis refers to peak length, and the Y-axis refers to peak numbers. (C) Distribution of all MACS-called peaks based on genomic position and gene elements: intergenic, introns, downstream, upstream, and exons. (D,E) Number of MACS-called Peaks (p < 1E-5) genome-wide (D) and on each chromosome (E) at different enrichment score cut-off values (≥0, ≥1, and ≥10). The X-axis indicates chromosome number, and the Y-axis indicates the number of peaks mapped to each chromosome. (F,G) Number of MACS-called peaks (p < 1E-5) that mapped to known protein-coding genes on all chromosomes (F) and on each chromosome (G) at different enrichment score cut-off values (≥0, ≥1, and ≥10). The X-axis indicates chromosome number, and the Y-axis indicates the number of gene-mapped peaks on each chromosome.
Figure 5. Ppp1r1b-lncRNA-Binding Sites (True Peaks) Statistics. (A) Number of genome-mapped true peaks at different peak length (bp) values. (B) Distribution of genome-mapped true peaks at different peak length (bp) values. (C) Number of genome-mapped true peaks at different enrichment score values. (D) Distribution of genome-mapped true peaks at different enrichment score values. (E) Number of gene-mapped true peaks at different peak length (bp) values. (F) Distribution of gene-mapped true peaks at different peak length (bp) values. (G) Number of gene-mapped true peaks at different enrichment score values. (H) Distribution of gene-mapped true peaks at different enrichment score values.
Figure 6. Functional Annotation of Ppp1r1b-lncRNA-Binding Sites. (A) Gene Ontology (GO) analysis of Ppp1r1b-lncRNA-ChIRP peaks’ related genes. The top significantly enriched GO terms (FDR < 0.05) involved in biological processes, cellular components, or molecular functions are presented. The number of genes in each term is shown. (B) Summary of KEGG Pathway analysis of Ppp1r1b-lncRNA-ChIRP peaks’ related genes. The top significantly enriched pathways are presented. The number of genes in each pathway is shown. Only true peaks’ related genes are included in these analyses. (C) Ppp1r1b-lncRNA-ChIRP-PCR validation of Ppp1r1b-lncRNA interaction with promoters of myogenic differentiation transcription factors. A LacZ probe was used as a negative control.
Figure 7. Motif Enrichment in Ppp1r1b-lncRNA Binding Sites to Transcription Factors. (A) Representative examples of significantly enriched motifs in Ppp1r1b-lncRNA-binding sites in Homeobox transcription factors (Lhx9 and Prrx2 with TA-rich motifs) and zinc fingers (Zfp161 and Zbtb14 with GC-rich motif). (B) Positional distribution of the best match to the motif in the primary sequences. The plot is smoothed with a triangular function whose width is 5% of the maximum primary sequence length. The position of the dotted vertical line indicates whether the sequences were aligned on their left ends, centers, or right ends, respectively. (C) The percentage of sequences matching the motif. A sequence is said to match the motif if some position within it has a match score greater than or equal to the optimal score threshold.
Table 1. Summary of sequencing data for each sample.
Sample ID | Fragment Length (bp) | Sequencing Strategy | Clean Reads Number | Clean Data Size (bp) | Clean Rate (%)
Ppp1r1b-lncRNA_ChIRP | 100–500 | SE50 | 36,534,935 | 1.83E+09 | 99.13
Control | 100–500 | SE50 | 38,429,369 | 1.92E+09 | 98.04
Fragment Length (bp): DNA fragment for library building; SE50: Strategy of sequencing sample with single end (SE), and the following number reflects read length. Clean Reads Number: The count of reads number in clean data. Clean Data Size: The count of bases in clean data. Clean Rate (%): The ratio of clean data size to raw data size = Clean Data Size (bp)/Raw Data Size (bp).
Table 2. Alignment statistics results and genome mapping rate for each sample.
Sample ID | Species | Clean Reads | Mapped Reads | Mapped Rate (%) | Uniquely Mapped Reads | Uniquely Mapped Rate (%)
Ppp1r1b-lncRNA-ChIRP | mm10 | 36,534,935 | 34,783,233 | 95.21 | 31,674,010 | 86.7
Control | mm10 | 38,429,369 | 37,264,694 | 96.97 | 32,579,793 | 84.78
Clean Reads: Total clean reads number; Mapped Reads: Total reads that can be mapped to the reference genome. Mapped Rate (%): The proportion of reads that can be mapped to the reference genome in total reads. Uniquely Mapped Reads: Total reads that only map to one position in the reference genome. Uniquely Mapped Rate (%): The proportion of reads that only map to one position in total reads.
Table 3. Peak calling statistics.
Sample ID | Peak Number | Total Length | Average Length | Total Tag Depth | Average Tag Depth | Genome Rate (%)
Ppp1r1b-lncRNA-ChIRP | 261,455 | 146,128,139 | 558 | 3,848,968.705 | 14 | 5.35
Peak number: Number of all MACS-detected peaks. Total Length: Total length of all peaks. Average Length: Average length of all peaks. Total Tag Depth: Total tag depth of all summits. Average Tag Depth: Average tag depth of all summits. Genome Rate: Proportion of total length of all peaks in the whole genome.
Table 4. KEGG pathway analysis. Top 20 significant pathways enriched in the peaks’ related genes. p-values, adjusted p-values, and five representative genes in each category are presented.
KEGG Pathway | p-Value | Adjusted p-Value | Example Genes
Cardiomyopathy | 9.74E-10 | 5.29E-08 | Myh6, Tnnt2, Tnni3, Tcap, DMD
MAPK Signaling Pathway | 7.83E-09 | 2.55E-07 | MAPK12, MAPK10, MAPK1, BMP4, Ppp2cb
MicroRNAs in cancer | 6.26E-08 | 1.46E-06 | Ezr, Tnr, Sos1, Lrp1, Abl1
Thyroid hormone signaling pathway | 6.26E-08 | 1.46E-06 | Adam23, Prkca, Med12l, Otog, Bmp4
cAMP signaling pathway | 1.24E-07 | 2.70E-06 | Atp6v1h, Rdh10, Ndufs1, Ndufa10, Prim2
Calcium signaling pathway | 1.24E-07 | 2.70E-06 | Cacna1c, Tnnc2, Cacna1d, Plcb1, Pde1c
Insulin resistance | 1.83E-07 | 3.73E-06 | Prcke, nfkb1, Prkag2, Slc27a1, Ppara
Regulating pluripotency of stem cells | 3.19E-07 | 5.47E-06 | Meis1, Jak2, Lhx2, cdh1, Pou5f1
Adrenergic signaling of cardiomyocytes | 1.12E-05 | 1.21E-04 | Cacng3, Lam4, Tnn, Ctnna, Lamc1
mTOR signaling pathway | 2.36E-05 | 2.34E-04 | Grb10, Rheb, Mtor, Deptor, Akt1,3
Muscle contraction | 8.21E-05 | 6.08E-04 | Atp1a4, cacng2, Casq2, Ryr2, Cox7a21
Wnt signaling pathway | 8.54E-05 | 6.18E-04 | Lef1, Wnt3, Tcf7, Nfatc1, fzd6,3,9
Inositol phosphate metabolism | 8.54E-05 | 6.18E-04 | Pten, Mtmr2, Plcb1, Itpk1, Pip4k2b
Notch signaling pathway | 0.000121564 | 8.62E-04 | Notch3, Jag1, EP300, Hess, Kat2a
Regulation of actin cytoskeleton | 0.000134814 | 9.16E-04 | Vwc2l, Lamb3, Cpne4, Adam23, Actn1
Table 5. Promoter-mapped Ppp1r1b-lncRNA-binding sites involved in transcription by RNA Pol-II. Top 25 experimentally validated Ppp1r1b-lncRNA-bound promoters involved in transcription by RNA Pol-II. Neg: Negative; Pos: Positive. In the motif column, marks are listed in the order TATA, IM, CCAAT, GC; X: present; -: one or more blank columns.
Peak Chr | Peak Start | Peak End | PG + Peak ID | Peak Score | Promoter Start | Promoter End | Gene Symbol | TATA, IM, CCAAT, GC | Strand | Overlap vs. Promoter (bp) | Overlap vs. Promoter (%)
chr14 | 63,244,293 | 63,245,511 | 77,537 | 74.66 | 63,245,227 | 63,245,287 | Gata4 | -XXX | Neg | 60 | 100
chr13 | 83,523,709 | 83,524,546 | 108,425 | 10.09 | 83,524,504 | 83,524,564 | Mef2c | XXX- | Pos | 60 | 100
chr17 | 46,555,824 | 46,556,722 | 108,938 | 32.96 | 46,556,158 | 46,556,218 | Srf | X-XX | Neg | 60 | 100
chr15 | 79,345,918 | 79,346,730 | 107,952 | 21.98 | 79,346,563 | 79,346,623 | Maff | -XX- | Pos | 60 | 100
chr10 | 81,427,146 | 81,427,779 | 108,158 | 14.82 | 81,427,146 | 81,427,206 | Nfic | -XX- | Neg | 60 | 100
chr1 | 135,257,953 | 135,258,882 | 12,555 | 36.78 | 135,258,446 | 135,258,506 | Elf3 | XXX- | Neg | 60 | 100
chr1 | 4,493,533 | 4,493,908 | 108,119 | 15.73 | 4,493,597 | 4,493,657 | Sox17 | XXXX | Neg | 60 | 100
chr10 | 67,536,485 | 67,538,050 | 25,893 | 55.61 | 67,537,820 | 67,537,880 | Egr2 | X- | Pos | 60 | 100
chr17 | 34,031,775 | 34,032,425 | 107,642 | 71.94 | 34,032,319 | 34,032,379 | Rxrb | -X | Pos | 60 | 100
chr14 | 79,479,972 | 79,481,156 | 108,477 | 10.58 | 79,481,129 | 79,481,189 | Elf1 | X- | Pos | 27 | 45
chr12 | 56,534,915 | 56,535,714 | 107,998 | 20.16 | 56,535,187 | 56,535,247 | Nkx2-1 | -X | Neg | 60 | 100
chr7 | 19,629,319 | 19,629,812 | 211,713 | 38.52 | 19,629,412 | 19,629,472 | Relb | -XX | Neg | 60 | 100
chr15 | 102,625,143 | 102,625,696 | 94,380 | 99.04 | 102,625,475 | 102,625,535 | Atf7 | -X | Neg | 60 | 100
chr1 | 51,986,786 | 51,987,935 | 4,273 | 28.28 | 51,987,077 | 51,987,137 | Stat4 | -XXX | Pos | 60 | 100
chr3 | 131,108,912 | 131,111,373 | 161,236 | 23.35 | 131,110,273 | 131,110,333 | Lef1 | -X | Pos | 60 | 100
chr2 | 28,621,840 | 28,622,035 | 132,357 | 24.47 | 28,621,934 | 28,621,994 | Gfi1b | -X- | Neg | 60 | 100
chr3 | 30,138,874 | 30,140,715 | 151,247 | 36.8 | 30,140,412 | 30,140,472 | Mecom | -X- | Neg | 60 | 100
chr16 | 10,992,402 | 10,993,184 | 108,188 | 13.97 | 10,993,068 | 10,993,128 | Litaf | -XXX | Neg | 60 | 100
chr7 | 4,915,143 | 4,915,650 | 108,366 | 10.16 | 4,915,172 | 4,915,232 | Zfp628 | -X | Pos | 60 | 100
chr5 | 134,306,110 | 134,306,838 | 193,226 | 34.41 | 134,306,598 | 134,306,658 | Gtf2i | -X- | Neg | 60 | 100
chr13 | 100,651,310 | 100,651,684 | 108,247 | 12.71 | 100,651,575 | 100,651,635 | Taf9 | -XX- | Pos | 60 | 100
chr10 | 75,921,059 | 75,921,711 | 26,848 | 31.22 | 75,921,567 | 75,921,627 | Smarcb1 | -X- | Neg | 60 | 100
chrX | 12,761,915 | 12,762,355 | 251,942 | 39.97 | 12,762,005 | 12,762,065 | Med14 | -X | Neg | 60 | 100
chr7 | 139,943,548 | 139,943,869 | 107,961 | 21.67 | 139,943,772 | 139,943,832 | Utf1 | -X | Pos | 60 | 100
chr3 | 52,104,975 | 52,105,552 | 153,279 | 30.45 | 52,104,980 | 52,105,040 | Maml3 | -XX | Neg | 60 | 100
Table 6. Motif enrichment of Ppp1r1b-lncRNA-bound transcription factors. Top 20 Ppp1r1b-lncRNA-bound transcription factors with enriched motif sequences.
Table 6. Motif enrichment of Ppp1r1b-lncRNA-bound transcription factors. Top 20 Ppp1r1b-lncRNA-bound transcription factors with enriched motif sequences.
| RANK | ID | ALT_ID | CONSENSUS | SCORE_THR | PVALUE | EVALUE | QVALUE |
|---|---|---|---|---|---|---|---|
| 79 | UP00237_1 | Otp_3496.1 | VVYWRTTAATTAAYDNG | 4.2 | 0.00E+00 | 7.69E-306 | 0.00E+00 |
| 87 | UP00164_1 | Hoxa7_2668.2 | SGMNTTAATTAATDNNC | 7.5 | 2.01E-273 | 7.76E-271 | 1.70E-273 |
| 128 | UP00175_1 | Lhx9_3492.1 | CBYATTAATTAATHMCY | 6.1 | 6.04E-180 | 2.33E-177 | 3.47E-180 |
| 69 | UP00144_1 | Hoxb4_2627.1 | CNNRTTAATTAATWAHY | 8.3 | 2.81e-343 | 1.08e-340 | 1.08e-340 |
| 7 | UP00169_1 | Lmx1b_3433.2 | VDWWWTTAATTAATWHB | 6.6 | 3.33e-1146 | 1.28e-1143 | 1.28e-1143 |
| 40 | UP00078_1 | Arid3a_primary | SNNHTTAATTAAAMNHN | 7.8 | 3.38e-506 | 1.30e-503 | 1.30e-503 |
| 55 | UP00141_1 | Vsx1_1728.1 | CSARTTAATTAAYNAHT | 7.8 | 3.96e-399 | 1.53e-396 | 1.53e-396 |
| 78 | UP00209_2 | Cart1_1275.1 | BVMNTTAATTAAYYNNN | 6.7 | 1.83e-310 | 7.05E-308 | 1.72e-310 |
| 18 | UP00129_1 | Pou3f1_3819.1 | DVNTAATTAATTAABTN | 6.7 | 6.07e-920 | 2.34e-917 | 2.34e-917 |
| 70 | UP00142_1 | Uncx4.1_2281.2 | VNTAATTAATTAABGSG | 7.3 | 6.51e-337 | 2.51e-334 | 2.51e-334 |
| 51 | UP00172_1 | Prop1_3949.1 | VGVRTTAATTAAKWNNC | 7.3 | 6.71e-422 | 2.59e-419 | 2.59e-419 |
| 47 | UP00196_1 | Hoxa4_3426.1 | DDTTATTAATTAACKBG | 6.2 | 7.06e-449 | 2.72e-446 | 2.72e-446 |
| 81 | UP00182_1 | Hoxa6_1040.1 | AMGKTAATTACCHHAD | 9.1 | 7.54E-304 | 2.91E-301 | 6.86E-304 |
| 82 | UP00189_1 | Hoxa5_3415.1 | AMGKTAATTAVCWHAD | 7.5 | 2.19E-300 | 8.45E-298 | 1.97E-300 |
| 85 | UP00248_1 | Pax7_3783.1 | MSHNYTAATTARBHVDN | 10 | 3.56E-284 | 1.37E-281 | 3.08E-284 |
| 92 | UP00174_1 | Hoxa2_3079.1 | AVGGTAATTASCHMAN | 7.4 | 9.71E-260 | 3.75E-257 | 7.77E-260 |
| 94 | UP00214_1 | Hoxb5_3122.2 | ANGKTAATTASCHMAT | 9.1 | 2.45E-257 | 9.45E-255 | 1.92E-257 |
| 97 | UP00167_1 | En1_3123.2 | RNNAACTAATTARKDC | 5.8 | 3.29E-249 | 1.27E-246 | 2.50E-249 |
| 88 | UP00065_1 | Zfp161_primary | KGGCGCGCGCRCHYRD | 14 | 1.94E-272 | 7.51E-270 | 1.63E-272 |
| 193 | UP00001_1 | E2F2_primary | NHWARGGCGCGCSAH | 21 | 1.65E-74 | 6.38E-72 | 6.31E-75 |
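The CONSENSUS column of Table 6 uses the standard IUPAC degenerate nucleotide codes (for example, W = A or T, N = any base). As a reading aid, the sketch below expands a consensus string into a regular expression and reports where it matches a sequence. It is an illustration only and not the authors' analysis: motif enrichment tools generally score position weight matrices against a threshold (the SCORE_THR column) rather than requiring an exact consensus match. The function names and the test sequence are hypothetical, and only the forward strand is scanned.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> str:
    """Expand an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_motif_starts(consensus: str, sequence: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping)
    exact consensus matches on the forward strand of `sequence`."""
    # A lookahead group lets the scan report overlapping occurrences.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hoxa6 consensus from Table 6; the sequence is a made-up example.
    hoxa6 = "AMGKTAATTACCHHAD"
    toy_sequence = "CCAAGGTAATTACCATAGTT"
    print(consensus_to_regex(hoxa6))
    print(find_motif_starts(hoxa6, toy_sequence))
```

Exact matching of this kind is stricter than matrix-based scoring, so it would typically report fewer sites than an enrichment tool using the same motif.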