Long non-coding and coding RNA profiling using strand-specific RNA-seq in human hypertrophic cardiomyopathy

Liu, Xuanyu; Ma, Yi; Yin, Kunlun; Li, Wenke; Chen, Wen; Zhang, Yujing; Zhu, Changsheng; Li, Tianjiao; Han, Bianmei; Liu, Xuewen; Wang, Shuiyun; Zhou, Zhou

doi:10.1038/s41597-019-0094-6

Data Descriptor
Open access
Published: 13 June 2019

Scientific Data volume 6, Article number: 90 (2019)

Abstract

Hypertrophic cardiomyopathy (HCM) represents one of the most common heritable heart diseases. However, the signalling pathways and regulatory networks underlying the pathogenesis of HCM remain largely unknown. Here, we present a strand-specific RNA-seq dataset for both coding and lncRNA profiling in myocardial tissues from 28 HCM patients and 9 healthy donors. This dataset constitutes a valuable resource for the community to examine the dysregulated coding and lncRNA genes in HCM versus normal conditions.

Design Type(s)	transcription profiling design • disease state design • sequence analysis objective
Measurement Type(s)	transcription profiling assay
Technology Type(s)	RNA sequencing
Factor Type(s)	experimental condition
Sample Characteristic(s)	Homo sapiens • heart

Machine-accessible metadata file describing the reported data (ISA-Tab format)

Background & Summary

Hypertrophic cardiomyopathy (HCM) is one of the most common heart diseases (an estimated prevalence of at least 0.2%) and a leading cause of sudden death in young people¹,². HCM is generally regarded as a genetic disorder caused predominantly by mutations in eight sarcomere genes: MYH7, MYBPC3, ACTC1, TPM1, MYL2, MYL3, TNNI3, and TNNT2³. However, the signalling pathways and regulatory networks underlying the pathogenesis of HCM remain largely unknown.

Long non-coding RNAs (lncRNAs) are a large class of transcripts ≥200 nucleotides in length that do not encode proteins⁴. Compared with coding mRNAs, most lncRNAs are less well annotated and their functions are largely unexplored. Nevertheless, there is increasing evidence that lncRNAs are involved in a variety of biological processes and diseases⁵,⁶. lncRNAs have been implicated in pathological processes of HCM, such as cardiomyocyte disarrangement, myocardial hypertrophy and interstitial fibrosis⁷. In a microarray-based comparison of 7 HCM patients and 5 control subjects, lncRNAs dysregulated in the myocardial tissues of HCM patients were found to be involved in the pathogenesis of HCM through the regulation of pathogenetic pathways⁸.

RNA-seq is emerging as the major transcriptome profiling system. RNA-seq has considerable advantages over microarray in many aspects such as novel transcript identification through de novo assembly, splice junction identification and allele-specific expression analysis. Compared with the standard RNA-seq protocol, strand-specific RNA-seq retains strand of origin information, thus providing a greater resolution for sense/antisense profiling, which is essential for antisense lncRNA identification⁹. To our knowledge, we still lack a strand-specific RNA-seq dataset for myocardial tissues of HCM patients.

Here, we present a strand-specific RNA-seq dataset for both coding and lncRNA profiling in myocardial tissues from 28 HCM patients and 9 healthy donors. This dataset constitutes a valuable resource for the community to examine the dysregulated coding and lncRNA genes in HCM versus normal conditions. This dataset may also be reutilized through integration with future datasets to further enhance statistical power by increasing sample size.

Methods

Ethical approval

This study was approved by the ethics committee of the institutional review board at Fuwai Hospital. All procedures were conducted according to the ethical standards of the research committee. Informed consent was obtained from all subjects.

Patients and sample collection

Figure 1a shows the experimental design and workflow. We enrolled 28 HCM patients undergoing septal myectomy (Table 1 and Supplementary Table S1). All of these patients underwent genetic testing to identify pathogenic mutations in HCM causal genes. Among them, 10 patients were confirmed to carry a deleterious mutation in MYBPC3, the gene encoding cardiac myosin binding protein C, and 8 patients in MYH7, the gene encoding beta-myosin heavy chain. The remaining 10 patients were genetically undiagnosed. Myocardial tissues were collected during surgery and immediately placed in liquid nitrogen for storage. In addition, myocardial tissues in a normal, healthy condition (9 samples as a control group) were collected from donor hearts abandoned during cardiac transplant due to unexpected reasons.

Table 1 Summary statistics for the sequencing data.


RNA isolation and qualification

Total RNA was isolated with TRIzol™ reagent (Invitrogen, USA) according to the manufacturer’s instructions. RNA concentration was measured using the Qubit® RNA Assay Kit in a Qubit® 2.0 Fluorometer (Life Technologies, CA, USA). RNA purity was assessed using the NanoPhotometer® spectrophotometer (IMPLEN, CA, USA). RNA integrity was checked using the RNA Nano 6000 Assay Kit on the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA). Only samples with a 260:280 ratio of ≥1.5 and an RNA integrity number (RIN) of ≥8 were subjected to deep sequencing.
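The inclusion rule above can be written as a simple filter. The following is a minimal sketch, not the authors' code: the `RnaQc` record, the `passes_qc` helper and the demo values are hypothetical, and only the two thresholds (260:280 ratio ≥1.5, RIN ≥8) come from the text.

```python
from dataclasses import dataclass


@dataclass
class RnaQc:
    """Hypothetical per-sample QC record (field names are illustrative)."""
    sample_id: str
    a260_280: float  # 260:280 absorbance ratio from the spectrophotometer
    rin: float       # RNA integrity number from the Bioanalyzer


def passes_qc(qc: RnaQc, min_ratio: float = 1.5, min_rin: float = 8.0) -> bool:
    """Return True when both thresholds stated in the text are met."""
    return qc.a260_280 >= min_ratio and qc.rin >= min_rin


# Made-up demo values, not measurements from the dataset.
samples = [
    RnaQc("demo_A", 1.9, 8.6),  # passes both thresholds
    RnaQc("demo_B", 1.4, 9.0),  # fails the purity ratio
    RnaQc("demo_C", 2.0, 7.2),  # fails the RIN threshold
]
kept = [s.sample_id for s in samples if passes_qc(s)]
print(kept)
```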

Strand-specific RNA-seq library preparation & sequencing

We prepared a strand-specific RNA-seq library for each sample. First, ribosomal RNA (rRNA) was removed from 3 μg of total RNA using the Epicentre Ribo-zero™ rRNA Removal Kit (Epicentre, USA). Then, sequencing libraries were generated using the NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina® (NEB, USA) following the manufacturer’s instructions. Briefly, first-strand cDNA synthesis was performed using M-MuLV reverse transcriptase and random hexamer primers. Second-strand cDNA was synthesized using RNase H and DNA Polymerase I, with dTTP replaced by dUTP in the reaction buffer. Following end repair and adenylation, cDNA fragments were ligated to adaptors. Then, 3 μl of USER Enzyme was incubated with the cDNA for 15 min at 37 °C followed by 5 min at 95 °C before PCR. Following PCR amplification, products were purified using the AMPure XP system. Finally, library quality was assessed on the Agilent Bioanalyzer 2100 system. The resulting libraries were sequenced on the Illumina HiSeq X Ten System in 2 × 150 bp paired-end mode.

Read alignment and transcript assembly

Figure 1b shows the bioinformatic analysis workflow. The raw sequencing reads¹⁰ were subjected to adapter trimming and base quality filtering by fastp v0.7.0¹¹. Clean reads obtained were aligned to the human reference genome (GRCh37) using hisat2 v2.1.0¹² under default settings. Following alignment, the quality of each RNA-seq dataset was assessed through a variety of metrics generated by QoRTs¹³. Transcript de novo assembly for each sample was performed using StringTie v1.3.4b¹⁴ under default settings with the guidance of a reference annotation (GENCODE GRCh37 release 27, -G option). The assembled transcripts of all samples were merged into a single file using the merge function of StringTie with the reference annotation provided (-G option). Other parameters were set to defaults (-m 50 -T 1 -f 0.01 -g 250).
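The per-sample steps described above can be sketched as a sequence of command lines. This is an illustration under stated assumptions, not the authors' script (which is provided on figshare as CODE_for_RNA-seq.sh): all file names and the `build_commands` and `build_merge_command` helpers are hypothetical, and the `samtools sort` step is an assumption added because StringTie expects a coordinate-sorted BAM. Only tool defaults plus the options named in the text (-G, and -m 50 -T 1 -f 0.01 -g 250 for the merge) are used, and the commands are only constructed and printed, never executed.

```python
import shlex


def build_commands(sample: str, ref_index: str, annotation_gtf: str) -> list:
    """Build (but do not run) the per-sample commands; file names are hypothetical."""
    r1, r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    c1, c2 = f"{sample}_clean_R1.fastq.gz", f"{sample}_clean_R2.fastq.gz"
    return [
        # Adapter trimming and base quality filtering with fastp defaults.
        ["fastp", "-i", r1, "-I", r2, "-o", c1, "-O", c2],
        # Alignment of clean reads with hisat2 defaults.
        ["hisat2", "-x", ref_index, "-1", c1, "-2", c2, "-S", f"{sample}.sam"],
        # Assumption: sort the alignments so StringTie can read them.
        ["samtools", "sort", "-o", f"{sample}.bam", f"{sample}.sam"],
        # Reference-guided assembly (-G supplies the annotation).
        ["stringtie", f"{sample}.bam", "-G", annotation_gtf, "-o", f"{sample}.gtf"],
    ]


def build_merge_command(gtf_list_file: str, annotation_gtf: str) -> list:
    """Merge per-sample assemblies using the parameters quoted in the text."""
    return ["stringtie", "--merge", "-G", annotation_gtf,
            "-m", "50", "-T", "1", "-f", "0.01", "-g", "250",
            "-o", "stringtie_merged.gtf", gtf_list_file]


for cmd in build_commands("demo_sample", "grch37_index", "gencode.gtf"):
    print(shlex.join(cmd))
print(shlex.join(build_merge_command("assemblies.txt", "gencode.gtf")))
```

Keeping each command as an argument list rather than a single string avoids shell quoting problems if the lists are later passed to `subprocess.run`.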

Novel lncRNA gene prediction

Transcripts in the StringTie merge output that did not match any known transcript were predicted to derive from novel lncRNA genes if they met the following criteria: (1) the assembled transcript must have definite strand information; (2) the transcript must have more than one exon; (3) the transcript must be at least 200 bp in length; and (4) the transcript must be labelled as “noncoding” by the coding potential predictor CPC2¹⁵. We ultimately obtained 205 novel lncRNA genes (ALL_GENE_EXPR_DEG_ANALYSIS.xlsx)¹⁶.
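The four criteria can be expressed as a single predicate. The sketch below is illustrative only: the dictionary keys (`strand`, `exon_count`, `length`, `cpc2_label`) and the demo records are hypothetical, and it assumes the unmatched transcripts have already been summarised into such records from the merged GTF and the CPC2 output.

```python
def is_novel_lncrna_candidate(tx: dict) -> bool:
    """Apply the four filtering criteria from the text to one transcript record."""
    return (
        tx["strand"] in ("+", "-")            # (1) definite strand information
        and tx["exon_count"] > 1              # (2) more than one exon
        and tx["length"] >= 200               # (3) at least 200 bp long
        and tx["cpc2_label"] == "noncoding"   # (4) CPC2 label is "noncoding"
    )


# Made-up records for illustration; identifiers are not from the dataset.
transcripts = [
    {"id": "tx1", "strand": "+", "exon_count": 3, "length": 850, "cpc2_label": "noncoding"},
    {"id": "tx2", "strand": ".", "exon_count": 2, "length": 900, "cpc2_label": "noncoding"},
    {"id": "tx3", "strand": "-", "exon_count": 1, "length": 1200, "cpc2_label": "noncoding"},
    {"id": "tx4", "strand": "-", "exon_count": 4, "length": 150, "cpc2_label": "noncoding"},
    {"id": "tx5", "strand": "+", "exon_count": 5, "length": 2000, "cpc2_label": "coding"},
]
candidates = [tx["id"] for tx in transcripts if is_novel_lncrna_candidate(tx)]
print(candidates)
```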

Expression abundance quantification

All coding genes and lncRNA genes, including predicted novel lncRNA, lincRNA, sense intronic lncRNA, sense overlapping lncRNA and antisense lncRNA genes, were incorporated in expression abundance quantification (stringtie_merged.strand.lncRNA.proteincoding.gtf)¹⁶. First, the transcript sequences (stringtie_merged.strand.lncRNA.proteincoding.fa)¹⁶ were extracted from the reference genome using gffread (https://github.com/gpertea/gffread). Then, the expression of the transcripts was quantified with kallisto v0.43.1¹⁷ under default settings. For comparison among samples, transcript abundance for each sample was normalized to Transcripts Per Million (TPM)¹⁸. The expression of each gene was determined by aggregating the expression of all corresponding transcript isoforms. Along with transcript abundance estimates, 100 bootstraps per sample were generated (kallisto quant -b 100), which serve as proxies for technical replicates. Figures 2a and 2b show the expression profiles of coding genes and lncRNA genes in each sample, respectively. Based on the expression of coding genes, hierarchical clustering analysis revealed distinct expression landscapes between the normal and HCM groups for both coding and lncRNA genes. However, samples from each of the three HCM groups did not cluster together, indicating that there may be no significant difference in transcriptome among HCM patients with different genetic backgrounds, at least at the time of sampling.
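For readers unfamiliar with the normalization, the TPM calculation and the gene-level aggregation can be sketched as follows. kallisto already reports a `tpm` column alongside `est_counts` and `eff_length`, so this only illustrates the arithmetic; the function names, transcript identifiers and numbers are hypothetical.

```python
from collections import defaultdict


def compute_tpm(est_counts: dict, eff_lengths: dict) -> dict:
    """TPM_i = (count_i / length_i) / sum_j(count_j / length_j) * 1e6."""
    rates = {tx: est_counts[tx] / eff_lengths[tx] for tx in est_counts}
    total = sum(rates.values())
    return {tx: rate / total * 1e6 for tx, rate in rates.items()}


def aggregate_to_genes(tx_tpm: dict, tx_to_gene: dict) -> dict:
    """Sum the TPM of all isoforms that belong to the same gene."""
    gene_tpm = defaultdict(float)
    for tx, value in tx_tpm.items():
        gene_tpm[tx_to_gene[tx]] += value
    return dict(gene_tpm)


# Made-up estimated counts and effective lengths for three transcripts.
counts = {"t1": 100.0, "t2": 200.0, "t3": 300.0}
lengths = {"t1": 1000.0, "t2": 1000.0, "t3": 2000.0}
tx_tpm = compute_tpm(counts, lengths)
gene_tpm = aggregate_to_genes(tx_tpm, {"t1": "geneA", "t2": "geneA", "t3": "geneB"})
print({g: round(v, 1) for g, v in gene_tpm.items()})
```

Because TPM values sum to one million within each sample, the gene-level values obtained by summing isoforms remain comparable across samples.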

Differential expression analysis

Following quantification, differentially expressed genes (DEGs) between HCM and normal samples were identified using sleuth v0.29.0¹⁹, which leverages the kallisto bootstraps to account for technical variation. The biological significance threshold was set to a fold change of at least 2 in either direction, and the statistical significance threshold was set to a q-value below 0.05 (−log10 q-value > 1.3). Only genes that achieved both biological and statistical significance were considered DEGs. We identified 132 and 241 coding genes up-regulated and down-regulated in HCM versus normal samples, respectively (Fig. 2c). We also found 67 and 83 lncRNA genes up-regulated and down-regulated in HCM versus normal samples, respectively (Fig. 2d). The expression abundance of each gene in each sample, the test statistics and the DEGs are provided (ALL_GENE_EXPR_DEG_ANALYSIS.xlsx)¹⁶.
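The two-threshold rule can be illustrated with a small classifier. This sketch assumes that a linear fold change (HCM over normal) and a q-value are already available for each gene; the `classify_gene` helper and the demo values are hypothetical, and treating a fold change of exactly 2 as significant is an assumption, since the text does not say whether the bound is inclusive.

```python
import math


def classify_gene(fold_change: float, q_value: float,
                  min_fold: float = 2.0, max_q: float = 0.05) -> str:
    """Label a gene as 'up', 'down' or 'not_significant'.

    fold_change is the linear HCM/normal ratio and must be positive.
    """
    if q_value >= max_q:
        return "not_significant"  # fails the statistical threshold
    log2_fc = math.log2(fold_change)
    cutoff = math.log2(min_fold)
    if log2_fc >= cutoff:
        return "up"
    if log2_fc <= -cutoff:
        return "down"
    return "not_significant"      # fails the biological threshold


# Made-up (fold change, q-value) pairs for illustration.
demo = {"g1": (4.0, 0.01), "g2": (0.25, 0.01), "g3": (4.0, 0.20), "g4": (1.5, 0.001)}
labels = {gene: classify_gene(fc, q) for gene, (fc, q) in demo.items()}
print(labels)
```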

Data Records

The sequencing data in the fastq format have been deposited in NCBI Sequence Read Archive (SRA)¹⁰. The transcript abundance file for each sample has been deposited in Gene Expression Omnibus (GEO)¹⁸. Other processed files were uploaded to figshare¹⁶.

Technical Validation

After quality control, the number of sequenced bases was over 11 Gb in all samples, and the Q20 (the percentage of bases with a Phred-scaled quality score ≥20) was over 97% in all samples (Q30 over 93%), indicating that the base quality was sufficiently high for downstream analyses (Table 1). When the clean reads were aligned to the human reference genome, the overall alignment rate was high (over 97%) in all samples, suggesting little contamination from microorganisms (Table 1).
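The Q20 and Q30 metrics follow directly from the quality lines of a FASTQ file. The following is a minimal sketch assuming the standard Phred+33 encoding; the `fraction_at_least` helper and the two quality strings are made up for illustration, and in practice tools such as fastp report these values directly.

```python
def fraction_at_least(quality_strings: list, threshold: int, offset: int = 33) -> float:
    """Fraction of bases whose Phred score (ASCII code minus offset) is >= threshold."""
    total = 0
    passed = 0
    for quals in quality_strings:
        for ch in quals:
            total += 1
            if ord(ch) - offset >= threshold:
                passed += 1
    return passed / total if total else 0.0


# 'I' encodes Q40, '5' encodes Q20 and '+' encodes Q10 under Phred+33.
demo_quals = ["IIII", "55++"]
q20 = fraction_at_least(demo_quals, 20)
q30 = fraction_at_least(demo_quals, 30)
print(q20, q30)
```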

Using QoRTs¹³, a toolkit for quality assessment of RNA-seq datasets, we made cross-comparisons of samples to identify any outliers or systematic errors associated with biological conditions, i.e., different groups (Fig. 3a–f). Figure 3a shows the distribution curve of estimated insert size for each sample. The curves were relatively smooth (no “spikes”) and consistent across samples and conditions, reflecting little technical bias across samples. Figure 3b shows the gene body coverage profile for each sample; no significant 3′ bias was found, indicating that the datasets were not affected by RNA degradation. Figure 3c shows the read mapping rates for different location categories in each sample, from which we did not observe any outlier within each condition, suggesting consistency across samples in terms of alignment. Similarly, we did not observe a disproportionate identification of novel splice junctions in any one sample or condition (Fig. 3d). Except for the nucleotide composition bias in the first few cycles that normally occurs in Illumina RNA-seq data, the base composition was quite uniform across all other cycles (Fig. 3e). Figure 3f shows the alignment soft clipping rate by cycle in each sample. We did not observe any “spikes” in the curves for any sample, and the clipping profiles were generally consistent across samples and conditions. To visualize the high-dimensional transcriptomic datasets, we performed dimension reduction with principal component analysis (PCA). Consistent with the observation in the hierarchical clustering analysis (Fig. 2a,b), we found that all HCM samples clustered together and were distant from normal samples (Fig. 3g), suggesting that our data are suitable for differential expression analysis. As expected, the transcriptomic variance among samples was greater in the normal condition than in the diseased HCM condition.

Taken together, we presented a high-quality dataset that was suitable for differential expression and splicing analysis of both coding and lncRNA genes in myocardial tissues between HCM and normal conditions.

Code Availability

The code for processing the data from raw sequencing reads to DEGs is available within figshare (CODE_for_RNA-seq.sh)¹⁶.

References

1. Ho, C. Y. Hypertrophic cardiomyopathy in 2012. Circulation 125, 1432–1438 (2012).
2. Geske, J. B., Ommen, S. R. & Gersh, B. J. Hypertrophic cardiomyopathy: clinical update. JACC Heart Fail 6, 364–375 (2018).
3. Walsh, R. et al. Defining the genetic architecture of hypertrophic cardiomyopathy: re-evaluating the role of non-sarcomeric genes. Eur. Heart J. 38, 3461–3468 (2017).
4. Mattick, J. S. & Rinn, J. L. Discovery and annotation of long noncoding RNAs. Nat. Struct. Mol. Biol. 22, 5–7 (2015).
5. Greco, S., Salgado Somoza, A., Devaux, Y. & Martelli, F. Long Noncoding RNAs and Cardiac Disease. Antioxid. Redox Signal. 29, 880–901 (2018).
6. Rinn, J. L. & Chang, H. Y. Genome regulation by long noncoding RNAs. Annu. Rev. Biochem. 81, 145–166 (2012).
7. Shen, S., Jiang, H., Bei, Y., Xiao, J. & Li, X. Long non-coding RNAs in cardiac remodeling. Cell. Physiol. Biochem. 41, 1830–1837 (2017).
8. Yang, W., Li, Y., He, F. & Wu, H. Microarray profiling of long non-coding RNA (lncRNA) associated with hypertrophic cardiomyopathy. BMC Cardiovasc. Disord. 15, 62 (2015).
9. Mills, J. D., Kawahara, Y. & Janitz, M. Strand-specific RNA-seq provides greater resolution of transcriptome profiling. Curr. Genomics 14, 173–181 (2013).
10. NCBI Sequence Read Archive, https://identifiers.org/ncbi/insdc.sra:SRP186138 (2019).
11. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
12. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357 (2015).
13. Hartley, S. W. & Mullikin, J. C. QoRTs: a comprehensive toolset for quality control and data processing of RNA-Seq experiments. BMC Bioinformatics 16, 224 (2015).
14. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290 (2015).
15. Kang, Y. J. et al. CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res. 45, W12–W16 (2017).
16. Liu, X. Long non-coding RNA and mRNA profiling using RNA sequencing in human hypertrophic cardiomyopathy. figshare, https://doi.org/10.6084/m9.figshare.c.4407284.v1 (2019).
17. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
18. Liu, X. & Zhou, Z. Long non-coding and protein-coding RNA profiling using strand-specific RNA-seq in human hypertrophic cardiomyopathy. Gene Expression Omnibus, https://identifiers.org/geo:GSE130036 (2019).
19. Pimentel, H., Bray, N. L., Puente, S., Melsted, P. & Pachter, L. Differential analysis of RNA-seq incorporating quantification uncertainty. Nat. Methods 14, 687–690 (2017).

Acknowledgements

This work was supported by grants from the National Natural Science Foundation of China (81570276), the Beijing Science and Technology Program (Z161100000516154), the CAMS Initiative for Innovative Medicine (2016-I2M-1-016) and the Post-doctoral International Exchange Project (2018-BSH04).

Author information

Authors and Affiliations

Center of Laboratory Medicine, Fuwai Hospital, State Key Laboratory of Cardiovascular Disease, Beijing Key Laboratory for Molecular Diagnostics of Cardiovascular Diseases, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100037, China
Xuanyu Liu, Yi Ma, Kunlun Yin, Wenke Li, Wen Chen, Yujing Zhang, Tianjiao Li, Bianmei Han, Xuewen Liu & Zhou Zhou
Departments of Cardiovascular Surgery, Fuwai Hospital, State Key Laboratory of Cardiovascular Disease, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100037, China
Changsheng Zhu & Shuiyun Wang

Contributions

Y.M. participated in the experimental design. K.Y., T.L., B.H. and Xuewen L. performed RNA isolation and library preparation. W.L., W.C., Y.Z. and C.Z. gave suggestions on data analysis. Xuanyu L. analysed the data and wrote the manuscript. S.W. and Z.Z. conceived the project.

Corresponding authors

Correspondence to Shuiyun Wang or Zhou Zhou.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Table S1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.

About this article

Cite this article

Liu, X., Ma, Y., Yin, K. et al. Long non-coding and coding RNA profiling using strand-specific RNA-seq in human hypertrophic cardiomyopathy. Sci Data 6, 90 (2019). https://doi.org/10.1038/s41597-019-0094-6

Received: 28 February 2019
Accepted: 15 May 2019
Published: 13 June 2019
DOI: https://doi.org/10.1038/s41597-019-0094-6
