
Endothelial Cell RNA-Seq Data: Differential Expression and Functional Enrichment Analyses to Study Phenotypic Switching

  • Protocol
Angiogenesis

Abstract

RNA-seq is widely used to compare gene expression between experimental conditions or cell types, and the results can shed light on the biological processes involved and inform further hypotheses. While the laboratory protocols required to generate samples for sequencing can be performed in most research facilities, many researchers have little experience with the computational analysis that follows. Here we present a user-friendly bioinformatics workflow that takes raw RNA sequencing data to interpretable results using widely used, well-documented tools. Data quality assessment and read trimming were performed with FastQC and Cutadapt, respectively. STAR was then used to map the trimmed reads to a reference genome, and the alignments were assessed with Qualimap. The mapped reads were quantified with featureCounts. DESeq2 was used to normalize the quantified reads and perform differential expression analysis, identifying differentially expressed genes and preparing the data for functional enrichment analysis. Gene set enrichment analysis identified enriched gene sets from the normalized count data, and clusterProfiler was used to perform functional enrichment against the GO, KEGG, and Reactome databases. Example figures of the functional enrichment results were also generated. The example data used in the workflow are derived from human umbilical vein endothelial cells (HUVECs), a widely used in vitro model of endothelial cells, and are published and publicly available for download from the European Nucleotide Archive.
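FastQC's quality reports and Cutadapt's quality trimming both work on the per-base Phred scores stored in the FASTQ quality string. The Cutadapt documentation describes its 3' quality trimming (the `-q`/`--quality-cutoff` option) as the algorithm used by BWA. The sketch below is a simplified re-implementation for illustration only, not Cutadapt's own code; it assumes Phred+33 encoding, and the read and the cutoff of 20 are made-up example values.

```python
from typing import List


def phred33_to_scores(quality: str) -> List[int]:
    """Decode a Phred+33 FASTQ quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def quality_trim_3prime(scores: List[int], cutoff: int) -> int:
    """Return the index at which to cut the read (keep bases [:index]).

    BWA-style trimming: walk in from the 3' end accumulating (cutoff - q),
    cut where that running sum is largest, and stop early once it drops
    below zero (good-quality bases now outweigh the poor tail).
    """
    running = 0
    best = 0
    cut = len(scores)
    for i in range(len(scores) - 1, -1, -1):
        running += cutoff - scores[i]
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


if __name__ == "__main__":
    seq = "ACGTACGT"
    qual = "IIII++##"  # 'I' = Q40, '+' = Q10, '#' = Q2 in Phred+33
    scores = phred33_to_scores(qual)
    cut = quality_trim_3prime(scores, cutoff=20)
    print(scores, seq[:cut])
```

With a cutoff of 20, the low-quality tail (`++##`) is removed and `ACGT` is kept; a read whose final base already exceeds the cutoff is left untouched.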
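DESeq2 corrects for differences in sequencing depth using size factors estimated by the median-of-ratios method: each count is divided by its gene's geometric mean across samples, and a sample's size factor is the median of those ratios, ignoring genes with a zero count in any sample. The following is a minimal standard-library sketch of that calculation on a toy, hypothetical count table; in the workflow itself DESeq2 computes this internally (e.g. via `estimateSizeFactors`), and the function names here are illustrative.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """Median-of-ratios size factors for a gene -> per-sample raw count table."""
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean per gene; genes with any zero count are left out.
    log_geo_means = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    if not log_geo_means:
        raise ValueError("every gene has a zero count in at least one sample")
    factors = []
    for j in range(n_samples):
        # Median of log(count / geometric mean), back-transformed to a ratio.
        log_ratios = [math.log(counts[g][j]) - lg for g, lg in log_geo_means.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: Dict[str, List[int]], factors: List[float]) -> Dict[str, List[float]]:
    """Divide each sample's counts by that sample's size factor."""
    return {gene: [c / s for c, s in zip(row, factors)] for gene, row in counts.items()}


if __name__ == "__main__":
    toy = {"A": [10, 20], "B": [100, 200], "C": [5, 10], "D": [0, 7]}
    sf = size_factors(toy)
    print(sf)
    print(normalize(toy, sf))
```

Because the second toy sample has exactly twice the counts of the first, the factors come out at about 0.71 and 1.41, and the normalized counts for each gene become equal across the two samples. Using the median makes the estimate robust to the minority of genes that are truly differentially expressed.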
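Because thousands of genes are tested at once, DESeq2 by default reports p-values adjusted with the Benjamini-Hochberg procedure (the `padj` column returned by `results()`), and differentially expressed genes are typically called below an adjusted threshold such as 0.05. A minimal sketch of the adjustment, intended to match R's `p.adjust(p, method = "BH")`; the p-values are invented for illustration, and DESeq2's independent filtering step, which runs before the adjustment, is omitted.

```python
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum of
    # p * m / rank so adjusted values stay monotone in the raw p-values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps adjusted values at 1
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank in ascending order of p
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.20]
    print(bh_adjust(raw))
```

Note how the second and third values end up equal: the running minimum ensures a gene with a smaller raw p-value never receives a larger adjusted p-value than one ranked above it.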
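Gene set enrichment analysis ranks all genes by a statistic and asks whether the members of a gene set fall towards the top or the bottom of that ranking. Its core statistic, as defined by Subramanian et al., is a weighted running-sum enrichment score. The sketch below computes only that score for a single set; the permutation-based significance testing and score normalization that GSEA tools perform are omitted, and the gene names and ranking values are hypothetical (in practice the ranking might be, for example, DESeq2 Wald statistics or shrunken log2 fold changes).

```python
from typing import Dict, Set


def enrichment_score(ranked: Dict[str, float], gene_set: Set[str], weight: float = 1.0) -> float:
    """Return the signed maximum deviation of the GSEA running sum from zero.

    Hits step up by |score|**weight (normalized over all hits); misses step
    down by 1 / (number of misses). weight=0 gives an unweighted KS-like walk.
    """
    genes = sorted(ranked, key=ranked.get, reverse=True)  # highest score first
    is_hit = [g in gene_set for g in genes]
    n_miss = is_hit.count(False)
    hit_norm = sum(abs(ranked[g]) ** weight for g, hit in zip(genes, is_hit) if hit)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must overlap the ranking without covering all of it")
    running = 0.0
    es = 0.0
    for g, hit in zip(genes, is_hit):
        running += abs(ranked[g]) ** weight / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


if __name__ == "__main__":
    scores = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0}
    print(enrichment_score(scores, {"A", "B"}))  # set at the top of the ranking
    print(enrichment_score(scores, {"D", "E"}))  # set at the bottom of the ranking
```

A set concentrated at the top of the ranking gives a score near +1, one concentrated at the bottom a score near -1; this sign is what separates gene sets enriched among up-regulated genes from those enriched among down-regulated genes.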
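Functional enrichment of a list of differentially expressed genes against GO, KEGG, or Reactome terms is commonly done as over-representation analysis, which clusterProfiler's `enrichGO()` and `enrichKEGG()` perform with a one-sided hypergeometric test. A minimal sketch of that p-value for a single term, assuming a hypothetical background ("universe") of 10 genes; real analyses test many terms and then adjust the resulting p-values for multiple testing.

```python
from math import comb


def ora_pvalue(overlap: int, set_size: int, de_size: int, universe: int) -> float:
    """One-sided hypergeometric P(X >= overlap).

    universe: genes in the background; set_size: background genes annotated
    to the term; de_size: DE genes in the background; overlap: DE genes that
    are annotated to the term.
    """
    if not 0 <= overlap <= min(set_size, de_size) or max(set_size, de_size) > universe:
        raise ValueError("inconsistent counts")
    # Sum the probabilities of seeing `overlap` or more annotated genes
    # among `de_size` genes drawn without replacement from the universe.
    tail = sum(
        comb(set_size, x) * comb(universe - set_size, de_size - x)
        for x in range(overlap, min(set_size, de_size) + 1)
    )
    return tail / comb(universe, de_size)


if __name__ == "__main__":
    # All 3 DE genes hit a 3-gene term in a 10-gene universe.
    print(ora_pvalue(overlap=3, set_size=3, de_size=3, universe=10))
```

Here the chance of drawing all three annotated genes by luck is 1/120 (about 0.0083). The choice of background matters: restricting the universe to genes that were actually detected, rather than the whole genome, usually gives more conservative p-values.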



Author information

Corresponding author

Correspondence to Andrew V. Benest.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol


Cite this protocol

Pinel, G.D. et al. (2022). Endothelial Cell RNA-Seq Data: Differential Expression and Functional Enrichment Analyses to Study Phenotypic Switching. In: Benest, A.V. (eds) Angiogenesis. Methods in Molecular Biology, vol 2441. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2059-5_29


  • DOI: https://doi.org/10.1007/978-1-0716-2059-5_29


  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2058-8

  • Online ISBN: 978-1-0716-2059-5

  • eBook Packages: Springer Protocols
