Abstract
RNA-seq is widely used to compare gene expression between experimental conditions or cell types, yielding information that can shed light on the biological processes involved and inform further hypotheses. While the protocols required to generate samples for sequencing can be performed in most research facilities, the subsequent computational analysis is often an area in which researchers have little experience. Here we present a user-friendly bioinformatics workflow, built on widely used and well-documented tools, that describes the methods required to take raw RNA sequencing data to interpretable results. Data quality assessment and read trimming are performed with FastQC and Cutadapt, respectively. STAR is then used to map the trimmed reads to a reference genome, and the alignment is assessed with Qualimap. The mapped reads are quantified with featureCounts. DESeq2 is used to normalize the counts and perform differential expression analysis, identifying differentially expressed genes and preparing the data for functional enrichment analysis. Gene set enrichment analysis identifies enriched gene sets from the normalized count data, and clusterProfiler is used to perform functional enrichment against the GO, KEGG, and Reactome databases. Example figures of the functional enrichment results are also generated. The example data used in the workflow are derived from human umbilical vein endothelial cells (HUVECs), an in vitro model used in the study of endothelial cells; the dataset is published and publicly available for download from the European Nucleotide Archive.
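The command-line stages named in the abstract (FastQC, Cutadapt, STAR, Qualimap, featureCounts) can be sketched as a per-sample list of commands. This is a minimal illustration, not the chapter's actual commands. The function name `build_commands`, the file paths, the thread count, the trimming thresholds, and the adapter sequence are all hypothetical. It also assumes single-end, gzipped reads and a pre-built STAR genome index. The commands are only assembled and printed, not executed.

```python
import shlex
from pathlib import PurePosixPath

# Assumption: a common Illumina adapter prefix; check the library kit actually used.
ADAPTER = "AGATCGGAAGAGC"


def build_commands(sample, fastq, genome_dir, gtf, out_dir="results", threads=4):
    """Return the per-sample command lines, in run order, as argument lists.

    Assumes single-end, gzipped reads and an existing output directory.
    """
    out = PurePosixPath(out_dir)
    trimmed = out / f"{sample}.trimmed.fastq.gz"
    star_prefix = out / f"{sample}."
    # STAR appends this suffix to the prefix for coordinate-sorted BAM output.
    bam = out / f"{sample}.Aligned.sortedByCoord.out.bam"
    return [
        # 1. Quality assessment of the raw reads.
        ["fastqc", "-o", str(out), str(fastq)],
        # 2. Adapter removal, 3' quality trimming (Q20), minimum length 20 nt
        #    (thresholds are illustrative assumptions).
        ["cutadapt", "-a", ADAPTER, "-q", "20", "-m", "20",
         "-o", str(trimmed), str(fastq)],
        # 3. Alignment of the trimmed reads to the reference genome.
        ["STAR", "--runThreadN", str(threads),
         "--genomeDir", str(genome_dir),
         "--readFilesIn", str(trimmed),
         "--readFilesCommand", "zcat",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outFileNamePrefix", str(star_prefix)],
        # 4. Alignment quality control.
        ["qualimap", "rnaseq", "-bam", str(bam), "-gtf", str(gtf),
         "-outdir", str(out / f"{sample}_qualimap")],
        # 5. Gene-level read quantification.
        ["featureCounts", "-T", str(threads), "-a", str(gtf),
         "-o", str(out / f"{sample}.counts.txt"), str(bam)],
    ]


# Hypothetical sample name and paths, for illustration only.
for cmd in build_commands("sample1", "raw/sample1.fastq.gz",
                          "ref/star_index", "ref/genes.gtf"):
    print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed. The later steps (DESeq2, gene set enrichment analysis, clusterProfiler) run in R and are not covered by this sketch.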
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Pinel, G.D. et al. (2022). Endothelial Cell RNA-Seq Data: Differential Expression and Functional Enrichment Analyses to Study Phenotypic Switching. In: Benest, A.V. (eds) Angiogenesis. Methods in Molecular Biology, vol 2441. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2059-5_29
DOI: https://doi.org/10.1007/978-1-0716-2059-5_29
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2058-8
Online ISBN: 978-1-0716-2059-5
eBook Packages: Springer Protocols