Abstract
Identification of new drug and cell therapy targets for disease treatment will be facilitated by a detailed molecular understanding of normal and disease development. Human pluripotent stem cells can provide a large in vitro source of human cell types and, in a growing number of instances, also three-dimensional multicellular tissues called organoids. The application of stem cell technology to the discovery and development of new therapies will be aided by detailed molecular characterisation of cell identity, cell signalling pathways and target gene networks. Big data or ‘omics’ techniques, particularly transcriptomics and proteomics, facilitate cell and tissue characterisation using thousands to tens of thousands of genes or proteins. These gene and protein profiles are analysed using existing and/or emergent bioinformatics methods, including a growing number of methods that compare sample profiles against compendia of reference samples. This review assesses how compendium-based analyses can aid the application of stem cell technology for new therapy development, both through robust definition of differentiated stem cell identity and through elucidation of the complex signalling pathways and target gene networks involved in normal and diseased states.
Funding
M.H.K. was supported by WSU Postgraduate Research Awards. M.D.O’C. was supported by The Medical Advances Without Animals Trust.
Author information
Contributions
M.H.K. drafted the manuscript. M.H.K. and M.D.O’C. revised and approved the manuscript.
Ethics declarations
Conflict of interest
Md Humayun Kabir declares that he has no conflict of interest. Michael D. O’Connor declares that he has no conflict of interest.
Ethical approval
This article does not contain any studies with human or animal subjects performed by any of the authors.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of a Special Issue on ‘Big Data’ edited by Joshua WK Ho and Eleni Giannoulatou.
About this article
Cite this article
Kabir, M.H., O’Connor, M.D. Stems cells, big data and compendium-based analyses for identifying cell types, signalling pathways and gene regulatory networks. Biophys Rev 11, 41–50 (2019). https://doi.org/10.1007/s12551-018-0486-4