ABSTRACT
We develop a new framework for inferring models of transcriptional regulation. The models in this approach, which we call physical models, are constructed on the basis of verifiable molecular attributes of the underlying biological system. The attributes include, for example, the existence of protein-protein and protein-DNA interactions in gene regulatory processes, the directionality of signal transduction in protein-protein interactions, as well as the signs of the immediate effects of these interactions (e.g., whether an upstream gene activates or represses the downstream genes). Each attribute is included as a variable in the model, and the variables define a collection of annotated random graphs. Possible configurations of these variables (realizations of the underlying biological system) are constrained by the available data sources. Some of the data sources, such as factor-binding data (location data), involve measurements that are directly tied to the variables in the model. Other sources, such as gene knock-outs, are functional in nature and provide only indirect evidence about the (physical) variables. We associate each knock-out effect in the deletion mutant data with a set of causal paths (molecular cascades) that could in principle explain the effect, resulting in aggregate constraints on the physical variables in the model. The most likely setting of all the variables is found by the max-product algorithm. By testing our approach on datasets related to the pheromone response pathway in S. cerevisiae, we demonstrate that the resulting transcriptional models are consistent with previous studies of the pathway. Moreover, we show that the approach is capable of predicting gene knock-out effects with a high degree of accuracy in a cross-validation setting. The method also implicates likely molecular cascades responsible for each observed knock-out effect. The inference results are robust against variations in the model parameters.
We can extend the approach to include other data sources such as time course expression profiles. We also discuss coordinated regulation and the use of automated experiment design.
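The setup described in the abstract can be illustrated with a toy sketch: edge-presence and edge-sign variables, presence priors standing in for direct evidence such as location data, and a knock-out effect that must be explained by at least one causal path. Everything concrete here is an assumption made for illustration and is not taken from the paper: the gene names `g1` to `g3`, the prior values, the `EPSILON` penalty, and the convention that deleting an upstream activator lowers downstream expression. The sketch also swaps in exhaustive enumeration for the max-product algorithm, which is only feasible because the example is tiny.

```python
from itertools import product

# Hypothetical candidate edges with assumed presence probabilities
# (stand-ins for direct evidence such as location data).
EDGE_PRIOR = {("g1", "g2"): 0.8, ("g2", "g3"): 0.7, ("g1", "g3"): 0.1}
EDGES = list(EDGE_PRIOR)

# Hypothetical knock-out observation: deleting g1 lowers g3 expression (-1).
KNOCKOUTS = [("g1", "g3", -1)]

# Assumed penalty for leaving a knock-out effect unexplained.
EPSILON = 0.01


def simple_paths(src, dst, max_len=3):
    """Enumerate candidate causal paths (lists of edges) from src to dst."""
    stack = [(src, [])]
    while stack:
        node, path = stack.pop()
        if node == dst and path:
            yield path
            continue
        if len(path) >= max_len:
            continue
        visited = {src} | {edge[1] for edge in path}
        for edge in EDGES:
            if edge[0] == node and edge[1] not in visited:
                stack.append((edge[1], path + [edge]))


def explains(path, present, sign, effect):
    """A path explains an effect if every edge is present and the product of
    edge signs is the opposite of the observed knock-out effect (assumed
    convention: removing an activating cascade lowers expression)."""
    if not all(present[edge] for edge in path):
        return False
    path_sign = 1
    for edge in path:
        path_sign *= sign[edge]
    return path_sign == -effect


def score(present, sign):
    """Unnormalized probability of one joint configuration."""
    value = 1.0
    for edge, prior in EDGE_PRIOR.items():
        value *= prior if present[edge] else 1.0 - prior
    for src, dst, effect in KNOCKOUTS:
        explained = any(
            explains(path, present, sign, effect)
            for path in simple_paths(src, dst)
        )
        value *= 1.0 if explained else EPSILON
    return value


def map_configuration():
    """Exhaustive search for the highest-scoring configuration
    (a stand-in for max-product on this toy problem)."""
    best = None
    for xs in product([0, 1], repeat=len(EDGES)):
        for ss in product([+1, -1], repeat=len(EDGES)):
            present = dict(zip(EDGES, xs))
            sign = dict(zip(EDGES, ss))
            value = score(present, sign)
            if best is None or value > best[0]:
                best = (value, present, sign)
    return best


best_score, best_present, best_sign = map_configuration()
```

In this toy setting the two-step cascade through `g2` is selected over the weakly supported direct edge, which mirrors how the method implicates a likely molecular cascade for an observed knock-out effect. The edge signs are not uniquely determined: any assignment whose product along the chosen path is positive scores the same.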
Index Terms
- Physical network models and multi-source data integration