Abstract
We propose a novel, systematic procedure of non-linear data transformation for an adaptive algorithm in the context of network reverse-engineering using information-theoretic methods. Our methodology is rooted in elucidating and correcting for the specific biases in the estimation techniques for mutual information (MI) given a finite sample of data. These biases are, in turn, tied to the lack of well-defined bounds for numerical estimation of MI for continuous probability distributions from finite data. The nature and properties of this inevitable bias are described, complemented by several examples illustrating its form and variation. We propose an adaptive partitioning scheme for MI estimation that effectively transforms the sample data using parameters determined from its local and global distribution, guaranteeing a more robust and reliable reconstruction algorithm. Together with a normalized measure (Shared Information Metric), we report considerably enhanced performance for both in silico and real-world biological networks. We also find that the recovery of true interactions is better in particular for an intermediate range of false positive rates, suggesting that our algorithm is less vulnerable to spurious signals of association.
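The finite-sample bias referred to above can be illustrated with a minimal sketch. It is not the adaptive partitioning scheme proposed in this article; it uses a plain plug-in (histogram) estimator with fixed equal-width bins, and the function names `plugin_mi` and `mean_mi_independent`, the bin count, and the sample sizes are all assumptions chosen for illustration. For two independent variables the true MI is zero, yet the plug-in estimate is positive on average and shrinks only as the sample grows.

```python
import numpy as np


def plugin_mi(x, y, bins=8):
    """Plug-in (histogram) MI estimate in nats with fixed equal-width bins.

    Illustrative only: this is a naive estimator, not the adaptive scheme
    described in the abstract.
    """
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()            # empirical joint distribution
    px = pxy.sum(axis=1, keepdims=True)    # empirical marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # empirical marginal of y, shape (1, bins)
    nz = pxy > 0                           # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def mean_mi_independent(n, bins=8, trials=200, seed=0):
    """Average plug-in MI over repeated draws of two independent uniform samples."""
    rng = np.random.default_rng(seed)
    estimates = [
        plugin_mi(rng.uniform(size=n), rng.uniform(size=n), bins)
        for _ in range(trials)
    ]
    return float(np.mean(estimates))


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        # The true MI is 0 for independent variables; any positive value is bias.
        print(n, mean_mi_independent(n))
```

The average estimate falls as the sample size increases but stays above zero, because the plug-in estimate is a Kullback-Leibler divergence between empirical distributions and so cannot be negative. With the number of bins held fixed, the leading term of this bias scales roughly as the number of free cells divided by twice the sample size, which is why the choice of partition matters for finite data.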
Supplemental Material
The online version of this article (DOI:10.1515/sagmb-2016-0013) offers supplementary material, available to authorized users.
©2016 Walter de Gruyter GmbH, Berlin/Boston