Abstract
Detecting direct associations and inferring networks from observed data is an important problem in many fields, including biology, physics, engineering and social studies. In this work, we focus on information-theoretic approaches to network reconstruction and direct association detection, in particular for biological networks. We review not only the traditional measures of association among observed variables, such as the correlation coefficient, mutual information and conditional mutual information (CMI), but also recently developed theories and methods. The new theoretical work includes information geometry, which gives a unified framework for detecting causality and association; partial independence, which alleviates the singularity of CMI; and multiscale analysis of CMI, which avoids the underestimation issue of CMI. The new methods include part mutual information (PMI) and partial associations (PA), which improve on the older measures by avoiding both overestimation and underestimation. Together, these theories and methods represent major advances in network inference.
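To make the distinction between a direct and an indirect association concrete, the following minimal sketch compares mutual information with CMI on a simulated chain X → Z → Y, in which X and Y are associated only through Z. It is an illustration under stated assumptions, not the authors' method: the variables are assumed jointly Gaussian, so both quantities have closed forms in terms of the (partial) correlation coefficient, and the function names `gaussian_mi` and `gaussian_cmi`, the noise levels and the sample size are all hypothetical choices.

```python
import numpy as np


def gaussian_mi(x, y):
    """Mutual information I(X;Y) in nats, assuming (X, Y) is jointly Gaussian."""
    r = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log(1.0 - r**2)


def gaussian_cmi(x, y, z):
    """Conditional mutual information I(X;Y|Z) in nats, assuming joint Gaussianity.

    Uses the first-order partial correlation of X and Y given a single variable Z.
    """
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    r_partial = (rxy - rxz * ryz) / np.sqrt((1.0 - rxz**2) * (1.0 - ryz**2))
    return -0.5 * np.log(1.0 - r_partial**2)


# Simulated chain X -> Z -> Y: X and Y have no direct link (assumed toy model).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
z = x + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)

mi_xy = gaussian_mi(x, y)          # clearly positive: X and Y are associated
cmi_xy_z = gaussian_cmi(x, y, z)   # close to zero: the association is indirect

print(f"I(X;Y)   = {mi_xy:.4f} nats")
print(f"I(X;Y|Z) = {cmi_xy_z:.4f} nats")
```

In this toy model the mutual information between X and Y is clearly positive while the CMI given Z is close to zero, which is the behaviour that lets conditioning separate direct from indirect associations. The sketch does not address the underestimation issue of CMI mentioned above.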
Acknowledgements
This work was supported by the National Key R&D Program of China (Grant No. 2017YFA0505500), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB13040700) and the National Natural Science Foundation of China (Grant Nos. 31771476, 91529303, 91439103, 11421101 and 91530322).
Cite this article
Shi, J., Zhao, J., Li, T. et al. Detecting direct associations in a network by information theoretic approaches. Sci. China Math. 62, 823–838 (2019). https://doi.org/10.1007/s11425-017-9206-0
DOI: https://doi.org/10.1007/s11425-017-9206-0