Detecting direct associations in a network by information theoretic approaches

  • Reviews
  • Progress of Projects Supported by NSFC
Science China Mathematics

Abstract

Detecting direct associations or inferring networks from observed data is an important problem in many fields, including biology, physics, engineering and social studies. In this work, we focus on information theoretic approaches to network reconstruction and direct association detection, in particular for biological networks. We review not only the traditional measures of association among observed variables, such as the correlation coefficient, mutual information and conditional mutual information (CMI), but also recently developed theories and methods. The new theoretical work includes information geometry, which gives a unified framework for detecting causality and association; partial independence, which alleviates the singularity of CMI; and multiscale analysis of CMI, which avoids the underestimation issue of CMI. The new methods include part mutual information (PMI) and partial associations (PA), which improve on the older measures by avoiding both overestimation and underestimation. Together, these theories and methods represent major advances in network inference.
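As a rough illustration of why conditioning matters when separating direct from indirect associations, the sketch below compares mutual information with CMI on synthetic data. It is not the authors' implementation. It assumes jointly Gaussian variables, for which mutual information reduces to a function of the correlation coefficient and CMI to a function of the partial correlation. The function names `gaussian_mi` and `gaussian_cmi`, the sample size and the noise level are illustrative choices.

```python
import numpy as np

def gaussian_mi(x, y):
    """Mutual information I(X;Y) in nats, assuming (x, y) are jointly Gaussian."""
    rho = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log(1.0 - rho**2)

def gaussian_cmi(x, y, z):
    """CMI I(X;Y|Z) in nats, assuming joint Gaussianity, via partial correlation."""
    r = np.corrcoef([x, y, z])
    rxy, rxz, ryz = r[0, 1], r[0, 2], r[1, 2]
    partial = (rxy - rxz * ryz) / np.sqrt((1.0 - rxz**2) * (1.0 - ryz**2))
    return -0.5 * np.log(1.0 - partial**2)

# Synthetic example: X and Y are both driven by Z and have no direct link.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)

mi_xy = gaussian_mi(x, y)
cmi_xy_given_z = gaussian_cmi(x, y, z)
print(f"I(X;Y)   = {mi_xy:.4f} nats")
print(f"I(X;Y|Z) = {cmi_xy_given_z:.4f} nats")
```

In this setup the unconditional measure is clearly positive although X and Y are linked only through Z, while the conditional measure is close to zero. The sketch covers only the overestimation side; it does not demonstrate the underestimation issue of CMI that PMI and PA are designed to address.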

Acknowledgements

This work was supported by the National Key R&D Program of China (Grant No. 2017YFA0505500), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB13040700) and the National Natural Science Foundation of China (Grant Nos. 31771476, 91529303, 91439103, 11421101 and 91530322).

Author information

Corresponding authors

Correspondence to Tiejun Li or Luonan Chen.

Cite this article

Shi, J., Zhao, J., Li, T. et al. Detecting direct associations in a network by information theoretic approaches. Sci. China Math. 62, 823–838 (2019). https://doi.org/10.1007/s11425-017-9206-0

  • DOI: https://doi.org/10.1007/s11425-017-9206-0
