Causality-based Feature Selection: Methods and Evaluations

Published: 28 September 2020

Abstract

Feature selection is a crucial preprocessing step in data analytics and machine learning. Classical feature selection algorithms select features based on the correlations between predictive features and the class variable and do not attempt to capture causal relationships between them. Knowledge of the causal relationships between features and the class variable has been shown to have potential benefits for building interpretable and robust prediction models, since causal relationships imply the underlying mechanism of a system. Consequently, causality-based feature selection has gradually attracted greater attention, and many algorithms have been proposed. In this article, we present a comprehensive review of recent advances in causality-based feature selection. To facilitate the development of new algorithms in this research area and to make it easy to compare new methods with existing ones, we develop the first open-source package, called CausalFS, which includes most of the representative causality-based feature selection algorithms (available at https://github.com/kuiy/CausalFS). Using CausalFS, we conduct extensive experiments to compare the representative algorithms on both synthetic and real-world datasets. Finally, we discuss some challenging problems to be tackled in future research.

    • Published in

      ACM Computing Surveys, Volume 53, Issue 5
      September 2021
      782 pages
      ISSN: 0360-0300
      EISSN: 1557-7341
      DOI: 10.1145/3426973

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 28 September 2020
      • Revised: 1 June 2020
      • Accepted: 1 June 2020
      • Received: 1 November 2019

      Qualifiers

      • survey
      • Research
      • Refereed
