survey

Causality-based Feature Selection: Methods and Evaluations

Authors:
Kui Yu, Hefei University of Technology, Hefei, China
Xianjie Guo, Hefei University of Technology, Hefei, China
Lin Liu, University of South Australia, Adelaide, Australia
Jiuyong Li, University of South Australia, Adelaide, Australia
Hao Wang, Hefei University of Technology, Hefei, China
Zhaolong Ling, Hefei University of Technology, Hefei, China
Xindong Wu, Mininglamp Technology, Beijing, China

ACM Computing Surveys, Volume 53, Issue 5, Article No. 111, pp. 1–36. https://doi.org/10.1145/3409382

Published: 28 September 2020

Abstract

Feature selection is a crucial preprocessing step in data analytics and machine learning. Classical feature selection algorithms select features based on the correlations between predictive features and the class variable and do not attempt to capture the causal relationships between them. Knowledge of the causal relationships between features and the class variable has been shown to offer potential benefits for building interpretable and robust prediction models, since causal relationships imply the underlying mechanism of a system. Consequently, causality-based feature selection has attracted increasing attention, and many algorithms have been proposed. In this article, we present a comprehensive review of recent advances in causality-based feature selection. To facilitate the development of new algorithms in this research area and to ease comparisons between new methods and existing ones, we develop the first open-source package, called CausalFS, which includes most of the representative causality-based feature selection algorithms (available at https://github.com/kuiy/CausalFS). Using CausalFS, we conduct extensive experiments to compare the representative algorithms on both synthetic and real-world datasets. Finally, we discuss some challenging problems to be tackled in future research.
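
The contrast the abstract draws between correlation-based and causality-based selection can be illustrated with a small sketch. The example below is not taken from the article or from CausalFS; it assumes a hypothetical linear-Gaussian causal chain (`indirect -> direct -> target`) and uses partial correlation as a stand-in for a conditional independence test. All variable names and coefficients are illustrative assumptions.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    design = np.column_stack([np.ones(len(z)), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


rng = np.random.default_rng(0)
n = 5000

# Hypothetical causal chain (an assumption for illustration):
# indirect -> direct -> target, plus one unrelated feature.
indirect = rng.normal(size=n)
direct = 0.9 * indirect + rng.normal(size=n)
target = 0.9 * direct + rng.normal(size=n)
irrelevant = rng.normal(size=n)

features = {"indirect": indirect, "direct": direct, "irrelevant": irrelevant}

# Correlation-based view: score each feature by its marginal correlation
# with the target.
marginal = {
    name: float(np.corrcoef(col, target)[0, 1]) for name, col in features.items()
}

# Causality-based view: check whether a feature is still associated with
# the target once the other candidate cause is conditioned on.
indirect_given_direct = partial_corr(indirect, target, direct)
direct_given_indirect = partial_corr(direct, target, indirect)

print("marginal correlations:", marginal)
print("indirect | direct:", indirect_given_direct)
print("direct | indirect:", direct_given_indirect)
```

Under these assumptions, a purely correlation-based filter would tend to keep both `indirect` and `direct`, since both are marginally associated with the target. The conditional check suggests that `indirect` adds little once `direct` is known, which is the kind of distinction causality-based methods aim to exploit.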

Supplemental Material

yu.zip (122.9 KB): supplemental movie, appendix, image, and software files for "Causality-based Feature Selection: Methods and Evaluations"

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
      1. Feature selection

Published in
ACM Computing Surveys Volume 53, Issue 5
September 2021
782 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3426973
Editor:
Albert Zomaya
University of Sydney, Australia
Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 28 September 2020
- Revised: 1 June 2020
- Accepted: 1 June 2020
- Received: 1 November 2019
Author Tags
Bayesian network
Feature selection
Markov boundary
Qualifiers
- survey
- Research
- Refereed
Article Metrics
- Total Citations: 95
- Total Downloads: 2,264
- Downloads (Last 12 months): 569
- Downloads (Last 6 weeks): 80