Abstract
Feature selection is a crucial preprocessing step in data analytics and machine learning. Classical feature selection algorithms select features based on the correlations between predictive features and the class variable and do not attempt to capture the causal relationships between them. It has been shown that knowledge of the causal relationships between features and the class variable can help in building interpretable and robust prediction models, since causal relationships imply the underlying mechanism of a system. Consequently, causality-based feature selection has attracted increasing attention, and many algorithms have been proposed. In this article, we present a comprehensive review of recent advances in causality-based feature selection. To facilitate the development of new algorithms in this research area and to ease comparison between new and existing methods, we develop the first open-source package, called CausalFS, which includes most of the representative causality-based feature selection algorithms (available at https://github.com/kuiy/CausalFS). Using CausalFS, we conduct extensive experiments to compare the representative algorithms on both synthetic and real-world datasets. Finally, we discuss some challenging problems to be tackled in future research.
Supplemental Material
Supplemental movie, appendix, image, and software files for Causality-based Feature Selection: Methods and Evaluations