Abstract
Determining the causal relations among attributes in a domain is a key task in data mining and knowledge discovery. The Minimum Message Length (MML) principle has proven effective for discovering linear causal models from training data. To improve efficiency, this paper proposes a novel Markov Blanket identification algorithm based on the Lasso estimator. For each variable, the algorithm first generates a Lasso tree, which represents a pruned set of candidate feature sets. The MML principle is then used to evaluate every candidate, and the feature set with the minimum message length is chosen as the Markov Blanket. Our experimental results demonstrate the effectiveness of the algorithm. In addition, the algorithm can be used to prune the search space of causal discovery and thereby reduce the computational cost of score-based causal discovery algorithms.
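The two-stage idea in the abstract (generate a small set of candidate feature sets with the Lasso, then keep the one with the shortest code length) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the abstract does not say how the Lasso tree is built, so the sketch simply collects the distinct supports found along a grid of penalty values, and it scores candidates with a BIC-style two-part code length as a stand-in for the MML message length. The function names (`lasso_cd`, `ols_rss`, `markov_blanket`), the penalty grid, and the synthetic data are all hypothetical.

```python
import math
import random

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, kept up to date
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (column j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            # Soft-thresholding update.
            new = math.copysign(max(abs(rho) - lam, 0.0), rho) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta

def ols_rss(X, y, cols):
    """Residual sum of squares of least squares of y on the given columns."""
    n, k = len(y), len(cols)
    if k == 0:
        return sum(v * v for v in y)
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in cols]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in cols]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    b = [A[r][k] / A[r][r] for r in range(k)]
    return sum((y[i] - sum(bj * X[i][cj] for bj, cj in zip(b, cols))) ** 2
               for i in range(n))

def markov_blanket(data, target, lambdas):
    """Pick the candidate support with the shortest (proxy) code length."""
    n, p = len(data), len(data[0])
    others = [j for j in range(p) if j != target]
    X = [[row[j] for j in others] for row in data]
    y = [row[target] for row in data]
    # Candidate feature sets: distinct Lasso supports (assumed stand-in for the Lasso tree).
    candidates = {frozenset()}
    for lam in lambdas:
        beta = lasso_cd(X, y, lam)
        candidates.add(frozenset(j for j, b in enumerate(beta) if abs(b) > 1e-8))

    def code_length(support):
        # BIC-style two-part length: data given model + parameter cost (NOT true MML).
        rss = ols_rss(X, y, sorted(support))
        return 0.5 * n * math.log(rss / n) + 0.5 * len(support) * math.log(n)

    best = min(candidates, key=code_length)
    return sorted(others[j] for j in best)

# Synthetic zero-mean data: variable 2 depends linearly on variables 0 and 1 only.
random.seed(0)
data = []
for _ in range(300):
    x0, x1, x3, x4 = (random.gauss(0, 1) for _ in range(4))
    x2 = 0.8 * x0 - 0.6 * x1 + 0.3 * random.gauss(0, 1)
    data.append([x0, x1, x2, x3, x4])

blanket = markov_blanket(data, target=2, lambdas=[1.0, 0.6, 0.3, 0.15])
print(blanket)
```

The sketch assumes zero-mean variables and fits no intercept. Restricting the scoring step to the supports returned by the Lasso is what keeps the candidate set small; an exhaustive search would instead score every subset of the remaining variables.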
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, G., Dai, H., Tu, Y. (2004). Identifying Markov Blankets Using Lasso Estimation. In: Dai, H., Srikant, R., Zhang, C. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science, vol 3056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24775-3_39
DOI: https://doi.org/10.1007/978-3-540-24775-3_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22064-0
Online ISBN: 978-3-540-24775-3
eBook Packages: Springer Book Archive