Abstract
We address the problem of joint feature selection across multiple related classification or regression tasks. When selecting features for several tasks at once, one can often "borrow strength" across the tasks to obtain a more sensitive criterion for deciding which features to include. We propose a novel method, the Multiple Inclusion Criterion (MIC), which modifies stepwise feature selection so that features helpful across multiple tasks are easier to select. Our approach allows each feature to be added to none, some, or all of the tasks. MIC is most beneficial when selecting a small set of predictive features from a large pool of candidates, as is common in genomic and biological datasets. Experimental results on such datasets show that MIC usually outperforms competing multi-task learning methods, not only in accuracy but also by building simpler and more interpretable models.
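The paper itself gives the full coding scheme; as a rough illustration of the idea, the sketch below implements a greedy forward-stepwise loop in which each candidate feature is scored by an MDL-style penalized gain summed over a subset of tasks. All names, the 2-bits-per-coefficient charge, and the exact penalty terms (log2 of the feature pool size to name the feature once, plus log2 C(T, k) to name the k tasks that use it) are simplified assumptions for illustration, not the authors' actual MIC coding.

```python
import numpy as np
from math import comb, log2

def rss(y, X):
    """Residual sum of squares after a least-squares fit of y on [1, X]."""
    A = np.column_stack([np.ones(len(y))] + ([X] if X.size else []))
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def mic_select(X, Y, max_steps=10):
    """Greedy stepwise selection sharing features across tasks (MIC-style sketch).

    X: (n, p) shared design matrix; Y: (n, T) responses, one column per task.
    Each step picks the (feature, task subset) pair with the largest penalized
    description-length gain; the feature index is coded once and shared, so a
    feature useful in many tasks is cheap per task.
    """
    n, p = X.shape
    T = Y.shape[1]
    selected = [set() for _ in range(T)]              # features chosen per task
    cur = [rss(Y[:, t], X[:, []]) for t in range(T)]  # current RSS per task
    for _ in range(max_steps):
        best_score, best_move = 0.0, None
        for j in range(p):
            cand = []  # (bits gained, task, new rss) for tasks not yet using j
            for t in range(T):
                if j in selected[t]:
                    continue
                new = rss(Y[:, t], X[:, sorted(selected[t] | {j})])
                # bits saved by the shorter residual code, minus ~2 bits
                # (assumed) to code the new coefficient
                bits = 0.5 * n * (log2(max(cur[t], 1e-12)) - log2(max(new, 1e-12)))
                cand.append((bits - 2.0, t, new))
            cand.sort(reverse=True)
            run = -log2(p)  # feature index coded once, shared by all tasks
            for k in range(1, len(cand) + 1):
                run += cand[k - 1][0]
                score = run - log2(comb(T, k))  # bits to say which k tasks use j
                if score > best_score:
                    best_score = score
                    best_move = (j, [(t, new) for _, t, new in cand[:k]])
        if best_move is None:  # no addition shortens the total description
            break
        j, updates = best_move
        for t, new in updates:
            selected[t].add(j)
            cur[t] = new
    return selected
```

On synthetic data where one feature drives all tasks, the shared coding makes that feature cheap to add everywhere, while a task-specific feature is still selectable for its own task, which is the none/some/all behavior the abstract describes.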
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Dhillon, P.S., Tomasik, B., Foster, D., Ungar, L. (2009). Multi-task Feature Selection Using the Multiple Inclusion Criterion (MIC). In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science(), vol 5781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04180-8_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04179-2
Online ISBN: 978-3-642-04180-8
eBook Packages: Computer Science (R0)