Abstract
Previous decision tree algorithms have used the Mahalanobis distance for multiple continuous longitudinal responses or a generalized entropy index for multiple binary responses. However, these methods are limited to either continuous or binary responses. In this paper, we propose a new tree-based method that can analyze any type of multiple responses by using a statistical approach called generalized estimating equations (GEE). The value of this new technique is demonstrated through an application to web-usage survey data.
Additional information
This work was supported by grant No. R05-2003-000-11281-0 from the Basic Research Program of the Korea Science & Engineering Foundation.
Cite this article
Lee, S.K., Kang, HC., Han, ST. et al. Using Generalized Estimating Equation to Learn Decision Tree with Multivariate Responses. Data Min Knowl Disc 11, 273–293 (2005). https://doi.org/10.1007/s10618-005-0004-8
DOI: https://doi.org/10.1007/s10618-005-0004-8