Using Generalized Estimating Equation to Learn Decision Tree with Multivariate Responses

Data Mining and Knowledge Discovery

Abstract

Previous decision tree algorithms have used the Mahalanobis distance for multiple continuous longitudinal responses or a generalized entropy index for multiple binary responses. However, these methods are limited to either continuous or binary responses. In this paper, we propose a new tree-based method that can analyze any type of multiple responses by using a statistical approach called generalized estimating equations (GEE). The value of this technique is demonstrated through an application to a web-usage survey.

Author information

Corresponding author

Correspondence to Seong Keon Lee.

Additional information

This work was supported by grant No. R05-2003-000-11281-0 from the Basic Research Program of the Korea Science & Engineering Foundation.

About this article

Cite this article

Lee, S.K., Kang, HC., Han, ST. et al. Using Generalized Estimating Equation to Learn Decision Tree with Multivariate Responses. Data Min Knowl Disc 11, 273–293 (2005). https://doi.org/10.1007/s10618-005-0004-8

  • DOI: https://doi.org/10.1007/s10618-005-0004-8
