Classification and Regression Trees

Johannes Gehrke
Copyright: © 2009 | Pages: 4
ISBN13: 9781605660103 | ISBN10: 1605660108 | EISBN13: 9781605660110
DOI: 10.4018/978-1-60566-010-3.ch031
Cite Chapter

MLA

Gehrke, Johannes. "Classification and Regression Trees." Encyclopedia of Data Warehousing and Mining, Second Edition, edited by John Wang, IGI Global, 2009, pp. 192-195. https://doi.org/10.4018/978-1-60566-010-3.ch031

APA

Gehrke, J. (2009). Classification and Regression Trees. In J. Wang (Ed.), Encyclopedia of Data Warehousing and Mining, Second Edition (pp. 192-195). IGI Global. https://doi.org/10.4018/978-1-60566-010-3.ch031

Chicago

Gehrke, Johannes. "Classification and Regression Trees." In Encyclopedia of Data Warehousing and Mining, Second Edition, edited by John Wang, 192-195. Hershey, PA: IGI Global, 2009. https://doi.org/10.4018/978-1-60566-010-3.ch031


Abstract

The goal of classification and regression is to build a data mining model that can be used for prediction. To construct such a model, we are given a set of training records, each having several attributes. These attributes can be either numerical (for example, age or salary) or categorical (for example, profession or gender). One distinguished attribute is the dependent attribute; the other attributes are called predictor attributes. If the dependent attribute is categorical, the problem is a classification problem; if the dependent attribute is numerical, the problem is a regression problem. The constructed model is then used to predict the value of the dependent attribute for a record in which that value is unknown. (We call such a record an unlabeled record.) Classification and regression have a wide range of applications, including scientific experiments, medical diagnosis, fraud detection, credit approval, and target marketing (Hand, 1997). Many classification and regression models have been proposed in the literature; among the more popular are neural networks, genetic algorithms, Bayesian methods, linear and log-linear models and other statistical methods, decision tables, and tree-structured models, the focus of this chapter (Breiman, Friedman, Olshen, & Stone, 1984). Tree-structured models, so-called decision trees, are easy to understand; they are non-parametric and thus do not rely on assumptions about the data distribution, and they have fast construction methods even for large training datasets (Lim, Loh, & Shih, 2000). Most data mining suites include tools for classification and regression tree construction (Goebel & Gruenwald, 1999).
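To make the distinction between the two tasks concrete, the following minimal sketch (not part of the chapter) builds one tree of each kind. It assumes scikit-learn is installed and uses small, made-up numerical example data; the attribute names and target values are purely illustrative.

```python
# Illustrative sketch: a classification tree and a regression tree with scikit-learn.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Training records with two numerical predictor attributes: age and salary.
X = np.array([[25, 30000], [47, 52000], [35, 41000],
              [52, 60000], [23, 28000], [44, 49000]])

# Classification: the dependent attribute is categorical (e.g., credit approved or not).
y_class = np.array([0, 1, 1, 1, 0, 1])
clf = DecisionTreeClassifier(max_depth=3).fit(X, y_class)
print(clf.predict([[30, 35000]]))   # predicted class label for an unlabeled record

# Regression: the dependent attribute is numerical (e.g., a credit limit).
y_reg = np.array([1000.0, 5000.0, 3000.0, 6000.0, 900.0, 4500.0])
reg = DecisionTreeRegressor(max_depth=3).fit(X, y_reg)
print(reg.predict([[30, 35000]]))   # predicted numerical value for the same record
```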
