Abstract
Predicting which modules are likely to have faults during operations is important to software developers, so that software enhancement efforts can be focused on those modules that need improvement the most. Modeling software quality with classification trees is attractive because they readily model nonmonotonic relationships. In this paper, we apply the TREEDISC algorithm, which is a refinement of the CHAID algorithm, to build classification-tree models. CHAID-based algorithms differ from other classification-tree algorithms in their reliance on chi-squared tests when building the tree. Classification-tree models are vulnerable to overfitting, where the model reflects the structure of the training data set too closely. Even though a model appears to be accurate on training data, if overfitted, it may be much less accurate when applied to a current data set. To account for the severe consequences of misclassifying fault-prone modules, our measure of overfitting is based on expected costs of misclassification, rather than the total number of misclassifications. We conducted a case study of a very large telecommunications system. A two-way analysis of variance with repetitions found that TREEDISC's significance level was highly related to overfitting, and can be used to control it. Moreover, the minimum number of modules in a leaf also influenced the degree of overfitting.
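The idea of a cost-based overfitting measure can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the cost values, and the definition of the overfitting gap as the test-set cost minus the training-set cost are all assumptions made here for illustration. A Type I error is taken to mean a not fault-prone module classified as fault-prone, and a Type II error the reverse.

```python
def expected_cost(actual, predicted, cost_type1=1.0, cost_type2=10.0):
    """Average misclassification cost per module.

    actual, predicted: equal-length sequences of booleans (True = fault-prone).
    cost_type1 / cost_type2: hypothetical costs of a Type I / Type II error.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Type I: not fault-prone module predicted to be fault-prone.
    type1 = sum(1 for a, p in zip(actual, predicted) if not a and p)
    # Type II: fault-prone module predicted to be not fault-prone.
    type2 = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return (cost_type1 * type1 + cost_type2 * type2) / len(actual)


def overfitting_gap(train_actual, train_pred, test_actual, test_pred, **costs):
    """Assumed measure: cost on test data minus cost on training data."""
    return (expected_cost(test_actual, test_pred, **costs)
            - expected_cost(train_actual, train_pred, **costs))


# Toy data, invented for this example only.
train_actual = [False, False, False, True]
train_pred = [False, False, False, True]   # perfect fit on training data
test_actual = [False, False, True, True]
test_pred = [True, False, False, True]     # one Type I and one Type II error

gap = overfitting_gap(train_actual, train_pred, test_actual, test_pred)
print(gap)
```

Because a Type II error is weighted more heavily than a Type I error in this sketch, two models with the same number of misclassifications can show different gaps, which is the motivation for measuring overfitting by cost rather than by error count.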
Cite this article
Khoshgoftaar, T.M., Allen, E.B. Controlling Overfitting in Classification-Tree Models of Software Quality. Empirical Software Engineering 6, 59–79 (2001). https://doi.org/10.1023/A:1009803004576
DOI: https://doi.org/10.1023/A:1009803004576