
Software Metrics Data Analysis—Exploring the Relative Performance of Some Commonly Used Modeling Techniques

Empirical Software Engineering

Abstract

Whilst some software measurement research has been unquestionably successful, other research has struggled to enable expected advances in project and process management. Contributing to this lack of advancement has been the incidence of inappropriate or non-optimal application of various model-building procedures. This obviously raises questions over the validity and reliability of any results obtained as well as the conclusions that may have been drawn regarding the appropriateness of the techniques in question. In this paper we investigate the influence of various data set characteristics and the purpose of analysis on the effectiveness of four model-building techniques—three statistical methods and one neural network method. In order to illustrate the impact of data set characteristics, three separate data sets, drawn from the literature, are used in this analysis. In terms of predictive accuracy, it is shown that no one modeling method is best in every case. Some consideration of the characteristics of data sets should therefore occur before analysis begins, so that the most appropriate modeling method is then used. Moreover, issues other than predictive accuracy may have a significant influence on the selection of model-building methods. These issues are also addressed here and a series of guidelines for selecting among and implementing these and other modeling techniques is discussed.
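The claim that no single modeling method is best in every case can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the data are synthetic, the two estimators (ordinary least squares and the Theil-Sen median-of-slopes estimator) are stand-ins and not necessarily the methods compared in the paper, and hold-out mean absolute error is just one possible measure of predictive accuracy.

```python
from statistics import median


def fit_least_squares(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_theil_sen(xs, ys):
    """Theil-Sen line: median of pairwise slopes, then median residual offset."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[j] != xs[i]
    ]
    slope = median(slopes)
    intercept = median([y - slope * x for x, y in zip(xs, ys)])
    return slope, intercept


def mean_abs_error(model, xs, ys):
    """Mean absolute error of a (slope, intercept) model on the given points."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical candidate methods, keyed by name.
METHODS = {"least_squares": fit_least_squares, "theil_sen": fit_theil_sen}


def best_method(train_x, train_y, test_x, test_y):
    """Fit every method on the training data; return the name with lowest hold-out error."""
    errors = {
        name: mean_abs_error(fit(train_x, train_y), test_x, test_y)
        for name, fit in METHODS.items()
    }
    return min(errors, key=errors.get)


# Synthetic data: the underlying relationship is y = 2x.
train_x = [1, 2, 3, 4, 5]
clean_y = [2.1, 3.8, 6.1, 8.0, 10.0]     # small, balanced noise
outlier_y = [2.1, 3.8, 6.1, 8.0, 20.0]   # same data with one gross outlier
test_x, test_y = [6, 7], [12.0, 14.0]

winner_clean = best_method(train_x, clean_y, test_x, test_y)
winner_outlier = best_method(train_x, outlier_y, test_x, test_y)
print(winner_clean, winner_outlier)
```

On this toy data the least-squares fit has the lower hold-out error for the clean set, while the median-based fit has the lower error once the outlier is present. That is the kind of dependence on data set characteristics the abstract describes, shown here only on made-up numbers.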




Cite this article

Gray, A.R., MacDonell, S.G. Software Metrics Data Analysis—Exploring the Relative Performance of Some Commonly Used Modeling Techniques. Empirical Software Engineering 4, 297–316 (1999). https://doi.org/10.1023/A:1009849100780


  • DOI: https://doi.org/10.1023/A:1009849100780
