Software Project Estimation Using Smooth Curve Methods and Variable Selection and Regularization Methods as an Alternative to Linear Regression Models when the Reference Database Presents a Wedge-shape Form

Programming and Computer Software

Abstract

Context: Accurate estimation underpins planning, budgeting, and control, which makes estimation activities essential to software project success. Several estimation techniques have been developed over the last seven decades, and traditional regression-based estimation is the method most often used in the literature. Building a model requires a reference database, which is usually wedge-shaped when real projects are considered. Regression-based estimation techniques provide low accuracy with this type of database.

Objective: To evaluate, and provide an alternative to, the general practice of using regression-based models by examining whether smooth curve methods and variable selection and regularization methods yield more reliable estimates from wedge-shaped databases.

Method: A previous study used a wedge-shaped reference database to build a regression-based estimation model. This paper applies smooth curve methods and variable selection and regularization methods to build estimation models, providing an alternative to linear regression models.

Results: The results show an improvement in estimation when smooth curve methods and variable selection and regularization methods are used instead of regression-based models on wedge-shaped databases. For example, the generalized additive model (GAM) with all variables gives an R-squared of 0.6864 for Effort and 0.7581 for Cost, and an MMRE of 0.1095 for Effort and 0.0578 for Cost. The GAM with LASSO gives an R-squared of 0.6836 for Effort and 0.7519 for Cost, and an MMRE of 0.1105 for Effort and 0.0585 for Cost. By comparison, multiple linear regression (MLR) gives an R-squared of 0.6790 for Effort and 0.7540 for Cost, and an MMRE of 0.1107 for Effort and 0.0582 for Cost.
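The two accuracy measures reported in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions (MMRE as the mean of |actual − predicted| / actual, and R-squared as 1 − SS_res / SS_tot); the paper's exact computation, including any data transformation, may differ. The function names and the sample effort values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def mmre(actual, predicted):
    """Mean magnitude of relative error: mean of |a - p| / a over all projects."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(a - p) / a for a, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    y_bar = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - y_bar) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical effort values (e.g. person-hours); illustrative only.
actual_effort = [100.0, 200.0, 400.0, 800.0]
predicted_effort = [110.0, 180.0, 420.0, 760.0]

print(f"MMRE      = {mmre(actual_effort, predicted_effort):.4f}")
print(f"R-squared = {r_squared(actual_effort, predicted_effort):.4f}")
```

A lower MMRE and a higher R-squared both indicate a closer fit, which is the sense in which the GAM, GAM with LASSO, and MLR figures above are compared.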



Author information

Corresponding authors

Correspondence to Francisco Valdés-Souto or Lizbeth Naranjo-Albarrán.

Ethics declarations

The authors declare that they have no conflicts of interest.

APPENDIX A. DATABASE INFORMATION

This appendix presents the main information about the database used, in table form.

Table S1. Summary of database information

About this article

Cite this article

Valdés-Souto, F., Naranjo-Albarrán, L. Software Project Estimation Using Smooth Curve Methods and Variable Selection and Regularization Methods as an Alternative to Linear Regression Models when the Reference Database Presents a Wedge-shape Form. Program Comput Soft 48, 716–734 (2022). https://doi.org/10.1134/S0361768822080205

  • DOI: https://doi.org/10.1134/S0361768822080205
