Software Project Estimation Using Smooth Curve Methods and Variable Selection and Regularization Methods as an Alternative to Linear Regression Models when the Reference Database Presents a Wedge-shape Form

Valdés-Souto, Francisco; Naranjo-Albarrán, Lizbeth

doi:10.1134/S0361768822080205

Published: 21 December 2022

Programming and Computer Software, volume 48, pages 716–734 (2022)

Abstract

Context: Accurate estimation underpins planning, budgeting, and control, which makes estimation activities essential to software project success. Several estimation techniques have been developed during the last seven decades, and traditional regression-based estimation is the method most often used in the literature. Building a model requires a reference database, which is usually wedge-shaped when real projects are considered, and regression-based estimation techniques provide low accuracy with this type of database.

Objective: To evaluate an alternative to the general practice of using regression-based models by examining whether smooth curve methods and variable selection and regularization methods yield more reliable estimates from wedge-shaped databases.

Method: A previous study used a wedge-shaped reference database to build a regression-based estimation model. This paper applies smooth curve methods and variable selection and regularization methods to build estimation models, providing an alternative to linear regression models.

Results: The results show an improvement in the estimates when smooth curve methods and variable selection and regularization methods are used instead of regression-based models on wedge-shaped databases. For example, the generalized additive model (GAM) with all the variables gives an R-squared of 0.6864 for Effort and 0.7581 for Cost, and an MMRE of 0.1095 for Effort and 0.0578 for Cost. The GAM with LASSO gives an R-squared of 0.6836 for Effort and 0.7519 for Cost, and an MMRE of 0.1105 for Effort and 0.0585 for Cost. By comparison, multiple linear regression (MLR) gives an R-squared of 0.6790 for Effort and 0.7540 for Cost, and an MMRE of 0.1107 for Effort and 0.0582 for Cost.
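The two accuracy measures quoted in the abstract, R-squared and MMRE (mean magnitude of relative error), can be illustrated with a minimal sketch. The preview does not show the paper's exact definitions or any data transformation, so the standard textbook forms are assumed here, and the project values below are hypothetical and are not taken from the paper's database.

```python
from statistics import mean


def mmre(actual, predicted):
    """Mean magnitude of relative error: mean(|y - y_hat| / y).

    Assumes every actual value is strictly positive.
    """
    return mean(abs(a - p) / a for a, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - y_bar) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical effort values (e.g. person-hours) for four projects.
actual = [100.0, 200.0, 400.0, 800.0]
predicted = [110.0, 180.0, 400.0, 880.0]

print(f"MMRE      = {mmre(actual, predicted):.4f}")
print(f"R-squared = {r_squared(actual, predicted):.4f}")
```

A lower MMRE and a higher R-squared both indicate a closer fit, which is the sense in which the abstract compares GAM, GAM with LASSO, and MLR.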

Author information

Authors and Affiliations

National Autonomous University of Mexico Science Faculty Investigación Científica, C.U., Coyoacán, 04510, Ciudad de México, CDMX, México
Francisco Valdés-Souto
National Autonomous University of Mexico Science Faculty Investigación Científica, C.U., Coyoacán, 04510, Ciudad de México, CDMX, México
Lizbeth Naranjo-Albarrán

Corresponding authors

Correspondence to Francisco Valdés-Souto or Lizbeth Naranjo-Albarrán.

Ethics declarations

The authors declare that they have no conflicts of interest.

APPENDIX A. DATABASE INFORMATION

This appendix presents the main information about the database in tabular form.

Table S1. Summary of database information

About this article

Valdés-Souto, F., Naranjo-Albarrán, L. Software Project Estimation Using Smooth Curve Methods and Variable Selection and Regularization Methods as an Alternative to Linear Regression Models when the Reference Database Presents a Wedge-shape Form. Program Comput Soft 48, 716–734 (2022). https://doi.org/10.1134/S0361768822080205

Received: 20 June 2022
Revised: 18 July 2022
Accepted: 20 August 2022
Published: 21 December 2022
Issue Date: December 2022
DOI: https://doi.org/10.1134/S0361768822080205
