Abstract
Context: Accurate estimation underpins planning, budgeting, and control, which makes estimation activities essential to software project success. Several estimation techniques have been developed during the last seven decades. Traditional regression-based estimation is the method most often used in the literature. Building estimation models requires a reference database, which is usually a wedge-shaped dataset when real projects are considered. Regression-based estimation techniques provide low accuracy with this type of database. Objective: To evaluate and provide an alternative to the general practice of using regression-based models, examining whether smooth curve methods and variable selection and regularization methods yield more reliable estimates on wedge-shaped databases. Method: A previous study used a wedge-shaped reference database to build a regression-based estimation model. This paper uses smooth curve methods and variable selection and regularization methods to build estimation models, providing an alternative to linear regression models. Results: The results show an improvement in the estimates when smooth curve methods and variable selection and regularization methods are used instead of regression-based models on wedge-shaped databases. For example, the generalized additive model (GAM) with all variables yields an R-squared of 0.6864 for Effort and 0.7581 for Cost, and a mean magnitude of relative error (MMRE) of 0.1095 for Effort and 0.0578 for Cost. The GAM with LASSO yields an R-squared of 0.6836 for Effort and 0.7519 for Cost, and an MMRE of 0.1105 for Effort and 0.0585 for Cost. In comparison, multiple linear regression (MLR) yields an R-squared of 0.6790 for Effort and 0.7540 for Cost, and an MMRE of 0.1107 for Effort and 0.0582 for Cost.
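For readers unfamiliar with the two accuracy criteria reported above, the following is a minimal sketch of their standard textbook definitions. The function names and the sample values are hypothetical and are not taken from the paper; the abstract does not state how the variables were prepared before these criteria were computed, so the sketch makes no assumption about that.

```python
from statistics import mean


def mmre(actual, predicted):
    """Mean magnitude of relative error: the mean of |y - y_hat| / y."""
    return mean(abs(a - p) / a for a, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - y_bar) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical effort values, for illustration only (not data from the paper).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(f"MMRE      = {mmre(actual, predicted):.4f}")
print(f"R-squared = {r_squared(actual, predicted):.4f}")
```

A lower MMRE and a higher R-squared both indicate a closer fit, which is the sense in which the models are compared in the Results.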
Ethics declarations
The authors declare that they have no conflicts of interest.
APPENDIX A. DATABASE INFORMATION
This appendix presents the main information about the database used, in table form.
Cite this article
Valdés-Souto, F., Naranjo-Albarrán, L. Software Project Estimation Using Smooth Curve Methods and Variable Selection and Regularization Methods as an Alternative to Linear Regression Models when the Reference Database Presents a Wedge-shape Form. Program Comput Soft 48, 716–734 (2022). https://doi.org/10.1134/S0361768822080205
DOI: https://doi.org/10.1134/S0361768822080205