Abstract
Analogy-based software effort estimation (ASEE) plays an important role in software development. It continues to attract researchers because its reasoning method is simple and resembles human reasoning: the effort of a new project is estimated by reusing the effort values of similar past projects. The appropriate number of similar past projects (analogues) to reuse is still a topic of debate in ASEE research. The reliability and accuracy of ASEE methods are also considerably affected by the quality of the software repositories (datasets); a dataset that does not follow the ASEE principle is not considered useful for the ASEE method. This article presents a novel ASEE approach for finding the appropriate number of analogues from quality datasets. Its data pre-processing stage is based on Spearman’s rank-order correlation and the Kruskal–Wallis test, and it handles categorical attributes (both nominal and ordinal) individually. Spearman’s rank-order correlation is used to find reliable numerical and ordinal attributes, and the Kruskal–Wallis test identifies reliable nominal attributes. Reliable attributes are those that significantly influence the effort. The experimental results show that the proposed approach improves the quality of the dataset and the selection of attributes from the metadata, and reduces abnormal observations and overall project development cost.
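The attribute screening described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy project data, the |rho| cutoff of 0.5 and the use of the chi-square critical value 3.841 (one degree of freedom, 5% level) are all assumptions made for illustration. The Kruskal–Wallis statistic is computed without tie correction, and the abstract does not state which significance rule the article applies.

```python
from collections import defaultdict
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def kruskal_wallis_h(labels, effort):
    """Kruskal-Wallis H (no tie correction) for effort grouped by label."""
    ranks = average_ranks(effort)
    rank_sums = defaultdict(float)
    sizes = defaultdict(int)
    for label, rank in zip(labels, ranks):
        rank_sums[label] += rank
        sizes[label] += 1
    n = len(effort)
    total = sum(rank_sums[g] ** 2 / sizes[g] for g in rank_sums)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical toy dataset: six past projects.
effort = [100, 210, 290, 405, 520, 600]        # e.g. person-hours
size = [10, 20, 30, 40, 50, 60]                # numerical attribute
language = ["A", "A", "A", "B", "B", "B"]      # nominal attribute
team = ["X", "Y", "X", "Y", "X", "Y"]          # nominal attribute

RHO_CUTOFF = 0.5        # assumed threshold, not taken from the article
CHI2_CRIT_DF1 = 3.841   # chi-square critical value, df = 1, alpha = 0.05

print("size reliable:", abs(spearman_rho(size, effort)) >= RHO_CUTOFF)
print("language reliable:", kruskal_wallis_h(language, effort) > CHI2_CRIT_DF1)
print("team reliable:", kruskal_wallis_h(team, effort) > CHI2_CRIT_DF1)
```

With only six projects the chi-square approximation is crude, so real use would call for p-values from a statistics library and a tie-corrected H.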
Funding
This research received no specific grant from any funding agency.
Contributions
All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This manuscript is the authors’ own original work, which has not been previously published elsewhere.
Informed consent
Research does not involve humans.
About this article
Cite this article
Pal, N., Yadav, M.P. & Yadav, D.K. Appropriate number of analogues in analogy based software effort estimation using quality datasets. Cluster Comput 27, 531–546 (2024). https://doi.org/10.1007/s10586-023-03967-2
DOI: https://doi.org/10.1007/s10586-023-03967-2