Abstract
Unsupervised techniques such as clustering can be used for software cost estimation in situations where parametric models are difficult to develop. This paper presents a software cost estimation model based on a modified K-Modes clustering algorithm. The paper has two aims: first, to present the modified K-Modes algorithm, which enhances the simple K-Modes algorithm by using a dissimilarity measure suited to mixed data types; and second, to apply the modified algorithm to software cost estimation. We compared the modified K-Modes algorithm with existing algorithms on several software cost estimation datasets, and the results show the effectiveness of the proposed algorithm.
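For orientation, the following is a minimal sketch of the baseline (simple) K-Modes procedure and of one way a clustering could be used for estimation. It is not the modified algorithm of this paper, whose dissimilarity measure is not given in the abstract. The sketch assumes purely categorical attributes, the simple matching dissimilarity, caller-supplied initial modes, and an estimate taken as the mean effort of the nearest cluster. The function names, the toy project records, and the effort values are all hypothetical.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(records):
    """Most frequent value of each attribute across the given records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def nearest_mode(record, modes):
    """Index of the mode with the smallest dissimilarity to the record."""
    return min(range(len(modes)),
               key=lambda j: matching_dissimilarity(record, modes[j]))

def k_modes(records, initial_modes, max_iter=20):
    """Baseline K-Modes: assign records to the nearest mode, recompute modes."""
    modes = list(initial_modes)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest_mode(r, modes) for r in records]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = column_modes(members)
    return modes, labels

def estimate_effort(project, modes, labels, efforts):
    """Assumed estimator: mean effort of the cluster nearest to the project.

    Assumes the nearest cluster has at least one member.
    """
    j = nearest_mode(project, modes)
    cluster_efforts = [e for e, lab in zip(efforts, labels) if lab == j]
    return sum(cluster_efforts) / len(cluster_efforts)

# Hypothetical toy data: (domain, language, size class) and effort per project.
projects = [
    ("web", "java", "small"),
    ("web", "java", "medium"),
    ("web", "python", "small"),
    ("embedded", "c", "large"),
    ("embedded", "c", "medium"),
    ("embedded", "cpp", "large"),
]
efforts = [10.0, 14.0, 12.0, 40.0, 36.0, 44.0]

modes, labels = k_modes(projects, [projects[0], projects[3]])
estimate = estimate_effort(("web", "python", "medium"), modes, labels, efforts)
print(modes, labels, estimate)
```

Handling mixed data would require replacing `matching_dissimilarity` with a measure that also accounts for numeric attributes, which is the part the paper modifies.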
Cite this article
Bishnu, P.S., Bhattacherjee, V. Software cost estimation based on modified K-Modes clustering Algorithm. Nat Comput 15, 415–422 (2016). https://doi.org/10.1007/s11047-015-9492-7
DOI: https://doi.org/10.1007/s11047-015-9492-7