Software cost estimation based on modified K-Modes clustering Algorithm

Natural Computing

Abstract

Unsupervised techniques such as clustering may be used for software cost estimation in situations where parametric models are difficult to develop. This paper presents a software cost estimation model based on a modified K-Modes clustering algorithm. The paper has two aims: first, to present the modified K-Modes algorithm, which enhances the simple K-Modes algorithm by using a dissimilarity measure suited to mixed data types; and second, to apply the proposed algorithm to software cost estimation. We compared our modified K-Modes algorithm with existing algorithms on several software cost estimation datasets, and the results showed the effectiveness of the proposed algorithm.
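The abstract does not spell out the modified dissimilarity measure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a generic mixed-type measure (squared Euclidean distance on numeric attributes plus a weighted count of categorical mismatches) and estimates the effort of a new project as the mean effort of its nearest cluster. All names (`mixed_dissimilarity`, `cluster`, `estimate_effort`, `gamma`) and the toy data are hypothetical.

```python
from collections import Counter

def mixed_dissimilarity(a, b, numeric_idx, gamma=1.0):
    """Squared Euclidean distance on numeric attributes plus gamma times
    the number of mismatching categorical attributes (simple matching).
    This measure is an assumption, not the one proposed in the paper."""
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j in numeric_idx:
            total += (x - y) ** 2
        elif x != y:
            total += gamma
    return total

def update_center(members, numeric_idx):
    """Use the mean for numeric attributes and the mode for categorical ones."""
    center = []
    for j in range(len(members[0])):
        column = [m[j] for m in members]
        if j in numeric_idx:
            center.append(sum(column) / len(column))
        else:
            center.append(Counter(column).most_common(1)[0][0])
    return tuple(center)

def cluster(projects, k, numeric_idx, gamma=1.0, max_iter=20):
    """Alternate assignment and center updates until assignments stabilise.
    The first k projects seed the centers (an arbitrary, deterministic choice)."""
    centers = [tuple(p) for p in projects[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k),
                key=lambda c: mixed_dissimilarity(p, centers[c], numeric_idx, gamma))
            for p in projects
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(projects, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = update_center(members, numeric_idx)
    return centers, labels

def estimate_effort(new_project, centers, labels, efforts, numeric_idx, gamma=1.0):
    """Estimate effort as the mean known effort of the nearest cluster."""
    nearest = min(
        range(len(centers)),
        key=lambda c: mixed_dissimilarity(new_project, centers[c], numeric_idx, gamma),
    )
    known = [e for e, lab in zip(efforts, labels) if lab == nearest]
    return sum(known) / len(known)

# Made-up toy data: (normalised size, language, team experience)
projects = [
    (0.10, "java", "high"),
    (0.90, "cobol", "low"),
    (0.15, "java", "high"),
    (0.85, "cobol", "low"),
]
efforts = [10.0, 100.0, 12.0, 90.0]  # person-months, also made up

centers, labels = cluster(projects, k=2, numeric_idx={0})
estimate = estimate_effort((0.20, "java", "low"), centers, labels, efforts, {0})
```

In this sketch `gamma` balances the numeric and categorical parts of the measure. Numeric attributes are assumed to be scaled to a comparable range beforehand, since otherwise a single large-valued attribute would dominate the categorical mismatches.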

Author information

Corresponding author

Correspondence to Partha Sarathi Bishnu.

Cite this article

Bishnu, P.S., Bhattacherjee, V. Software cost estimation based on modified K-Modes clustering Algorithm. Nat Comput 15, 415–422 (2016). https://doi.org/10.1007/s11047-015-9492-7

  • DOI: https://doi.org/10.1007/s11047-015-9492-7
