Software cost estimation based on modified K-Modes clustering Algorithm

Bishnu, Partha Sarathi; Bhattacherjee, Vandana

doi:10.1007/s11047-015-9492-7

Published: 08 February 2015

Volume 15, pages 415–422 (2016)

Natural Computing

Abstract

Unsupervised techniques such as clustering may be used for software cost estimation in situations where parametric models are difficult to develop. This paper presents a software cost estimation model based on a modified K-Modes clustering algorithm. The paper has two aims: first, to present the modified K-Modes algorithm, which enhances the simple K-Modes algorithm by using an appropriate dissimilarity measure for mixed data types; and second, to apply the proposed algorithm to software cost estimation. We compared the modified K-Modes algorithm with existing algorithms on several software cost estimation datasets, and the results showed the effectiveness of the proposed algorithm.
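To make the idea concrete, the following is a minimal sketch of the simple (baseline) K-Modes algorithm with the simple matching dissimilarity, followed by a cluster-based effort estimate. It is not the modified algorithm or the mixed-type dissimilarity measure proposed in the paper, which the abstract does not specify. The function names, the `init_modes` parameter, and the choice of the cluster's mean effort as the estimate are all assumptions made for illustration, and the sketch handles categorical attributes only.

```python
from collections import Counter
import random


def mismatch(a, b):
    """Simple matching dissimilarity: count of attributes whose values differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Return the most frequent value of each attribute across the given rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def nearest(record, modes):
    """Index of the mode closest to the record (ties go to the lowest index)."""
    return min(range(len(modes)), key=lambda j: mismatch(record, modes[j]))


def k_modes(records, k, init_modes=None, max_iter=20, seed=0):
    """Baseline K-Modes on categorical tuples.

    init_modes is a hypothetical convenience parameter: when omitted,
    k records are sampled at random as the starting modes.
    """
    rng = random.Random(seed)
    modes = list(init_modes) if init_modes is not None else rng.sample(records, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(r, modes) for r in records]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old mode if a cluster is empty
                modes[j] = column_modes(members)
    return modes, labels


def estimate_effort(project, modes, labels, efforts):
    """Assumed estimator: mean historical effort of the project's nearest cluster."""
    j = nearest(project, modes)
    cluster = [e for e, lab in zip(efforts, labels) if lab == j]
    if not cluster:  # fall back to the overall mean for an empty cluster
        cluster = efforts
    return sum(cluster) / len(cluster)
```

In use, each historical project would be encoded as a tuple of categorical attributes with a known effort value; a new project is assigned to the nearest mode and inherits that cluster's average effort. Numeric attributes would need a different dissimilarity term, which is the gap a mixed-type measure is meant to address.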

Author information

Authors and Affiliations

Birla Institute of Technology, Ranchi, 834001, India
Partha Sarathi Bishnu & Vandana Bhattacherjee

Corresponding author

Correspondence to Partha Sarathi Bishnu.

About this article

Cite this article

Bishnu, P.S., Bhattacherjee, V. Software cost estimation based on modified K-Modes clustering Algorithm. Nat Comput 15, 415–422 (2016). https://doi.org/10.1007/s11047-015-9492-7

Issue Date: September 2016
DOI: https://doi.org/10.1007/s11047-015-9492-7
