Aggregation of multi-objective fuzzy symmetry-based clustering techniques for improving gene and cancer classification

  • Focus
  • Soft Computing

Abstract

The current work reports on the application of a cluster ensemble approach to combining the results produced by several multiobjective clustering techniques. First, two multiobjective fuzzy clustering techniques are developed using the search capabilities of differential evolution (DE) and particle swarm optimization (PSO). Both techniques use a recently developed point symmetry-based distance to allocate points to clusters. The appropriate partitioning of a data set is identified by simultaneously optimizing two cluster quality measures, namely the Xie–Beni index and the FSym-index: the first objective function uses the Euclidean distance as its similarity measure, and the second uses the point symmetry-based distance in its computation. Each of these clustering techniques produces a set of trade-off solutions on its final Pareto optimal front. Finally, these solutions are combined using a link-based cluster ensemble technique. The effectiveness of the ensemble techniques is illustrated by partitioning several real-life gene expression and cancer data sets, where automatic identification of groups of genes or of cancer tissues is a pressing issue. The effectiveness of the ensemble techniques applied to both the multiobjective DE- and PSO-based clustering approaches is demonstrated through comparison with several state-of-the-art techniques.
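The two distance ingredients named in the abstract, point symmetry-based allocation and a Euclidean-based quality measure, can be sketched in NumPy as follows. This is a minimal sketch, assuming the point symmetry distance of Bandyopadhyay and Saha (2007) with `knear = 2` nearest neighbours found by brute force, standard fuzzy c-means memberships computed from that distance, and the Xie–Beni index with fuzzifier m = 2. The function names are hypothetical, the FSym-index is not reproduced, and the paper's exact allocation rule (for example, any threshold for falling back to the Euclidean distance) may differ.

```python
import numpy as np

# Hypothetical helpers: a sketch of the two objectives' building blocks,
# not the paper's implementation.


def point_symmetry_distance(x, c, data, knear=2):
    """d_ps(x, c): reflect x through centre c, average the distances from the
    reflected point 2c - x to its `knear` nearest data points, and multiply
    by the Euclidean distance between x and c."""
    reflected = 2.0 * c - x
    nearest = np.sort(np.linalg.norm(data - reflected, axis=1))[:knear]
    return nearest.mean() * np.linalg.norm(x - c)


def fuzzy_memberships(dist, m=2.0, eps=1e-12):
    """Fuzzy c-means style memberships from a (K, n) matrix of point-to-centre
    distances; each column sums to 1 (assumed allocation rule)."""
    d = np.maximum(dist, eps)  # avoid division by zero when a point sits on a centre
    ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)


def xie_beni(data, centers, u, m=2.0):
    """Xie-Beni index (minimised): fuzzy within-cluster compactness measured
    with squared Euclidean distance, divided by n times the smallest squared
    distance between two centres."""
    n = data.shape[0]
    sq_dist = ((data[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)  # (K, n)
    compactness = ((u ** m) * sq_dist).sum()
    sep = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sep, np.inf)  # ignore each centre's distance to itself
    return compactness / (n * sep.min())


# Usage on a small hypothetical data set with two hand-picked centres.
data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
centers = np.array([[0.0, 0.5], [10.0, 0.5]])
d_ps = np.array([[point_symmetry_distance(x, c, data) for x in data] for c in centers])
u = fuzzy_memberships(d_ps)
print(xie_beni(data, centers, u))
```

A point whose reflection about a centre lands on (or near) another data point receives a small d_ps and therefore a high membership in that cluster; a small Xie–Beni value indicates compact, well-separated fuzzy clusters.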

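Each clustering run described in the abstract ends with a set of non-dominated trade-off solutions. The minimal sketch below, assuming that the Xie–Beni index is minimised and the FSym-index maximised (so FSym is negated to make both objectives minimisation targets), extracts the first Pareto front from a list of scored candidate partitions by brute-force dominance checks. The scores are made-up illustrative numbers, not results from the paper, and a full DE- or PSO-based optimiser would apply this test repeatedly during its search rather than once.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b when every objective is minimised."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(objectives: List[Tuple[float, float]]) -> List[int]:
    """Indices of the non-dominated solutions (the first Pareto front)."""
    return [
        i
        for i, a in enumerate(objectives)
        if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)
    ]


# Hypothetical candidate partitions scored as (Xie-Beni, FSym); XB is minimised
# and FSym maximised, so FSym is negated before the dominance test.
scores = [(0.30, 1.2), (0.25, 0.9), (0.40, 1.5), (0.35, 1.1), (0.20, 0.7)]
front = pareto_front([(xb, -fsym) for xb, fsym in scores])
print(front)  # → [0, 1, 2, 4]
```

Candidate 3 drops out because candidate 0 has both a lower Xie–Beni value and a higher FSym value; every remaining candidate trades one objective against the other, which is exactly the set an ensemble step would then combine.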

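The final combination step in the abstract uses a link-based cluster ensemble. The sketch below is modelled on the weighted connected-triple idea of Iam-on et al. (2008), assuming Jaccard overlap as the weight between clusters, a decay constant `dc = 0.9`, and average-linkage agglomeration of the refined similarity matrix as the consensus function. These choices and the function names are illustrative assumptions, not the paper's settings.

```python
import numpy as np


def link_based_similarity(partitions, dc=0.9):
    """Refined point-by-point similarity from an ensemble of crisp partitions,
    in the spirit of the weighted connected-triple (WCT) method (assumed form)."""
    partitions = [np.asarray(p) for p in partitions]
    n = len(partitions[0])
    # Index every cluster of every partition and keep its boolean membership mask.
    index, masks = {}, []
    for p_idx, p in enumerate(partitions):
        for lab in np.unique(p):
            index[(p_idx, lab)] = len(masks)
            masks.append(p == lab)
    q = len(masks)
    # Edge weights between clusters: Jaccard overlap of their member sets
    # (zero for two clusters of the same partition, which are disjoint).
    w = np.zeros((q, q))
    for a in range(q):
        for b in range(a + 1, q):
            w[a, b] = w[b, a] = np.sum(masks[a] & masks[b]) / np.sum(masks[a] | masks[b])
    # WCT(a, b): sum over every third cluster k of min(w[a, k], w[b, k]);
    # the zero diagonal of w means a and b themselves never contribute.
    wct = np.zeros((q, q))
    for a in range(q):
        for b in range(a + 1, q):
            wct[a, b] = wct[b, a] = np.minimum(w[a], w[b]).sum()
    cluster_sim = dc * wct / wct.max() if wct.max() > 0 else wct
    # Per partition: 1 if two points share a cluster, otherwise the similarity
    # of their two clusters; the refined matrix is the average over partitions.
    sim = np.zeros((n, n))
    for p_idx, p in enumerate(partitions):
        cid = np.array([index[(p_idx, lab)] for lab in p])
        same = p[:, None] == p[None, :]
        sim += np.where(same, 1.0, cluster_sim[np.ix_(cid, cid)])
    return sim / len(partitions)


def average_linkage(sim, k):
    """Consensus step (assumed): repeatedly merge the two groups with the
    highest mean pairwise similarity until k groups remain."""
    groups = [[i] for i in range(sim.shape[0])]
    while len(groups) > k:
        best, pair = -np.inf, None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                s = sim[np.ix_(groups[a], groups[b])].mean()
                if s > best:
                    best, pair = s, (a, b)
        a, b = pair
        groups[a].extend(groups.pop(b))  # b > a, so index a is unaffected
    labels = np.empty(sim.shape[0], dtype=int)
    for g, members in enumerate(groups):
        labels[members] = g
    return labels


# Three hypothetical partitions of six points, e.g. taken from a Pareto front.
parts = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2], [1, 1, 1, 0, 0, 0]]
consensus = average_linkage(link_based_similarity(parts), k=2)
print(consensus)
```

Points that never share a cluster in one partition still receive a non-zero similarity when their clusters overlap with common clusters in other partitions; this is what distinguishes a link-based matrix from a plain co-association matrix. On the toy input the consensus groups points 0 to 2 and points 3 to 5.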
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Notes

  1. http://cmgm.stanford.edu/pbrown/sporulation.

  2. http://faculty.washington.edu/kayee/cluster.

  3. http://faculty.washington.edu/kayee/cluster.

  4. http://homes.esat.kuleuven.be/thijs/Work/Clustering.html.

  5. http://www.sciencemag.org/feature/data/984559.shl.

  6. http://algorithmics.molgen.mpg.de/Static/Supplements/.

  7. http://www.ailab.si/supp/bi-cancer/projections/info/SRBCT.htm.

References

  • Acharya S, Saha S, Thadisina Y (2016) Multiobjective simulated annealing-based clustering of tissue samples for cancer diagnosis. IEEE J Biomed Health Inform 20(2):691–698

  • Alaei HK, Salahshoor K, Alaei HK (2013) A new integrated on-line fuzzy clustering and segmentation methodology with adaptive PCA approach for process monitoring and fault detection and diagnosis. Soft Comput 17(3):345–362

  • Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson JJ, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, Staudt LM (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769):503–511

  • Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci 96(12):6745–6750

  • Bakhshali M (2017) Segmentation and enhancement of brain MR images using fuzzy clustering based on information theory. Soft Comput. https://doi.org/10.1007/s00500-016-2210-2

  • Bandyopadhyay S, Saha S (2007) GAPS: a clustering method using a new point symmetry based distance measure. Pattern Recognit 40(12):3430–3451

  • Bandyopadhyay S, Saha S (2013) Unsupervised classification—similarity measures, classical and metaheuristic approaches, and applications. Springer, Berlin

  • Bandyopadhyay S, Maulik U, Wang JT (eds) (2007a) Analysis of biological data: a soft computing approach. Volume 3 of science, engineering, and biology informatics. World Scientific, Singapore

  • Bandyopadhyay S, Maulik U, Mukhopadhyay A (2007b) Multiobjective genetic clustering for pixel classification in remote sensing imagery. IEEE Trans Geosci Remote Sens 45(5–2):1506–1511

  • Bandyopadhyay S, Mukhopadhyay A, Maulik U (2007c) An improved algorithm for clustering gene expression data. Bioinformatics 23(21):2859–2865

  • Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum, New York

  • Brunet JP, Tamayo P, Golub TR, Mesirov JP (2004) Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci 101(12):4164–4169

  • Calado P, Cristo M, Gonçalves MA, de Moura ES, Ribeiro-Neto BA, Ziviani N (2006) Link-based similarity measures for the classification of web documents. JASIST 57(2):208–221

  • Chen Y, Li K, Chen Z, Wang J (2017) Restricted gene expression programming: a new approach for parameter identification inverse problems of partial differential equation. Soft Comput 21(10):2651–2663

  • Cherkassky V (1997) The nature of statistical learning theory. IEEE Trans Neural Netw 8(6):1564

  • Chitsaz E, Jahromi MZ (2016) A novel soft subspace clustering algorithm with noise detection for high dimensional datasets. Soft Comput 20(11):4463–4472

  • Das S, Konar A, Chakraborty UK (2005) Two improved differential evolution schemes for faster global search. In: Genetic and evolutionary computation conference, GECCO 2005, proceedings, Washington DC, USA, June 25–29, pp 991–998

  • Das S, Abraham A, Konar A (2008) Automatic clustering using an improved differential evolution algorithm. IEEE Trans Syst Man Cybern Part A 38(1):218–237

  • de Souto MCP, Costa IG, de Araujo DSA, Ludermir TB, Schliep A (2008) Clustering cancer gene expression data: a comparative study. BMC Bioinform 9:497

  • Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197

  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  • Dorigo M, Stützle T (2004) Ant colony optimization. Bradford Company, Scituate

  • Du X, Ni Y, Xie D, Yao X, Ye P, Xiao R (2015) The time complexity analysis of a class of gene expression programming. Soft Comput 19(6):1611–1625

  • Eisen M, Spellman P, Brown P, Botstein D (1998a) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci 95:14863–14868

  • Eisen MB, Spellman PT, Brown PO, Botstein D (1998b) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95(25):14863–8

  • Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Machine learning, proceedings of the twenty-first international conference (ICML 2004), Banff, Alberta, Canada, July 4–8

  • Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588

  • Fred ALN, Jain AK (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850

  • Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701

  • Ghosh D, Chinnaiyan AM (2002) Mixture modelling of gene expression data from microarray experiments. Bioinformatics 18(2):275–286

  • Handl J, Knowles J (2007) An evolutionary approach to multiobjective clustering. IEEE Trans Evol Comput 11(1):56–76

  • Herwig R, Poustka AJ, Müller C, Bull C, Lehrach H, O’Brien J (1999) Large-scale clustering of cDNA-fingerprinting data. Genome Res 9(11):1093–105

  • Heyer LJ, Kruglyak S, Yooseph S (1999) Exploring expression data: identification and analysis of coexpressed genes. Genome Res 9(11):1106–1115

  • Iam-on N, Boongoen T, Garrett SM (2008) Refining pairwise similarity matrix for cluster ensemble problem with cluster relations. In: Discovery science, 11th international conference, DS 2008, Budapest, Hungary, October 13–16, 2008. Proceedings, pp 222–233

  • Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Englewood Cliffs

  • Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323

  • Jarman IH, Etchells TA, Bacciu D, Garibaldi JM, Ellis IO, Lisboa PJG (2011) Clustering of protein expression data: a benchmark of statistical and neural approaches. Soft Comput 15(8):1459–1469

  • Karaboga D, Basturk B (2008) On the performance of artificial bee colony (ABC) algorithm. Appl Soft Comput 8(1):687–697

  • Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proceedings of the IEEE international conference on neural networks, pp 1942–1948

  • Kittler J, Hatef M, Duin RPW, Matas J (1998) On combining classifiers. IEEE Trans Pattern Anal Mach Intell 20(3):226–239

  • Klink S, Reuther P, Weber A, Walter B, Ley M (2006) Analysing social networks within bibliographical data. In: Database and expert systems applications, 17th international conference, DEXA 2006, Kraków, Poland, September 4–8, 2006, Proceedings, pp 234–243

  • Kuo RJ, Wang MJ, Huang TW (2011) An application of particle swarm optimization algorithm to clustering analysis. Soft Comput 15(3):533–542

  • Li D (2011) Gene expression studies with DGL global optimization for the molecular classification of cancer. Soft Comput 15(1):111–129

  • Li Y, Yang G, He H, Jiao L, Shang R (2016) A study of large-scale data clustering based on fuzzy clustering. Soft Comput 20(8):3231–3242

  • Liu L, Hawkins D, Ghosh S, Young S (2003) Robust singular value decomposition analysis of microarray data. Proc Natl Acad Sci 100:13167–13172

  • Lockhart DJ, Winzeler EA (2000) Genomics, gene expression and DNA arrays. Nature 405(6788):827–36

  • Lu Y, Liang M, Ye Z, Cao L (2015) Improved particle swarm optimization algorithm and its application in text feature selection. Appl Soft Comput 35:629–636

  • Maulik U, Saha I (2010) Automatic fuzzy clustering using modified differential evolution for image classification. IEEE Trans Geosci Remote Sens 48(9):3503–3510

  • Maulik U, Mukhopadhyay A, Bandyopadhyay S (2009) Combining pareto-optimal clusters using supervised learning for identifying co-expressed genes. BMC Bioinform 10(27):1197–1208

  • Nemenyi P (1963) Distribution-free multiple comparisons. Ph.D. thesis, New Jersey, USA

  • Ni Q, Pan Q, Du H, Cao C, Zhai Y (2017) A novel cluster head selection algorithm based on fuzzy clustering and particle swarm optimization. IEEE/ACM Trans Comput Biol Bioinform 14(1):76–84

  • Noorbehbahani F, Mousavi SR, Mirzaei A (2015) An incremental mixed data clustering method using a new distance measure. Soft Comput 19(3):731–743

  • Pakhira MK, Maulik U, Bandyopadhyay S (2004) Validity index for crisp and fuzzy clusters. Pattern Recognit 37(3):487–501

  • Re M (2011) Comparing early and late data fusion methods for gene expression prediction. Soft Comput 15(8):1497–1504

  • Rousseeuw P (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20(1):53–65

  • Saha S (2017) Enhancing point symmetry-based distance for data clustering. Soft Comput. https://doi.org/10.1007/s00500-016-2477-3

  • Saha S, Bandyopadhyay S (2009) A new point symmetry based fuzzy genetic clustering technique for automatic evolution of clusters. Inf Sci 179(19):3230–3246

  • Saha S, Ekbal A, Gupta K, Bandyopadhyay S (2013) Gene expression data clustering using a multiobjective symmetry based clustering technique. Comput Biol Med 43(11):1965–1977

  • Saha S, Kaushik K, Alok AK, Acharya S (2016) Multi-objective semi-supervised clustering of tissue samples for cancer diagnosis. Soft Comput 20(9):3381–3392

  • Sharan R, Shamir R (2000) CLICK: a clustering algorithm with applications to gene expression analysis. In: Proceedings of the eighth international conference on intelligent systems for molecular biology, August 19–23, 2000, La Jolla/San Diego, CA, USA, pp 307–316

  • Sherlock G (2000) Analysis of large-scale gene expression data. Curr Opin Immunol 12(2):201–205

  • Storn R, Price K (1997) Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359

  • Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617

  • Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci 96(6):2907–2912

  • Xie XL, Beni G (1991) A validity measure for fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 13(8):841–847

  • Yang X, Deb S (2014) Cuckoo search: recent advances and applications. Neural Comput Appl 24(1):169–174

  • Yin C, Xia L, Zhang S et al (2017) Improved clustering algorithm based on high-speed network data stream. Soft Comput. https://doi.org/10.1007/s00500-017-2708-2

  • Yue S, Wang P, Wang J, Huang T (2013) Extension of the gap statistics index to fuzzy clustering. Soft Comput 17(10):1833–1846

  • Yue S, Wang J, Wang J, Bao X (2016) A new validity index for evaluating the clustering results by partitional clustering algorithms. Soft Comput 20(3):1127–1138

  • Zăvoianu AC, Lughofer E, Bramerdorfer G, Amrhein W, Klement EP (2015) DECMO2: a robust hybrid and adaptive multi-objective evolutionary algorithm. Soft Comput 19(12):3551–3569

  • Zhou Z, Zhu S (2017) Kernel-based multiobjective clustering algorithm with automatic attribute weighting. Soft Comput. https://doi.org/10.1007/s00500-017-2590-y

Acknowledgements

The authors would like to acknowledge the help of the Indian Institute of Technology Patna and the National Institute of Technology Mizoram in conducting this research.

Author information

Corresponding author

Correspondence to Sriparna Saha.

Ethics declarations

Conflict of interest

All the authors declare that they do not have any conflict of interest.

Human and animal rights

We have not performed any experiments involving animals or humans.

Additional information

Communicated by S. Deb, T. Hanne, K. C. Wong.

About this article

Cite this article

Saha, S., Das, R. & Pakray, P. Aggregation of multi-objective fuzzy symmetry-based clustering techniques for improving gene and cancer classification. Soft Comput 22, 5935–5954 (2018). https://doi.org/10.1007/s00500-017-2865-3

  • DOI: https://doi.org/10.1007/s00500-017-2865-3
