Abstract
High-throughput technologies such as DNA microarrays enable simultaneous monitoring of the expression levels of thousands of genes during important biological processes and across a collection of experimental conditions. Automatically uncovering functionally related genes is a basic building block for solving various problems in functional genomics. However, a subset of genes may not be similar with respect to all the conditions present in the dataset. This has made bi-clustering popular: it automatically identifies different subsets of genes together with the corresponding subsets of conditions under which those genes are most similar. In the current study, we pose this problem in a multi-objective optimization (MOO) framework in which different bi-cluster quality measures are optimized simultaneously. The search capability of AMOSA, a simulated annealing-based MOO technique, is used for the simultaneous optimization of these measures. A case study on the suitability of different distance measures for solving the bi-clustering problem is also conducted. The effectiveness of the proposed multi-objective bi-clustering approach is demonstrated on three benchmark datasets. The obtained results are further validated using statistical and biological significance tests.
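The abstract does not name the bi-cluster quality measures that are optimized, so the following is only a minimal sketch under stated assumptions. It uses the mean squared residue of Cheng and Church, a widely used bi-cluster coherence measure, as a stand-in objective, and shows the Pareto-dominance test that multi-objective methods rely on when comparing candidate solutions. The function names `mean_squared_residue` and `dominates` and the toy matrix are hypothetical, and this is not claimed to be the authors' implementation.

```python
def mean_squared_residue(data, rows, cols):
    """Mean squared residue of the bi-cluster given by `rows` and `cols`.

    Assumption: used here only as an example quality measure; the
    abstract does not state which measures the paper optimizes.
    """
    sub = [[data[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: deviation from a purely additive row + column model.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (all minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


# Toy expression matrix: genes in rows, conditions in columns (made-up values).
expr = [
    [1, 2, 3],
    [2, 3, 4],
    [5, 9, 0],
]
coherent = mean_squared_residue(expr, rows=[0, 1], cols=[0, 1, 2])
noisy = mean_squared_residue(expr, rows=[0, 1, 2], cols=[0, 1, 2])
```

In this toy matrix the first two genes differ only by a constant shift, so their bi-cluster has zero residue, while adding the third gene raises it. A multi-objective search would trade such a coherence score off against other objectives (for example bi-cluster size) rather than minimizing it alone.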
Notes
(http://www.cplusplus.com/reference/cstdlib/rand/)
http://promodel.com/onlinehelp/promodel/80/C-14%20-%20Rand().htm
Acknowledgements
The first author sincerely thanks Tata Consultancy Services (TCS) for providing funding to conduct this research.
Ethics declarations
Conflict of interest
Authors Sudipta Acharya, Sriparna Saha and Pracheta Sahoo declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Not applicable.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The first two authors contributed equally.
Cite this article
Acharya, S., Saha, S. & Sahoo, P. Bi-clustering of microarray data using a symmetry-based multi-objective optimization framework. Soft Comput 23, 5693–5714 (2019). https://doi.org/10.1007/s00500-018-3227-5
DOI: https://doi.org/10.1007/s00500-018-3227-5