Evolutionary soft co-clustering: formulations, algorithms, and applications

Data Mining and Knowledge Discovery

Abstract

We consider the co-clustering of time-varying data using evolutionary co-clustering methods. Existing approaches are based on the spectral learning framework and therefore lack a probabilistic interpretation. In this paper, we overcome this limitation by developing a probabilistic model. The proposed model assumes that the observed data are generated via a two-step process that depends on the historical co-clusters, which allows us to capture temporal smoothness in a probabilistically principled manner. We present an EM-based algorithm for maximum likelihood parameter estimation and establish its convergence. An appealing feature of the proposed model is that it naturally leads to soft co-clustering assignments. We evaluate the proposed method on both synthetic and real-world data sets. Experimental results show that our method consistently outperforms prior approaches based on spectral methods. To fully explore the real-world impact of our method, we further perform a systematic application study on the analysis of Drosophila gene expression pattern images. We encode the spatial gene expression information at a particular developmental time point into a data matrix using a mesh-generation pipeline. We then use our evolutionary co-clustering method to co-cluster the embryonic domains and the genes simultaneously across multiple time points. Results show that the co-clusters of genes and embryonic domains reflect the underlying biology.
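The abstract does not spell out the model, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' formulation. We assume a PLSA-style co-clustering model in which each observed entry (i, j) is generated in two steps: first a co-cluster (k, l) is drawn with probability P(k, l), then row i is drawn from P(i | k) and column j from P(j | l). The parameters are fitted by EM, and soft row assignments follow as P(k | i) ∝ P(k) P(i | k). Temporal smoothness is imitated here by adding pseudo-counts proportional to the previous time point's memberships, weighted by a hypothetical parameter `lam`; the function names `fit`, `em_step`, `smooth_columns`, and `soft_row_clusters` are illustrative only.

```python
# Hypothetical sketch: a PLSA-style soft co-clustering model fitted by EM, with
# pseudo-count smoothing toward the previous time point. Not the paper's model.
import math
import random


def normalize(vals):
    """Scale non-negative numbers so they sum to one."""
    total = sum(vals)
    return [v / total for v in vals]


def random_stochastic(n_rows, n_cols, rng):
    """n_rows x n_cols matrix whose columns each sum to one."""
    cols = [normalize([rng.random() + 0.1 for _ in range(n_rows)]) for _ in range(n_cols)]
    return [[cols[c][r] for c in range(n_cols)] for r in range(n_rows)]


def log_likelihood(X, pkl, row, col):
    """sum_ij X[i][j] * log P(i, j), with P(i, j) = sum_kl P(k,l) P(i|k) P(j|l)."""
    K, L = len(pkl), len(pkl[0])
    ll = 0.0
    for i, xi in enumerate(X):
        for j, x in enumerate(xi):
            if x:
                p = sum(pkl[k][l] * row[i][k] * col[j][l] for k in range(K) for l in range(L))
                ll += x * math.log(p)
    return ll


def smooth_columns(counts, lam, prev):
    """Column-normalize counts after adding lam * prev (prev may be None)."""
    n_rows, n_cols = len(counts), len(counts[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        column = [counts[r][c] + (lam * prev[r][c] if prev else 0.0) for r in range(n_rows)]
        for r, p in enumerate(normalize(column)):
            out[r][c] = p
    return out


def em_step(X, pkl, row, col, lam, prev_row, prev_col):
    """One EM iteration; lam > 0 adds pseudo-counts from the previous time point."""
    n, m, K, L = len(X), len(X[0]), len(pkl), len(pkl[0])
    N = [[0.0] * L for _ in range(K)]  # expected counts per co-cluster (k, l)
    A = [[0.0] * K for _ in range(n)]  # expected counts of row i in row cluster k
    B = [[0.0] * L for _ in range(m)]  # expected counts of column j in column cluster l
    for i in range(n):
        for j in range(m):
            if X[i][j] == 0:
                continue
            # E-step: posterior q(k, l | i, j) over the latent co-cluster
            w = [[pkl[k][l] * row[i][k] * col[j][l] for l in range(L)] for k in range(K)]
            z = sum(map(sum, w))
            for k in range(K):
                for l in range(L):
                    r = X[i][j] * w[k][l] / z
                    N[k][l] += r
                    A[i][k] += r
                    B[j][l] += r
    # M-step: renormalize expected counts (plus optional smoothing pseudo-counts)
    total = sum(map(sum, N))
    new_pkl = [[N[k][l] / total for l in range(L)] for k in range(K)]
    return new_pkl, smooth_columns(A, lam, prev_row), smooth_columns(B, lam, prev_col)


def fit(X, K, L, iters=50, lam=0.0, prev=None, seed=0):
    """Fit one time point; warm-start from and smooth toward prev=(pkl, row, col)."""
    if prev is None:
        rng = random.Random(seed)
        pkl = [[1.0 / (K * L)] * L for _ in range(K)]
        row = random_stochastic(len(X), K, rng)
        col = random_stochastic(len(X[0]), L, rng)
        prev_row = prev_col = None
    else:
        pkl, row, col = prev
        prev_row, prev_col = row, col
    lls = []
    for _ in range(iters):
        pkl, row, col = em_step(X, pkl, row, col, lam, prev_row, prev_col)
        lls.append(log_likelihood(X, pkl, row, col))
    return (pkl, row, col), lls


def soft_row_clusters(pkl, row):
    """Soft row memberships P(k | i), proportional to P(k) P(i | k)."""
    pk = [sum(r) for r in pkl]
    return [normalize([pk[k] * ri[k] for k in range(len(pk))]) for ri in row]
```

For a sequence of snapshots, one would call `fit` on the first matrix with `lam=0`, then pass the returned parameters as `prev` (with some `lam > 0`) when fitting each later matrix; `soft_row_clusters` then gives each row's membership probabilities. With `lam = 0` every iteration is a plain EM step, so the recorded log-likelihoods never decrease; with `lam > 0` the update instead maximizes a smoothed, MAP-style objective, so the plain log-likelihood need not be monotone. Rows or columns of `X` that are entirely zero are not handled in this sketch.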

Acknowledgments

We thank Hanghang Tong and Fei Wang for providing the DBLP data, and Yun Chi and Yu-Ru Lin for many insightful discussions. This research was supported in part by NSF Grants DBI-1147134, DBI-1356621, CCF-1139864, CCF-1136538, and CSI-1136536, and by the Old Dominion University Office of Research.

Author information

Correspondence to Shuiwang Ji.

Additional information

Responsible editor: Eamonn Keogh.

About this article

Cite this article

Zhang, W., Li, R., Feng, D. et al. Evolutionary soft co-clustering: formulations, algorithms, and applications. Data Min Knowl Disc 29, 765–791 (2015). https://doi.org/10.1007/s10618-014-0375-9

  • DOI: https://doi.org/10.1007/s10618-014-0375-9
