
Denoising Cluster Analysis

  • Conference paper
Neural Information Processing (ICONIP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9491)


Abstract

Clustering, or cluster analysis, is an important and common task in data mining and analysis, with applications in many fields. However, most existing clustering methods become unreliable when only a limited amount of data per cluster is available, as is often the case in real-world applications. Here we propose a new method, called denoising cluster analysis, to improve clustering accuracy. We first construct base clusterings from artificially corrupted data samples and then learn their ensemble based on mutual information. We develop multiplicative updates for learning the aggregated cluster assignment probabilities. Experiments on real-world data sets show that our method consistently improves cluster purity over several other clustering approaches.
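
The corrupt-cluster-aggregate pipeline described above can be sketched in a few lines. The following is a minimal illustration, not the authors' algorithm: it uses additive Gaussian corruption, k-means base clusterings, and a simple co-association consensus in place of the paper's mutual-information ensemble with multiplicative updates; all function and parameter names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

def denoising_cluster_sketch(X, n_clusters, n_base=10, noise_std=0.1, seed=0):
    """Illustrative pipeline: corrupt the data, build base clusterings,
    then aggregate them. The consensus step here is a plain co-association
    average followed by spectral clustering, not the paper's
    mutual-information ensemble with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coassoc = np.zeros((n, n))
    for _ in range(n_base):
        # Artificially corrupt the samples (additive Gaussian noise here).
        X_noisy = X + noise_std * rng.standard_normal(X.shape)
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_noisy)
        # Count how often each pair of points shares a base cluster.
        coassoc += (labels[:, None] == labels[None, :]).astype(float)
    coassoc /= n_base
    # Consensus clustering on the averaged co-association similarities.
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed").fit_predict(coassoc)
```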


Notes

  1. http://archive.ics.uci.edu/ml/.

References

  1. Arora, R., Gupta, M., Kapila, A., Fazel, M.: Clustering by left-stochastic matrix factorization. In: ICML (2011)
  2. Bishop, C.: Training with noise is equivalent to Tikhonov regularization. Neural Comput. 7(1), 108–116 (1995)
  3. Dikmen, O., Yang, Z., Oja, E.: Learning the information divergence. IEEE Trans. Pattern Anal. Mach. Intell. 37(7), 1442–1454 (2015)
  4. Herbrich, R., Graepel, T.: Invariant pattern recognition by semidefinite programming machines. In: NIPS (2004)
  5. Hofmann, T.: Probabilistic latent semantic indexing. In: SIGIR, pp. 50–57 (1999)
  6. Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33(3), 1065–1076 (1962)
  7. Romano, S., Bailey, J., Nguyen, V., Verspoor, K.: Standardized mutual information for clustering comparisons: one step further in adjustment for chance. In: ICML (2014)
  8. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
  9. Strehl, A., Ghosh, J.: Cluster ensembles - a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)
  10. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.: Extracting and composing robust features with denoising autoencoders. In: ICML (2008)
  11. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)
  12. Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J. Mach. Learn. Res. 11, 2837–2854 (2010)
  13. Yang, Z., Hao, T., Dikmen, O., Chen, X., Oja, E.: Clustering by nonnegative matrix factorization using graph random walk. In: NIPS (2012)
  14. Yang, Z., Laaksonen, J.: Multiplicative updates for non-negative projections. Neurocomputing 71(1–3), 363–373 (2007)
  15. Yang, Z., Oja, E.: Linear and nonlinear projective nonnegative matrix factorization. IEEE Trans. Neural Netw. 21(5), 734–749 (2010)
  16. Yang, Z., Oja, E.: Unified development of multiplicative algorithms for linear and quadratic nonnegative matrix factorization. IEEE Trans. Neural Netw. 22(12), 1878–1891 (2011)
  17. Yang, Z., Oja, E.: Clustering by low-rank doubly stochastic matrix decomposition. In: ICML (2012)
  18. Yang, Z., Oja, E.: Quadratic nonnegative matrix factorization. Pattern Recogn. 45(4), 1500–1510 (2012)
  19. Yang, Z., Peltonen, J., Kaski, S.: Optimization equivalence of divergences improves neighbor embedding. In: ICML (2014)
  20. Yang, Z., Zhang, H., Yuan, Z., Oja, E.: Kullback-Leibler divergence for nonnegative matrix factorization. In: Honkela, T. (ed.) ICANN 2011, Part I. LNCS, vol. 6791, pp. 250–257. Springer, Heidelberg (2011)
  21. Zhu, Z., Yang, Z., Oja, E.: Multiplicative updates for learning with stochastic matrices. In: Kämäräinen, J.-K., Koskela, M. (eds.) SCIA 2013. LNCS, vol. 7944, pp. 143–152. Springer, Heidelberg (2013)


Author information


Corresponding author

Correspondence to Zhirong Yang.


Appendix: Proof of Theorem 1

Proof

We write \(W\) for the current estimate, \(\widetilde{W}\) for the free variable, and \(W^\text {new}\) for the new estimate. The objective function \(\widetilde{\mathcal {J}}\) fulfills the conditions of the theorem in [16]. We can therefore construct the majorization function

$$G(\widetilde{W},W)=\sum _{ik}\left[ \nabla ^+_{ik}\widetilde{W}_{ik}-\nabla ^-_{ik}W_{ik}\log \widetilde{W}_{ik}+\frac{B_{ik}}{A_{ik}}W_{ik}-\frac{W_{ik}}{A_{ik}}\log \widetilde{W}_{ik}\right] +\text {constant}$$

such that \(G(\widetilde{W},W)\ge \widetilde{\mathcal {J}}(\widetilde{W},\lambda )\) and \(G(W,W)=\widetilde{\mathcal {J}}(W,\lambda )\). Let \(W^\text {new}\) be the minimizer of \(G(\widetilde{W},W)\) over \(\widetilde{W}\), obtained by setting \(\partial G/\partial \widetilde{W}\) to zero, which yields Eq. 12. Therefore \(\widetilde{\mathcal {J}}(W^\text {new},\lambda ) \le G(W^\text {new},W) \le G(W,W) =\widetilde{\mathcal {J}}(W,\lambda )\).
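
For concreteness, the minimization step can be made explicit: only the first, second, and fourth terms of \(G\) depend on \(\widetilde{W}\), so zeroing the gradient gives a multiplicative update. Since Eq. 12 is not reproduced on this page, the following is a sketch derived solely from the form of \(G\) stated above; the update in the paper may differ, e.g. by a normalization involving \(\lambda\).

$$\frac{\partial G}{\partial \widetilde{W}_{ik}}=\nabla ^+_{ik}-\frac{\nabla ^-_{ik}W_{ik}}{\widetilde{W}_{ik}}-\frac{W_{ik}}{A_{ik}\widetilde{W}_{ik}}=0 \quad \Longrightarrow \quad W^\text {new}_{ik}=\frac{W_{ik}\left( \nabla ^-_{ik}+A_{ik}^{-1}\right) }{\nabla ^+_{ik}}$$

Provided \(\nabla ^+_{ik}\), \(\nabla ^-_{ik}\), and \(A_{ik}\) are nonnegative, this update keeps \(W\) nonnegative, which is the usual appeal of multiplicative updates.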


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, R., Yang, Z., Corander, J. (2015). Denoising Cluster Analysis. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9491. Springer, Cham. https://doi.org/10.1007/978-3-319-26555-1_49

  • DOI: https://doi.org/10.1007/978-3-319-26555-1_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26554-4

  • Online ISBN: 978-3-319-26555-1

  • eBook Packages: Computer Science, Computer Science (R0)
