
Minimum Similarity Sampling Scheme for Nyström Based Spectral Clustering on Large Scale High-Dimensional Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8482)

Abstract

Large-scale spectral clustering in high-dimensional space is among the most popular problems in unsupervised learning. Existing sampling schemes each have limitations on high-dimensional data. This paper proposes an improved Nyström extension based spectral clustering algorithm with a sampling scheme designed for high-dimensional data. We first examine several existing sampling schemes and illustrate their defects, especially in high-dimensional settings. We then provide a theoretical analysis of how the similarity between the sample set and the non-sampled set influences the approximation error, and propose an improved sampling scheme, minimum similarity sampling (MSS), for clustering in high-dimensional space. Experiments on both synthetic and real datasets show that, when applied in Nyström based spectral clustering, the proposed sampling scheme achieves higher accuracy than other algorithms and reduces the time spent on sampling.
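
To make the role of the sample set concrete, the following is a minimal sketch of the standard Nyström approximation of a similarity matrix and of its relative approximation error. It is not the MSS scheme, whose details the abstract does not give. The Gaussian similarity, the bandwidth `sigma`, the uniform random choice of sample indices, and all function names are assumptions made for illustration only.

```python
import numpy as np


def rbf_similarity(X, Y, sigma=1.0):
    """Gaussian similarity between the rows of X and the rows of Y (assumed kernel)."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def nystrom_approximation(X, sample_idx, sigma=1.0):
    """Approximate the full similarity matrix W as C A^+ C^T.

    C holds similarities between all points and the sampled points,
    A holds similarities among the sampled points only.
    """
    C = rbf_similarity(X, X[sample_idx], sigma)
    A = C[sample_idx]
    return C @ np.linalg.pinv(A) @ C.T


def relative_error(X, sample_idx, sigma=1.0):
    """Frobenius-norm error of the Nystrom approximation, relative to the exact W."""
    W = rbf_similarity(X, X, sigma)
    W_hat = nystrom_approximation(X, sample_idx, sigma)
    return np.linalg.norm(W - W_hat, "fro") / np.linalg.norm(W, "fro")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))  # synthetic high-dimensional data (illustrative)
    for m in (10, 50, 100):
        idx = rng.choice(len(X), size=m, replace=False)  # uniform sampling, not MSS
        print(m, relative_error(X, idx, sigma=5.0))
```

Any sampling scheme can be compared in this setup by replacing the line that draws `idx`; the exact matrix `W` is formed here only to measure the error and would not be computed in a large-scale setting.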




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Zeng, Z., Zhu, M., Yu, H., Ma, H. (2014). Minimum Similarity Sampling Scheme for Nyström Based Spectral Clustering on Large Scale High-Dimensional Data. In: Ali, M., Pan, JS., Chen, SM., Horng, MF. (eds) Modern Advances in Applied Intelligence. IEA/AIE 2014. Lecture Notes in Computer Science, vol 8482. Springer, Cham. https://doi.org/10.1007/978-3-319-07467-2_28


  • DOI: https://doi.org/10.1007/978-3-319-07467-2_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07466-5

  • Online ISBN: 978-3-319-07467-2

  • eBook Packages: Computer Science, Computer Science (R0)
