Efficient Density-Based Subspace Clustering in High Dimensions

Assent, Ira

doi:10.1007/978-3-662-48577-4_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7627)

Included in the following conference series:

International Workshop on Clustering High-Dimensional Data

Abstract

Density-based clustering defines clusters as dense areas in feature space separated by sparsely populated areas. It is known to successfully identify clusters of arbitrary shapes even in noisy data. Today, we face increasingly high-dimensional data, i.e. data objects described by many attributes. Effects attributed to the “curse of dimensionality” mean that in high-dimensional spaces, traditional clustering methods fail to identify meaningful clusters. In little more than a decade, the research field of subspace clustering has established methods for identifying clusters in subsets of the attributes in such high-dimensional spaces. As the number of possible subsets is exponential in the number of attributes, efficient algorithms are crucial. This short survey discusses challenges in this area, and presents models and algorithms for efficient and scalable density-based subspace clustering.
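
The two ideas in the abstract, density within a chosen subset of attributes and the exponential number of such subsets, can be illustrated with a minimal Python sketch. It is not the chapter's algorithm: it assumes a DBSCAN-style notion of density (a point is a "core" point if at least `min_pts` points lie within radius `eps` of it), and the function names, the parameters `eps` and `min_pts`, and the toy data are all hypothetical choices made for illustration.

```python
from itertools import combinations


def count_subspaces(d):
    """Number of non-empty attribute subsets of a d-dimensional space."""
    return 2 ** d - 1


def enumerate_subspaces(d):
    """Yield every non-empty subset of the attribute indices 0..d-1."""
    for k in range(1, d + 1):
        yield from combinations(range(d), k)


def neighbors_in_subspace(points, i, subspace, eps):
    """Indices of points within Euclidean distance eps of points[i],
    measured only on the attributes listed in subspace."""
    p = points[i]
    result = []
    for j, q in enumerate(points):
        dist_sq = sum((p[a] - q[a]) ** 2 for a in subspace)
        if dist_sq <= eps ** 2:
            result.append(j)
    return result


def is_core(points, i, subspace, eps, min_pts):
    """Assumed DBSCAN-style density test: points[i] is a core point in
    the subspace if its eps-neighborhood (itself included) holds at
    least min_pts points."""
    return len(neighbors_in_subspace(points, i, subspace, eps)) >= min_pts


# Toy data (made up): four points agree on attribute 0 but are spread
# widely along attribute 1; the fifth point is far away in both.
points = [(1.0, 0.0), (1.1, 5.0), (0.9, 10.0), (1.05, 15.0), (8.0, 20.0)]

dense_in_attr0 = is_core(points, 0, (0,), eps=0.3, min_pts=4)
dense_in_full_space = is_core(points, 0, (0, 1), eps=0.3, min_pts=4)
subspaces_3d = list(enumerate_subspaces(3))
```

In this toy data the first point is dense when only attribute 0 is considered but not in the full two-dimensional space, which is the situation subspace clustering targets. Checking every subspace this way is only feasible for very small `d`: `count_subspaces(20)` is already above one million, which is why the abstract stresses efficient algorithms.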

Acknowledgments

This work has been supported in part by the Danish Council for Strategic Research, grant 10-092316, and by the Danish Council for Independent Research - Technology and Production Sciences (FTP), grant 10-081972.

Author information

Authors and Affiliations

Department of Computer Science, Aarhus University, Aarhus, Denmark
Ira Assent

Corresponding author

Correspondence to Ira Assent.

Editor information

Editors and Affiliations

DIBRIS, University of Genoa, Genoa, Italy
Francesco Masulli
University of Naples "Parthenope", Naples, Italy
Alfredo Petrosino
DIBRIS, University of Genoa, Genoa, Italy
Stefano Rovetta

About this paper

Cite this paper

Assent, I. (2015). Efficient Density-Based Subspace Clustering in High Dimensions. In: Masulli, F., Petrosino, A., Rovetta, S. (eds) Clustering High-Dimensional Data. CHDD 2012. Lecture Notes in Computer Science, vol 7627. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48577-4_3

DOI: https://doi.org/10.1007/978-3-662-48577-4_3
Published: 25 November 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48576-7
Online ISBN: 978-3-662-48577-4
eBook Packages: Computer Science, Computer Science (R0)
