Correlation Clustering with Partial Information

Demaine, Erik D.; Immorlica, Nicole

doi:10.1007/978-3-540-45198-3_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2764)

Included in the following conference series:

International Workshop on Randomization and Approximation Techniques in Computer Science
International Workshop on Approximation Algorithms for Combinatorial Optimization

Abstract

We consider the following general correlation-clustering problem [1]: given a graph with real edge weights (both positive and negative), partition the vertices into clusters to minimize the total absolute weight of cut positive edges and uncut negative edges. Thus, large positive weights (representing strong correlations between endpoints) encourage those endpoints to belong to a common cluster; large negative weights encourage the endpoints to belong to different clusters; and weights with small absolute value represent little information. In contrast to most clustering problems, correlation clustering specifies neither the desired number of clusters nor a distance threshold for clustering; both of these parameters are effectively chosen to be the best possible by the problem definition.
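The objective described above can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: the function names, the dictionary-based edge representation, and the toy graph are all assumptions, and the exhaustive search is feasible only for a handful of vertices. Pairs of vertices with no entry in `edges` carry no information, which is the "partial information" setting.

```python
from itertools import product

def disagreement_cost(edges, cluster_of):
    """Total absolute weight of cut positive edges and uncut negative edges.

    edges: dict mapping (u, v) pairs to real weights (hypothetical format).
    cluster_of: dict mapping each vertex to a cluster label.
    """
    cost = 0.0
    for (u, v), w in edges.items():
        same = cluster_of[u] == cluster_of[v]
        if w > 0 and not same:
            cost += w        # positive edge whose endpoints were separated
        elif w < 0 and same:
            cost += -w       # negative edge whose endpoints were merged
    return cost

def brute_force_best(vertices, edges):
    """Try every labeling with at most len(vertices) clusters (tiny inputs only)."""
    best_cost, best = float("inf"), None
    n = len(vertices)
    for labels in product(range(n), repeat=n):
        cluster_of = dict(zip(vertices, labels))
        cost = disagreement_cost(edges, cluster_of)
        if cost < best_cost:
            best_cost, best = cost, cluster_of
    return best_cost, best

# Toy instance: the pairs (a, d) and (b, d) are missing, i.e. no information.
vertices = ["a", "b", "c", "d"]
edges = {("a", "b"): 2.0, ("b", "c"): 1.5, ("a", "c"): -1.0, ("c", "d"): -3.0}
best_cost, best = brute_force_best(vertices, edges)
```

Note that the number of clusters is never supplied: the search ranges over all partitions, and the objective alone determines how many clusters the best solution uses.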

Correlation clustering was introduced by Bansal, Blum, and Chawla [1], motivated by both document clustering and agnostic learning. They proved NP-hardness and gave constant-factor approximation algorithms for the special case in which the graph is complete (full information) and every edge has weight +1 or -1. We give an O(log n)-approximation algorithm for the general case based on linear-programming rounding and the “region-growing” technique. We also prove that this linear program has a gap of Ω(log n), and therefore our approximation is tight under this approach. We also give an O(r³)-approximation algorithm for K_{r,r}-minor-free graphs. On the other hand, we show that the problem is APX-hard, and any o(log n)-approximation would require improving the best approximation algorithms known for minimum multicut.
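The abstract does not spell out the linear program. As a hedged sketch, a relaxation of the usual kind for this objective can be written as follows, where every symbol is an assumption introduced here for illustration: E⁺ and E⁻ are the sets of positive and negative edges, c_e is the absolute weight of edge e, and x_{uv} in [0, 1] is a fractional "distance" that equals 1 when u and v lie in different clusters.

```latex
\begin{align*}
\text{minimize}\quad
  & \sum_{e \in E^{+}} c_e \, x_e \;+\; \sum_{e \in E^{-}} c_e \, (1 - x_e) \\
\text{subject to}\quad
  & x_{uw} \le x_{uv} + x_{vw} && \text{for all vertices } u, v, w, \\
  & 0 \le x_{uv} \le 1         && \text{for all pairs } u, v.
\end{align*}
```

Restricting each x_{uv} to {0, 1} recovers the clustering objective exactly: the triangle inequality forces "same cluster" to be transitive, the first sum charges cut positive edges, and the second charges uncut negative edges.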

References

1. Bansal, N., Blum, A., Chawla, S.: Correlation clustering. In: IEEE Symp. on Foundations of Computer Science (2002)
2. Bejerano, Y., Immorlica, N., Naor, S., Smith, M.: Location area design in cellular networks. In: International Conference on Mobile Computing and Networking (2003)
3. Charikar, M., Guruswami, V., Wirth, A.: Clustering with qualitative information (unpublished manuscript)
4. Dahlhaus, E., Johnson, D.S., Papadimitriou, C.H., Seymour, P.D., Yannakakis, M.: The complexity of multiway cuts. In: ACM Symp. on Theory of Comp. (1992)
5. Emanuel, D., Fiat, A.: Correlation clustering: minimizing disagreements on arbitrary weighted graphs. In: European Symp. on Algorithms (2003)
6. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: Clustering for mining in large spatial databases. KI-Journal 1 (1998); Special Issue on Data Mining. Scien. Tec. Publishing
7. Garg, N., Vazirani, V.V., Yannakakis, M.: Approximate max-flow min-(multi)cut theorems and their applications. SIAM J. Comp. 25 (1996)
8. Hochbaum, D.S., Shmoys, D.B.: A unified approach to approximation algorithms for bottleneck problems. Journal of the ACM 33 (1986)
9. Hu, T.C.: Multicommodity network flows. Operations Research (1963)
10. Jain, K., Vazirani, V.V.: Primal-dual approximation algorithms for metric facility location and k-median problems. In: IEEE Symp. on Foundations of Computer Science (1999)
11. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7) (2002)
12. Klein, P.N., Plotkin, S.A., Rao, S.: Excluded minors, network decomposition, and multicommodity flow. In: ACM Symp. on Theory of Comp. (1993)
13. Leighton, T., Rao, S.: Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms. Journal of the ACM 46(6) (1999)
14. Meila, M., Heckerman, D.: An experimental comparison of several clustering and initialization methods. In: Conference on Uncertainty in Artificial Intelligence (1998)
15. Murtagh, F.: A survey of recent advances in hierarchical clustering algorithms. The Computer Journal 26(4) (1983)
16. Procopiuc, C.M.: Clustering problems and their applications. Department of Computer Science, Duke University, http://www.cs.duke.edu/~magda/clustering-survey.ps.gz
17. Schulman, L.J.: Clustering for edge-cost minimization. Electronic Colloquium on Computational Complexity, ECCC 6(35) (1999)
18. Steinbach, M., Karypis, G., Kumar, V.: A comparison of document clustering techniques. In: KDD-2000 Workshop on Text Mining (2000)
19. Tardos, E., Vazirani, V.V.: Improved bounds for the max-flow min-multicut ratio for planar and K_{r,r}-free graphs. Information Processing Letters 47(2), 77–80 (1993)
20. Vazirani, V.V.: Approximation Algorithms. Springer, Berlin (2001)
21. Yannakakis, M., Kanellakis, P.C., Cosmadakis, S.C., Papadimitriou, C.H.: Cutting and partitioning a graph after a fixed pattern. In: 10th Intl. Coll. on Automata, Languages, and Programming (1983)

Author information

Authors and Affiliations

Laboratory for Computer Science, MIT, Cambridge, MA, 02139, USA
Erik D. Demaine & Nicole Immorlica

Editor information

Editors and Affiliations

Computer Science Department, Princeton University, 35 Olden Street, 08540, Princeton, NJ
Sanjeev Arora
Institute for Computer Science, University of Kiel, Olshausenstrasse 40, 24118, Kiel, Germany
Klaus Jansen
Battelle Bâtiment A, Centre Universitaire d’Informatique, Route de Drize 7, 1227, Carouge, Geneva, Switzerland
José D. P. Rolim
University of California, Los Angeles
Amit Sahai

About this paper

Cite this paper

Demaine, E.D., Immorlica, N. (2003). Correlation Clustering with Partial Information. In: Arora, S., Jansen, K., Rolim, J.D.P., Sahai, A. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. RANDOM 2003, APPROX 2003. Lecture Notes in Computer Science, vol 2764. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45198-3_1

DOI: https://doi.org/10.1007/978-3-540-45198-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40770-6
Online ISBN: 978-3-540-45198-3
eBook Packages: Springer Book Archive
