Abstract
We consider the following general correlation-clustering problem [1]: given a graph with real edge weights (both positive and negative), partition the vertices into clusters to minimize the total absolute weight of cut positive edges and uncut negative edges. Thus, large positive weights (representing strong correlations between endpoints) encourage those endpoints to belong to a common cluster; large negative weights encourage the endpoints to belong to different clusters; and weights with small absolute value represent little information. In contrast to most clustering problems, correlation clustering specifies neither the desired number of clusters nor a distance threshold for clustering; both of these parameters are effectively chosen to be the best possible by the problem definition.
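As a small illustration of the objective just defined, the sketch below scores a candidate clustering by adding up the absolute weight of cut positive edges and uncut negative edges. The names `disagreement_cost`, `weights`, and `cluster_of` are hypothetical and chosen only for this example; the sketch evaluates a given partition and does not attempt to find a good one.

```python
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def disagreement_cost(weights: Dict[Edge, float],
                      cluster_of: Dict[Hashable, int]) -> float:
    """Total absolute weight of cut positive edges and uncut negative edges.

    `weights` maps each (u, v) edge to a real weight; pairs that are absent
    carry no information. `cluster_of` maps each vertex to a cluster label.
    Both names are illustrative assumptions, not taken from the paper.
    """
    cost = 0.0
    for (u, v), w in weights.items():
        same_cluster = cluster_of[u] == cluster_of[v]
        if w > 0 and not same_cluster:
            cost += w          # positive edge that the partition cuts
        elif w < 0 and same_cluster:
            cost += -w         # negative edge left inside one cluster
    return cost


# Toy instance: a-b strongly similar, b-c dissimilar, a-c weakly similar.
weights = {("a", "b"): 2.0, ("b", "c"): -1.5, ("a", "c"): 0.5}
together = {"a": 0, "b": 0, "c": 0}   # one cluster holding everything
split = {"a": 0, "b": 0, "c": 1}      # c separated from a and b
```

On this toy instance, putting all three vertices in one cluster pays 1.5 for the uncut negative edge, while separating c pays only 0.5 for the cut weak positive edge. Neither the number of clusters nor a threshold is supplied; the cost alone decides which partition is preferable.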
Correlation clustering was introduced by Bansal, Blum, and Chawla [1], motivated by both document clustering and agnostic learning. They proved NP-hardness and gave constant-factor approximation algorithms for the special case in which the graph is complete (full information) and every edge has weight +1 or -1. We give an O(log n)-approximation algorithm for the general case based on linear-programming rounding and the “region-growing” technique. We also prove that this linear program has a gap of Ω(log n), and therefore our approximation is tight under this approach. In addition, we give an O(r^3)-approximation algorithm for K_{r,r}-minor-free graphs. On the other hand, we show that the problem is APX-hard, and that any o(log n)-approximation would require improving the best approximation algorithms known for minimum multicut.
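For orientation, one standard way to write a linear-programming relaxation of this objective is sketched below. This is an assumption for illustration and may differ in detail from the program analyzed in the paper. The symbols are hypothetical: $E^+$ and $E^-$ denote the positive and negative edges, $w_{uv}$ the edge weight, and $x_{uv} \in [0,1]$ a fractional "distance" that equals 1 when $u$ and $v$ are separated.

```latex
\begin{align*}
\text{minimize}\quad
  & \sum_{uv \in E^+} w_{uv}\, x_{uv}
    \;+\; \sum_{uv \in E^-} |w_{uv}|\,(1 - x_{uv}) \\
\text{subject to}\quad
  & x_{uw} \le x_{uv} + x_{vw} && \text{for all vertices } u, v, w, \\
  & 0 \le x_{uv} \le 1         && \text{for all pairs } u, v.
\end{align*}
```

A 0/1 solution satisfying the triangle inequality corresponds to a partition, so the optimum of the relaxation is a lower bound on the optimal clustering cost; a rounding step then has to turn the fractional distances back into clusters.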
References
Bansal, N., Blum, A., Chawla, S.: Correlation clustering. In: IEEE Symp. on Foundations of Computer Science (2002)
Bejerano, Y., Immorlica, N., Naor, S., Smith, M.: Location area design in cellular networks. In: International Conference on Mobile Computing and Networking (2003)
Charikar, M., Guruswami, V., Wirth, A.: Clustering with qualitative information (unpublished manuscript)
Dahlhaus, E., Johnson, D.S., Papadimitriou, C.H., Seymour, P.D., Yannakakis, M.: The complexity of multiway cuts. In: ACM Symp. on Theory of Comp. (1992)
Emanuel, D., Fiat, A.: Correlation clustering — minimizing disagreements on arbitrary weighted graphs. In: European Symp. on Algorithms (2003)
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: Clustering for mining in large spatial databases. KI-Journal 1 (1998); Special Issue on Data Mining. Scien. Tec. Publishing
Garg, N., Vazirani, V.V., Yannakakis, M.: Approximate max-flow min-(multi)cut theorems and their applications. SIAM J. Comp. 25 (1996)
Hochbaum, D.S., Shmoys, D.B.: A unified approach to approximation algorithms for bottleneck problems. Journal of the ACM 33 (1986)
Hu, T.C.: Multicommodity network flows. Operations Research (1963)
Jain, K., Vazirani, V.V.: Primal-dual approximation algorithms for metric facility location and k-median problems. In: IEEE Symp. on Foundations of Computer Science (1999)
Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7) (2002)
Klein, P.N., Plotkin, S.A., Rao, S.: Excluded minors, network decomposition, and multicommodity flow. In: ACM Symp. on Theory of Comp. (1993)
Leighton, T., Rao, S.: Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms. Journal of the ACM 46(6) (1999)
Meila, M., Heckerman, D.: An experimental comparison of several clustering and initialization methods. In: Conference on Uncertainty in Artificial Intelligence (1998)
Murtagh, F.: A survey of recent advances in hierarchical clustering algorithms. The Computer Journal 26(4) (1983)
Procopiuc, C.M.: Clustering problems and their applications. Department of Computer Science, Duke University, http://www.cs.duke.edu/~magda/clustering-survey.ps.gz
Schulman, L.J.: Clustering for edge-cost minimization. Electronic Colloquium on Computational Complexity, ECCC 6(35) (1999)
Steinbach, M., Karypis, G., Kumar, V.: A comparison of document clustering techniques. In: KDD-2000 Workshop on TextMining Workshop (2000)
Tardos, E., Vazirani, V.V.: Improved bounds for the max-flow min-multicut ratio for planar and K_{r,r}-free graphs. Information Processing Letters 47(2), 77–80 (1993)
Vazirani, V.V.: Approximation Algorithms. Springer, Berlin (2001)
Yannakakis, M., Kanellakis, P.C., Cosmadakis, S.C., Papadimitriou, C.H.: Cutting and partitioning a graph after a fixed pattern. In: 10th Intl. Coll. on Automata, Languages, and Programming (1983)
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Demaine, E.D., Immorlica, N. (2003). Correlation Clustering with Partial Information. In: Arora, S., Jansen, K., Rolim, J.D.P., Sahai, A. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. RANDOM APPROX 2003. Lecture Notes in Computer Science, vol 2764. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45198-3_1
DOI: https://doi.org/10.1007/978-3-540-45198-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40770-6
Online ISBN: 978-3-540-45198-3
eBook Packages: Springer Book Archive