Abstract

We consider the following general correlation-clustering problem [1]: given a graph with real edge weights (both positive and negative), partition the vertices into clusters to minimize the total absolute weight of cut positive edges and uncut negative edges. Thus, large positive weights (representing strong correlations between endpoints) encourage those endpoints to belong to a common cluster; large negative weights encourage the endpoints to belong to different clusters; and weights with small absolute value represent little information. In contrast to most clustering problems, correlation clustering specifies neither the desired number of clusters nor a distance threshold for clustering; both of these parameters are effectively chosen to be the best possible by the problem definition.
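To make the objective concrete, the following Python sketch scores a given partition. It sums $|w|$ over positive edges whose endpoints land in different clusters and over negative edges whose endpoints share a cluster. The edge-list representation, the `cluster_of` map, and the function name `disagreement_cost` are illustrative assumptions, not notation from the paper.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Illustrative representation: (u, v, weight); the sign of weight marks a positive or negative edge.
Edge = Tuple[Hashable, Hashable, float]


def disagreement_cost(edges: Iterable[Edge], cluster_of: Dict[Hashable, int]) -> float:
    """Total absolute weight of cut positive edges plus uncut negative edges."""
    cost = 0.0
    for u, v, w in edges:
        same_cluster = cluster_of[u] == cluster_of[v]
        if w > 0 and not same_cluster:
            cost += w   # positive edge was cut: pay its weight
        elif w < 0 and same_cluster:
            cost += -w  # negative edge was left uncut: pay |w|
    return cost


if __name__ == "__main__":
    edges = [("a", "b", 3.0), ("b", "c", 0.5), ("a", "c", -2.0), ("c", "d", -1.0)]
    print(disagreement_cost(edges, {"a": 0, "b": 0, "c": 1, "d": 1}))  # 1.5
    print(disagreement_cost(edges, {"a": 0, "b": 0, "c": 1, "d": 2}))  # 0.5
```

In this toy instance the three-cluster split {a, b}, {c}, {d} costs 0.5, less than a single cluster (3.0) or the two-cluster split (1.5), which illustrates how the objective itself settles the number of clusters.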

Correlation clustering was introduced by Bansal, Blum, and Chawla [1], motivated by both document clustering and agnostic learning. They proved NP-hardness and gave constant-factor approximation algorithms for the special case in which the graph is complete (full information) and every edge has weight +1 or -1. We give an $O(\log n)$-approximation algorithm for the general case based on linear-programming rounding and the “region-growing” technique. We also prove that this linear program has an integrality gap of $\Omega(\log n)$, and therefore our approximation is tight under this approach. We also give an $O(r^3)$-approximation algorithm for $K_{r,r}$-minor-free graphs. On the other hand, we show that the problem is APX-hard, and that any $o(\log n)$-approximation would require improving the best approximation algorithms known for minimum multicut.
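The region-growing idea can be sketched as follows. This is a minimal Python illustration of the generic ball-growing step of Garg, Vazirani, and Yannakakis [7], not the paper's exact algorithm. It assumes the LP has already been solved and that each positive edge carries its fractional length $x_e \in [0, 1]$. The solver, the treatment of negative edges, the center rule (smallest remaining label), the radius cap of 1/2, and the constant $c = 2\ln(n+1)$ are all assumptions of this sketch, and every name in it is hypothetical. Each ball grows around a center until the weight of positive edges leaving it is at most $c$ times its volume. Here the volume is a seed $F/n$ plus the LP mass $w_e x_e$ inside the ball plus the swept fraction of each crossing edge, where $F$ is the LP cost of the positive edges.

```python
import heapq
import math
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a positive edge is (u, v, weight w > 0, LP length x in [0, 1]).
PosEdge = Tuple[int, int, float, float]


def shortest_dists(src: int, alive: Set[int], edges: List[PosEdge]) -> Dict[int, float]:
    """Dijkstra from src over positive edges with both endpoints alive, using LP lengths."""
    adj: Dict[int, List[Tuple[int, float]]] = {v: [] for v in alive}
    for u, v, _, x in edges:
        if u in alive and v in alive:
            adj[u].append((v, x))
            adj[v].append((u, x))
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, x in adj[u]:
            if d + x < dist.get(v, math.inf):
                dist[v] = d + x
                heapq.heappush(heap, (d + x, v))
    return dist


def grow_region(center: int, alive: Set[int], edges: List[PosEdge],
                n: int, total_volume: float) -> Set[int]:
    """Return the open ball {v : dist(center, v) < r} for the first tested r <= 1/2
    with cut(r) <= c * volume(r), where c = 2 ln(n + 1)."""
    dist = shortest_dists(center, alive, edges)
    c = 2.0 * math.log(n + 1)
    # Between consecutive distances the ball is fixed while its volume grows, so
    # testing the open ball at each distance value (and at 1/2) checks every
    # distinct ball at the point where its volume is largest.
    candidates = sorted({d for d in dist.values() if 0.0 < d < 0.5}) + [0.5]
    ball: Set[int] = {center}
    for r in candidates:
        ball = {v for v, d in dist.items() if d < r}
        volume = total_volume / n  # seed volume F/n
        cut = 0.0
        for u, v, w, x in edges:
            if u not in alive or v not in alive:
                continue
            if u in ball and v in ball:
                volume += w * x  # edge fully inside the ball
            elif u in ball or v in ball:
                inner = u if u in ball else v
                volume += w * (r - dist[inner])  # swept part of a crossing edge
                cut += w
        if cut <= c * volume:
            return ball
    return ball  # fallback; the volume argument says this should not be reached


def region_growing_partition(n: int, edges: List[PosEdge]) -> List[Set[int]]:
    """Carve vertices 0..n-1 into balls until none remain."""
    total_volume = sum(w * x for _, _, w, x in edges)  # F: LP cost of positive edges
    alive = set(range(n))
    clusters: List[Set[int]] = []
    while alive:
        center = min(alive)  # hypothetical center rule: smallest remaining label
        ball = grow_region(center, alive, edges, n, total_volume)
        clusters.append(ball)
        alive -= ball
    return clusters


if __name__ == "__main__":
    # Two tight pairs joined by one edge the LP has stretched to length 1.
    demo = [(0, 1, 5.0, 0.0), (1, 2, 1.0, 1.0), (2, 3, 4.0, 0.0)]
    print([sorted(b) for b in region_growing_partition(4, demo)])  # [[0, 1], [2, 3]]
```

Every ball has radius below 1/2 in the shortest-path metric of the remaining graph, so two vertices whose LP shortest-path distance is at least 1 never share a cluster. Each positive edge's LP mass is charged to at most one ball and the seeds add at most $F$ in total, so summing the stopping rule over all balls bounds the cut positive weight by $2cF = 4\ln(n+1)\,F$.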

References

  1. Bansal, N., Blum, A., Chawla, S.: Correlation clustering. In: IEEE Symp. on Foundations of Computer Science (2002)

  2. Bejerano, Y., Immorlica, N., Naor, S., Smith, M.: Location area design in cellular networks. In: International Conference on Mobile Computing and Networking (2003)

  3. Charikar, M., Guruswami, V., Wirth, A.: Clustering with qualitative information (unpublished manuscript)

  4. Dahlhaus, E., Johnson, D.S., Papadimitriou, C.H., Seymour, P.D., Yannakakis, M.: The complexity of multiway cuts. In: ACM Symp. on Theory of Comp. (1992)

  5. Emanuel, D., Fiat, A.: Correlation clustering — minimizing disagreements on arbitrary weighted graphs. In: European Symp. on Algorithms (2003)

  6. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: Clustering for mining in large spatial databases. KI-Journal 1 (1998); Special Issue on Data Mining. Scien. Tec. Publishing

  7. Garg, N., Vazirani, V.V., Yannakakis, M.: Approximate max-flow min(multi)cut theorems and their applications. SIAM J. Comp. 25 (1996)

  8. Hochbaum, D.S., Shmoys, D.B.: A unified approach to approximation algorithms for bottleneck problems. Journal of the ACM 33 (1986)

  9. Hu, T.C.: Multicommodity network flows. Operations Research (1963)

  10. Jain, K., Vazirani, V.V.: Primal-dual approximation algorithms for metric facility location and k-median problems. In: IEEE Symp. on Foundations of Computer Science (1999)

  11. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7) (2002)

  12. Klein, P.N., Plotkin, S.A., Rao, S.: Excluded minors, network decomposition, and multicommodity flow. In: ACM Symp. on Theory of Comp. (1993)

  13. Leighton, T., Rao, S.: Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms. Journal of the ACM 46(6) (1999)

  14. Meila, M., Heckerman, D.: An experimental comparison of several clustering and initialization methods. In: Conference on Uncertainty in Artificial Intelligence (1998)

  15. Murtagh, F.: A survey of recent advances in hierarchical clustering algorithms. The Computer Journal 26(4) (1983)

  16. Procopiuc, C.M.: Clustering problems and their applications. Department of Computer Science, Duke University, http://www.cs.duke.edu/~magda/clustering-survey.ps.gz

  17. Schulman, L.J.: Clustering for edge-cost minimization. Electronic Colloquium on Computational Complexity, ECCC 6(35) (1999)

  18. Steinbach, M., Karypis, G., Kumar, V.: A comparison of document clustering techniques. In: KDD-2000 Workshop on Text Mining (2000)

  19. Tardos, E., Vazirani, V.V.: Improved bounds for the max-flow min-multicut ratio for planar and $K_{r,r}$-free graphs. Information Processing Letters 47(2), 77–80 (1993)

  20. Vazirani, V.V.: Approximation Algorithms. Springer, Berlin (2001)

  21. Yannakakis, M., Kanellakis, P.C., Cosmadakis, S.C., Papadimitriou, C.H.: Cutting and partitioning a graph after a fixed pattern. In: 10th Intl. Coll. on Automata, Languages, and Programming (1983)

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Demaine, E.D., Immorlica, N. (2003). Correlation Clustering with Partial Information. In: Arora, S., Jansen, K., Rolim, J.D.P., Sahai, A. (eds) Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques. RANDOM-APPROX 2003. Lecture Notes in Computer Science, vol 2764. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45198-3_1

  • DOI: https://doi.org/10.1007/978-3-540-45198-3_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40770-6

  • Online ISBN: 978-3-540-45198-3

  • eBook Packages: Springer Book Archive