
Pairs Covered by a Sequence of Sets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9210)

Abstract

Enumerating minimal new combinations of elements in a sequence of sets is interesting, e.g., for novelty detection in a stream of texts, where the sets are the bags of words occurring in the texts. We focus on new pairs of elements, as they are abundant. Using simple data structures, we can enumerate them in time quadratic in the size of the sets, but large intersections with earlier sets rule out all pairs therein in linear time. The challenge is to use this observation efficiently. We give a greedy heuristic based on the twin graph, a succinct description of the pairs covered by a set family, and on finding good candidate sets by random sampling. The heuristic is motivated and supported by several related complexity results: sample size estimates, hardness of maximal coverage of pairs, and approximation guarantees when a few sets cover almost all pairs.
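To make the two costs in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm: it builds no twin graph and no greedy heuristic. The baseline tests every pair of each incoming set against a hash set of all pairs seen so far, which is quadratic in the set size. The pruned variant assumes, purely for illustration, that the candidate earlier set T is the one with the largest intersection among a few uniformly sampled earlier sets. Every pair inside S ∩ T then lies in T and is therefore old, so only pairs with at least one element outside the intersection are examined. The function names and the `samples` and `seed` parameters are hypothetical.

```python
import random
from itertools import combinations
from typing import List, Optional, Set, Tuple

Pair = Tuple[str, str]  # unordered pair, stored with the smaller word first


def new_pairs_naive(stream: List[Set[str]]) -> List[Set[Pair]]:
    """Baseline: test every pair of each set against all pairs seen so far.

    Costs O(|S|^2) hash operations per set S.
    """
    seen: Set[Pair] = set()
    out: List[Set[Pair]] = []
    for s in stream:
        pairs = set(combinations(sorted(s), 2))
        out.append(pairs - seen)
        seen |= pairs
    return out


def new_pairs_pruned(stream: List[Set[str]], samples: int = 8,
                     seed: Optional[int] = 0) -> List[Set[Pair]]:
    """Hypothetical pruning: skip all pairs inside S ∩ T for one earlier set T.

    T is the best of up to `samples` uniformly sampled earlier sets; this
    selection rule is an assumption for illustration only.
    """
    rng = random.Random(seed)
    seen: Set[Pair] = set()
    history: List[Set[str]] = []
    out: List[Set[Pair]] = []
    for s in stream:
        # Largest intersection with a sampled earlier set: O(samples * |S|).
        best: Set[str] = set()
        for t in rng.sample(history, min(samples, len(history))):
            inter = s & t
            if len(inter) > len(best):
                best = inter
        # Pairs inside `best` all lie in an earlier set, hence are not new.
        outside = sorted(s - best)
        inside = sorted(best)
        candidates = set(combinations(outside, 2))
        candidates |= {(a, b) if a < b else (b, a)
                       for a in outside for b in inside}
        fresh = candidates - seen
        out.append(fresh)
        # Pairs inside `best` are already in `seen`, so adding `fresh` keeps
        # the invariant "seen = all pairs of all processed sets".
        seen |= fresh
        history.append(s)
    return out


if __name__ == "__main__":
    stream = [{"a", "b", "c"}, {"b", "c", "d"}, {"a", "d"}]
    # The second set shares {b, c} with the first, so only pairs with d are new.
    print(sorted(new_pairs_pruned(stream)[1]))  # [('b', 'd'), ('c', 'd')]
```

Both functions return the same pairs. The pruned variant examines only about |S \ I| · |S| candidate pairs, where I is the chosen intersection, instead of all |S|(|S|-1)/2 pairs. It therefore pays off exactly when some earlier set overlaps the current one heavily, which is the situation the abstract describes.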

Acknowledgment

This work has been supported by the Swedish Foundation for Strategic Research (SSF) through Grant IIS11-0089 for a data mining project entitled “Data-driven secure business intelligence”. The author also wishes to thank the referees for careful reading.

Author information

Correspondence to Peter Damaschke.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Damaschke, P. (2015). Pairs Covered by a Sequence of Sets. In: Kosowski, A., Walukiewicz, I. (eds) Fundamentals of Computation Theory. FCT 2015. Lecture Notes in Computer Science, vol 9210. Springer, Cham. https://doi.org/10.1007/978-3-319-22177-9_17

  • DOI: https://doi.org/10.1007/978-3-319-22177-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22176-2

  • Online ISBN: 978-3-319-22177-9

  • eBook Packages: Computer Science, Computer Science (R0)