Abstract
Enumerating minimal new combinations of elements in a sequence of sets is interesting, e.g., for novelty detection in a stream of texts, where the sets are the bags of words occurring in the texts. We focus on new pairs of elements because they are abundant. With simple data structures we can enumerate them in time quadratic in the size of the sets, but a large intersection with an earlier set rules out all pairs therein in linear time. The challenge is to use this observation efficiently. We give a greedy heuristic based on the twin graph, a succinct description of the pairs covered by a set family, and on finding good candidate sets by random sampling. The heuristic is motivated and supported by several related complexity results: sample size estimates, hardness of maximal coverage of pairs, and approximation guarantees when a few sets cover almost all pairs.
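The pruning observation in the abstract can be illustrated with a minimal Python sketch. It is not the twin-graph heuristic of the paper: the function names `new_pairs_naive` and `new_pairs_pruned` are hypothetical, the set of seen pairs is an assumed "simple data structure", and the earlier set with the largest intersection is found by brute force rather than by random sampling. The point is only that every pair inside the intersection with an earlier set is already known to be old, so candidate new pairs need at least one element outside that intersection.

```python
from itertools import combinations


def new_pairs_naive(sets):
    """For each set, list the pairs not covered by any earlier set.

    Quadratic in the size of each set: every pair is checked.
    """
    seen = set()
    result = []
    for s in sets:
        fresh = []
        for pair in combinations(sorted(s), 2):
            if pair not in seen:
                seen.add(pair)
                fresh.append(pair)
        result.append(fresh)
    return result


def new_pairs_pruned(sets):
    """Same output, but pairs inside a large intersection are skipped.

    Assumption: the best earlier set is picked by brute force here;
    the paper proposes random sampling to find good candidates.
    """
    seen = set()
    result = []
    for i, s in enumerate(sets):
        # Earlier set sharing the most elements with s (empty if none).
        best = max(sets[:i], key=lambda t: len(s & t), default=set())
        common = s & best   # all pairs inside `common` are already old
        rest = s - common   # a new pair must use an element from `rest`
        candidates = set()
        for u in rest:
            for v in s:
                if u != v:
                    candidates.add((u, v) if u < v else (v, u))
        fresh = sorted(p for p in candidates if p not in seen)
        seen.update(fresh)
        result.append(fresh)
    return result


docs = [{"a", "b", "c"}, {"a", "b", "c", "d"}, {"c", "d", "e"}]
pairs_naive = new_pairs_naive(docs)
pairs_pruned = new_pairs_pruned(docs)
```

In this toy input the second set shares three of its four elements with the first, so the pruned variant examines only the pairs involving `"d"` instead of all six pairs.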
Acknowledgment
This work has been supported by the Swedish Foundation for Strategic Research (SSF) through Grant IIS11-0089 for a data mining project entitled “Data-driven secure business intelligence”. The author also wishes to thank the referees for careful reading.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Damaschke, P. (2015). Pairs Covered by a Sequence of Sets. In: Kosowski, A., Walukiewicz, I. (eds) Fundamentals of Computation Theory. FCT 2015. Lecture Notes in Computer Science, vol 9210. Springer, Cham. https://doi.org/10.1007/978-3-319-22177-9_17
DOI: https://doi.org/10.1007/978-3-319-22177-9_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22176-2
Online ISBN: 978-3-319-22177-9
eBook Packages: Computer Science (R0)