Algorithmic Gems in the Data Miner’s Cave

  • Conference paper
Fun with Algorithms (FUN 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8496)

Abstract

When I was younger and spent most of my time playing in the field of (more) theoretical computer science, I used to think of data mining as an uninteresting kind of game: I thought that area was a wild jungle of ad hoc techniques with no flesh to sink my teeth into. The truth is, I immediately become somewhat skeptical when I see a lot of money flying around: my communist nature pops out and I start seeing flaws everywhere.

I was an idealist back then, which is good. But in that specific case, I was simply wrong. You may say that I am trying to convince myself just because my soul has already been sold (and, by the way, they never even gave me the thirty pieces of silver they promised). Nonetheless, I will try to offer you evidence that there are some gems out there, in the data miner's cave, that you yourself may appreciate.

Who knows? Maybe you will decide to sell your soul to the devil too, after all.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Boldi, P. (2014). Algorithmic Gems in the Data Miner’s Cave. In: Ferro, A., Luccio, F., Widmayer, P. (eds) Fun with Algorithms. FUN 2014. Lecture Notes in Computer Science, vol 8496. Springer, Cham. https://doi.org/10.1007/978-3-319-07890-8_1

  • DOI: https://doi.org/10.1007/978-3-319-07890-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07889-2

  • Online ISBN: 978-3-319-07890-8

  • eBook Packages: Computer Science, Computer Science (R0)