
Analytics over Probabilistic Unmerged Duplicates

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8720)

Abstract

This paper introduces probabilistic databases with unmerged duplicates (DBud), i.e., databases containing probabilistic information about instances that were found to describe the same real-world objects. We discuss the need to query such databases efficiently and to support practical query scenarios that require analytical or summarized information. We also sketch methodologies and techniques that could enable efficient query processing over such probabilistic databases, in particular without materializing the (potentially huge) collection of all possible deduplication worlds.
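As a rough illustration of why the deduplication worlds need not be materialized, the sketch below uses a simplified model that is an assumption made here, not the paper's formalism: candidate duplicates fall into independent groups, each group lists alternative deduplication outcomes (partitions of its record ids) whose probabilities sum to 1, and a possible world picks one outcome per group. For an aggregate such as the expected number of distinct entities, linearity of expectation yields the answer from per-group sums, whereas brute force enumerates a number of worlds that grows exponentially with the number of groups. The group data and function names are hypothetical.

```python
from itertools import product
from math import isclose, prod

# Hypothetical DBud fragment (assumed model, not the paper's): each group lists
# alternative deduplication outcomes as (probability, partition of record ids).
# Assumption: groups are independent and each group's probabilities sum to 1.
groups = [
    [(0.7, [{"r1", "r2"}]), (0.3, [{"r1"}, {"r2"}])],
    [
        (0.5, [{"r3", "r4", "r5"}]),
        (0.2, [{"r3", "r4"}, {"r5"}]),
        (0.3, [{"r3"}, {"r4"}, {"r5"}]),
    ],
    [(1.0, [{"r6"}])],
]


def expected_entities_bruteforce(groups):
    """Enumerate every deduplication world (exponential in the number of groups)."""
    total = 0.0
    for world in product(*groups):
        p = prod(prob for prob, _ in world)  # independence: multiply choices
        n_entities = sum(len(partition) for _, partition in world)
        total += p * n_entities
    return total


def expected_entities_linear(groups):
    """Linearity of expectation: add each group's expected cluster count,
    never materializing the worlds (linear in the input size)."""
    return sum(
        sum(prob * len(partition) for prob, partition in alternatives)
        for alternatives in groups
    )


if __name__ == "__main__":
    print(expected_entities_bruteforce(groups), expected_entities_linear(groups))
```

The toy input has only 2 × 3 × 1 = 6 worlds, and both functions return approximately 4.1. The shortcut relies on linearity of expectation, so it does not carry over directly to non-linear aggregates such as MAX or to top-k queries.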

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ioannou, E., Garofalakis, M. (2014). Analytics over Probabilistic Unmerged Duplicates. In: Straccia, U., Calì, A. (eds) Scalable Uncertainty Management. SUM 2014. Lecture Notes in Computer Science, vol 8720. Springer, Cham. https://doi.org/10.1007/978-3-319-11508-5_17

  • DOI: https://doi.org/10.1007/978-3-319-11508-5_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11507-8

  • Online ISBN: 978-3-319-11508-5

  • eBook Packages: Computer Science, Computer Science (R0)
