Semantics and evaluation of top-k queries in probabilistic databases

Distributed and Parallel Databases

Abstract

We study fundamental issues involved in top-k query evaluation in probabilistic databases. We consider simple probabilistic databases, in which probabilities are associated with individual tuples, and general probabilistic databases, in which exclusivity relationships between tuples can additionally be represented. In contrast to other recent research in this area, we do not limit ourselves to injective scoring functions. We formulate three intuitive postulates for the semantics of top-k queries in probabilistic databases and introduce a new semantics, Global-Topk, that satisfies those postulates to a large degree. We also show how to evaluate queries under the Global-Topk semantics. For simple databases we design dynamic-programming-based algorithms. For general databases we show polynomial-time reductions to the simple cases and provide effective heuristics to speed up the computation in practice. For example, we demonstrate that for a fixed k the time complexity of top-k query evaluation is as low as linear, under the assumption that probabilistic databases are simple and scoring functions are injective.
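As an illustration only, the Python sketch below computes Global-Topk answers in the simplest setting the abstract names: a simple (tuple-independent) probabilistic database with an injective scoring function. It assumes, as a working definition, that Global-Topk returns the k tuples with the highest probability of appearing in the top-k answer of a possible world. The function name `global_topk` and the `(id, score, probability)` input layout are hypothetical, and the dynamic program shown is a standard one for this setting, not necessarily the algorithm given in the paper. It tracks the probability that exactly j higher-scored tuples exist, truncated at j = k - 1, since a tuple is in the top-k exactly when it exists and fewer than k higher-scored tuples do.

```python
from typing import List, Tuple


def global_topk(tuples: List[Tuple[str, float, float]], k: int) -> List[Tuple[str, float]]:
    """Sketch of Global-Topk for a simple probabilistic database.

    Each input tuple is (id, score, probability). Tuples are assumed to be
    independent and scores distinct (an injective scoring function).
    Returns the k tuples with the highest probability of being in the
    top-k answer of a possible world, each paired with that probability.
    """
    if k <= 0:
        return []
    ranked = sorted(tuples, key=lambda t: t[1], reverse=True)
    # dist[j] = Pr(exactly j of the higher-scored tuples seen so far exist),
    # kept only for j < k: larger counts already push later tuples out.
    dist = [1.0] + [0.0] * (k - 1)
    in_topk: List[Tuple[str, float]] = []
    for tid, _score, p in ranked:
        # The tuple is in the top-k iff it exists and fewer than k
        # higher-scored tuples exist.
        in_topk.append((tid, p * sum(dist)))
        nxt = [0.0] * k
        for j in range(k):
            nxt[j] += dist[j] * (1.0 - p)  # current tuple absent
            if j + 1 < k:
                nxt[j + 1] += dist[j] * p  # current tuple present
        dist = nxt
    in_topk.sort(key=lambda x: x[1], reverse=True)
    return in_topk[:k]


if __name__ == "__main__":
    db = [("a", 10.0, 0.5), ("b", 8.0, 0.9), ("c", 5.0, 1.0)]
    # Returns b (probability 0.9) and c (about 0.55): c leaves the top-2
    # only in worlds where both a and b exist.
    print(global_topk(db, 2))
```

Once the tuples are ordered by score, the loop does O(nk) work, which is linear in n for a fixed k; the initial sort adds O(n log n) unless the input already arrives in score order. Note that the highest-scored tuple, a, is not returned even though it always ranks first when it exists, because its existence probability is lower.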

References

  1. Zhang, X., Chomicki, J.: On the semantics and evaluation of top-k queries in probabilistic databases. In: ICDE Workshops, pp. 556–563 (2008)

  2. Imielinski, T., Lipski, W.: Incomplete information in relational databases. J. ACM 31(4), 761–791 (1984)

  3. Cavallo, R., Pittarelli, M.: The theory of probabilistic databases. In: VLDB (1987)

  4. Halpern, J.Y.: An analysis of first-order logics of probability. Artificial Intelligence 46(3), 311–350 (1990)

  5. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases: The Logical Level. Addison–Wesley, Reading (1994)

  6. Fuhr, N., Rölleke, T.: A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Trans. Inf. Syst. 15(1), 32–66 (1997)

  7. Zimányi, E.: Query evaluation in probabilistic relational databases. Theor. Comput. Sci. 171(1–2), 179–219 (1997)

  8. Lakshmanan, L.V.S., Leone, N., Ross, R.B., Subrahmanian, V.S.: ProbView: A flexible probabilistic database system. ACM Trans. Database Syst. 22(3), 419–469 (1997)

  9. Dalvi, N.N., Suciu, D.: Efficient query evaluation on probabilistic databases. VLDB J. 16(4), 523–544 (2007)

  10. Benjelloun, O., Sarma, A.D., Halevy, A.Y., Widom, J.: ULDBs: Databases with uncertainty and lineage. In: VLDB (2006)

  11. Widom, J.: Trio: A system for integrated management of data, accuracy, and lineage. In: CIDR (2005)

  12. Olteanu, D., Koch, C., Antova, L.: World-set decompositions: Expressiveness and efficient algorithms. Theor. Comput. Sci. 403(2–3), 265–284 (2008)

  13. http://www.infosys.uni-sb.de/projects/maybms/

  14. Fagin, R.: Combining fuzzy information from multiple systems. J. Comput. Syst. Sci. 58(1), 83–99 (1999)

  15. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: PODS (2001)

  16. Natsev, A., Chang, Y.C., Smith, J.R., Li, C.S., Vitter, J.S.: Supporting incremental join queries on ranked inputs. In: VLDB (2001)

  17. Marian, A., Bruno, N., Gravano, L.: Evaluating top-k queries over web-accessible databases. ACM Trans. Database Syst. 29(2), 319–362 (2004)

  18. Guha, S., Koudas, N., Marathe, A., Srivastava, D.: Merging the results of approximate match operations. In: VLDB, pp. 636–647 (2004)

  19. Ilyas, I.F., Aref, W.G., Elmagarmid, A.K.: Joining ranked inputs in practice. In: VLDB (2002)

  20. Ilyas, I.F., Aref, W.G., Elmagarmid, A.K.: Supporting top-k join queries in relational databases. In: VLDB (2003)

  21. Soliman, M.A., Ilyas, I.F., Chang, K.C.C.: Top-k query processing in uncertain databases. In: ICDE (2007)

  22. Soliman, M.A., Ilyas, I.F., Chang, K.C.C.: Probabilistic top-k and ranking-aggregate queries. ACM Trans. Database Syst. 33(3) (2008)

  23. Ré, C., Dalvi, N.N., Suciu, D.: Efficient top-k query evaluation on probabilistic data. In: ICDE (2007)

  24. Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: a probabilistic threshold approach. In: SIGMOD Conference, pp. 673–686 (2008)

  25. Cormode, G., Li, F., Yi, K.: Semantics of ranking queries for probabilistic data and expected ranks. In: ICDE, pp. 305–316 (2009)

  26. Bruno, N., Wang, H.: The threshold algorithm: From middleware systems to the relational engine. IEEE Trans. Knowl. Data Eng. 19(4), 523–537 (2007)

  27. Burdick, D., Deshpande, P.M., Jayram, T.S., Ramakrishnan, R., Vaithyanathan, S.: OLAP over uncertain and imprecise data. VLDB J. 16(1), 123–144 (2007)

  28. Yi, K., Li, F., Kollios, G., Srivastava, D.: Efficient processing of top-k queries in uncertain databases. In: ICDE, pp. 1406–1408 (2008)

Author information

Correspondence to Xi Zhang.

Additional information

Research partially supported by NSF grant IIS-0307434. An earlier version of some of the results in this paper was presented in [1].

About this article

Cite this article

Zhang, X., Chomicki, J. Semantics and evaluation of top-k queries in probabilistic databases. Distrib Parallel Databases 26, 67–126 (2009). https://doi.org/10.1007/s10619-009-7050-y

  • DOI: https://doi.org/10.1007/s10619-009-7050-y
