Abstract
We study here fundamental issues involved in top-k query evaluation in probabilistic databases. We consider simple probabilistic databases in which probabilities are associated with individual tuples, and general probabilistic databases in which, additionally, exclusivity relationships between tuples can be represented. In contrast to other recent research in this area, we do not limit ourselves to injective scoring functions. We formulate three intuitive postulates for the semantics of top-k queries in probabilistic databases, and introduce a new semantics, Global-Topk, that satisfies those postulates to a large degree. We also show how to evaluate queries under the Global-Topk semantics. For simple databases we design dynamic-programming based algorithms. For general databases we show polynomial-time reductions to the simple cases, and provide effective heuristics to speed up the computation in practice. For example, we demonstrate that for a fixed k the time complexity of top-k query evaluation is as low as linear, under the assumption that probabilistic databases are simple and scoring functions are injective.
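The dynamic-programming idea for simple databases can be illustrated with a minimal sketch. It rests on assumptions not spelled out in the abstract: tuples are independent, the scoring function is injective, and Global-Topk is read as returning the k tuples with the highest probability of appearing in the top-k answer of a possible world. The function names, the tuple layout, and the tie-breaking rule are hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical tuple layout: (tuple_id, score, existence_probability)
ProbTuple = Tuple[str, float, float]


def topk_probabilities(tuples: List[ProbTuple], k: int) -> List[Tuple[str, float]]:
    """Probability that each tuple is among the k highest-scored existing tuples.

    Assumes k >= 1, independent tuples, and distinct scores.
    """
    ranked = sorted(tuples, key=lambda t: t[1], reverse=True)
    # q[j] = probability that exactly j of the tuples processed so far exist (j < k)
    q = [1.0] + [0.0] * (k - 1)
    result = []
    for tid, _score, p in ranked:
        # The tuple is in the top-k iff it exists and fewer than k
        # higher-scored tuples exist.
        result.append((tid, p * sum(q)))
        # Fold the current tuple into the counts, highest index first so that
        # q[j - 1] still holds the value from before this tuple.
        for j in range(k - 1, 0, -1):
            q[j] = q[j] * (1.0 - p) + q[j - 1] * p
        q[0] *= 1.0 - p
    return result


def global_topk(tuples: List[ProbTuple], k: int) -> List[Tuple[str, float]]:
    """Return the k tuples with the highest top-k probability.

    Ties are broken by score order (stable sort), an assumption of this sketch.
    """
    probs = topk_probabilities(tuples, k)
    probs.sort(key=lambda x: x[1], reverse=True)
    return probs[:k]


if __name__ == "__main__":
    db = [("a", 10.0, 0.5), ("b", 9.0, 0.8), ("c", 8.0, 0.9)]
    print(global_topk(db, 2))
```

After the initial sort, the loop performs O(k) work per tuple, so for a fixed k and input already ordered by score the running time grows linearly with the number of tuples, in line with the complexity claim above.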
References
Zhang, X., Chomicki, J.: On the semantics and evaluation of top-k queries in probabilistic databases. In: ICDE Workshops, pp. 556–563 (2008)
Imielinski, T., Lipski, W.: Incomplete information in relational databases. J. ACM 31(4), 761–791 (1984)
Cavallo, R., Pittarelli, M.: The theory of probabilistic databases. In: VLDB (1987)
Halpern, J.Y.: An analysis of first-order logics of probability. Artificial Intelligence 46(3), 311–350 (1990)
Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases: The Logical Level. Addison–Wesley, Reading (1994)
Fuhr, N., Rölleke, T.: A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Trans. Inf. Syst. 15(1), 32–66 (1997)
Zimányi, E.: Query evaluation in probabilistic relational databases. Theor. Comput. Sci. 171(1–2), 179–219 (1997)
Lakshmanan, L.V.S., Leone, N., Ross, R.B., Subrahmanian, V.S.: ProbView: A flexible probabilistic database system. ACM Trans. Database Syst. 22(3), 419–469 (1997)
Dalvi, N.N., Suciu, D.: Efficient query evaluation on probabilistic databases. VLDB J. 16(4), 523–544 (2007)
Benjelloun, O., Sarma, A.D., Halevy, A.Y., Widom, J.: ULDBs: Databases with uncertainty and lineage. In: VLDB (2006)
Widom, J.: Trio: A system for integrated management of data, accuracy, and lineage. In: CIDR (2005)
Olteanu, D., Koch, C., Antova, L.: World-set decompositions: Expressiveness and efficient algorithms. Theor. Comput. Sci. 403(2–3), 265–284 (2008)
Fagin, R.: Combining fuzzy information from multiple systems. J. Comput. Syst. Sci. 58(1), 83–99 (1999)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: PODS (2001)
Natsev, A., Chang, Y.C., Smith, J.R., Li, C.S., Vitter, J.S.: Supporting incremental join queries on ranked inputs. In: VLDB (2001)
Marian, A., Bruno, N., Gravano, L.: Evaluating top-k queries over web-accessible databases. ACM Trans. Database Syst. 29(2), 319–362 (2004)
Guha, S., Koudas, N., Marathe, A., Srivastava, D.: Merging the results of approximate match operations. In: VLDB, pp. 636–647 (2004)
Ilyas, I.F., Aref, W.G., Elmagarmid, A.K.: Joining ranked inputs in practice. In: VLDB (2002)
Ilyas, I.F., Aref, W.G., Elmagarmid, A.K.: Supporting top-k join queries in relational databases. In: VLDB (2003)
Soliman, M.A., Ilyas, I.F., Chang, K.C.C.: Top-k query processing in uncertain databases. In: ICDE (2007)
Soliman, M.A., Ilyas, I.F., Chang, K.C.C.: Probabilistic top-k and ranking-aggregate queries. ACM Trans. Database Syst. 33(3) (2008)
Ré, C., Dalvi, N.N., Suciu, D.: Efficient top-k query evaluation on probabilistic data. In: ICDE (2007)
Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: a probabilistic threshold approach. In: SIGMOD Conference, pp. 673–686 (2008)
Cormode, G., Li, F., Yi, K.: Semantics of ranking queries for probabilistic data and expected ranks. In: ICDE, pp. 305–316 (2009)
Bruno, N., Wang, H.: The threshold algorithm: From middleware systems to the relational engine. IEEE Trans. Knowl. Data Eng. 19(4), 523–537 (2007)
Burdick, D., Deshpande, P.M., Jayram, T.S., Ramakrishnan, R., Vaithyanathan, S.: OLAP over uncertain and imprecise data. VLDB J. 16(1), 123–144 (2007)
Yi, K., Li, F., Kollios, G., Srivastava, D.: Efficient processing of top-k queries in uncertain databases. In: ICDE, pp. 1406–1408 (2008)
Additional information
Research partially supported by NSF grant IIS-0307434. An earlier version of some of the results in this paper was presented in [1].
About this article
Cite this article
Zhang, X., Chomicki, J. Semantics and evaluation of top-k queries in probabilistic databases. Distrib Parallel Databases 26, 67–126 (2009). https://doi.org/10.1007/s10619-009-7050-y