OLAP over uncertain and imprecise data

  • Special Issue Paper
  • The VLDB Journal

Abstract

We extend the OLAP data model to represent data ambiguity, specifically imprecision and uncertainty, and introduce an allocation-based approach to the semantics of aggregation queries over such data. We identify three natural query properties and use them to shed light on alternative query semantics. While there is much work on representing and querying ambiguous data, to our knowledge this is the first paper to handle both imprecision and uncertainty in an OLAP setting.
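
To make the idea of allocation concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a single hypothetical location dimension in which a fact recorded at state level is imprecise, and it assumes a uniform allocation policy that splits such a fact evenly across the cities it could belong to. The names `HIERARCHY`, `candidate_cells`, `allocate_uniform`, and `weighted_sum_by_cell` are illustrative only and do not come from the article.

```python
from collections import defaultdict

# Hypothetical one-dimensional hierarchy (an assumption for illustration):
# each state covers a set of leaf-level cities.
HIERARCHY = {
    "NY": ["Albany", "Buffalo"],
    "CA": ["Fresno", "Oakland"],
}


def candidate_cells(location):
    """Return the leaf cells a fact could belong to.

    A state-level (imprecise) value expands to all of its cities;
    a city-level (precise) value maps to itself.
    """
    return HIERARCHY.get(location, [location])


def allocate_uniform(facts):
    """Split each fact evenly across its candidate cells.

    Uniform weighting is an assumed policy, chosen for simplicity.
    Returns (cell, weight, measure) triples; the weights of the triples
    produced for one fact sum to 1.
    """
    allocated = []
    for location, measure in facts:
        cells = candidate_cells(location)
        weight = 1.0 / len(cells)
        for cell in cells:
            allocated.append((cell, weight, measure))
    return allocated


def weighted_sum_by_cell(allocated):
    """Answer a SUM query per leaf cell using the allocation weights."""
    totals = defaultdict(float)
    for cell, weight, measure in allocated:
        totals[cell] += weight * measure
    return dict(totals)


# Two precise facts and one imprecise fact known only to lie in "NY".
facts = [("Albany", 100.0), ("NY", 60.0), ("Oakland", 40.0)]
totals = weighted_sum_by_cell(allocate_uniform(facts))
print(totals)
```

In this sketch the imprecise fact contributes half of its measure to each candidate city, so the grand total over all cells equals the total of the original measures. Other allocation policies would change only the weights, not the aggregation step.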

Corresponding author

Correspondence to Doug Burdick.

About this article

Cite this article

Burdick, D., Deshpande, P.M., Jayram, T.S. et al. OLAP over uncertain and imprecise data. The VLDB Journal 16, 123–144 (2007). https://doi.org/10.1007/s00778-006-0033-y

  • DOI: https://doi.org/10.1007/s00778-006-0033-y
