Minimizing detail data in data warehouses

  • Conference paper
Advances in Database Technology — EDBT'98 (EDBT 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1377)

Abstract

Data warehouses collect and maintain large amounts of data from several distributed and heterogeneous data sources. For reasons of security, operational requirements, and technical feasibility, it is often impossible for data warehouses to access the data sources directly. Instead, data warehouses have to replicate legacy information as detail data in order to maintain their summary data.

In this paper we investigate how to minimize the amount of detail data stored in a data warehouse. More specifically, we identify the minimal amount of data that has to be replicated in order to maintain, either incrementally or by recomputation, summary data defined in terms of generalized project-select-join (GPSJ) views. We show how to minimize the number of tuples and attributes in the current detail tables and even aggregate them where possible. The amount of data to be stored in current detail tables is minimized by exploiting smart duplicate compression in addition to local and join reductions. We identify situations where it becomes possible to omit the typically huge fact table, and we prove that these techniques in concert ensure that the current detail data is minimal, in the sense that no subset of it permits accurate maintenance of the same summary data. Finally, we sketch how existing maintenance methods can be adapted to use the minimal detail tables we propose.
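The idea of shrinking detail data by dropping unneeded attributes and compressing duplicates can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the table layout, the view (sum, count, and maximum of price per product), and the helper names `compress`, `apply_delta`, and `summarize` are all assumptions made for illustration. The sketch keeps only the attributes the view reads, collapses identical projected tuples into a single entry with a count, and shows that the view can still be recomputed after a deletion.

```python
from collections import Counter

# Hypothetical fact table rows: (sale_id, product, store, price).
sales = [
    (1, "pen", "A", 2),
    (2, "pen", "A", 2),
    (3, "pen", "B", 2),
    (4, "ink", "A", 5),
    (5, "pen", "B", 3),
]


def compress(rows):
    """Project onto the attributes the view needs (product, price)
    and collapse duplicate projected tuples into one entry with a count."""
    return Counter((product, price) for _id, product, _store, price in rows)


def apply_delta(detail, inserted=(), deleted=()):
    """Propagate source insertions and deletions into the compressed detail."""
    detail.update(compress(inserted))
    detail.subtract(compress(deleted))
    # Drop entries whose count has fallen to zero.
    for key in [k for k, n in detail.items() if n <= 0]:
        del detail[key]


def summarize(detail):
    """Compute, per product, (SUM(price), COUNT(*), MAX(price))
    using only the compressed detail data."""
    view = {}
    for (product, price), n in detail.items():
        total, count, top = view.get(product, (0, 0, price))
        view[product] = (total + price * n, count + n, max(top, price))
    return view


detail = compress(sales)
```

Here five source rows shrink to three compressed entries. Deleting the sale with price 3 via `apply_delta(detail, deleted=[(5, "pen", "B", 3)])` lowers the maximum for `pen`, which the summary row alone could not reveal; the compressed detail still holds enough information to recompute it.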

This research was supported in part by the Danish Technical Research Council through grant 9700780 and Nykredit, Inc.

Editor information

Hans-Jörg Schek, Gustavo Alonso, Felix Saltor, Isidro Ramos

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Akinde, M.O., Jensen, O.G., Böhlen, M.H. (1998). Minimizing detail data in data warehouses. In: Schek, HJ., Alonso, G., Saltor, F., Ramos, I. (eds) Advances in Database Technology — EDBT'98. EDBT 1998. Lecture Notes in Computer Science, vol 1377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0100992

  • DOI: https://doi.org/10.1007/BFb0100992

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64264-0

  • Online ISBN: 978-3-540-69709-1

  • eBook Packages: Springer Book Archive
