DOI: 10.1145/1077501.1077509
Article

ETL queues for active data warehousing

Published: 17 June 2005

ABSTRACT

Traditionally, data warehouses have been refreshed off-line. Active Data Warehousing refers to a new trend in which data warehouses are updated as frequently as possible, to accommodate users' high demand for fresh data. In this paper, we propose a framework for implementing active data warehousing with the following goals: (a) minimal changes to the software configuration of the source, (b) minimal overhead on the source due to the active nature of data propagation, and (c) the possibility of smoothly regulating the overall configuration of the environment in a principled way. In our framework, we have implemented ETL activities over queueing networks and employ queueing theory to predict the performance and tune the operation of the overall refreshment process. Because of the performance overheads incurred, we explore different architectural choices for this task and discuss the issues that arise for each of them.
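As a hedged illustration of how queueing theory can predict and tune an ETL refreshment flow, the sketch below models each ETL activity as an M/M/1 queue (Poisson arrivals, exponentially distributed service times) and chains the activities in series. The M/M/1 assumption, the `Activity` type, the functions `mm1_metrics`, `pipeline_latency` and `max_arrival_rate`, and the service rates in the example are all hypothetical choices made for illustration; they are not taken from the paper and should not be read as its actual model or parameters.

```python
# Hypothetical sketch: each ETL activity is assumed to be an M/M/1 queue.
# This is an illustration of queueing-based prediction, not the paper's model.
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Activity:
    """One ETL activity modelled as a single-server queue (assumed model)."""
    name: str
    service_rate: float  # mu: mean tuples per second the activity can process


def mm1_metrics(lam: float, mu: float) -> dict:
    """Steady-state M/M/1 metrics for arrival rate lam and service rate mu."""
    rho = lam / mu  # utilization of the activity
    if lam >= mu:
        # Utilization >= 1: the queue grows without bound, so delays are infinite.
        return {"rho": rho, "L": math.inf, "W": math.inf}
    w = 1.0 / (mu - lam)  # mean time a tuple spends at the activity (wait + service)
    return {"rho": rho, "L": lam * w, "W": w}  # L via Little's law, L = lambda * W


def pipeline_latency(lam: float, activities: List[Activity]) -> float:
    """Mean end-to-end time through a tandem (series) network of M/M/1 activities.

    With Poisson input, Burke's theorem makes each station's output Poisson with
    the same rate, so every activity sees arrival rate lam and mean delays add up.
    """
    return sum(mm1_metrics(lam, a.service_rate)["W"] for a in activities)


def max_arrival_rate(activities: List[Activity], target_latency: float,
                     iters: int = 60) -> float:
    """Largest source arrival rate whose predicted latency stays within target.

    Predicted latency grows monotonically with lam, so bisection on
    [0, min mu) is enough. Returns 0.0 if even an idle pipeline misses the target.
    """
    lo, hi = 0.0, min(a.service_rate for a in activities)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if pipeline_latency(mid, activities) <= target_latency:
            lo = mid  # mid still meets the latency target
        else:
            hi = mid  # mid is too fast a feed for the target
    return lo


if __name__ == "__main__":
    # Hypothetical three-step flow with made-up service rates (tuples/s).
    flow = [Activity("filter", 100.0),
            Activity("surrogate_key", 50.0),
            Activity("load", 80.0)]
    for a in flow:
        print(a.name, mm1_metrics(40.0, a.service_rate))
    print("predicted latency at 40 tuples/s:", pipeline_latency(40.0, flow))
    print("max rate for a 0.2 s latency target:", max_arrival_rate(flow, 0.2))
```

The slowest activity bounds the sustainable feed rate, which is why bisection starts from the minimum service rate. In practice the service rates would have to be measured on the actual source and warehouse rather than assumed as they are here.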


Published in

    ACM Conferences
    IQIS '05: Proceedings of the 2nd international workshop on Information quality in information systems
    June 2005
    116 pages
    ISBN:1595931600
    DOI:10.1145/1077501

    Copyright © 2005 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States
