ABSTRACT
Traditionally, data warehouses have been refreshed off-line. Active data warehousing refers to a new trend in which data warehouses are updated as frequently as possible to meet users' high demand for fresh data. In this paper, we propose a framework for implementing active data warehousing with the following goals: (a) minimal changes to the software configuration of the source, (b) minimal overhead for the source due to the active nature of data propagation, and (c) the possibility of smoothly regulating the overall configuration of the environment in a principled way. In our framework, we implement ETL activities over queue networks and employ queueing theory to predict the performance and tune the operation of the overall refreshment process. Because of the performance overheads incurred, we explore different architectural choices for this task and discuss the issues that arise for each of them.
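To illustrate how queueing theory can predict the performance of a chain of ETL activities, the following is a minimal sketch and not the framework's actual model. It assumes each activity behaves as an independent M/M/1 station (Poisson arrivals, exponential service times) and that no activity filters or multiplies rows, so every station sees the same arrival rate. The names `StationMetrics`, `mm1_metrics`, and `pipeline_response_time` are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StationMetrics:
    utilization: float     # rho = lambda / mu
    mean_in_system: float  # L = rho / (1 - rho), mean number of rows at the station
    mean_response: float   # W = 1 / (mu - lambda), mean time a row spends at the station


def mm1_metrics(arrival_rate: float, service_rate: float) -> StationMetrics:
    """Steady-state M/M/1 metrics for one ETL activity (assumed model)."""
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        # With lambda >= mu the queue grows without bound; no steady state exists.
        raise ValueError("unstable station: arrival rate must be below service rate")
    rho = arrival_rate / service_rate
    return StationMetrics(
        utilization=rho,
        mean_in_system=rho / (1.0 - rho),
        mean_response=1.0 / (service_rate - arrival_rate),
    )


def pipeline_response_time(arrival_rate: float, service_rates: Sequence[float]) -> float:
    """Mean end-to-end delay of a tandem of M/M/1 stations.

    Assumes every station receives the same arrival rate, which holds only
    when no activity drops or duplicates rows.
    """
    return sum(mm1_metrics(arrival_rate, mu).mean_response for mu in service_rates)


if __name__ == "__main__":
    # Hypothetical rates in rows per second: source emits 8, activities serve 10, 12, 16.
    print(mm1_metrics(8.0, 10.0))
    print(pipeline_response_time(8.0, [10.0, 12.0, 16.0]))
```

Under these assumptions, the station with the service rate closest to the arrival rate dominates the end-to-end delay, which is the kind of observation that tuning a refreshment process would rely on. Activities that filter rows would need a per-station arrival rate instead of a single shared one.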