ABSTRACT
Developers of rapidly growing applications must be able to anticipate potential scalability problems before they cause performance issues in production environments. A new type of data independence, called scale independence, seeks to address this challenge by guaranteeing that only a bounded amount of work is required to execute any query in an application, independent of the size of the underlying data. While optimization strategies have been developed to provide these guarantees for the class of queries that are scale-independent when executed using simple indexes, there are important queries for which such techniques are insufficient.
Executing these more complex queries scale-independently requires precomputation using incrementally-maintained materialized views. However, since this precomputation effectively shifts some of the query processing burden from execution time to insertion time, a scale-independent system must be careful to ensure that storage and maintenance costs do not threaten scalability. In this paper, we describe a scale-independent view selection and maintenance system, which uses novel static analysis techniques that ensure that created views do not themselves become scaling bottlenecks. Finally, we present an empirical analysis that includes all the queries from the TPC-W benchmark and validates our implementation's ability to maintain nearly constant high-quantile query and update latency even as an application scales to hundreds of machines.
Index Terms
- Generalized scale independence through incremental precomputation