Generalized scale independence through incremental precomputation

Authors:
Michael Armbrust (Google, Inc, Mountain View, CA, USA)
Eric Liang (UC Berkeley, Berkeley, CA, USA)
Tim Kraska (Brown University, Providence, RI, USA)
Armando Fox (UC Berkeley, Berkeley, CA, USA)
Michael J. Franklin (UC Berkeley, Berkeley, CA, USA)
David A. Patterson (UC Berkeley, Berkeley, CA, USA)

SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, June 2013, Pages 625–636
https://doi.org/10.1145/2463676.2465333

Published: 22 June 2013

ABSTRACT

Developers of rapidly growing applications must be able to anticipate potential scalability problems before they cause performance issues in production environments. A new type of data independence, called scale independence, seeks to address this challenge by guaranteeing that only a bounded amount of work is required to execute any query in an application, independent of the size of the underlying data. While optimization strategies have been developed to provide these guarantees for the class of queries that are scale-independent when executed using simple indexes, there are important queries for which such techniques are insufficient.
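To make the notion of bounded work concrete, the following is a minimal sketch, not taken from the paper, of a query that is scale-independent when executed using a simple index. The class `PostStore`, its methods, and the `page_size` limit are hypothetical names chosen for illustration. The scan-based query touches every row in the table, whereas the index-based query touches at most `page_size` rows no matter how large the table grows.

```python
from collections import defaultdict


class PostStore:
    """Toy in-memory table with a per-author index (hypothetical example)."""

    def __init__(self, page_size=10):
        self.page_size = page_size          # static bound on rows a query may return
        self.rows = []                      # base table: (author, text) tuples
        self.by_author = defaultdict(list)  # simple index: author -> row positions

    def insert(self, author, text):
        self.rows.append((author, text))
        self.by_author[author].append(len(self.rows) - 1)

    def recent_by_scan(self, author):
        """Unbounded plan: work grows with the size of the whole table."""
        touched = 0
        hits = []
        for row_author, text in self.rows:
            touched += 1
            if row_author == author:
                hits.append(text)
        # Newest first, truncated to one page.
        return hits[-self.page_size:][::-1], touched

    def recent_by_index(self, author):
        """Bounded plan: reads at most page_size index entries and rows."""
        positions = self.by_author.get(author, [])[-self.page_size:]
        texts = [self.rows[p][1] for p in reversed(positions)]
        return texts, len(positions)


if __name__ == "__main__":
    store = PostStore(page_size=3)
    for i in range(1000):
        store.insert(f"user{i % 10}", f"post{i}")
    print(store.recent_by_scan("user7"))
    print(store.recent_by_index("user7"))
```

Both plans return the same page of results; only the number of rows touched differs. In this sketch that count stands in for the "amount of work" the abstract refers to.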

Executing these more complex queries scale-independently requires precomputation using incrementally maintained materialized views. However, since this precomputation effectively shifts some of the query processing burden from execution time to insertion time, a scale-independent system must ensure that storage and maintenance costs do not threaten scalability. In this paper, we describe a scale-independent view selection and maintenance system, which uses novel static analysis techniques to ensure that the views it creates do not themselves become scaling bottlenecks. Finally, we present an empirical analysis that includes all the queries from the TPC-W benchmark and validates our implementation's ability to maintain nearly constant high-quantile query and update latency even as an application scales to hundreds of machines.
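The trade-off described above can be illustrated with a small sketch. It is an assumption-laden toy, not the view selection or static analysis system from the paper: `TagCountView`, `max_tags_per_post`, and the tag-count query are hypothetical. The view answers an aggregate query with a single lookup, and the declared limit on tags per post bounds how many view entries any one insert or delete can touch, so maintenance work does not grow with the size of the data.

```python
from collections import Counter


class TagCountView:
    """Toy incrementally maintained view: number of posts per tag.

    Hypothetical sketch; not the algorithm described in the paper.
    """

    def __init__(self, max_tags_per_post=5):
        # Declared cardinality limit: bounds the view updates one insert can cause.
        self.max_tags_per_post = max_tags_per_post
        self.posts = {}          # base table: post_id -> set of tags
        self.counts = Counter()  # materialized view: tag -> number of posts

    def insert(self, post_id, tags):
        tags = set(tags)
        if len(tags) > self.max_tags_per_post:
            raise ValueError("insert would exceed the declared cardinality limit")
        if post_id in self.posts:
            raise KeyError(f"duplicate post id: {post_id}")
        self.posts[post_id] = tags
        for tag in tags:
            self.counts[tag] += 1
        return len(tags)  # number of view entries touched

    def delete(self, post_id):
        tags = self.posts.pop(post_id)
        for tag in tags:
            self.counts[tag] -= 1
            if self.counts[tag] == 0:
                del self.counts[tag]
        return len(tags)  # number of view entries touched

    def count(self, tag):
        """Query time: one lookup, independent of the number of posts."""
        return self.counts.get(tag, 0)

    def recompute(self):
        """Full recomputation, used only to check the incremental result."""
        fresh = Counter()
        for tags in self.posts.values():
            fresh.update(tags)
        return fresh


if __name__ == "__main__":
    view = TagCountView(max_tags_per_post=2)
    view.insert(1, ["sigmod", "views"])
    view.insert(2, ["views"])
    print(view.count("views"))
    print(view.counts == view.recompute())
```

Rejecting an insert that exceeds the limit is one possible policy, chosen here for simplicity. What the sketch is meant to show is that the bound is declared up front, so it can be checked before the view is ever populated.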

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Parallel and distributed DBMSs

Published in
SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
June 2013
1322 pages
ISBN: 9781450320375
DOI: 10.1145/2463676
General Chairs: Kenneth Ross (Columbia University), Divesh Srivastava (AT&T Research)
Program Chair: Dimitris Papadias (HKUST)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
materialized view selection
scalability
scale independence
Qualifiers
- research-article
Acceptance Rates
SIGMOD '13 Paper Acceptance Rate: 76 of 372 submissions, 20%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%

Article Metrics
- Total Citations: 26
- Total Downloads: 422
- Downloads (Last 12 months): 3
- Downloads (Last 6 weeks): 1