ABSTRACT
Database instances routinely need to be tested and tuned with production-like workloads (W), configurations (C), data (D), and resources (R). The further the W, C, D, and R used in testing and tuning deviate from those observed on the production database instance, the less trustworthy the testing and tuning tasks are. For example, it is common to hear about performance degradation observed after a production database is upgraded from one software version to another. A typical cause of this problem is that the W, C, D, or R used during upgrade testing differed in some way from those on the production database. Performing testing and tuning tasks in principled and automated ways is very important, especially since the number of database instances that a database administrator (DBA) has to manage is growing rapidly, spurred by innovations in cloud computing.
We present Flex, a platform for trustworthy testing and tuning of production database instances. Flex gives DBAs a high-level language, called Slang, to specify definitions and objectives for running testing and tuning experiments. Flex's orchestrator schedules and runs these experiments automatically in a manner that meets the DBA-specified objectives. Flex has been fully prototyped. We present results from a comprehensive empirical evaluation that reveals the effectiveness of Flex on diverse problems such as upgrade testing, near-real-time testing to detect corruption of data, and server configuration tuning. We also report on our experiences taking some of the testing and tuning software described in the literature and porting it to run on the Flex platform.
Index Terms
- Rapid experimentation for testing and tuning a production database deployment