Abstract
Snapshot isolation (SI) is a prevalent weak isolation level that avoids the performance penalty imposed by serializability and simultaneously prevents various undesired data anomalies. Nevertheless, SI anomalies have recently been found in production cloud databases that claim to provide the SI guarantee. Given the complex and often unavailable internals of such databases, a black-box SI checker is highly desirable.
In this paper we present PolySI, a black-box checker that efficiently checks SI and provides understandable counterexamples upon detecting violations. PolySI builds on a characterization of SI using generalized polygraphs (GPs), whose soundness and completeness we establish. PolySI employs an SMT solver and accelerates solving with a compact constraint encoding of GPs and domain-specific optimizations for pruning constraints. As our extensive assessment demonstrates, PolySI reproduces all 2477 known SI anomalies, detects novel SI violations in three production cloud databases, identifies their causes, outperforms the state-of-the-art black-box checkers under a wide range of workloads, and scales to large workloads.
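To make the polygraph idea concrete, the following is a minimal sketch and not PolySI's implementation. It assumes the dependency-graph characterization of SI from the literature, under which a history satisfies SI if some choice of the uncertain edges leaves the composed relation (SO ∪ WR ∪ WW) ; RW? acyclic. The SMT solver is replaced here by brute-force enumeration of the either/or constraints, and all function names, transaction names, and the input format are hypothetical.

```python
from itertools import product

def has_cycle(nodes, edges):
    """Kahn's algorithm: True if the directed graph contains a cycle."""
    succ = {n: set() for n in nodes}
    indeg = {n: 0 for n in nodes}
    for u, v in set(edges):
        succ[u].add(v)
        indeg[v] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    visited = 0
    while ready:
        u = ready.pop()
        visited += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return visited != len(nodes)

def si_induced_edges(dep, rw):
    """Compute dep ; rw? , i.e. dep plus every dep edge followed by one rw edge."""
    induced = set(dep)
    for a, b in dep:
        for c, d in rw:
            if b == c:
                induced.add((a, d))
    return induced

def check_si(nodes, known_dep, known_rw, constraints):
    """Brute-force stand-in for the SMT search (assumption, not PolySI's method).

    known_dep: known SO/WR/WW edges; known_rw: known RW edges.
    constraints: list of (either, or) pairs; each option is (dep_edges, rw_edges).
    Returns True if some resolution of all constraints is acyclic under SI.
    """
    for choice in product(*constraints):
        dep, rw = set(known_dep), set(known_rw)
        for dep_edges, rw_edges in choice:
            dep |= set(dep_edges)
            rw |= set(rw_edges)
        if not has_cycle(nodes, si_induced_edges(dep, rw)):
            return True
    return False

nodes = ["T0", "T1", "T2"]  # T0 is a hypothetical initial transaction

# Write skew: T1 and T2 both read x and y from T0; T1 writes x, T2 writes y.
write_skew = check_si(
    nodes,
    known_dep=[("T0", "T1"), ("T0", "T2")],
    known_rw=[("T1", "T2"), ("T2", "T1")],
    constraints=[],
)

# Lost update: T1 and T2 both read x from T0 and both write x.
# The WW order between T1 and T2 is unknown, hence one either/or constraint.
lost_update = check_si(
    nodes,
    known_dep=[("T0", "T1"), ("T0", "T2")],
    known_rw=[("T1", "T2"), ("T2", "T1")],
    constraints=[(([("T1", "T2")], []), ([("T2", "T1")], []))],
)

print(write_skew)   # True: SI permits write skew
print(lost_update)  # False: either WW order closes a cycle
```

The enumeration is exponential in the number of constraints, which is why a practical checker would hand the same either/or choices to a solver and prune them beforehand.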