Abstract
When replacing a dirty victim page upon a page miss, conventional buffer managers first flush the dirty victim to storage before reading the missing page. Unfortunately, this read-after-write (RAW) protocol causes a read stall problem on flash storage. Flash reads and writes are asymmetric in both speed and parallelism, so clean frames are consumed quickly. The read for the missing page then often has to wait for the slow write to complete and for the frame to become clean, because the write and the read contend for the same buffer frame. Under RAW, performance-critical synchronous reads are therefore frequently blocked by writes, which severely degrades transaction throughput and latency. Moreover, RAW's strict I/O ordering leaves the abundant parallelism of flash storage underutilized.
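The following minimal Python sketch makes the RAW protocol concrete. It models a toy LRU buffer pool that evicts with RAW. The class name `RawBufferPool` and the latency constants `READ_US` and `WRITE_US` are hypothetical, chosen only so that a write costs more than a read. The model ignores device parallelism and simply charges a miss with a dirty victim the full victim write before its read.

```python
from collections import OrderedDict

# Hypothetical flash latencies in microseconds, chosen only so that a write
# is slower than a read; they are not measurements from the paper.
READ_US = 100
WRITE_US = 1000


class RawBufferPool:
    """Toy LRU buffer pool that replaces pages with the read-after-write (RAW) protocol."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # page id -> dirty flag, least recently used first
        self.frames: "OrderedDict[int, bool]" = OrderedDict()
        self.read_stall_us = 0  # total time missing-page reads waited for victim flushes

    def access(self, page_id: int, write: bool = False) -> int:
        """Access a page and return the simulated latency of this access in microseconds."""
        if page_id in self.frames:  # hit: no I/O
            self.frames.move_to_end(page_id)
            self.frames[page_id] = self.frames[page_id] or write
            return 0
        latency = 0
        if len(self.frames) >= self.capacity:
            _victim, dirty = self.frames.popitem(last=False)  # evict the LRU page
            if dirty:
                # RAW: the dirty victim is flushed first, and the read waits for it.
                latency += WRITE_US
                self.read_stall_us += WRITE_US
        latency += READ_US  # only now read the missing page into the freed frame
        self.frames[page_id] = write
        return latency
```

Suppose a two-frame pool holds only dirty pages. A third miss then costs `WRITE_US + READ_US` rather than `READ_US`, and `read_stall_us` records the time that read spent blocked behind the flush. A clean victim, by contrast, costs only `READ_US`.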
To avoid read stalls in the DBMS buffer, we propose RW (fused read and write), a new storage interface. On a read stall, the buffer manager uses RW to issue both the write and the read request to the storage at once. Once the dirty page has been copied into the storage buffer, the storage can immediately serve the read. In addition, to resolve read stalls in the flash storage buffer, we propose R-Buf, which separates the read buffer from the write buffer so that reads can proceed without stalling. Because RW and R-Buf work at different layers, they complement each other when used together. We prototype RW and R-Buf on a real Cosmos+ OpenSSD board. Evaluation results show that RW alone improves TPC-C throughput over RAW by 3.2x, and RW combined with R-Buf improves it by 3.9x. We also demonstrate that R-Buf effectively mitigates I/O interference under multi-tenancy.
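The sketch below contrasts RAW with a fused RW request, and a shared device buffer with a separate read buffer. It rests on simplifying assumptions. `ToyDevice`, `raw_miss`, `rw_miss`, and the latency constants are hypothetical. Under RAW, the host is assumed to wait until the victim is programmed to flash. The device has no background flusher, so a full buffer is drained only on demand. The model illustrates the ideas described above and is not the Cosmos+ OpenSSD prototype's firmware.

```python
from collections import deque

# Hypothetical per-page latencies in microseconds; illustrative only.
READ_US = 100    # read one page from flash
WRITE_US = 1000  # program one page to flash (slower than a read)
COPY_US = 10     # copy one page from a host buffer frame into the device buffer


class ToyDevice:
    """Toy flash device with a DRAM buffer of `slots` pages (hypothetical model).

    separate_read_buf=False: reads and buffered writes share the slots, so a read
    may have to wait until a dirty slot is programmed to flash.
    separate_read_buf=True: reads use a dedicated buffer, in the spirit of R-Buf.
    """

    def __init__(self, slots: int, separate_read_buf: bool):
        self.slots = slots
        self.separate_read_buf = separate_read_buf
        self.pending = deque()  # page ids buffered but not yet programmed to flash

    def _free_slot(self) -> int:
        """Make room for one page by draining the oldest buffered write; return the time spent."""
        if len(self.pending) >= self.slots:
            self.pending.popleft()  # program the oldest buffered page to flash
            return WRITE_US
        return 0

    def buffer_write(self, page_id: int) -> int:
        """Copy a dirty page into the device buffer and return the latency."""
        wait = self._free_slot()
        self.pending.append(page_id)
        return wait + COPY_US

    def read(self, page_id: int) -> int:
        """Read a page from flash and return the latency, including any buffer stall."""
        wait = 0 if self.separate_read_buf else self._free_slot()
        return wait + READ_US


def raw_miss(dev: ToyDevice, victim: int, missing: int) -> int:
    """RAW (assumed write-through): the read is issued only after the victim reaches flash."""
    return COPY_US + WRITE_US + dev.read(missing)


def rw_miss(dev: ToyDevice, victim: int, missing: int) -> int:
    """RW: one fused request; the read starts once the victim sits in the device buffer."""
    return dev.buffer_write(victim) + dev.read(missing)
```

With an empty device buffer, `rw_miss` costs `COPY_US + READ_US` because the read waits only for the victim copy. Now suppose the shared buffer is one page from full. The buffered victim then fills it, and the read stalls behind a flush again. With `separate_read_buf=True` that second stall disappears, which mirrors why the two mechanisms complement each other.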
Avoiding Read Stalls on Flash Storage
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data