Artifacts Available / v1.1

Your read is our priority in flash storage

Published: 01 May 2022

Abstract

When replacing a dirty victim page on a page miss, conventional buffer managers first flush the dirty victim to storage and only then read the missing page. Unfortunately, this read-after-write (RAW) protocol causes the read stall problem on flash storage: because of the asymmetric I/O speed and parallelism of flash storage, clean frames are quickly consumed, so the read for the missing page often has to wait for the slow write to complete and the frame to become clean, since both operations contend for the same buffer frame. RAW thus often leaves performance-critical synchronous reads blocked behind writes, severely degrading transaction throughput and latency. In addition, its strict I/O ordering leaves the abundant parallelism of flash storage under-utilized.

To avoid read stalls in the DBMS buffer, we propose RW (fused read and write) as a new storage interface. Using RW on a read stall, the buffer manager can issue both the read and the write request to the storage at once. Then, once the dirty page is copied to the storage buffer, the read can be served immediately. In addition, to resolve read stalls in the flash storage buffer, we propose R-Buf, which separates the read buffer from the write buffer so that reads can proceed without stalling. RW and R-Buf, working at different layers, complement each other when used together. We prototype RW and R-Buf on a real Cosmos+ OpenSSD board. Evaluation results show that RW alone improves TPC-C throughput over RAW by 3.2x and, combined with R-Buf, by 3.9x. In addition, we demonstrate that R-Buf effectively mitigates I/O interference in multi-tenancy.
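
The difference between the two protocols on a page miss can be illustrated with a toy latency model. The sketch below is not the paper's implementation: the class and function names are hypothetical, and the latencies are made-up round numbers chosen only to reflect that flash writes are much slower than reads. It also assumes the storage write buffer always has room for the dirty page, which is the situation R-Buf is meant to help with at the storage layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlashTimings:
    """Hypothetical per-page latencies in microseconds (not taken from the paper)."""
    read_us: float = 100.0        # flash page read
    write_us: float = 1000.0      # flash page write (much slower than a read)
    buffer_copy_us: float = 10.0  # copying the dirty page into the storage buffer


def raw_miss_latency(t: FlashTimings, victim_dirty: bool) -> float:
    """Read-after-write: flush the dirty victim first, then read the missing page."""
    flush = t.write_us if victim_dirty else 0.0
    return flush + t.read_us


def rw_miss_latency(t: FlashTimings, victim_dirty: bool) -> float:
    """Fused RW: the read is served once the dirty page sits in the storage buffer.

    Assumption: the storage buffer has free space, so the copy never stalls.
    """
    copy = t.buffer_copy_us if victim_dirty else 0.0
    return copy + t.read_us


def mean_miss_latency(latency_fn, t: FlashTimings, dirty_ratio: float) -> float:
    """Expected miss latency when a fraction `dirty_ratio` of victims is dirty."""
    return (dirty_ratio * latency_fn(t, True)
            + (1.0 - dirty_ratio) * latency_fn(t, False))
```

With these assumed numbers, a miss on a dirty victim waits for the full write under RAW but only for the buffer copy under RW, while a miss on a clean victim costs the same in both. Calling `mean_miss_latency(raw_miss_latency, FlashTimings(), 0.5)` and the same with `rw_miss_latency` shows how the gap grows with the share of dirty victims.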

