DOI: 10.1145/3466752.3480078
research-article

ParaBit: Processing Parallel Bitwise Operations in NAND Flash Memory based SSDs

Published: 17 October 2021

ABSTRACT

Processing-in-memory (PIM) and in-storage-computing (ISC) architectures implement computation inside memory and near storage, respectively. Although they effectively mitigate the overhead of moving data from memory and storage to the processor, the limited bandwidth of existing systems means these architectures still suffer from large data movement overhead between storage and memory, particularly when the amount of required data is large. This overhead has become a major constraint on further improving the computation efficiency of PIM and ISC architectures.

In this paper, we propose ParaBit, a scheme that enables Parallel Bitwise operations in NAND flash storage, where the data reside. By adjusting the latching circuit control and the sequence of sensing operations, ParaBit enables in-flash bitwise operations with little or no extra hardware, which effectively reduces the overhead of data movement between storage and memory. We exploit the massive parallelism in NAND-flash-based SSDs to mitigate the long latency of flash operations. Our experimental results show that the proposed ParaBit design achieves significant performance improvements over state-of-the-art PIM and ISC architectures.
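
As a purely functional illustration of what a bulk bitwise operation over flash pages computes, the following minimal Python sketch combines equally sized page buffers byte by byte. It models only the logical result in host software; it does not model the latching circuits or the sensing sequence described above, and it is not the authors' implementation. The names `PAGE_SIZE`, `bitwise_pages`, and `bitwise_many` are hypothetical and chosen only for this example.

```python
import operator
from functools import reduce

# Hypothetical page size in bytes; real NAND pages are far larger.
PAGE_SIZE = 8

# Supported operations, mapped to Python's integer bitwise operators.
OPS = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}


def bitwise_pages(op, page_a, page_b):
    """Apply a bitwise operation to two equally sized page buffers."""
    if len(page_a) != len(page_b):
        raise ValueError("pages must have the same length")
    func = OPS[op]
    # Each byte position is independent, which is what makes the
    # operation easy to parallelize across a whole page.
    return bytes(func(x, y) for x, y in zip(page_a, page_b))


def bitwise_many(op, pages):
    """Fold the same operation over a sequence of pages, left to right."""
    return reduce(lambda acc, page: bitwise_pages(op, acc, page), pages)


page_a = bytes([0b11001100] * PAGE_SIZE)
page_b = bytes([0b10101010] * PAGE_SIZE)

and_result = bitwise_pages("and", page_a, page_b)
or_result = bitwise_pages("or", page_a, page_b)
xor_result = bitwise_many("xor", [page_a, page_b])
```

Because every byte position is computed independently, the same operation can in principle be applied to many pages at once, which is the kind of parallelism the abstract refers to.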

  • Published in

    MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
    October 2021
    1322 pages
    ISBN: 9781450385572
    DOI: 10.1145/3466752

    Copyright © 2021 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 484 of 2,242 submissions, 22%
