ABSTRACT
Processing-in-memory (PIM) and in-storage-computing (ISC) architectures implement computation inside memory and near storage, respectively. While both effectively mitigate the overhead of moving data from memory and storage to the processor, the limited bandwidth of existing systems means they still suffer from large data movement overhead between storage and memory, especially when the amount of required data is large. This overhead has become a major constraint on further improving computation efficiency in PIM and ISC architectures.
In this paper, we propose ParaBit, a scheme that enables Parallel Bitwise operations in NAND flash storage, where the data reside. By adjusting the latch control circuitry and the sequence of sensing operations, ParaBit performs in-flash bitwise operations with little or no extra hardware, effectively reducing the data movement overhead between storage and memory. We further exploit the massive parallelism in NAND-flash-based SSDs to mitigate the long latency of flash operations. Our experimental results show that the proposed ParaBit design achieves significant performance improvements over state-of-the-art PIM and ISC architectures.
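The core idea of accumulating sense results in the page-buffer latch can be illustrated with a minimal Python model. This is a hypothetical functional sketch, not the actual ParaBit circuit: it assumes pages are bit vectors and that the latch, when not reset between consecutive sense operations, can only be pulled toward one value, so sensing two wordlines back-to-back yields their bitwise AND (latch preset to 1) or OR (latch preset to 0).

```python
# Functional model of bulk bitwise operations via sequential sensing.
# Assumption (hypothetical): the sense latch is preconditioned once and
# then accumulates across sense operations instead of being reset.

def sense_and(page_a, page_b):
    """Latch preset to 1; each sense can only pull a bit to 0 -> AND."""
    latch = [1] * len(page_a)
    for page in (page_a, page_b):
        latch = [l & b for l, b in zip(latch, page)]
    return latch

def sense_or(page_a, page_b):
    """Latch preset to 0; each sense can only set a bit to 1 -> OR."""
    latch = [0] * len(page_a)
    for page in (page_a, page_b):
        latch = [l | b for l, b in zip(latch, page)]
    return latch

a = [1, 0, 1, 1]
b = [1, 1, 0, 1]
print(sense_and(a, b))  # [1, 0, 0, 1]
print(sense_or(a, b))   # [1, 1, 1, 1]
```

Because the operands never leave the flash array in this model, the only data crossing the storage-to-memory interface is the final result page, which is the source of the data movement savings the abstract describes.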