
An On-Chip Trainable and Scalable In-Memory ANN Architecture for AI/ML Applications

Circuits, Systems, and Signal Processing

Abstract

Traditional von Neumann architecture-based processors become inefficient in terms of energy and throughput because they use separate processing and memory units, a limitation known as the memory wall. The memory wall problem is further exacerbated when massive parallelism and frequent data movement between processing and memory units are required for real-time implementation of artificial neural networks (ANNs), which enable many intelligent applications. One of the most promising approaches to this problem is to perform computations inside the memory core, which enhances memory bandwidth and energy efficiency. This paper presents an in-memory computing architecture for ANNs enabling artificial intelligence (AI) and machine learning (ML) applications. The proposed architecture uses a standard six-transistor (6T) static random access memory (SRAM) core to implement a multilayer perceptron. Our novel on-chip training and inference in-memory architecture reduces energy cost and enhances throughput by simultaneously accessing multiple rows of the SRAM array per pre-charge cycle, thereby eliminating frequent data accesses. The proposed architecture was trained and tested on the Iris dataset and was observed to consume ≈ 22.46× less energy per decision than earlier DIMA-based classifiers.
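As a rough software-level illustration of the workload described above (not the authors' circuit-level implementation), the sketch below trains a small multilayer perceptron on the Iris dataset with plain gradient descent. The layer sizes, learning rate, and epoch count are assumptions for illustration only; in the proposed architecture, the matrix-vector products in the forward pass would be evaluated inside the 6T SRAM array rather than in software.

```python
# Minimal MLP-on-Iris sketch (assumed hyperparameters; software model only).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Load and normalise the 4-feature, 3-class Iris dataset.
X, y = load_iris(return_X_y=True)
X = (X - X.mean(axis=0)) / X.std(axis=0)
Y = np.eye(3)[y]                                  # one-hot targets
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

# Network dimensions (illustrative): 4 inputs -> 8 hidden sigmoid units -> 3 softmax outputs.
W1 = rng.normal(0, 0.5, (4, 8));  b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 3));  b2 = np.zeros(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

lr = 0.1
for epoch in range(500):
    # Forward pass: each matrix-vector product corresponds to the kind of
    # multi-row read an in-memory architecture would perform per pre-charge cycle.
    H = sigmoid(X_tr @ W1 + b1)
    P = softmax(H @ W2 + b2)

    # Backward pass (cross-entropy loss with softmax output).
    dZ2 = (P - Y_tr) / len(X_tr)
    dW2 = H.T @ dZ2;               db2 = dZ2.sum(axis=0)
    dH  = dZ2 @ W2.T * H * (1 - H)
    dW1 = X_tr.T @ dH;             db1 = dH.sum(axis=0)

    # Weight update; on-chip training would write these back into the SRAM array.
    W1 -= lr * dW1;  b1 -= lr * db1
    W2 -= lr * dW2;  b2 -= lr * db2

# Inference on the held-out split.
acc = (softmax(sigmoid(X_te @ W1 + b1) @ W2 + b2).argmax(axis=1)
       == Y_te.argmax(axis=1)).mean()
print(f"test accuracy: {acc:.3f}")
```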





Author information


Corresponding author

Correspondence to Abhash Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kumar, A., Beeraka, S.M., Singh, J. et al. An On-Chip Trainable and Scalable In-Memory ANN Architecture for AI/ML Applications. Circuits Syst Signal Process 42, 2828–2851 (2023). https://doi.org/10.1007/s00034-022-02237-7

