IMAGIN: Library of IMPLY and MAGIC NOR-Based Approximate Adders for In-Memory Computing

In-memory computing (IMC) has attracted significant interest in recent years as it aims to alleviate the memory bottleneck of von Neumann architectures. IMC also improves the energy efficiency of these architectures. Another technique that has been explored to reduce energy consumption is the use of approximate circuits, targeted toward error-resilient applications. Addition is one of the most frequently used operations in these applications. In the literature, CMOS-based approximate adder libraries have been implemented to help designers choose from a variety of designs depending on the output quality requirements. However, the same is not true for memristor-based approximate adders targeted at IMC architectures. Hence, in this work, we developed a framework to generate approximate adder designs with varying output errors for the 8-, 12-, and 16-bit adders. We implemented a state-of-the-art scheduling algorithm to obtain the best mapping of these approximate adder designs for IMC. We performed an exhaustive design space exploration to obtain the pareto-optimal approximate adder designs for various design and error metrics. We then proposed IMAGIN, a library of approximate adders compatible with the memristor-based IMC architecture, which are based on the IMPLY and MAGIC design styles. We also performed mean filtering on the Kodak image dataset using the approximate adders from the IMAGIN library. IMAGIN can help designers select from a wide variety of approximate adders depending on the output quality requirements and can serve as a benchmark for future research in this direction. All pareto-optimal designs will be made available at https://github.com/agra-uni-bremen/JxCDC2022-imagin-add.

Since the first fabrication of a memristor was reported by HP Labs [1], extensive research efforts have focused on the fabrication, modeling, design, and synthesis of memristor-based systems. A memristor is a two-terminal nonvolatile passive circuit element capable of changing its resistive state in response to a voltage applied across its terminals. The device can be set to a low resistance state (logic 1) or a high resistance state (logic 0) depending on the magnitude and polarity of the applied voltage [2]. Memristors are typically fabricated in a crossbar structure. Popular logic design styles using memristors include material implication (referred to as IMPLY) and memristor-aided logic (referred to as MAGIC) [3], [4]. While digital logic implementations using these design styles have been explored [5], approximate designs using memristors have received less attention [6], [7], [8].
Approximate circuits enable a plethora of opportunities for designers to reduce energy, area, and delay [9], [10], [11]. Approximate adders using CMOS have been extensively explored [12], [13], [14]. While approximate adders using memristor crossbars have been studied using analog computation [15], [16], the same is not true for digital computation. In [6], only one approximate adder design based on memristor ratioed logic has been shown. There is a need for approximate adder designs using memristors that can provide a wide range of output qualities, which in turn can cater to the needs of various applications [17]. Hence, in this work, we propose a library of approximate adders that are suitable for in-memory computing (IMC) using a memristor crossbar array. These adders are optimized for varying design metrics such as energy, memristor count, and runtime. The contributions of our work are as follows.
1) We developed a framework to perform an exhaustive design space exploration to identify pareto-optimal approximate adder designs with varying output errors.
2) We implemented the state-of-the-art mapping algorithm to map these adders to memristor crossbars for both the MAGIC NOR and IMPLY design styles.
3) We evaluated four design metrics (serial runtime, parallel runtime, memristor count, and energy) and three error metrics (mean absolute error, mean squared error, and mean squared log error).
4) To the best of our knowledge, IMAGIN is the first work that proposes a library of approximate adders based on the IMPLY and MAGIC design styles suitable for memristive crossbars. The library contains approximate adder designs with varying design and error metrics.
5) We performed mean filtering using the proposed approximate adders in the pareto-optimal design set.
6) All the pareto-optimal adder designs in IMAGIN will be made available at https://github.com/agra-uni-bremen/JxCDC2022-imagin-add.
The rest of the article is organized as follows. In Section II, we discuss the necessary background. In Section III, we discuss the IMAGIN framework for performing the design space exploration. In Section IV, we discuss the pareto-optimal designs obtained using IMAGIN. In Section V, we perform mean filtering using the approximate adder designs, and in Section VI, we conclude the article.

II. BACKGROUND
There are several design styles to implement logic functions using memristors [18], [19], [20], [21], [22], [23]. In this work, we have focused on the IMPLY and MAGIC design styles, which are widely used for implementing logic functions using memristors [24], [25]. The ON and OFF resistance values for the memristors are assumed to be 1 and 300 kΩ, respectively. We discuss these design styles and their mapping schemes in detail in Sections II-A to II-C.

A. IMPLY
The memristive IMPLY gate is based on the material implication function described in [3]. It consists of two memristors and a resistor, as shown in Fig. 1(a). Two different voltages, V_cond and V_set, are required to perform logic operations using the IMPLY gate. The result of the computation is stored in M_in2. The IMPLY gate, along with the FALSE operation (M → 0), can be used to implement any function. In this work, the NOT gate is realized using IMPLY(A, 0) = NOT(A). The working of the IMPLY gate for different input cases is described below.
1) Case 1. M_in1 = 0 and M_in2 = 0: V_cond, which is less than the threshold voltage, is applied to M_in1. Since M_in1 is at logic 0, this voltage does not appear at the common terminal. V_set, however, is greater than the threshold voltage and changes the logic state of M_in2 to 1.
2) Case 2. Other Input Combinations: In all the other cases, the memristor M_in2 retains its logic state.
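The two cases above amount to the truth table of material implication. The following is a minimal behavioral sketch in Python (the function names are illustrative and not part of the IMAGIN framework); it models only logic states, not voltages or device dynamics.

```python
def imply(m_in1: int, m_in2: int) -> int:
    """Return the new logic state of M_in2 after an IMPLY operation.

    Behavioral model only: the result is (NOT m_in1) OR m_in2.
    """
    return (1 - m_in1) | m_in2


def imply_not(a: int) -> int:
    """NOT(A) realized as IMPLY(A, 0), i.e., M_in2 is cleared by FALSE first."""
    return imply(a, 0)


if __name__ == "__main__":
    # Case 1 (0, 0) switches M_in2 to 1; every other row leaves M_in2 unchanged.
    for p in (0, 1):
        for q in (0, 1):
            print(f"M_in1={p} M_in2={q} -> M_in2'={imply(p, q)}")
```

Only the (0, 0) row changes the stored value, which matches Case 1; in the remaining rows the output equals the previous state of M_in2, which matches Case 2.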

B. MAGIC
The MAGIC design style was proposed in [4]. In the MAGIC design style, there is a dedicated memristor for the output (M_out). The MAGIC NOR gate is shown in Fig. 1(b). In this work, the NOT gate is realized using NOR(A, 0) = NOT(A). M_out is set to logic 1 before evaluation. The working of the MAGIC NOR gate for different input cases is described below.
1) Case 1. M_in1 = 0 and M_in2 = 0: When V_in is applied to the input memristors, the voltage across M_out will be close to 0. Hence, M_out remains at logic 1.
2) Case 2. Other Input Combinations: When V_in is applied to the input memristors, the voltage across the output memristor will be close to V_in, as one or both of the input memristors are at logic 1. Since V_in is greater than the threshold voltage, M_out changes from logic 1 to logic 0.
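The MAGIC NOR behavior described above can be summarized with a small behavioral model. This is a sketch in Python with illustrative function names; it captures only the logic states (including the initialization of M_out to 1), not the electrical behavior.

```python
def magic_nor(m_in1: int, m_in2: int) -> int:
    """Return the final state of M_out for a MAGIC NOR evaluation."""
    m_out = 1  # M_out is initialized to logic 1 before evaluation
    if m_in1 or m_in2:
        # At least one input is at logic 1, so M_out switches to logic 0.
        m_out = 0
    return m_out


def magic_not(a: int) -> int:
    """NOT(A) realized as NOR(A, 0)."""
    return magic_nor(a, 0)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(f"M_in1={a} M_in2={b} -> M_out={magic_nor(a, b)}")
```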

C. MAPPING SCHEMES
In Fig. 2(a) and (b), we show the serial mapping of three independent IMPLY operations (A→B; C→D; E→F) and three independent MAGIC NOR operations (N1 = A NOR B; N2 = C NOR D; and N3 = E NOR F), respectively. In a serial mapping scheme, the gates are mapped to a single row of the crossbar. For initialization, a reset voltage is first applied to all the memristors, and then a set voltage is applied to the memristors that have the logic 1 value. The gates are then evaluated sequentially. In MAGIC, an isolation voltage is applied to the memristors not used for computation to preserve their logic state, while in IMPLY they are disconnected [4], [24]. In Fig. 3(a) and (b), we show the parallel mapping for IMPLY and MAGIC NOR, respectively, for the same functions used in the serial mapping. In this mapping scheme, the gates are mapped to different rows of the crossbar. For initialization, a reset voltage is applied to all the memristors. Then the memristors that have the logic 1 value are initialized in a columnwise manner for the first and second columns. In MAGIC, the additional third (output) column also needs to be initialized to logic 1. The gates are then evaluated in parallel by supplying the appropriate operating voltages as discussed in Sections II-A and II-B.

III. IMAGIN FRAMEWORK
In this section, we discuss the framework developed to perform the design space exploration. The overall IMAGIN framework is shown in Fig. 4. We will discuss each of the blocks in the IMAGIN framework in detail in the next sections.

A. APPROXIMATE ADDERS GENERATION
A full adder has three inputs, so each of its outputs, Sum and Carry, can be any of the 2^8 = 256 possible Boolean functions of three variables. We generated all the possible functions for Sum and Carry. Since Sum and Carry can each be implemented in 256 ways, the total number of combinations available for the 1-bit full adder is 65 536 (one of these combinations is the exact full adder). We designed the higher bit-width approximate adders from the 1-bit approximate adders by replacing the required number of exact adders with approximate adders [13], [26]. We implemented the approximate adder designs for the 8-, 12-, and 16-bit adders in Verilog as shown in Fig. 4. We approximated the adders at a granularity of 2 bits. For the 8-bit adder, we have designs with 2-, 4-, and 6-bit approximations. For the 12- and 16-bit adders, we have designs from 2-bit approximation up to 10- and 14-bit approximation, respectively. For each combination of the n-bit adders with m-bit approximation, we have 65 536 designs.
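The enumeration described above can be sketched in a few lines of Python. This is an illustrative model, not the Verilog generator used in the framework. It assumes a ripple-carry structure and that the m approximated positions are the least significant bits; neither assumption is stated explicitly in the text. All names (`make_cell`, `ripple_add`, and so on) are hypothetical.

```python
def make_cell(sum_tt: int, carry_tt: int):
    """Build a 1-bit cell from two 8-bit truth tables.

    Bit k of each table is the output for input row k = 4*a + 2*b + cin.
    """
    def cell(a: int, b: int, cin: int):
        k = (a << 2) | (b << 1) | cin
        return (sum_tt >> k) & 1, (carry_tt >> k) & 1
    return cell


def exact_tables():
    """Truth tables of the exact full adder (Sum = a^b^cin, Carry = majority)."""
    s = c = 0
    for k in range(8):
        a, b, cin = (k >> 2) & 1, (k >> 1) & 1, k & 1
        s |= (a ^ b ^ cin) << k
        c |= ((a & b) | (cin & (a ^ b))) << k
    return s, c


EXACT_SUM, EXACT_CARRY = exact_tables()

# All 256 x 256 = 65 536 candidate 1-bit cells, as (sum_tt, carry_tt) pairs.
ALL_CELLS = [(s, c) for s in range(256) for c in range(256)]


def ripple_add(x: int, y: int, n: int, m: int, approx_cell) -> int:
    """n-bit ripple-carry addition with the m LSB positions using approx_cell.

    ASSUMPTION: approximation is applied to the least significant bits.
    Returns an (n+1)-bit result including the carry out.
    """
    exact = make_cell(EXACT_SUM, EXACT_CARRY)
    carry, result = 0, 0
    for i in range(n):
        cell = approx_cell if i < m else exact
        s, carry = cell((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << n)


if __name__ == "__main__":
    print(len(ALL_CELLS))                      # number of 1-bit candidates
    zero_cell = make_cell(0, 0)                # Sum and Carry tied to 0
    print(ripple_add(200, 100, 8, 2, zero_cell), 200 + 100)
```

Under these assumptions, each (n, m) pair yields one design per entry of `ALL_CELLS`, which is where the count of 65 536 designs per configuration comes from.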

B. APPROXIMATE ADDERS SYNTHESIS
The overall flow of the synthesis framework is shown in Fig. 4. The Verilog designs need to be converted to print factor (.pf) files for further processing. To achieve this, we synthesized the Verilog-based approximate adders using the Yosys tool to generate Berkeley Logic Interchange Format (.blif) files. The ABC tool is then used to generate .pf files from the .blif files. In addition to the .blif files, a library file is provided to the ABC tool for synthesis. The library file used for generating the IMPLY designs contains the imply function, i.e., out = (in1' + in2), and the one for MAGIC contains the NOR function, i.e., out = (in1 + in2)'. In addition, a NOT gate, the constants 0 and 1, and a buffer gate are present in the library files for correct synthesis. Some of the .pf files generated by the ABC tool contain outputs assigned to constant 0, constant 1, or a buffer when the output bits are tied to 0, 1, or one of the inputs, respectively. This arises as a result of the introduction of approximation. Since these outputs do not require any computation, we remove them using a parser to obtain the final .pf files of the approximate adder designs.

The overall mapping and energy estimation framework is shown in Fig. 4. We now discuss the mapping algorithm for the IMPLY and MAGIC NOR-based synthesis in the crossbar array. We have used these two memristor design styles because they are the most widely used stateful logic styles for implementing digital IMC using memristor crossbars [3], [4], [24], [25]; the framework can also be extended to other logic styles.

1) MAPPING ALGORITHM
We topologically sorted the .pf netlist and evaluated it in a levelwise manner on the crossbar array. After evaluating a particular level i, the output is stored in buffers, which requires one additional step. The same crossbar can then be reused for level i + 1. Hence, the crossbar size depends on the level having the maximum number of gates. In [27], three scheduling techniques were used, viz., as soon as possible (ASAP), as late as possible (ALAP), and resource constraints (RCs). In this work, we implemented the ASAP, ALAP, and LIST scheduling techniques. The LIST scheduling technique is based on the RC algorithm [28], as shown in Algorithm 1. The objective of LIST scheduling is to maintain a uniform number of gates across all the levels. In this article, we show the results of the LIST scheduling technique, as it gives the minimum number of memristors when compared with ASAP and ALAP. While only the MAGIC NOR-based design style was analyzed in [27], we implemented LIST scheduling for both the MAGIC NOR-based and IMPLY-based design styles.

Algorithm 1 (steps 7 to 11)
7: for each gate G in N_sort do
8: if (Number of gates < MAX_gate) then
9: Assign G to earliest possible level
10: if (Schedule Not Possible) then
11: MAX_gate++
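The idea behind the LIST scheduling steps can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the netlist representation (`preds`), the function names, and in particular the interpretation of "Schedule Not Possible" as "the schedule exceeds a given bound on the number of levels" are assumptions made for the example.

```python
from collections import defaultdict, deque


def topo_order(preds):
    """Kahn's algorithm. preds maps each gate to its predecessor gates
    (primary inputs are simply omitted from the lists)."""
    indeg = {g: len(p) for g, p in preds.items()}
    succs = defaultdict(list)
    for g, ps in preds.items():
        for p in ps:
            succs[p].append(g)
    ready = deque(g for g, d in indeg.items() if d == 0)
    order = []
    while ready:
        g = ready.popleft()
        order.append(g)
        for s in succs[g]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return order


def try_schedule(preds, order, max_gates, max_levels):
    """Place each gate at the earliest level that follows all of its
    predecessors and still holds fewer than max_gates gates."""
    level, load = {}, defaultdict(int)
    for g in order:
        lvl = 1 + max((level[p] for p in preds[g]), default=-1)
        while load[lvl] >= max_gates:
            lvl += 1
        if lvl >= max_levels:      # ASSUMED meaning of "Schedule Not Possible"
            return None
        level[g] = lvl
        load[lvl] += 1
    return level


def list_schedule(preds, max_levels):
    """Increase the per-level gate budget until a schedule fits."""
    order = topo_order(preds)
    for max_gates in range(1, len(order) + 1):
        level = try_schedule(preds, order, max_gates, max_levels)
        if level is not None:
            return level, max_gates
    raise ValueError("max_levels is smaller than the netlist depth")


if __name__ == "__main__":
    # Hypothetical five-gate netlist: g4 = f(g1, g2), g5 = f(g3, g4).
    netlist = {"g1": [], "g2": [], "g3": [], "g4": ["g1", "g2"], "g5": ["g3", "g4"]}
    print(list_schedule(netlist, max_levels=3))
```

In the hypothetical netlist, an ASAP schedule would place three gates in the first level, whereas the sketch finds a three-level schedule with at most two gates per level. Since the crossbar size depends on the widest level, a flatter schedule of this kind needs fewer memristors.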

2) ENERGY ESTIMATION
To estimate the energy of the memristor-based designs, we have used the VTEAM model [29]. The simulation was carried out in Cadence Virtuoso. Table 1 shows the parameters used in the simulations. These parameter values provide a good fit for the physical devices as reported in [30]. The average energy consumption of the IMPLY gate and the MAGIC NOR gate is 1930.78 and 55.04 fJ, respectively, and the energy for each input combination is shown in Table 2. We have considered the MAGIC operating voltage V_in to be 1 V, and for IMPLY, we have considered V_cond = 1 V, V_set = 2 V, and R = 1.5 kΩ. The energy consumption of the MAGIC NOR gate is lower than that of the IMPLY gate due to the additional resistor used in IMPLY [4]. The switching times used for IMPLY and MAGIC are 1.8 and 1.3 ns, respectively. In this work, we use the number of cycles (operations) as the metric for time. Since all the designs can be decomposed into these operations as discussed earlier, we multiplied the number of operations by the corresponding energy values to obtain the overall energy of the designs, as also done in [27]. In addition, the SET and RESET energies are 75.56 and 17.66 fJ for SET and RESET voltages of 2 and 1 V, respectively. To maintain consistency with prior works, we report the operation energy in this work [25], [27], [31].
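As a rough illustration of the operation-energy estimate, the following Python sketch multiplies an operation count by the average per-gate energy quoted above. Using the averages rather than the per-input values of Table 2 is a simplification made for this example, and the operation count of 50 is hypothetical.

```python
# Average per-operation energies quoted in the text, in femtojoules.
AVG_ENERGY_FJ = {"IMPLY": 1930.78, "MAGIC": 55.04}


def operation_energy_pj(num_ops: int, style: str) -> float:
    """Operation energy in picojoules: (number of operations) x (energy per operation).

    SIMPLIFICATION: uses the average gate energy for every operation.
    """
    return num_ops * AVG_ENERGY_FJ[style] / 1000.0


if __name__ == "__main__":
    for style in ("IMPLY", "MAGIC"):
        print(style, operation_energy_pj(50, style), "pJ")
```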

C. TESTBENCH GENERATION AND FUNCTIONAL SIMULATION
A testbench is generated for every design so that it can be evaluated for various inputs. For approximate circuits, the output error is an important metric, and we evaluate it using the testbench. The testbench generation is shown in Fig. 4. We used Icarus Verilog to perform the simulation and generate the outputs of the approximate circuits. We have used all the possible input combinations for the 8-bit adder. For the 12- and 16-bit adders, we have used 10 000 input combinations sampled from a uniform distribution generated using the Python NumPy library. We have used three different error metrics for our evaluation: mean squared error (MSE), mean absolute error (MAE), and mean squared log error (MSLE). The output trace obtained using Icarus Verilog is fed to a Python script, which uses the scikit-learn library to obtain the error metrics. The framework for functional simulation is shown in Fig. 4.

VOLUME 8, NO. 2, DECEMBER 2022
FIGURE 5. Pareto-optimal designs with respect to various metrics for the 8-bit adders using IMPLY.
FIGURE 6. Pareto-optimal designs with respect to various metrics for the 12-bit adders using IMPLY.
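For reference, the three error metrics can be written out directly. The sketch below uses the standard definitions in pure Python, with MSLE computed on log(1 + y), which is the convention that scikit-learn's `mean_squared_log_error` follows. The sample values are hypothetical and are not taken from the paper's experiments.

```python
import math


def mse(exact, approx):
    """Mean squared error between exact and approximate outputs."""
    return sum((e - a) ** 2 for e, a in zip(exact, approx)) / len(exact)


def mae(exact, approx):
    """Mean absolute error between exact and approximate outputs."""
    return sum(abs(e - a) for e, a in zip(exact, approx)) / len(exact)


def msle(exact, approx):
    """Mean squared log error, using log(1 + y) for non-negative outputs."""
    return sum(
        (math.log1p(e) - math.log1p(a)) ** 2 for e, a in zip(exact, approx)
    ) / len(exact)


if __name__ == "__main__":
    exact_out = [10, 20, 30]     # hypothetical exact adder outputs
    approx_out = [10, 18, 33]    # hypothetical approximate adder outputs
    print(mse(exact_out, approx_out))
    print(mae(exact_out, approx_out))
    print(msle(exact_out, approx_out))
```

MSE penalizes large deviations more strongly than MAE, while MSLE measures relative rather than absolute deviation, which is why the three metrics can rank the same designs differently.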

IV. PARETO-OPTIMAL DESIGNS FOR APPROXIMATE ADDERS
In this section, we discuss the results obtained using the IMPLY and MAGIC design styles. We have performed the design space exploration for the 8-, 12-, and 16-bit adders.
While there are a number of works on approximate adder designs, we have compared our work against EvoApproxLib [14], the state-of-the-art, open-source approximate adder library. In Figs. 5 to 10, the gray circles represent the entire design space of the IMAGIN library, the red triangles are the pareto-optimal IMAGIN designs obtained by exploring the entire design space, and the blue circles are the EvoApproxLib designs.

A. DISCUSSION ON PARETO-OPTIMAL DESIGNS
We show the pareto-optimal results obtained for IMPLY for the 8-, 12-, and 16-bit adders in Figs. 5-7, respectively. We also show the pareto-optimal results obtained for MAGIC for the 8-, 12-, and 16-bit adders in Figs. 8-10, respectively. There are some cases where an EvoApproxLib design is better than IMAGIN for particular design and error values. Overall, however, IMAGIN gives a better pareto-optimal front than EvoApproxLib across the design and error metrics for both the IMPLY and MAGIC design styles. We have summarized the trends and the number of designs in Table 3. The range column denotes the range of values for time serial (TS), time parallel (TP), memristor count (MC), and operation energy (EN). For the IMPLY-based 8-bit approximate adders, the TS, TP, MC, and EN range from 40 to 160, 30 to 110, 3 to 12, and 20 to 180 pJ, respectively. For the MAGIC-based 8-bit approximate adders, the TS, TP, MC, and EN range from 40 to 160, 40 to 120, 4 to 16, and 1 to 5 pJ, respectively. The ranges for the 12- and 16-bit IMPLY- and MAGIC-based approximate adder designs are also shown in Table 3.
The columns of Table 3 for MSE, MAE, and MSLE show the number of pareto-optimal designs for every combination of error and design metrics. For the IMPLY-based 8-bit approximate adder with error metric MSE and design metric TS, there are 99 pareto-optimal approximate adder designs. For the MAGIC-based 8-bit approximate adder with error metric MSE and design metric TS, there are 87 pareto-optimal approximate adder designs. Table 3 also shows the number of pareto-optimal designs for the other bit-width, error, and design metric combinations.
Overall, we see that for every bit width, error, and design metric, we have several pareto-optimal designs with different values. Depending on the design and error tolerance requirements, the designer can select a particular adder design from the available choices of designs in the IMAGIN library.

B. MULTIOBJECTIVE OPTIMIZATION
We have also performed multiobjective analysis where three parameters are optimized simultaneously to obtain the pareto-optimal approximate adder designs. For all the designs, the first metric is considered as the error metric.  The second and third metrics for the four variants of the runs are: 1) time serial and energy (TS_EN); 2) time parallel and energy (TP_EN); 3) time serial and memristor count (TS_MC); and 4) time parallel and memristor count (TP_MC). The number of pareto-optimal designs in each case is shown in Table 4.
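Selecting pareto-optimal designs, whether for two metrics as in Section IV-A or three as here, reduces to discarding every design that is dominated by another. The following Python sketch shows a straightforward quadratic-time filter, assuming all metrics are to be minimized. The function names and the sample (error, time, energy) tuples are hypothetical and are not values from Table 3 or Table 4.

```python
def dominates(p, q):
    """True if p is no worse than q in every metric and strictly better in one.

    All metrics are assumed to be minimized.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points):
    """Keep the points that no other point dominates (O(n^2) comparison)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical designs as (error, time serial, energy in pJ).
    designs = [
        (0.0, 160, 5.0),
        (1.5, 120, 4.0),
        (1.5, 130, 4.5),   # dominated by (1.5, 120, 4.0)
        (4.0, 80, 2.5),
        (5.0, 90, 3.0),    # dominated by (4.0, 80, 2.5)
    ]
    for d in pareto_front(designs):
        print(d)
```

The same filter works for any number of metrics, so the two-metric fronts and the three-metric variants (TS_EN, TP_EN, TS_MC, TP_MC) differ only in which tuple entries are supplied.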
Overall, using our framework, we have generated a library of approximate adders. Depending on the design style, energy, memristor count, runtime, and error metric, the designer can select a particular adder or a set of adders. We also see that the proposed library is better than the state-of-the-art library EvoApproxLib [14]. IMAGIN will enable further research and help designers build larger systems.

V. CASE STUDY: MEAN FILTERING
In this work, we have used mean filtering as an application for evaluating the approximate adder designs. We have used the Kodak image dataset for performing the evaluations [32]. The dataset consists of 25 images which were converted into grayscale before applying the mean filter. As nine neighboring pixels need to be added and then averaged, we implemented an adder tree to perform the addition as shown in Fig. 11.
The images obtained after performing mean filtering with the approximate adders are compared against the images where mean filtering is done using exact addition. The output quality metrics used are peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). The results for the 8-bit and 12-bit adders are shown in Fig. 12. For the 8-bit addition, we scaled the pixel values down by a factor of 9 before the addition to prevent overflow. We see that a wide range of PSNR and SSIM values is obtained using the pareto-optimal designs. Since the error resilience of applications can vary widely, different adders with varying design and error metrics are necessary for different applications [11], [14], [17]. IMAGIN helps the designer select adder designs from the library depending on the requirements of the application, as the design and error metrics of the pareto-optimal designs are known.
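The case study can be mimicked at small scale with a pluggable adder. The sketch below is a pure-Python illustration under several assumptions: the adder tree is a simple pairwise reduction (the exact tree of Fig. 11 is not reproduced here), border pixels are copied unchanged, the sum is divided by 9 after the addition (as in the wider adder case), and `drop_lsb_add` is a hypothetical approximate adder that is not part of the IMAGIN library.

```python
import math


def adder_tree_sum(values, add):
    """Reduce a list of values pairwise using the supplied two-input adder."""
    vals = list(values)
    while len(vals) > 1:
        nxt = [add(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])   # odd element is passed to the next stage
        vals = nxt
    return vals[0]


def mean_filter(img, add):
    """3x3 mean filter on interior pixels of a grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]   # ASSUMPTION: borders are left unchanged
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = adder_tree_sum(window, add) // 9
    return out


def psnr(ref, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    n = len(ref) * len(ref[0])
    err = sum((a - b) ** 2 for ra, rb in zip(ref, test) for a, b in zip(ra, rb)) / n
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


def exact_add(x, y):
    return x + y


def drop_lsb_add(x, y):
    """HYPOTHETICAL approximate adder: ignores the LSB of both operands."""
    return ((x >> 1) + (y >> 1)) << 1


if __name__ == "__main__":
    image = [[(7 * r + 13 * c) % 256 for c in range(6)] for r in range(6)]
    reference = mean_filter(image, exact_add)
    approximate = mean_filter(image, drop_lsb_add)
    print(psnr(reference, approximate))
```

Swapping `drop_lsb_add` for a model of any library adder gives the PSNR of that design relative to exact filtering, which is the comparison the case study reports.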

VI. CONCLUSION
In this work, we performed an exhaustive design space exploration to find the pareto-optimal approximate adder designs using the IMPLY- and MAGIC-based design styles. We implemented the state-of-the-art mapping algorithm for both design styles to obtain the best metrics for IMC based on memristors. We evaluated the approximate adders for four design metrics and three error metrics. We also performed multiobjective pareto-optimal analysis to obtain approximate adder designs. We then proposed IMAGIN, a library of pareto-optimal approximate adder designs that are suitable for IMC using memristors. Depending on the requirements of the application, the adder designs can be selected from the pareto-optimal set of the IMAGIN library. These adder designs can act as benchmarks for future research in this direction and will be made open source.