Optimized register level transfer mechanism for ASIC-based memscomputing

Micro-electromechanical system (MEMS)-based semiconductor sensor systems typically need a reliable on-chip analog read-out circuit to identify and process changes in physical properties. To improve the mobility, robustness, reliability and performance (in terms of power consumption and speed) of the sensor system, the remaining parts of the post-processing logic, such as the digital data processor (or part of it), can also be integrated into the chip. Cost and integration are always important considerations in both analog and digital designs, but they become even more important here, because the data processing subsystem must be squeezed into a very small space near the structure of the MEMS sensor. The paper introduces a new Buffalo-based register-transfer level (BRTL) method that aims to improve the efficiency and reliability of the digital data processing system for the MEMS sensor post-processing algorithm. Initially, the memcomputing system was designed with the needed functional processing elements. Then, a novel Buffalo-based register-transfer level design is modelled in the MEMS architecture. The digital data transmission process is performed to estimate the reliability of the proposed BRTL model. Finally, the performance of the proposed model can be validated


Introduction
Memcomputing is a technology that provides noteworthy improvements in solving hard optimization problems compared with conventional algorithmic techniques [1]. It stands for computing in and with memory [2]. It is a computing paradigm in which the memory performs two tasks: storing and processing information [3]. Here, the MEMCOMP is a universal computer that can simulate both quantum computers and reservoir computers [4]. The universal memcomputing machine (UMM) is an ideal machine, provided with an infinitely long memory tape as a substitute for interconnected memory cells, that is capable of performing either analog or digital operations, and these operations are controlled by the control unit [5]. The responsibility of this control unit is to transmit the signals while the internal state of the memprocessors is organized [6]. UMMs are used to solve all the problems of computing in and with memory [7]. Non-volatile memory, as used in logic-in-memory and neuromorphic computing, has been introduced into the computational hierarchy [8]. Here, the arithmetic logic unit (ALU) and the memory are merged into a compact memory-ALU to achieve logic-in-memory [9]. This process focuses on MEMCOMP through Boolean logic functions, including NAND, material implication (IMP) and NOR [10]. The binary resistive-switching memristor can perform a two-state transition, and the logical values are consequently stored in the same functional unit [11]. Digital memcomputing machines (DMMs) are defined as non-linear dynamical systems designed to solve constraint satisfaction problems [12].
The 3-SAT formula was built in the seminal work of Bearden [13]; it applies conjunction, disjunction and negation operations to Boolean variables that take the values TRUE or FALSE. First, the Boolean variables are transformed into continuous variables. These variables can be realized as voltages on the terminals of self-organizing OR gates [14]. Such a gate can act on its terminals, pushing the voltages towards a configuration that satisfies the OR gate, regardless of whether the signal received by the gate originates from the conventional input or output [15]. The voltages are constrained to the range (-1, 1), and the Boolean values are recovered through thresholding [16]. Application-specific integrated circuit (ASIC) designs comprise processing elements (PEs) connected in various topologies and offer unmatched latency, energy efficiency and form factor [17]. The hardware design is then created as an ASIC with adjusted PEs and connections [18]. It has optimized logic gates and hardware implemented uniquely for its application, and it is designed, fabricated and, where needed, redesigned. The benefits of an ASIC include accurate computing, data reuse, local memory, higher memory bandwidth and MAC units that can be modified for the application [19]. In contrast, an ASIC provides high performance per watt but no reconfigurability, the tape-out process is slow, and the development cost is very high.
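The thresholding step described above can be illustrated with a minimal sketch. It assumes that a positive voltage maps to TRUE and a non-positive voltage to FALSE; this sign convention, the literal encoding and all function names are illustrative assumptions and are not taken from [13]-[16].

```python
from typing import Dict, List, Tuple

# A literal is (variable_name, is_negated); a clause is the OR of three literals.
Literal = Tuple[str, bool]
Clause = Tuple[Literal, Literal, Literal]


def threshold(voltages: Dict[str, float]) -> Dict[str, bool]:
    """Map continuous values in (-1, 1) to Booleans (assumed: positive -> TRUE)."""
    return {name: v > 0.0 for name, v in voltages.items()}


def clause_satisfied(clause: Clause, assignment: Dict[str, bool]) -> bool:
    """A clause holds if at least one of its literals evaluates to TRUE."""
    return any(assignment[name] != negated for name, negated in clause)


def formula_satisfied(formula: List[Clause], assignment: Dict[str, bool]) -> bool:
    """A 3-SAT formula is the conjunction (AND) of all its clauses."""
    return all(clause_satisfied(c, assignment) for c in formula)


# (x1 OR NOT x2 OR x3) AND (NOT x1 OR x2 OR x3)
formula = [
    (("x1", False), ("x2", True), ("x3", False)),
    (("x1", True), ("x2", False), ("x3", False)),
]
voltages = {"x1": 0.8, "x2": -0.3, "x3": 0.1}  # hypothetical terminal voltages
assignment = threshold(voltages)
satisfied = formula_satisfied(formula, assignment)
```

The sketch only checks a candidate assignment; it does not model the dynamics by which the self-organizing gates reach that assignment.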
Also, ASIC designs reduce power consumption, which makes them suitable for mobile and embedded systems. In hardware design, data plane functionality can be implemented in an ASIC [20]. This platform provides high performance because of dedicated hardware components such as ternary content-addressable memory (TCAM) chips for efficient packet matching. For application-specific hardware such as an ASIC, loading data from memory is considered a bottleneck. In several cases, the Winograd transformation results in less local data reuse; because of this, the design may not obtain the expected performance gain. The ASIC makes effective use of the available on-chip memory to increase data locality. In this study, the most effective ASIC-based MEMCOMP process is analyzed and designed. The key objective of the work is to design an optimized MEMCOMP for digital applications.
The key contributions of the present study are as follows:
- initially, the memcomputing system was designed with the needed functional processing elements;
- then, a novel Buffalo-based register-transfer level design (BRTLD) is modelled in the MEMS architecture;
- the digital data transmission process is performed to estimate the reliability of the proposed RTL model;
- finally, the chief metrics like delay, memory usage and power/energy utilization were measured and compared with other models.
The remainder of the paper is organized as follows. Related work is presented in the second section. The problems faced by the conventional method are set out in section three. The solution to the defined problem is elaborated in section four. The validated results for the novel solution are discussed in section five. Section six concludes the paper.

Related works
The memristive neuromorphic system consists of chips implemented with resistive switching memories integrated on CMOS hardware. The term neuromorphic has been used to describe analog, digital and mixed analog/digital designs. Such a network is essential for constructing neurons and artificial synapses that can imitate the complexity of their biological counterparts. Since this network is inherently volatile, binary and poorly scaled, devices based on novel physical principles are necessary to replicate the biological synapses and neurons in this network, consistent with high-density connectivity and complex processing [21].
One of the major problems is that no implementation using memristors in circuits was available in formal technologies. To overcome this issue, a Memristor Cellular Non-linear Network (M-CNN) was used to implement a digital realization of the CNN with memristors. In this network, time and accuracy can be traded off against each other. Examples based on this network cover diverse applications such as data compression, Fast Fourier Transform (FFT) computation, image processing algorithms, chaotic equation estimation and analog-to-digital conversion. Also, the energy consumption, circuit complexity and area overhead were decreased for a small number of bits. However, the time needed to perform an operation increases with the number of bits [22].
The primary network unit, a memristor-based oscillator circuit, is illustrated first. The oscillators are then coupled to form the network. Next, the network was modified to compensate for unbalanced connections from coupled node to coupled node and to introduce device-to-device variability, and the original array was extended. To implement the reconfigurable network, transistors are inserted in series with the coupling capacitors, which allows the connections between the cells to be switched on and off. For example, the cells are organised in a matrix configuration, and the respective control signals are selected through multiplexers by an ad-hoc address code. Reprogramming the connections among the oscillators through electrical signals allows various computing paradigms to be implemented [23].
Ant colony optimization (ACO) was implemented on a memristor crossbar array. Initially, a non-linear voltage-controlled memristor model with a relaxation term was introduced, and then the optimization process was performed with a padding strategy. Furthermore, the memristor crossbar array was designed with external control circuits for the ACO strategy, which provided highly parallel computing and high device density. The threshold for generating the edges was selected directly as the mean of the final conductance matrix when deploying the memcomputing design on the memristor crossbar array [24].
In-memory computing eliminates the movement of data between the physically separated computing and memory units of traditional computers. A quadratic eigenvalue problem is obtained from the transfer function of the circuit, and the minimum eigenvalue is the dominant pole that determines the circuit's response time. The size of the problem also affects the circuit response. The computational speed of the circuit is enhanced through parameter setting, and the parameters are operated synergistically to speed up the computation. These parameters also determine the computational error and the power consumed in the circuit. This circuit is crucial for developing in-memory machine learning (ML) accelerator applications in memcomputing designs [25].

System model and problem statement
Memcomputing machines operate in continuous time, so simulating them on modern computers requires the discretization of time. Physical time is continuous; discrete time is not a physical quantity. The UMM is Turing-complete, that is, it can simulate the universal Turing machine (UTM). UMMs comprise analog and digital machines. Although analog memcomputing machines have remarkable computational power, analog systems do not scale well, because a growing machine size requires growing resources to obtain the same accuracy. The DMM is scalable and focused on the dissertation process. Moreover, if the functional operations are not optimised, high power consumption and area usage are recorded.
The difficulties of the traditional method are described in Fig. 1. Here, the data processing algorithm may result in high power and area usage, which leads to poor outcomes. Thus, the standard method for register-level transfer is very difficult to apply. In addition, the methodology does not provide the desired outcome because of its high power consumption and area usage. To overcome the difficulties of the traditional method, an optimized memcomputing design for digital applications is developed.

Proposed methodology
The first step in the ASIC design is to turn the idea into a chip. The specification covers the goals, constraints, functionality, speed and power, and technology constraints such as size and shape. The next step of the design is the structural and functional description. The functionality should match the specification. Here, a novel Buffalo-based register-transfer level design (BRTLD) is developed to control the functional features of the ASIC design. The Buffalo best solution helps to keep the MEMS architecture in optimal condition by controlling the data overflow. The proposed architecture is shown in Fig. 2.
The proposed methodology develops a novel Buffalo-based register-transfer level design method aimed at designing the optimized MEMCOMP for digital applications, which is related to the MEMS sensor's post-processing algorithm. Here, Buffalo optimization provides the best solution in the proposed method.

Process of the proposed methodology
To achieve an effective combination of BRTL and rapid design steps, a novel description language known as AMDL (Algorithmic Microarchitecture Description Language) was developed along with a new pre-synthesis model. AMDL is a register-transfer level (RTL) language tailored for describing data paths that include algorithmic constructs, which improves design time and readability. The pre-synthesis algorithm transforms the AMDL model into a structural RTL model represented in VHDL, which can be synthesized into a higher-quality gate-level description using actual RTL tools, as shown in Fig. 3.
A well-designed language-based approach can provide a high-level design method and a general-purpose algorithmic language to increase the efficiency of hand-optimized BRTL designs. To ensure that high-level optimization is possible beyond the nature of the algorithm, it is important for the designer to have full visibility of the architecture. AMDL tasks contain right-value and left-value expressions that refer to the specific BRTL functional units and data storage elements, which are listed in the resource declaration section of the description. The digital data for the design are initialized using Eq. (1), where D(x) denotes the objective function variable. The binary data are initialized. The AMDL model does not include the accurate functionalities of all operators. The AMDL designers understand the desired behaviour of the design and create an AMDL description that captures this behaviour accordingly. In most cases, an operator is only a library unit, but if its functionality is more complex, VHDL code can be used to describe the design during the pre-synthesis phase.
Because the designer manually performs scheduling, binding and resource allocation, AMDL is not a programming language in itself but rather a way of writing code that follows the algorithmic principles of the BRTL model and describes a higher abstraction level within BRTL. The conversion of AMDL to BRTL is straightforward during the pre-synthesis stage owing to the detailed nature of the AMDL language. Functional verification is important to a design's success. For a design, there are often millions or billions of potential test cases. Functional verification can be done using Eq. (2).
Here, F denotes the functional verification variable, D(x) the objective function variable, T_s the true specification, F_s the false specification, and T_1 and T_2 test one and test two. This equation is used to verify that the specific model implements the specification correctly. The datapath is created from the task statements in the AMDL model. The AMDL designer creates the BRTL netlist by explicitly stating the connections between the logic elements and storage. The pre-synthesis algorithm identifies and resolves situations where multiple resources are competing for the same input by inserting multiplexers into the netlist as needed.
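The multiplexer insertion step can be sketched as follows. This is only an illustration of the general idea, not the paper's pre-synthesis algorithm: the connection format, the `mux_` naming scheme and the function name `insert_multiplexers` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Connection = Tuple[str, str]  # (source signal, destination input)


def insert_multiplexers(
    connections: List[Connection],
) -> Tuple[List[Connection], Dict[str, List[str]]]:
    """Give every destination input exactly one driver.

    Inputs driven by a single source are wired directly; inputs with several
    competing sources get a multiplexer placed in front of them.
    """
    drivers: Dict[str, List[str]] = defaultdict(list)
    for src, dst in connections:
        if src not in drivers[dst]:
            drivers[dst].append(src)

    netlist: List[Connection] = []
    muxes: Dict[str, List[str]] = {}
    for dst, srcs in drivers.items():
        if len(srcs) == 1:
            netlist.append((srcs[0], dst))
            continue
        mux = "mux_" + dst.replace(".", "_")  # hypothetical naming scheme
        muxes[mux] = srcs
        for i, src in enumerate(srcs):
            netlist.append((src, f"{mux}.in{i}"))
        netlist.append((f"{mux}.out", dst))
    return netlist, muxes


# Two registers compete for ALU input "a"; input "b" has a single driver.
connections = [("regA", "alu.a"), ("regB", "alu.b"), ("sum", "alu.a")]
netlist, muxes = insert_multiplexers(connections)
```

In this example only `alu.a` receives a multiplexer; the select signal of each multiplexer would be driven by the control unit, which the sketch leaves out.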
After the BRTL netlist has been generated, the model generator algorithm creates HDL models of the datapath resources and combines them into a single VHDL entity and architecture. If the AMDL model includes a design-specific operator, the datapath should be complemented by its VHDL model as well.
The control unit generation is based on specific mapping rules that determine the control states needed, and then the distinct AMDL control structures are implemented. AMDL provides additional control structures that improve the language's expression of concurrency beyond the fundamental elements of structured programming such as decisions, loops, statements and sequences. The logic equivalence check can be done using Eq. (3), where Eq is the logic equivalence checking variable, T_1 denotes true specification one, T_2 true specification two and r a random value. The logic equivalence check is a crucial step to perform after synthesis. It checks the equivalence between the original code and the synthesized netlist. The power is calculated using Eq. (4).
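The idea behind a logic equivalence check can be shown with a brute-force sketch that compares two Boolean functions over every input combination. This is a generic technique used here for illustration; it does not reproduce Eq. (3), and the example functions `rtl_model`, `netlist_model` and `buggy_netlist` are hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Tuple


def find_mismatch(
    reference: Callable[..., bool],
    implementation: Callable[..., bool],
    num_inputs: int,
) -> Optional[Tuple[bool, ...]]:
    """Return the first input vector on which the two functions differ, or None."""
    for bits in product((False, True), repeat=num_inputs):
        if reference(*bits) != implementation(*bits):
            return bits
    return None


def rtl_model(a: bool, b: bool, c: bool) -> bool:
    # Behaviour as written in the original description.
    return (a and b) or (a and c)


def netlist_model(a: bool, b: bool, c: bool) -> bool:
    # Logically equivalent form, as a synthesis tool might produce after factoring.
    return a and (b or c)


def buggy_netlist(a: bool, b: bool, c: bool) -> bool:
    # Deliberately wrong implementation, used to show a failing check.
    return a or (b and c)


equivalent = find_mismatch(rtl_model, netlist_model, 3) is None
counterexample = find_mismatch(rtl_model, buggy_netlist, 3)
```

Exhaustive enumeration grows as 2^n in the number of inputs, so it only suits small blocks; production equivalence checkers rely on symbolic methods instead.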
In Eq. (4), P_s denotes the switching power, P_C the short-circuit power and P_l the leakage power. Eq. (5) is used to check the power optimization:

if (power ≤ 0.5) then optimal. (5)

A power value of 0.5 or less means optimal. Otherwise, the flow moves to the gate-level netlist and removes the data path using the Buffalo optimization for power optimization.
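A minimal sketch of the power check is given below. It assumes that the total power is the plain sum of the switching, short-circuit and leakage components, which is the usual CMOS decomposition but is an assumption here because Eq. (4) is not reproduced in the text. The function names are illustrative, and the unit of the 0.5 threshold is not stated in the source.

```python
POWER_THRESHOLD = 0.5  # threshold from Eq. (5); unit not stated in the source


def total_power(p_switching: float, p_short_circuit: float, p_leakage: float) -> float:
    """Assumed form of Eq. (4): sum of the three power components."""
    return p_switching + p_short_circuit + p_leakage


def power_is_optimal(power: float, threshold: float = POWER_THRESHOLD) -> bool:
    """Eq. (5): the design counts as power-optimal when power <= threshold."""
    return power <= threshold


# Hypothetical component values, chosen only to exercise both outcomes.
power = total_power(0.25, 0.125, 0.0625)
needs_optimization = not power_is_optimal(power)
```

When `needs_optimization` is true, the flow described above would return to the gate-level netlist for another optimization pass.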

if (area ≤ 3.2) then optimal. (6)
Eq. (6) is used to check the optimization of the area, where 3.2 is the standard area usage value in BRTL. An area value of 3.2 or less means optimal. Otherwise, the flow moves to the gate level, reduces the gate size and minimizes the logic using the Buffalo optimization for area optimization. The control unit generation consists of two main steps. In the first step, the control state allocation process of the pre-synthesis maps the AMDL control structures to a VHDL implementation; here, the hierarchical structure of FSM fragments is built. The second step reduces the number of clock cycles by finding and using any opportunities where multiple control states can be executed at the same time. The generator of the BRTL model creates the entire implementation of the FSM from the optimized tree of control states.
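The second step, merging control states that can run in the same clock cycle, can be sketched with a simple greedy pass. The sketch assumes that two states may share a cycle only if neither reads or writes a register that the other writes; this conservative rule, the `ControlState` structure and the function names are illustrative assumptions, and contention for shared functional units is ignored.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ControlState:
    name: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]


def independent(a: ControlState, b: ControlState) -> bool:
    """True if the two states have no read/write or write/write conflict."""
    conflict = (a.writes & b.reads) or (a.reads & b.writes) or (a.writes & b.writes)
    return not conflict


def merge_states(states: List[ControlState]) -> List[List[ControlState]]:
    """Greedily pack consecutive, mutually independent states into one cycle."""
    cycles: List[List[ControlState]] = []
    for state in states:
        if cycles and all(independent(prev, state) for prev in cycles[-1]):
            cycles[-1].append(state)
        else:
            cycles.append([state])
    return cycles


states = [
    ControlState("add", frozenset({"A", "B"}), frozenset({"Sum"})),
    ControlState("sub", frozenset({"A", "B"}), frozenset({"Diff"})),
    ControlState("combine", frozenset({"Sum", "Diff"}), frozenset({"Out"})),
]
schedule = merge_states(states)
cycle_names = [[s.name for s in cycle] for cycle in schedule]
```

Here the three sequential states collapse into two cycles, because `add` and `sub` touch disjoint destination registers while `combine` depends on both results.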

The flow and step-by-step process of the novel Buffalo-based register-transfer level design for ASIC-based memcomputing are described in Fig. 4. After the described steps were processed, the experimental results of the novel BRTL design were evaluated using different measures.

Result and discussion
The novel BRTL design is validated in Matlab on the Windows 10 platform. The Buffalo-based register-transfer level design was implemented to control the functional features of the ASIC design. The Buffalo best solution helps to keep the MEMS architecture in an optimal condition by controlling the data overflow. First, the binary data are initialized.
The performance of the proposed model is validated in terms of power consumption, area, delay and memory usage. The specification of the execution parameters is given in Table 1.

Case study
ASIC-based memcomputing is a new way of computing that uses memory as a computational resource; here, the BRTL is implemented in the memcomputing system. A memristor is a type of memory that can both compute and store information, which makes it ideal for memcomputing applications. The combination of ASIC and memristor was successfully implemented in the memcomputing application using the BRTL design.

Algorithmic operation
The memcomputing device supports several arithmetic operations. The diagram shows how to perform arithmetic operations using a series of algorithms. The algorithms are stored in the memory of the memcomputing device, and they are executed one after the other to perform the desired operations. Figure 5 is a simplified representation of a memcomputing device, but it shows the basic principles of how memcomputing works. The memcomputing device has four registers: A, B, Sum and Diff. The registers are used to store the two integers being operated on, the result of the operation and the difference between the two integers. The device also has four algorithms: Addition, Subtraction, Multiplication and Division. The first step is to load the two integers into registers A and B. The second step is to call the addition algorithm. The addition algorithm adds the two integers together and stores the result in the register Sum. The third step is to display the result of the operation on the screen.
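The three steps above (load, add, display) can be mirrored in a small software model. This is a behavioural sketch only: the class name `MemcomputingDevice` and its methods are hypothetical, and the model says nothing about how the registers are realised in hardware.

```python
class MemcomputingDevice:
    """Toy model with the four registers named in the text: A, B, Sum and Diff."""

    def __init__(self) -> None:
        self.registers = {"A": 0, "B": 0, "Sum": 0, "Diff": 0}

    def load(self, a: int, b: int) -> None:
        # Step 1: load the two operands into registers A and B.
        self.registers["A"] = a
        self.registers["B"] = b

    def add(self) -> None:
        # Step 2: the addition algorithm stores its result in Sum.
        self.registers["Sum"] = self.registers["A"] + self.registers["B"]

    def subtract(self) -> None:
        # The subtraction algorithm stores the difference in Diff.
        self.registers["Diff"] = self.registers["A"] - self.registers["B"]

    def display(self, register: str) -> str:
        # Step 3: format a register value for output.
        return f"{register} = {self.registers[register]}"


device = MemcomputingDevice()
device.load(7, 5)
device.add()
device.subtract()
shown = device.display("Sum")
```

Multiplication and division would follow the same pattern with their own algorithms; they are omitted to keep the sketch short.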

Arithmetic operations
The memcomputing device has components such as registers, algorithms and a control unit. In memcomputing, the algorithms are stored in the memory of the device and are executed one after the other. The advantage of using it to perform arithmetic operations is that it is much faster than traditional methods. This is because the algorithms for the desired operations can be stored in memory and executed very quickly. Additionally, memcomputing devices are very energy efficient, so they can be used to perform arithmetic operations on battery-powered devices.
Figure 6 shows a simplified representation of a memcomputing device that can be used to perform arithmetic operations on two 4-bit unsigned integers. Here, the registers are all 4 bits wide, which means that they can store integers from 0 to 15. The algorithms are very simple, and they can be executed very quickly. The control unit is responsible for sequencing the execution of the algorithms. It makes sure that they run in the correct order and that the output of each algorithm is used as input for the next. The device in the diagram can perform any of the arithmetic operations, not just addition. The only difference is the algorithm called. For example, to perform subtraction, the subtraction algorithm is called; to perform multiplication, the multiplication algorithm; and to perform division, the division algorithm.
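The role of the control unit with 4-bit registers can be sketched as a dispatch table plus a sequencer. The sketch assumes that results wider than 4 bits wrap around (are masked to the register width); the source does not say how overflow is handled, so this is an assumption, as are the names `execute` and `run_sequence`.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1  # 0b1111: a 4-bit register holds 0..15

# The stored algorithms, selected by name by the control unit.
ALGORITHMS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a // b,
}


def execute(op: str, a: int, b: int) -> int:
    """Run one stored algorithm on two 4-bit operands (assumed wrap-around)."""
    if not (0 <= a <= MASK and 0 <= b <= MASK):
        raise ValueError("operands must fit in 4 bits")
    if op == "div" and b == 0:
        raise ZeroDivisionError("division by zero")
    return ALGORITHMS[op](a, b) & MASK


def run_sequence(ops: list, a: int, b: int) -> int:
    """Control unit: run algorithms in order, feeding each output to the next."""
    acc = a
    for op in ops:
        acc = execute(op, acc, b)
    return acc


in_range = execute("add", 9, 5)                # fits in 4 bits
wrapped = execute("add", 9, 8)                 # 17 does not fit, so it wraps
chained = run_sequence(["add", "mul"], 2, 3)   # (2 + 3) * 3
```

Changing the operation only changes the key looked up in `ALGORITHMS`, which matches the statement that the only difference between operations is the algorithm called.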

Performance analysis
The presented model is designed using the BRTL, implemented in Matlab and run on the Windows 10 platform. Metrics such as area, power consumption, memory usage and delay are computed to validate the performance of the proposed method. To validate the performance improvement, recent related models are taken for comparison. The existing models are the Spiking Neural Network (SNN) [26], Hardware Optimized and Error Reduced Approximate Adder (HOERAA) [27], Binary Neural Network (BNN) [28] and Residue Number System based Memory Loss Distributed Arithmetic (RNS-MLDA) [29].

Power consumption
The power consumption of an electronic device can be calculated as the power supplied to the device minus the power lost by the device. The amount of energy that the circuit uses while it is operating is known as power consumption. It is an essential factor in the design of electronic systems because it can have a significant impact on the overall performance of the system and the life of the battery.
The power consumption can be calculated using Eq. (4).
The existing models recorded power consumption of 46 µW (SNN), 28.54 µW (HOERAA), 59.75 µW (BNN) and 44.11 µW (RNS-MLDA). The proposed model recorded a power consumption of 24.1548 µW, which is less than the existing models. The power consumption comparison with the existing models is presented in Fig. 7.
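The relative reductions implied by these figures can be checked with a few lines of arithmetic. The sketch below uses only the power values quoted above; the helper name `relative_reduction` is illustrative.

```python
# Reported power consumption of the existing models, in microwatts.
reported_power_uw = {
    "SNN": 46.0,
    "HOERAA": 28.54,
    "BNN": 59.75,
    "RNS-MLDA": 44.11,
}
proposed_power_uw = 24.1548


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


reductions = {
    name: relative_reduction(value, proposed_power_uw)
    for name, value in reported_power_uw.items()
}
closest_competitor = min(reductions, key=reductions.get)
```

With these inputs the smallest margin is against HOERAA (about 15%) and the largest is against BNN (about 60%).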

Delay
Delay is the time that it takes for a signal to travel from one point to another in a circuit. It is an important factor in electronic system design because it has a significant impact on the overall performance of the system.
The delay can be calculated using Eq. (7), where D_t is the transistor delay, the time a signal takes to travel through a transistor, and D_w is the wire delay, the time the signal takes to travel through a wire.
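Eq. (7) itself is not reproduced in this text. A form consistent with the two definitions, assuming that the transistor and wire contributions simply add along a path, would be the following; the symbol D for the total delay is introduced here for illustration.

```latex
D = D_t + D_w
```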
The existing models recorded delays of 1.4 ns (RNS-MLDA), 0.98 ns (SNN), 0.97 ns (HOERAA) and 0.97 ns (BNN), whereas the proposed model recorded a delay of 0.812 ns. The comparison of delay with the existing models is presented in Fig. 8.

Area
The area metric is the amount of space that the circuit occupies on a chip. It is an essential factor in electronic system design because it can have a significant impact on the system cost.
The existing models recorded area usage of 418.58 µm² (HOERAA), 128 µm² (SNN), 54.32 µm² (RNS-MLDA) and 23.18 µm² (BNN). The proposed model recorded an area of 20.95 µm², which is lower than the existing methods. The comparison of the area with the existing methods is presented in Fig. 9.

Memory
Memory is a circuit that can store data permanently or temporarily. It is an important component in electronic system design because it can store data for various purposes. The memory usage can be calculated using Eq. (8).
Here, D_s is the size of the data, N_a the number of times the memory is accessed and T_a the time taken to access the memory.
The existing SNN model recorded a memory usage of 533 MB. The proposed model recorded a memory usage of 259 MB, which is less than the existing method. The comparison of memory usage with the existing method is presented in Figure 10 and Table 2.

Discussion
Overall, the presented method obtained better scores in all metrics: deviation time, area, power, frequency and cells. The overall performance of the proposed BRTL model is tabulated in Table 3.
The proposed model recorded a power consumption of 24.1548 µW, which is lower than that of the existing methods.

Conclusion
High-quality digital data processing RTL models in MEMS sensors are typically hard to develop and keep up-to-date, as they must lead to optimal gate-level implementations. A BRTL design and optimization techniques that are effective in reducing area and lowering power consumption can significantly reduce the effort and design time. The paper proposes a new way to model and generate BRTL hardware using the AMDL language. AMDL provides a high-level algorithmic interface for BRTL designers, which can lead to faster and more efficient designs. The proposed pre-synthesis steps for translating the AMDL model to BRTL have been shown to produce higher-quality output, which leads to efficient subsequent synthesis steps. Initially, the memcomputing system was designed with the needed functional processing elements. Then, a novel Buffalo-based register-transfer level design (BRTLD) is modelled in the MEMS architecture. The digital data transmission process is performed to estimate the reliability of the proposed BRTL model. The proposed model recorded a power consumption of 24.1548 µW, an improvement of 2% over the traditional model. Finally, the chief metrics like delay, memory usage and power/energy utilization were measured and compared with other models. In future work, a high-speed and area-efficient model will be designed on the basis of the proposed model.

Figure 1. Difficulties of the traditional method

Figure 4. The flow diagram of BRTL

Figure 9. Area comparison with the existing methods

Figure 8. Delay comparison with the existing methods

Table 1. Specification of execution parameters