Proposal of a Takagi-Sugeno Fuzzy-PI Controller Hardware

This work proposes dedicated hardware for an intelligent control system on a Field Programmable Gate Array (FPGA). The intelligent system is represented as a Takagi-Sugeno Fuzzy-PI controller. The implementation uses a fully parallel strategy associated with a hybrid bit format scheme (fixed-point and floating-point). Two hardware designs are proposed: the first uses a single clock cycle processing architecture, and the second uses a pipeline scheme. The bit accuracy was tested by simulation with a nonlinear control system of a robotic manipulator. The area, throughput, and dynamic power consumption of the implemented hardware are used to validate and compare the results of this proposal. The results indicate that the proposed hardware can be used in several applications with high-throughput, low-power and ultra-low-latency restrictions, such as teleoperation of robot manipulators, tactile internet, industrial automation in Industry 4.0, and others.


Introduction
Systems based on Fuzzy Logic (FL) have been used in many industrial and commercial applications such as robotics, automation, control and classification problems. Unlike high data volume systems, such as Big Data and Mining of Massive Datasets (MMD) [1,2,3], one of the great advantages of Fuzzy Logic is its ability to work with incomplete or inaccurate information.
The interest in the development of dedicated hardware implementing Fuzzy Systems has increased due to the demand for high-throughput, low-power and ultra-low-latency control systems for emerging applications such as those in [30]. The proposal introduced in [31] adopted a semi-parallel implementation, which decreased the throughput and increased the power consumption. Other Mamdani implementations following the same strategy are found in [32,33,34,35].
A multivariate Takagi-Sugeno fuzzy controller on FPGA is proposed in [5]. The hardware is applied to the temperature and humidity control of a chicken incubator, and it was designed for two inputs, six rules and three outputs. Compared to other works, the hardware proposed in [5] achieved a low throughput of about 6 Mflips. A hardware accelerator architecture for a Takagi-Sugeno fuzzy controller is proposed in [7]; it achieved a throughput of about 1.56 Msps with three inputs, two outputs and 24 bits.
In [11,12,13], a design methodology for the rapid development of fuzzy controllers on FPGAs was developed. For the case with two inputs, 35 rules and one output (vehicle parking problem), the proposed hardware achieved a maximum clock of about 66.251 MHz with 10 bits. However, the TS-FIM takes 10 clock cycles to complete the inference step, which decreases the throughput and increases the power consumption.
The implementation presented in [14] creates a fuzzy logic controller hardware scheme on FPGA for maximum power point tracking in photovoltaic systems. The implementation takes 6 clock cycles at 10 MHz, which is equivalent to a throughput of about $10\,\text{MHz}/6 \approx 1.67$ Msps. In [16], a Mamdani fuzzy logic controller on FPGA was proposed. The hardware achieves a throughput of about 25 Mflips with two inputs and 49 rules.
The work presented in [17] implements a semi-parallel digital fuzzy logic controller on FPGA. It achieved about 16 Msps at a clock frequency of 200 MHz, that is, 0.08 Msps/MHz. In contrast, this manuscript uses a fully parallel approach and achieves 1 Msps/MHz; in other words, it executes more operations per clock cycle. In the same direction, the proposals presented in [18,20] show semi-parallel fuzzy control hardware with low throughput, about 1 Msps.
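The normalized throughput figures quoted above can be reproduced with a few lines of Python. This is only an arithmetic sketch: the function names are illustrative, and the 1 MHz clock used for the fully parallel case is a placeholder chosen to show the one-sample-per-cycle ratio.

```python
def throughput_msps(clock_mhz: float, cycles_per_sample: int) -> float:
    """Throughput in Msps when each sample needs `cycles_per_sample` clocks."""
    return clock_mhz / cycles_per_sample


def msps_per_mhz(rate_msps: float, clock_mhz: float) -> float:
    """Samples produced per clock cycle (Msps divided by MHz)."""
    return rate_msps / clock_mhz


mppt_rate = throughput_msps(10.0, 6)          # figures quoted for [14]
semi_parallel = msps_per_mhz(16.0, 200.0)     # figures quoted for [17]
fully_parallel = msps_per_mhz(throughput_msps(1.0, 1), 1.0)  # one sample per clock

print(round(mppt_rate, 2), semi_parallel, fully_parallel)
```

A ratio of 1 Msps/MHz simply means that a new output is produced on every clock cycle, which is what a fully parallel datapath provides.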
Thus, this manuscript proposes a hardware architecture for the Fuzzy-PI control system. Unlike the works presented in the literature, the strategy proposed here uses a fully parallel scheme associated with a hybrid bit format (fixed-point and floating-point). Compared with other implementations in the literature, the scheme proposed here showed significant gains in processing speed (throughput) and dynamic power savings. Figure 1 shows the Fuzzy-PI intelligent control system operating a generic plant [4,42,43]. The plant output variable $y(t)$ is called the controlled variable (or controlled signal), and it can represent several kinds of physical measurements such as level, angular velocity, linear velocity, angle, and others, depending on the plant characteristics. The controlled variable, $y(t)$, passes through a sensor that converts the physical measure into a proportional electrical signal, which is discretized with a sample time, $t_s$, generating the signal $y(n)$.

Takagi-Sugeno Fuzzy-PI Controller
Figure 1: Architecture of the Fuzzy-PI feedback control system operating a generic plant.
The plant determines the kind of sensor that will be used. For level control in tanks used in industrial automation, the sensor can be a pressure sensor. For robotics applications (manipulators or mobile robotics), the sensor can be a position sensor (capturing angle information) or an encoder (capturing angular or linear velocity information).
At the n-th time instant, the Fuzzy-PI controller (see Figure 1) uses the signal $y(n)$ to calculate the error signal, $e(n)$, and the difference of error, $e_d(n)$. The signal $e(n)$ is expressed by
$$e(n) = y_{sp}(n) - y(n), \tag{1}$$
where $y_{sp}(n)$ is the reference signal, also called the set point variable, and the signal $e_d(n)$ by
$$e_d(n) = e(n) - e(n-1). \tag{2}$$
After computing the signals $e(n)$ and $e_d(n)$, the Fuzzy-PI controller generates the signals $x_0(n)$ and $x_1(n)$ (Equations 3 and 4) by scaling them with the gains Kp and Ki, which represent the proportional gain and the integration gain, respectively [4,42,43].
Subsequently, the signals $x_0(n)$ and $x_1(n)$ are sent to the Takagi-Sugeno fuzzy inference, called in this article the Takagi-Sugeno Fuzzy Inference Machine (TS-FIM) (see Figure 1).
The TS-FIM is formed by three stages called fuzzification, operation of the rules (or rules evaluation) and defuzzification (or output function) [4]. In the fuzzification, each i-th input signal $x_i(n)$ is applied to a set of $F_i$ membership functions whose output can be expressed as
$$f_{i,j}(n) = \mu_{i,j}(x_i(n)), \tag{5}$$
where $\mu_{i,j}(\cdot)$ is the j-th membership function of the i-th input and $f_{i,j}(n)$ is the output of the fuzzification step associated with the j-th membership function and the i-th input at the n-th time instant. For two inputs, $x_0(n)$ and $x_1(n)$, the TS-FIM generates a set of $F_0 + F_1$ fuzzy signals ($f_{0,j}$ and $f_{1,j}$), and these signals are processed by a set of $F_0 F_1$ rules in the operation (or evaluation) phase. Each g-th rule is expressed by Equation 6, where $g = F_0\,l + k$ for $(l, k) = (0, 0), (0, 1), \ldots, (F_0 - 1, F_1 - 1)$. Finally, the output (defuzzification) of the TS-FIM is given by Equation 7, where $A_g$, $B_g$ and $C_g$ are parameters defined during the design [4]. Thus, at every n-th instant the TS-FIM receives $x_0(n)$ and $x_1(n)$ as input and generates $v_d(n)$ as output, that is,
$$v_d(n) = \text{TSFIM}(x_0(n), x_1(n)), \tag{8}$$
where $\text{TSFIM}(\cdot)$ is a function that represents the TS-FIM.
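The three TS-FIM stages can be illustrated with a short floating-point sketch in Python. Several choices here are assumptions made only for illustration and are not taken from the hardware description: evenly spaced triangular membership functions, the min operator as the rule firing strength, a weighted-average defuzzification, and all function names. The rule index follows $g = F_0 l + k$ with $F_0 = F_1$.

```python
from itertools import product


def tri(x, left, center, right):
    """Triangular membership function with peak 1 at `center`."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)


def make_group(count, lo=-1.0, hi=1.0):
    """Evenly spaced triangles covering [lo, hi] (an illustrative layout)."""
    step = (hi - lo) / (count - 1)
    return [(lo + (j - 1) * step, lo + j * step, lo + (j + 1) * step)
            for j in range(count)]


def ts_fim(x0, x1, group0, group1, consequents, t_norm=min):
    """One Takagi-Sugeno inference step returning v_d(n)."""
    f0 = [tri(x0, *p) for p in group0]          # fuzzification of x0
    f1 = [tri(x1, *p) for p in group1]          # fuzzification of x1
    num = den = 0.0
    for l, k in product(range(len(group0)), range(len(group1))):
        g = len(group0) * l + k                 # rule index (F0 == F1 here)
        w = t_norm(f0[l], f1[k])                # firing strength (assumed min)
        a_g, b_g, c_g = consequents[g]
        num += w * (a_g * x0 + b_g * x1 + c_g)  # weighted rule consequent
        den += w
    return num / den if den else 0.0


group0 = make_group(7)
group1 = make_group(7)
consequents = [(0.5, 0.5, 0.0)] * 49            # hypothetical A_g, B_g, C_g
v_d = ts_fim(0.2, -0.4, group0, group1, consequents)
```

With identical consequents in every rule, the weighted average reduces to $0.5\,x_0 + 0.5\,x_1$, which gives a quick sanity check of the inference path.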
After the TS-FIM processing, the Fuzzy-PI controller integrates the signal $v_d(n)$, generating the signal $v(n)$ (see Figure 1). This signal is the output of the Fuzzy-PI controller, and it can be expressed as
$$v(n) = v(n-1) + v_d(n). \tag{9}$$
The signal $v(n)$ is saturated between $v_{min}$ and $v_{max}$, generating the signal $r(n)$, which is expressed as
$$r(n) = \begin{cases} v_{max}, & v(n) > v_{max} \\ v_{min}, & v(n) < v_{min} \\ v(n), & \text{otherwise.} \end{cases} \tag{10}$$
Finally, the signal $r(n)$ is sent to an actuator, which transforms the discrete signal into a continuous signal, $r(t)$, to be applied to the plant.
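A minimal Python sketch of one controller iteration follows, covering the error computation, the gain scaling, the inference call, the integration and the saturation. It rests on assumptions that should be checked against the original equations: the error is taken as set point minus measurement, Kp is assumed to scale $e_d(n)$ and Ki to scale $e(n)$, the accumulator is left unsaturated, and the class name and the linear stand-in used for the inference are hypothetical.

```python
class FuzzyPI:
    """Floating-point model of one Fuzzy-PI loop iteration (illustrative)."""

    def __init__(self, kp, ki, inference, v_min=-1.0, v_max=1.0):
        self.kp, self.ki = kp, ki
        self.inference = inference      # callable (x0, x1) -> v_d
        self.v_min, self.v_max = v_min, v_max
        self.e_prev = 0.0               # e(n-1)
        self.v = 0.0                    # accumulator state v(n-1)

    def step(self, y_sp, y):
        e = y_sp - y                    # error (assumed sign convention)
        e_d = e - self.e_prev           # difference of error
        self.e_prev = e
        x0 = self.kp * e_d              # assumed pairing of gains and signals
        x1 = self.ki * e
        v_d = self.inference(x0, x1)    # TS-FIM output
        self.v += v_d                   # integration of v_d(n)
        return min(max(self.v, self.v_min), self.v_max)  # saturation -> r(n)


# Hypothetical stand-in for the TS-FIM, used only to exercise the loop.
controller = FuzzyPI(kp=1.0, ki=1.0, inference=lambda a, b: a + b)
outputs = [controller.step(1.0, y) for y in (0.0, 1.0, 2.0)]
```

Passing the inference as a callable keeps the loop independent of how the TS-FIM itself is realized.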

Input Processing Module (IPM)
The IPM (shown in Figure 3) is responsible for processing the control signal generated by the plant at the input of the Fuzzy-PI controller. The IPM computes Equations 1, 2, 3 and 4. The signals associated with this module were implemented with $M$ bits, of which one is reserved for the sign and $N$ for the fractional part. The value of $M$ depends on $y_{max}$, which represents the maximum value, in modulus, of the process variable, $y(n)$.
[...] In other words, there is a delay of four samples between the n-th output and the n-th input. The TS-FIMM-OS will have a longer sample time than the TS-FIMM-P because its critical path is also longer; however, the TS-FIMM-OS does not have a delay. It is important to emphasize that a delay inside the feedback control loop can lead a system to instability. The degree of instability depends on the characteristics of the system and on the size of the delay [44]. On the other hand, the pipeline scheme associated with the TS-FIMM-P has a short sample time (short critical path), and this permits a high throughput compared to the TS-FIMM-OS.
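The effect of a four-sample pipeline delay can be visualized with a small Python model. This is a behavioral sketch only: the function name and the zero fill value are assumptions, and the per-sample computation is an arbitrary placeholder.

```python
from collections import deque


def run_pipelined(samples, func, stages=4, fill=0.0):
    """Apply `func` to each sample, delaying every result by `stages` samples."""
    regs = deque([fill] * stages, maxlen=stages)   # pipeline registers
    out = []
    for x in samples:
        out.append(regs[0])      # the oldest result leaves the pipeline
        regs.append(func(x))     # the newest result enters, the oldest is dropped
    return out


delayed = run_pipelined([1, 2, 3, 4, 5, 6], lambda x: 2 * x)
print(delayed)
```

The first four outputs carry only the fill value; the result for the first input appears at the fifth sample. A single-cycle design corresponds to `stages=0` conceptually, with no delay but a longer critical path.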

Membership Function Module (MFM)
In the MFM, each i-th input variable is associated with a module that collects $F_i$ membership functions, called here the Membership Function Group (MFG). Figure 6 shows the i-th MFG, called MFG-i. Each MFG-i collects $F_i$ membership functions (see Figure 6) called MF-ij, and each module MF-ij implements the j-th membership function associated with the i-th input, $\mu_{i,j}(x_i(n))$. At every n-th time instant, all $\sum_i F_i$ membership functions are executed in parallel, and each MF-ij outputs an unsigned $N$-bit signal (type u) without an integer part, called $f_{i,j}[uN.N](n)$ (see Figure 6). The Fuzzy-PI controller proposed here uses $F_0 + F_1$ membership functions.
Each j-th membership function associated with the i-th input was implemented directly in hardware as one of three shapes: the trapezoidal function on the right, $\mu^{RT}_{i,j}(\cdot)$, the trapezoidal function on the left, and the triangular function. The constants of the j-th membership function associated with the i-th input use $W$ bits in the integer part and $T$ bits in the fractional part, and the values of $W$ and $T$ set the resolution of the membership functions. In the implementation proposed in this work, the value of $W$ is always expressed as $W = 2 \times T + 1$. Non-linear membership functions can be supported by applying Lookup Tables (LUTs) in the implementation.
Although this implementation uses only two inputs ($x_0[sV.N](n)$ and $x_1[sV.N](n)$) and seven membership functions for each input, it can be easily extended to more inputs and functions, since the entire implementation is performed in parallel.
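The three membership shapes and the unsigned fractional output format can be sketched in Python as follows. The breakpoint parameters (`a`, `m`, `b`), the direction conventions of the two trapezoids, and the truncating quantizer are assumptions made for illustration; the hardware expressions themselves are not reproduced.

```python
def right_trapezoid(x, a, b):
    """0 below a, linear ramp on [a, b], 1 above b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def left_trapezoid(x, a, b):
    """1 below a, linear ramp down on [a, b], 0 above b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def triangle(x, a, m, b):
    """Triangle with peak at m, built as the minimum of the two ramps."""
    return min(right_trapezoid(x, a, m), left_trapezoid(x, m, b))


def to_unsigned_frac(value, n_bits):
    """Truncate a degree in [0, 1] to N fractional bits with no integer part."""
    max_code = (1 << n_bits) - 1                    # largest N-bit code
    code = min(int(value * (1 << n_bits)), max_code)
    return code / (1 << n_bits)


degree = to_unsigned_frac(triangle(0.25, -1.0, 0.0, 1.0), 8)
```

Because the format has no integer part, a full membership degree of 1 saturates to the largest representable code, $(2^N - 1)/2^N$.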

Operation Module (OM)
The $F_0 + F_1$ outputs from the MFM are passed to the OM, which performs all operations related to the $F_0 F_1$ rules, as described in Equation 6 in Section 3.

Output Function Module (OFM)
The OFM, illustrated in Figure 9, generates the TS-FIMM output variable during the step called defuzzification. This step essentially corresponds to the implementation of Equation 7 presented in Section 3. The blocks called NM and DM perform the numerator and denominator operations presented in Equation 7, respectively.
The NM, illustrated in Figure 10, is composed of $F_0 F_1$ hardware components called WM-g and an adder tree structure. Each g-th WM-g, detailed in Figure 11, is a parallel hardware implementation of the variable $a_g$ presented in Equation 7. The $F_0 F_1$ WM hardware components are also implemented in parallel. The adder tree structure, illustrated in Figure 10, has a depth expressed as $\lceil \log_2(F_0 F_1) \rceil$; thus, the output signal $a(n)$ (see Equation 7) can be represented as $a[sP.N](n)$. The DM, presented in Figure 12, is characterized by an adder tree structure with the same depth.
Since the TS-FIMM inputs and the values of $A_g$, $B_g$ and $C_g$ are between −1 and 1, it can be guaranteed, from Equation 7, that the output, $v_d[sV.N](n)$, remains normalized between −1 and 1. Thus, one can use the same input resolution, that is, $N$ for the fractional part and $V = N + 1$ for the integer part, as shown in Figure 9.
Figure 10: Hardware architecture of the NM.
Figure 12: Hardware architecture of the DM.
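The adder tree depth can be checked with a small Python model that sums values pairwise, level by level, as a tree of two-input adders would. The function name is illustrative, and an odd element at any level is assumed to pass through to the next level unchanged.

```python
import math


def adder_tree(values):
    """Sum `values` pairwise per level; return (total, number of levels)."""
    level = list(values)
    if not level:
        raise ValueError("adder_tree needs at least one value")
    depth = 0
    while len(level) > 1:
        # Each adder of this level combines two neighbouring operands.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])       # odd operand is forwarded unchanged
        level = nxt
        depth += 1
    return level[0], depth


total, depth = adder_tree(range(49))    # 49 rule outputs for F0 = F1 = 7
print(total, depth, math.ceil(math.log2(49)))
```

For 49 operands the tree has six levels, so the sum is available after six adder delays instead of the 48 that a sequential accumulation would need.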

Integration Module (IM)
The IM, shown in Figure 13, implements Equation 9 presented in Section 3. This module is the last step of the Fuzzy-PI hardware, and it is composed of an accumulator with saturation.
Figure 13: Hardware architecture of the IM.

Synthesis Results
The synthesis results were obtained for the Fuzzy-PI controller (see Figure 2) and also for the specific modules TS-FIMM-OS (see Figure 4) and TS-FIMM-P (see Figure 5). The separate synthesis of the TS-FIMM allows the analysis of the fuzzy inference core within the complete hardware proposal. All synthesis results used a Xilinx Virtex 6 xc6vlx240t-1ff1156 FPGA, which has 301,440 registers, 150,720 logic cells to be used as LUTs, and 768 multipliers.
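Occupancy percentages relative to these device totals can be computed as in the Python sketch below. The resource totals are the ones stated above; the usage counts in the example are hypothetical placeholders, not synthesis results.

```python
# Device totals stated for the Virtex 6 xc6vlx240t-1ff1156.
VIRTEX6_RESOURCES = {"registers": 301_440, "luts": 150_720, "multipliers": 768}


def occupancy_percent(used):
    """Percentage of each FPGA resource consumed by a design."""
    return {name: 100.0 * used[name] / total
            for name, total in VIRTEX6_RESOURCES.items()}


# Hypothetical usage counts, chosen only to exercise the function.
example = occupancy_percent({"registers": 1_500, "luts": 9_000, "multipliers": 48})
print(example)
```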

Synthesis Results -TS-FIMM Hardware
Tables 1 and 2 present the synthesis results related to hardware occupancy and the maximum throughput [...]. The synthesis results show that the hardware proposal for the TS-FIMM takes up a small hardware space: less than 1% of the registers, PR, and less than 7% of the LUTs, PLUT, of the FPGA (see Tables 1 and 2). [...] An important point to be analyzed, still in relation to the synthesis, is the linear behavior of the hardware consumption in relation to the number of bits, unlike the work presented in [45]. This is important, since it makes possible the use of systems with higher resolution.
The throughput values, $R_s$, were significant: about 11.5 Msps for the TS-FIMM-OS and about 17 Msps for the TS-FIMM-P. These values enable its application in problems with a large volume of data to process, as presented in [30], or in problems with fast control requirements such as tactile internet applications [21,22]. It is also observed that the throughput has a linear behavior as a function of the number of bits.
The TS-FIMM-P has a speedup of about $1.47\times$ ($17\,\text{Msps} / 11.5\,\text{Msps}$) with respect to the TS-FIMM-OS. This speedup was driven by the critical path reduction obtained with the pipeline scheme. However, the pipeline scheme in the TS-FIMM-P used about $3.4\times$ more registers (NR) than the TS-FIMM-OS.
[...] with $R^2 = 0.9766$. For the throughput in Msps, a plane, $f_{R_s}(N, T)$, was found with $R^2 = 0.7521$.
[...] with $R^2 = 0.9838$. For the throughput in Msps, a plane, $f_{R_s}(N, T)$, was found with $R^2 = 0.5366$.
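A plane over $(N, T)$ and its $R^2$ can be obtained by ordinary least squares, as in the pure-Python sketch below. The plane form $c_0 + c_1 N + c_2 T$, the function name and the sample data are assumptions for illustration; the data points are synthetic and are not the synthesis results of the tables.

```python
def fit_plane(points):
    """Least-squares fit of z = c0 + c1*n + c2*t over (n, t, z) points.

    Returns (coefficients, r_squared). The z values must not all be equal.
    """
    rows = [(1.0, float(n), float(t)) for n, t, _ in points]
    z = [float(v) for _, _, v in points]
    # Normal equations (X^T X) c = X^T z as an augmented 3x4 matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           + [sum(r[i] * zi for r, zi in zip(rows, z))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(3):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    coeffs = [aug[i][3] / aug[i][i] for i in range(3)]
    pred = [sum(c * x for c, x in zip(coeffs, r)) for r in rows]
    mean = sum(z) / len(z)
    ss_res = sum((a - b) ** 2 for a, b in zip(z, pred))
    ss_tot = sum((a - mean) ** 2 for a in z)
    return coeffs, 1.0 - ss_res / ss_tot


# Synthetic points lying exactly on a plane, so R^2 should be close to 1.
synthetic = [(n, t, 2.0 + 3.0 * n - 0.5 * t)
             for n in (8, 10, 12, 14) for t in (4, 6, 8)]
coeffs, r2 = fit_plane(synthetic)
```

An $R^2$ well below 1, as reported for the throughput planes, indicates that a plane explains only part of the variation and that predictions for untested bit widths should be treated as rough estimates.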

Synthesis Results -Fuzzy-PI Controller Hardware
Tables 3 and 4 present the synthesis results related to hardware occupancy and throughput, $R_s$, for the Fuzzy-PI controller hardware (see Figure 2). The results are presented for several values of $N$ and $T = 10$.
The synthesis results in Tables 3 and 4 show that the proposed implementation requires a small fraction of the hardware space: less than 1% of the registers, PR, and less than 8% of the LUTs, PLUT, of the FPGA.
In addition, the number of embedded multipliers, PNMULT, remained below 7%. This occupation enables the use of several Fuzzy-PI controllers in parallel on the same FPGA, allowing various control systems to run in parallel in industrial applications. The small implementation size also allows use in low-cost, low-power IoT and M2M applications. Regarding throughput, [...] application in several problems with a large data volume to process, as presented in [30], or in problems with fast control requirements such as tactile internet applications [21].

Validation Results -TS-FIMM Hardware
Figures 18 and 19 show the mapping between the inputs ($x_0(n)$ and $x_1(n)$) and the output $v_d(n)$ for the proposed hardware and for a reference implementation with the Fuzzy Matlab Toolbox (License number 1080073) [46], respectively. The Matlab implementation, shown in Figure 19, uses a 64-bit floating-point format (double precision), while Figure 18 presents the mapping generated by the proposed hardware synthesized with a lower resolution ($N = 8$, $V = 9$ and $T = 4$). These figures provide a qualitative view of the proposed implementation, in which the obtained results are quite similar to those expected.
Table 5 shows the mean square error (MSE) between the Fuzzy Matlab Toolbox and the proposed hardware implementation for several values of $N$ and $T$. For the experiment, the MSE was calculated over $Z$ tested points, where $Z = 10000$ points spread evenly within the limits of the input values (−1 and 1). Figures 18 and 19 were generated with these points.
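The MSE experiment can be mimicked in Python by comparing a double-precision mapping with a quantized version of it over a 100 × 100 grid (10000 points) in $[-1, 1]$. Everything in this sketch is a stand-in: the reference mapping is a hypothetical linear function rather than the actual fuzzy surface, and the hardware is modeled only by rounding inputs and output to a given number of fractional bits.

```python
def quantize(value, frac_bits):
    """Round to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def reference(x0, x1):
    """Hypothetical stand-in for the double-precision reference mapping."""
    return 0.5 * x0 + 0.5 * x1


def hardware_model(frac_bits):
    """Return a mapping that quantizes the inputs and the output."""
    def mapping(x0, x1):
        q0, q1 = quantize(x0, frac_bits), quantize(x1, frac_bits)
        return quantize(reference(q0, q1), frac_bits)
    return mapping


def mse(ref, cand, points):
    """Mean square error between two mappings over the tested points."""
    return sum((ref(a, b) - cand(a, b)) ** 2 for a, b in points) / len(points)


SIDE = 100  # 100 x 100 = 10000 evenly spread points
grid = [(-1 + 2 * i / (SIDE - 1), -1 + 2 * j / (SIDE - 1))
        for i in range(SIDE) for j in range(SIDE)]

for bits in (8, 12, 16):
    print(bits, mse(reference, hardware_model(bits), grid))
```

In this simplified model the error shrinks as the number of fractional bits grows, which is the qualitative trend a table of MSE versus $N$ is meant to expose.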

Validation Results -Fuzzy-PI Controller Hardware
In order to validate the results of the Fuzzy-PI controller in hardware, bit-precision simulation tests were performed with a non-linear dynamic system characterized by a robotic manipulator system called the Phantom Omni [47,48,49,50]. The Phantom Omni is a 6-DOF (Degree Of Freedom) manipulator with rotational joints. The first three joints are actuated, while the last three joints are non-actuated. As illustrated in Figure 20, the device can be modeled as a 3-DOF robotic manipulator with two segments, $L_1$ and $L_2$. The segments are interconnected by three rotary joints with angles $\theta_1$, $\theta_2$ and $\theta_3$. The Phantom Omni has been widely used in the literature, as presented in [47,48,49]. Simulations used $L_1 = 0.135$ mm, $L_2 = L_1$, $L_3 = 0.025$ mm and $L_4 = L_1 + A$, where $A = 0.035$ mm, as described in [49].
The non-linear, second-order, ordinary differential equation used to describe the dynamics of the Phantom Omni can be expressed as
$$M(\theta(t))\,\ddot{\theta}(t) + C(\theta(t), \dot{\theta}(t))\,\dot{\theta}(t) + g(\theta(t)) + f(\dot{\theta}(t)) = \tau,$$
where $\theta(t)$ is the vector of joints expressed as $\theta(t) = [\theta_1(t)\ \theta_2(t)\ \theta_3(t)]^T$, $\tau$ is the vector of acting torques expressed as $\tau = [\tau_1\ \tau_2\ \tau_3]^T$, $M(\theta(t)) \in \mathbb{R}^{3 \times 3}$ is the inertia matrix, $C(\theta(t), \dot{\theta}(t)) \in \mathbb{R}^{3 \times 3}$ is the Coriolis and centrifugal forces matrix, $g(\theta(t)) \in \mathbb{R}^{3 \times 1}$ represents the gravity force acting on the joints, and $f(\dot{\theta}(t))$ is the friction force on the joints [47,48,49,50].
[...] $\theta_3(n)$, respectively. The simulation trajectory lasted 10 seconds, and the set points changed every 2 seconds. Table 6 shows the angle trajectory changes for the set point variables $\theta^{sp}_1(n)$, $\theta^{sp}_2(n)$ and $\theta^{sp}_3(n)$. Simulations used $t_s = 1 \times 10^{-5}$, Kp = 2000 and Ki = 0.1 for each i-th Fuzzy-PI-i hardware.
In the results presented in Figures 22, 23 and 24, it is possible to observe that the controller followed the plant reference in all cases. The results also showed that the Takagi-Sugeno Fuzzy-PI hardware proposal follows the reference even for a small number of bits, that is, a low resolution.
Table 7 [...] In addition, Table 7 [...]
In the work presented in [11], the results were obtained for several cases, and for the one with two inputs, 35 rules and one output (vehicle parking problem) the proposed hardware achieved a maximum clock of about 66.251 MHz [...]
The proposals presented in [18,20] show hardware that can achieve about 1 Msps. The work presented in [18] uses two inputs, 25 rules, one output and 8 bits, and the design presented in [20] uses three inputs, 42 rules and one output. The speedups in Msps for the TS-FIMM-OS, TS-FIMM-P, Fuzzy-PI-OS and Fuzzy-PI-P are equal to the values previously calculated for [5]. The speedups in Mflips are about $49/25 \approx 1.96\times$ and $49/42 \approx 1.17\times$ the speedups in Msps for works [18] and [20], respectively. Finally, the hardware proposed in [7] achieved a throughput of about 1.56 Msps with three inputs, two outputs and 24 bits. The speedups in Msps for the TS-FIMM-OS, TS-FIMM-P, Fuzzy-PI-OS and Fuzzy-PI-P are 11 [...] The work in [7] does not use linguistic fuzzy rules, so its throughput in Mflips cannot be calculated.
Table 8 shows a comparison regarding the hardware occupation between the hardware proposed in this work and the other literature works presented in Table 7. The second, third, fourth and fifth columns show the [...] Table 7. The ratio of the hardware occupation can be expressed as [...]

Power consumption comparison
Table 9 shows the dynamic power saving. The dynamic power is proportional to $N_g \times F_{clk} \times V_{DD}^2$, where $N_g$ is the number of elements (or gates), $F_{clk}$ is the maximum clock frequency and $V_{DD}$ is the supply voltage. The frequency dependence is more severe than Equation 31 suggests, given that the frequency at which a CMOS circuit can operate is approximately proportional to the voltage [41]. Thus, the dynamic power can be expressed as proportional to $N_g \times F_{clk}^3$. For all comparisons, the number of elements, $N_g$, was calculated as [...] Based on Equation 30, the dynamic power saving can be expressed as [...]
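The consequence of the frequency-voltage proportionality can be illustrated numerically in Python. The sketch assumes that dynamic power scales with $N_g \times F_{clk}^3$ and that the saving is the ratio between the reference design and the design under evaluation; both function names and the example figures are hypothetical.

```python
def dynamic_power(n_elements, f_clk_mhz):
    """Relative dynamic power, assuming P is proportional to N_g * F_clk**3."""
    return n_elements * f_clk_mhz ** 3


def power_saving(n_ref, f_ref_mhz, n_work, f_work_mhz):
    """Ratio of reference power to this design's power (assumed definition)."""
    return dynamic_power(n_ref, f_ref_mhz) / dynamic_power(n_work, f_work_mhz)


# Hypothetical case: five times more elements, ten times lower clock.
saving = power_saving(n_ref=1_000, f_ref_mhz=100.0, n_work=5_000, f_work_mhz=10.0)
print(saving)
```

Under this cubic model a design may use several times more elements and still save power, provided that it reaches the required throughput at a much lower clock frequency.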

Analysis of the comparison
The results presented in Tables 7 and 9 demonstrate that the fully parallel strategy adopted here can achieve significant speedups and power consumption reductions. On the other hand, the fully parallel scheme can increase the hardware consumption (see Table 8).
The mean value of the speedup was about $10.89\times$ in Msps and $30.89\times$ in Mflips (see Table 7), and these results are significant for big data and MMD applications [1,2,3]. High-throughput fuzzy controllers are also important for high-speed control systems such as tactile internet applications [21,22].
The proposal in this manuscript has a higher LC resource utilization than the literature proposals (Table 8). The mean value regarding NLC utilization was about $6.89\times$; in other words, the fuzzy hardware scheme proposed here used $6.89\times$ more LCs than the literature proposals. In the case of multipliers (NMULT), the mean value of the additional hardware was about $17.69\times$. Despite these large relative values, Tables 1, 2, 3 and 4 show that the fuzzy hardware proposals in this work use no more than 7% of the FPGA resources.
Another important aspect is the block RAM resource utilization (NBitsM). The fully parallel computing scheme proposed here does not spend clock cycles accessing information in block RAM, and this can increase the throughput and decrease the power consumption (see references [5], [11] and [16] in Tables 7, 8 and 9).
The fully parallel design allows many operations to be executed per clock period, which reduces the operating clock frequency and increases the throughput. Due to the non-linear relationship with the operating clock frequency (see Equation 30), this strategy permits a considerable reduction of the dynamic power consumption (see Table 9). The results presented in Table 9 show that the power saving can reach values from 4 to $10^6$ times; these results are quite significant and enable the use of the hardware proposed here in several IoT applications.

Conclusions
This work aimed to develop dedicated hardware for a Takagi-Sugeno fuzzy inference machine applied to a Fuzzy-PI controller. The developed hardware used a fully parallel implementation with fixed-point and floating-point representations in distinct parts of the proposed scheme. All details of the implementation were presented, as well as results for synthesis and bit-precision simulations. The synthesis was performed for several bit size resolutions and showed that the proposed hardware is viable and can be used in applications with critical processing time requirements. From the synthesis data, curves were generated to predict hardware consumption and throughput for untested bit values, in order to characterize the proposed hardware. In addition, comparison results concerning throughput, hardware occupation, and power saving with other literature proposals were presented.