FPGA Implementation of an Improved Reconfigurable FSMIM Architecture Using Logarithmic Barrier Function Based Gradient Descent Approach

Reconfigurable FSMs have recently drawn researchers' attention for multistage signal processing applications. The optimal synthesis of the Reconfigurable finite state machine with input multiplexing (Reconfigurable FSMIM) architecture is performed by the iterative greedy heuristic based Hungarian algorithm (IGHA). The major limitation of IGHA is that it does not integrate a state encoding technique. This paper proposes integrating IGHA with state assignment using a logarithmic barrier function based gradient descent approach to reduce the hardware consumption of the Reconfigurable FSMIM. Experiments performed on MCNC FSM benchmarks demonstrate significant area and speed improvements over other architectures in field programmable gate array (FPGA) implementations.


Introduction
Digital signal processing (DSP) [1][2][3], pattern matching [4], and circuit testing [5] are the primary applications of most digital systems. These applications require a hardware-oriented, high-speed control unit. A finite state machine (FSM) is an integral part of any complex digital system. When its inputs are multiplexed to make it hardware-oriented, it is known as a finite state machine with input multiplexing (FSMIM). The FSM serves as the control unit, and its operating speed determines the processing speed of the system. The applications mentioned above can be viewed as cascaded (i.e., multistage) operations [2], where each stage requires a specific FSM. Hence, Reconfigurable FSMs have been investigated in the literature for optimal performance in such applications [6,7]. A Reconfigurable FSM is defined as a single FSM that acts as any one of the FSMs in a set (i.e., the set of FSMs for a specific application) when particular mode bits are applied. It is implemented on field programmable gate array (FPGA) platforms [6].
The Reconfigurable FSMIM architecture is created by combining (A) the Conventional FSMIM architecture [8] and (B) a multiplexer bank (which defines the mode-based reconfiguration). The optimal synthesis of both constituent elements is performed by the iterative greedy heuristic based Hungarian algorithm (IGHA) [6]. An efficient state encoding technique is a vital tool for optimizing the hardware utilization of an FSM implemented on an FPGA platform [9,10]. In the case of the Reconfigurable FSMIM, the state encodings of the constituent FSMs jointly affect the look-up table (LUT) requirement of the Reconfigurable FSMIM [6].
The major limitation of IGHA is that it does not integrate a state encoding technique: it uses binary state encoding as its default state assignment. State assignment for the Reconfigurable FSMIM architecture is itself an optimization problem [6]. To the best of the authors' knowledge, all the state assignment techniques proposed in the literature provide state codes for a single FSM only. Therefore, the objective of this work is to integrate IGHA with an optimal state encoding technique to reduce the hardware consumption of the Reconfigurable FSMIM on an FPGA platform.
International Journal of Reconfigurable Computing
In the literature, RAM-based architectures form another direction in FSM implementation. The following three types of RAM-based FSM architectures have been studied [11]: (a) the basic RAM-based FSM architecture, (b) the RAM-based FSM architecture with transition-controlled multiplexers, and (c) the RAM-based FSM architecture with state-controlled multiplexers. In the basic RAM-based FSM architecture, data are stored as words: for each transition (i.e., a present state combined with the external inputs), the outputs and the next-state code bits are stored in a RAM word [12,13]. The RAM size required for the basic RAM-based FSM implementation is enormous. Hence, the RAM-based FSM architecture with transition-controlled multiplexers is used to reduce the RAM depth. It contains an input selector bank, which provides the active inputs, chosen from the external inputs, for a particular state [11]. The RAM-based FSM architecture with state-controlled multiplexers reduces the RAM size further. It consists of two separate RAM blocks, the smaller of which drives the input selector bank [11]. Consequently, designing such architectures is very complicated.
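To make the RAM-size argument concrete, the following Python sketch estimates the storage of the basic RAM-based FSM, in which every (present state, input vector) pair addresses one word holding the next-state code and the outputs. The input-multiplexed variant rests on a simplifying assumption not taken from [11]: the selector bank forwards at most `max_active_inputs` inputs per state, so only those bits join the state bits in the address. All function names and the example FSM dimensions are hypothetical.

```python
import math


def state_bits(num_states: int) -> int:
    """Number of bits for a binary state code (at least one bit)."""
    return max(1, math.ceil(math.log2(num_states)))


def basic_ram_bits(num_inputs: int, num_states: int, num_outputs: int) -> int:
    """Basic RAM-based FSM: address = all inputs + state bits,
    word = next-state bits + output bits."""
    r = state_bits(num_states)
    depth = 2 ** (num_inputs + r)
    width = r + num_outputs
    return depth * width


def muxed_ram_bits(max_active_inputs: int, num_states: int, num_outputs: int) -> int:
    """Assumption: with an input selector bank, only the (at most
    max_active_inputs) inputs chosen for the present state are address bits."""
    r = state_bits(num_states)
    depth = 2 ** (max_active_inputs + r)
    width = r + num_outputs
    return depth * width


# Hypothetical FSM: 8 inputs, 48 states, 19 outputs, <= 3 active inputs per state.
print(basic_ram_bits(8, 48, 19))  # 409600 bits (16384 words x 25 bits)
print(muxed_ram_bits(3, 48, 19))  # 12800 bits (512 words x 25 bits)
```

The exponential dependence of the depth on the number of address bits is why reducing the inputs that reach the RAM pays off so strongly.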
In this paper, the Improved Reconfigurable FSMIM architecture is proposed, which surmounts the issue of high LUT consumption during FPGA implementation. The proposed architecture is formed using the improved iterative greedy heuristic based Hungarian algorithm (Improved-IGHA). The Improved-IGHA is the integration of IGHA with the state assignment using logarithmic barrier function based gradient descent approach.
To validate the proposed approach, experiments have been performed using MCNC FSM benchmarks [14]. In FPGA implementation, the proposed architecture achieves an average area reduction of 20.38% and an average speed improvement of 32.73% over VRMUX [11]. It also achieves an average area reduction of 16.05% and an average speed improvement of 1.77% over the Reconfigurable FSMIM-S architecture [6]. Compared with CRMUX [11], it achieves an average speed improvement of 11.06% but requires an average of 58.38% more LUTs. This is the only trade-off of the proposed design.
The remainder of this article is organized as follows. The research problem is formulated in Section 2. Section 3 presents the state assignment using the logarithmic barrier function based gradient descent approach, together with an illustrative example. The experimental setup and a comparative analysis of this work with the literature are presented in Section 4. Finally, concluding remarks are drawn in Section 5.

Problem Formulation
Reconfigurable FSMs have recently drawn researchers' attention for multistage signal processing applications. A novel framework for creating the Reconfigurable FSMIM is given in [6].
A Mealy FSM is represented in vector form. The Reconfigurable FSMIM is defined as a single FSM that acts as any one of the FSMs in a set (i.e., the set of FSMs for a specific application) when particular mode bits are applied. A set of FSMs for a specific application is chosen, where $M_0$ denotes the largest FSM (i.e., the FSM with the highest total number of transitions, states, and inputs) in the set and $M_1, M_2, \dots, M_n$ denote the rest of the FSMs in the set.
The $M_0$-mode is the default mode of operation of the Reconfigurable FSMIM [6].
The Reconfigurable FSMIM architecture is created by combining the following two parts: (A) the Conventional FSMIM architecture [8] and (B) a multiplexer bank (which defines the mode-based reconfiguration). The optimal synthesis of the multiplexer bank is performed by IGHA [6]. Figure 1 shows the state transitions of each constituent FSM of the Reconfigurable FSMIM architecture at the last phase of IGHA. Because all these transitions are realized by one architecture, the state encodings of the constituent FSMs jointly affect the LUT requirement of the Reconfigurable FSMIM architecture. At the end of IGHA, a modified description of a single FSM is obtained, which is used to create the Conventional FSMIM part [6].
In FSM implementation on an FPGA platform, the state encoding technique acts as a tool for minimizing hardware consumption [9,10]. For example, an MCNC FSM benchmark requires 82 LUTs when implemented on a Xilinx xc6vlx75t-3 (Virtex-6) device using Gray encoding, but only 41 LUTs on the same platform using binary encoding.
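The two encodings in this example differ only in which code each state index receives. A minimal Python sketch of both assignments follows (state $i$ gets the binary code of $i$, or the reflected Gray code $i \oplus (i \gg 1)$); the function names are hypothetical, and the LUT counts themselves come from synthesis and cannot be predicted by this code.

```python
def binary_codes(num_states: int, width: int) -> list:
    """State i receives the plain binary code of i."""
    return [format(i, f"0{width}b") for i in range(num_states)]


def gray_codes(num_states: int, width: int) -> list:
    """State i receives the reflected Gray code i ^ (i >> 1)."""
    return [format(i ^ (i >> 1), f"0{width}b") for i in range(num_states)]


print(binary_codes(4, 2))  # ['00', '01', '10', '11']
print(gray_codes(4, 2))    # ['00', '01', '11', '10']
```

Consecutive Gray codes differ in one bit, but that only helps when consecutive state indices are also frequent transition partners; otherwise, as the LUT counts above suggest, it can be worse than plain binary.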
The major limitation of IGHA is that it does not integrate a state encoding technique; it uses binary state encoding as its default state assignment [6]. State assignment for the Reconfigurable FSMIM architecture is an optimization problem, because one set of state codes must serve all the constituent FSMs simultaneously. Therefore, the objective of this work is to integrate IGHA with an optimal state encoding technique to reduce the hardware consumption of the Reconfigurable FSMIM on an FPGA platform.

Methodology
This work is an extension of the work presented in [6]. Hence, all the variables from [6] are used in the same context throughout the article. An improved version of IGHA (Improved-IGHA) is proposed, which addresses the issue of optimal state encoding.
A recent body of literature has investigated the performance of three fundamental types of state encoding techniques on FPGA platforms [9]: (a) structural approaches, (b) heuristic approaches, and (c) pragmatic approaches. Among these, the structural state encoding technique performs best on FPGA platforms [9,10]. It uses knowledge of the internal structure (i.e., the state transitions) of the FSM to generate optimal state codes. Therefore, the structural information of the FSMs is used to develop the proposed state encoding technique for the Reconfigurable FSMIM.
The structural information of the Reconfigurable FSMIM (i.e., the state transitions) is obtained from Figure 1. Hence, a unified weight matrix $W$ is defined by adding the weight matrices of all the component FSMs for the same corresponding states, as given in (1):
$$W = \sum_{m=0}^{n} W_m, \tag{1}$$
where $W_m$ is the weight matrix of the constituent FSM $M_m$.
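A minimal Python sketch of (1) follows. How each per-FSM weight is computed is an assumption here (the count of transitions between states $i$ and $j$ in either direction, ignoring self-loops because a state's Hamming distance to itself is zero); the sketch also assumes that all constituent FSMs have already been padded (e.g., with dummy states) to the same $N \times N$ state indexing. Function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def weight_matrix(num_states: int, transitions: List[Tuple[int, int]]) -> Matrix:
    """Assumed per-FSM weight: w[i][j] counts transitions between states i and j
    in either direction; self-loops are skipped."""
    w = [[0] * num_states for _ in range(num_states)]
    for src, dst in transitions:
        if src != dst:
            w[src][dst] += 1
            w[dst][src] += 1
    return w


def unified_weight_matrix(weights: List[Matrix]) -> Matrix:
    """Element-wise sum of the constituent weight matrices, as in (1).
    Assumes every matrix already uses the same N x N state indexing."""
    n = len(weights[0])
    for w in weights:
        if len(w) != n or any(len(row) != n for row in w):
            raise ValueError("weight matrices must all be N x N")
    return [[sum(w[i][j] for w in weights) for j in range(n)] for i in range(n)]


w0 = weight_matrix(3, [(0, 1), (1, 2), (2, 0), (1, 1)])  # hypothetical FSM 0
w1 = weight_matrix(3, [(0, 1), (0, 1)])                  # hypothetical FSM 1
print(unified_weight_matrix([w0, w1]))  # [[0, 3, 1], [3, 0, 1], [1, 1, 0]]
```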
The mathematical formulation of the cost function for an FSM is given in [15]; it uses the structural information (i.e., the state transitions) of the particular FSM. Let $w_{ij}$ be an element of the weight matrix $W$, and let $d_{ij}$ be the Hamming distance between the state codes of states $i$ and $j$. $d_{ij}$ is obtained by counting the number of 1's after an exclusive-OR operation between the two binary state codes, as shown in Figure 2. Therefore, following [15], the cost associated with a particular set of state codes over the graph $G = (V, E)$ is defined by (2):
$$C = \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij}\, d_{ij}, \tag{2}$$
where $E$ (i.e., the $w_{ij}$) indicates the edge weights between nodes $i$ and $j$, and $V$ (i.e., the columns of $W$) represents the set of nodes. Each node corresponds to a particular binary state code, because $G$ admits only binary labels. $N = |V|$ denotes the total number of nodes in the graph $G$. Let a hypercube be characterized as $H_k = (E_H, V_H)$, where $k$ is the dimension, $E_H$ is the set of edges, and $V_H$ is the set of vertices of the hypercube [16]. The cardinalities of $E_H$ and $V_H$ are given in (3) and (4), respectively:
$$|E_H| = k\, 2^{k-1}, \tag{3}$$
$$|V_H| = 2^{k}. \tag{4}$$
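The cost in (2) and the hypercube sizes in (3) and (4) can be checked with a short Python sketch. It sums over all ordered pairs $(i, j)$, so each unordered pair of a symmetric $W$ is counted twice; absolute cost values depend on that convention. The weight matrix and state codes below are hypothetical, and the function names are illustrative only.

```python
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    """d_ij: number of 1's in the XOR of two binary state codes."""
    return bin(a ^ b).count("1")


def encoding_cost(w: List[List[int]], codes: List[int]) -> int:
    """Cost (2): sum over all ordered state pairs of w_ij * d_ij."""
    n = len(codes)
    return sum(w[i][j] * hamming(codes[i], codes[j])
               for i in range(n) for j in range(n))


def hypercube_size(k: int) -> Dict[str, int]:
    """Edge and vertex counts of a k-dimensional hypercube, (3) and (4)."""
    return {"edges": k * 2 ** (k - 1), "vertices": 2 ** k}


w = [[0, 3, 1], [3, 0, 1], [1, 1, 0]]  # hypothetical unified weight matrix
print(encoding_cost(w, [0b00, 0b01, 0b11]))  # 12
print(encoding_cost(w, [0b00, 0b11, 0b01]))  # 16
print(hypercube_size(3))  # {'edges': 12, 'vertices': 8}
```

The first assignment is cheaper because the heaviest edge ($w_{01} = 3$) joins codes that differ in a single bit.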

State Assignment Using Logarithmic Barrier Function
Now, the concept of hypercube embedding is used to reduce (2). An embedding is performed from the graph $G$ onto a hypercube $H_k$, as described earlier [16,17]. It is defined as $\phi: V \rightarrow V_H$, which is a one-to-one mapping function. Consequently, $N$ binary $k$-vectors are defined as in (5), with bit 1 represented by $+1$ and bit 0 by $-1$:
$$x_i = \phi(v_i) \in \{-1, +1\}^{k}, \qquad i = 1, \dots, N. \tag{5}$$
Thus, if a node of the graph $G$ (i.e., $v_i$) is expressed by a binary state code, the corresponding vertex of the hypercube (i.e., $x_i$) is represented by the same binary state code.
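In a hypercube, $d_H(x_i, x_j)$ ($x_i, x_j \in V_H$) represents the Hamming distance between $x_i$ and $x_j$. It is given in (6):
$$d_H(x_i, x_j) = \frac{1}{2} \sum_{l=1}^{k} \left(1 - x_{il}\, x_{jl}\right), \tag{6}$$
where $x_{il}$ is the instantaneous value of the $l$-th component of $x_i$. The value of $x_{il}$ varies between $-1$ and $1$. Therefore, using hypercube embedding, the cost function reduces to (7):
$$C(X) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} \sum_{l=1}^{k} \left(1 - x_{il}\, x_{jl}\right), \qquad X = [x_1, \dots, x_N]. \tag{7}$$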
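A small Python sketch can confirm that the $\pm 1$ form of the Hamming distance agrees with the XOR count at hypercube vertices, while taking fractional values inside the box $[-1, 1]^k$, which is what makes a continuous search possible. The bit-to-sign mapping (bit 1 to $+1$, bit 0 to $-1$, most significant bit first) and all function names are assumptions for illustration.

```python
from typing import List

Vector = List[float]


def to_spin(code: int, k: int) -> Vector:
    """Assumed hypercube vertex of a k-bit code: bit 1 -> +1.0, bit 0 -> -1.0 (MSB first)."""
    return [1.0 if (code >> (k - 1 - l)) & 1 else -1.0 for l in range(k)]


def spin_hamming(x: Vector, y: Vector) -> float:
    """Distance (6): 0.5 * sum_l (1 - x_l * y_l); exact Hamming distance at +/-1 points."""
    return 0.5 * sum(1.0 - a * b for a, b in zip(x, y))


def relaxed_cost(w: List[List[float]], xs: List[Vector]) -> float:
    """Cost (7): cost (2) with d_ij replaced by the continuous distance (6)."""
    n = len(xs)
    return sum(w[i][j] * spin_hamming(xs[i], xs[j])
               for i in range(n) for j in range(n))


def is_one_to_one(codes: List[int]) -> bool:
    """The embedding phi must not give two states the same vertex."""
    return len(set(codes)) == len(codes)


print(spin_hamming(to_spin(0b011, 3), to_spin(0b110, 3)))  # 2.0 (011 XOR 110 = 101)
print(spin_hamming([0.5, -1.0], [1.0, -1.0]))              # 0.25, a relaxed point
```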
The objective is thus confined to minimizing the cost function given in (7). Evidently, this is a discrete optimization problem, in which each state can take only a binary state code.
The convergence of Improved-IGHA depends on the convergence of its constituent algorithms, i.e., IGHA and the applied state assignment technique. Therefore, an algorithm with a high convergence speed is preferred to construct the state assignment technique for Improved-IGHA.
Evolutionary techniques such as the genetic algorithm (GA) have a significant shortcoming: their convergence slows down near the global optimum [18,19]. Similarly, particle swarm optimization (PSO) and differential evolution (DE) converge quickly but are prone to premature convergence, which is a critical drawback [20,21]. In the literature, penalty-based approaches, such as the Lagrangian technique and the logarithmic barrier function (LBF) method, have proven effective at obtaining the optimum solution with a high convergence speed [22,23]. These methods are advantageous for solving discrete or combinatorial optimization problems [24,25].
Therefore, the LBF-based gradient descent approach is adopted to construct the state assignment technique for Improved-IGHA. LBF is an interior-point method, so the search remains within the feasible region. The cost-minimization problem is formulated with LBF and then minimized iteratively by a gradient-projection approach. The flow chart of Improved-IGHA is presented in Figure 3.
In the LBF technique, the search is performed in a continuous domain to find the optimal points, which are then discretized to obtain the optimal solution [26,27].
In the LBF method, an objective function subject to inequality constraints is given in (8):
$$\min_{x} f(x) \quad \text{subject to} \quad g_i(x) \le 0, \quad i = 1, \dots, m. \tag{8}$$
The logarithmic barrier function used to minimize the cost function (as in (7)) is given in (9):
$$B(x, \mu) = f(x) - \mu \sum_{i=1}^{m} \ln\left(-g_i(x)\right). \tag{9}$$
In the LBF search, the second term serves as a barrier against any move that violates the constraints [28]. At iteration $t$, (9) is minimized as shown in (10):
$$\min_{x} B(x, \mu_t). \tag{10}$$
Initially, LBF selects a feasible $x_0$ and $\mu_0 > 0$. Then, it chooses $\mu_{t+1} = \beta \cdot \mu_t$, where $0 < \beta < 1$. This iterative process continues until $\mu_t$ reaches a sufficiently small value. A complete method is required to solve (10) with respect to $x$, and a first-order gradient-projection approach [29] is well suited to minimizing (10) iteratively. In this approach, the model parameters (a.k.a. weight vectors) are evaluated iteratively to minimize the objective function when an analytical solution is not possible [30,31]. The underlying representation of the objective function of the problem in this approach is given in (11).
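The outer LBF loop described above (minimize (10) for a fixed $\mu_t$, then set $\mu_{t+1} = \beta \mu_t$) can be illustrated on a one-variable toy problem that is unrelated to state assignment: minimize $f(x) = (x - 2)^2$ subject to $g(x) = x - 1 \le 0$, whose constrained optimum is $x = 1$. The inner minimizer here is plain gradient descent with a backtracking step, an assumption made to keep every iterate strictly feasible; it is a sketch of the barrier idea, not the paper's exact procedure, and all names are hypothetical.

```python
import math


def barrier(x: float, mu: float) -> float:
    """B(x, mu) = f(x) - mu * ln(-g(x)) with f = (x - 2)^2 and g = x - 1."""
    return (x - 2.0) ** 2 - mu * math.log(1.0 - x)


def barrier_grad(x: float, mu: float) -> float:
    return 2.0 * (x - 2.0) + mu / (1.0 - x)


def minimize_fixed_mu(x: float, mu: float, eta: float = 0.1,
                      tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Gradient descent on B(., mu). Assumption: the step is halved until the
    trial point is strictly feasible (x < 1) and gives sufficient decrease."""
    for _ in range(max_iter):
        g = barrier_grad(x, mu)
        step = eta
        while True:
            trial = x - step * g
            if trial < 1.0 and barrier(trial, mu) <= barrier(x, mu) - 0.5 * step * g * g:
                break
            step *= 0.5
            if step < 1e-18:  # no representable progress left
                return x
        if abs(trial - x) < tol:
            return trial
        x = trial
    return x


def log_barrier_method(x0: float = 0.0, mu0: float = 1.0,
                       beta: float = 0.5, mu_min: float = 1e-6) -> float:
    """Outer LBF loop: solve (10) for mu_t, then set mu_{t+1} = beta * mu_t."""
    if not x0 < 1.0:
        raise ValueError("x0 must be strictly feasible")
    x, mu = x0, mu0
    while mu > mu_min:
        x = minimize_fixed_mu(x, mu)  # warm start from the previous solution
        mu *= beta
    return x


print(log_barrier_method())  # approaches 1 from below as mu shrinks
```

Each barrier minimizer sits at roughly $1 - \mu/2$, so shrinking $\mu$ walks the iterate toward the constraint boundary without ever crossing it.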
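An iteration of this projection method is defined by (12):
$$x^{(t+1)} = x^{(t)} - \eta\, \nabla_x B\left(x^{(t)}, \mu_t\right). \tag{12}$$
In (12), $\eta$ denotes the step size, which is chosen to be a small positive real number [29].
Thus, small steps (of size $\eta$) are taken in the negative gradient direction of the objective function, as illustrated in (12). Then, (13) is used to place the value of $x$ on the constraint surface at the next iteration (i.e., $x^{(t+1)}$):
$$x^{(t+1)} \leftarrow P\left(x^{(t+1)}\right), \tag{13}$$
where $P(\cdot)$ denotes the projection onto the constraint surface. The convergence criterion for this iterative process is defined by (14), where $\epsilon \in [0, 1]$:
$$\left\lVert x^{(t+1)} - x^{(t)} \right\rVert \le \epsilon. \tag{14}$$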
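The gradient step, projection, and stopping test can be sketched generically in Python. Because the problem-specific projection of this work is not reproduced here, the sketch assumes that $P$ clips every component into the box $[-1, 1]$, which matches the stated range of $x_{il}$; the toy objective and function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def project_box(x: Vector, lo: float = -1.0, hi: float = 1.0) -> Vector:
    """Assumed projection P: clip every component into [lo, hi]."""
    return [min(max(v, lo), hi) for v in x]


def projected_gradient_descent(grad: Callable[[Vector], Vector], x0: Vector,
                               eta: float = 0.1, eps: float = 1e-8,
                               max_iter: int = 10_000) -> Tuple[Vector, int]:
    x = project_box(x0)
    for t in range(max_iter):
        step = [xi - eta * gi for xi, gi in zip(x, grad(x))]  # gradient step, as in (12)
        x_next = project_box(step)                             # projection, as in (13)
        if math.sqrt(sum((a - b) ** 2 for a, b in zip(x_next, x))) <= eps:  # test (14)
            return x_next, t + 1
        x = x_next
    return x, max_iter


# Toy objective sum_l (x_l - 2)^2: its minimizer over [-1, 1]^3 is the corner (1, 1, 1).
x_star, iters = projected_gradient_descent(lambda x: [2.0 * (v - 2.0) for v in x],
                                           [0.0, -0.5, 0.3])
print(x_star)  # [1.0, 1.0, 1.0]
```

Clipping is the exact Euclidean projection onto a box, which is why it is a natural stand-in when the feasible set is $[-1, 1]^k$.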
In this way, the embedding problem is reduced to determining the $N$ binary $k$-vectors (as shown in (15)) that optimize the cost function (7).
Hence, the cost function (from (7)) is expressed in terms of the Hamming distance, as shown in (16). The constraint (i.e., boundary condition) for this problem is that no two vertices on the hypercube may have the same binary state code (i.e., $x_i - x_j \neq 0$ for $i \neq j$); its mathematical representation is presented in (17). By applying (16) and (17) to (9), the objective function for LBF reduces to (18). Therefore, the projection $P(x)$ (from (13)) is defined by (19). The derivative term $\nabla_x B(x, \mu)$ must be evaluated to move in the gradient descent direction, as shown in (12); it is obtained by substituting (20), (21), (22), and (23) into (18). Hence, $\nabla_x B(x, \mu)$ is defined by (24).
If (14) is satisfied, a solution vector $\hat{x}$ is obtained at the end of the iteration. The required set of state codes is then deduced by discretizing $\hat{x}$ using (25). The pseudocode of the proposed state assignment approach is presented in Algorithm 1.
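The sketch below strings these steps together for the relaxed embedding problem. It is a hypothetical re-implementation under stated assumptions, not the authors' Algorithm 1: the objective is the relaxed cost (7) plus a barrier $-\mu \sum_{i<j} \ln \lVert x_i - x_j \rVert^2$ that keeps the codes apart (one possible form of the distinctness constraint), the projection clips to $[-1, 1]^k$, discretization uses the sign of each component, and a greedy repair step (also an assumption) resolves any codes that still collide after rounding.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]


def lbf_state_assignment(w: Matrix, k: int, mu0: float = 1.0, beta: float = 0.5,
                         mu_min: float = 1e-3, eta: float = 1e-2,
                         inner_iters: int = 200, seed: int = 0) -> List[int]:
    n = len(w)
    if n > 2 ** k:
        raise ValueError("N states need at least ceil(log2 N) code bits")
    rng = random.Random(seed)
    # Feasible start: random, almost surely distinct points inside [-1, 1]^k.
    x = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n)]
    mu = mu0
    while mu > mu_min:                    # outer LBF loop: mu_{t+1} = beta * mu_t
        for _ in range(inner_iters):      # inner projected-gradient iterations
            grad = [[0.0] * k for _ in range(n)]
            for p in range(n):
                for l in range(k):
                    # d/dx_pl of the relaxed cost (7)
                    grad[p][l] = -0.5 * sum((w[p][j] + w[j][p]) * x[j][l] for j in range(n))
                for j in range(n):
                    if j == p:
                        continue
                    dist = max(sum((x[p][l] - x[j][l]) ** 2 for l in range(k)), 1e-12)
                    for l in range(k):
                        # Assumed barrier: d/dx_pl of -mu * ln(dist_pj) repels x_p from x_j
                        grad[p][l] -= 2.0 * mu * (x[p][l] - x[j][l]) / dist
            # Gradient step followed by the assumed box projection onto [-1, 1]^k.
            x = [[min(max(x[p][l] - eta * grad[p][l], -1.0), 1.0) for l in range(k)]
                 for p in range(n)]
        mu *= beta
    # Discretize: component >= 0 -> bit 1, else bit 0 (MSB first).
    codes = [sum(1 << (k - 1 - l) for l in range(k) if x[p][l] >= 0.0) for p in range(n)]
    # Hypothetical repair: keep first occurrences, move each duplicate to the free
    # code with the smallest weighted Hamming cost to the states already kept.
    kept: Dict[int, int] = {}
    duplicates = []
    for p, c in enumerate(codes):
        if c in kept.values():
            duplicates.append(p)
        else:
            kept[p] = c
    free = [c for c in range(2 ** k) if c not in kept.values()]
    for p in duplicates:
        best = min(free, key=lambda c: sum(w[p][q] * bin(c ^ kept[q]).count("1") for q in kept))
        free.remove(best)
        kept[p] = best
    return [kept[p] for p in range(n)]


w = [[0, 3, 1, 0], [3, 0, 0, 2], [1, 0, 0, 4], [0, 2, 4, 0]]  # hypothetical
print(lbf_state_assignment(w, k=2))
```

The repair step guarantees a valid (one-to-one) assignment, but it can undo part of the optimization; how the paper's own discretization handles collisions is not reproduced here.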

Algorithm 1: State assignment using logarithmic barrier-function based gradient descent approach for the Reconfigurable FSM.

An Illustrative Example

(iii) Dummy State and Position Replacement:
The dummy states and positions are replaced using Algorithm 3 from [6]. The replaced dummy states (highlighted in bold italic font) and dummy positions (highlighted in bold font) are presented in Tables 3 and 4.
(iv) Output Matching using Bitwise-XOR Operations: Output matching is not required in this case. The state transition graph obtained from the updated descriptions (Tables 3 and 4) is given in Figure 4, and the weight matrix is formed from it using (1). The proposed state assignment algorithm starts with the binary state codes as the initial solution, which yields a cost of 62 (from (2)).
At the 100th iteration, the instantaneous value from the previous iteration (i.e., $x^{(99)}$) is used, and the derivative is evaluated from (24). The current value of $x$ (i.e., $x^{(100)}$) is then obtained from (12); it is given in (30) for $\eta = 10^{-3}$ (a very small value). After convergence, the cost is reduced to 48 (from (2)).
Finally, a bitwise-XOR operation is performed between the updated descriptions of n11 and 9, which provides the multiplexer bank (i.e., Part B). The updated descriptions of 11 are used to construct the Conventional FSMIM part (i.e., Part A).
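One way to picture how a bitwise XOR between two matched descriptions yields mode-dependent logic is sketched below: the positions where the XOR is 1 are exactly the bits that must flip when the mode bit selects the other FSM. This is a simplified, hypothetical model for illustration (a single mode bit, and one output column per FSM written as a bit string), not the multiplexer-bank construction of [6].

```python
def xor_mask(base: str, other: str) -> str:
    """Bitwise XOR of two equal-length bit strings."""
    if len(base) != len(other):
        raise ValueError("descriptions must be aligned bit for bit")
    return "".join("1" if a != b else "0" for a, b in zip(base, other))


def reconfigured_output(base: str, mask: str, mode: int) -> str:
    """Hypothetical model: mode 0 keeps the default FSM's bits,
    mode 1 flips exactly the masked bits."""
    return base if mode == 0 else xor_mask(base, mask)


default_fsm = "1010"  # hypothetical output column of the default-mode FSM
other_fsm = "0110"    # hypothetical matched output column of another FSM
mask = xor_mask(default_fsm, other_fsm)
print(mask)                                       # 1100
print(reconfigured_output(default_fsm, mask, 1))  # 0110
```

In this model, the number of 1's in the mask counts the bits that need mode-dependent logic, which is why matching lines so that their XOR has few 1's would keep that logic small.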

Numerical Results and Discussions
To validate the proposed approach, experiments have been performed using MCNC FSM benchmarks [14]. The proposed Improved-IGHA is implemented in MATLAB R2016b. It produces optimized descriptions of the constituent parts of the Improved Reconfigurable FSMIM architecture, which are then converted into Verilog HDL code using the MATLAB HDL Coder toolbox. The Improved Reconfigurable FSMIM architecture is implemented on a Virtex-6 device (speed grade -3), as in [6,11]. All computations were executed on a workstation with an Intel(R) Core i7 (6th Gen) CPU at 3.5 GHz and 16 GB of RAM.
In Improved-IGHA, permutations of the input lines, states, and output lines are generated to perform input, state, and output matching, respectively. To use the computational resources efficiently, the number of input and output lines used for matching is restricted to 7 (i.e., $^{7}P_{7} = 7! = 5040$ combinations). Hence, the information content of an input/output line becomes the selection criterion, and lines with high information content are preferred.
s1494 is chosen as $M_0$ (i.e., the circuit added at the 0th iteration of Improved-IGHA), because it is more complex (i.e., it has more transitions) than the other FSMs in the set. The other FSMs in the set are added iteratively to the design in their respective order.
In an FSM, a specific state is chosen only if a particular set of input bits (i.e., 1's or 0's) is present. Hence, the percentage of specified bits (1's and 0's together) in an input line serves as its information content, as shown in Table 5 (the input lines selected for matching are highlighted). Similarly, an output is always defined by "1." Hence, the percentage of 1's in an output line serves as its information content, as shown in Tables 6 and 7 (the output lines selected for matching are highlighted). In the first phase of Improved-IGHA, input and state matching are performed together and the optimal assignments are made, as presented in Table 5; all the states are mapped onto the corresponding states in their respective order. Output matching is then performed iteratively by bitwise-XOR operations, as presented in Tables 6 and 7. Then, after the descriptions of the constituent FSMs are updated, the state assignment using the logarithmic barrier function based gradient descent approach is performed.
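The selection rule above can be sketched in a few lines of Python. The sketch assumes that each input or output line is represented by its column of values across the FSM's transition rows, with "-" marking a don't-care input bit (as in the KISS2 format of the MCNC benchmarks); the function names and example columns are hypothetical.

```python
import math
from typing import Callable, List


def input_information(column: str) -> float:
    """Percentage of specified bits ('0' or '1', not '-') in an input line."""
    return 100.0 * sum(ch in "01" for ch in column) / len(column)


def output_information(column: str) -> float:
    """Percentage of '1' bits in an output line."""
    return 100.0 * column.count("1") / len(column)


def select_lines(columns: List[str], score: Callable[[str], float],
                 limit: int = 7) -> List[int]:
    """Indices of the `limit` highest-scoring lines (ties keep original order)."""
    ranked = sorted(range(len(columns)), key=lambda i: score(columns[i]), reverse=True)
    return sorted(ranked[:limit])


inputs = ["--1-", "0101", "-1--", "1-0-"]  # hypothetical input-line columns
outputs = ["0100", "1101", "0000"]         # hypothetical output-line columns
print(select_lines(inputs, input_information, limit=2))    # [1, 3]
print(select_lines(outputs, output_information, limit=1))  # [1]
print(math.perm(7, 7))  # 5040 orderings of 7 selected lines
```

Capping the selection at 7 lines bounds the permutation search at $7! = 5040$ candidates, which is the trade-off the text describes.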
To compare the total computation time required by IGHA [6] and Improved-IGHA, MATLAB's built-in stopwatch timer is used. It measures the elapsed time (i.e., the execution time between the start and stop of a function). As evident from the literature [6], IGHA solves linear assignment problems (LAPs) many times to perform the matchings among all generated combinations while adding $M_i \in \{M_1, \dots, M_n\}$ iteratively. The time taken by IGHA to solve a single LAP ranges from 0.03 ms to 0.6 ms. Hence, the total elapsed time taken by IGHA (i.e., $T_{\mathrm{IGHA}}$) is given in (32). The convergence time of the state assignment using the LBF-based gradient descent approach (i.e., $T_{\mathrm{SA}}$) for adding $M_i \in \{M_1, \dots, M_n\}$ iteratively is given in Table 8. Therefore, the total elapsed time taken by Improved-IGHA (i.e., $T_{\mathrm{Imp}}$) is the sum of $T_{\mathrm{IGHA}}$ and $T_{\mathrm{SA}}$ (see Figure 3), as presented in Table 8.
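The timing bookkeeping can be mirrored in Python, where `time.perf_counter` plays the role of MATLAB's stopwatch timer. As a stand-in for the Hungarian algorithm used by IGHA, the sketch solves each LAP by exhaustive search over permutations, which is only practical for very small matrices; the function names are hypothetical.

```python
import itertools
import math
import time
from typing import Callable, List, Sequence, Tuple


def solve_lap_bruteforce(cost: List[List[float]]) -> Tuple[Tuple[int, ...], float]:
    """Exhaustive linear assignment: row i -> column perm[i] with minimum total cost.
    O(n!) stand-in for the Hungarian algorithm; only for very small n."""
    n = len(cost)
    best_perm: Tuple[int, ...] = tuple(range(n))
    best = math.inf
    for perm in itertools.permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best:
            best, best_perm = total, perm
    return best_perm, best


def timed(fn: Callable, *args):
    """Return fn(*args) and its elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def total_elapsed(laps: Sequence[List[List[float]]], state_assignment: Callable, *sa_args):
    """T_IGHA sums the per-LAP times; T_Imp = T_IGHA + T_SA."""
    t_igha = sum(timed(solve_lap_bruteforce, cost)[1] for cost in laps)
    _, t_sa = timed(state_assignment, *sa_args)
    return t_igha, t_sa, t_igha + t_sa


print(solve_lap_bruteforce([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))  # ((1, 0, 2), 5)
```

When the number of LAPs is large, the summed term dominates, which is the relation $T_{\mathrm{IGHA}} \gg T_{\mathrm{SA}}$ reported for Table 8.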
The experimental results in Table 8 show that the total computation time required by IGHA is far higher than the convergence time of the proposed state assignment technique (i.e., $T_{\mathrm{IGHA}} \gg T_{\mathrm{SA}}$). Therefore, the total computation time required by Improved-IGHA is approximately equal to that of IGHA (i.e., $T_{\mathrm{Imp}} \approx T_{\mathrm{IGHA}}$).
In FPGA implementation, the proposed architecture achieves an average area reduction of 20.38% and an average speed improvement of 32.73% over VRMUX [11]. It also achieves an average area reduction of 16.05% and an average speed improvement of 1.77% over the Reconfigurable FSMIM-S architecture [6]. Compared with CRMUX [11], it achieves an average speed improvement of 11.06% but requires an average of 58.38% more LUTs. This is the only trade-off of the proposed design. A comparative analysis of the hardware consumption and the maximum operating frequency in FPGA implementation is presented in Figures 6 and 7, respectively.

Concluding Remarks
This article presents the framework for the Improved Reconfigurable FSMIM architecture, which is created by combining two parts: (A) the Conventional FSMIM architecture and (B) a multiplexer bank (which defines the mode-based reconfiguration). An improved version of the iterative greedy heuristic based Hungarian algorithm (Improved-IGHA) is proposed to construct these parts. Improved-IGHA integrates IGHA [6] with a state assignment using a logarithmic barrier function based gradient descent approach, which reduces the hardware consumption of the proposed architecture through optimal state encoding. An illustrative example using MCNC FSM benchmarks is also given to demonstrate the steps involved in creating the proposed architecture.
In FPGA implementation, the proposed architecture achieves an average area reduction of 20.38% and an average speed improvement of 32.73% over the variation-based reconfigurable multiplexer bank (VRMUX) [11]. It also achieves an average area reduction of 16.05% and an average speed improvement of 1.77% over the Reconfigurable FSMIM-S architecture [6]. Compared with the combination-based reconfigurable multiplexer bank (CRMUX) [11], it achieves an average speed improvement of 11.06% but requires an average of 58.38% more LUTs, which is the only trade-off of the proposed design.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.