Walsh Spectral Techniques for Logic Synthesis FPGA

The implementation cost of multi-output Boolean functions (BF) in FPGA logic synthesis can be reduced by using a Walsh spectral representation. This paper proposes an algorithm for calculating the maximum coefficient of the autocorrelation function of a BF without generating a truth table; a heuristic procedure limits the search for the maximum autocorrelation coefficients to a small subset of the function. We also suggest a spectral technique for the linear transformation of functions defined by disjoint cubes. This decomposition method reduces the complexity of the linear part of the corresponding blocks by about 25–55 %, while the complexity of the nonlinear part of the blocks increases by no more than 10 %, compared with the traditional approach.


Introduction
There are two popular categories of field programmable gate array (FPGA) block structures, namely Look-Up Table-based (LUT) and multiplexor-based (MB); the resulting architectures are called LUT-based and MUX-based architectures respectively [1].
The basic block of an LUT architecture is a look-up table that can implement any Boolean function of up to m inputs, m ≥ 2. For a given LUT architecture, m is a fixed number. In commercial architectures, m is typically between 3 and 6. An m-LUT is typically implemented by static random access memory (SRAM) that has m address lines and 1 data line.
In the MB architecture, the core logic element consists of configurable multiplexers. An example is the architecture proposed by Actel [5], in which the basic unit comprises three multiplexers, and the rows of logic blocks are separated by routing channels that carry the signal routing of the system and the global synchronization [1], [2] and [3].
In this paper we use the LCA (Logic Cell Array) architecture of the TLU type from Xilinx. It is based on configurable logic blocks (CLBs), which are bigger and more complex than the Actel or QuickLogic cells. The Xilinx LCA basic logic cell is an example of a coarse-grain architecture. The Xilinx CLBs contain both combinational logic and flip-flops [4].
The first generation of LCA devices appeared in 1985. They consist of logic blocks that include a combinational function generator implementing a 4-input BF and a single memory element (flip-flop). This family of chips was designated XC2000 and comprised two devices with an equivalent complexity from 1200 to 1800 two-input elements (gates).
The second generation of LCA devices, which appeared on the market in 1987, included logic blocks extended to implement a 5-input BF and containing two flip-flops. The corresponding family of chips was designated XC3000 and comprised five devices, ranging in complexity from 1200 to 5000 gates. The clock frequency of the XC3000 reaches 125 MHz, which corresponds to a system clock frequency of 30–40 MHz.
The third generation of LCA devices appeared in 1991. It further extended the capabilities of this architecture. In addition, this series made it possible for the first time to combine random-access memory and combinational logic on a single chip. The corresponding family of chips, designated XC4000, has ten devices, ranging in complexity from 2000 to 20000 gates. The system clock frequency is 60–70 MHz, approximately twice that of the previous series.
The main drawback of the XC4000 chips is the underutilization of their resources. The maximum "occupancy" of the chip does not exceed 70–80 %, since higher utilization causes routing difficulties. To solve this problem, the fourth-generation architecture (the XC5200 series), introduced in 1996, was redesigned for better routability at the price of a greater "waste" of resources. The XC5200 family has five devices, ranging in complexity from 2000 to 23000 gates, with a system clock frequency of 50 MHz.
As already mentioned, the main difference between the latest LCA devices based on static memory technology is the improved routing characteristics of the chip. Thus, the analysis of FPGA architecture and technology allows us to conclude that, in addition to the trends common to the entire microelectronics industry (increasing the degree of integration, improving overall performance, reducing costs, etc.), a new trend is the increased ease of designing and debugging circuits. However, both integration density and application requirements grow with every passing day. This raises questions of the design and development of algorithms for automatic logic synthesis. It follows that the main problems of logic synthesis for FPGAs are to minimize the number of logic blocks used and to reduce the routing complexity.

Spectral and Correlation Analysis of Boolean Functions
We use the definition of BF from the monograph [6], where they are treated as multi-dimensional functions with m inputs and k outputs that carry out a mapping of the form $f: \{0,1\}^m \to \{0,1\}^k$. The set of outputs of the BF is denoted $f_{k-1}, \ldots, f_0$, and the decimal indices of the input vector $x = (x_{m-1}, \ldots, x_0) \in \{0,1\}^m$ and of the output vector are calculated by the formulas

$$x = \sum_{i=0}^{m-1} x_i 2^i, \qquad (1)$$

$$f(x) = \sum_{i=0}^{k-1} f_i(x) 2^i, \qquad (2)$$

where x and f can be interpreted as decimal numbers whose binary digits are the coordinates of the corresponding binary vectors. Note that Eq. (1) and Eq. (2) describe the BF as a piecewise constant function F(x) of a real argument on the half-open interval $[0, 2^m)$. With this notation, a system of BF can be represented as a lattice function y = f(x) defined at the points $0, 1, \ldots, 2^m - 1$ of the interval $[0, 2^m)$. We extend y = f(x) to a piecewise constant function F(x) as follows:

$$F(x) = f(\delta), \quad \delta \le x < \delta + 1, \quad \delta = 0, 1, \ldots, 2^m - 1. \qquad (3)$$

We say that a piecewise constant function F(x) represents the original system of BF if it satisfies the condition in Eq. (3) and f(x) is constructed by Eq. (1) and Eq. (2). Thus, the BF can be described as a vector $F = [f(0), f(1), \ldots, f(2^m - 1)]^T$, where $x = (x_{m-1}, \ldots, x_0)$, $0 \le x \le 2^m - 1$, runs over the set of input vectors and f(x) is an integer value, $0 \le f(x) \le 2^k - 1$.

It is known that there is a relationship between BF and Walsh functions, which explains why spectral analysis in the basis of Walsh functions can be used effectively to analyze BF. To determine this relationship, we consider the Walsh functions in detail. These functions are piecewise constant and are given on the half-open interval $[0, 2^m)$ by the expression

$$W_\omega(x) = (-1)^{\sum_{i=0}^{m-1} \omega_i x_i}, \qquad (4)$$

where $0 \le \omega \le 2^m - 1$, $m \in N$, and $\omega_i$ and $x_i$ are determined from the binary representations of ω and x.
The autocorrelation function of a BF $f(x_0, x_1, \ldots, x_{m-1})$ is defined by the relation

$$B(\tau) = \sum_{x=0}^{2^m - 1} f(x) f(x \oplus \tau), \qquad (5)$$

where $\tau \in \{0, 1, \ldots, 2^m - 1\}$ and ⊕ denotes bitwise addition modulo 2. As seen from Eq. (5), the autocorrelation function is a convolution-type transform of the original function. The cross-correlation, or simply correlation, function of two BF $f_1(x)$ and $f_2(x)$ is the function

$$B_{1,2}(\tau) = \sum_{x=0}^{2^m - 1} f_1(x) f_2(x \oplus \tau),$$

where $\tau \in \{0, 1, \ldots, 2^m - 1\}$. The connection between the correlation functions and the Walsh functions considered earlier is established by the Wiener-Khinchin theorem [7] and [8]: up to a normalizing factor, the autocorrelation function is the inverse Walsh transform of the squared Walsh spectrum of the original function. The properties of the correlation characteristics of BF are determined by the properties of convolution transforms of the original functions. In particular, the form of these transforms implies that the correlation characteristics are invariant to a shift of the argument of the original function. The converse is also true: from the autocorrelation function, the original function can be restored up to a shift of the argument.
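The relation between the autocorrelation function and the Walsh spectrum can be checked numerically. The following is a minimal sketch, assuming Walsh functions in Hadamard ordering, $W_\omega(x) = (-1)^{\mathrm{popcount}(\omega \wedge x)}$, and an unnormalized spectrum; the function names and the 3-input majority function used as test data are illustrative choices, not taken from the paper.

```python
def walsh_transform(values):
    """Unnormalized fast Walsh-Hadamard transform.

    Returns S with S[w] = sum over x of values[x] * (-1)**popcount(w & x).
    The length of `values` must be a power of two.
    """
    s = list(values)
    h = 1
    while h < len(s):
        for start in range(0, len(s), 2 * h):
            for i in range(start, start + h):
                a, b = s[i], s[i + h]
                s[i], s[i + h] = a + b, a - b
        h *= 2
    return s


def autocorrelation_direct(f):
    """B(tau) = sum over x of f(x) * f(x XOR tau), straight from the definition."""
    n = len(f)
    return [sum(f[x] * f[x ^ tau] for x in range(n)) for tau in range(n)]


def autocorrelation_wiener_khinchin(f):
    """Same B(tau), obtained by applying the Walsh transform twice."""
    n = len(f)
    power = [c * c for c in walsh_transform(f)]   # squared spectrum
    # The inverse of the unnormalized transform is the transform divided by n.
    return [v // n for v in walsh_transform(power)]


# Hypothetical test function: 3-input majority, truth table indexed by x.
majority = [0, 0, 0, 1, 0, 1, 1, 1]
print(autocorrelation_direct(majority))
print(autocorrelation_wiener_khinchin(majority))
```

Both routes need the full truth table of $2^m$ entries, which is the cost the cube-based approach later in the paper aims to avoid.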
The complexity of a BF is usually understood as the minimum number of two-input elements necessary to construct a circuit that realizes it; many complexity criteria are now known. The simplest and most natural criterion for a BF $f(x_0, x_1, \ldots, x_{m-1})$ is the number of arguments on which the function essentially depends. The function is assumed to depend essentially on the argument $x_i$ if there are α, β ∈ {0, 1} and a set of values of the remaining arguments such that $f(x_0, \ldots, x_{i-1}, \alpha, x_{i+1}, \ldots, x_{m-1}) \ne f(x_0, \ldots, x_{i-1}, \beta, x_{i+1}, \ldots, x_{m-1})$. This criterion is denoted $\mu_0$. It is quite easy to evaluate, but the $\mu_0$ criterion is only very weakly associated with the specific properties of the original BF.
The criterion $\mu_1$ is used frequently. To define it, we use the notion of Hamming distance in the discrete Euclidean space: if $x_1$ and $x_2$ are vectors with coordinates $x_{1,i}, x_{2,i} \in \{0, 1\}$, then the Hamming distance between $x_1$ and $x_2$ is

$$d(x_1, x_2) = \sum_{i=0}^{m-1} |x_{1,i} - x_{2,i}|.$$

By the complexity $\mu_1(f)$ of a BF we then mean the number of vector pairs $\{x_1, x_2\}$ with Hamming distance $d(x_1, x_2) = 1$ such that $f(x_1) \ne f(x_2)$. Similarly, we introduce the criteria $\mu_r$, where $r = d(x_1, x_2)$. The strength of the criteria grows with r, but so does the complexity of their calculation, which is determined by $C_m^r 2^m$. Note that the µ-criteria of BF may be related to their correlation functions. Indeed, since the number of true minterms at a distance of, for example, 1 corresponds to the values of the autocorrelation function of the BF at the points $\tau = 1, 2, 4, \ldots, 2^{m-1}$, the corresponding function ψ(f) can be regarded as a measure of the simplicity of this function and, as shown in [7], $\mu(f) = k m 2^{m-1} - \psi(f)$.
Consider a set of m linear transformations of the arguments of the original BF f(z). The BF obtained are denoted $f_i(z)$ and their autocorrelation functions $B_i(\tau)$. Denote by B(T) the corresponding function of the matrix $T = (\tau_{qs})$, where $\tau_{qs} \in \{0, 1\}$ and $q, s = 0, 1, \ldots, m - 1$.
The function B(T) satisfies Karpovsky's theorem [7], whose formulation is given below.
Here |T| is the determinant of T and $E_m$ is the identity matrix of size m × m. The importance of this theorem lies in the fact that it makes it possible to introduce the concept of an optimal linear transformation $\sigma_\eta$ of the arguments of a given BF: the transformation $\sigma_\eta$ that corresponds to Karpovsky's theorem is considered the optimal linear transformation of the arguments of the BF by the criterion η.

Decomposition of the Boolean Function
Assume that the BF is implemented using the logic block shown in Fig. 1, and its decomposition by the block in Fig. 2. Thus, by the decomposition of a BF we mean its expansion into a linear part σ and a nonlinear part $f_\sigma$. The literature [10] also uses a more general notion of decomposition, called splitting decomposition or disjoint decomposition. This kind of decomposition is illustrated in Fig. 3 (in all figures the input variables are designated $x = (x_{m-1}, \ldots, x_0)$ and the outputs $f = (f_{k-1}, \ldots, f_0)$); the monograph [7] is dedicated to the study of the disjoint decomposition of BF.
Consider the disjoint decomposition for different numbers of input variables.
• For 2 variables, there is only one type of decomposition, shown in Fig. 4a.
• For 3 variables, as illustrated in Fig. 4b, there are two types of decomposition, and the total number of functions involved is $C_3^2 + C_3^1$.
• For 4 variables, there are three types of decomposition, and the total number of functions involved is $C_4^3 + C_4^2 + C_4^1$, as shown in Fig. 4c.
• For 5 variables, there are four types of decomposition, and the total number of functions involved is $C_5^4 + C_5^3 + C_5^2 + C_5^1$, as shown in Fig. 4d.
From the analysis of the cases above, it follows that for any m input variables there exist (m − 1) types (variants) of decomposition of a BF, which involve $\sum_{i=1}^{m-1} C_m^i$ functions. Using the rule for a geometric progression, we estimate the upper limit of the number of these functions. Then we obtain:

$$\sum_{i=1}^{m-1} C_m^i = 2^m - 2 < 2^m.$$

Note that this number is negligible compared with the total number of BF in m variables.
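The counts listed above follow the pattern $C_m^{m-1} + \ldots + C_m^1$. As a minimal sketch, assuming this pattern holds for every m, the totals can be tabulated and compared with the number $2^{2^m}$ of all single-output BF in m variables; the function name is an illustrative choice.

```python
from math import comb


def decomposition_function_count(m):
    """Sum of C(m, i) for i = 1 .. m-1, following the pattern of the cases above."""
    return sum(comb(m, i) for i in range(1, m))


for m in range(2, 7):
    involved = decomposition_function_count(m)
    all_functions = 2 ** (2 ** m)   # number of Boolean functions of m variables
    print(m, involved, all_functions)
```

Already for m = 5 the ratio of the two numbers is below one in a hundred million, which is the sense in which the count is negligible.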
As shown in [8], the relative sizes of the sets of linear BF, of the functions involved in the disjoint decomposition, and of all BF can be roughly illustrated by Fig. 5, where the set of BF close to linear is indicated by a dotted line.
The desired transformation is $\sigma_\eta = T^{-1}$. Thus, the linear transformation of the BF arguments that is optimal by the criterion η is given by $z = \sigma_\eta x$, with all sums taken modulo 2. In this case, the modulo-2 sum that realizes $z_i$ requires as many inputs as there are ones in the i-th row of $\sigma_\eta$; in the worst case, the complexity of the linear part of the BF is proportional to the square of the number of input variables, since the matrix $\sigma_\eta$ can contain m × (m − 1) non-zero values. The nonlinear part $f_\sigma$ of the BF can be calculated by multiplying each minterm $(x_{m-1}, \ldots, x_0)$ by the matrix σ. The resulting vector is a minterm of the nonlinear part $f_\sigma$ of the BF.
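The multiplication of minterms by σ can be sketched as follows. This is a minimal illustration, assuming that each row of σ is stored as an integer bit mask over $(x_{m-1}, \ldots, x_0)$ and that row i produces $z_i$; the function names and the 2-variable example are hypothetical.

```python
def apply_linear_transform(sigma_rows, x):
    """Compute z = sigma * x over GF(2); z_i is the parity of (row_i AND x)."""
    z = 0
    for i, row in enumerate(sigma_rows):
        parity = bin(row & x).count("1") & 1
        z |= parity << i
    return z


def transform_truth_table(f, sigma_rows):
    """Truth table of f_sigma, defined by f_sigma(sigma * x) = f(x)."""
    g = [None] * len(f)
    for x, value in enumerate(f):
        g[apply_linear_transform(sigma_rows, x)] = value
    if any(v is None for v in g):
        raise ValueError("sigma is not invertible over GF(2)")
    return g


def xor_gate_inputs(sigma_rows):
    """Number of inputs of the modulo-2 sum realizing each z_i (ones per row)."""
    return [bin(row).count("1") for row in sigma_rows]


# Hypothetical example: z0 = x0, z1 = x1 XOR x0 applied to f = x1 XOR x0.
sigma = [0b01, 0b11]
xor2 = [0, 1, 1, 0]
print(transform_truth_table(xor2, sigma))   # f_sigma depends on z1 only
print(xor_gate_inputs(sigma))
```

In this toy case the whole function moves into the linear part: the nonlinear block reduces to the single variable $z_1$.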
To illustrate this fact consider the following example.
Assume that a summing operation is described by the decimal function $f(x_3, x_2, x_1, x_0)$, given by the vector F of its values. We note that the i-th element of F is the decimal representation of the binary signal at the output of the three-bit adder for the i-th column of the truth table.
Next, we use the linearization procedure for BF described above. After deleting the coefficient B(0), we find that the maximum coefficient of the autocorrelation function of the BF is 18, at the address $\tau_0 = 10$, which corresponds to the binary representation 1010. Thus, $L_0 = \{0, 10\}$. Then, striking out of the vector the terms in $L_0$, we find that the next maximum coefficient is 16 and is located at address 5. Thus, $\tau_1 = 5 = 0101$ and $L_1 = \{0, 10, 5, 15\}$. Similarly, we find that $\tau_2 = 7$ or 13. We choose a value arbitrarily; let it be 13 (1101), so that $L_2 = \{0, 10, 5, 15, 13, 7, 8, 2\}$. Note that $L_2$ remains the same regardless of the choice, because it contains both 13 and 7 together with their linear combinations with the elements of $L_1$. Similarly, we have $\tau_3 = 1 = 0001$. The matrix T is then formed from $\tau_0, \ldots, \tau_3$, and $\sigma = T^{-1}$.
Thus, during the decomposition of the BF, the block σ is implemented first; it translates the original set of input variables x into a different set z. As an example, consider z = (0010) = 2.
The main drawback of the above method of decomposition of a BF is that calculating the autocorrelation function by the Wiener-Khinchin theorem requires the truth table of the function and a twofold application of the transform, which takes $m \cdot 2^{m+1}$ elementary operations. Moreover, after the matrix σ has been constructed, the original truth table of f must be converted into the truth table of the function $f_\sigma$. As a result, the computational complexity of the decomposition problem increases exponentially with the number of variables, and the requirements for memory and computer speed become unacceptably large. This paper proposes a procedure for calculating the maximum coefficient of the autocorrelation function of a BF without generating a truth table; a heuristic procedure limits the search for the maximum autocorrelation coefficients to a small subset of the function, on the basis of the Varma-Trachtenberg method [11].
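The selection of $\tau_0, \ldots, \tau_3$ in the worked example can be reproduced with a short script. This is a sketch under two assumptions that are not stated explicitly in the surviving text: the function is the sum of the two 2-bit numbers $(x_3 x_2)$ and $(x_1 x_0)$, and the autocorrelation coefficients are summed over the three output bits. Under these assumptions the script yields the coefficients 18 at τ = 10 and 16 at τ = 5 quoted above. Ties are broken towards the smaller τ, so it picks 7 where the text arbitrarily picks 13.

```python
def total_autocorrelation(F):
    """B(tau) summed over all output bits: sum_x popcount(F[x] AND F[x XOR tau])."""
    n = len(F)
    return [sum(bin(F[x] & F[x ^ tau]).count("1") for x in range(n))
            for tau in range(n)]


def greedy_basis(B, m):
    """Pick m linearly independent tau values with the largest coefficients.

    After each pick, every linear combination (XOR) of the picks so far is
    struck out of the search, as in the L_0, L_1, L_2 sets of the example.
    """
    span = {0}
    basis = []
    for _ in range(m):
        candidates = [t for t in range(len(B)) if t not in span]
        best = max(candidates, key=lambda t: B[t])   # first maximum on ties
        basis.append(best)
        span |= {s ^ best for s in span}
    return basis


# Assumed example: F[x] = (x3 x2) + (x1 x0) as a 3-bit decimal value.
F = [(x >> 2) + (x & 3) for x in range(16)]
B = total_autocorrelation(F)
print(B)
print(greedy_basis(B, 4))
```

The script works on the full truth table, so it illustrates the selection rule only, not the truth-table-free procedure proposed in the paper.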

Disjoint Cubes Performance Analysis
The intersection of two cubes $C_i$ and $C_j$ is the cube $C_l$, whose coordinates are defined as follows [9]: $C_l$ is empty (there is no intersection) if at least one $x_i^l = \emptyset$, or if $z_j^l = 0$ for all j (0 ≤ j ≤ k − 1). A set of cubes representing the function f is called a covering C(f), and the number of elements of the covering C is its size.
The pair-wise intersection (PWI) of a set of cubes C is the set of non-empty cubes that are pair-wise intersections of all the cubes $C_i$ and $C_j$ of the given set, i ≠ j. In this case, PWI(C) covers all values of the original function that are included in more than one cube of the function.
The weight $w(C_i)$ of the cube $C_i$ is the number of values of the original function that it covers; a weight is defined likewise for a set of cubes. Define the redundant covering C of the logical function f, which can be obtained from the minimal cover as follows: $C = \bigcup_{i=0}^{N} C_i$, where $C_0 = C(f)$ by definition. Define a sign for each cube in C. Note that the properties of cubes described above make it possible to perform arithmetic operations on the logical cover, where an elementary object may include more than one value of the original function. At the same time, the operations of calculating the spectrum and the autocorrelation function are arithmetic operations on sets of values given in elementary form. This fact gives a definite advantage in computational complexity when operations on cubes are used to decompose a BF. For example, to calculate the number of unit values of a BF one simply adds the weights of all cubes in C, since the weight of a cube is the number of elementary sets it contains. The sign of a cube shows whether its weight is added to or subtracted from the weights of the other cubes; as a result, duplicate sets are removed, and the situation becomes as if each elementary set were presented once.
The coefficient B(τ) of the autocorrelation function of a BF can be calculated for any τ ($0 \le \tau \le 2^m - 1$) by adding (with sign) the weights of all cubes in $C(f(x)) \cap C(f(x \oplus \tau))$. The proof of this fact is considered in detail in [13]. It follows from this analysis that the autocorrelation function of a BF can be calculated without a truth table. To illustrate the above assertion, consider the following example.
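The cube-based calculation can be sketched for the simplest case. The code below assumes a single-output function given by a cover of pairwise disjoint cubes, so that every intersection weight enters with a plus sign and no sign bookkeeping is needed; cubes are written as strings over '0', '1' and '-' with the leftmost character standing for $x_{m-1}$. The representation and the function names are illustrative assumptions, not the paper's notation.

```python
def shift_cube(cube, tau):
    """Cube of f(x XOR tau): flip every fixed literal whose tau bit is 1."""
    m = len(cube)
    shifted = []
    for pos, ch in enumerate(cube):
        tau_bit = (tau >> (m - 1 - pos)) & 1    # cube[0] corresponds to x_{m-1}
        if ch != "-" and tau_bit:
            shifted.append("1" if ch == "0" else "0")
        else:
            shifted.append(ch)
    return "".join(shifted)


def intersection_weight(a, b):
    """Number of minterms in the intersection of cubes a and b (0 if empty)."""
    free = 0
    for p, q in zip(a, b):
        if p == "-" and q == "-":
            free += 1
        elif p != "-" and q != "-" and p != q:
            return 0                             # conflicting literals
    return 2 ** free


def autocorrelation_from_cubes(cover, tau):
    """B(tau) for a disjoint cover, without expanding the truth table."""
    shifted = [shift_cube(c, tau) for c in cover]
    return sum(intersection_weight(a, b) for a in cover for b in shifted)


# Hypothetical disjoint cover of the 3-input majority function.
cover = ["11-", "101", "011"]
print([autocorrelation_from_cubes(cover, tau) for tau in range(8)])
```

The cost is quadratic in the number of cubes and linear in m per pair, independent of $2^m$; for τ = 0 the sum reduces to the total weight of the cover, i.e. the number of minterms.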
Let the BF f be presented by a cover C with the weights $w(C_i)$ of all its cubes. Then B(τ) with τ = 0 equals the total number of minterms. Finding the maximum coefficients of the autocorrelation function by exhaustive search requires going through all values of τ. However, there is a way to limit the enumeration to only those τ for which the autocorrelation coefficients are maximal with the highest probability. That is, one should examine only those τ whose ones are in the positions corresponding to the unspecified positions in the cubes of the original function with maximum weights, since in the intersection of f(x) and f(x ⊕ τ) the vector τ should make minimal changes to the cubes of largest weight in the original cover.
This simple heuristic can be used to limit the number of coordinates searched to find the maximum values of the autocorrelation function (Nagayama et al., 2005) to a relatively small part of the original truth table. In practice, there are BF for which the size of the cover C(f) grows exponentially. In these cases, the procedure does not apply.
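The candidate-limiting rule can be sketched as follows. The sketch assumes cubes written as strings over '0', '1' and '-' (leftmost character is $x_{m-1}$) and introduces a hypothetical parameter `top` for the number of heaviest cubes inspected; neither the parameter name nor the example cover comes from the paper.

```python
def cube_weight(cube):
    """Number of minterms covered by the cube: 2 ** (number of '-' positions)."""
    return 2 ** cube.count("-")


def dont_care_mask(cube):
    """Bit mask with ones at the unspecified positions of the cube."""
    m = len(cube)
    return sum(1 << (m - 1 - pos) for pos, ch in enumerate(cube) if ch == "-")


def nonzero_submasks(mask):
    """All nonzero tau whose ones lie inside the given mask."""
    sub = mask
    while sub:
        yield sub
        sub = (sub - 1) & mask


def candidate_taus(cover, top=1):
    """Tau values worth examining: ones only in the unspecified positions
    of the `top` heaviest cubes of the cover."""
    heaviest = sorted(cover, key=cube_weight, reverse=True)[:top]
    candidates = set()
    for cube in heaviest:
        candidates.update(nonzero_submasks(dont_care_mask(cube)))
    return sorted(candidates)


# Hypothetical cover: the heaviest cube "1--0" leaves x2 and x1 unspecified.
print(candidate_taus(["1--0", "0110", "0011"], top=1))
```

Only the listed τ are then evaluated, so the search shrinks from $2^m - 1$ coordinates to at most $2^d - 1$ per inspected cube, where d is its number of unspecified positions. A cube with few specified literals still produces many candidates, which is one way the procedure can stop paying off.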

Numerical Results
In this section we provide simulation results on benchmarks with wide AND/OR architectures. The complexity of the logic blocks, which are general PLAs with many inputs (from about 20 to 100) connected together by some kind of bus structure, is high. The performance of the suggested logic synthesis is examined in terms of the cost function and the execution time. The S420 benchmark represents a finite state machine (FSM) that has 19 input variables, 16 state variables and two output bits.
The FSM is defined by a set of 18 Boolean functions d(i) of 35 variables.In the table: N is the number of disjoint cubes in the representation of f(i) and L orig and L lin stand for the number of literals in SOP representation of the original function and the linearized function as computed by ESPRESSO [14].
Figure 6 shows the average execution time of the linearization procedure of [8] and of the proposed method with w = 3 as a function of the number of inputs. The execution time was measured on an Intel Core i3, 2.5 GHz, with 2 GB RAM. For the statistics we used random PLAs with four outputs and 50 products. The variance of the measurements was less than 3 %. It is clear from Fig. 6 that linearization over disjoint cubes is more efficient in terms of execution time than linearization based on the Wiener-Khinchin theorem (W-K).
Table 4 compares the average execution time of the linearization procedure of [9] and the suggested method (SM) (both with w = 3) for randomly generated PLAs having 10 to 40 inputs, four outputs and 50 products.
Tab. 4: Average execution time in seconds for 4-output, 50-product PLAs.
An example of a short realization of the linear part for the simulation results, obtained with the selection method based on the algorithm of Trachtenberg and Varma.

Conclusion and Future Extension
This paper proposed a heuristic algorithm for the first stage of the decomposition of a BF, which uses processing of the Walsh spectrum of the original BF, the linearization technique of Karpovsky, and general properties of autocorrelation functions. The algorithm finds the maximum autocorrelation coefficient of the BF and determines its address, i.e. its serial number. This number is then written in binary, and the positions of the ones in the resulting binary number determine the variables involved in its formation. Further search is carried out only over the variables from which the maximum coefficient was formed, which greatly reduces the volume of the whole procedure. Then, once the required number of input variables has been obtained, the coefficients are deleted from the table for further search; the algorithm terminates the current step and starts a new one.

Tab. 7: The transformed function.
In future work, the proposed technique will be verified on standard benchmark functions and on randomly generated Boolean functions with different numbers of variables and products. These experiments are expected to demonstrate its efficiency more clearly.

Fig. 6: The average execution time in seconds of Wiener-Khinchin theorem (labeled as W-K) [8], and the proposed spectral algorithm (labeled as SM) as a function of the number of inputs of randomly generated PLAs (4 outputs and 50 products).

Table 3 refers to the benchmark function S420.