Design of Low-Power Structural FIR Filter Using Data-Driven Clock Gating and Multibit Flip-Flops

Optimization for power is one of the most important design objectives in modern digital signal processing (DSP) applications. The digital finite impulse response (FIR) filter is considered one of the most essential components of DSP, and consequently extensive work has been carried out by researchers on the power optimization of these filters. Data-driven clock gating (DDCG) and multibit flip-flops (MBFFs) are two low-power design methods that are widely used but often treated separately. Combining these methods into a single algorithm enables further power saving in the FIR filter. The experimental results show that the proposed FIR filter achieves power consumption reductions of 25% and 22% compared with the conventional design.


Introduction
Power consumption is an important issue when designing electronic devices such as mobile phones. Power in digital electronic circuits can be divided into static, dynamic, leakage, and short-circuit power. The main advantage of CMOS VLSI circuits is their low static power, so dynamic power is the major contributor of them all. A principal source of dynamic power consumption is the clock signal, which has the highest switching rate. On the other hand, the finite impulse response (FIR) filter is widely used as a critical component in digital signal processing (DSP) hardware circuits because of its guaranteed linear phase and stability. These circuits perform key operations in various recent mobile computing and portable multimedia applications such as high-efficiency video coding (HEVC), channel equalization, speech processing, and software-defined radio (SDR).
This fact has pushed designers to search for new methods that ensure low power consumption for the FIR filter. In several applications, such as the SDR channelizer, there is a need to implement FIR filters in reconfigurable hardware [1,2]. In [3,4], the authors minimized the power consumption of the FIR filter by reducing the filter coefficients without modifying its order. In [5], an approximate signal processing technique is used. In several approaches, the structure of the filter is simplified by add and shift operations. For low-power architectures, many techniques are used [6]. An integer linear programming (ILP) approach to designing optimal finite-wordlength linear-phase FIR filters in the logarithmic number system (LNS) domain has been proposed in [7], and different input wordlengths and filter taps are adopted in [8]. In [9], a reduced dynamic signal representation technique is used. In [10], a reversible technique has been used. A memristor-based FIR filter has been proposed in [11]. In [12], a multibit flip-flop (MBFF) technique has been introduced for FIR power optimization. In [13], the data-driven clock gating (DDCG) technique has been used for digital filter power optimization.
Several works have been proposed in the last decade using the clock gating technique for digital filters. In [14], data-driven clock gating for digital filters has been implemented.
In this study, we propose combining the MBFF and DDCG techniques in a single algorithm applied to an appropriate structure of the FIR filter for power saving. The remainder of this paper is organized as follows. Section 2 presents the background of the existing FIR filter. Section 3 discusses power optimization by the clock gating technique. Section 4 analyzes power optimization by the MBFF technique. Section 5 introduces clock gating into the MBFF, and Section 6 describes the implementation methodology. Section 7 presents the results and discussion of the proposed FIR filter. Finally, conclusions are drawn in Section 8.

Background and Existing FIR Filters
An N-th order FIR filter performs an N-point linear convolution of the input sequence with the filter coefficients for each new input sample. The output of the linear time-invariant (LTI) FIR filter can be expressed as

$$y(n) = \sum_{k=0}^{N-1} h_k\, x(n-k) \qquad (1)$$

where N represents the length of the filter, $h_k$ is the kth coefficient, and x(n − k) is the input data at time instant (n − k). The z-transform of the output is

$$Y(z) = H(z)\, X(z) \qquad (2)$$

where H(z) is the transfer function of the filter, given by

$$H(z) = \sum_{k=0}^{N-1} h_k\, z^{-k} \qquad (3)$$

However, an FPGA implementation comes at the cost of speed, power, and overhead compared with ASICs. The improvement in filter performance achievable by algorithm reformulation is limited by the generalized reconfigurable nature of the device. For this reason, several architectures have been proposed in recent years.
A filter can be implemented in the direct form (DF) or the transposed direct form (TDF). Figures 1 and 2 present, respectively, the structures of the DF and the TDF of the FIR filter. The transposed form and the direct form of an FIR filter are equivalent. It is easy to prove that, in the direct form, as shown in Figure 1, the wordlength of each delay element is equal to the wordlength of the input signal. In the transposed form, however, each delay element has a longer wordlength than in the direct form, because the delay elements are used to delay a product or a sum of products. The transposed structure reduces the critical path delay, but it uses more hardware. The critical path contains 1 multiplier + (M − 1) adders in the DF but only 1 multiplier + 1 adder in the TDF. The performance improvement is more noticeable for large M.
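The equivalence of the two forms can be checked numerically. The following is a minimal behavioral sketch in Python, not a hardware description; the function names `fir_direct` and `fir_transposed` are illustrative and do not come from the paper. In the direct form the registers hold past input samples, whereas in the transposed form they hold partial sums of products, which is why their wordlength is longer.

```python
def fir_direct(h, x):
    """Direct form: the delay line stores past input samples."""
    delay = [0] * len(h)
    out = []
    for sample in x:
        delay = [sample] + delay[:-1]          # shift the new sample in
        out.append(sum(c * d for c, d in zip(h, delay)))
    return out


def fir_transposed(h, x):
    """Transposed direct form: the registers store partial sums of products."""
    m = len(h)
    regs = [0] * (m - 1)                       # one register between adjacent adders
    out = []
    for sample in x:
        products = [c * sample for c in h]     # all taps multiply the same input
        y = products[0] + (regs[0] if regs else 0)
        # Each register takes its tap product plus the next register's old value.
        regs = [
            products[i + 1] + (regs[i + 1] if i + 1 < m - 1 else 0)
            for i in range(m - 1)
        ]
        out.append(y)
    return out


if __name__ == "__main__":
    h = [1, 2, 3]
    x = [1, 0, 0, 2]
    print(fir_direct(h, x))       # same sequence as the transposed form
    print(fir_transposed(h, x))
```

In the transposed sketch the output depends on one product and one addition per sample, which mirrors the shorter critical path described above.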
In VLSI implementation, the TDF is preferred over the direct form because of its inherently pipelined accumulation section. A TDF consists of two modules: multiple constant multiplication (MCM) and product accumulation, which together produce the filter output. In the last decade, considerable effort has gone into reducing the complexity of the MCM module.
However, the product accumulation module is often ignored. In [14], Xin et al. proposed a novel structure for area-efficient implementation of FIR filters by replacing some of the long-wordlength structural adders (SAs) with shorter-wordlength SAs. Figure 2 shows the proposed transposed direct form FIR filter architecture.
The number of adders can be estimated using (4) [15]. Figure 3 shows the proportion of the MCM block and the accumulation block in terms of full adders (FAs). The MCM block consumes about 10% of the total FAs, while the accumulation block consumes about 90%.

In [16], Proakis et al. prove that if the phase of the filter is linear, the coefficients are symmetric or antisymmetric:

$$h_k = \pm h_{M-1-k}, \qquad k = 0, 1, \ldots, M-1 \qquad (5)$$

If M is odd, the structure in Figure 1 can be improved to save hardware cost, as shown in Figure 3. The structure in Figure 3 uses only (M + 1)/2 multipliers, a reduction of almost 50% for large M, and uses the same number of adders as the structure in Figure 2. Since multipliers consume the most area in the filter, optimization based on the symmetric structure can reduce power dissipation. Canonical signed digit (CSD) arithmetic can be used to reduce the area and power of the M-tap filter. The transposed structure and the symmetric structure can be used together to improve both delay and cost. Figure 4 shows the transposed symmetric structure of the FIR filter with M odd.

Therefore, in this paper, we focus on the optimization of the accumulation block. If we can reduce the power of the accumulation block, we can reduce the power consumption of the filter.
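The multiplier saving obtained from coefficient symmetry can be illustrated with a short behavioral sketch. It assumes an odd number of taps M and symmetric (not antisymmetric) coefficients; the function names are hypothetical. Samples that share a coefficient are added first, so each output needs only (M + 1)/2 multiplications.

```python
def fir_symmetric(h, x):
    """Folded direct-form FIR for odd-length symmetric coefficients."""
    m = len(h)
    if m % 2 != 1:
        raise ValueError("this sketch assumes an odd number of taps")
    if any(h[k] != h[m - 1 - k] for k in range(m)):
        raise ValueError("coefficients must be symmetric")
    half = m // 2
    delay = [0] * m
    out = []
    for sample in x:
        delay = [sample] + delay[:-1]
        acc = h[half] * delay[half]                 # centre tap: 1 multiplication
        for k in range(half):                       # (M - 1)/2 folded taps
            acc += h[k] * (delay[k] + delay[m - 1 - k])
        out.append(acc)
    return out


def multipliers_per_output(m):
    """Multiplications per output sample in the folded structure (M odd)."""
    return (m + 1) // 2


if __name__ == "__main__":
    h = [1, 2, 3, 2, 1]
    impulse = [1, 0, 0, 0, 0, 0]
    print(fir_symmetric(h, impulse))        # impulse response reproduces h
    print(multipliers_per_output(len(h)))   # 3 instead of 5
```

The pre-addition changes which values are multiplied but not the number of additions, which agrees with the statement that the adder count is unchanged.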
In [8], Xin et al. gave an alternative expression for (5). Figure 5 shows the structure of the filter with an odd order N. A similar structure to that in Figure 5, without the tap $h_{2[(N-1)/4]}$, can be obtained for an even order N. The structure in Figure 5 achieves average power reductions of 41% and 23.5% over [16] and [17], respectively. In the rest of this paper, we use the filter structure in Figure 5.

Power Optimization by Clock Gating Technique
Clock gating is a technique that reduces the switching power dissipation of the clock signals.
Observing the present and next states of a D flip-flop shows that when two consecutive inputs are identical, the flip-flop output does not change. Even if the inputs do not change from one clock cycle to the next, the latch still consumes clock power.
At each clock edge, the main aim of the clock controller block is as follows:
The clock gating methods can be classified into three groups:
The synthesis-based methods: the clock enable signals are synthesized based on the logic of the underlying system.
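The observation that a flip-flop needs no clock pulse when its input equals its current output is the basis of data-driven gating. The sketch below is a behavioral model under stated assumptions, not the circuit of this paper: each flip-flop's input D is compared with its output Q by an XOR, the XOR outputs of a group are ORed into one enable signal, and the clock pulse is delivered only when the enable is high. The function name `simulate_ddcg_group` is hypothetical.

```python
def simulate_ddcg_group(d_vectors, initial_state):
    """Simulate one gated group of flip-flops over a sequence of clock cycles.

    d_vectors     : list of tuples, the D inputs of the group at each cycle
    initial_state : tuple, the Q outputs before the first cycle
    Returns (final_state, delivered_pulses, total_cycles).
    """
    state = tuple(initial_state)
    delivered = 0
    for d in d_vectors:
        # XOR per flip-flop, ORed together into a single group enable.
        enable = any(di ^ qi for di, qi in zip(d, state))
        if enable:
            state = tuple(d)      # the gated clock pulse reaches the group
            delivered += 1
        # otherwise the pulse is suppressed and the state is retained
    return state, delivered, len(d_vectors)


if __name__ == "__main__":
    inputs = [(0, 0), (1, 0), (1, 0), (1, 1), (1, 1)]
    state, pulses, cycles = simulate_ddcg_group(inputs, (0, 0))
    print(state, pulses, cycles)
```

In this example only two of the five clock cycles reach the group, while the stored values are the same as they would be without gating.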

Power Optimization by MBFF Technique
Flip-flops (FFs) are commonly used in digital systems to store data. To reduce the clock power, several FFs can be grouped into a single module called a multibit FF (MBFF), which replaces the individual flip-flops [18]. Each flip-flop contains two inverters that generate opposite-phase clock signals. The use of MBFFs was proposed for optimizing clock delay, controlling clock skew, and improving routing resource utilization. The MBFF can be used at the RTL design level to reduce the clock-to-Q propagation delay ($t_{pCQ}$). The driving capability of a clock buffer can be evaluated by the number of minimum-sized inverters that it can drive for a given rise or fall time. Because a single buffer can drive more than one load, several flip-flops can share a common clock buffer to avoid unnecessary power waste. Figure 6 shows the block diagrams of 1-bit and 2-bit flip-flops. If we replace the two 1-bit flip-flops shown in Figure 6(a) by a 2-bit flip-flop as shown in Figure 6(b), the total power consumption can be reduced because the two 1-bit flip-flops share the same clock buffer. The best grouping of FFs that minimizes the energy consumption has been explained in [19].

Introducing Clock Gating into MBFF
This approach consists of maximizing clock deactivation at the gate level, where the clock signal driving a flip-flop is deactivated (gated) when the flip-flop state is not going to change in the next clock cycle. Since the clock can be disabled only when the inputs to all the flip-flops in a group do not change, it is beneficial to group flip-flops whose switching activities are highly correlated and to derive a joint enable signal for them.
Let p be the data toggling probability. We denote by $\lambda_1$ and $\lambda_k$ the energy of the FF's and the MBFF's internal clock drivers, respectively, and by $\mu_1$ and $\mu_k$ the data toggling energy of the FF and the per-bit data toggling energy of the MBFF, respectively. The energy $E_1$ consumed by a 1-bit FF is expressed in terms of p, $\lambda_1$, and $\mu_1$, and the expected energy of a k-MBFF is derived in [19] as relation (8). Relation (8) represents the worst case; in practice, the correlation between FF toggling leads to higher energy savings. For p = 0, the energy savings are approximately 35% for the 2-MBFF and 55% for the 4-MBFF. Figure 7 illustrates a filter architecture with a DDCG integrated into a k-MBFF. The group size k that maximizes the energy savings is obtained from an equation involving $C_{FF}$, the clock input load of the FF, and $C_{latch}$, the clock input load of the latch.
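The trade-off between group size and gating opportunity can be explored with a simple numerical model. The sketch below is an illustration under explicit assumptions and should not be read as the expressions of [19]. It assumes that an ungated 1-bit FF spends its clock-driver energy every cycle plus its data energy when it toggles, that a gated k-bit group spends its clock-driver energy only in cycles where at least one bit toggles, that bits toggle independently with probability p, and that the overhead of the gating logic is ignored. All function names and parameter values are hypothetical.

```python
def group_pulse_probability(p, k):
    """Probability that at least one of k independent bits toggles in a cycle."""
    return 1.0 - (1.0 - p) ** k


def energy_ungated_ffs(p, k, lam1, mu1):
    """Assumed per-cycle energy of k separate, ungated 1-bit FFs."""
    return k * (lam1 + p * mu1)


def energy_gated_mbff(p, k, lamk, muk):
    """Assumed expected per-cycle energy of a gated k-bit MBFF (independent bits)."""
    return lamk * group_pulse_probability(p, k) + k * p * muk


if __name__ == "__main__":
    # Hypothetical energy values in arbitrary units, chosen only for illustration.
    lam1, mu1 = 1.0, 1.0
    lamk, muk = 1.3, 1.0
    for p in (0.0, 0.1, 0.5):
        base = energy_ungated_ffs(p, 2, lam1, mu1)
        gated = energy_gated_mbff(p, 2, lamk, muk)
        print(p, base, gated)
```

Because the pulse probability 1 − (1 − p)^k grows with k, larger groups are gated less often under independent toggling, which is why grouping flip-flops with correlated activity is advantageous.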

Implementation Methodology
Several design methodologies have been proposed in the literature [20][21][22] that differ according to system requirements, abstraction level, model refinements, and other design issues. In this paper, the platform-based design (PBD) methodology has been used. This methodology allows designers to work at high levels of abstraction. PBD [23][24][25][26] is a design methodology that was proposed to decrease time to market and enhance product reuse. It follows neither the bottom-up nor the top-down view; it is defined as a "meeting in the middle" process where refinements of specifications meet abstractions of potential implementations. It also allows the system to be designed at high levels of abstraction without distinguishing between hardware and software tasks. After defining the system specifications and requirements, the designer defines the parts that will be implemented as hardware components, the parts that will be implemented as software running on the component, and the parts that will be realized with reconfigurable hardware.

Results and Analysis
We have used a speech signal for testing the proposed FIR filter. Several parameters are used in the following discussion. As a metric of power savings, we use the power consumption ratio $P_r$, defined as the ratio of the proposed filter power consumption to the conventional filter power:

$$P_r = \frac{P_{\text{proposed}}}{P_{\text{conventional}}}$$

$P_{sl}$ denotes the leakage power saving ratio and $P_{rl}$ the consumption ratio of the leakage power. Likewise, $P_{sd}$ denotes the dynamic power saving ratio and $P_{rd}$ the consumption ratio of the dynamic power, and $P_{st}$ denotes the total power saving ratio and $P_{rt}$ the consumption ratio of the total power.

We compare the proposed structural FIR filter using data-driven clock gating and multibit flip-flops with the conventional FIR filter. Table 1 presents the leakage power comparison, in milliwatts, for random and speech input signals. The proposed structural reconfigurable FIR filter consumes less power than the conventional FIR filter design. The specifications of the implemented FIR filters are as follows: 16-bit input sequence data and coefficients; a data range of [−1, 1]; 16-bit multipliers; and a 24-bit digital output signal. Speech corpus databases in the wav format from ITU-T P.50 [29] and NOIZEUS [30] have been used. Table 2 shows the dynamic power comparison, in milliwatts, for random and speech input signals. The proposed FIR filter consumes less power than the classic FIR filter architectures. Table 3 shows the total power comparison, in milliwatts, for random and speech input signals. The total power of the proposed FIR filter is lower than that of the conventional design.
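The power metrics can be computed with a few lines of code. The sketch below takes the consumption ratio as the proposed power divided by the conventional power, as defined in the text, and assumes that each saving ratio is one minus the corresponding consumption ratio; that relation is an assumption made here for illustration. The function names and the power values in the example are hypothetical and are not taken from Tables 1 to 3.

```python
def consumption_ratio(p_proposed, p_conventional):
    """Ratio of the proposed filter power to the conventional filter power."""
    if p_conventional <= 0:
        raise ValueError("conventional power must be positive")
    return p_proposed / p_conventional


def saving_ratio(p_proposed, p_conventional):
    """Assumed saving ratio: one minus the consumption ratio."""
    return 1.0 - consumption_ratio(p_proposed, p_conventional)


if __name__ == "__main__":
    # Hypothetical dynamic power values in milliwatts.
    conventional_mw = 10.0
    proposed_mw = 7.5
    print(consumption_ratio(proposed_mw, conventional_mw))   # 0.75
    print(saving_ratio(proposed_mw, conventional_mw))        # 0.25
```

The same two functions apply unchanged to leakage, dynamic, and total power, since the three pairs of ratios differ only in which power component is measured.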
As a measure of filter performance degradation, we use the mean square error (MSE) between the proposed reconfigurable filter output and the original filter output. Table 4 presents the comparison in terms of MSE for the random and speech signals. The MSE is reduced compared with the conventional works:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(S - S_i\right)^2$$

where n, S, and $S_i$ are the number of samples, the expected output, and the proposed output, respectively. The signal power to mean square error ratio (SMR) [31] is defined as the ratio of the desired signal power to the distorted error signal power. Table 5 shows the comparison in terms of SMR for both existing and proposed FIR filter designs.
The proposed reconfigurable FIR filter shows a decrease in SMR as the number of taps increases, when compared with the existing designs.
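Both quality metrics are straightforward to evaluate from two output sequences. The following is a minimal sketch: the MSE is the mean of the squared differences between the expected and the proposed outputs, and the SMR is taken as the mean signal power divided by the MSE. Reporting the SMR in decibels as 10·log10 of that ratio is an assumption made here, since the text does not state the unit. The function names and sample values are hypothetical.

```python
import math


def mean_square_error(expected, proposed):
    """Mean of squared differences between expected and proposed outputs."""
    if len(expected) != len(proposed) or not expected:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((s - si) ** 2 for s, si in zip(expected, proposed)) / len(expected)


def smr(expected, proposed):
    """Signal power to mean square error ratio (linear scale)."""
    signal_power = sum(s ** 2 for s in expected) / len(expected)
    mse = mean_square_error(expected, proposed)
    return math.inf if mse == 0 else signal_power / mse


def smr_db(expected, proposed):
    """SMR in decibels (assumed convention: 10 * log10 of the power ratio)."""
    ratio = smr(expected, proposed)
    return math.inf if ratio == math.inf else 10.0 * math.log10(ratio)


if __name__ == "__main__":
    expected = [2, 0, 2, 0]     # illustrative samples only
    proposed = [1, 0, 2, 1]
    print(mean_square_error(expected, proposed))
    print(smr(expected, proposed))
    print(smr_db(expected, proposed))
```

A lower MSE raises the SMR for the same signal power, so the two tables report complementary views of the same output error.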
Examples of simulated gain versus frequency for a low-pass FIR filter with cutoff frequencies of 100 Hz and 5 kHz are shown in Figures 8(a) and 8(b). The proposed architecture is also implemented on an FPGA.

Reference [10]   0.0077   0.00191   0.0062   0.00191
Reference [27]   0.0089   0.00194   0.0065   0.00194
Reference [28]   0.0091   0.0096    0.0068   0.0097

Table 6 shows the resources occupied by the implementation of the FIR filter with and without the proposed method. The results show that the use of this method does not degrade other design parameters such as the hardware resources occupied (Table 6). The proposed FIR filter, designed with multibit flip-flops and clock gating, has advantages in terms of area, delay, and power consumption. Therefore, the circuit performance is high compared with the FIR filter designed with single-bit flip-flops. In this architecture, we have used the carry look-ahead adder and the array multiplier to implement the FIR filter. For example, for a 9-tap FIR filter, the output of the conventional design is obtained at the 8th clock pulse, whereas with the proposed technique the output is obtained at the 2nd clock pulse. Therefore, the delay is reduced and the speed of the circuit is increased, and the power consumption of the circuit is also reduced owing to the multibit flip-flops. The proposed method has also been synthesized using TSMC 0.25 μm CMOS technology. Power consumption is measured in SPICE-level simulations using NanoSim [33] at an operating frequency of 100 MHz. In Table 7, the proposed architecture is compared with previous works [5,32] in terms of power saving and MSE. The proposed filter shows greater power savings than the filters in [5,32].

Conclusion
In this paper, a novel architecture of the FIR filter is proposed. The proposed design is based on the combination of data-driven clock gating (DDCG) and multibit flip-flops (MBFFs) applied to an appropriate FIR filter structure. To compare the power consumption of the conventional and the proposed FIR filters, two types of inputs are used: a speech signal and a random signal. In the proposed structure of the filter, the leakage power, the dynamic power, and the total power are all reduced. The proposed method allows a power saving of up to 22%. The design results show that the loss in resources is less than 10%. This loss is considered negligible when compared with the gain in power consumption.

Table 5: SMR performance comparison in the speech signal case.

Filter type       SMR, 50 taps   SMR, 75 taps
Reference [32]    75.4741        86.2563
Reference [9]     75.4641        92.8273
Reference [27]    78.7937        92.1074
Reference [28]    80

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.