Implementation of Fixed-Point Lattice Wave Digital Filters for Increased Sampling Rate

Low complexity and high speed are the key requirements of digital filters. Such filters can be realized using allpass filters. In this paper, the design and minimum-multiplier implementation of a fixed-point lattice wave digital filter (WDF) based on the three-port parallel adaptor allpass structure is proposed. Here, the second-order allpass sections are implemented with three-port parallel adaptor allpass structures. A design-level area optimization is done by converting constant multipliers into shifts and adds using canonical signed digit (CSD) techniques. The proposed implementation reduces the latency of the critical loop by reducing the number of components (adders and multipliers) in it. Three design examples are included to analyze the effectiveness of the proposed approach. They are implemented in Verilog HDL and mapped to a standard cell library in a 0.18 μm CMOS process. The functionality of the implementations has been verified by applying a number of different input vectors. Results and simulations demonstrate that the proposed design method leads to an efficient lattice WDF in terms of maximum sampling frequency, at the cost of a small area overhead. The post-layout simulations have been done in HSPICE with CMOS transistors.


Introduction
Wave digital filters (WDFs) constitute a wide class of infinite impulse response (IIR) digital filters that transform an analog network into a topologically equivalent digital filter [1]. These filters find applications in a wide variety of areas such as communication, control, biomedical engineering and audio processing. A major advantage of WDFs over most other recursive filters is that they inherit fundamental properties such as low coefficient sensitivity and stability under finite-arithmetic conditions [2]. They are therefore very attractive for Very Large Scale Integration (VLSI) implementation. In these filters, silicon area, computational complexity, power consumption and maximum achievable sampling rate depend strongly on the coefficient word length [3]. The word length should therefore be as short as possible while still satisfying the given filter specifications [2]. Many researchers have investigated WDFs for applications that demand low power consumption and high speed; however, the toughest challenge is the implementation. VLSI implementations of WDFs using the symmetric two-port adaptor structure are presented in [3], [4]. In [5], a bit-level systolic array method is used to increase the sampling rate of a unit-element WDF and a lattice WDF designed to the same specifications, and the systolic hardware architectures of the two filters are compared in terms of the expected values of the integrated circuit parameters. Another method used to achieve a significant increase in the sampling rate of WDFs is most-significant-bit-first arithmetic [6]. However, all the filters mentioned above are based on conventional two-port adaptor structures, also known as Richards' allpass structures. M. S. Anderson et al. compared two-port and three-port series adaptor realizations of the second-order allpass section, but did not carry out a VLSI implementation [7]. In [8], we proposed a VLSI implementation of the lattice WDF using the three-port series adaptor allpass structure, which provides a higher maximum sampling frequency than WDFs based on Richards' allpass structure.
In this paper, we replace the conventional Richards' allpass structures with three-port parallel adaptor allpass structures using bit-parallel arithmetic to improve the maximum sampling rate. To increase the maximum sampling frequency, the latency can be reduced by using low-sensitivity filters, which result in short coefficients (low-latency multiplications), and by removing unnecessary operations in the critical loop via numerically equivalent transformations [9]. In this work, however, we are mainly concerned with minimizing the critical loop latency, which is done by reducing the number of logic components in the critical loop. Three-port parallel adaptor allpass structures can be realized with adders, delays and multipliers [9], [10]. The adaptor coefficients are quantized in fixed-point arithmetic. A general multiplier element is very costly in full-custom VLSI implementation. To solve this problem, the multiplication of a data sample by a filter coefficient value is carried out using a sequence of shifts and adds and/or subtracts. For low power dissipation, the challenge is to implement the multiplier with the minimum number of adders. For this purpose, it is attractive to use the canonical signed digit (CSD) representation, which has the minimum number of nonzero digits compared to other radix-2 representations. This reduces the number of adders/subtractors [11], so that CSD coefficients reduce the hardware cost and increase the speed. The multiple constant multiplication method is applied to implement the CSD coefficients, which further reduces the number of adders [9], [12].
To verify the results, VLSI implementations of lattice WDFs of different orders are illustrated in Sec. 6. These filters are coded in Verilog HDL and mapped to a standard cell library in a 0.18 µm CMOS process. For the same specifications, the filters are implemented using the conventional Richards' allpass structures as well as the three-port parallel adaptor allpass structures. This enables a proper comparison between the corresponding hardware realizations. The implemented filters are simulated and tested by applying different input vectors. The comparison shows that the latter design is more efficient than the conventional design in terms of maximum sampling rate, at the cost of a small area overhead.
The rest of the paper is organized as follows. Section 2 describes lattice wave digital filters. The realization of allpass structures is presented in Sec. 3. In Sec. 4, the fixed-point coefficient realization is explained. Section 5 explores the VLSI implementation. Three design examples of fixed-point lattice wave digital filters using conventional Richards' allpass and three-port parallel adaptor allpass structures are presented in Sec. 6, together with a comparative analysis of the two approaches. Section 7 concludes the paper.

Lattice Wave Digital Filters
Lattice wave digital filters form an explicit class of wave digital filters. It is well known that lattice WDF structures have many attractive properties, such as low coefficient sensitivity (and consequently low accuracy requirements for the register word length), high dynamic range, high overflow level, low round-off noise, stability, and good nonlinear properties under finite-arithmetic conditions where the effects of rounding, truncation and overflow are present [2], [10], [13]. Lattice WDF structures find applications in the realization of lowpass-highpass filters, bandpass-bandstop filters, Hilbert transformers and quadrature mirror filters (QMF) [14], [15]. The resulting structures require minimum hardware, are highly modular and have low sensitivity, making them suitable for signal processors and VLSI implementation. The lattice WDF consists of two parallel branches, each of which realizes an allpass filter. These allpass filters can be realized using first- and/or second-order wave digital allpass sections. These sections can be implemented using delay elements and symmetric two-port or three-port networks, known as adaptors in lattice WDF terminology [16]. The signal flow graph of an adaptor consists of multipliers and adders. The multiplier values are the $\gamma$ coefficients that characterize the lattice WDF. The signal flow graph of an $N$th-order lattice WDF is depicted in Fig. 1, where the block $z^{-1}$ represents the unit delay. For any order $N$ there are $(N+1)/2$ stages and a maximum of $N$ adaptors. The transfer function of a lattice WDF can be written as the sum of the transfer functions of the two allpass branches,
$$H(z) = \frac{1}{2}\left[H_0(z) + H_1(z)\right], \quad (1)$$
where $H_0(z)$ and $H_1(z)$ are the transfer functions of stable allpass filters of orders $M$ and $N$, respectively. In the case of low-pass filters, $M = N - 1$ or $M = N + 1$, so that the overall order $M + N$ of $H(z)$ is odd. These filters can be realized in many different ways [17].
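As a small illustration of this decomposition, the sketch below counts the allpass sections and stages of an odd-order lattice WDF. It assumes, as in the design examples of Sec. 6, one first-order section with all remaining sections of second order. The function name `section_counts` is hypothetical and is introduced here only for illustration.

```python
from typing import Tuple


def section_counts(order: int) -> Tuple[int, int, int]:
    """Return (first_order_sections, second_order_sections, stages).

    Assumes an odd overall order N, realized with one first-order
    section and (N - 1) / 2 second-order sections.
    """
    if order < 1 or order % 2 == 0:
        raise ValueError("this sketch assumes an odd filter order")
    first = 1
    second = (order - 1) // 2
    stages = (order + 1) // 2  # equals first + second
    return first, second, stages


for n in (5, 7, 9):
    print(n, section_counts(n))
```

For $N = 9$ this gives one first-order and four second-order sections, in agreement with the first design example.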
In this work, we only consider the cascade realization of first- and second-order allpass sections. A first-order allpass section can be realized using Richards' structure, where a symmetric two-port adaptor and a delay element are used [9]. A second-order allpass section can be realized as a cascade of two first-order Richards' allpass structures, or alternatively using a three-port parallel adaptor and two delay elements [1]. A detailed discussion of the first- and second-order allpass sections is given in Sec. 3. These allpass sections are recursive structures. Generally, recursive structures require fewer arithmetic operations per sample than their nonrecursive counterparts. One limitation of a recursive structure is the maximum sampling frequency $f_{\max}$ at which the filter can operate [1]. The maximum sampling frequency of a recursive algorithm described by a fully specified signal flow graph is [18]
$$f_{\max} = \frac{1}{T_{\min}} = \min_i \left\{ \frac{N_i}{T_{\mathrm{tot},i}} \right\}, \quad (2)$$
where $T_{\min}$ is the minimum sampling period, $T_{\mathrm{tot},i}$ is the total latency of the arithmetic operations in the directed loop $i$, and $N_i$ is the number of delay elements in that loop [18]. The loop(s) that determine the maximum sampling frequency are called the critical loop(s). Digital filters with a high maximum sampling frequency are suitable candidates for low-power and high-speed applications: if the required sampling rate is lower than the maximum sampling rate, the excess speed can be used to reduce the power consumption via supply voltage scaling techniques [17], [18]. The area can be minimized by clever hardware design [19]. From (2), we observe two factors that affect the maximum sampling rate: the number of delay elements in the critical loop and the total latency of the critical loop. The maximum sampling frequency can be increased by increasing the number of delay elements in the critical loop or by minimizing the critical loop latency. In this work, we are mainly concerned with minimizing the critical loop latency. It is minimized by reducing the number of logic components in the critical loop, and further by reducing the critical delay at the logic level.
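The loop bound can be sketched numerically as follows. This is a minimal illustration that assumes the bound takes the usual form $f_{\max} = \min_i (N_i / T_{\mathrm{tot},i})$ over all directed loops. The function name and the latency values are hypothetical and are not taken from the reported designs.

```python
from typing import Iterable, Tuple


def max_sampling_frequency(loops: Iterable[Tuple[float, int]]) -> Tuple[float, int]:
    """Return (f_max, index of the critical loop).

    Each loop is given as (T_tot in seconds, number of delay elements N_i).
    The achievable rate of loop i is N_i / T_tot_i; the slowest loop is critical.
    """
    rates = [n_delays / t_tot for t_tot, n_delays in loops]
    critical = min(range(len(rates)), key=rates.__getitem__)
    return rates[critical], critical


# Hypothetical loops: (total latency, delay elements). Illustrative values only.
loops = [
    (8.0e-9, 1),   # loop 0: 8 ns of arithmetic, one delay element
    (12.0e-9, 2),  # loop 1: 12 ns of arithmetic, two delay elements
]

f_max, critical_loop = max_sampling_frequency(loops)
print(f"f_max = {f_max / 1e6:.1f} MHz, critical loop = {critical_loop}")
```

In this example loop 1 has the larger total latency, yet loop 0 is critical because it contains only one delay element. This shows both factors at work: the latency of a loop and the number of delays in it.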

Realization of Allpass Structures
A lattice WDF is realized by two parallel allpass branches whose outputs are summed to produce the filter output. These allpass filters are realized as cascades of first- and second-order allpass sections, implemented using delay elements and either symmetric two-port or three-port parallel adaptor structures. A first-order two-port adaptor has a coefficient $\gamma$ that controls the response of the allpass section. This adaptor requires a single multiplication and three additions. Lattice WDFs use four types of symmetric two-port adaptors as building blocks, depending on the value of the $\gamma$ coefficient. The signal flow graphs of the four single-multiplier symmetric two-port configurations are shown in Fig. 2. The adaptor coefficients $\gamma$ can be guaranteed to fall into the interval $-1 < \gamma < 1$ [13]. Methods to calculate these coefficients from the design specifications are discussed in [13]. The appropriate adaptor structure is chosen depending on the value of the $\gamma$ coefficient, as given in Tab. 1.
In these adaptor structures, the coefficient value of the actual multiplier ($\alpha$) to be implemented is always non-negative and at most one half, that is, $0 \le \alpha \le 1/2$. To design an allpass section from these adaptor structures, one port is terminated with a delay element. Generally, the transfer function of a first-order allpass section is given by
$$H(z) = \frac{t_0 z + 1}{z + t_0} \quad (3)$$
and that of a second-order allpass section by
$$H(z) = \frac{t_2 z^2 + t_1 z + 1}{z^2 + t_1 z + t_2}, \quad (4)$$
where $t_0 = -\gamma_0$, $t_1 = (\gamma_1 - 1)\gamma_2$ and $t_2 = -\gamma_1$ [13]. The first- and second-order allpass sections realized using symmetric two-port adaptors are called Richards' allpass structures. Second-order allpass sections can also be realized using a three-port parallel adaptor and delays; this realization is called the three-port parallel adaptor allpass structure.
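The allpass property of the second-order section can be checked numerically. The sketch below assumes the standard form $H(z) = (t_2 z^2 + t_1 z + 1)/(z^2 + t_1 z + t_2)$ with $t_1 = (\gamma_1 - 1)\gamma_2$ and $t_2 = -\gamma_1$. The coefficient values are hypothetical and chosen only so that the section is stable; they do not come from the design tables.

```python
import cmath
import math


def allpass2(z: complex, gamma1: float, gamma2: float) -> complex:
    """Second-order allpass section evaluated at z (assumed standard form)."""
    t1 = (gamma1 - 1.0) * gamma2
    t2 = -gamma1
    return (t2 * z * z + t1 * z + 1.0) / (z * z + t1 * z + t2)


# Hypothetical adaptor coefficients inside (-1, 1); poles lie inside the unit circle.
GAMMA1, GAMMA2 = -0.6, 0.3

# Sample the unit circle from omega = 0 to pi and record the magnitude response.
magnitudes = [
    abs(allpass2(cmath.exp(1j * math.pi * k / 8), GAMMA1, GAMMA2))
    for k in range(9)
]
print([round(m, 12) for m in magnitudes])
```

The magnitude equals one at every sampled frequency, so only the phase of the section depends on $\gamma_1$ and $\gamma_2$.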

Richards' Structures
The first-order Richards' allpass structure is composed of a symmetric two-port adaptor and a delay element, as shown in Fig. 3. The signal flow graph of the conventional symmetric two-port adaptor forming the first-order Richards' allpass structure is described by the following equations where $a_1$ and $a_2$ are the inputs and $b_1$ and $b_2$ are the outputs. The critical loop is shown by thick lines in Fig. 3. Since this critical loop has one multiplier, two adders and one delay element, the total latency $T_{\mathrm{tot}}$ is equal to $T_{m,\alpha} + 2T_a$, where $T_{m,\alpha}$ is the multiplier delay and $T_a$ is the adder delay [9], [19], and $N_i = 1$. Using (2), the maximum sampling frequency $f_{\max}$ of this structure is given by
$$f_{\max} = \frac{1}{T_{m,\alpha} + 2T_a}. \quad (7)$$
Similarly, the second-order Richards' allpass structure is a cascade of two first-order allpass structures and is shown in Fig. 4. Since its critical loop has two multipliers, four adders and one delay element, $T_{\mathrm{tot}}$ is equal to $T_{m,\alpha_1} + T_{m,\alpha_2} + 4T_a$, where $T_{m,\alpha_1}$ and $T_{m,\alpha_2}$ are the delays of the two multipliers and $T_a$ is the adder delay [9], and $N_i = 1$. The $f_{\max}$ of this structure is given by
$$f_{\max} = \frac{1}{T_{m,\alpha_1} + T_{m,\alpha_2} + 4T_a}. \quad (8)$$

Three Port Parallel Adaptor Allpass Structure
A second-order three-port parallel adaptor allpass structure is shown in Fig. 5. The transfer function of this section is given by [18] where $\beta_1$ and $\beta_2$ are the adaptor coefficients. Comparing with (4), we get the relationship between the adaptor coefficients [18]. It is observed from Fig. 5 that either of the two loops can be the critical loop. Assuming the same number of fractional bits for $\beta_1$ and $\beta_2$, loop 1 has one multiplier, three adders and one delay element, while loop 2 has one multiplier, four adders and one delay element. Since loop 2 contains more components, it is considered the critical loop. The maximum sampling frequency of this structure is given by
$$f_{\max} = \frac{1}{T_m + 4T_a}, \quad (11)$$
where $T_m$ is the multiplier delay and $T_a$ is the adder delay. We observe that the critical loop of the second-order Richards' allpass structure contains two multipliers and four adders, as shown in Fig. 4, whereas that of the three-port parallel adaptor allpass structure contains only one multiplier and four adders. For the latter realization, the price to pay is a somewhat longer coefficient word length to meet the filter specifications; it has been found that the three-port adaptor coefficients typically require one extra bit to match the performance of the two-port realization [7].
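To make the comparison concrete, the following worked estimate assumes, purely for illustration, that both multipliers of the second-order Richards' section and the multiplier of the three-port section have the same latency $T_m$. This assumption is not made in the original analysis; in practice each multiplier latency depends on the CSD form and word length of its coefficient. With critical-loop latencies $2T_m + 4T_a$ and $T_m + 4T_a$ and one delay element in each loop, the ratio of the maximum sampling frequencies becomes:

```latex
\frac{f_{\max}^{\text{three-port}}}{f_{\max}^{\text{Richards}}}
  = \frac{2T_m + 4T_a}{T_m + 4T_a}
  = 1 + \frac{T_m}{T_m + 4T_a}
```

For instance, with $T_m = 2T_a$ the ratio is $8/6 \approx 1.33$. Under this simplified model the gain grows with the multiplier latency relative to the adder latency, but it always stays below a factor of two.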

Fixed-Point Coefficients
In this work, we concentrate on coefficient quantization in fixed-point arithmetic. The goal of fixed-point arithmetic is to maximize the filter performance and minimize finite-word-length effects [20][21][22][23][24]. It is desired that the coefficient values $\gamma_k$ for $k = 0, 1, 2, \ldots, (M + N - 1)$ are expressed as the following fixed-point binary numbers [10] where $x_r$ for $r = 0, 1, \ldots, B$ is either 0 or 1. Here $x_0$ is the sign bit, which is equal to one for negative numbers and zero for non-negative numbers. The goal is to express all the filter coefficient values in the above form with the minimum number of fractional bits $B$. For efficient multiplier implementation in full-custom VLSI, the multiplication of a data sample by a filter coefficient value is carried out using a sequence of shifts and adds and/or subtracts. In this case, it is desired to express the coefficient values in the form
$$\sum_{r=1}^{R} x_r 2^{-P_r}, \quad (13)$$
where each $x_r$ is 1 or $-1$ and the $P_r$ are nonnegative integers in increasing order. The goal is to represent all the coefficient values with the minimum number $R$ of power-of-two terms, while keeping the maximum number of shifts $P_R$ as small as possible [10]. For this purpose, it is attractive to use the canonic signed digit (CSD) representation. This representation has three digits, $-1$, 0 and $+1$, as opposed to the two's-complement representation, which has only two digits, 0 and $+1$ [10]. The number of adders and/or subtractors required to realize a CSD coefficient is one less than the number of nonzero digits in its CSD representation [25].
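The adder count of a CSD coefficient can be illustrated with a short sketch. The conversion below uses the standard non-adjacent-form recoding of an integer, which yields the CSD digits. The coefficient value, the word length and the function names are hypothetical and are not taken from the design tables.

```python
from typing import List


def to_csd(n: int) -> List[int]:
    """Return the CSD digits (-1, 0, +1) of integer n, least significant first."""
    digits = []
    while n != 0:
        if n & 1:
            # n % 4 == 1 gives digit +1, n % 4 == 3 gives digit -1,
            # which guarantees that the next digit is zero.
            digit = 2 - (n & 3)
            n -= digit
        else:
            digit = 0
        digits.append(digit)
        n >>= 1
    return digits


def quantize(coefficient: float, frac_bits: int) -> int:
    """Scale a coefficient to an integer with frac_bits fractional bits."""
    return round(coefficient * (1 << frac_bits))


FRAC_BITS = 8      # hypothetical number of fractional bits B
COEFF = 0.46875    # hypothetical coefficient value (15/32)

n = quantize(COEFF, FRAC_BITS)                 # 120 = 0b01111000
digits = to_csd(n)                             # 120 = 128 - 8
nonzero = sum(1 for d in digits if d != 0)
adders = max(nonzero - 1, 0)                   # adders/subtractors needed
binary_ones = bin(n).count("1")                # nonzero bits in plain binary

print(digits, nonzero, adders, binary_ones)
```

Here the plain binary form of the quantized coefficient has four nonzero bits, whereas the CSD form $2^{-1} - 2^{-5}$ has two nonzero digits and therefore needs a single subtractor.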

VLSI Implementation
In this section, the VLSI implementation of the lattice WDF is presented. These filter structures are realized using adders, multipliers and delay elements. The multiplication of a data sample by each filter coefficient value is performed using a sequence of shift and add and/or subtract operations, which is called a multiplierless implementation. Hence, the filters are implemented with only adders and/or subtractors and delay elements. For a minimum-adder implementation, the coefficients are realized in CSD representation. The steps followed for the implementation are given in Fig. 6 [26]. Mentor Graphics ASIC Design Kit (ADK) tools are used for the IC flow, synthesis to standard cells, and IC physical design and simulation.
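The multiplierless operation can be modelled in software as follows. This is a behavioural sketch on Python integers and is not the Verilog description used for the reported designs. The coefficient $15/32 = 2^{-1} - 2^{-5}$, the function name and the truncation of the final result are illustrative assumptions.

```python
from typing import List, Tuple


def csd_multiply(x: int, terms: List[Tuple[int, int]], frac_bits: int) -> int:
    """Multiply integer sample x by sum(sign * 2**-shift) using shifts and adds.

    terms     : list of (sign, shift) pairs, with sign in {+1, -1}
    frac_bits : number of fractional bits kept in the accumulator;
                every shift must not exceed it
    The result is truncated (floored) back to an integer.
    """
    acc = 0
    for sign, shift in terms:
        if shift > frac_bits:
            raise ValueError("shift exceeds the accumulator fractional bits")
        # x * 2**-shift in a fixed-point format with frac_bits fractional bits
        partial = x << (frac_bits - shift)
        acc = acc + partial if sign > 0 else acc - partial
    return acc >> frac_bits


# Hypothetical coefficient 15/32 = 2**-1 - 2**-5: one shift pair, one subtractor.
TERMS = [(+1, 1), (-1, 5)]
FRAC_BITS = 5

print(csd_multiply(64, TERMS, FRAC_BITS))    # 64 * 15/32 is exactly 30
print(csd_multiply(100, TERMS, FRAC_BITS))   # 46.875 truncated to 46
```

In hardware, the shifts reduce to wiring, so the cost of this multiplication is the single subtractor, in line with the adder count of the CSD representation.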

Design Examples
To show the design process, three examples of the lattice wave digital structure and their multiplierless implementations are presented. In these implementations, Richards' and three-port parallel adaptor allpass structures are used. The word length of the input samples is chosen as 8 bits. The coefficient word length is 9 bits for Richards' allpass sections and 10 bits for three-port parallel adaptor allpass sections. The performance of the two implementations is compared in terms of maximum sampling frequency and area.

We see from Fig. 1 that the 9th-order lattice WDF is composed of one first-order and four second-order allpass sections [9]. Its transfer function is given by For the multiplierless implementation of the lattice WDF, the $\gamma$ coefficients, adaptor types, $\alpha$ coefficients (for Richards' implementation), $\beta$ coefficients (for the three-port parallel adaptor implementation) and their CSD representations are given in Tab. 2. For Richards' implementation, the blocks of first- and second-order allpass sections are replaced with their equivalent signal flow graphs depicted in Fig. 3 and Fig. 4, respectively. The minimum sampling periods $T_{\min}$ of the individual allpass sections are as follows The minimum sampling period $T_{\min}$ of the overall filter is given by The maximum sampling frequency $f_{\max}$ is given by For the multiplierless implementation, $f_{\max}$ is given by the following equation To implement the low-pass lattice WDF using three-port parallel adaptors, the blocks of second-order allpass sections are replaced with the three-port parallel adaptor allpass structure shown in Fig. 5. The first-order section ($\alpha_0$), however, is still implemented with Richards' first-order allpass structure. The $f_{\max}$ of the overall filter is determined by one of the critical loops of these allpass sections. The $f_{\max}$ in terms of $T_m$ and $T_a$ for each of the second-order allpass sections is the same as given in (11). The multipliers are implemented with a network of shift and add and/or subtract operations using CSD coefficients. The $\beta$ coefficients and their CSD equivalents are given in Tab. 2.
For the multiplierless implementation, $f_{\max}$ is given by Comparing equations (18) and (19), we observe that $f_{\max}$ is improved by approximately 49 % by reducing the critical loop delay. The filters are implemented in a CMOS VLSI design and the results are summarized in Tab. 3. In the implemented design, $f_{\max}$ for the three-port adaptor allpass based lattice WDF is improved by 15 % compared to the Richards' allpass based filter. However, the area is increased by 24 %.

Example 2:
Consider an elliptic low-pass lattice WDF with the following specifications [13]: $F = 16$ kHz, $f_p = 3$ kHz, $f_s = 5$ kHz, $A_p = 1.0$ dB, $A_s = 40$ dB, filter type = Chebyshev, and $N = 5$. From Fig. 1, we observe that the 5th-order lattice WDF consists of one first-order and two second-order allpass sections. For the above specifications, the transfer function is given by For the multiplierless implementation of the lattice WDF, the $\gamma$ coefficients, adaptor types, $\alpha$ coefficients (for Richards' implementation), $\beta$ coefficients (for the three-port parallel adaptor implementation) and their CSD representations are given in Tab. 4. For the Richards' structure implementation, $f_{\max}$ is given by To implement the low-pass lattice WDF using three-port parallel adaptors, the $f_{\max}$ in terms of $T_m$ and $T_a$ for each of these allpass sections is the same as given in (11). The $\beta$ coefficients and their CSD equivalents are given in Tab. 4. For the multiplierless implementation, $f_{\max}$ is given as follows Comparing equations (21) and (22) shows that $f_{\max}$ is improved by approximately 28.5 % by reducing the critical loop delay. The filters are implemented in a CMOS VLSI design to verify the results, which are summarized in Tab. 5. In the implemented design, $f_{\max}$ for the three-port adaptor allpass based lattice WDF is improved by approximately 16.5 % compared to the Richards' allpass based filter. However, the area is increased by 18 %.

Example 3:
Consider an elliptic low-pass lattice WDF with the following specifications [13]: $F = 16$ kHz, $f_p = 3.4$ kHz, $f_s = 4.6$ kHz, $A_p = 0.2$ dB, $A_s = 65$ dB, filter type = Cauer, and filter order $N = 7$. From Fig. 1, we observe that the 7th-order lattice WDF requires one first-order and three second-order allpass sections. For the given filter specifications, the transfer function is obtained as The magnitude and phase responses of the designed filter are depicted in Fig. 7(a) and 7(b). When an input signal $x(t) = \sin(80\pi t) + \sin(12000\pi t)$ is applied to the filter, its output is $y(t) = \sin(80\pi t)$. Both $x(t)$ and $y(t)$ are shown in Fig. 7(c) and 7(d). The responses shown in Fig. 7 were generated using MATLAB.
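The choice of test signal can be checked against the specifications with a few lines of arithmetic. The sketch below converts the two angular frequencies of $x(t)$ to hertz and compares them with the band edges stated above. The variable and function names are hypothetical.

```python
import math

F_SAMPLE_HZ = 16000.0   # sampling frequency F
F_PASS_HZ = 3400.0      # passband edge f_p
F_STOP_HZ = 4600.0      # stopband edge f_s


def tone_frequency_hz(omega: float) -> float:
    """Frequency in Hz of sin(omega * t), with omega in rad/s."""
    return omega / (2.0 * math.pi)


f_low = tone_frequency_hz(80.0 * math.pi)       # sin(80*pi*t)    -> 40 Hz
f_high = tone_frequency_hz(12000.0 * math.pi)   # sin(12000*pi*t) -> 6000 Hz

in_passband = f_low <= F_PASS_HZ
in_stopband = F_STOP_HZ <= f_high < F_SAMPLE_HZ / 2.0

print(f_low, f_high, in_passband, in_stopband)
```

The 40 Hz component lies well inside the passband, while the 6 kHz component lies in the stopband and below the Nyquist frequency of 8 kHz. This is why only the low-frequency tone appears at the output.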
For the multiplierless implementation of the lattice WDF, the $\gamma$ coefficients, adaptor types, $\alpha$ coefficients (for Richards' implementation), $\beta$ coefficients (for the three-port parallel adaptor based implementation) and their CSD representations are given in Tab. 6. For Richards' implementation, $f_{\max}$ is given by To implement the low-pass lattice WDF using three-port parallel adaptors, the $f_{\max}$ in terms of $T_m$ and $T_a$ for each of these allpass sections is the same as given in (11). The $\beta$ coefficients and their CSD equivalents are given in Tab. 6. For the multiplierless implementation, the $f_{\max}$ of the overall filter is given by Comparing equations (24) and (25), $f_{\max}$ is improved by approximately 28.5 % by reducing the critical loop delay. The filters are implemented in a CMOS VLSI design and the results are summarized in Tab. 7. The CMOS layout of the WDF using the three-port adaptor allpass structure is depicted in Fig. 8. In the implemented design, $f_{\max}$ for the three-port adaptor based lattice WDF is improved by 13 % compared to the Richards' structure based filter. However, the area is increased by 20 %.

Conclusion
In this paper, a novel approach to designing a fixed-point lattice WDF for increased maximum sampling frequency is presented. The sampling frequency is increased by reducing the number of logic components in the critical loop, which reduces the critical loop delay. The second-order three-port parallel adaptor allpass section has fewer logic components in its critical loop than the second-order Richards' allpass section. For the given examples, the maximum sampling frequency is higher with three-port parallel adaptor allpass structures than with the conventional Richards' allpass structures. Three design examples of lattice WDFs of different orders are included. Lattice WDFs based on three-port parallel adaptor and Richards' allpass structures, meeting the same filter specifications, were designed and implemented using logic synthesis from a Verilog HDL description. The lattice WDF structures were evaluated with respect to throughput and arithmetic complexity. An efficient implementation of the lattice WDF is presented using a standard cell library in a 0.18 μm CMOS process.

Example 1:
The specifications of the Chebyshev low-pass lattice WDF are as follows [16, p. 12]: sampling frequency $F = 16$ kHz, passband edge frequency $f_p = 3.4$ kHz, stopband edge frequency $f_s = 4.5$ kHz, passband ripple $A_p = 0.5$ dB, stopband attenuation $A_s = 50$ dB, and filter order $N = 9$.
Comparison of $f_{\max}$ and area of the low-pass lattice WDF based on Richards' and three-port parallel adaptor structures (Example 1).