Elsevier

Microelectronics Reliability

Volume 93, February 2019, Pages 16-21

A cost-efficient error-resilient approach to distributed arithmetic for signal processing

https://doi.org/10.1016/j.microrel.2018.12.007

Abstract

Distributed arithmetic (DA) brings area and power benefits to digital designs for Internet-of-Things applications. This paper therefore proposes a new error-resilient technique for DA computation that improves robustness against process, voltage, and temperature (PVT) variations. The proposed approach mitigates the effect of timing violations by first providing a guardband for the most significant bit (MSB) computations. This guardband is achieved by modifying the order of the DA serial operations and borrowing time from the least significant bit (LSB) group. As a result, the LSB computations correspond to the critical paths, and timing errors can be tolerated at the cost of an acceptable loss of accuracy. Moreover, shifted-phase clock signals are applied to the end-point registers, which increases the global guardband without affecting the system sampling rate. Our approach is demonstrated on a 16-tap FIR filter in a 65 nm CMOS process. Simulation results show that the design maintains error-free operation without a worst-case timing margin, and achieves up to 42% power savings through voltage scaling when the worst-case margin is considered. This comes at the cost of a 6.3% delay overhead and a 7.3% area overhead.

Introduction

In recent years, Internet-of-Things (IoT) technologies have attracted increasing attention [1,2], and they are expected to benefit numerous application areas, including industrial wireless sensor node systems, healthcare systems, and manufacturing. Specifically, an IoT network is created by integrating smart sensors into a multitude of devices, which then share their data with other devices. It is therefore necessary to deploy small sensors in many places, together with cost-efficient hardware capable of performing digital signal processing (DSP) at extremely low energy [3,4].

Among state-of-the-art circuit techniques, distributed arithmetic (DA) has been widely used in area-efficient, low-cost signal processing applications such as convolution [5,6], transforms [7,8], and filtering [9,10]. DA-based architectures have also been exploited for implementing approximate computing [11,12], which has recently emerged as a promising approach for designing energy-efficient IoT systems [13]. However, with the aggressive scaling of CMOS technology, designing robust systems is becoming a major concern [14,15,16]. DA is commonly regarded as a promising technique in current VLSI design, and some approaches specifically address reliability issues in DA circuits. In Khairy's work [17], a novel N-modular redundancy (NMR) algorithm based on maximum a posteriori (MAP) estimation and the statistics of the output bit-failure rate was proposed to tolerate faults; however, it is not cost-efficient in terms of hardware implementation. Ting et al. [18] proposed an approximate distributed arithmetic architecture that improves robustness at the cost of computation accuracy by clock-gating the whole system when a timing error is predicted. If the circuit suffers from severe process variation or ageing effects, its performance can degrade drastically, as only N/2 bits of effective computation remain for an N-bit arithmetic operation.

For conventional circuits, a large number of error-resilient techniques [19] have been proposed to address this reliability issue, including algorithmic noise tolerance (ANT) [20], the noise reduction unit (NRU) [21], Razor [22], and the adaptive latency technique [23,24,25]. Among these, the adaptive latency technique is a popular fault-tolerant approach that addresses device variability by tuning architectural latencies. In Choi's work [23], a novel FIR filter synthesis technique based on a common-subexpression-elimination (CSE) algorithm was proposed, which exploits the principle that not all filter coefficients are equally important for obtaining “reasonably accurate” computation results. In this design, the critical paths of computations involving the important coefficients are constrained to a fixed number of adders, while the later computational steps only compute the outputs of the less important coefficients. Consequently, only the less important outputs are affected by process variation and voltage scaling. However, this approach provides no inherent error-detection mechanism and only works in CSE-based circuits. An ARM research group proposed another error-resilient approach, called path delay shaping (PDS) [24], which ensures that the critical paths end at a group of LSB result registers by using a modified carry-merge adder and device sizing. However, PDS is limited by the choice of arithmetic unit: only units with a minimal delay difference between the LSB and most significant bit (MSB) paths, such as the Kogge-Stone adder, are suitable. If the same idea were applied to a conventional carry-save adder tree or a ripple-carry adder, the circuit overhead would be considerable. In [25], Tiwari et al. transferred the time slack of faster stages to slower ones by skewing the clock arrival times at the latching elements of a pipelined processor. Timing violations caused by process variations in a single stage can therefore be prevented by borrowing extra time from another stage.

In this paper, the idea of adaptive latency computation is applied to DA for the first time. Two error-resilient approaches are proposed to bound the magnitude of timing errors in the presence of timing violations, and they are demonstrated on a 16-tap FIR filter in a 65 nm CMOS process. First, we reverse the bit order of the DA-based serial computation and give the MSB paths more guardband through time borrowing, which ensures that timing errors only weakly affect the computation results. Second, a shifted-phase clock is applied to the end-point registers, providing extra time slack globally without affecting the system sampling rate. We demonstrate that our approach is more cost-efficient than state-of-the-art error-resilient techniques.
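Why a shifted-phase clock adds slack without changing the sampling rate can be illustrated with the standard register-to-register timing constraints. As a minimal sketch, assume a path launched by a register clocked with period $T_{clk}$ and captured by an end-point register whose clock edge is delayed by a phase shift $\phi$ (with $0 \le \phi < T_{clk}$). The symbols $t_{cq}$ (clock-to-Q delay), $t_{comb}$ (combinational path delay), $t_{su}$ (setup time), and $t_h$ (hold time) are generic notation introduced here, not taken from the paper:

$$
\begin{aligned}
\text{setup:}\quad & t_{cq} + t_{comb,\max} + t_{su} \le T_{clk} + \phi \\
\text{hold:}\quad & t_{cq} + t_{comb,\min} \ge t_h + \phi
\end{aligned}
$$

The setup budget of every path ending at the shifted registers grows by $\phi$, while $T_{clk}$, and therefore the sampling rate $1/T_{clk}$, stays the same. The price is a hold constraint tightened by the same $\phi$; in addition, any path launched from the shifted registers and captured by unshifted ones loses $\phi$ of its budget, so this sketch applies only where such paths have enough slack to give up.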

The rest of the paper is organized as follows: Section 2 briefly reviews distributed-arithmetic-based digital filters. Section 3 describes the principle of the proposed error-resilient approaches and their VLSI implementation. The corresponding simulation results are analysed in Section 4, and Section 5 concludes the paper.


Distributed arithmetic

Distributed arithmetic computes the inner product of two vectors, one of which is constant. All vector elements are processed in parallel, but the operation is bit-serial in nature. The main advantage of DA computation is that no multiplications are required; they are replaced by precomputed look-up tables. The DA-based multiply-and-accumulate (MAC) operation can be expressed as follows:

$$ y = \sum_{k=0}^{L-1} h_k x_k $$

where $x_k$ and $y$ represent the input and output data, respectively, and $h_k$ are the constant coefficients. If we consider that each input data is an N-bit two's
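The bit-serial LUT evaluation described above can be sketched in Python. This is a minimal behavioral model under stated assumptions, not the paper's hardware: the coefficients `h` are integers, each input `x[k]` is an `n_bits`-wide two's-complement integer, a single LUT holds all 2^L partial sums of the coefficients, and bits are processed LSB first with the sign-bit contribution subtracted. The function names `build_da_lut` and `da_inner_product` are hypothetical.

```python
def build_da_lut(h):
    """Precompute all 2**L partial sums of the coefficients h.

    Entry `addr` is the sum of h[k] over every k whose bit k is set in addr,
    so a single read returns the sum selected by one bit of each input.
    """
    return [sum(c for k, c in enumerate(h) if (addr >> k) & 1)
            for addr in range(1 << len(h))]


def da_inner_product(h, x, n_bits):
    """Return sum(h[k] * x[k]) using LUT reads, shifts, and adds only.

    Hypothetical behavioral model: each x[k] must be an n_bits-wide
    two's-complement integer; one bit position is processed per modeled
    clock cycle, LSB first.
    """
    if len(h) != len(x):
        raise ValueError("h and x must have the same length")
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if any(not lo <= xk <= hi for xk in x):
        raise ValueError("input does not fit in n_bits two's complement")
    lut = build_da_lut(h)
    acc = 0
    for n in range(n_bits):
        # The LUT address is formed from bit n of every input word.
        addr = 0
        for k, xk in enumerate(x):
            addr |= ((xk >> n) & 1) << k
        partial = lut[addr] << n  # weight 2**n applied by a shift
        # The sign bit of a two's-complement word has weight -2**(n_bits-1).
        acc += -partial if n == n_bits - 1 else partial
    return acc
```

For example, with `h = [3, -1, 4, 2]` and `x = [5, -7, 2, -8]` in 4-bit words, `da_inner_product(h, x, 4)` returns 14, the same value as the direct sum of products. Each loop iteration corresponds to one clock cycle, so an N-bit input needs N cycles regardless of L; only the LUT size (2^L entries) grows with the number of inputs.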

Proposed error-resilient approaches

In the presence of PVT variations, path delays increase relative to the fixed clock period, eventually causing failures when the critical paths are activated. The robustness of the circuit can therefore be improved by increasing the timing margin. As illustrated in Section 2, an N-bit DA-based computation takes N clock cycles, and one bit of the final result is generated per cycle using the same addressing, fetch, and accumulation circuits. Hence, delay imbalance across result
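Why errors confined to the LSB group only weakly affect the result follows from the bit weights: a partial sum fetched for bit position n enters the output scaled by 2^n. The sketch below illustrates this under a hypothetical error model and is not the paper's circuit: it evaluates the DA sum MSB first in Horner form (one possible way to reverse the serial bit order), and a timing error is modeled as the LUT fetch of a single cycle being lost (read as 0). The names `da_lut`, `da_msb_first`, and the parameter `faulty_cycle` are assumptions for illustration.

```python
def da_lut(h):
    """All 2**L partial sums: entry addr = sum of h[k] whose bit k is set in addr."""
    return [sum(c for k, c in enumerate(h) if (addr >> k) & 1)
            for addr in range(1 << len(h))]


def da_msb_first(h, x, n_bits, faulty_cycle=None):
    """MSB-first DA inner product in Horner form: acc = 2*acc + (+/-)LUT[addr].

    faulty_cycle (hypothetical error model): index of the cycle whose LUT
    fetch is lost to a timing violation and read as 0. Cycle 0 handles the
    sign bit (MSB); cycle n_bits - 1 handles the LSB.
    """
    lut = da_lut(h)
    acc = 0
    for cycle, n in enumerate(range(n_bits - 1, -1, -1)):
        # Address the LUT with bit n of every input word.
        addr = 0
        for k, xk in enumerate(x):
            addr |= ((xk >> n) & 1) << k
        partial = 0 if cycle == faulty_cycle else lut[addr]
        if n == n_bits - 1:
            partial = -partial  # the two's-complement sign bit has negative weight
        acc = (acc << 1) + partial  # double the previous result, then accumulate
    return acc
```

With `h = [3, -1, 4, 2]` and `x = [5, -7, 2, -8]` in 4-bit words, the error-free result is 14. Losing the last (LSB) cycle gives 12, an error of 2, whereas losing the first (sign-bit) cycle gives 22, an error of 8. In general, one lost fetch at bit position n causes an error of at most max|LUT| · 2^n, which is why granting the MSB cycles extra guardband and letting the LSB cycles absorb timing violations bounds the error magnitude.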

Simulation results

As mentioned in the previous section, the proposed approach improves circuit robustness against timing errors at the cost of hardware area and delay. To demonstrate this, the proposed architecture was applied to a digital FIR filter. The design was synthesized using Synopsys Design Compiler with the ST 65 nm CMOS process.
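To connect the DA core with the FIR filter used in the evaluation, the sketch below models a 16-tap DA-based FIR filter behaviorally. The following are assumptions, not details from the paper: integer coefficients, `n_bits`-wide two's-complement input samples, zero initial filter state, and taps partitioned into groups of four so that each LUT has 16 entries instead of one 2^16-entry table (LUT partitioning is a common DA practice, but the paper's LUT organization is not given here). `build_lut`, `da_fir`, and the `group` parameter are hypothetical names.

```python
def build_lut(coeffs):
    """Partial sums for one LUT: entry addr = sum of coeffs[j] with bit j of addr set."""
    return [sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
            for addr in range(1 << len(coeffs))]


def da_fir(h, samples, n_bits, group=4):
    """Hypothetical behavioral DA FIR filter: y[n] = sum_k h[k] * x[n-k], zero initial state."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if any(not lo <= s <= hi for s in samples):
        raise ValueError("sample does not fit in n_bits two's complement")
    # Split the taps into groups; each group gets its own small LUT.
    groups = [h[i:i + group] for i in range(0, len(h), group)]
    luts = [build_lut(g) for g in groups]
    delay_line = [0] * len(h)  # delay_line[k] holds x[n-k]
    out = []
    for s in samples:
        delay_line = [s] + delay_line[:-1]  # shift in the newest sample
        acc = 0
        for n in range(n_bits):  # one bit plane per cycle, LSB first
            partial = 0
            for g, lut in enumerate(luts):
                base = g * group
                addr = 0
                for j in range(len(groups[g])):
                    addr |= ((delay_line[base + j] >> n) & 1) << j
                partial += lut[addr]  # adder combining the sub-LUT outputs
            partial <<= n
            acc += -partial if n == n_bits - 1 else partial
        out.append(acc)
    return out
```

Feeding a unit impulse (`[1] + [0] * 15`) returns the 16 coefficients themselves, a quick sanity check that the partitioned LUTs reproduce y[n] = sum of h[k] x[n-k]. Partitioning trades one exponentially large table for several small ones plus an adder that combines their outputs each cycle.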

Conclusion

In this paper, novel approaches to DA are proposed to mitigate the effects of PVT variations. Extra timing slack is provided for the MSB-group computations by a time-borrowing technique, and a shifted-phase clock is applied to the end-point registers, increasing the global guardband against variations. When the worst-case margin is included, the proposed design achieves power savings of up to 42% under different operating conditions. In general, it outperforms other

References (25)

  • M. Martina et al., Result-biased Distributed-Arithmetic-based filter architectures for approximately computing the DWT, IEEE Trans. Circuits Syst. Regul. Pap. (2015)
  • S.K. Singhal et al., Efficient parallel architecture for fixed-coefficient and variable-coefficient FIR filters using distributed arithmetic, J. Circuits, Syst. Comput. (2016)