A cost-efficient error-resilient approach to distributed arithmetic for signal processing

doi:10.1016/j.microrel.2018.12.007

Microelectronics Reliability

Volume 93, February 2019, Pages 16-21

https://doi.org/10.1016/j.microrel.2018.12.007

Abstract

Distributed arithmetic (DA) brings area and power benefits to digital designs relevant to the Internet-of-Things. In this work, a new error-resilient technique for DA computation is proposed to improve robustness against process, voltage, and temperature (PVT) variations. The proposed approach mitigates the effect of timing violations by first providing a guardband for the most significant bit (MSB) computations. This guardband is initially achieved by modifying the order of DA serial operations and borrowing time from the least significant bit (LSB) group. As a result, the LSB computation can correspond to the critical path, and timing errors can be tolerated at the cost of an acceptable accuracy loss. Moreover, shifted-phase clock signals are applied on the end-point registers, thereby increasing the global guardband without any effect on the system sampling rate. Our approach is demonstrated on a 16-tap FIR filter using a 65 nm CMOS process. The simulation results demonstrate that this design can maintain error-free operation without the worst-case timing margin, and achieve up to 42% power savings by voltage scaling when the worst-case margin is considered. This comes at the cost of a 6.3% delay and 7.3% overhead.

Introduction

In recent years, there has been increased attention on Internet-of-Things (IoT) technologies [1,2], which are expected to benefit numerous application areas including industrial wireless sensor node systems and healthcare systems manufacturing. Specifically, an IoT network is created by integrating smart sensors into a multitude of devices that subsequently share their data with other devices. It is therefore necessary to install small sensors in a multitude of places, together with cost-efficient hardware devices capable of performing digital signal processing (DSP) using extremely low levels of energy [3,4].

Among state-of-the-art circuit techniques, distributed arithmetic (DA) has been widely used in area-efficient and low-cost signal processing applications for convolution [5,6], transforms [7,8], and filtering [9,10]. Moreover, DA-based architecture is also exploited as an excellent technique for implementing approximate computing [11,12], which has recently emerged as a promising approach for designing energy-efficient IoT-related systems [13]. However, with the drastic scaling of CMOS technology, the design of robust systems is becoming a major concern [14,15,16]. Some approaches specifically solve reliability issues in DA circuits, and DA is commonly regarded as a promising technique in current VLSI design. In Khairy's work [17], a novel N-modular redundancy (NMR) algorithm based on the maximum a posteriori (MAP) criterion and the statistics of the output bit-failure rate was proposed to tolerate faults. However, it is not cost-efficient in terms of hardware implementation. Ting et al. proposed an approximate distributed arithmetic architecture that improves robustness at the cost of computation accuracy [18]; it clock-gates the whole system when a timing error is predicted. If the circuit suffers from serious process variation or ageing effects, the circuit performance can be degraded drastically, as only $\frac{N}{2}$-bit effective computations remain for an N-bit arithmetic.

For conventional circuits, a large number of error-resilient techniques [19] have been proposed to solve this reliability issue. These include algorithmic noise tolerance (ANT) [20], the noise reduction unit (NRU) [21], Razor [22], and the adaptive latency technique [23,24,25]. Among these, the adaptive latency technique is a very popular fault-tolerant technique, which addresses device variability by tuning architectural latencies. In Choi's work [23], a novel FIR filter synthesis technique based on a common-subexpression-elimination (CSE) algorithm was proposed, which exploited the principle that not all filter coefficients are equally important for obtaining “reasonably accurate” computation results. In this design, the critical paths of computations involving the important coefficients are constrained to take a fixed number of adders, while the later computational steps only compute the less important coefficient outputs. In this case, only the less important outputs are affected by process variation and voltage scaling. However, this approach does not provide an inherent error-detection mechanism, and only works in CSE-based circuits. The ARM research group proposed a new error-resilient approach called path delay shaping (PDS) [24], which ensures that the critical paths correspond to a group of LSB result registers by using a modified carry-merge adder and device sizing. However, PDS is limited by the choice of arithmetic unit: only units that have a minimum delay difference between the LSB and most significant bit (MSB) paths are suitable, such as the Kogge-Stone adder. If the same idea were applied to a conventional carry-save adder tree or ripple adder, the circuit overhead would be considerable. In [25], Tiwari et al. transferred the time slack of the faster stages to the slower ones by skewing clock arrival times to latching elements in a pipelined processor. Therefore, timing violations due to process variations in a single stage can be prevented by borrowing some extra time from another stage.
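The slack transfer described for clock skewing can be illustrated with a small worked example. The sketch below is an assumption-laden illustration, not the method of [25]: the function name `stage_slacks`, the stage delays, and the skew values are all hypothetical, delays are in picoseconds, and setup and hold times are ignored.

```python
def stage_slacks(delays, period, skews):
    """Return the timing slack of each pipeline stage.

    Stage i launches from register i and is captured by register i + 1.
    skews[i] is how late the clock edge arrives at register i (assumed values).
    A negative slack means a timing violation under this simplified model.
    """
    return [period + skews[i + 1] - skews[i] - d for i, d in enumerate(delays)]


# Two stages: the first is slower than the clock period, the second is faster.
delays = [1200, 700]   # hypothetical stage delays in ps
period = 1000          # hypothetical clock period in ps

no_skew = stage_slacks(delays, period, [0, 0, 0])
# Delaying the middle register's clock by 250 ps lends time to the slow stage.
with_skew = stage_slacks(delays, period, [0, 250, 0])
```

With no skew the first stage violates timing while the second has spare slack; delaying the intermediate register's clock moves part of that spare slack to the slow stage without changing the clock period.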

In this paper, the idea of adaptive latency computation is exploited on DA for the first time. Two error-resilient approaches are proposed to bound the magnitude of timing errors in the presence of timing violations, and they are demonstrated on a 16-tap FIR filter using a 65 nm CMOS process. We initially reverse the bit order in DA-based serial computations and provide the MSB paths with more guardband by time borrowing, which ensures that the timing errors only weakly affect the computation results. In addition, a shifted-phase clock is applied on the end-point registers, thereby providing extra time slack globally without any effect on the system sampling rate. We demonstrate that our approach is more cost-efficient than state-of-the-art error-resilient techniques.

The rest of the paper is organized as follows: Section 2 contains a brief review of the distributed arithmetic based digital filter. Section 3 describes the principle of the proposed error-resilient approaches and their VLSI implementation. The corresponding simulation results are analysed in Section 4, and Section 5 summarizes and concludes the paper.

Section snippets

Distributed arithmetic

Distributed arithmetic computes the inner product of two vectors (one of which is a constant) in parallel, which is a bit-serial operation in nature. The main advantage of DA computation is that no multiply operations are required; they are replaced by precomputed look-up tables. The DA-based multiplication and accumulation (MAC) can be expressed as follows: $y = \sum_{k = 0}^{L - 1} h_{k} x_{k}$ where $x_k$ and $y$ represent the input and output data, respectively. If we consider that each input data is an N-bit two's
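As a minimal sketch of the look-up-table formulation, assuming integer coefficients and N-bit two's complement inputs, the bit-serial evaluation of the inner product can be written as follows. The names `build_lut` and `da_inner_product` and the example values are hypothetical; this illustrates textbook DA rather than the authors' circuit.

```python
def build_lut(coeffs):
    """Precompute every partial sum of the constant coefficients.

    Entry `addr` holds the sum of coeffs[k] for every bit k set in `addr`,
    so the table has 2**L entries for L coefficients.
    """
    taps = len(coeffs)
    return [sum(h for k, h in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << taps)]


def da_inner_product(coeffs, samples, n_bits):
    """Compute sum(h_k * x_k) with table look-ups, shifts, and additions only.

    Each sample is assumed to fit in n_bits two's complement.
    """
    lut = build_lut(coeffs)
    acc = 0
    for n in range(n_bits):                 # one bit plane per clock cycle
        addr = 0
        for k, x in enumerate(samples):     # gather bit n of every sample
            addr |= ((x >> n) & 1) << k
        term = lut[addr] * (1 << n)         # weight the partial sum by 2**n
        # In two's complement the sign bit carries a negative weight.
        acc += -term if n == n_bits - 1 else term
    return acc


coeffs = [3, -1, 4, 2]      # hypothetical constant coefficients h_k
samples = [5, -7, 2, -8]    # hypothetical 4-bit two's complement inputs x_k
result = da_inner_product(coeffs, samples, n_bits=4)
```

Here `result` matches the direct evaluation `sum(h * x for h, x in zip(coeffs, samples))`. Because the table size grows as 2^L, long filters are commonly split into several smaller tables whose outputs are added.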

Proposed error-resilient approaches

In the presence of PVT variations, the path delay increases relative to the fixed clock period, eventually resulting in failure when the critical paths are activated. Therefore, the robustness of the circuit can be improved via an increased timing margin. As illustrated in Section 2, an N-bit DA-based computation consumes N clock cycles, and each bit of the final result is generated every cycle using the same addressing, fetch, and accumulation circuits. Hence, delay imbalance across result
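To illustrate why confining timing errors to the LSB cycles bounds the error magnitude, the sketch below evaluates a bit-serial DA sum while discarding the contribution of selected bit cycles. The fault model (a failed cycle contributes nothing) is an assumption made purely for illustration and is not taken from the paper; the names `da_output` and `worst_case_error` and the example values are hypothetical.

```python
def da_output(coeffs, samples, n_bits, failed_cycles=()):
    """Bit-serial DA sum in which the listed bit cycles are lost.

    Assumed fault model: a timing error in cycle n drops that cycle's
    partial sum entirely. Real timing errors may behave differently.
    """
    acc = 0
    for n in range(n_bits):
        # Partial sum of coefficients whose sample has bit n set.
        partial = sum(h for h, x in zip(coeffs, samples) if (x >> n) & 1)
        if n in failed_cycles:
            partial = 0
        # The sign bit of a two's complement word has weight -2**(n_bits - 1).
        weight = -(1 << n) if n == n_bits - 1 else (1 << n)
        acc += weight * partial
    return acc


def worst_case_error(coeffs, failed_cycles):
    """Upper bound on |error| when the given cycles are lost."""
    max_partial = max(sum(h for h in coeffs if h > 0),
                      -sum(h for h in coeffs if h < 0))
    return max_partial * sum(1 << n for n in failed_cycles)


coeffs = [3, -1, 4, 2]      # hypothetical coefficients
samples = [5, -7, 2, -8]    # hypothetical 4-bit two's complement inputs
exact = da_output(coeffs, samples, 4)
lsb_error = abs(exact - da_output(coeffs, samples, 4, failed_cycles={0}))
msb_error = abs(exact - da_output(coeffs, samples, 4, failed_cycles={3}))
```

Under this model, a lost cycle n can change the result by at most 2^n times the largest partial sum, so a failure in the LSB cycle is bounded far more tightly than a failure in the MSB cycle.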

Simulation results

As mentioned in the previous section, the proposed approach can improve circuit robustness against timing errors at the cost of hardware area and delay. To demonstrate this, the proposed architecture was applied to a digital FIR filter. The design was synthesized using the Synopsys Design Compiler with the ST 65 nm CMOS process.

Conclusion

In this paper, novel approaches to DA are proposed to mitigate the effects of PVT variations. Extra timing slack is provided for computations of the MSB group by a time borrowing technique, and a shifted-phase clock is applied on the end-point registers, thereby increasing the global guardband against variations. The proposed design can achieve significant power savings of up to 42% under different operating conditions when the worst-case margin is included. In general, it outperforms other

References (25)

J. Gubbi et al., Internet of Things (IoT): a vision, architectural elements, and future directions, Futur. Gener. Comput. Syst. (2013)
M. Radfar et al., A yield improvement technique in severe process, voltage, and temperature variations and extreme voltage scaling, Microelectron. Reliab. (2014)
A. Islam et al., A technique to mitigate impact of process, voltage and temperature variations on design metrics of SRAM Cell, Microelectron. Reliab. (2012)
M. Alam, Reliability- and process-variation aware design of integrated circuits, Microelectron. Reliab. (2008)
I. Muhic et al., Internet of Things: current technological review and new low power wireless sensor network protocol proposal, South. Eur. J. Soft Comput. (2014)
M.S. Golanbari et al., Runtime adjustment of IoT system-on-chips for minimum energy operation
S. Kiamehr et al., Temperature-aware dynamic voltage scaling to improve energy efficiency of near-threshold computing, IEEE Trans. Very Large Scale Integr. VLSI Syst. (2017)
G.D. Licciardo et al., Weighted partitioning for fast multiplierless multiple-constant convolution circuit, IEEE Trans. Circuits Syst. Express Briefs (2017)
M. Panwar et al., Modified distributed arithmetic based low complexity CNN architecture design methodology
J. Xie et al., Hardware-efficient realization of prime-length DCT based on distributed arithmetic, IEEE Trans. Comput. (2013)
M. Martina et al., Result-biased Distributed-Arithmetic-based filter architectures for approximately computing the DWT, IEEE Trans. Circuits Syst. Regul. Pap. (2015)
S.K. Singhal et al., Efficient parallel architecture for fixed-coefficient and variable-coefficient FIR filters using distributed arithmetic, J. Circuits, Syst. Comput. (2016)

