Novel Low-Power Floating-Point Divider With Linear Approximation and Minimum Mean Relative Error

Floating-point division involves the computation of the ratio (1 + Mx)/(1 + My), where Mx and My represent the mantissas of the input values. In this paper, we propose a new method for approximating this operation using a linear function of Mx, with coefficients that depend on My. The coefficients are calculated to minimize the Mean Relative Error Distance (MRED) of the approximation. To this end, the range of My is partitioned into N sub-intervals, in each of which the minimization of MRED is formulated as a linear programming problem whose solution gives the optimal coefficient values. The hardware implementation requires a small lookup table, two multipliers and an adder. An aggressive quantization of the coefficients is exploited to further optimize the design. The obtained MRED improves as $N$ increases, ranging from 1.4% to 0.33%. Implementation results in a 28nm CMOS technology show that the proposed design outperforms the state-of-the-art, offering the best trade-off between hardware complexity and accuracy. Results for two image processing applications, change detection and JPEG compression, demonstrate remarkable performance, with SSIM very close to 1 and PSNR values exceeding 50dB.

In this scenario, Approximate Computing (AC) constitutes a valuable solution, reducing area and power at the cost of accepting errors in the computation [4], [5]. In addition, the limits of human senses and the error-tolerant nature of many practical applications (such as image and audio processing, or adaptive filtering) make the AC approach very effective [6], [7], [8].
Several works have been dedicated to the design of fixed-point approximate adders and multipliers, proposing a plethora of techniques able to optimize power and area. For instance, the papers [9], [10], [11] show a decomposition method that divides the adder into atomic fast sub-adders, each one working on a portion of the input signals, while [12], [13], [14] exploit an approximate carry-skip architecture able to reduce the critical path delay. In [15] the speculation method is applied to parallel-prefix adders, while [16], [17] present approximate full-adders at both gate and transistor level.
Unlike adders and multipliers, dividers have received less attention in the literature. However, in the design of several commercial microprocessors and products [30], [31], [32], hardware dividers are preferred to software realizations of the division.
The division between two fixed-point numbers generally exploits iterative algorithms based on subtractions/multiplications in order to compute the quotient starting from an initial estimate [33], [34], [35], [36], [37], [38]. In this case, latency and power consumption are primary concerns in the design. Algorithms such as the Sweeney-Robertson-Tocher (SRT) try to reduce the number of iterations by means of high-radix coding and redundant representations of the quotient [38]. Further approaches achieve power improvements by approximating the subtractor [39], [40] or by applying signal segmentation [41]. The realization of non-iterative dividers constitutes a further solution able to compute the quotient with low energy and reduced latency. In this case, the logarithmic number system (LNS) is a valuable means, since it allows the division to be expressed as a two-operand subtraction followed by a shift [42]. In [43], the divisor y is recoded in order to employ only a multiplication and a left-shift, while [44] exploits a linear approximation for the term 1/y. LNS with mean-error compensation is proposed in [45], whereas [46] devises a rounding-based approach to simplify the divider.
Floating-point arithmetic, which represents numbers with sign, exponent, and mantissa, offers both large dynamic range and fine accuracy [47]. These properties make floating-point divider design important for many practical DSP applications.
In a hardware divider, sign and exponent computation are simple to implement, involving only an XOR and a subtraction. On the other hand, the mantissa computation is much more complex, requiring a fixed-point division: (1 + Mx)/(1 + My), where Mx and My are the mantissas of dividend and divisor, respectively. A two-step approximate technique is proposed in [48] to perform the mantissa division by means of shift-and-add operations. In this case, the amount of shift and the number of additions, defined at design time, tune the trade-off between precision and hardware complexity. In [49] a piecewise constant approximation is exploited. As in [48], different levels of accuracy can be achieved by properly choosing the number of ranges in which the constant approximation is applied. In [50] the mantissa division is approximated by means of subtractions, and a variable correction term, stored in a LUT, is employed to recover precision. In this case, the number of bits of the correction term is a critical design parameter, since it impacts both the accuracy and the LUT size. In [51] the division is revisited as a two-variable function and best-fitting planes are used to approximate the surface of the quotient.
In this paper, we propose a novel approximate floating-point divider (named FPDME in the following) that is non-iterative and has minimal error. In our approach we start by considering the exact operation (1 + Mx)/(1 + My), and we express the division as a linear function of the mantissa Mx, with coefficients depending on My.
The choice of coefficients affects the accuracy of the divider. In our approach, the coefficients are determined in order to minimize the Mean Relative Error Distance (MRED) of the approximation. To this end, the range of My is partitioned into N sub-intervals and in each sub-interval the minimization of MRED is formulated as a linear programming problem, whose solution gives the optimal coefficient values. While we considered MRED minimization, it is worth noting that our proposed approach can be easily modified to target other error metrics, such as the mean absolute error.
Mantissa truncation and coefficient quantization are also exploited to further optimize the design.
From a hardware perspective, the proposed divider requires only a lookup table (LUT), used to store the coefficients, and two multipliers and an adder, fused in a single carry-save arithmetic structure. A suitable choice of N and of the parameter quantization tunes, at design time, the trade-off between hardware complexity and accuracy.
The proposed FPDME achieves an MRED comparable to or better than that of previously proposed approximate floating-point dividers. Synthesis results in TSMC 28nm CMOS technology also highlight an improvement in hardware performance with respect to the state-of-the-art, measured in terms of power-delay product (PDP) and area-delay product (ADP). We present results for two image processing applications: change detection and JPEG compression. Both applications further highlight the advantages of the proposed technique, exhibiting competitive performance in terms of peak signal-to-noise ratio (PSNR) and Mean Structural Similarity Index (SSIM).
The paper is organized as follows. Section II introduces the floating-point notation and the main steps used to perform the division. Section III describes our approach for approximating the division, while Section IV shows the hardware implementation. Afterwards, the results are discussed in Section V in terms of error metrics and hardware assessment, whereas Section VI presents the achieved performance in change detection and JPEG compression applications. Finally, Section VII concludes the paper.

II. FLOATING-POINT DIVISION
In floating-point notation, a real number A is represented as follows:

$$A = (-1)^{S} \cdot 2^{E - \mathit{bias}} \cdot (1 + M) \tag{1}$$

where S, E, and M are the sign, exponent, and mantissa of A, respectively, whereas bias is a constant term used to shift the exponent. While one bit is used for the sign, the bit-widths of E and M and the value of bias change in accordance with the desired precision. Fig. 1 shows the single-precision IEEE-754 format [47]. The representation of A requires 32 bits, with E and M being unsigned numbers expressed on 8 and 23 bits (highlighted in blue and green, respectively).
The exponent E lies in the range [0, 255], whereas the mantissa M varies in the range [0, 1).In addition, bias is set to 127 in order to shift the overall exponent of (1) in the range [−127, 128].
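As an illustration of the format just described, the following minimal Python sketch splits a single-precision value into its S, E, and M fields and rebuilds it from them. The function names `decode_float32` and `encode_float32` are hypothetical helpers introduced here; the sketch assumes normalized numbers only (E different from 0 and 255) and ignores special values.

```python
import struct

def decode_float32(value):
    """Split a value, stored as IEEE-754 single precision, into (S, E, M)."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                      # 1 sign bit
    exponent = (bits >> 23) & 0xFF         # 8-bit biased exponent
    mantissa = (bits & 0x7FFFFF) / 2**23   # 23-bit mantissa scaled to [0, 1)
    return sign, exponent, mantissa

def encode_float32(sign, exponent, mantissa, bias=127):
    """Rebuild the value of a normalized number from its three fields."""
    return (-1) ** sign * 2.0 ** (exponent - bias) * (1.0 + mantissa)

s, e, m = decode_float32(-6.5)   # -6.5 = -(1 + 0.625) * 2**(129 - 127)
print(s, e, m)
print(encode_float32(s, e, m))
```

For -6.5 the fields are S = 1, E = 129 and M = 0.625, so the reconstruction returns the original value.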
In the following we assume that the divider inputs are single-precision floating-point numbers, but the proposed technique is general and can be applied equally well to other floating-point formats such as IEEE half-precision or BFloat16.
In order to show the floating-point division, let us consider the two operands:

$$X = (-1)^{S_x} \cdot 2^{E_x - \mathit{bias}} \cdot (1 + M_x), \qquad Y = (-1)^{S_y} \cdot 2^{E_y - \mathit{bias}} \cdot (1 + M_y)$$

where Sx, Ex, and Mx are the sign, exponent, and mantissa of the dividend X, while Sy, Ey, and My are the sign, exponent, and mantissa of the divisor Y.
The division Z = X/Y has a similar representation:

$$Z = (-1)^{S_z} \cdot 2^{E_z - \mathit{bias}} \cdot (1 + M_z)$$

where the mantissa Mz is normalized, assuming values in [0, 1). It is also worth noting that the quantity (1 + Mz) lies in the range [1, 2). The sign Sz of the division is simply the XOR of the sign bits of the operands, whereas the modulus of Z can be written as:

$$|Z| = 2^{E_x - E_y} \cdot \frac{1 + M_x}{1 + M_y} \tag{4}$$

Let us consider the term (1 + Mx)/(1 + My). Its maximum value is obtained for My very close to zero and Mx very close to one, and is (slightly) less than 2. The minimum value is obtained in the opposite case and is (slightly) larger than 0.5. Therefore, the following inequality holds:

$$0.5 < \frac{1 + M_x}{1 + M_y} < 2 \tag{5}$$

In addition, it is worth noting that the factor (1 + Mx)/(1 + My) is larger than 1 when the condition Mx > My is true. Then, starting from (4) and (5), the following two cases are considered for the computation of Ez and Mz:

$$E_z = E_x - E_y + \mathit{bias}, \qquad 1 + M_z = \frac{1 + M_x}{1 + M_y} \qquad \text{if } M_x \ge M_y \tag{6}$$

$$E_z = E_x - E_y + \mathit{bias} - 1, \qquad 1 + M_z = 2 \cdot \frac{1 + M_x}{1 + M_y} \qquad \text{if } M_x < M_y \tag{7}$$

Indeed, the quotient (1 + Mx)/(1 + My) is naturally in the interval [1, 2) when Mx ≥ My (see (6)). Conversely, (1 + Mx)/(1 + My) is in the range [0.5, 1) when Mx < My. Therefore, in order to have (1 + Mz) in [1, 2), the normalization process doubles (1 + Mx)/(1 + My) and subtracts a '1' from the exponent for compensation, as shown in (7).
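The field-level procedure of this section can be sketched in a few lines of Python. This is only an illustrative model of the exact (non-approximate) division on the (S, E, M) fields: `divide_fields` is a hypothetical name, and the sketch assumes normalized operands and ignores zero, subnormal, infinite, and NaN inputs as well as exponent overflow and underflow.

```python
def divide_fields(sx, ex, mx, sy, ey, my, bias=127):
    """Divide two normalized floating-point numbers given as (S, E, M) fields."""
    sz = sx ^ sy                        # sign: XOR of the operand signs
    ratio = (1.0 + mx) / (1.0 + my)     # mantissa ratio, always in (0.5, 2)
    ez = ex - ey + bias                 # exponent before normalization
    if mx < my:                         # ratio in (0.5, 1): normalize
        ratio *= 2.0                    # double the mantissa ratio ...
        ez -= 1                         # ... and compensate in the exponent
    mz = ratio - 1.0                    # 1 + Mz is now in [1, 2)
    return sz, ez, mz

# 2.5 / 1.5: X = 1.25 * 2**1 (E = 128), Y = 1.5 * 2**0 (E = 127)
sz, ez, mz = divide_fields(0, 128, 0.25, 0, 127, 0.5)
print(sz, ez, (1.0 + mz) * 2.0 ** (ez - 127))
```

In the example Mx < My, so the ratio 1.25/1.5 is doubled and the exponent is decremented, which leaves the represented value 2.5/1.5 unchanged.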

III. PROPOSED FLOATING-POINT DIVIDER
In this section we describe the technique used to approximate the divider. Firstly, we express the division (1 + Mx)/(1 + My) as a linear function of the mantissa Mx, with coefficients that depend on My. Next, we obtain the coefficient values that optimize the MRED by solving a minimization problem formulated as a constrained linear programming problem. In a subsequent step, we perform an aggressive quantization of the coefficients to further optimize the design. To that purpose, we reformulate the optimization problem as an integer linear programming problem.

A. Division Approximated as a Linear Function of Mx
In order to show the proposed technique, let us first define the exact ratio as f(Mx, My) = (1 + Mx)/(1 + My) and the approximate one as φ(Mx, My). The relative error distance (RED) between f(Mx, My) and φ(Mx, My) is

$$RED = \frac{|f(M_x, M_y) - \varphi(M_x, M_y)|}{f(M_x, M_y)} \tag{8}$$

while the MRED is the average value of RED. Let us also rewrite the division between mantissas as follows:

$$f(M_x, M_y) = \frac{1}{1 + M_y} + \frac{1}{1 + M_y} \cdot M_x \tag{9}$$

As shown in (9), f(Mx, My) is linear with respect to Mx, with coefficients that depend on My. Starting from this observation, we can write f(Mx, My) as follows:

$$f(M_x, M_y) = g(M_y) + c(M_y) \cdot M_x \tag{10}$$

From (9)-(10) we should select g(My) = c(My) = 1/(1 + My) to make the error equal to zero. However, c(My) is to be multiplied by Mx to obtain the final result. Therefore, from the hardware implementation perspective, it makes sense to use two different approximations for g(My) and c(My), with a rougher approximation for c(My).
With the above consideration in mind, we partition the range of My into N sub-intervals, each one having a width of 1/N. This corresponds to dividing the mantissas' plane Mx − My into N horizontal stripes, as shown in Fig. 2. Note that we choose N as a power of two, so that each stripe can be easily identified by means of the h = log2(N) most significant bits (MSBs) of My.
In the k-th stripe, (k − 1)/N ≤ My < k/N, we approximate c(My) with a constant, c(My) ≈ $c_k$, while g(My) is approximated with a linear function of My as follows:

$$g(M_y) \approx a_k + b_k \cdot M_y$$

Using the above assumptions, equation (10) in the k-th stripe becomes:

$$\varphi_k(M_x, M_y) = a_k + b_k \cdot M_y + c_k \cdot M_x \tag{11}$$

This equation requires a total of 3 · N coefficients $a_k$, $b_k$ and $c_k$ to approximate the quotient, and our goal becomes to compute the coefficients which minimize the MRED.
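The stripe selection and the per-stripe linear form can be sketched as follows. The coefficients used here are NOT the MRED-optimal ones of this paper: as a placeholder assumption, they come from a first-order Taylor expansion of 1/(1 + My) around the centre of each stripe, which is enough to show the structure of the evaluation. The names `stripe_index`, `taylor_coefficients`, and `approx_ratio` are hypothetical, and stripes are numbered from 0 rather than from 1.

```python
def stripe_index(my_bits, n_stripes):
    """Select the stripe from the h = log2(N) MSBs of the 23-bit mantissa My."""
    h = n_stripes.bit_length() - 1          # exact log2 for a power of two
    return my_bits >> (23 - h)

def taylor_coefficients(n_stripes):
    """Placeholder (a_k, b_k, c_k): tangent of 1/(1 + My) at each stripe centre."""
    coeffs = []
    for k in range(n_stripes):
        m0 = (k + 0.5) / n_stripes          # centre of stripe k (0-based)
        c = 1.0 / (1.0 + m0)                # constant approximation of c(My)
        b = -c * c                          # slope of g(My); always negative
        a = c - b * m0                      # intercept of g(My)
        coeffs.append((a, b, c))
    return coeffs

def approx_ratio(mx_bits, my_bits, coeffs):
    """Evaluate a_k + b_k * My + c_k * Mx in the stripe selected by My."""
    a, b, c = coeffs[stripe_index(my_bits, len(coeffs))]
    return a + b * (my_bits / 2**23) + c * (mx_bits / 2**23)

coeffs = taylor_coefficients(8)
mx_bits, my_bits = 1 << 22, 1 << 21         # Mx = 0.5, My = 0.25
print(approx_ratio(mx_bits, my_bits, coeffs), 1.5 / 1.25)
```

Replacing `taylor_coefficients` with a table of optimized coefficients leaves the evaluation path unchanged, which is the property the hardware exploits.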

B. Obtaining the Optimal Coefficients
To obtain the values of the coefficients $a_k$, $b_k$ and $c_k$, we discretize each stripe by considering nx × ny equally spaced points (highlighted in red in Fig. 2), in which the relative error distance is computed. Then, in a generic point of coordinates $(M_{x,i}, M_{y,j})$, the relative error distance $RED_{i,j}$ is expressed as:

$$RED_{i,j} = \frac{|f(M_{x,i}, M_{y,j}) - (a_k + b_k \cdot M_{y,j} + c_k \cdot M_{x,i})|}{f(M_{x,i}, M_{y,j})} \tag{12}$$

with i = 0, 1, ..., nx − 1 and j = 0, 1, ..., ny − 1. Our problem can be formulated as follows: find the coefficients $a_k$, $b_k$, $c_k$ in each stripe in order to minimize the following objective function:

$$\min_{a_k, b_k, c_k} \; \sum_{i=0}^{n_x - 1} \sum_{j=0}^{n_y - 1} RED_{i,j} \tag{13}$$

It is worth noting that the summation in (13) corresponds to the MRED in the k-th stripe, except for a scaling factor. Therefore, minimizing (13) in each stripe minimizes the overall MRED of the divider. We also underline that other error metrics, not just MRED, could be considered as a cost function in (12), (13), for example the mean absolute error.
The optimization (13) can be further formulated as a linear programming problem by introducing some auxiliary variables $u_{ij}$ such that:

$$RED_{i,j} \le u_{ij} \tag{14}$$

Then, posing $f_{ij} = f(M_{x,i}, M_{y,j})$ for conciseness, (13) can be rewritten as:

$$\min \sum_{i=0}^{n_x - 1} \sum_{j=0}^{n_y - 1} u_{ij} \quad \text{subject to} \quad \begin{cases} a_k + b_k \cdot M_{y,j} + c_k \cdot M_{x,i} - f_{ij} \cdot u_{ij} \le f_{ij} \\ -a_k - b_k \cdot M_{y,j} - c_k \cdot M_{x,i} - f_{ij} \cdot u_{ij} \le -f_{ij} \end{cases} \tag{15}$$

where the constraints are derived from (14) after some algebra. The problem (15) takes the form of a standard linear programming problem:

$$\min_{\mathbf{x}} \; \mathbf{q}^{T} \mathbf{x} \quad \text{subject to} \quad \mathbf{A} \mathbf{x} \le \mathbf{d} \tag{16}$$

where the unknown vector x is composed of 3 + nx · ny elements (that are $a_k$, $b_k$, $c_k$ and $u_{ij}$ for i = 0, 1, ..., nx − 1 and j = 0, 1, ..., ny − 1) and the number of constraints is 2 · nx · ny. Figures 3a, 3b and 3c show the contour plot of RED for N = 4, 8 and 16, respectively, with the minimization problem solved in MATLAB using the linprog command. In the following, we assume nx = 100 and ny = 20. As shown, increasing N achieves low values of RED in large regions of the mantissas' plane, as demonstrated by the blue sections that expand from N = 4 to N = 16. Accordingly, the MRED also improves as N increases.
In addition, Fig. 3 also suggests that N should be chosen properly in order to meet the desired accuracy constraints (which depend, for example, on the adopted floating-point format).
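The algebra that turns the absolute-value bound into linear constraints can be sketched as follows. The derivation assumes that each auxiliary variable acts as an upper bound on the relative error at its grid point, and it uses the shorthand $\varphi_{ij}$ (introduced here for brevity) for the approximation evaluated at that point.

```latex
% Shorthand (assumption of this sketch):
%   \varphi_{ij} = a_k + b_k M_{y,j} + c_k M_{x,i},   f_{ij} = f(M_{x,i}, M_{y,j}) > 0
\begin{align*}
\frac{|f_{ij} - \varphi_{ij}|}{f_{ij}} \le u_{ij}
  &\iff -f_{ij}\,u_{ij} \;\le\; f_{ij} - \varphi_{ij} \;\le\; f_{ij}\,u_{ij} \\
  &\iff
  \begin{cases}
    \;\;\varphi_{ij} - f_{ij}\,u_{ij} \le \;\;f_{ij} \\
    -\varphi_{ij} - f_{ij}\,u_{ij} \le -f_{ij}
  \end{cases}
\end{align*}
% Both inequalities are linear in the unknowns (a_k, b_k, c_k, u_{ij}).
% Two inequalities per grid point give 2 n_x n_y constraints on
% 3 + n_x n_y unknowns; for n_x = 100, n_y = 20 this is
% 4000 constraints on 2003 unknowns per stripe.
```

Since the objective pushes every $u_{ij}$ down onto its bound, the optimal $u_{ij}$ equals the relative error at the corresponding point, so the linear program minimizes the same sum as the original problem.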

C. Quantization of Coefficients
In order to realize the mantissa division in hardware, quantized values of the coefficients $a_k$, $b_k$, $c_k$ are required. To that purpose, we rewrite $a_k$, $b_k$, $c_k$ as follows:

$$a_k = a_{int,k} \cdot LSBa, \qquad b_k = b_{int,k} \cdot LSBb, \qquad c_k = c_{int,k} \cdot LSBc \tag{17}$$

where LSBa, LSBb, LSBc are the weights of the least-significant bits (LSBs) of the coefficients (defined at design time), while $a_{int,k}$, $b_{int,k}$, $c_{int,k}$ are integer variables to be found. It is worth noting that the choice of LSBa, LSBb, LSBc can be tailored to the adopted floating-point format in order to meet the target accuracy.
By substituting these quantized expressions of the coefficients into (15), we obtain a mixed-integer linear programming problem that can be solved in MATLAB with the intlinprog command, giving the values of the quantized coefficients that minimize the MRED.
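To make the role of the LSB weights concrete, the sketch below quantizes a coefficient by simple rounding to the nearest multiple of its LSB. This is a deliberately naive stand-in and not the method used in this paper: rounding each coefficient independently does not, in general, give the integers that minimize the MRED, which is why the mixed-integer formulation is solved instead. The function name `quantize` and the real-valued coefficients in the example are hypothetical.

```python
def quantize(value, lsb_exp):
    """Round value to the nearest multiple of LSB = 2**-lsb_exp.

    Returns the integer coefficient and the quantized real value.
    """
    lsb = 2.0 ** -lsb_exp
    integer = round(value / lsb)
    return integer, integer * lsb

# Hypothetical real coefficients of one stripe (not taken from Tables I-IV),
# quantized with LSBa = 2**-7, LSBb = 2**-1, LSBc = 2**-4.
print(quantize(0.9871, 7))
print(quantize(-0.6, 1))
print(quantize(0.82, 4))
```

A coarse LSBb of one half moves the hypothetical slope from -0.6 to -0.5, which illustrates why the remaining coefficients are better re-optimized jointly than rounded one at a time.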
Figure 4 shows the behavior of the MRED when the coefficients are quantized. In the figure, the MRED is plotted as a function of LSBc for N varying between 4 and 32, with LSBa fixed to $2^{-7}$ and LSBb equal to $2^{-1}$ or $2^{-3}$. We also report the error obtained with real (non-quantized) coefficients (see the black dashed line). In these simulations, the MRED is computed by considering $10^6$ divisions, performed with $10^6$ pairs of uniformly distributed numbers, expressed on 23 bits.
As shown in Fig. 4, the MRED exhibits a remarkable dependence on LSBc in all the cases. Indeed, a decrease in the value of LSBc, corresponding to a finer resolution of the coefficients $c_k$, leads to an improvement in precision, as expected.
On the other hand, a weaker dependence on LSBb is observed, particularly for N ≥ 16, as shown in Fig. 4c and 4d. In this case, in fact, the MRED achieved for LSBb = $2^{-3}$ is very close to the one achieved for LSBb = $2^{-1}$.
In addition, a proper choice of LSBa also leads to satisfactory performance while being less critical for the design. In this case, we found that LSBa = $2^{-7}$ is reasonable to achieve an acceptable MRED for fine values of LSBc.
The results in Fig. 4 indicate that selecting LSBc as $2^{-3}$ for N = 4 and in the range $2^{-4}$ to $2^{-7}$ for N ≥ 8 results in an acceptable error. Likewise, choosing LSBb = $2^{-1}$ is also a reasonable option. Based on these observations, we focus our attention on four test cases, with the aim of obtaining both accurate results and moderate hardware complexity. Tables I-IV collect the obtained values for the coefficients $a_{int,k}$, $b_{int,k}$, $c_{int,k}$ in the four considered cases.

IV. PROPOSED FLOATING-POINT DIVIDER
The hardware implementation of the proposed FPDME is depicted in Fig. 5a. The sign Sz is computed by XORing Sx and Sy, whereas a multi-operand adder computes the exponent Ez. The approximate mantissa division is performed in the ApprxDiv block. The h MSBs of My index a lookup table (LUT) that stores the quantized coefficients, while two multipliers and an adder compute the quotient. Since $b_{int,k}$ is always negative, we store its absolute value in the LUT. The circuit is synthesized targeting a standard-cell library, as detailed in Section V.
The approximate quotient $\varphi_k$ is computed by multiplying $c_{int,k}$ and $b_{int,k}$ with the mantissas and by adding $a_{int,k}$ to the products. With the aim of reducing the complexity of the multipliers, nt LSBs of the mantissas are truncated, obtaining the signals $Mx_{nt}$ and $My_{nt}$. We underline that nt can be chosen depending on the floating-point format in use and, accordingly, on the desired precision.
Moreover, the multipliers and the adder are organized in a fused carry-save arithmetic structure, named CSAS in the figure, to further optimize the hardware.
It is also worth noting that the CSAS computes only 12 bits of the quotient instead of 24, thus reducing the hardware complexity of the normalization process (detailed in the following). In general, the number of bits computed by the CSAS is $n_\varphi = 24 - nt + |\log_2(LSBc)|$.
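The arithmetic performed by the CSAS can be modelled at the integer level as in the sketch below, which truncates the mantissas, aligns the three terms to a common LSB, and reports the output width. It is a behavioural model under stated assumptions, not the carry-save circuit itself: the function name, the parameter names (`lsb_a_exp = 7` stands for LSBa = 2^-7, and so on), and the coefficient integers in the example are hypothetical, and the model assumes LSBc is not coarser than LSBb or than the truncated-mantissa-aligned LSBa.

```python
def approx_quotient_int(mx_bits, my_bits, a_int, b_abs, c_int,
                        nt=16, lsb_a_exp=7, lsb_b_exp=1, lsb_c_exp=4):
    """Integer model of a_int*LSBa - |b_int|*LSBb*My_nt + c_int*LSBc*Mx_nt."""
    mant_bits = 23 - nt                      # bits kept after truncation
    mx_nt = mx_bits >> nt                    # drop the nt LSBs of Mx
    my_nt = my_bits >> nt                    # drop the nt LSBs of My
    out_exp = mant_bits + lsb_c_exp          # result LSB weighs 2**-out_exp
    term_a = a_int << (out_exp - lsb_a_exp)
    term_b = (b_abs * my_nt) << (out_exp - (mant_bits + lsb_b_exp))
    term_c = c_int * mx_nt                   # already aligned to 2**-out_exp
    n_phi = 24 - nt + lsb_c_exp              # output width; MSB weighs 2**0
    return term_a - term_b + term_c, n_phi

# Mx = 0.5, My = 0.25 with hypothetical coefficients (126, -1, 13)
phi, n_phi = approx_quotient_int(1 << 22, 1 << 21, a_int=126, b_abs=1, c_int=13)
print(phi, n_phi, phi / 2 ** (n_phi - 1))
```

With nt = 16 and LSBc = 2^-4 the model returns a 12-bit result, in agreement with the width formula given above: 24 - 16 + 4 = 12.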
Finally, the Normalization block in Fig. 5 rearranges $\varphi_k$ in the interval [1, 2) to extract the mantissa Mz. As stated in Section II, the quotient varies in [0.5, 2) and, accordingly, its MSB (indicated as $\varphi_k[n_\varphi - 1]$ in the figure) has a weight of $2^0$. If $\varphi_k[n_\varphi - 1] = 0$, then $\varphi_k$ is in the range [0.5, 1) and the normalization process appends a zero at the least significant position in order to double the quotient (see the signal $\varphi_1$ in the Normalization block). Moreover, $\sim\varphi_k[n_\varphi - 1]$ is subtracted from the exponent for compensation, with "∼" representing the inversion operator.
Conversely, if $\varphi_k[n_\varphi - 1] = 1$, then $\varphi_k$ is already in [1, 2) and no further operation is required. In this case, the fractional part of $\varphi_k$ corresponds to Mz (see the signal $\varphi_2$ in the figure). In the architecture of Fig. 5, a multiplexer selects between $\varphi_1$ and $\varphi_2$, and the result is expressed on 23 bits by adding zeros at the least significant positions.
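The normalization step can be modelled on integers as in the following sketch. It is a behavioural illustration only: the function name `normalize` is hypothetical, the quotient is assumed to be an unsigned integer on n_phi bits whose MSB has weight 2^0, and n_phi - 1 is assumed not to exceed 23.

```python
def normalize(phi, n_phi, ez):
    """Bring phi into [1, 2) and return (Ez, 23-bit mantissa field of Z)."""
    msb = (phi >> (n_phi - 1)) & 1
    if msb == 0:                                   # phi in [0.5, 1)
        phi = (phi << 1) & ((1 << n_phi) - 1)      # double the quotient
        ez -= 1                                    # subtract ~MSB from Ez
    frac = phi & ((1 << (n_phi - 1)) - 1)          # drop the leading one
    mz_bits = frac << (23 - (n_phi - 1))           # zero-pad to 23 bits
    return ez, mz_bits

# 12-bit quotients: 1536 / 2**11 = 0.75 and 2560 / 2**11 = 1.25
print(normalize(1536, 12, 127))
print(normalize(2560, 12, 127))
```

The first quotient is doubled to 1.5 with the exponent decremented, giving a mantissa of 0.5; the second is already in [1, 2) and yields a mantissa of 0.25 with the exponent untouched.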

A. Error Metrics
Let us indicate the exact and the approximate quotients as Q and $Q_{apprx}$, respectively. We define the approximation error E = Q − $Q_{apprx}$, while the Relative Error Distance and the Mean Relative Error Distance are RED = |E/Q| and MRED = avg(RED), as shown in Section III, where avg(•) is the average operator. We also compute the Error Bias, defined as EB = avg(E/Q) [49], and the probability of having RED larger than 2% (referred to as PRED in the following).

TABLE IV COEFFICIENTS

The error metrics are computed by performing $10^6$ divisions, with $10^6$ pairs of random, uniformly distributed, single-precision floating-point numbers.
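The three metrics defined above can be computed from paired lists of exact and approximate quotients, as in this minimal Python sketch. The function name `error_metrics` and the four sample quotients are hypothetical and serve only to show the definitions at work; they are not results of this paper.

```python
def error_metrics(exact, approx, red_threshold=0.02):
    """Return (MRED, EB, PRED) for paired exact and approximate quotients."""
    rel = [(q - qa) / q for q, qa in zip(exact, approx)]    # E / Q, signed
    red = [abs(r) for r in rel]                             # RED = |E / Q|
    n = len(rel)
    mred = sum(red) / n                                     # avg(RED)
    eb = sum(rel) / n                                       # avg(E / Q)
    pred = sum(1 for r in red if r > red_threshold) / n     # P(RED > 2%)
    return mred, eb, pred

# Hypothetical quotients, for illustration only
exact = [2.0, 4.0, 1.0, 5.0]
approx = [1.98, 4.2, 1.0, 5.05]
print(error_metrics(exact, approx))
```

Because EB keeps the sign of the error, positive and negative deviations cancel in it, whereas they accumulate in the MRED.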
For the sake of comparison, the performance of the dividers [42], [44], [48], [49], and [50] is also shown. The divider [42], named ALD in the following, subtracts mantissas in the LNS representation, processing only the first q MSBs of Mx and My, with q = 8 in our trials. The work [49] approximates 1/(1 + My) using $2^d$ values, with d equal to 2 or 3, and exploits a truncated multiplier with t preserved columns. In the following, the divider [49] will be presented as LPCAD(d, t), with t = 4, 8. The work [50], named CADE in the following, divides the mantissas' plane into $2^P \times 2^P$ square regions and computes, for each region, an error compensation term expressed on L bits. For our study, we consider L = 8 and P = 3, 4. The design [44], referred to as TruncApp, exploits a linear approximation for the term 1/(1 + My), and employs only r bits for computing the quotient, with r = 4 in our trials. Finally, the work [48] involves $2^\alpha$ possible shift-and-add operations for realizing the division, with α defining the approximation level. Moreover, each operation involves β adders, whose addends are truncated to 5 bits. In the following, we refer to [48] as FPAD LαAβ.

TABLE V ERROR METRICS OF THE PROPOSED DIVIDER AND THE STATE-OF-THE-ART

Table V collects the error metrics for both the proposed divider and the state-of-the-art, with MRED and EB reported in percentage values. As expected, the performance of the architecture proposed in this paper depends on the number of partitions N, with the MRED improving from 1.5% (for N = 4) to 0.33% (for N = 32). PRED also exhibits a marked dependence, passing from $2.4 \times 10^{-1}$ to $3.2 \times 10^{-4}$, whereas EB remains almost constant. In addition, nt also has an effect on the accuracy of the divider, with the best approximation achieved when the number of truncated LSBs is low.

TABLE VI HARDWARE PERFORMANCES OF THE PROPOSED DIVIDER AND THE STATE-OF-THE-ART. SYNTHESIS WITH MINIMUM AREA AND POWER
L = 8 that achieves MRED of 0.65%. The other dividers exhibit poorer accuracy, with MRED on the order of 2% or larger. In this case, ALD and TruncApp show the worst results, with MRED around 4% and PRED of about $7 \times 10^{-1}$.

B. Hardware Performances
We have described the proposed dividers and the state-of-the-art dividers in Verilog HDL and synthesized the circuits in TSMC 28nm CMOS technology using a physical flow in Cadence Genus.
In a first experiment, we have imposed a very loose constraint on the maximum delay of the circuits (10ns), so that the synthesizer is able to implement minimum area and minimum power versions of the dividers.In this case we also synthesized the exact floating-point divider, chosen from the ChipAware library of the synthesizer.
In a second experiment, we have imposed a tighter constraint on the maximum delay (750ps), to investigate the performance when a higher operating frequency is required.In this second experiment, we have opted to exclude the exact divider due to the complexity of the circuit, making it impractical to meet the timing constraint.
In both experiments the power consumption is obtained by simulating the synthesized netlists with $10^5$ random inputs, with path delays annotated in a standard delay format (SDF) file and switching activity annotated in a toggle count format (TCF) file.
Table VI collects the results for the first experiment.The last two columns report the power-delay product (PDP) and the area-delay product (ADP).
All the investigated architectures drastically reduce the PDP with respect to the exact divider. The best results are shown by ALD and TruncApp, with PDP on the order of 3fJ. These architectures, however, are also the ones with the largest error. The proposed architecture shows a good trade-off between error and PDP. For instance, FPDME 4 (7, 1, 3) nt = 17 exhibits a lower PDP compared to all versions of LPCAD, CADE, and FPAD, and it also has a lower error (with the sole exceptions of CADE P = 4 L = 8, LPCAD(2, 8) and LPCAD(3, 8)). A similar behavior is also shown for the ADP.
In order to have a joint assessment of electrical and accuracy performances, Fig. 6 depicts the PDP and the ADP with respect to the MRED for both experiments.Here, implementations closer to the bottom-left corner exhibit low PDP/ADP with high accuracy, thus defining the Pareto front.
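The notion of Pareto front used here can be made concrete with a short sketch that keeps only the design points not dominated in both error and PDP. The function names and all numeric values below are hypothetical placeholders chosen for illustration; they are not measurements from Table VI.

```python
def pdp_fj(power_uw, delay_ns):
    """Power-delay product in fJ, since 1 uW x 1 ns = 1 fJ."""
    return power_uw * delay_ns

def pareto_front(points):
    """Return the names of the points not dominated in (MRED, PDP).

    points: list of (name, mred, pdp); lower is better on both axes.
    """
    front = []
    for name, mred, pdp in points:
        dominated = any(
            m2 <= mred and p2 <= pdp and (m2 < mred or p2 < pdp)
            for _, m2, p2 in points
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical (name, MRED in %, PDP in fJ) design points
designs = [("A", 0.3, 9.0), ("B", 1.5, 4.0), ("C", 2.0, 6.0),
           ("D", 4.0, 3.0), ("E", 0.7, 12.0)]
print(pareto_front(designs))
```

In this toy data set, C is dominated by B and E is dominated by A, so only A, B, and D remain on the front. The same function applies to the ADP by passing area-delay values in place of the PDP.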
As shown in Fig. 6a, the proposed dividers offer the best trade-off between PDP and MRED and are all on the Pareto front (highlighted by the black dashed line). Only LPCAD(3, 8) is close to the optimal curve, whereas the other implementations show worse behavior, with the only exceptions of ALD and TruncApp, which, however, show a large MRED. The proposed FPDME are on the Pareto front also in Fig. 6b, determining the best trade-off between ADP and MRED. Again, LPCAD(3, 8) is competitive, as are ALD and TruncApp for low accuracy. A similar trend is also shown in Fig. 6c and 6d for

A. Change Detection
Change detection is often employed in computer vision to highlight motion in subsequent frames. The division between pixels is suitable for detecting differences between images. Indeed, if objects do not move, their pixels are practically constant among the frames and, accordingly, their ratio is very close to 1. Conversely, the ratio is far from 1 in case of a change, thus highlighting a motion.
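The principle can be sketched with a pixel-wise ratio test in Python. Everything beyond the ratio idea itself is an assumption of this sketch: the function name `change_map`, the +1 offset used to avoid division by zero, and the tolerance of 0.2 around a ratio of 1 are hypothetical choices, and the `divide` argument is a hook where an approximate divider model could be plugged in.

```python
def change_map(frame_a, frame_b, divide=lambda x, y: x / y, tol=0.2):
    """Flag pixels whose ratio between two frames is far from 1.

    The +1 offset (an assumption of this sketch) avoids division by zero.
    """
    result = []
    for row_a, row_b in zip(frame_a, frame_b):
        result.append([abs(divide(a + 1.0, b + 1.0) - 1.0) > tol
                       for a, b in zip(row_a, row_b)])
    return result

# Two tiny hypothetical frames: only the bottom-right pixel changes markedly
frame_a = [[100, 100], [50, 200]]
frame_b = [[100, 102], [50, 40]]
print(change_map(frame_a, frame_b))
```

Small pixel fluctuations, such as 100 against 102, keep the ratio close to 1 and are not flagged, while the large change from 200 to 40 is.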
In this paragraph, we analyze the performance of the proposed divider and the state-of-the-art when changes are detected in the frames Walter Cronkite, Chemical Plant (far and close view), and Toy Vehicle, from the database [52]. For our assessments, we report the peak signal-to-noise ratio (PSNR), expressed in dB, and the mean structural similarity index (SSIM), commonly used to qualify algorithms in image and video processing. In addition, we also report, for each investigated divider, the average PSNR and the average SSIM over the four experiments. As shown in Table VIII, our proposal is competitive with the state-of-the-art and offers remarkable results, with SSIM very close to 1 in all the cases, and average PSNR in the range 46.5dB to 53.6dB for N ≥ 8. In addition, FPDME 32 (7, 1, 5) exceeds 50dB in all the trials and achieves the highest PSNR (58.7dB) with Toy Vehicle. The implementations ALD, TruncApp and FPAD show poorer performance, with an average PSNR lower than 40dB and an average SSIM of 0.938 in the case of TruncApp. The accuracy of LPCAD depends on the approximation parameters, with PSNR varying between 35dB and 48dB, whereas CADE performs better, showing a PSNR slightly less than 50dB. Figure 7 shows the image obtained by dividing the frames of Walter Cronkite. As shown, the results obtained with LPCAD(2, 4), LPCAD(3, 4), and TruncApp exhibit a visible degradation in the background, whereas the proposed dividers yield images practically unchanged with respect to the exact case.
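For reference, the PSNR between an image obtained with the exact divider and one obtained with an approximate divider can be computed as in the sketch below. The function name `psnr`, the 8-bit peak value of 255, and the two tiny images are assumptions made for illustration; the SSIM is not covered because its windowed computation does not fit a short example.

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    total, count = 0.0, 0
    for row_r, row_t in zip(reference, test):
        for r, t in zip(row_r, row_t):
            total += (r - t) ** 2
            count += 1
    mse = total / count                      # mean squared error
    if mse == 0.0:
        return math.inf                      # identical images
    return 10.0 * math.log10(peak * peak / mse)

# Hypothetical 2x2 images: exact result versus approximate result
exact_img = [[0, 255], [128, 64]]
approx_img = [[0, 254], [130, 64]]
print(round(psnr(exact_img, approx_img), 2))
```

Here the mean squared error is 1.25, which gives a PSNR of roughly 47.16 dB; identical images return infinity.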

B. JPEG Compression
As a further example, we assess the accuracy of approximate dividers in image compression. JPEG compression exploits the cosine transform and variable quantization to approximate images. The compression algorithm roughly quantizes the high frequencies of an image, whereas it employs a finer quantization step to approximate the low frequencies. In this way, the compression algorithm reduces the size of images in memory at the cost of a worse representation of the high frequencies, which are less evident to the human eye. In addition, an approximation factor Q, lying in the range [0, 100], defines the overall amount of compression by modifying the quantization steps, with Q = 0 and Q = 100 indicating the coarsest and the finest quantization, respectively. In our case, we employ the approximate dividers in the quantization phase, since a division between the pixels and the variable quantization steps is required. Three test images, Lena, Cameraman, and Peppers, are considered for our simulations, compressed with factors Q = 40, Q = 70 and Q = 100. For each Q and for each image, we compute the PSNR and SSIM by comparing the approximate and the exact results. Then, we average the PSNR and SSIM computed for each Q, reporting the respective values in Table IX. In addition, the overall average PSNR and SSIM are also shown in the last two columns of the table.
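The quantization phase in which the divider is used can be sketched as follows. The mapping from the factor Q to the quantization steps is not stated in the text; this sketch assumes the scaling convention of the IJG libjpeg implementation (which also clamps Q to at least 1), and the function names, the four base steps, and the four coefficients are illustrative choices. The `divide` argument is a hook where an approximate divider model could replace the exact division.

```python
def scale_quant_table(base_table, quality):
    """Scale quantization steps with a quality factor (assumed IJG convention)."""
    quality = min(max(quality, 1), 100)
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [min(max((q * scale + 50) // 100, 1), 255) for q in base_table]

def quantize_coefficients(coefficients, steps, divide=lambda x, y: x / y):
    """Quantize transform coefficients by dividing each one by its step."""
    return [round(divide(c, q)) for c, q in zip(coefficients, steps)]

base = [16, 11, 10, 16]                  # first entries of a luminance table
for quality in (40, 70, 100):
    print(quality, scale_quant_table(base, quality))

# Hypothetical transform coefficients quantized at quality 50
print(quantize_coefficients([-415.0, 30.0, -61.0, 27.0],
                            scale_quant_table(base, 50)))
```

Under this convention Q = 100 reduces every step to 1, so the division leaves the coefficients essentially unchanged, while lower Q values enlarge the steps and increase the loss.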
As can be observed, the proposed dividers are competitive with the state-of-the-art, exhibiting high values of both PSNR and SSIM. The best results are achieved in the case Q = 100, with PSNR larger than 55dB for N = 16, 32. In addition, FPDME 32 (7, 1, 5) is the only one able to achieve an average PSNR of 50dB. Figure 8 confirms these observations, since the images compressed with the proposed and the exact dividers are practically indistinguishable.

VII. CONCLUSION
In this paper, we have proposed a novel non-iterative approximate floating-point divider based on linear approximation.
In our divider, we have approximated the quotient (1 + Mx)/(1 + My) as a linear function of Mx with coefficients dependent on My. The coefficients have been calculated to minimize the Mean Relative Error Distance (MRED) of the approximation. To this end, the range of My has been partitioned into N sub-intervals and in each sub-interval the minimization of MRED has been formulated as a linear programming problem, whose solution gives the optimal coefficient values. Mantissa truncation and coefficient quantization have also been exploited to further optimize the design.
The hardware structure of the whole floating-point divider has been described in detail, and the performance of the proposed architecture has been compared with previously proposed approximate dividers. Our analysis shows that the proposed architecture outperforms the state of the art, offering the best trade-off between PDP/ADP and accuracy for a wide range of mean relative error distance values. We have also presented results for two image processing applications that both highlight the advantages of the proposed technique, exhibiting competitive performance in terms of peak signal-to-noise ratio (PSNR) and Mean Structural Similarity Index (SSIM).

Fig. 1. Floating-point single-precision representation of the real number A.
Fig. 6. a) PDP and b) ADP with respect to the MRED for minimum power and area implementations. c) PDP and d) ADP with respect to the MRED with a 750ps constraint on the maximum delay. The black line represents the Pareto front.
Figure 5b shows details of the CSAS in the case N = 8, LSBa = $2^{-7}$, LSBb = $2^{-1}$, LSBc = $2^{-4}$ and nt = 16. Here, $a_{int,k}$, $|b_{int,k}|$ and $c_{int,k}$ are expressed on 8, 2 and 4 bits, respectively, whereas $Mx_{nt}$ and $My_{nt}$ are on 23 − nt = 7 bits. Then, the first 4 blue rows are due to $Mx_{nt} \cdot c_{int,k}$, whereas the other 2 orange rows are related to $My_{nt} \cdot |b_{int,k}|$. The term $a_{int,k}$ is depicted in green. In addition, since $Mx_{nt}$ and $My_{nt}$ have an LSB of weight $2^{-(23-nt)} = 2^{-7}$, the products $Mx_{nt} \cdot c_{int,k}$ and $My_{nt} \cdot b_{int,k}$ have LSBs of weight $2^{-11}$ and $2^{-8}$, respectively.

Fig. 7. Change detection for Walter Cronkite image with proposed dividers and the state-of-the-art.

Fig. 8. JPEG compressions in the case of Peppers image with the exact and the proposed dividers, Q = 100.

TABLE VII HARDWARE PERFORMANCES OF THE PROPOSED DIVIDER AND THE STATE-OF-THE-ART. SYNTHESIS WITH A 750ps CONSTRAINT ON THE MAXIMUM DELAY

TABLE VIII PERFORMANCES OF THE PROPOSED DIVIDER AND THE STATE-OF-THE-ART IN CHANGE DETECTION APPLICATION

the faster implementations, where the proposed dividers define or are very close to the Pareto front.

TABLE IX PERFORMANCES OF THE PROPOSED DIVIDER AND THE STATE-OF-THE-ART IN JPEG COMPRESSION