A note on Dekker’s FastTwoSum algorithm

More than 45 years ago, Dekker proved that it is possible to evaluate the exact error of a floating-point sum with only two additional floating-point operations, provided certain conditions are met. Today the respective algorithm for transforming a sum into its floating-point approximation and the corresponding error is widely referred to as FastTwoSum. Besides some assumptions on the floating-point system itself (all of which are satisfied by any binary IEEE 754 standard-conformant arithmetic), the main practical limitation of FastTwoSum is that the summands have to be ordered according to their exponents. In most preceding applications of FastTwoSum, however, a more stringent condition is used, namely that the summands have to be sorted according to their absolute values.
In remembrance of Dekker’s work, this note recalls the original assumptions for an error-free transformation via FastTwoSum. Moreover, we generalize the conditions for arbitrary bases and discuss a possible modification of the FastTwoSum algorithm to extend its applicability even further. Subsequently, a range of programs exploiting the wider applicability is presented. This comprises the OnlineExactSum algorithm by Zhu and Hayes, an error-free transformation from a product of three floating-point numbers to a sum of the same number of addends, and an algorithm for accurate summation proposed by Demmel and Hida.


Introduction and notation
A floating-point number system with base β, mantissa length p, and exponent range [e_min, e_max] may be defined via

F := {m · β^e : m, e ∈ Z, −β^p < m < β^p, e_min ≤ e ≤ e_max}. (1)

Let F be accompanied by a set of floating-point operations {⊕, ⊖, ⊗, …} that approximate their real equivalents {+, −, ·, …} in accordance with some mapping fl : R → F. More specifically, x ⊚ y = fl(x • y) for all x, y ∈ F, where • can be any supported operation between two numbers and ⊚ denotes its floating-point counterpart. If the mapping fl(·) conforms to a rounding from the IEEE 754 floating-point standard, we obtain a model for an arithmetic that is in line with the same standard.
If not stated otherwise, we henceforth assume the operations on F to be evaluated in rounding to nearest, i.e., the mapping fl : R → F satisfies ∀r ∈ R, f ∈ F : |fl(r) − r| ≤ |f − r|.
For instance, ⊕: F × F → F is called nearest-addition if it approximates the addition over real numbers by a nearest number within F, that is, ∀x, y ∈ F : |(x ⊕ y) − (x + y)| = min{| f − (x + y)| : f ∈ F}.
Another frequently considered assumption on floating-point approximations is faithful rounding. We call an operation faithfully rounded if there lies no other floating-point number between the rounded and the real result.
The FastTwoSum procedure first appeared in 1965 as a part of Kahan's compensated summation algorithm [6]. Kahan introduced his algorithm as a simpler alternative to Wolfe's summation method, which is based on cascaded accumulators [19]. However, Kahan neither provided an error estimate for his algorithm nor gave the conditions for an error-free transformation.
Function (s, e) ← FastTwoSum(x, y)
s ← x ⊕ y
t ← s ⊖ x
e ← y ⊖ t

The introduction of the FastTwoSum algorithm as a technique for extending the available floating-point precision as well as the proof of its error-free transformation property in 1971 is due to Dekker [2]. The modern term FastTwoSum was likely coined by Shewchuk [16]. Occasionally, this algorithm is also referred to as Quick-Two-Sum.
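In a binary IEEE 754 arithmetic such as Python's binary64 floats, the three operations read as follows (a sketch; the transformation is error-free whenever Dekker's precondition on the exponents is met, for which |x| ≥ |y| is the commonly used sufficient condition):

```python
def fast_two_sum(x: float, y: float) -> tuple[float, float]:
    """Dekker's error-free transformation of a sum in binary64."""
    s = x + y   # floating-point approximation of x + y
    t = s - x   # exact under the precondition
    e = y - t   # exact rounding error: s + e == x + y
    return s, e

# y is entirely swallowed by the rounded sum, yet returned intact in e.
s, e = fast_two_sum(1.0, 2.0**-60)
assert (s, e) == (1.0, 2.0**-60)
```

Only two operations beyond the sum itself are needed, as opposed to the five additional operations of the branch-free TwoSum algorithm.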
Let e(f) = e_f denote the exponent of f ∈ F according to a representation f = m_f · β^{e_f} that complies with definition (1). Dekker proved the following relation between the input and the return values of FastTwoSum.
Theorem 1 Let x, y ∈ F with the base of F being restricted to β ∈ {2, 3}. If ⊕ realizes a nearest-addition, ⊖ realizes some faithful-subtraction, and

e(x) ≥ e(y), (2)

then

s + e = x + y with s = x ⊕ y. (3)
Furthermore, in [2] properly truncated addition was considered as well, for which Dekker proved that (3) holds true without the restriction on β. Nevertheless, due to the absence of fast hardware implementations of properly truncated rounding, this result is of rather theoretical interest and typically disregarded.
Another rarely considered property of Dekker's theorem emerges from his definition of an exponent e(x) to a floating-point number x ∈ F. Since the original inequality e(x) ≥ e(y) is difficult to check, usually only the more stringent condition |x| ≥ |y| is regarded. In the following section we will take a closer look at the generality of the former inequality and generalize Dekker's result for arbitrary bases β. Subsequently, a range of applications will be presented for which the transformation by FastTwoSum is error-free without the usually considered inequality |x| ≥ |y| being met.

Multiple representations
From a mathematical perspective, and maybe also owing to the widespread usage of the IEEE 754 standard, we typically associate the exponent of a floating-point number with the exponent of its normalized representation. This may be a major reason why the wider applicability of the FastTwoSum algorithm has gone unrecognized.
Definition (1) allows multiple representations for many of its elements. As an example, let β = 2, p = 4 and consider the number x = 3. Supposing a sufficiently wide range of feasible exponents, there are three representations that comply with (1), namely 3 · 2^0, 6 · 2^{−1}, and 12 · 2^{−2}. Hence, e(x) can be any integer from the set {−2, −1, 0}. In [2], Dekker simply assumes that there are feasible representations of x, y ∈ F for which e(x) ≥ e(y) is satisfied.
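The multiplicity of representations can be made concrete by a brute-force search over definition (1) (a sketch; the function name and the exponent range are illustrative):

```python
def feasible_exponents(x, beta=2, p=4, e_min=-10, e_max=10):
    """All exponents e admitting a representation x = m * beta**e
    with integral mantissa |m| < beta**p."""
    exps = []
    for e in range(e_min, e_max + 1):
        m = x / beta**e
        if m == int(m) and abs(m) < beta**p:
            exps.append(e)
    return exps

# x = 3 with beta = 2, p = 4: m in {12, 6, 3} gives e in {-2, -1, 0}.
assert feasible_exponents(3) == [-2, -1, 0]
```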
Using the notation of the unit in the last place (ULP), it is possible to give an equivalent, more explicit condition than the one due to Dekker. The ULP is defined for real numbers r ∈ (−β^{e_max + p}, β^{e_max + p}) as

ulp(r) := β^{max(⌊log_β |r|⌋ + 1 − p, e_min)} for r ≠ 0, with ulp(0) := β^{e_min}. (4)

Hence, the unit in the last place of a nonnegative number x ∈ F is the step length to its successor.

Lemma 1 Inequality (2) is satisfied for some representation of x, y ∈ F complying with (1) if, and only if,

∃k ∈ Z : x = k ulp(y). (5)
Proof Let e_x^max denote the maximal exponent over all feasible representations of x, and let e_y^min denote the minimal exponent of y, accordingly. Then an equivalent condition to (2) is e_x^max ≥ e_y^min. In the trivial case x = 0, we have e_x^max = e_max and the equivalence with (5) is evident. We henceforth assume x ≠ 0. By (1) and (4), we have β^{e_y^min} = ulp(y). Another consequence of (1) is that β^{e_x^max} is the maximal power of β that divides x. Hence e_x^max ≥ e_y^min holds if, and only if, ulp(y) divides x, which is condition (5).

It is noteworthy that, in [15], the sufficiency of condition (5) was already proved for β = 2 and rounding to nearest in every operation. However, neither was [15, Lemma 3] linked to Dekker's original condition nor has the result been exploited for any of the applications given in Section 4.
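In binary64, the criterion of Lemma 1 can be checked directly with the standard library (a sketch of the condition only; the function name is illustrative):

```python
import math

def dekker_condition(x: float, y: float) -> bool:
    """Lemma 1: e(x) >= e(y) holds for some representation iff
    x is an integer multiple of ulp(y)."""
    return math.fmod(x, math.ulp(y)) == 0.0

# x = 3 is a multiple of ulp(2.0) = 2**-51 ...
assert dekker_condition(3.0, 2.0)
# ... whereas a tiny x is not a multiple of ulp(1.0) = 2**-52.
assert not dekker_condition(2.0**-60, 1.0)
```

Note that dekker_condition(0.5, 1.0) also holds, illustrating that |x| ≥ |y| is merely sufficient, not necessary.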

Generalization for arbitrary bases
In the previous section it was recalled that Dekker's result is applicable to more general constellations than x, y ∈ F with |x| ≥ |y|. Before discussing applications where weaker presuppositions are beneficial, we discuss here a possible generalization of Dekker's result for arbitrary bases β.
A typical example to pinpoint the necessity of the restriction β ∈ {2, 3} in Theorem 1 is x = 99, y = 98 with β = 10, p = 2, for which FastTwoSum returns s = 200 and e ∈ {−12, −2}. The ambiguity of e is due to the faithful evaluation of t ← s ⊖ x. In either case the identity s + e = x + y is not satisfied. Apparently, in the context of floating-point systems with larger bases, presupposition (2) is not sufficient to ensure (3). In the following we give an alternative result that covers Dekker's theorem as a special case.

Theorem 2 Consider the FastTwoSum algorithm for given input x, y ∈ F. Let ⊕ and ⊖ realize a nearest-addition and some faithful-subtraction, respectively. If there is a representation of x such that condition (7) holds, then the computed s, e ∈ F satisfy (3).
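The failing example x = 99, y = 98 can be replayed with Python's decimal module, which emulates β = 10 with p = 2 via the context precision (a sketch; decimal rounds every operation to nearest by default):

```python
from decimal import Decimal, getcontext

getcontext().prec = 2              # beta = 10, p = 2, round-to-nearest

x, y = Decimal(99), Decimal(98)
s = x + y                          # fl(197) = 2.0E+2
t = s - x                          # fl(101) = 1.0E+2
e = y - t                          # 98 - 100 = -2 (exact)

assert int(s) == 200 and int(e) == -2
assert int(s) + int(e) == 198      # != 197: not an error-free transformation
```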
Proof As a consequence of (7), clearly (2) is satisfiable. Moreover, by definition (1) the difference of two floating-point numbers a, b is a multiple of β^{min{e(a), e(b)}}, such that, in the absence of overflow,

|a − b| ≤ β^p β^{min{e(a), e(b)}} implies a ⊖ b = a − b. (8)

A similar statement applies to the addition of two floating-point numbers.
We use inequality (2) and implication (8) to prove (3), first verifying the equality t = s − x and then e = y − t. The proof of the former is by distinction into three cases.
Case 1 Since x, y, and s are necessarily multiples of β^{e(x)} = β^{e(y)} for some representation of s, implication (8) yields t = s − x.

Case 2 Suppose that |x + y| ≤ β^p β^{e(y)} is satisfied for all feasible representations of y. By (8) we have s = x + y and thereby t = s − x = y ∈ F.
For β ∈ {2, 3} there is no number in F larger than (β^p − β/2) β^{e(x)} but also smaller than β^p β^{e(x)}. It is thus straightforward to show that condition (7) and Dekker's original condition e(x) ≥ e(y) are equivalent for base β ∈ {2, 3}.
In the proof of Theorem 2, we are not so much concerned with the sum x ⊕ y being evaluated in rounding to nearest. The only property of nearest-addition that we make use of is given in (9). In particular, for any mapping ⊕ : F × F → F satisfying (10), the inequality in (7) can be adapted so that (3) remains valid. Unfortunately, (10) is not necessarily satisfied for a faithfully rounded summation. If y is smaller than half the distance between x and its nearest floating-point neighbor but x ⊕ y ≠ x, then (10) does not hold true and typically x ⊕ y − (x + y) ∉ F. On the other hand, as long as the exponents of x and y are not too far apart, we can prove that the rounding error of any faithfully rounded sum x ⊕ y necessarily lies in F as well. To be precise, one can show the following.

Remark 1 If x and y satisfy condition (11), then (3) is true also for faithfully rounded addition in line 1 of FastTwoSum.

Proof
We use a similar argument as for Theorem 2. The proof has to be modified in two places. In Case 1, faithful rounding only implies |s − (x + y)| < ulp(x + y) ≤ β β^{e(x)}, without the factor 1/2. Nevertheless, by x, y, s ∈ β^{e(x)} Z and the tighter bound on |y| given in (11), we still have t = s − x. Also the argument in (9) is no longer applicable for faithful rounding. Nevertheless, |s − (x + y)| ≤ β^p β^{e(y)} still holds and can be shown as follows. The right inequality in (11) is equivalent to ulp(x) ≤ β^p β^{e(y)}, so the bound is immediate whenever |s − (x + y)| ≤ ulp(x). On the other hand, if |s − (x + y)| > ulp(x), then ulp(x + y) > ulp(x) and therefore |y| ≥ |x + y| − |x| ≥ β^{−1} ulp(x + y). Hence the outer inequality in (9) remains valid.
It is noteworthy that for β = 2 the right inequality in (11) is again equivalent to (2). Since most modern computers implement IEEE 754 binary floating-point formats, generalizations of certain FastTwoSum applications on these platforms are straightforward.
Theorem 2 and Remark 1 pinpoint the actual conditions under which the transformation due to FastTwoSum is error-free. Though these conditions do not directly restrict the base β, the limitation illustrated in (6) remains.
To overcome this issue, we need to modify the original algorithm. Define the constant c_β := (β^p − ⌊(β − 2)/2⌋) β^{−p} and assume c_β ∈ F. The condition c_β ∈ F necessarily holds if the normal range of F encompasses (β^{−1}, 1], which is a reasonable assumption. We extend the code of FastTwoSum as follows.

Function (s, e) ← c_β-FastTwoSum(x, y)
s ← x ⊕ (c_β ⊗ y)
t ← s ⊖ x
e ← y ⊖ t

In return for an additional operation and losing the beneficial property s = x ⊕ y, c_β-FastTwoSum allows an error-free transformation of any pair x, y ∈ F satisfying e(x) ≥ e(y), independent of the choice of β.

Theorem 3
Consider the c_β-FastTwoSum algorithm for given input x, y ∈ F with c_β := (β^p − ⌊(β − 2)/2⌋) β^{−p} ∈ F. Let ⊗, ⊕, and ⊖ realize a nearest-multiplication, a nearest-addition, and some faithful-subtraction, respectively. If x is a multiple of ulp(y), i.e., condition (2) is satisfiable, then s, e ∈ F satisfy (12).

Proof To avoid a separate argument for the underflow case, we exploit the notation of the unit in the first place (ufp) introduced in [14], i.e., ufp(a) := β^{⌊log_β |a|⌋} for a ≠ 0 and ufp(0) := 0. Certain equalities, such as ufp(a) = β^{p−1} ulp(a), are only valid for numbers in the normalized range of F. In the following argument, we only use relations that are also satisfied in the underflow case, for instance, ufp(a) ≤ β^{p−1} ulp(a).
By definition of c_β and |y| ≤ (β^p − 1)β^{e(y)}, we obtain an upper bound on |c_β · y|. Since (β^p − β/2) β^{e(y)} is the unique nearest floating-point number to this upper bound and e(x) ≥ e(y), the rounded value ỹ satisfies the corresponding bound with respect to β^{e(x)}.
Hence x, ỹ are in accordance with (7), and one can exploit the first part of the proof of Theorem 2 to show that t = s − x.
For β ∈ {2, 3} the scaling factor is c β = 1 and c β -FastTwoSum works just like the original implementation. Our code demonstrates a possible generalization of Dekker's FastTwoSum algorithm.
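Assuming the scaling constant evaluates to c_β = 0.96 for β = 10, p = 2, the modification repairs the example x = 99, y = 98 from above (a sketch with Python's decimal module; the function name is illustrative):

```python
from decimal import Decimal, getcontext

getcontext().prec = 2                      # beta = 10, p = 2

C_BETA = Decimal("0.96")                   # assumed c_beta for p = 2

def c_beta_fast_two_sum(x: Decimal, y: Decimal):
    """Sketch of c_beta-FastTwoSum: scale y before the addition."""
    y_tilde = C_BETA * y                   # fl(0.96 * 98) = 94
    s = x + y_tilde                        # fl(99 + 94)   = 1.9E+2
    t = s - x                              # fl(190 - 99)  = 91 (exact)
    e = y - t                              # fl(98 - 91)   = 7  (exact)
    return s, e

s, e = c_beta_fast_two_sum(Decimal(99), Decimal(98))
assert int(s) + int(e) == 197              # error-free: s + e == 99 + 98
```

The price is that s = 190 is no longer the nearest rounding of 197, in line with the loss of the property s = x ⊕ y noted above.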
The above approach can be generalized further for faithful-addition as in Remark 1. For this purpose, we just need to redefine the scaling factor as c_β := 1 − β^{1−p} + β^{−p}, assume p ≥ 2, and adapt the error estimate together with the respective argument. The statement remains true if the product c_β ⊗ y is rounded faithfully. We leave the analysis to the well-disposed reader. If embedded or low-level programming is used, alternative approaches to compute a suitable ỹ are available. One possibility is to round y towards zero into a floating-point format with the same base and exponent range as F but a mantissa length reduced by one, i.e., p − 1. Such a rounding is easily implemented by resetting the last mantissa bit in a normalized representation of the respective number. Also ỹ ← sign(y) · min{|y|, (β^p − β)β^{e(y)}} is an option that can be realized efficiently by applying suitable integer operations solely to the mantissa bits of y. For the sake of clarity and transparency, here we refrain from going any further into detail.

Applications
The examples in the following subsections serve to illustrate the wider applicability of the FastTwoSum algorithm. We follow the same notation as above: in accordance with (1), p denotes the mantissa length, β the base, and e_min, e_max define the feasible range of exponents. Unless otherwise specified, we assume that the arithmetic operations are evaluated in rounding to nearest. Moreover, we generally assume the absence of overflow. All other assumptions, including possible restrictions on the base β as well as exceptions for underflow, are explicitly mentioned for each case individually.

Error-free transformation - single exponent summation
As an immediate application of Dekker's original theorem, let us consider the recursive summation of floating-point numbers with the same ULP. Since all intermediate sums are multiples of this ULP, these numbers may be added accurately using FastTwoSum. The respective error term can be summed up without introducing any errors by applying plain floating-point addition; at least until the error grows above β p times the respective ULP.
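The structure just described can be sketched as follows (an assumed rendering of Algorithm 1; the function names are illustrative):

```python
def fast_two_sum(s, x):
    """Error-free under Dekker's condition: s + x == sum_ + err."""
    sum_ = s + x
    t = sum_ - s
    return sum_, x - t

def single_ulp_sum(xs):
    """Recursive summation of addends sharing one ULP: every
    intermediate sum is a multiple of that ULP, so each FastTwoSum
    call is error-free, and the error terms add without rounding
    while their sum stays below beta**p times the common ULP."""
    s, e = xs[0], 0.0
    for x in xs[1:]:
        s, t = fast_two_sum(s, x)
        e += t
    return s, e

# binary64 addends in [1, 2), all with ulp = 2**-52
u = 2.0**-52
s, e = single_ulp_sum([1 + 5*u, 1 + 9*u, 1 + 3*u, 1 + 7*u])
assert s + e == 4 + 24*u                # exact: s + e equals the true sum
```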
To prove that the transformation due to Algorithm 1 is error-free for a limited number of summands, we first show the following two auxiliary results.
Lemma 2 For given numbers s, x ∈ F, choose e, l_s, l_x, u_s, u_x ∈ Z such that l_s β^e ≤ s ≤ u_s β^e and l_x β^e ≤ x ≤ u_x β^e.

If s ⊕ x is rounded faithfully, then implications (13a) and (13b) hold.

Proof The left-hand side of (13a) implies |l_s + l_x| β^e ≤ β^{p+e}, such that (l_s + l_x)β^e either lies in the underflow range of F or is itself a floating-point number. In the former case, the result is evident due to error-free summation in the underflow range. On the other hand, for (l_s + l_x)β^e ∈ F, faithfully rounded evaluation and (l_s + l_x)β^e ≤ s + x imply (l_s + l_x)β^e ≤ s ⊕ x. The implication (13b) can be shown by a similar argument.
Lemma 3 can be proved by a simple induction argument using Lemma 2. We exploit this result to show the desired behavior of Algorithm 1.

Corollary 1 Let the given x_1, …, x_n ∈ F share a common ULP as above. If n complies with restriction (15), then Algorithm 1 transforms Σ_{i=1}^{n} x_i error-free into s + e.

Remark 2 The statement in Corollary 1 remains true for faithful-addition if β = 2 and the restriction on n in (15) is adapted accordingly.

Remark 3 Moreover, the transformation by Algorithm 1 remains error-free without the restriction on β but with rounding to nearest if we adapt the restriction on n and replace the FastTwoSum calls with their c_β-FastTwoSum equivalents.
Proof The initial assumptions on the addends x_i imply a similar property for the intermediate values s_j of s, i.e., all s_j are multiples of the common ULP. Thus, the requirements for an error-free transformation are met for each call of FastTwoSum. It remains to show that the summation of the error terms does not involve further rounding errors.
In each call of FastTwoSum the variable s is updated simply by adding the respective summand x_i; hence, s_i := s_{i−1} ⊕ x_i for i = 2, …, n. Let k be the index of the summand with maximum absolute value, such that ∀i : |x_i| ≤ |x_k| < β^p ulp(x_k).
Since this upper bound is a power of β, we can apply Lemma 3 to bound the intermediate sums s_i for i = 2, 3, …, n. By definition of q, we have 2q ≥ p + 1 + log_β 2. Let r := ⌊log_β n⌋. Then n < β^q implies r < q. Since each error term is as well a multiple of ulp(x_k), we have e ⊕ t = e + t ∈ F in every iteration of the for-loop; the summation is error-free.

The argument for Remark 2 is very similar. However, for faithful-addition |a + b| ≤ β^t only implies |a ⊕ b − (a + b)| < β^{t−p}, so that we lose the factor 1/2 in (16). To prove that the resulting upper bound on the accumulated error is less than or equal to β^p ulp(x_k), we distinguish the cases n < β^{q_f − 1} and β^{q_f − 1} ≤ n ≤ β^{q_f}/(β + 1) + β^{p − q_f}. Both cases can be shown by similar arguments as above.

For the proof of Remark 3, we follow again a similar approach. Due to the slightly worse estimate for |e| in (12), we have to update the inequality for the overall sum of errors accordingly. The cases n < β^{q−1} and β^{q−1} ≤ n ≤ β^q/(β + 1) + 2β^{p−q} − 6β(β − 2)/2 may then be treated individually using the inequalities from above.
A good example for the benefit of Corollary 1 is the OnlineExactSum algorithm introduced in [20]. The core element of this algorithm is the addition of all summands into respective accumulators, according to the exponent of the most significant digit of each summand. To every possible exponent position an accumulator pair is assigned: one floating-point number for the approximate sum and another for the corresponding error. The authors, Zhu and Hayes, advised to use Dekker's procedure together with the error sum after an if-statement branching on a comparison of the exponents of the intermediate sum and the current summand. Corollary 1 demonstrates that the branching is not necessary and that their bound on the number of summands until possible loss of digits can be improved. Moreover, Remarks 2 and 3 show that the algorithm also works for faithfully rounded operations and that it can easily be modified for general bases β, requiring only one more operation at each step instead of the three additional operations that would be introduced if we replaced FastTwoSum with TwoSum [8, Theorem B, 4.2.2].

Error-free transformation - ThreeProduct
Many adaptive and accurate algorithms for problems involving products of three numbers use error-free transformations to transform these terms into unevaluated sums of four floating-point numbers. As examples, we mention the adaptive algorithms for the 3D orientation problem given in [3,4,12,16]. Here we designate the algorithm that realizes this transformation as FourSumThreeProduct.
The subroutine TwoProduct is a well-known algorithm [2,11] for the transformation of a product of two floating-point numbers into an unevaluated sum of two floating-point numbers. If neither under- nor overflow occurs, this transformation is error-free. To be more specific, we have (17) in line 1 of FourSumThreeProduct. From (17) and the respective conditions for lines 2 and 3, the equality s_1 + Σ_{i=2}^{4} s_i = Π_{i=1}^{3} x_i is evident. Nevertheless, with the aforementioned applications in mind, this transformation is improvable. We will show that the error of the sum fl(s_2 + s_3) can be added to s_4 without introducing another rounding error. It is therefore possible to replace s_2, s_3, s_4 with only two addends. In particular, we prove that, although |s_2| ≥ |s_3| and even ulp(s_2) ≥ ulp(s_3) do not generally hold true, condition (7) is always satisfiable. Thus, it is possible to use FastTwoSum without any restriction on the base β.
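For binary64, TwoProduct can be realized without a fused multiply-add by Dekker's multiplication with Veltkamp's splitting (a sketch; the constant 2^27 + 1 is specific to p = 53):

```python
_SPLITTER = 2.0**27 + 1.0      # Veltkamp splitting constant for p = 53

def _split(a: float) -> tuple[float, float]:
    """Split a into a_hi + a_lo with at most 26 significant bits each."""
    c = _SPLITTER * a
    a_hi = c - (c - a)
    return a_hi, a - a_hi

def two_product(x: float, y: float) -> tuple[float, float]:
    """Error-free transformation x * y == p + e (no under-/overflow)."""
    p = x * y
    x_hi, x_lo = _split(x)
    y_hi, y_lo = _split(y)
    e = ((x_hi * y_hi - p) + x_hi * y_lo + x_lo * y_hi) + x_lo * y_lo
    return p, e

p, e = two_product(1.0 + 2.0**-30, 1.0 + 2.0**-30)
assert (p, e) == (1.0 + 2.0**-29, 2.0**-60)   # (1 + 2**-30)**2 exactly
```

On a platform with a correctly rounded FMA, the error term is obtained more cheaply as fma(x, y, -p).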

Lemma 4 Consider the procedure ThreeProduct and assume the absence of underflow errors within the FourSumThreeProduct call. Then (18) holds.
Proof In the absence of underflow errors, the FourSumThreeProduct transformation is free of errors. We will prove (18) by validating s̃_2 + s̃_3 = Σ_{i=2}^{4} s_i. The following argument applies independently of a scaling by a power of β, provided overflow and underflow do not occur. In this respect, and by triviality of the case x_1 x_2 x_3 = 0, we henceforth assume without loss of generality ulp(x_i) = 1 for i = 1, 2, 3. Let t_h and t_l be the output of TwoProduct in line 1 of FourSumThreeProduct, and denote by fl↑(·) a rounding towards +∞. Bounding the absolute value of t_h and combining this bound with (17), we further derive corresponding bounds on the remaining addends. The error term s_2 of the product x_1 ⊗ t_h is necessarily a multiple of ulp(x_1) ulp(t_h). Hence, there is a representation of s_2 satisfying β^{e(s_2)} ≥ ulp(t_h), by which condition (7) is satisfied. By |s_2 + s_3| ≤ |s_2| + |s_3| ≤ β^{2p}, we have |s̃_3| = |s̃_2 − (s_2 + s_3)| ≤ (1/2) β^p, so that s̃_3 + s_4 ∈ Z lies in F and s̃_3 ⊕ s_4 = s̃_3 + s_4 is evaluated without error.

Accurate summation of preordered addends
As a final example for the applicability of FastTwoSum, we consider a summation approach due to Demmel and Hida. In [3], the authors were concerned with recursive summation of floating-point numbers that are sorted according to their ULP in nonascending order. For the summation via an extended floating-point register with k additional bits of precision, and assuming that the number of addends is bounded by 1 + 2^k/(1 − 2^{−p}), Demmel and Hida proved a small relative error (≈ 1.5 ulp) of the computed result. However, the availability of extended precision formats depends on the CPU architecture as well as the programming language. If such a format is not available, high-precision numbers need to be emulated. In this context, we consider the DoubleDouble type implemented in the QD library [5].
The algorithm for adding a floating-point double number to a DoubleDouble number requires 10 operations. This is fewer than the 14 additions used in the respective implementation in the DoubleDouble library [1], but still improvable for our purpose. For the summation within the loop of Algorithm 2, we simply took the code from the QD library and replaced the TwoSum call with its FastTwoSum equivalent. This is possible due to the ordering of the addends. Though Algorithm 2 requires only 7 operations per addition into the double-word accumulator (s_h, s_l), this pair of p-bit floating-point numbers behaves almost the same as an actual 2p-bit floating-point number.
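The loop body just described can be sketched as follows (an assumed structure based on the description; 3 + 1 + 3 = 7 operations, with FastTwoSum justified by the nonascending ULP order of the addends):

```python
def fast_two_sum(x, y):
    s = x + y
    t = s - x
    return s, y - t

def dd_accumulate(s_h: float, s_l: float, x: float):
    """Sketch: add the double x into the double-word pair (s_h, s_l)."""
    t_h, v_l = fast_two_sum(s_h, x)     # 3 ops, error-free by ordering
    t_l = s_l + v_l                     # 1 op: fold x's error into s_l
    return fast_two_sum(t_h, t_l)       # 3 ops: renormalize the pair

s_h, s_l = 1.0, 0.0
for x in (2.0**-30, 2.0**-40, 2.0**-60):   # nonascending ULP order
    s_h, s_l = dd_accumulate(s_h, s_l, x)
assert (s_h, s_l) == (1.0 + 2.0**-30 + 2.0**-40, 2.0**-60)
```

The final addend falls below the last bit of s_h, yet survives exactly in s_l, which is precisely the double-word behavior exploited by Algorithm 2.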
this implies that s l + v l is representable by p mantissa digits and therefore evaluated without rounding error.
Case 2 Assume ulp(s_max) ≤ ulp(s_h + s_l + x_i). Then (21) implies the corresponding chain of inequalities. If these inequalities are actually equalities, the computation is error-free. On the other hand, if the outer inequality is strict, then p ≥ 2 ensures the required bound. Hence, t_h has no significant digits whose exponents are smaller than the exponent of the digit at the rounding position. The number represented by t_h + t_l results from a nearest rounding of the base-β representation of t_h + v_l + s_l at the position with value ulp(s_l + v_l). Together with ulp(s_l + v_l) ≤ β^{−p} ulp(s_h + s_l + x_i), this yields (20).
Case 3 Here s_l < 0 and s_h = fl(s_h + s_l) ≤ β^{p−1} ulp(s_max). Since the difference between β^{p−1} ulp(s_max) and its neighboring floating-point numbers is strictly greater than x_i, the only feasible choice for s_h is s_h = β^{p−1} ulp(s_max). This also implies v_l = x_i ≥ 0, −1/(2β) ulp(s_max) ≤ s_l, and s_l + x_i < 0. The remainder follows by a similar argument as in Case 2.

Case 4 Suppose that none of the previous cases apply, so that s_h + s_l ≥ β^{p−1} ulp(s_max) > s_h + s_l + x_i and |x_i| < β^{−1} ulp(s_max).
Similarly to the above, it follows that s_h = β^{p−1} ulp(s_max) is the only feasible choice. Together with s_l ≥ 0 and |v_l| ≤ (1/2) ulp(s_h + x_i) = 1/(2β) ulp(s_max), this gives the desired bound. Using once again the argument from Case 2, we prove (20).

For the proof of the second statement of Corollary 2, we distinguish two cases. First, assume that x_i > −(1/2) s_h and therefore (1/β) s_h ≤ (1/2) s_h ≤ t_h. Then |s_l| ≤ (1/2) ulp(s_h) ≤ (β/2) ulp(t_h), |v_l| ≤ (1/2) ulp(t_h), and p ≥ 2 yield the claim. On the contrary, suppose x_i ≤ −(1/2) s_h; then a similar argument completes the proof.
If the considered arithmetic obeys an unambiguous tie-breaking rule, this result can be proved also for p = 1. Nevertheless, due to the absence of practical relevance, we skip the argument for this case.
To treat floating-point systems with bases other than 2 or 3, we may replace the FastTwoSum calls in lines 2 and 4 of Algorithm 2 with their TwoSum equivalents, or save two operations per iteration by using c_β-FastTwoSum instead. The latter modification requires p ≥ 3 and may cause a loss of accuracy of two mantissa digits. Moreover, although Remark 1 is not applicable here, a generalization to faithful-summation is possible if we use any of the means of computing a suitable ỹ described at the end of Section 3.
For the sake of clarity, we refrain from discussing either of the above-mentioned modifications and leave them to the well-disposed reader. Instead, we conclude this note by deducing an error estimate for the output of Algorithm 2 similar to the one in [3]. In exchange for a tighter estimate, unlike [3, Theorem 1], the following result only regards numbers of addends in the range from 2 to β^p + 1. On the other hand, due to the restriction to our specific problem and the use of techniques from optimization, our proof is much more compact than the argument by Demmel and Hida.

Theorem 4
For given x ∈ F^n, let s_h, s_l ∈ F be evaluated according to Algorithm 2. If β ∈ {2, 3}, p ≥ 2, and 2 ≤ n ≤ β^p + 1, then the error of s_h + s_l with respect to Σ_{i=1}^{n} x_i admits the estimate derived at the end of the following proof.

Proof Let t_{i+2} denote the computed approximation represented by the unevaluated sum t_h + t_l in the i-th step of the for-loop of Algorithm 2. Moreover, let k denote the index at which the accumulation is erroneous for the first time, i.e., t_k ≠ t_{k−1} + x_k = Σ_{i=1}^{k} x_i. By design the initial transformation is always error-free, and therefore k ≥ 3. With regard to I := {k, k + 1, …, n}, define u_s := max{ulp(t_i) : i ∈ I} as well as the index sets I_1 and I_2. In the context of estimate (20), the first erroneous accumulation satisfies a corresponding lower bound. Thus, there is no power of β larger than β^{−p−1} ulp(t_k) that divides both x_k and t_{k−1}. By the ordering in line 1 of Algorithm 2, the same is true for all subsequent addends. In a similar way, using the definition of I_2, we derive the stricter bound for the respective indices. Denote by n_1 and n_2 the cardinalities of the sets I_1 and I_2, respectively. Note that |I \ (I_1 ∪ I_2)| ≤ n − 2 − n_1 − n_2 and ∀i ∈ I_1 : |t_i| ≥ β^{p−1} u_s. Without loss of generality, we further assume that I_1 is not empty. This is possible because I_1 = ∅ implies I = ∅ and therefore the absence of approximation errors. By a similar argument as for Lemmas 2 and 3, we derive a lower bound on |t_n|. On the other hand, Corollary 2 implies individual error bounds for the indices in I_1 and I_2. By combining these inequalities, we derive the estimate

((1/2) n_1 β^{−p} u_s + n_2 β^{−p−1} u_s) / (β^{p−1} u_s − (n − 2 − n_1 − n_2) β^{−1} u_s − n_2 β^{−2} u_s) · |t_n| = (((1/2) β^{1−p} n_1 + β^{−p} n_2) / (β^p − n + 2 + n_1 + (1 − β^{−1}) n_2)) · |t_n|.

Conclusion
In most previous works that involve the use of Dekker's FastTwoSum algorithm, not only is the floating-point system restricted to bases β ∈ {2, 3}, but it is also assumed that the summands x, y satisfy |x| ≥ |y|. In this note, we recalled that Dekker's original result is more general than this. The three examples in the previous section and further examples in the literature, including the accurate summation algorithms introduced in [13,15], demonstrate a wider applicability of FastTwoSum.
Theorem 2 generalizes Dekker's condition for floating-point systems with larger bases β. Here our result is used to show that the transformation by ThreeProduct is error-free independent of the choice of β. It can also be used to prove similar generalizations for the algorithms in [13,15].
Moreover, we introduced a modified version of FastTwoSum that requires four instead of three operations but enables us to apply the FastTwoSum approach in cases where the conditions of Theorem 2 are not met. In the first and the third application discussed above, this is a better alternative than using the TwoSum function, whose implementation requires six basic operations.
We also brought up the applicability of FastTwoSum in the presence of faithful rounding. This can be useful if we work on a platform that, for the sake of performance or for other reasons, does not support rounding to nearest. The consideration of faithful rounding is also necessary if one has no control over possible changes of rounding modes caused by other routines.