Integer Representation of Decimal Numbers for Exact Computations

A scheme is presented and software is documented for representing as integers input decimal numbers that have been stored in a computer as double precision floating point numbers and for carrying out multiplications, additions and subtractions based on these numbers in an exact manner. The input decimal numbers must not have more than nine digits to the left of the decimal point. The decimal fractions of their floating point representations are all first rounded off at a prespecified location, a location no more than nine digits away from the decimal point. The number of digits to the left of the decimal point for each input number besides not being allowed to exceed nine must then be such that the total number of digits from the leftmost digit of the number to the location where round-off is to occur does not exceed fourteen.


Introduction
Mainstream computers base integer and floating point arithmetic on fixed word lengths. As a consequence, only values with a limited number of significant digits can be represented directly, so that the results of arithmetic operations may have to be rounded off or truncated. Such errors can be avoided or, at least, mitigated, by implementing special algorithms for the execution of arithmetic operations. A fully "exact arithmetic", however, would have to be based on quotients of integers for representing numerical values. In any case, a final limitation is due to finite memory.
The need for exact arithmetic became apparent during the development of software for generating triangular and tetrahedral nets from very large point sets. Typically, this need is not due to high accuracy requirements for results-the input data are often noisy or given up to only a few significant digits-but is rather due to the need to maintain the consistency of a combinatorial structure. Building or manipulating such geometry-based combinatorial structures requires the calculation of indicators such as determinants in order to evaluate their sign and to check for zero values: roundoff may lead to a false sign or zero value. An example considered later in this paper is to decide whether four given spatial points are coplanar. The approach of using exact computations for implementing computational geometry algorithms in a robust manner has been addressed in [2], [3], [4], [5], [6], [7]. Some computational geometry implementations ( [2], [3], [4], [5]) reduce computational effort by utilizing exact arithmetic selectively whenever a decision might be affected by round-off.
In this paper, we document software for exact integer arithmetic, accommodating an indeterminate number of A scheme is presented and software is documented for representing as integers input decimal numbers that have been stored in a computer as double precision floating point numbers and for carrying out multiplications, additions and subtractions based on these numbers in an exact manner. The input decimal numbers must not have more than nine digits to the left of the decimal point. The decimal fractions of their floating point representations are all first rounded off at a prespecified location, a location no more than nine digits away from the decimal point. The number of digits to the left of the decimal point for each input number besides not being allowed to exceed nine must then be such digits, for multiplication, addition, subtraction, but excluding division. We found that, in many computational geometry applications, decision variables such as determinants can be calculated without division. Also the sign of a decision variable stated as a quotient but not evaluated is readily derived from the signs of the numerator and denominator.
We also describe a preprocessing step, called "twointeger decomposition", which leads from floating point input to one composed of integers only. At the root of this step lies the concept of space as an integer grid of points, all of which have integer coordinates in some shared unit. After completing the transition from floating point numbers to intermediate representations as pairs of integers-prompted by the fact that Fortran 77 does not provide a double precision integer format-a polynomial decomposition creates the number representations to be used in the exact arithmetic calculations. Software for this preprocessing step together with software for exact integer arithmetic has been successfully incorporated into several computational geometry related programs such as REGTET [1].
In what follows, a "standard computer" is a computer that uses 64 bits of storage for a double precision number and 32 bits for an integer. Given a standard computer, even though it may not store exactly an input decimal number as a double precision floating point number, it is safe to assume that the number will be represented as accurately as possible by a double precision floating point number up to its fourteenth significant digit.

Two-Integer Decomposition
Let x(i), i = 1, ..., n, be a double precision array into which input numbers x i , i = 1, ..., n, have been read. The two-integer decomposition process is a preprocessing step that takes place before any computations based on the input data are carried out. It rounds off the numbers in the array at a prespecified location of their decimal fractions and decomposes each rounded off number into two integers that are saved in integer arrays, say ix(i), ix2(i), i = 1, ..., n. The rounded off numbers are then saved in array x(i), i = 1, ..., n.
Given integers k, l, 1 ≤ k ≤ 9, 0 ≤ l ≤ 9, k + l ≤ 14, and assuming each input number x i , i = 1, ..., n, has no more that k digits to the left of the decimal point, each number x(i), i = 1, ..., n, is rounded off at the lth digit of its decimal fraction and decomposed into two integers in one of two ways according to its size. If the absolute value of x(i) times (10.0d0) l is less than 2 30 (=1073741824), x(i) is multiplied by (10.0d0) l and rounded off at the decimal point. The resulting integer is then placed in ix(i) while ix2(i) is set to zero. Finally, x(i) is redefined to be the double precision value of integer ix(i) divided by (10.0d0) l . On the other hand, if the absolute value of x(i) times (10.0d0) l exceeds or equals 2 30 , x(i) is truncated at the decimal point. The resulting integer (absolute value less than 2 30 since k ≤ 9) is placed in ix(i). In addition, the signed decimal fraction obtained by subtracting the double precision value of this integer from the initial value of x(i) is multiplied by (10.0d0) l and rounded off at the decimal point. The resulting integer (absolute value less than 2 30 since l ≤ 9) is placed in ix2(i). Next, x(i) is redefined to be the double precision value of integer ix(i) plus the value obtained by dividing the double precision value of integer ix2(i) by (10.0d0) l . Finally, if the integer ix2(i) is zero then ix2(i) is set to 2 30 so that ix2(i) is zero if and only if the initial absolute value of x(i) (before the two-integer decomposition process) times (10.0d0) l is less than 2 30 .
The following is Fortran code for carrying out the two-integer decomposition process. Variables are either integer or double precision following convention. It is noted that while some loss in precision may occur at the time the input numbers are read and transformed into double precision floating point numbers, some additional loss in precision may occur here as well when the decimal point in a number is shifted by dividing or multiplying it by a multiple of 10.0d0, when the signed decimal fraction of a number is obtained by truncating the number at its decimal point and subtracting the result from the initial value of the number, and when a number is rounded off with the two-integer decomposition process. However once the two-integer decomposition process is completed all computations that follow, exact and otherwise, are carried out in terms of the arrays x(i), ix(i), ix2(i), i = 1, ..., n, under the assumption that for the purposes of the user for each i, i = 1, ..., n, x(i) represents closely enough the input number x i rounded off at the lth digit of its decimal fraction, and an integer (not necessarily stored by the computer) in terms of ix(i) and ix2(i) represents closely enough the input number x i times 10 l rounded off at the decimal point.

Polynomial Decomposition
Given an integer l, 0 ≤ l ≤ 9, let x i , i = 1, ..., n, be input numbers whose double precision floating point representations have been rounded off at the lth digit of their decimal fractions through the two-integer decomposition process. Let x(i), ix(i), ix2(i), i = 1, ..., n be the arrays produced by the two-integer decomposition process that contain the rounded off numbers and the two-integer decompositions. For each i, i = 1, ..., n, an integer J(i, l) is symbolically defined as follows (its actual value is not necessarily computed or stored by the computer). If ix2(i) equals zero then J(i, l) is set equal to ix(i). If ix2(i) equals 2 30 then J(i, l) is set equal to ix(i) · 10 l . Finally, if ix2(i) is neither zero nor 2 30 then J(i, l) is set to ix(i) · 10 l + ix2(i). In all cases for each i, i = 1, ..., n, J(i, l) is considered to approximate closely enough (for the purposes of the user) the input number x i times 10 l rounded off at the decimal point.
Set M to 2 15 . Given J(i, l), 1 ≤ i ≤ n, the polynomial decomposition process is a procedure (presented below in the form of Fortran subroutine decmp2) that decomposes the integer J(i, l) into a unique collection of inte- Integers a k , k = 1, ..., ika, are saved in an integer array, say ia(k), k = 1, ..., ika, and the collection of integers isga, ika, ia(k), k = 1, ..., ika, and the symbolic expression isga(Σ ika k=1 ia(k) · M k-1 ) are then called, respectively, the polynomial decomposition and the symbolic polynomial representation of J(i, l), with isga as the sign of the representation. For each i, i = 1, ..., n, the polynomial decomposition of the integer J(i, l) is identified each time an exact computation involving additions, subtractions, or multiplications is required that references the input number x i . During one such computation, for each i, 1 ≤ i ≤ n, if the number x i is referenced in the computation, once the polynomial decomposition of the corresponding integer J(i, l) is identified, each reference of x i in the computation is replaced by the symbolic polynomial representation of J(i, l). The computation then takes effect as a sequence of additions, subtractions, or multiplications of symbolic polynomial representations with the final result being itself the symbolic polynomial representation of some integer. This final result can usually be used in only one of two ways. If it is known that for some positive integer m the integer that is equal to the final symbolic polynomial representation is approximately equal to the product of (10 l ) m and the true value of the computation, then this integer is computed approximately as a double precision floating point number from its symbolic polynomial representation and the true value of the computation is then approximately obtained by dividing it by ((10.0d0) l ) m . On the other hand, if the purpose of the computation is simply that of obtaining the sign of the true result then the sign of the final symbolic polynomial representation is a satisfactory answer.
The concepts of polynomial decomposition and symbolic polynomial representation defined above for J(i, l), 1 ≤ i ≤ n, can also be defined for any integer K (not necessarily stored by the computer) in the same manner. Accordingly, the following is a Fortran subroutine called decomp for finding the polynomial decomposition isga, ia(k), k = 1, 2, (ika is already known to equal 2) of an integer iwi (stored by the computer) with absolute value less than 2 30 . Here mhalf equals 2 15 (=32768). In particular if isclu is set to 10 l then isclu is less than 2 30 (since l ≤ 9) so that the polynomial decomposition isgu (equal to 1), iu(i), i = 1, 2, (iku is already known to equal 2) of isclu can be obtained by calling subroutine decomp with a Fortran instruction as follows.

Multiplying Symbolic Polynomial Representations
Given the polynomial decompositions isga, ika, ia(k), k = 1, ..., ika, and isgb, ikb, ib(k), k = 1, ..., ikb, of two integers K 1 and K 2 , respectively, the following is a Fortran subroutine called mulmul that produces the polynomial decomposition isgo, iko, io(k), k = 1, ..., iko, of the integer K 1 · K 2 by multiplying the symbolic polynomial representation of K 1 by that of K 2 (as polynomials) to produce a symbolic polynomial representation of K 1 · K 2 from which the polynomial decomposition of K 1 · K 2 can be obtained. Here nkmax is the dimension of the arrays ia, ib, io in the calling routine and mhalf equals 2 15 . It is noted that the value of mhalf is of importance here since given integers i, j, 15 , so that the product ia(i) · ib(j) is less than 2 30 and therefore can be stored in a 32 bit integer word.

Subtracting Symbolic Polynomial Representations
Given the polynomial decompositions isga, ika, ia(k), k = 1, ..., ika, and isgb, ikb, ib(k), k = 1, ..., ikb, of two integers K 1 and K 2 , respectively, the following is a Fortran subroutine called muldif that produces the polynomial decomposition isgo, iko, io(k), k = 1, ..., iko, of the integer K 1 -K 2 by subtracting the symbolic polynomial representation of K 2 from that of K 1 (as polynomials) to produce a symbolic polynomial representation of K 1 -K 2 from which the polynomial decomposition of K 1 -K 2 can be obtained. Here nkmax is the dimension of the arrays ia, ib, io in the calling routine and mhalf equals 2 15 . It is noted that by setting isgb equal to -isgb the polynomial decomposition of K 1 + K 2 can also be obtained with this subroutine.

Application: Locating a Point Relative to a Plane
Given an integer n, n ≥ 4, let S be a set of n points in 3-dimensional space. Given an integer l, 0 ≤ l ≤ 9, let x i , y i , z i , i = 1, ..., n, be the input decimal coordinates of the points in S, and assume that their double precision floating point representations have been rounded off at the lth digit of their decimal fractions through applications, one per coordinate, of the two-integer decomposition process. Accordingly, let x(i), ix(i), ix2(i), y(i), iy(i), iy2(i), z(i), iz(i), iz2(i), i = 1, ..., n, be the arrays produced by the three applications of the two-integer decomposition process that contain the rounded off x-, y-, z-coordinates and their two-integer decompositions.
Given points p 1 , p 2 , p 3 in S that are vertices of a nondegenerate triangle, a fundamental problem in computational geometry is that of finding the location of a point p 4 in S relative to the plane H that contains the triangle. Let H + be the open half-space defined by H for which p 1 , p 2 , p 3 appear in a counterclockwise direction around the boundary of the triangle when looking at the triangle from H + . Let Hbe the other half-space defined by H. Determining in which of H, H + , H -, the point p 4 is located may not on occasion be satisfactorily done using floating point arithmetic. Accordingly, the following is a Fortran subroutine called crsinn for doing this using polynomial decompositions. On output the sign isgo (-1, 0, 1) of some polynomial decomposition determines the location of p 4 (H -, H, H + ).
This routine actually does more. It produces polynomial decompositions isgox, ikox, iox(k), k = 1, ..., ikox, isgoy, ikoy, ioy(k), k = 1, ..., ikoy, isgoz, ikoz, ioz(k), k = 1, ..., ikoz, of integers that are the coordinates of a vector v pointing into H + and perpendicular to H. It also produces the polynomial decomposition isgo, iko, io(k), k = 1, ..., iko, of an integer whose sign isgo determines the location of p 4 and whose value when divided by both 10 l and the length of v is the perpendicular distance from p 4 to H. Here mhalf equals 2 15 , mfull equals 2 30 , and if ir, isec, ithi, ifou are locations in the arrays ix, ix2, etc. corresponding to the points p 1 , p 2 , p 3 , p 4 , respectively. In addition, isclp(k), k = 1, 2, is an array such that the polynomial decomposition of 10 l is isclp(1), isclp(2) (the sign of 10 l and the dimension of array isclp are already known to be 1 and 2, respectively).  Sometimes besides knowing the location of the point p 4 relative to the plane H it may be desirable to know the perpendicular distance from p 4 to H. The following is Fortran code for this purpose. It uses the polynomial decompositions that are part of the output of subroutine crsinn. Variables here are either integer or double precision following convention. Here r215 equals (2.0d0) 15 , dscle equals (10.0d0) l , and dist is the resulting signed perpendicular distance. In addition, it is assumed that subroutine doubnm (presented below) The following is subroutine doubnm that was called above.

Numerical Examples
Twelve lines follow, each line containing three numbers. Each line corresponds to a point in 3-dimensional space, and the three numbers in the line correspond to the x-, y-, z-coordinates of the point, in that order. Given i, 1 ≤ i ≤ 12, point i is the point corresponding to the ith line. It is assumed that the coordinates of the twelve points are read into double precision arrays x(i), y(i), z(i), i = 1, ..., 12, so that x(i), y(i), z(i) contain the x-, y-, z-coordinates, respectively, of point i. Given l equal to 8, through the two-integer decomposition process, the numbers above are rounded off at the lth = 8th digit of their decimal fractions and saved in x(i), y(i), z(i), i = 1, ..., 12, so that then they appear as follows. Each rounded off coordinate is also decomposed into two integers. Twelve lines follow, each line containing six integers. For each i, i = 1, ..., 12, the first two integers in the ith line are the two integers into which the xcoordinate of point i is decomposed. Similarly the next two integers correspond to the y-coordinate, and the last two to the z-coordinate. The two-integer decompositions of the twelve points are then saved into ix(i), ix2(i), iy(i), iy2(i), iz(i), iz2(i), i = 1, ..., 12, in the obvious manner. It is noted that when mfull = 2 30 = 1073741824 appears as the second integer corresponding to a coordinate it is to be interpreted as a zero. Given l as above equal to 8 and setting isclu to 10 l = 10 8 , by calling subroutine decomp the polynomial decomposition of isclu is found to be isgcl, isclp(1), isclp (2), with isgcl equal to 1, isclp(1) equal to 24832 and isclp(2) equal to 3051 (10 8 = 24832 + 3051 · mhalf = 24832 + 3051 · 2 15 ).
As an example of how exact computations are carried out that reference coordinates of the input points given above, the computation that is the product of the x-coordinate of point 2 and the y-coordinate of point 4 minus the product of the y-coordinate of point 2 and the x-coordinate of point 4 is described. Using rounded-off numbers the result of the computation should equal x(2) · y(4) -y(2) · x(4), i. e., (38.000000000) · (-49.840807040) -(7.049967880) · (85.557213020).
Finally, an example is given for the purpose of describing the process of locating a point relative to a plane that contains three other points. Here the point whose location is desired is point 12 as given above, and the three other points are point 1, point 2, point 8 also as given above.

Summary
A scheme has been presented and software has been documented for transforming into a series of integers input decimal numbers that have been read into a computer as double precision floating point numbers and for carrying out multiplication, addition and subtraction operations based on these numbers using their integer representations. The total number of significant digits of each input number must not be greater than 14, and the number of digits to the left of the decimal point must not exceed 9. Through a preprocessing step the double precision floating point representation of each input decimal number is rounded off at a prespecified location of its decimal fraction, a location no more than 9 digits to the right of the decimal point, and the rounded off number is decomposed into two integers. All operations that follow involving the input number are then carried out in terms of the rounded off double precision floating point number and when this is not satisfactory in terms of the two integers. This scheme has been successfully incorporated into several computational geometry related programs such as REGTET [1]. Other programs that incorporate this scheme can be found at http://math.nist.gov/~JBernal/JBernal_Sft.html. Besides being used for locating a point relative to a plane, in these programs the scheme has also been used for locating a point relative to a sphere, for computing the intersection of a line and a plane, for computing the center of a sphere, etc.