Specialized processors and algorithms for computing standard functions

This article discusses the problem of creating a specialized processor device that increases the number of mathematical functions it can calculate. The results are a single calculation scheme for all elementary functions, an algorithm for lowering the degree of a polynomial by the Chebyshev polynomial method, and a single calculation scheme based on the Horner scheme. Compared with the prototype, the proposed specialized processor device introduces a reprogramming block.


Introduction
The goal is to create a specialized processor device that can calculate a larger number of mathematical functions. This is achieved by introducing a random-access memory unit, a demultiplexer, and a recording control unit into the device, which allows it to be reprogrammed with commands from the central processing unit of the computer in which this specialized processing unit is installed. We consider the calculation of power polynomials and give methods for the transition from software to hardware implementation [1].
Usually, all algorithms for calculating functions share a common part: they use appropriate formulas to reduce the entire interval of variation of the argument of a given function to a certain standard interval, where the approximation is carried out.
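The reduction to a standard interval can be illustrated with a minimal Python sketch for e^x. It assumes the argument is split as x = n + u, where n is the nearest integer and |u| ≤ 0.5, so that e^x = e^n * e^u. The function name exp_reduced, the 14-term Taylor polynomial, and the use of math.e ** n to undo the reduction are illustrative assumptions, not the article's own scheme.

```python
import math

# Taylor coefficients 1/k! for k = 0..13 (14 terms, an illustrative choice)
TAYLOR_EXP = [1.0 / math.factorial(k) for k in range(14)]

def exp_reduced(x):
    """Approximate e**x by reducing x to the standard interval [-0.5, 0.5]."""
    n = round(x)                      # nearest integer to the argument
    u = x - n                         # reduced argument, |u| <= 0.5
    acc = 0.0
    for c in reversed(TAYLOR_EXP):    # polynomial in u, evaluated by Horner's scheme
        acc = acc * u + c
    return math.e ** n * acc          # undo the reduction: e**x = e**n * e**u

print(exp_reduced(3.7), math.exp(3.7))
```

Only the polynomial on the standard interval depends on the approximation method; the reduction step itself is the same whichever polynomial is used.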
The time spent on such a reduction is the same for all algorithms. It may turn out that one algorithm has high performance but takes up a large amount of memory, while another has low performance but needs little memory. The question is which of them is more efficient for hardware implementation. The choice of a specific algorithm depends on the goals of the hardware implementation and on the technical means at our disposal. We now consider a generalized algorithm for calculating elementary functions by approximation with polynomials of the best uniform approximation.
The existing algorithm for calculating elementary functions [2] does not meet the requirements of special-purpose machines. Therefore, the proposed method for calculating elementary functions f(x), either by expanding them into series or by approximating them with polynomials of the best uniform approximation, is based on calculating a polynomial of degree n in a given argument x, with coefficients that are constant for a given function. If the computer allocates an area of fast memory to store these coefficients, then the time for calculating an elementary function depends on how quickly the polynomial of degree n can be calculated [3,4]. The falling cost of permanent memory and the use of only two arithmetic operations (multiplication and addition) make this method, and accordingly the algorithm, very promising for the hardware implementation of elementary functions, especially in special-purpose machines.
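Evaluating a polynomial of degree n with stored coefficients using only multiplication and addition can be sketched with the Horner scheme. The function name horner and the lowest-degree-first ordering of the coefficient list are assumptions made for this illustration.

```python
def horner(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + ... + coeffs[n]*x**n.

    Uses exactly n multiplications and n additions for a degree-n polynomial.
    """
    acc = coeffs[-1]                  # start from the highest-degree coefficient
    for c in reversed(coeffs[:-1]):
        acc = acc * x + c             # one multiply and one add per coefficient
    return acc

print(horner([1.0, 2.0, 3.0], 2.0))   # 1 + 2*2 + 3*4
```

Because each step is a single multiply followed by a single add, the running time is proportional to the degree, which is why lowering the degree directly shortens the calculation.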
For a function that expands into a Taylor series, it is more convenient to approximate the function uniformly on a segment with an accuracy of ԑ using the method of finding the best uniform approximation. We now formulate a generalized algorithm for calculating elementary functions (Fig. 1). The algorithm begins by selecting a segment of the Taylor series that approximates the given function with higher accuracy than the specified accuracy ԑ.
As a result, we obtain a new type of polynomial of degree m, where m « n, and estimate its error. Step 7: calculate the error estimates. We then check the accuracy condition: if it is met, go to step 9; otherwise, return to step 1. The result is a polynomial that approximates the function with the given accuracy ԑ and has a minimum degree.
Computer experiments with the well-known methods "Function expansion into a Taylor series", "Best approximation", and "Fractional-rational approximation" show that obtaining the trigonometric functions sin x, cos x, tan x, and cot x with an accuracy of 10^-10 in a certain interval requires from 9 to 13 terms of the Taylor series, whereas the proposed method of finding polynomials of the best uniform approximation using the generalized algorithm gives the same accuracy with 5 terms. To calculate the inverse trigonometric functions arcsin x and arctan x, a polynomial of minimum degree 6 is sufficient, compared with 11 terms of the Taylor series; the method of fractional-rational approximation requires 9 addition operations and 1 division operation.
The results show that the algorithms we developed are more efficient than the other methods. For example, reducing the degree of a polynomial while maintaining accuracy, together with the generalized algorithm, gives a 2-fold increase in calculation speed and reduces hardware costs compared with the Taylor series expansion method and the fractional-rational approximation method.
The proposed method and algorithm are applied in special-purpose computers, processors, and aircraft.
Efficiency is usually measured by the costs required to obtain certain results, by an integral characteristic, or by a system efficiency criterion.
From this point of view, a high-speed algorithm and device are proposed for specialized digital computers.
Based on the above, we calculate the function exp(x). To calculate e^x, we use the Taylor series expansion on the interval [-1, 1]. It is enough to take 14 terms of the series for the absolute error not to exceed 10^-11, i.e., |R14| < 10^-11.
Applying the method of finding the polynomial of the best uniform approximation, we lower the degree of the series. The error after that does not exceed 10^-10. Of the 14 terms, 4 remain.
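The idea of lowering the degree with Chebyshev polynomials can be sketched in Python as follows. This is only an illustration under stated assumptions: the Taylor polynomial is rewritten in the Chebyshev basis, the highest terms are simply truncated, and the error is sampled on [-1, 1] with no range reduction. All function names (power_to_cheb, cheb_to_power, economize, max_error, lowest_degree) are hypothetical, and the degree the search returns under these assumptions is not meant to reproduce the term counts quoted above, since the article's own procedure may differ.

```python
import math

def horner(c, x):
    """Evaluate sum(c[k] * x**k) using only multiplication and addition."""
    acc = 0.0
    for ck in reversed(c):
        acc = acc * x + ck
    return acc

def power_to_cheb(c):
    """Rewrite power-basis coefficients c[k] as coefficients of T_0, T_1, ..."""
    cheb = [0.0] * len(c)
    xk = [1.0]                          # x**0 expressed in the Chebyshev basis
    for ck in c:
        for n, v in enumerate(xk):
            cheb[n] += ck * v
        nxt = [0.0] * (len(xk) + 1)     # multiply the current power of x by x
        for n, v in enumerate(xk):
            if n == 0:
                nxt[1] += v             # x*T_0 = T_1
            else:
                nxt[n + 1] += v / 2     # x*T_n = (T_{n+1} + T_{n-1}) / 2
                nxt[n - 1] += v / 2
        xk = nxt
    return cheb

def cheb_to_power(a):
    """Convert Chebyshev coefficients back to power-basis coefficients."""
    out = [0.0] * len(a)
    t_prev, t_cur = [1.0], [0.0, 1.0]   # T_0 and T_1 in the power basis
    for k, ak in enumerate(a):
        t = t_prev if k == 0 else t_cur
        for i, v in enumerate(t):
            out[i] += ak * v
        if k >= 1:                      # T_{k+1} = 2x*T_k - T_{k-1}
            t_next = [0.0] + [2.0 * v for v in t_cur]
            for i, v in enumerate(t_prev):
                t_next[i] -= v
            t_prev, t_cur = t_cur, t_next
    return out

def economize(c, m):
    """Drop the Chebyshev terms above degree m; return power-basis coefficients."""
    return cheb_to_power(power_to_cheb(c)[:m + 1])

def max_error(c, f, samples=2001):
    """Largest sampled |p(x) - f(x)| on [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]
    return max(abs(horner(c, x) - f(x)) for x in xs)

def lowest_degree(c, f, eps):
    """Smallest m whose economized polynomial meets eps, or None if none does."""
    for m in range(len(c)):
        if max_error(economize(c, m), f) <= eps:
            return m
    return None

taylor = [1.0 / math.factorial(k) for k in range(14)]   # 14 Taylor terms of e**x
m = lowest_degree(taylor, math.exp, 1e-10)
print(m, max_error(economize(taylor, m), math.exp))
```

The loop in lowest_degree mirrors the check-and-repeat step of the generalized algorithm: the degree is accepted only once the sampled error meets the tolerance. For the same degree, the economized polynomial is markedly more accurate than the plain truncated Taylor series, which is the effect the method relies on.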
For |u| ≤ 0.5, e^u is calculated by the following formula:
16. C14 = C13 * C12;
17. e^x = C14.
It can be noted that this algorithm is easily applied to obtain arbitrary precision (on demand).
We also propose a method of parallel computing and parallelization of the device, based on two arithmetic operations (multiplication and addition), to increase the speed of devices [7].
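The text does not say which parallel evaluation scheme is used. One well-known option that needs only multiplication and addition is Estrin's scheme, sketched below in Python purely as an illustration; the function name estrin and the lowest-degree-first coefficient order are assumptions.

```python
def estrin(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) by Estrin's scheme.

    Every pair (c[2i] + c[2i+1] * power) within one pass is independent of the
    others, so in hardware all pairs of a pass could be computed in parallel.
    """
    level = list(coeffs)
    if not level:
        return 0.0
    power = x
    while len(level) > 1:
        if len(level) % 2:
            level.append(0.0)         # pad so that the coefficients pair up
        level = [level[i] + level[i + 1] * power
                 for i in range(0, len(level), 2)]
        power = power * power         # x, x**2, x**4, ...
    return level[0]

print(estrin([1.0, 2.0, 3.0, 4.0], 2.0))   # 1 + 2*2 + 3*4 + 4*8
```

With enough multiply-add units the number of passes grows with the logarithm of the number of coefficients, whereas the Horner scheme needs one sequential step per coefficient.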
The disadvantages of the known device are the limited class of tasks it can solve, i.e., the ability to calculate only one function, y = ln(1 + x), and its slow performance. In the known device, this function is approximated by a segment of the Taylor series, which is calculated according to the Horner scheme, where x is the difference between the argument value and the nearest integer. We propose a device that evaluates formula (1) sequentially, i.e., the product is calculated first (that is, the process is iterative). Therefore, the time for calculating the function in our device, without selection, is the time for sampling from the ROM.
The proposed device is more advanced: it expands the class of tasks to be solved, since it can calculate all functions that reduce to evaluation by the Horner scheme, and it increases the speed by combining the multiplication operation with fetching from ROM.
This goal is also achieved by additionally introducing a third register and a bus buffer into the device. The first port of the bus buffer is connected to the computer, and the second port is connected to the inputs of the first, second, and third registers, as well as to the outputs of the first and second groups of three-state gates. The second group of inputs of the adder is connected to the outputs of the third register, and the first and second groups of inputs of the multiplier are connected to the outputs of the first and second registers, respectively. The outputs of the adder are connected to the inputs of the first group of gates, and the corresponding outputs of the control unit are connected to the control inputs of the third register, the bus buffer, and the ROM.
Figure 2 shows a block diagram of the proposed device. The circuit includes registers 1, 2, and 3, a multiplier 4, an adder 5, the first 6 and second 7 gate groups, a ROM 8, a bus buffer 9, and a control unit 10. The first group of inputs of the multiplier 4 is connected to the outputs of register 1, and the second group of inputs is connected to the outputs of register 2. The first group of inputs of the adder 5 is connected to the outputs of the multiplier, and the second group of inputs is connected to the outputs of register 3. The inputs of the first group of gates 6 are connected to the outputs of the adder, and the inputs of the second group of gates 7 are connected to the outputs of the ROM 8. The first port of the bus buffer 9 is connected to the inputs of registers 1, 2, and 3 and to the outputs of the first and second groups of gates, and the second port of the bus buffer is connected to the data bus of the computer. The control inputs of the registers, ROM, gates, and bus buffer are connected to the corresponding outputs of the control unit 10.
In the initial state, gates 6 and 7, as well as bus buffer 9, are in the third (high-impedance) state; in addition, bus buffer 9 is configured to receive information from the computer.
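The Horner iteration on the datapath of Fig. 2 can be modeled behaviorally in Python. The sketch below is hypothetical: the text does not state which register holds which operand, so the roles of registers 1, 2, and 3, the ROM layout (highest-degree coefficient first), the feedback of the adder output into register 2, and the timing model in cycle_estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class HornerDatapath:
    rom: list            # assumed ROM layout: coefficients, highest degree first
    reg1: float = 0.0    # assumed role: the argument x
    reg2: float = 0.0    # assumed role: the running Horner value
    reg3: float = 0.0    # assumed role: the coefficient fetched from ROM

    def run(self, x):
        self.reg1 = x
        self.reg2 = self.rom[0]
        for addr in range(1, len(self.rom)):
            self.reg3 = self.rom[addr]        # ROM fetch (may overlap the multiply)
            product = self.reg1 * self.reg2   # multiplier fed by registers 1 and 2
            self.reg2 = product + self.reg3   # adder: product plus register 3
        return self.reg2

def cycle_estimate(steps, t_mul, t_add, t_rom, overlapped):
    """Hypothetical timing model: overlapping hides the shorter of multiply and fetch."""
    step = max(t_mul, t_rom) + t_add if overlapped else t_mul + t_add + t_rom
    return steps * step

print(HornerDatapath(rom=[3.0, 2.0, 1.0]).run(2.0))   # 3*x**2 + 2*x + 1 at x = 2
print(cycle_estimate(5, 4, 1, 2, True), cycle_estimate(5, 4, 1, 2, False))
```

Under this model the ROM fetch costs no extra time whenever it is no slower than the multiplication, which is one way to read the speed gain attributed to combining the two operations.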
The creation of specialized computing devices designed to solve special problems is accompanied by well-known difficulties associated with the need to increase their speed and calculation accuracy.
Methods for constructing such devices are based on the implementation of new calculation methods, improvement and wider use of a complex of independent computing units.
A structural scheme for organizing the calculation of elementary functions is proposed, which includes a device that converts numbers from a generalized interval to a standard one and vice versa. A device has also been developed that converts floating-point numbers to fixed-point numbers and vice versa.
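The conversion between floating-point and fixed-point numbers can be illustrated with a short Python sketch. The signed format with a configurable number of fractional bits, the rounding to the nearest representable value, and the function names to_fixed and from_fixed are assumptions; the text does not specify the number format of the device.

```python
def to_fixed(x, frac_bits=16):
    """Convert a float to a signed fixed-point integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))

def from_fixed(q, frac_bits=16):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

q = to_fixed(0.75, frac_bits=8)
print(q, from_fixed(q, frac_bits=8))
```

Values that are not exact multiples of 2^-frac_bits are rounded, so the round trip is accurate only to about half of one fixed-point step.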
To improve the technical characteristics of specialized devices, in the conditions of widespread use of large integrated circuits and the development of control systems for a digital computer, a structural organization of the calculation of elementary functions is proposed to ensure the high accuracy of the system. Computer control systems and their development largely depend on the available central processor and the ability to connect additional external devices, taking into account the possibility of organizing a backbone architecture.
Studies have shown that the proposed method for calculating elementary and special functions has a single architectural solution for calculating all functions. The homogeneity of the structure of the developed device is ensured by reducing the operations to be calculated to recurrent formulas. Implementing the results of the work and its main scientific provisions will not only save time in solving tasks but also reduce hardware costs and financial resources.