Load balancing in data distribution systems

The problem of distributing large amounts of information across server stations is studied. Existing methods for distributing the data of heavily loaded web applications are analyzed. A system for computing load distribution using a balancing server is proposed. The software necessary for the functioning of the system and the server management interface are described.

The analysis and comparison of existing monitoring and control systems for load distribution led to the following conclusions:
− all the systems considered store data about servers and can decide which server to choose for downloading information;
− all the systems considered can work with databases: they process information, receive notifications of actions on servers, and operate with minimal hard disk load;
− the Nagios system relies on CPU utilization to ensure stable operation during data processing, which is its advantage over the other systems;
− the Icinga and Cacti systems are characterized by low CPU utilization; at the same time, their advantage is high operating speed.
It should be noted that all the systems considered share two advantages: open source code and availability for free use. Their disadvantages are the need to adapt the code to user tasks and the complexity of building functions that choose a server based on complete information about its status and the characteristics of end users.
To solve the problem of load balancing with regard to the states and characteristics of server stations, it is necessary to develop a load balancing system. The developed automated system (AS) should carry out information and control functions, enabling the employees who work with it to receive information about the status of servers and end users. An automated system that provides a complete list of available servers supports the maintenance of the entire server complex. The developed system provides access to the required information: an authorized user (who has a certain set of rights) can view the list of servers and check their status. A timely response to errors on a server also makes it possible to restore its operability on time. In addition, the AS eliminates a drawback of the systems considered, namely the complexity of building server selection functions that take into account the status of the servers and the characteristics of end users. Thus, to solve the problem of load distribution with regard to the status and characteristics of server stations, a load balancing system was developed, including an operation algorithm and software that implements the functions of the system.

Theory. Load balancing algorithm
Consider the algorithm of the load distribution system, which redirects client requests to the least loaded or most suitable server in the group of machines that store copies of the information resource (figure 1).

Figure 1. The load balancing algorithm
To select a server station, the balancing server receives a list of all servers and iterates over it. The balancing server sends a request to each server in the list to obtain data on the location of the server station and its current load. If the server station does not respond, it is marked as non-operational and excluded from the list of servers. A server station may be excluded from the list for two reasons: a connection with it cannot be established, or it has reported a memory overflow. The server station selection function is implemented according to the specifics of the project and the goals to be achieved. The main goals of balancing in a highly loaded information system are the following:
− timeliness, that is, a guarantee that system resources are allocated to the processing of each request, which excludes situations where one request is processed while the remaining requests wait;
− efficiency, which means that all server stations that process requests are loaded as fully as possible;
− reduction of query execution time, that is, the shortest possible time between the start of processing a request and its completion;
− reduction of response time, that is, forming a response to a user request in the least possible time.
The load balancing algorithm of an information system should also have the following properties:
− predictability, that is, the user must understand in what situations and under what loads the algorithm will be effective for the tasks at hand;
− the ability to ensure a uniform workload across all hardware and software resources;
− scalability, which means that the data distribution system must remain operational as the information load increases.
To increase the operating speed of the AS, the authors developed computational algorithms [7] that distribute data and tasks between servers. This reduces the workload of individual nodes of the information system, increases the speed of work, and ultimately increases the efficiency of data processing in the system.
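The polling and exclusion steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `ServerStatus` fields, the `probe` callback, the server names and the use of a load fraction between 0 and 1 are all assumptions made for illustration, and the sketch ranks candidates by load only, although the balancer also receives the location of each station.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class ServerStatus:
    """Reply of one server station to the balancer's status request (assumed shape)."""
    location: str          # location of the station; format is hypothetical
    load: float            # current load, assumed to be a fraction from 0.0 to 1.0
    memory_overflow: bool  # True if the station reports a memory overflow


def select_server(
    servers: Iterable[str],
    probe: Callable[[str], Optional[ServerStatus]],
) -> Optional[str]:
    """Return the least loaded responsive server, or None if none qualifies."""
    candidates: Dict[str, ServerStatus] = {}
    for name in servers:
        try:
            status = probe(name)
        except ConnectionError:
            status = None  # a connection could not be established
        if status is None or status.memory_overflow:
            continue  # the station is excluded from the list
        candidates[name] = status
    if not candidates:
        return None
    return min(candidates, key=lambda n: candidates[n].load)


# Hypothetical replies standing in for real network requests.
replies = {
    "srv-a": ServerStatus("eu", 0.72, False),
    "srv-b": ServerStatus("eu", 0.31, False),
    "srv-c": ServerStatus("us", 0.05, True),  # excluded: memory overflow
}


def fake_probe(name: str) -> Optional[ServerStatus]:
    if name == "srv-d":
        raise ConnectionError("station unreachable")
    return replies.get(name)


print(select_server(["srv-a", "srv-b", "srv-c", "srv-d"], fake_probe))
```

Passing the probe as a callback keeps the selection rule separate from the transport, so the ranking criterion can be replaced to suit the goals of a particular project.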
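The algorithms of [7] are not reproduced in this paper. As a generic illustration of how distributing tasks reduces the load on individual nodes, the following sketch uses a well-known greedy heuristic (largest task first, always onto the currently least loaded server). The task names, cost values and the function `distribute` are hypothetical and are not taken from [7].

```python
import heapq
from typing import Dict, List


def distribute(tasks: Dict[str, float], servers: List[str]) -> Dict[str, List[str]]:
    """Greedy assignment: take tasks in order of decreasing cost and give
    each one to the server with the smallest accumulated load."""
    heap = [(0.0, s) for s in servers]  # (accumulated load, server name)
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {s: [] for s in servers}
    for task, cost in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
        load, server = heapq.heappop(heap)  # least loaded server so far
        plan[server].append(task)
        heapq.heappush(heap, (load + cost, server))
    return plan


# Hypothetical task costs in arbitrary units.
plan = distribute({"t1": 5, "t2": 3, "t3": 3, "t4": 2, "t5": 1}, ["s1", "s2"])
print(plan)
```

In this example both servers end up with a total cost of 7, so neither node carries more work than the other.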

Results of experimental studies
When developing the AS, database-backed authentication was implemented using the ExpressJS framework [8]. This software platform makes it possible to restrict access rights and separate the users of the system, as well as to apply extensions that organize role policies. When working with the AS, unauthorized users cannot perform any operations or move from the login page to other menu pages: such a user is automatically redirected to the login form. A user without access rights (without a username and password) can be registered only by the system administrator using the database. Likewise, only the AS administrator can directly change the set of actions allowed to a user. To change which users may perform an action, the AS administrator must open the action menu item with the same name as the one specified in the address bar of the browser and add an array of users for this action (or actions). These users then have the right to perform the specified operation.
MongoDB is used as the database of the data distribution system [9]. Figure 2 shows the program code of the function that adds the list of characteristics to the AS.
Consider some of the main functions and actions implemented by the system. Each action implemented by the AS corresponds to a separate operation (action) of a particular controller. Each controller function performs certain actions, interacts with the model and returns control to the view file. To describe the internal operation of the system, we describe the actions of all the controllers used and the models and types of actions they invoke. During development, three main controllers were created that describe all the functions performed. They correspond to the models Server, Statistic and User. Table 3 provides descriptions of each of the created models.
Figure 4 shows that the list of server stations is displayed in tabular form.
The page contains several basic controls, namely the add, delete, update and view buttons. The table shows the identifiers of the servers in the AS, their names and descriptions. This table structure is designed to build a clear association between a server, its description and its number. In addition to the add server button, there are also edit and delete buttons; these buttons are located at the beginning of the table and have text prompts. The statistics page is intended for server monitoring; this interface page is shown in figure 5.
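The per-action user arrays and the redirect to the login form described earlier in this section can be sketched as follows. The AS itself is built on ExpressJS; this Python fragment only illustrates the access rule. The action names, user names and the three decision strings are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical actions and users: the text only states that each action
# carries an array of users who are allowed to perform it.
ACTION_USERS: Dict[str, Set[str]] = {
    "server/add": {"admin"},
    "server/view": {"admin", "operator"},
}


def authorize(user: Optional[str], action: str) -> str:
    """Decide what happens to a request for the given action."""
    if user is None:
        return "redirect_to_login"  # unauthorized users go back to the login form
    if user not in ACTION_USERS.get(action, set()):
        return "deny"               # logged in, but not listed for this action
    return "allow"


def grant(action: str, user: str) -> None:
    """Administrator operation: add a user to the array of an action."""
    ACTION_USERS.setdefault(action, set()).add(user)
```

For example, `authorize("operator", "server/add")` yields `"deny"` until the administrator calls `grant("server/add", "operator")`, after which the same call yields `"allow"`.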

Discussion of the results
The operation of the system yields a large amount of data on the status of server stations. Methods for analyzing large amounts of data can reveal new patterns in this data. We believe that it is advisable to use cluster analysis, a multidimensional statistical procedure, for the final processing of the data. In this way the received data can be divided into groups with similar parameters, and each new request can be added to one of the groups, thereby determining a suitable server for it [10].
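The idea of grouping status data and attaching each new item to one of the groups can be sketched with a small k-means procedure. The paper does not name a specific clustering method, so k-means is an assumption here, as are the two features (a load fraction and a response time in milliseconds), the sample values and the initial centroids.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: Sequence[Point], centroids: Sequence[Point],
           iterations: int = 10) -> List[Point]:
    """Plain k-means starting from the given initial centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        groups: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


# Hypothetical observations: (load fraction, response time in ms).
points = [(0.10, 20.0), (0.20, 25.0), (0.15, 22.0),
          (0.80, 200.0), (0.90, 220.0), (0.85, 210.0)]
centroids = kmeans(points, [(0.10, 20.0), (0.90, 220.0)])

# A new observation is attached to the group with the nearest centroid.
print(nearest((0.70, 180.0), centroids))
```

In practice the features would need to be scaled to comparable ranges before clustering; otherwise the response time dominates the distance, as it does in this toy data.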

Conclusion
The paper describes the results of developing an automated load balancing system for highly loaded computing systems. A comparative analysis of existing methods for distributing the data of highly loaded web applications was carried out. It is noted that these technologies do not take into account the hardware status of server stations, their workload and speed, which negatively affects the speed of data delivery to the user. A load balancing system for highly loaded systems is proposed; it uses a balancing server that selects servers based on an analysis of data on their dynamic characteristics. The results of selecting the hardware and software tools necessary to implement the AS are presented. As a result of the work, the balancing server control interface and the software necessary to perform the required AS functions have been created, which makes it possible to rationally distribute large amounts of information across the server stations of an information system.