Self-optimization of mobile robot actions using voice commands

Abstract. Multifunctional robots used in industry and in the home to ease human labor mostly share a common problem: a lack of autonomy and self-learning. This does not mean that self-learning cannot be implemented in such machines, but most of them follow a fixed algorithm and do not deviate from it, even where deviating could lead to better results. Strict adherence to the program, with an acceptable quality of the final result, takes priority. However, for technical units that are not bound to fixed industrial or household algorithms, an individual approach to control is permissible. This work develops the concept and principles of voice control of a mobile robot for performing movement tasks.


Introduction
This paper presents the concept of a robot that uses neural networks to recognize voice commands. The concept can be integrated into the previously presented mobile robot runner for novice athletes, which competes with a group or an individual athlete on running tracks, covering the stadium in a set time [1].
An analytical review of existing market solutions showed that the orientation systems of many products are most often logic-based systems relying on a variety of sensors. The proposed system is implemented as a neural network and requires only cameras and an inertial sensor of the robot's position to orient itself in space and avoid unwanted collisions. In general, such a system is more flexible and cheaper because it needs fewer additional sensors, and the only factor limiting the quality of the robot's orientation is the quality of the training procedure, including the amount of data used.
A neural network is an artificial, multi-layered, highly parallel logical structure (i.e., one with a large number of elements operating independently in parallel) made up of formal neurons [2].
The main applications of neural networks in mobile robotics are: automation of classification; automation of forecasting and recognition; automation of decision-making; and management, encoding, and decoding of information.
In further work, the authors are primarily interested in the interaction of neural networks with sensor signals and in the problems of signal perception and recognition in intelligent systems. Because signal recognition remains a relevant problem, increasingly advanced methods of solving it are being developed [3,8,9,10].

Concept
If the economic aspect is set aside, the main obstacle to the widespread use of robotic systems is the limited scope of their applications. Creating universal robotic systems that can be used for a wide range of tasks remains a partially unsolved problem [12]. The main disadvantages of existing speech recognition systems are the need for a clear audio signal and for a system that can separate extraneous noise from the informative part of the signal without significant losses. It is proposed to solve this problem using silent speech interface (SSI) technology [11]. The concept is based on a robotic mobile platform with automatic workflow optimization based on reinforcement learning [12]. The robot's trajectory is formed by the operator's voice commands. After the movement has first been carried out, the trajectory can be changed through additional adjustments.
Additionally, two cameras on the body allow the robot to avoid unwanted collisions with objects in the environment. If the platform is developed as a separate device that is not integrated into the previously presented concept [4], mounting slots on the roof of the case can be used, for example, to attach an autonomous manipulator or a basket for transporting small loads. A Doppler radar can also be added if the robot needs to work among other robots or recognize voice commands from behind a wall [6].
The appearance of the mobile transport platform is shown in Figure 1. The trainable software part of the robot consists of two trainable recurrent neural networks. The first network has four layers, two of which are hidden [7]. The structure of the four-layer neural network is shown in Figure 3. The audio signal arrives at the input layer of the first of the two networks, where the recorded audio track is divided into 25-millisecond segments taken in 10-millisecond increments [19]. This segmentation is necessary so that the subsequent hidden layers can detect a word in each segment among the stream of noise. After splitting, each short 25-millisecond segment is transformed into a matrix of values that the neural network can process [15]. The resulting matrix is passed sequentially through the two hidden layers, where its values are combined with coefficients selected during training so that the output layer yields a specific integer representing the label of a specific word. The correspondence between labels and audio tracks is known to the network from the machine learning procedure. The resulting integer is written to a variable, and based on its value the robot moves straight, turns, or stops.
The second network also consists of four layers; its main task is to adjust actions based on voice commands after the end of the action cycle. A matrix consisting of two rows is fed to the input layer of the second network [7]. This matrix represents the order of actions recorded by the robot. The first row contains the values of a variable that describes the action (stop, forward, left, right). The second row gives the instants of time, in seconds, at which each action takes place. An example of such a matrix is shown in Table 1. In this case, "1" indicates movement, "2" a 90-degree turn and subsequent stop, "3" a -90-degree turn and subsequent stop, and "0" a stop and the end of the path. The values of the described matrix are rounded and fed to the hidden layers.
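Two of the data representations described above lend themselves to a short illustration: the splitting of the audio track into 25 ms segments with a 10 ms step, and the two-row matrix of recorded actions. The following Python sketch is not the authors' implementation. The function names, the 16 kHz sampling rate, the sample action log, and the choice to round times to whole seconds are all assumptions made for the example; only the segment length, the step, and the action codes 0 to 3 come from the text.

```python
# Illustrative sketch only: names, sampling rate and sample data are assumptions.

# Action codes as listed in the text.
STOP, FORWARD, TURN_PLUS_90, TURN_MINUS_90 = 0, 1, 2, 3


def frame_signal(samples, sample_rate, window_ms=25, step_ms=10):
    """Split a list of audio samples into overlapping fixed-length segments."""
    window = sample_rate * window_ms // 1000  # samples in one 25 ms segment
    step = sample_rate * step_ms // 1000      # samples in one 10 ms increment
    frames = []
    start = 0
    # Keep only complete segments; a trailing partial segment is dropped.
    while start + window <= len(samples):
        frames.append(samples[start:start + window])
        start += step
    return frames


def build_action_matrix(log):
    """Turn (action_code, time_in_seconds) pairs into a two-row matrix.

    Row 0 holds the action codes, row 1 the time instants. Rounding to
    whole seconds is an assumption; the text only says values are rounded.
    """
    actions = [code for code, _ in log]
    times = [round(t) for _, t in log]
    return [actions, times]


# One second of silence at an assumed 16 kHz sampling rate.
frames = frame_signal([0.0] * 16000, sample_rate=16000)
print(len(frames), len(frames[0]))  # 98 segments of 400 samples each

# Hypothetical log of recognized commands and the moments they were issued.
matrix = build_action_matrix(
    [(FORWARD, 0.0), (TURN_PLUS_90, 4.2), (FORWARD, 5.1), (STOP, 9.8)]
)
print(matrix)  # [[1, 2, 1, 0], [0, 4, 5, 10]]
```

Because the step is shorter than the segment, neighbouring segments overlap by 15 ms, so a word that straddles a segment boundary still appears whole in at least one neighbouring window.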
The hidden layers again adjust the matrix values by coefficients based on the previously recorded passes of the same path, but this time the output layer assembles a newly generated matrix of values. A new path is generated each time the previous path is completed. This adaptability can be useful in a constantly changing environment [20]. The principle of operation is based on repeating the robot's movements. Each movement is part of a cycle in which the robot carries out the necessary actions. In this way, the robot can be retrained quickly and efficiently for a new task, or an existing task can be changed, without additional software [16].
The mobility of the platform implies active use and therefore calls for easy access to components and the ability to replace them quickly. If the concept of a mobile robot designed to perform running exercises is taken as the basis [1], then components are used: Dimensions are not specified, as they may change directly during the assembly process.
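The repetition cycle described in this section, in which the robot replays a recorded sequence of movements, can be illustrated by converting a two-row action matrix into a list of timed steps. The sketch below is a simplified illustration, not the authors' method. It assumes that the second row stores the instant at which each action starts, so that the duration of an action is the gap to the next instant. The function name and the sample matrix are hypothetical.

```python
# Illustrative sketch only; the function name and sample matrix are hypothetical.

ACTION_NAMES = {0: "stop", 1: "forward", 2: "turn +90", 3: "turn -90"}


def replay_schedule(matrix):
    """Convert a two-row action matrix into (action_name, duration_s) steps.

    Assumes row 1 holds the instant at which each action starts. The
    closing code 0 (stop, end of path) has no duration and ends the cycle.
    """
    codes, instants = matrix
    steps = []
    for i, code in enumerate(codes):
        # The final entry, or an explicit stop, terminates the replay.
        if code == 0 or i + 1 == len(codes):
            steps.append((ACTION_NAMES[code], 0))
            break
        duration = instants[i + 1] - instants[i]
        steps.append((ACTION_NAMES[code], duration))
    return steps


# Hypothetical recorded path: forward, turn, forward, stop.
recorded = [[1, 2, 1, 0],
            [0, 4, 5, 10]]
schedule = replay_schedule(recorded)
for name, duration in schedule:
    print(f"{name}: {duration} s")
```

In a scheme of this kind, a newly generated matrix produced after each completed pass could be fed to the same function, so the replay logic stays unchanged while the path itself is adjusted.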

Conclusion
The main advantages of the proposed concept are self-learning and ease of use: when voice commands are used, the operator needs no additional software to manage the platform [14,17]. The basic nature of the platform provides a wide range of options for replacing individual elements in the event of a malfunction and reduces the risk of the device being put out of action. As mentioned above, the concept can be integrated into the previously presented mobile robot runner for novice athletes [1]. The robot is to be controlled directly by voice.
The project can be used for the development of household devices (in particular, a new generation of robotic home cleaners), in sports, and in the construction of transport vehicles.