Automating log analysis for industrial equipment maintenance using the Elastic Stack

Contemporary technological equipment generates substantial amounts of data that can provide insights into system performance and help predict failures and critical errors. However, manual analysis of these data is time-consuming, so automated tools are needed to collect, process, and store log files. In this context, the study aims to develop an information system that streamlines log file analysis using ELK software solutions from the IT industry. The article explores the structure and components of the ELK software and develops the structural diagram and architecture of a software solution for practical use. Based on the conducted research, a software solution was implemented, and log data provided by the NASA Ames Research Center were analyzed and visualized through graphs and histograms. The study's novelty lies in applying the ELK software stack, a solution widely used in the IT industry, to the log files of technological equipment. The proposed system aims to reduce log file analysis time and help make informed decisions about system performance and maintenance.


1 Introduction
Data collection from technological equipment is one of the important tasks in industrial enterprises, as well as in research work. Tracking a variety of diagnostic data and receiving timely information about the operation or failure of equipment allows production to respond immediately to emergencies and to take actions that reduce downtime and extend equipment life [1,2]. To ensure system performance, engineers analyze various types of data generated by control systems during operation [3]. Event files created by CNC machines (log files), which are formed as text, can serve as a source of data on the operation of technological equipment for further analysis. Based on the process information provided in the log files, it is possible to monitor system parameters, identify irregular situations, or obtain other information (for example, equipment uptime or machine status data) that can be useful in analyzing the causes of failures. However, in raw form, working with log files is labor-intensive because the data are unstructured and cannot be searched or filtered. As a result, creating a system for the aggregation, control, and display of log files that presents technological information to higher levels of enterprise management is an urgent task [4,5]. To solve this task, the article proposes a set of specialized software tools for collecting, storing, and centrally controlling the log files of technological equipment.
As research reviews show [6-8], modern methods have recently been applied to improve the data collection systems of technological equipment and to create a unified information environment. To this end, [6] proposes a single multi-protocol and cross-platform communication environment, while [7,8] propose a model of the components of a subsystem for assessing and monitoring the health of a CNC machine. These studies also consider the problems of data acquisition and of communication protocols with technological equipment, which confirms the relevance of these tasks.
The idea of using equipment log files was also explored in [9], where an application for predictive monitoring and root-cause analysis of the downtime of woodworking machines is proposed. The area of study in our article lies at the intersection of the problems presented above. The paper considers the technological aspect: obtaining information about the operation of equipment in production, and aggregating and visualizing the data obtained. In this case, no specialized production control systems are used; instead, the ELK stack of software solutions is applied to analyze data from a file that contains the results of milling machine runs.
Analyzing log files involves several problems. First, there is no standard for maintaining log files, so each file in a system has its own, most often unstructured, format. In addition, the large size of unstructured files can make it difficult to find important information among other messages about the progress of process equipment. The article covers a method of using a specialized stack of software solutions for the collection, storage, and centralized control of the log files of process equipment. In the next section we explore the structure and components of the ELK software solution. In Section 3, based on this analysis, the structural diagram and architecture of a software solution for the practical use of ELK in production are developed. Then, based on the conducted research, a software solution was implemented, with which the data set provided by the NASA Ames Research Center was analyzed and visualized in the form of graphs and histograms. The results are discussed in Section 4.

2 Methods of research

Toolkit for analyzing unstructured technological data
The purpose of this study is to reduce the time spent analyzing the log files of technological equipment by creating an information system for the presentation and analysis of log files using software solutions from the IT industry. The study had four primary objectives: to select a tool for processing log files; to develop theoretical aspects describing the construction of software solutions for processing the log files of technological equipment; to implement a software solution for automated processing; and to test the developed software solution for the log file analysis system. After studying existing log file analysis tools, such as Splunk, ELK Stack, Graylog, Sumo Logic, and Loggly, the set of three ELK software solutions was selected. Almost all the tools studied assume single-line log files; however, if a system creates log files in XML format, extracting useful content from logs of this type requires attention to the structure of the document. In the ELK software stack (Elasticsearch, Logstash, Kibana), each component performs a specific task: Elasticsearch is a database and search engine, Kibana is a tool for displaying data, and Logstash acts as a collector and performs the primary processing of log files. Logstash is not tied to a log format; it is flexibly configured through configuration files, so this component can be set up both for single-line log files and for XML or JSON. The choice of tool was also influenced by the open source code of the ELK Stack components, the ability to deploy on the internal network of an enterprise, and the ability to analyze data in real time.
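As an illustration of this flexibility, the following minimal Logstash sketch combines the multiline codec with the xml filter plugin to ingest multi-line XML logs. The file path and the opening-tag pattern are assumptions for a hypothetical machine log, not part of the original study.

input {
  file {
    # Hypothetical path to an XML event log of a machine tool
    path => "/var/log/cnc/events.xml"
    # Join lines into one event until the next opening <event> tag appears
    codec => multiline {
      pattern => "^<event"
      negate  => true
      what    => "previous"
    }
  }
}

filter {
  xml {
    source => "message"   # parse the raw XML text of each event
    target => "event"     # store the parsed fields under the "event" key
  }
}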
The collection, storage, and processing of unstructured data at a production site is an expensive task for most enterprises. Therefore, when analyzing existing tools for processing log files, it is important to take into account the cost of the software components in addition to their technical characteristics. For this reason, the ELK software stack, which is distributed as an open-source solution, was chosen to build the log file analysis system. ELK is an abbreviation of the names of three software products: Elasticsearch, Logstash, and Kibana, developed and supported by Elastic.
The core of the stack is Elasticsearch, a database with a full-text search and analysis system based on the Apache Lucene search technology, which distinguishes Elasticsearch from relational databases and other NoSQL systems. Relational databases use concepts such as rows, columns, tables, and schemas; Elasticsearch and similar repositories work differently. The basic unit of information stored in Elasticsearch is a JSON document, a text format for data exchange between a client and a server. As shown in Fig. 1, documents are stored inside types, and types are stored inside indexes. An index can contain one or more types, and each type can contain a huge number of documents. The index structure in Elasticsearch can be compared to the database structure in relational databases; continuing the analogy, a type in Elasticsearch corresponds to a table, and a document corresponds to a record in the table (Table 1). Indexes are divided into data segments (shards) and distributed among cluster nodes. A node is a single server in the system, which may be part of a large cluster of nodes. The cluster consists of several nodes, each responsible for storing and managing its part of the data (Fig. 2). Elasticsearch is a distributed system designed to keep working even if the hardware it runs on fails; for this, copies (replicas) of the main index segments are provided. With replicas, if the first node fails, the segments from that node remain available on the other nodes. To access the distributed system of main shards, a coordinating node receives search queries and then sends reformulated queries to the cluster nodes.
The second component of the stack, Logstash, is a log file aggregator that collects data from various input sources, performs the necessary transformations, and then sends the result to the database for further processing. The event-processing pipeline in Logstash has three stages: input, filtering, and output (Fig. 3). Only the input and output stages are required; filtering is optional. The input stage creates events, filters modify input events, and outputs send them to the destination. The Logstash pipeline is described in a configuration file, whose sections, i.e. input{}, filter{}, output{}, each contain one or more plugin configurations. The input plugin defines the events passed to Logstash, the filter plugin modifies the data, and the output plugin sends the data to the destination.
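To make the index/type/document analogy concrete, the following hedged example indexes one JSON document over the Elasticsearch REST API. The index name, type name, and field names are illustrative, not taken from the study; note also that Elasticsearch 7 and later deprecates explicit types in favor of the single _doc type.

curl -X PUT "localhost:9200/mill-logs/run/1" \
     -H "Content-Type: application/json" \
     -d '{
           "run": 1,
           "tool_wear": 0.11,
           "vib_table": 0.47
         }'

Here mill-logs plays the role of the database (index), run the role of the table (type), and the document with id 1 the role of a record in that table.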
The third component of the stack, Kibana, is the visualization layer that runs on top of Elasticsearch and gives users the ability to visually represent and analyze data. Kibana can be roughly divided into two main modules: the user interface module, which provides the graphical user interface that users usually interact with, and the server module, which transfers data from the Elasticsearch clusters to the first module through an internal application programming interface (API) (Fig. 4). Kibana functions are implemented through software modules, or kernel plugins, that contain the required business logic. The server part of Kibana is linked to Elasticsearch and provides the internal API through which the user interface modules operate. When a user accesses Kibana through the graphical user interface, the user interface module loads all the core plugins that provide the necessary Kibana functionality.
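The link between the Kibana server module and Elasticsearch is set in the kibana.yml configuration file. A minimal sketch follows; the host names and ports are illustrative, and the elasticsearch.hosts key applies to Kibana 7.x (earlier versions used elasticsearch.url).

# kibana.yml - minimal sketch
server.port: 5601                               # port of the Kibana web interface
server.host: "0.0.0.0"                          # listen on all network interfaces
elasticsearch.hosts: ["http://localhost:9200"]  # Elasticsearch cluster endpoint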

3 Research results

The structure of a single platform for the analysis of technological data
The integration of the individual components into a single platform for the log files of technological equipment is shown in the structural diagram of the proposed system (Fig. 5). The Logstash modules are program code running on a single-board computer. The data are read from the log file of the technological equipment, converted to the required form, and then sent to the data processing server. Log files are converted according to the specified configuration file in accordance with the required data structure. On the server, the security of data transfer is handled by an Nginx reverse proxy that terminates the SSL connection. The transformed data are transferred to the Elasticsearch database for storage, retrieval, and analysis. Visualization of the received data is carried out with the Kibana utility, which makes it possible to create dashboards consisting of graphs, charts, and other visualizations. At the next stage, if necessary, real-time monitoring can be set up, various notifications added, and reports on errors displayed. The multilevel architecture of the log file analysis system is shown in Fig. 6. The process equipment and the communication interfaces with the equipment are located at the lower level. At the server level sits the presented system, consisting of three components: a module for collecting log files, a data processing server, and a data-providing server. Information display devices are located at the top level.
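A minimal sketch of the Nginx reverse proxy described above, assuming Elasticsearch listens locally on port 9200; the server name and certificate paths are placeholders, not values from the study.

server {
    listen 443 ssl;                              # terminate the SSL connection here
    server_name elk.plant.local;                 # hypothetical internal host name

    ssl_certificate     /etc/nginx/ssl/elk.crt;  # illustrative certificate paths
    ssl_certificate_key /etc/nginx/ssl/elk.key;

    location / {
        proxy_pass http://127.0.0.1:9200;        # forward requests to Elasticsearch
        proxy_set_header Host $host;
    }
}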
Therefore, the application of the presented set of software solutions allows data to be searched, analyzed, collected, and visualized. To do this, Elasticsearch must be configured for search and analysis, Logstash for data collection, and Kibana for data visualization [10].

Analysis of log files on the example of the "Mill Data Set"
The practical implementation of log file analysis with the presented solution was carried out on open data from the NASA Ames Research Center [11]. The records in this data set are the results of runs of a milling machine under various operating conditions (Fig. 7); in particular, the wear of the tool during cutting was studied. Runs were made under various input conditions (such as depth of cut, feed rate, and workpiece material). Under fixed conditions, the equipment was run until the wear limit of the tool was reached. The initial data were organized in a text file for tabular data (CSV format) consisting of 14 fields and 167 records. The fields contain data on the number of runs and tool wear, as well as data collected by different types of sensors (current, vibration, and acoustic emission sensors).
The transformation of the initial data from the file was carried out according to the configuration file (Fig. 8), in which the data processing pipeline is set up. The file consists of three sections, described below; a hedged reconstruction of such a pipeline is sketched after their description. The first section of the configuration contains an instruction to read data from a file located at the provided address.
In the next section, the filter plugin parses each row of data into fields. By default, this plugin uses a comma character to separate fields. The filtering section also contains a conversion setting that changes the data type of the fields, which are parsed as strings by default.
The output plugin in the third section is used to send events from Logstash to Elasticsearch. This is not the only way to do this, but it is the preferred one. Once the data are in the Elasticsearch database, they can be used for visualization in Kibana. For the visual analysis of the obtained data, a dashboard was assembled in the Kibana user interface in the form of line graphs and histograms that contain vibration values and table and spindle acoustic emission signals for different milling parameters (Fig. 9).
Fig. 9. Testing a software solution for a log data processing system
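Figure 8 itself is not reproduced here; the following is a minimal sketch of a Logstash pipeline matching the three sections described above. The file path, index name, and column names are assumptions inferred from the data set description (the real file has 14 fields), not the study's actual configuration.

input {
  file {
    path => "/data/mill/mill.csv"        # illustrative path to the CSV file
    start_position => "beginning"        # read the file from the start
    sincedb_path => "/dev/null"          # re-read the file on every run (test setup)
  }
}

filter {
  csv {
    separator => ","                     # comma is the default field separator
    # Hypothetical subset of the 14 columns, inferred from the data set description
    columns => ["run", "tool_wear", "depth_of_cut", "feed_rate",
                "vib_table", "vib_spindle", "ae_table", "ae_spindle"]
  }
  mutate {
    convert => {                         # fields are parsed as strings by default
      "tool_wear"   => "float"
      "vib_table"   => "float"
      "vib_spindle" => "float"
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]   # Elasticsearch endpoint, illustrative
    index => "mill-data"                 # hypothetical index name
  }
}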

4 Discussion
A fairly large number of process equipment monitoring systems have appeared on the world market, among which one can distinguish solutions for specific models of control systems used on equipment [7,12], as well as universal systems that use standards such as OPC UA [13] for communication. Technological equipment is also used in a wide variety of applications: various articles discuss options for monitoring machine-building [7,12], woodworking [9], and other types of equipment. However, this is not the full range of tasks that monitoring systems need to solve. In addition to collecting data, it is necessary to process, analyze, and present them graphically for ease of perception by the operator. It then becomes necessary to implement specialized software modules that can analyze log files from various types of equipment. The solution of this problem was demonstrated in our article. For tasks of this kind, both individual open-source modules and entire stacks of modular open-source software can be used; this approach reduces development time without losing quality in the final product.

5 Conclusions
In the course of the study, the most suitable technologies for storing and analyzing the log files of technological equipment were examined and selected, the structural and architectural schemes of the system were developed, and a test bench was deployed. Then, using the visualization tool Kibana, infographics were built from the data. A practically significant result is the ability to work with the data not by reading text files but through a tool for analytics and for building various dashboards. This makes the process of analyzing initially unstructured data accessible and fast, and subject specialists gain access to the data and to a powerful analysis tool. One advantage of the selected software package is that the data entered into Elasticsearch are stored in a structured form, with the structure of the logs determined when the configuration file is set up in Logstash. Having the data in a prepared, easy-to-analyze form, specialists can focus directly on the analysis itself to extract important information from the log data, rather than spending time structuring the data, thereby significantly reducing the time spent on analysis.