Analysing Data with the R Programming Language to Control Machine Operation

This article describes the benefits offered by the analysis of data from production processes. With the correct processing, such data allows issues to be identified both within the analysed process and in the ways machines are used. The paper presents an initial analysis of data from the fragmentation process in a hard coal mine using longwall shearers. The analysis is described using R programming language functions.


Introduction
For many years now, the effective management of any company has relied increasingly on quantitative methods. Analyses of large amounts of data, modelling, and simulations are becoming the basis for informed decision-making, as in [3,4,5,6]. Such calculations have become widespread and easy to perform as a result of the tremendous advancement of computer technologies.
By identifying issues in their production processes, companies can reduce production costs through avoiding long downtimes, reducing repair costs, enforcing and maintaining discipline among employees, and optimising machine uptime to ultimately enhance production processes. Appropriate lessons learned from the analysis of production-process data can also be the basis for the adjustment of machine operation procedures and troubleshooting.
IT personnel, who are usually well-versed in the details of individual processes, can develop software to control and analyse the work of machines and people. With contemporary programming languages, applications that offer such functionalities can be created quickly. If they use server databases that also store information from outside the controlled area, a more general analysis of correlations between issues identified in production processes becomes possible. In-depth analysis of production processes and the knowledge it offers can provide a company with a competitive advantage, which is why businesses should employ advanced computing technologies to optimise their processes [7]. The R programming language, which was designed to support statistical computations, has recently been extended to handle data exploration. Examples of such computations were presented in [9].
The following sections show how the operation of a longwall shearer can be analysed using the R programming language by examining the electrical current drawn by its motors. This analysis is part of a larger system developed to support decision-making on the basis of an expert system. For more information on the system, see [1,2].

R programming language
R is both a programming language and a software environment, designed mainly for statistical computations. It is available under the GNU General Public Licence, so it is completely free of charge. Furthermore, as an open-source project, it is continuously expanded with libraries of new functions, which are available on the Internet as packages. Many users of this language claim that it can be used to develop any program, but its greatest benefit for analysts is its support for modelling, testing, classification, clustering, time-series analysis, etc. Also worth noting is its ability to produce a wide range of high-quality graphs.
Another asset of the R software environment is its open-source package repository (Comprehensive R Archive Network, CRAN). It is a collection of libraries created by users from all around the world and includes the relevant documentation. Commercial equivalents of such packages can be worth at least tens of thousands of zlotys. Anyone can upload their library to that repository to make it available to other people. Thanks to interfaces to many other programming languages, code for such additional libraries can be written not only in R, but also in Java, C or Fortran [10].
R can be used to organise data in various structures, the simplest of which is a vector, i.e. a series of numbers arranged in a specific order. There are no simpler structures in R, so a single number is a vector with a length of 1 [11].
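The point about vectors can be checked directly in an R session. The short sketch below uses made-up values; the variable names are illustrative only.

```r
# A single number is stored as a numeric vector of length 1.
x <- 42
is.vector(x)   # TRUE
length(x)      # 1

# A longer vector: hypothetical current readings in amperes.
current <- c(0, 35, 120, 118, 0)
length(current)                 # 5

# Logical indexing keeps only the elements that satisfy a condition.
nonzero <- current[current > 0]
```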

An analysis example
The example provided below shows the basic analysis of the operation of a longwall shearer on the basis of the current drawn by its four motors. A similar analysis, carried out on the basis of a spreadsheet and the VBA language, was provided by [8]. Shearer motors are responsible for its movement, coal-cutting action, and power-cable handling. Their symbols are presented in Table 1 below.
The electronic system on board the shearer measures the current drawn at each of the four sensors every second. These data are then immediately communicated to the database, from which they can be analysed with a negligible delay. The data are communicated as three key-value pairs: motor number, reading date and time, and current. In order to avoid unnecessary data recording during shearer downtime, a solution was adopted in which data are communicated only if the current changed between 1-second intervals. This helped reduce the volume of recorded data. When a motor is turned off, its current is zero and is recorded as such in the database with its timestamp. The next piece of data is recorded only when the measurement is again greater than zero, so that no data are recorded for motor downtime. Any downtime can be computed from the difference between consecutive timestamps. To analyse machine operation, an additional algorithm has to be used to identify and fill in the gaps left between changes in current values.
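The computation of downtime from consecutive timestamps might look as follows. This is a minimal sketch in base R, assuming a data frame with hypothetical columns time and electr_curr and made-up readings; the actual database layout is not given in the article.

```r
# Hypothetical change-only log for one motor: a row exists only when the
# current differs from the previous one-second reading.
readings <- data.frame(
  time = as.POSIXct(c("2015-01-01 06:00:00", "2015-01-01 06:00:01",
                      "2015-01-01 06:00:02", "2015-01-01 06:05:02"),
                    tz = "UTC"),
  electr_curr = c(40, 55, 0, 38)
)

# Seconds elapsed until the next recorded row (NA for the last row).
readings$gap_s <- c(as.numeric(diff(readings$time), units = "secs"), NA)

# A zero reading followed by a gap means the motor was off for that long.
downtime <- readings[readings$electr_curr == 0 & !is.na(readings$gap_s),
                     c("time", "gap_s")]
print(downtime)
```

In this sample, the only zero reading is followed by the next record 300 seconds later, so a single downtime of five minutes is reported.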

Importing data to the R programming language
Originally, the measurement data are stored as csv (comma-separated values) files, in which, as the name suggests, the data take the form of rows with values separated by a delimiter. The delimiter is usually a comma; in the file used here it is a semicolon, as the sep argument below indicates.
In the R programming language, such data can be imported using the function read.csv:
    data <- read.csv(file="prady1.csv", sep=";", header=TRUE, dec=".")
This imports the data from the csv file into the computer's operating memory as a data frame named "data".
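To try the import without access to the original file, the same call can read from a string through the text argument of read.csv. The column names and values below are assumptions made for illustration; the real header of prady1.csv is not given in the article.

```r
# Hypothetical file contents; the real column names in prady1.csv are
# not given in the article.
csv_text <- "motor_no;time;electr_curr
2992;2015-01-01 06:00:00;40.5
2992;2015-01-01 06:00:01;55.0
2993;2015-01-01 06:00:01;12.3"

# Same arguments as in the article, but reading from a string via `text`.
data <- read.csv(text = csv_text, sep = ";", header = TRUE, dec = ".",
                 stringsAsFactors = FALSE)

# Convert the timestamp column from character to a date-time class so
# that time differences can be computed later.
data$time <- as.POSIXct(data$time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

str(data)
```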

Data decompression
Data can only be analysed after they are converted back into their original form. This is achieved through a function designed to read each row from the data array and complete it with subsequent rows containing timestamps (at one-second intervals) and the last recorded current. This action is performed as many times as indicated by the time between the consecutive rows of compressed data. This cycle is repeated for the time interval specified in the function parameters. The other parameters of the function, in addition to the start and end times, include the data frame name and the name of the motor for which the data are to be decompressed. The signature of this function is presented below with comments.
The resulting data frame d contains data about the current drawn by motor 2992 every second.
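A decompression step of the kind described above can be sketched as follows. This is not the function from the article. The name decompress, the column names motor_no, time and electr_curr, and the sample values are all assumptions. Instead of looping over rows, the sketch builds a one-second time grid and uses findInterval to look up the last recorded value for each second, which gives the same carry-forward result.

```r
# Hypothetical sketch: expand change-only records to one row per second.
# df     - data frame with columns motor_no, time (POSIXct), electr_curr
# motor  - motor number to decompress
# start, end - POSIXct limits of the interval to reconstruct
decompress <- function(df, motor, start, end) {
  rows <- df[df$motor_no == motor, ]
  rows <- rows[order(rows$time), ]

  # One timestamp per second over the requested interval.
  grid <- seq(from = start, to = end, by = "1 sec")

  # Index of the last recorded row at or before each grid second;
  # 0 means that no reading exists yet, so the current is unknown (NA).
  idx <- findInterval(as.numeric(grid), as.numeric(rows$time))
  idx[idx == 0] <- NA

  data.frame(motor_no = motor, time = grid,
             electr_curr = rows$electr_curr[idx])
}

# Made-up compressed data for motor 2992.
t0 <- as.POSIXct("2015-01-01 06:00:00", tz = "UTC")
compressed <- data.frame(motor_no = 2992,
                         time = t0 + c(0, 1, 4, 7),
                         electr_curr = c(40, 55, 0, 38))

d <- decompress(compressed, 2992, start = t0, end = t0 + 8)
print(d)
```

For the sample above, the nine reconstructed seconds carry the values 40, 55, 55, 55, 0, 0, 0, 38, 38, so the three seconds of downtime appear explicitly as zeroes.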

Analysis of electrical current
The R programming language makes it possible to create box plots, which can be used to identify the basic characteristics of data. The lower side of the rectangle corresponds to the first data quartile (Q1), the line inside the rectangle corresponds to the median (Q2), and the upper side of the rectangle is the third quartile (Q3). Below and above the rectangle there are whiskers, which indicate the limits beyond which outliers can be identified. The ends of the whiskers are determined as follows on the basis of the interquartile range (IQR):

end of the lower whisker: W1 = Q1 - 1.5 * IQR,
end of the upper whisker: W2 = Q3 + 1.5 * IQR,
where IQR = Q3 - Q1.

If the data contain values lower than W1 or greater than W2, these are treated as outliers. They should not be taken into account during the development of the model, but they should be analysed separately to explain their cause. Fig. 3 shows a box plot for the distribution of recorded current drawn by the machine's motors. This data visualisation accounts only for values greater than zero.
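The whisker limits can be computed directly with the base R functions quantile and IQR. The sketch below uses a small made-up sample of current values; the variable names are illustrative.

```r
# Hypothetical non-zero current readings in amperes.
current <- c(80, 90, 95, 100, 105, 110, 115, 120, 125, 320)

# Quartiles and interquartile range (default quantile type 7).
q1  <- quantile(current, 0.25, names = FALSE)
q3  <- quantile(current, 0.75, names = FALSE)
iqr <- IQR(current)

# Whisker limits as defined in the text.
w1 <- q1 - 1.5 * iqr
w2 <- q3 + 1.5 * iqr

# Values outside [w1, w2] are treated as outliers.
outliers <- current[current < w1 | current > w2]
print(c(W1 = w1, W2 = w2))
print(outliers)
```

For this sample, Q1 = 96.25 and Q3 = 118.75, so IQR = 22.5, W1 = 62.5 and W2 = 152.5; the single value of 320 A lies above W2 and is flagged as an outlier.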
The graph was created in the R programming language, using the ggplot function:
    ggplot(data=filter(data2, electr_curr>0)) + geom_boxplot(mapping=aes(y=electr_curr, x=motor_no, group=motor_no))
Electrical current outliers can be observed (as individual points above the upper whisker) for the motor that powers the coal-cutting component (2992). Since such a large current can cause damage, it is important to analyse which situations caused these surges to occur.
To identify outliers, the global dataset (exclusive of zeroes) needs to be processed to identify any data with current greater than W2. In the R programming language, this can be done as follows:
    # filtering the data to identify current values greater than zero
    data2992gz <- filter(data2992, electr_curr > 0)
    # calculating W2
    W2 <- quantile(data2992gz$electr_curr, 0.75) + 1.5 * IQR(data2992gz$electr_curr)
    # filtering the data to identify current values greater than W2
    data2992ggw <- filter(data2992, electr_curr > W2)
As can be seen in the graph (Fig. 1), the continuity of current values is disrupted as values reach around 200A. Therefore, for the purposes of further analysis, the times at which values exceeded 200A were identified.
    t <- select(filter(data2992, electr_curr > 200), time)
The identified times are the basis for finding in the primary data the currents at times immediately (by a second) preceding values greater than 200A.
    filter(data2992, time %in% (t[,1]-1), electr_curr > 0)
The resulting dataset contains only the data where current is equal to zero. This means that the data which were identified during the previous analysis as outliers are recorded when the motor starts.
As a result, the electrical current during motor start was examined. This made it possible to identify the times t at which the current was equal to zero but at time t+1 was greater than zero (the data frame k in the code below), and then to identify the current at times t+1:
    starts <- filter(data2992, time %in% (k[,1]+1), electr_curr > 0)
The resulting dataset was the basis for the creation of a histogram (Fig. 2), which presents the distribution of current:
    ggplot(data=starts) + geom_histogram(mapping=aes(x=electr_curr), binwidth=25)
The histogram presented in Fig. 2 shows two groups of data. The current values observed during motor start can be grouped into two ranges: up to around 150A and above 300A. There are also single data items between these two ranges, but these are marginal cases. Such a distribution of data suggests two types of motor starts, with and without a load:
• softstart (without a load): characterised by low motor current values,
• hardstart (with a load): characterised by high motor current values, which makes such starts undesirable, as they cause a significant increase in machine wear and failure rate. [Kesek, Zagórska, 2015]
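The detection of motor starts and their division into the two groups can be sketched in base R as follows. The sample series is made up, and the thresholds of 150 A and 300 A are the approximate values read from the histogram description above, not limits stated by the motor manufacturer. The column names and the label "intermediate" are assumptions.

```r
# Hypothetical decompressed one-second series for one motor (amperes).
d <- data.frame(
  time = as.POSIXct("2015-01-01 06:00:00", tz = "UTC") + 0:9,
  electr_curr = c(0, 0, 120, 110, 0, 0, 340, 180, 175, 0)
)

# Current one second earlier (NA for the first row).
n <- nrow(d)
prev_curr <- c(NA, d$electr_curr[-n])

# A start is a second with non-zero current directly after a zero reading.
is_start <- !is.na(prev_curr) & prev_curr == 0 & d$electr_curr > 0
starts <- d[is_start, ]

# Classify starts using the approximate ranges seen in the histogram.
starts$start_type <- cut(starts$electr_curr,
                         breaks = c(0, 150, 300, Inf),
                         labels = c("soft", "intermediate", "hard"))
print(starts)
```

In this sample, two starts are found: one at 120 A, labelled soft, and one at 340 A, labelled hard. The hard starts are the cases that would then be examined individually.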

Summary
Under current economic conditions, the accurate assessment of production processes is vital. It can help maximise machine use, reduce downtime, bring down repair costs, and, consequently, lower production costs and increase revenue by boosting production over time. With the right computer tools to analyse database input and process such data, production processes can deliver valuable knowledge.
While initially perhaps incomprehensible, the ways to use R programming language functions described in the article quickly become clear and useful, and the opportunities for automating such computations prove instrumental. What is also great about this language is its graph-generating capability, which can be tailored to one's needs with additional commands. Another important functionality of R is its communication with databases through the ODBC interface, which makes it useful for analytical purposes in complex database systems.
The performed analysis produced a dataset with critical cases where electrical current during motor start was high. These findings can be used to individually examine each case identified during the analysed time period.
This paper was supported by AGH University of Science and Technology [nr 11.11.100.693].