Real-time Data Analytics Edge Computing Application for Industry 4.0: The Mahalanobis-Taguchi Approach



Introduction
Advances in the Internet of Things (IoT), Cyber-Physical Systems (CPS), Cloud Computing, Big Data, and Artificial Intelligence have significantly impacted manufacturing and are driving the fourth industrial revolution, Industry 4.0 [1]-[5]. With the development of these Industry 4.0 technologies, the volume of raw data obtained during manufacturing processes is constantly increasing (i.e., Big Data) [6], [7], and many researchers agree that data is becoming key to optimizing manufacturing processes and improving competitiveness [8], [9]. Data acquired across the product life cycle can be converted into manufacturing intelligence and yield positive impacts on all aspects of manufacturing [10]. Nevertheless, Big Data also increases the complexity of processing and analyzing large amounts of information in manufacturing systems. Consequently, Big Data stimulates the development of data analytics [11], [12], which unfortunately often has to cope with the insufficient processing power of existing software applications and personal computers [8]. However, when Big Data is exploited in the right way, the discovery of patterns in raw data enables real-time predictive data analytics [13]. Thus, by using predictive models, real-time decisions can be made, effectively transforming a reactive manufacturing system into a predictive one [8].
Predictive manufacturing systems enable proactive behavior; that is, by using predictive models, they make it possible to anticipate an error within the manufacturing system before it occurs and to immediately take appropriate actions to avoid it [14], [15]. Developing a predictive model depends largely on the nature of the data collected from manufacturing systems over a period of time, and it is necessary to select a dataset free of incomplete, homogeneous, or noisy data that would disrupt its quality. Notably, a precisely selected heterogeneous dataset composed of fault and non-fault data, with emphasis on fault data in particular, is a challenge for industry and academia [16], [17]. Up to now, different predictive data analytics methods have been proposed in the literature [18]-[20]. The review of the relevant literature showed that the available articles focus on developing predictive models based on large and varied datasets (i.e., consisting of both fault and non-fault data) [19]-[21]. On the one hand, Cloud Computing plays an important role in the data analytics process, with an accent on Big Data (i.e., Big Data Analytics), as it offers subscription-based access to computing infrastructure, data, and application services [22], [23]. On the other hand, Edge Computing, unlike Cloud Computing, represents a decentralized computing service for storage, processing, and applications. It takes place on the network edge and acts as a middle layer between the end user and cloud data centers. In that way, it increases system responsiveness by reducing the distance that data must travel on the network, producing minimal delays [24].
To eliminate the need for large and varied datasets for the development of predictive models, in the present research we develop real-time predictive models based on a small dataset without faulty data. This is achieved by using the Mahalanobis-Taguchi system (MTS) [25] and Edge Computing. First, for the development of the predictive models we apply the MTS. The MTS involves developing predictive models based on small, carefully selected non-fault data samples collected from the manufacturing process together with the company experts. Notably, these data samples, differently from existing methods [20], do not require the presence of fault data, so predictive models can be developed by observing the process while it works without errors. Second, to enable handling of large amounts of data with high responsiveness, in a secure environment, and at low cost, we apply the MELIPC MI5000 Edge Computing solution developed by Mitsubishi Electric Company.
Further, the literature review showed that the majority of the available research deals with models developed and tested in experimental conditions. The present research looks to fill this gap by testing two developed predictive models in a case company from the process industry (i.e., the vinyl floor sector) with product quality issues. Notably, the case company satisfies the highest standards of World Class Manufacturing and is highly automated. With these characteristics, it presents an excellent testing ground for the developed predictive models.
The present paper is organized as follows. Section 2 provides an overview of the related work on real-time data analytics applied in manufacturing systems, the MTS for multivariate data, and Edge Computing for multivariate limited-scale data analysis. Section 3 presents details of the research method, the development of the predictive models, and the testing results of the two developed predictive models. Section 4 discusses and compares the results of the two developed predictive models. Finally, Section 5 derives some conclusions, summarizes the paper's contributions, and suggests opportunities for future research.

Real-time data analytics for manufacturing systems
The increasing availability of manufacturing data is changing the way decisions are made in industry regarding predictive maintenance and quality improvement using data analytics methods [26]. The implementation of advanced Industry 4.0 technologies, namely CPS and IoT combined with data analytics, can enable predictive manufacturing and networked production environments [8].
Data analytics, as part of the data science field, represents a practice that reveals hidden information among data collected from various devices by using advanced analytics techniques, including expert systems, machine learning, and advanced statistical analysis [27], [28]. These analytical techniques further enable real-time decision making in manufacturing systems that use real-time data analytics [12].
Real-time data analytics, as a part of data analytics, refers to analytical techniques where data is processed and analyzed as it is generated, in real time [29] or near real time [30]. Currently, there are few real-time data analytics applications for manufacturing systems [30]. The existing architectures have been developed mainly for offline data analysis and are not suitable for real-time data processing and analysis [31].
Among the available applications of real-time data analytics, Zang et al. [32] proposed a real-time production performance analysis and exception diagnosis integration model for manufacturing systems. Their model, consisting essentially of a hierarchical timed colored Petri net model and a decision tree model, provides real-time production performance and exception information for dynamic and stochastic manufacturing processes. Similarly, Oh et al. [33] focused their research on real-time quality monitoring and on controlling the system with an integrated cost-effective support vector machine (CESVM) algorithm. In their study, an integrated CESVM was developed and installed for the door trim manufacturing process using a kiosk, so that real-time quality inspection data could be collected, analyzed, and predicted. Finally, Qian et al. [34] proposed a real-time data-driven framework, the Intelligent Collaborative Mechanism (ICM), in order to achieve collaborative and effective interactions among assembly stations and operators. The real-time data-driven ICM for a fixed-position assembly system is based on three models: the Petri net model of the assembly workflow, the constraint matrix of tasks, and the partitioned structure of the task pools. Using these models, the ICM was able to monitor the assembly progress and select proper tasks for the further matching process of dynamic scheduling in manufacturing.

Mahalanobis-Taguchi system for multivariate data
A high-quality dataset is crucial for developing an effective predictive model. Specifically, for a dataset to be considered high quality, it should consist of samples from periods when the manufacturing system worked without problems, interruptions, and difficulties (non-fault data samples), as well as data from when problems occurred in the system (fault data samples). However, it is very difficult to find a balance between the number of non-fault data samples and the number of fault data samples [20]. Moreover, for the non-fault data that comprises the majority of the data collected, it is not possible to use the most common data analytics methods (e.g., linear regression, support vector machines, artificial neural networks), which require a balanced use of faulty and non-faulty data [17]. Additionally, the non-fault data mainly consists of a large number of similar data values that do not provide different types of information about the condition of the manufacturing system. In this case, it is necessary to optimize the collected non-fault dataset in the right way.
The MTS is a relatively new method in the field of diagnosis and prediction using multivariate data [25]. The MTS uses the Mahalanobis distance, introduced by Mahalanobis [35], to distinguish the pattern of an observed group from other groups [36]; because the distance preserves the correlation between variables, various patterns can be identified and analyzed in relation to the observed dataset [37].
Taguchi [37] extended the use of the Mahalanobis distance by developing the primary methodology of the MTS [38], [39]. The MTS is thus Taguchi's hybrid systematic diagnosis and forecasting methodology based on the Mahalanobis distance principle for multivariate data [25]. The MTS calculates the distance between observed data and sample data, quantitatively determines their differences, and optimizes the measurement scale for unobserved data [35], [37].
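The core MTS computation can be sketched as follows. The snippet below is a minimal illustration, not the authors' implementation: it builds the Mahalanobis space from standardized non-fault reference samples and scales the distance by the number of variables k, so that observations consistent with the reference group score near 1. The function names are ours.

```python
import numpy as np

def mahalanobis_space(reference):
    """Build the Mahalanobis space (mean, std, inverse correlation matrix)
    from non-fault reference samples (rows = samples, cols = parameters)."""
    mean = reference.mean(axis=0)
    std = reference.std(axis=0, ddof=1)
    z = (reference - mean) / std                      # standardize each parameter
    corr_inv = np.linalg.inv(np.corrcoef(z, rowvar=False))
    return mean, std, corr_inv

def mahalanobis_distance(x, mean, std, corr_inv):
    """Scaled Mahalanobis distance MD = z' R^-1 z / k for one observation;
    values near 1 indicate behavior consistent with the reference group."""
    z = (x - mean) / std
    return float(z @ corr_inv @ z) / len(x)
```

Because the distance is computed on standardized variables through the inverse correlation matrix, correlated parameters are not double-counted, which is what allows the MTS to separate an observed pattern from the reference group rather than treating each parameter in isolation.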
In many scenarios, the MTS is used for monitoring the quality of products and of manufacturing systems [35]. For example, Su and Hsiao [40] investigated the effect of an imbalanced training set using the MTS and other classification techniques, namely stepwise discriminant analysis, decision tree analysis, back-propagation neural networks, and support vector machines. The results showed that the MTS has the best classification ability and is the most robust classification technique. Additionally, Huang, Hsu, and Liu [41] integrated the MTS and an artificial neural network algorithm to create a novel algorithm that solves pattern-recognition problems and can be applied to construct a model for manufacturing improvements regarding inspections in dynamic environments. Jobi-Taiwo and Cudney [42] used the MTS to extract information in a multidimensional system and integrate information from different variables into a single composite metric. The method was evaluated in a case study on multiple fault classes in steel plate manufacturing, which indicates its practicality for improving system quality in industrial applications. Using multi-sensor signals and the MTS, Rizal et al. [43] investigated predictive maintenance for detecting cutting tool wear during manufacturing processes, and the results showed that the medium-wear and critical-wear stages of the cutting tools can be detected in real time.
Wang et al. [44] focused on developing a method called the adaptive Multiclass Mahalanobis-Taguchi system, in conjunction with variational mode decomposition and singular value decomposition, which are employed to detect faults under variable conditions. Finally, Reséndiz-Flores, Navarro-Acosta, and Hernández-Martínez [45] proposed a novel methodology combining the MTS and a hybrid binary metaheuristic based on particle swarm optimization and a gravitational search algorithm for improving product quality. They did so by performing optimal feature selection to detect the relevant variables in a real foam injection process in the automotive industry.

Edge Computing for multivariate limited-scale data analysis in Industry 4.0
As already stated, the volume of acquired data in the Industry 4.0 environment is constantly increasing [8]. Due to insufficient processing power [8], big data often cannot be processed and analyzed using existing software applications and personal computers. Therefore, new technologies (i.e., Cloud, Fog, and Edge Computing) use advanced data analytics techniques to detect hidden information. To overcome the problems of processing and analyzing large amounts of data, various approaches and system architectures have been proposed in the research literature [24], [46]-[48]. However, a large amount of data does not guarantee the good-quality datasets required for data analysis. On the contrary, a carefully selected limited-scale dataset that reflects the real state of the manufacturing system can provide a great deal of valuable information if the right technologies for the data analysis are selected. The most appropriate technology for limited-scale data analysis is Edge Computing, which is mainly used for data storage, processing, and applications that take place on the network's edge [24]. Edge Computing complements other technologies, namely Cloud Computing, by performing data analytics as close to the data sources as possible [23]. Thus, we argue that data analytics should begin during manufacturing, using an Edge Computing solution and striving for real-time predictive data analytics, in order to minimize network traffic [49], increase data security [50], [51], speed up data processing [52], and reduce data-transfer costs in comparison to Cloud Computing [53].
Cases of Edge Computing use for data analytics are not numerous in the literature, especially for limited-scale data. For example, Qian et al. [52] proposed an Edge Computing framework for fault diagnosis and dynamic control of rotating machines and deployed it on a designed Edge Computing node, based on small amounts of data containing machine fault values. Their framework is used for real-time diagnosis of multiple types of electrical and mechanical faults by fusing multiple sensor data. Similarly, Forkan et al. [54] presented an architecture for implementing Edge Computing in Industrial IoT applications to perform real-time data analysis and data mining of the local database, based on limited-scale data collected from a customized candy production line. The experiment showed that the self-organized scheduling mechanism of Edge Computing provides clear advantages in terms of bandwidth optimization compared to traditional approaches. Finally, Vater, Harscheidt, and Knoll [55] proposed an IT architecture based on a combination of Edge and Cloud Computing for prescriptive automation that enables network-based interoperable process control. Their architecture offers the possibility of comprehensive data processing for continuous increases in manufacturing process productivity where real-time data analysis is not possible.

Research method
In the present research, we opted for the application of the MTS for the development of the predictive models. This is because the MTS has proved to be an effective approach for fault detection, diagnosis, and data classification in the context of improving product quality and the quality of manufacturing processes when applied to multivariate datasets in manufacturing environments [56]. These characteristics have led to great interest in the MTS from both industry and researchers.
The present research applies a research method based on the data mining approach [57]-[59] (Figure 1). The research method, used for the development of the predictive models, contains six phases that result in the application of the MTS for real-time data analytics in Industry 4.0. We develop two predictive models (Figure 1): 1) a random parameter model (RPM) and 2) a parameter configuration model (PCM). The RPM represents a predictive model developed from randomly collected samples with random process parameters, while the PCM represents a predictive model developed for a group of products with a defined parameter configuration.

Development and testing of the predictive models: the performed steps
We apply the proposed six-phase method in process industry conditions in a company from the vinyl floor sector. First, we developed the RPM and PCM models. Subsequently, we tested them. Finally, we assessed and compared the performances of these two models. Hereafter, we describe the six phases of the method and exemplify them, when needed, with what we performed in the case company.
Phase 1: Problem definition - This phase has three parts (Figure 1): determining the manufacturing system characteristics, defining the type of problem in the manufacturing system, and specifying the location of the problem.
Phase 2: Data identification - Based on the defined problem (Phase 1), the data that affect the onset of the problem in the manufacturing system are identified and collected in this phase (Figure 1). Data identification is performed in three steps: specification of data types, identification of influential parameters, and checking the availability of influential parameters. Influential parameters are all process parameters that have a significant impact on the occurrence of product quality issues. As a result, the influential parameters are defined.
Phase 3: Production data collection - In this phase, two different datasets are collected for the two developed predictive models (Figure 1). The first dataset, used for the RPM predictive model, represents data collected continuously for 24 h for random parameter samples. The second dataset, used for the PCM predictive model, represents data collected intermittently over 12 days for a group of nine products that share the same parameter configuration for the observed part of the production line (i.e., Coating 1, Coating 2, and Printing).
Phase 4: Data pre-processing - Dataset optimization based on multivariate analysis (MVA) is done in this phase (Figure 1). The MVA approach is defined based on the range option, where a defined number of collected rows of data, called samples (exactly 120 samples), is grouped into a single data point for each set of collected data. According to this methodology, the final number of samples used for further analysis is 710 for the first dataset and 815 for the second. The number of influential parameters differs between the datasets because certain parameters have constant values: the first dataset has 63 influential parameters, while the second has 62.
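The range-based grouping and the removal of constant parameters described in this phase can be sketched as follows. The exact aggregation rule used in the study is not stated, so averaging each 120-row block is an assumption, and both helper names are ours.

```python
import numpy as np

def group_samples(raw, window=120):
    """Reduce each block of `window` consecutive raw rows to a single sample.
    Averaging the block is an assumed aggregation rule; an incomplete
    trailing block is dropped."""
    n_groups = len(raw) // window
    trimmed = raw[: n_groups * window]
    return trimmed.reshape(n_groups, window, raw.shape[1]).mean(axis=1)

def drop_constant_columns(data):
    """Remove parameters whose value never varies, as was done to arrive
    at the 63- and 62-parameter datasets."""
    keep = data.std(axis=0) > 0
    return data[:, keep]
```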
Phase 5: Data processing - In this phase, the pre-processed data is processed using the MTS for each set of collected data separately (Figure 1). As a result, two different predictive models are developed for early detection of quality issues based on the principle of mutual distance in the reference data, that is, the Mahalanobis distance. According to Peng et al. [60], the Mahalanobis distance for data values characterized as non-fault is less than 2.5, but in certain cases, values over 4 are defined as non-fault [60].
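Classification against the Mahalanobis distance thresholds cited from Peng et al. [60] reduces to a simple comparison. The sketch below (function names ours) flags observations whose scaled distance exceeds the chosen threshold, as an alarm-raising model would in real time.

```python
def classify(md, threshold=2.5):
    """Label one observation by its scaled Mahalanobis distance;
    2.5 is the non-fault bound cited from Peng et al. [60], and the
    threshold can be relaxed in certain cases."""
    return "fault" if md > threshold else "non-fault"

def scan(md_stream, threshold=2.5):
    """Return the indices at which a stream of distances would
    trigger an alarm."""
    return [i for i, md in enumerate(md_stream) if md > threshold]
```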
Phase 6: Real-time model testing - In this phase, the two developed predictive models are tested in real time in manufacturing environment conditions (Figure 1). The two models are tested in a process industry company from the vinyl floor sector over a defined period of 15 days. Real-time model testing was performed exclusively for the products with the defined parameter configuration. As a result of this phase, the performances of the two developed predictive models are summarized and compared.

Testing of the developed predictive models: context information and results
The developed predictive models were tested in a vinyl floor process industry company. The observed production line consists of 12 machines, and the main problems are product quality issues. The quality issues include wrinkles on the final product, low paint-finish quality, stains on the final product, and so on. The company is currently unable to detect the root causes of these quality issues.
In order to address the need to prevent these quality issues, the MELIPC MI5000 Edge Computing solution developed by Mitsubishi Electric Company was implemented in the company. According to the manufacturer, the MELIPC MI5000 can run two operating systems at the same time: VxWorks and Windows. VxWorks is used for device control and data collection, while Windows displays the analysis results for the collected data, allowing superior processing. The pre-installed software enables easier collection of production data. Installing additional software allows easier collection of third-party production data, enabling offline model development, real-time data processing, and monitoring of production processes on the shop floor using functions provided at the Edge Computing level. Finally, the most important characteristic of this Edge Computing solution is its ability to support machine self-configuration by providing corresponding feedback.
The present research is defined as a pilot study for the early detection of product quality issues in a vinyl floor production company. The research has been limited to the first three machines in the production line (i.e., Coating 1, Coating 2, and Printing) due to the limited resources of Edge Computing for real-time model testing and the great complexity of the production line in the manufacturing system. Accordingly, production data was generated and collected on these three machines, and the occurrence of quality issues considered is limited to the three machines observed. The two predictive models (RPM and PCM) were developed and tested using the MTS approach according to the six-phase method presented in Figure 1. A comparison of the performances of the two developed predictive models is presented in Table 1, which provides the main results for each phase of the proposed and applied method.

Discussion
At the beginning of the research, the review of the relevant literature showed that the available articles focus on developing predictive models based on large and varied datasets (i.e., consisting of both fault and non-fault data) [19]-[21]. However, both the required size of the dataset and the lack of faulty data can represent a problem for the development of predictive models and the implementation of predictive manufacturing systems in industry. Thus, in order to respond to this gap, the present research focused on the development of predictive models for fault detection based on a small dataset lacking fault data samples. Specifically, we proposed developing the predictive models based on the MTS (Figure 1), which enables working with non-faulty data. Further, we used Edge Computing (designed to process small datasets) to increase MTS responsiveness, provide security, and decrease costs [24]. Subsequently, we developed and operationalized two predictive models using small sets of carefully selected data.
The main difference between the two developed predictive models is the data sampling methodology used for creating the optimal dataset from which a predictive model is developed. The data sampling methodology is adapted to process industry conditions for real-time fault detection. The dataset for developing the RPM consists of random parameter configurations sampled over 24 h, while the dataset for developing the PCM consists of data for a group of products with a defined parameter configuration, collected intermittently over 12 days.
In the data processing phase (Phase 5, Figure 1), both models demonstrated high accuracy. The PCM model, with an accuracy of 98.04%, slightly outperformed the RPM model's 97.89% (Table 1). Furthermore, the final testing of the predictive models (i.e., Phase 6, Real-time model testing, Figure 1) was performed in an industrial environment in real time using the MELIPC MI5000 Edge Computing solution. The system was designed to raise alarms when there are changes in the process parameters that signal the occurrence of a product quality issue. During the real-time testing of both models, only the products pertaining to the defined parameter configuration were used, because the PCM model was developed exclusively for the defined parameter configuration, while the goal with the RPM model was to understand whether it could be universally applicable to parameter configurations in the production system.
The PCM model outperformed the RPM model in total accuracy by almost 10% during real-time model testing (Table 1). However, the RPM model correctly predicted 5 of the 7 faults (the real number of faults provided by company experts) regarding product quality issues, giving better recall performance than the PCM model, which correctly predicted 4 faults. Comparing the number of false alarms, even though this number was very high in both cases, the RPM model produced almost twice as many false alarms as the PCM model, which again gives the advantage to the PCM model.
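The recall comparison above can be made explicit with the fault counts reported (7 expert-confirmed faults; 5 caught by the RPM, 4 by the PCM). The helper below is only a worked illustration of that arithmetic, not part of the deployed system.

```python
def recall(detected_faults, actual_faults):
    """Recall = correctly predicted faults / actual faults."""
    return detected_faults / actual_faults

rpm_recall = recall(5, 7)  # RPM caught 5 of the 7 expert-confirmed faults
pcm_recall = recall(4, 7)  # PCM caught 4 of the 7
```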
Deepening the issue of false alarms while discussing the results with the company's experts, it was concluded that, besides faults regarding product quality issues, so-called "system faults" were also present in the system. These system faults appear, for example, at the start of production of a new batch, while machine calibration is performed. Both models predicted not only product quality issues but also system faults, which were originally classified as false alarms. These system faults were left to be remedied in the future and are beyond the scope of the present research.

Conclusions
The case company in which the two developed predictive models were tested is from the vinyl floor process industry. For the case company's process, we developed two predictive models based on different sampling for dataset creation: 1) random parameter sampling and 2) parameter configuration sampling. Subsequently, the two models were applied in industry conditions, and their performances were assessed and compared. Both models demonstrate high accuracy and applicability to a large number of different products, benefiting the implementing company in two ways: 1) manufacturing errors are anticipated before they occur, and 2) new hidden relationships among the data influencing the occurrence of errors in production are discovered.
Based on the results obtained and discussed, the parameter configuration model (PCM) outperformed the random parameter model (RPM) in the real-time testing phase in an industrial environment using Edge Computing. This indicates that the PCM has strong potential for application in the process industry, with high accuracy and usability based on a small dataset. Notably, the PCM model was subsequently applied as a pilot project in the case company, since the company's top management recognized it as a viable way of moving towards digital transformation based on Edge Computing, advanced statistics, and artificial intelligence techniques. Moreover, the application of the developed PCM model demonstrated the possibility of immediately eliminating a great number of quality issues in the produced products. The developed PCM model in an Edge Computing environment can provide signal information (alarms) if a critical state is nearly reached for faults regarding product quality issues and system faults. This signal can also be sent to PLCs in order to autonomously make changes inside the manufacturing system and achieve machine self-configuration by changing the parameter configuration.
Although the PCM outperformed the RPM in the real-time testing phase, the limitation of this predictive model is that it is not universal. Specifically, the PCM provides strong performance if it is used for preventing quality issues in products produced with a defined parameter configuration. Thus, this predictive model cannot be used for products that do not fall into the exact parameter configuration. For products with other parameter configurations, a new PCM should be developed.
It can be argued that the RPM could come close to the performance of the PCM in some specific cases. However, we expect the PCM to significantly outperform the RPM in the great majority of applications in the process industry. Thus, a recommendation for the process manufacturer would be to develop the necessary number of PCMs to cover the entire product assortment. Accordingly, newly developed products should be fitted into one of the existing PCMs, or a new PCM should be developed for them, in order to obtain good results in real-time fault detection.
The next possible step in the research could be to identify all parameter configurations and to develop PCM models for each configuration present in the vinyl floor process industry case company. Testing these models on part of the production line and then extending this testing to the rest of the production line machines is a plausible and meaningful continuation of the present research.

Figure 1. The six-phase method for development of the predictive models (recoverable phase details): Phase 1, Problem definition - 1) manufacturing system characteristics: constant reduction in batch sizes and increase in product variety; 2) type of problem definition: product quality classification problem; 3) location of problem occurrence: the beginning of the production line, consisting of three machines (Coating 1, Coating 2, and Printing). Phase 2, Data identification - 1) specification of data type: real-time production data generated from three machines (Coating 1, Coating 2, and Printing) in the manufacturing system; 2) identification of influential parameters: influential parameters are identified based on experts' knowledge, 65 parameters in total; 3) availability of influential parameters: all influential parameters are generated by sensors implemented on the production line, confirming their availability. Phase 3, Production data collection

Table 1. Comparison of the characteristics and performances achieved by the two developed predictive models (# characteristic/performance equal for both predictive models; • characteristic/performance different for each predictive model).
International Journal of Industrial Engineering and Management Vol 11 No 2 (2020)