Data analytics in quality 4.0: literature review and future research directions

ABSTRACT The quality level in manufacturing processes increasingly concerns manufacturing firms as they respond to pressures such as the increasing complexity and variety of products, more complex value chains, and shortened time-to-market. Quality management is becoming increasingly challenging, as model variety and highly complex products harbour the danger of distributing defective products in the market. Data analytics has started to gather the interest of quality researchers and practitioners, who investigate approaches, algorithms, and methods for supporting manufacturing quality procedures in the context of Industry 4.0. This trend is facilitated by the wide expansion of sensor technology and the accelerated adoption of information systems by manufacturing firms. Since quality and process control has been identified as one of the major challenges with a high potential for big data analytics, in this paper we investigate the manufacturing quality research field from a data analytics perspective. Specifically, we examine the existing literature, provide clarity to the Quality 4.0 research field, synthesize the literature review outcomes, and identify the research gaps and challenges. Building on these, we propose directions for future research.


Introduction
Innovation, model variety, and highly complex products addressing the needs and wishes of sophisticated customers harbour the danger of distributing defective products in the market (Nalbach et al. 2018; Schmitt et al. 2020). As a consequence of these developments, the number of recalls continues to reach new levels, entailing massive image losses (Deloitte 2016). As they respond to pressures such as the increasing complexity and variety of products, more complex value chains, and shortened time-to-market, manufacturing firms are increasingly concerned with the quality level in their manufacturing processes (Köksal, Batmaz, and Testik 2011; Psarommatis et al. 2020a).
The ISO 9000:2015 standard, Quality management systems: fundamentals and vocabulary (ISO 2000), defines quality management as management with regard to quality; quality assurance as the part of quality management focused on providing confidence that quality requirements will be fulfilled; and quality control as the part of quality management focused on fulfilling quality requirements. A Quality Management System (QMS) is the part of a management system that regards quality, and it comprises the activities by which the organization identifies its objectives and determines the processes and resources required to achieve the desired results. What we recognize as today's quality profession began during the middle of the second industrial revolution (Radziwill 2018). As quality gained more and more importance over time, many quality concepts and methods have emerged. Quality models, approaches, and practices have evolved from inspection to quality control, quality assurance, quality management, and business excellence. Several models, frameworks, and tools have been developed to support organizations in managing and improving quality in all activity sectors. These include quality tools and methods (such as 100% inspection and Statistical Quality Control), the ISO 9001 Quality Management International Standards, continual improvement methodologies such as Lean, Six Sigma, or Lean Six Sigma, the teachings of quality gurus such as Juran, Crosby, Deming, or Taguchi, and the business excellence models and awards, namely the EFQM (Europe), the MBNQA (USA), and the Deming (Japan) models (Fonseca and Domingues 2018; Köksal, Batmaz, and Testik 2011).
Despite the significance of quality management and quality assurance in modern manufacturing companies (Gunasekaran, Subramanian, and Ngai 2019), which affect the whole product lifecycle, i.e. the design, the manufacturing, and the service stages (Pal, Franciosa, and Ceglarek 2014), existing methods and tools rely largely on the experience and subjective judgment of the people involved, which makes them time-consuming and unreliable (Nalbach et al. 2018). Therefore, too often, quality engineers make their decisions by using only intuition and/or qualitative assessments (Zonnenshain and Kenett 2020). Most existing quality methods have not been adapted to cope with the data-intensive modern manufacturing environment and thus have lost their effectiveness (Bai et al. 2017; Psarommatis et al. 2020b). The emergence of IoT and the increasing use of sensors on the shop floor for monitoring the manufacturing process and the machines have been providing huge amounts of data that can be utilized, among others, in the context of quality management.
In the context of Industry 4.0, Quality 4.0 has emerged as the combination of quality management and improvement models and approaches with technology to foster critical competencies and factors for organizational success (Sader, Husti, and Daroczi 2021). Similarly, Total Quality Management 4.0 refers to the ecosystem that supports the integration between technology, quality, and people, which results from the adaptation of quality management to the technologies of I4.0 in the industrial scenario (de Souza et al. 2021). While Industry 4.0 is more technology-centric and quality is customer-centric, using technology as an enabler, both approaches aim for improved performance and results. Product and process quality is required to allow Industry 4.0 to fully improve flexibility and productivity. Conversely, intelligent sensors, automation, and big data can provide data for Quality Management Systems and business excellence models and support Statistical Process Control (SPC) or Six Sigma at the process level (Fonseca, Amaral, and Oliveira 2021). Fonseca, Amaral, and Oliveira (2021) point out that Quality 4.0, by combining quality management with digitalization and technology, provides a management and process dimension to the digital transformation technology driver and supports successful I4.0 adoption. The quality management body of knowledge, encompassing models, systems, techniques, tools, and extensive application experience, can support the planning, implementation, and improvement of Industry 4.0 processes.
Key enablers of Quality 4.0 are data analytics models and algorithms for the analysis of data sets, for example, industrial process, asset, and product data (Tsai et al. 2015). With data analytics, Quality 4.0 is meant to be proactive: adverse effects of quality flaws have to be prevented before they become relevant in the actual use of a product (Kupper et al. 2019; Nalbach et al. 2018; Berger et al. 2018). Proactivity enables measuring and predicting the quality of systems and products far earlier than traditional preventive approaches allow (Köksal, Batmaz, and Testik 2011; Deloitte 2016; Eger et al. 2018a; Psarommatis et al. 2020b; Zonnenshain and Kenett 2020; Bousdekis et al. 2018a; Bousdekis, Apostolou, and Mentzas 2020). Moreover, the increasing variety and complexity of manufacturing processes, as well as the low quality, low quantity, and poor suitability of manufacturing data, pose significant challenges to defect identification and prediction (Chongwatpol 2015; Gittler et al. 2019). Data analytics, often enabled by machine learning algorithms, has been among the main emerging proactivity enablers (Chiarini 2020; Zonnenshain and Kenett 2020; Dhamija and Bag 2020). While the recent state of research contains several literature reviews on general applications of ML in manufacturing (Wuest et al. 2016; Lee, Shin, and Realff 2018b; Belhadi et al. 2019; Vater, Harscheidt, and Knoll 2019; Diez-Olivan et al. 2019), reviews focusing on quality-related applications are rarely found (Schmitt et al. 2020). Quality and process control has been identified as one of the major challenges with a high potential for big data analytics (Belhadi et al. 2019).
The objectives of this paper are (i) to investigate the existing literature regarding data analytics and machine learning in manufacturing quality operations; (ii) to provide clarity on the research field of Quality 4.0; (iii) to synthesize the literature review in order to identify the existing research challenges; and (iv) to outline directions for future research. The rest of the paper is organized as follows: Section 2 presents the methodology of the literature review. Its three main steps are addressed in the subsequent sections. Section 3 frames the literature review by defining its scope and review protocol, by identifying related literature reviews, and by presenting a bibliometric analysis. Section 4 analyses the reviewed research works by providing a classification, analysis, and synthesis of these works. Section 5 discusses the resulting research gaps and challenges, and outlines the directions for future work. Section 6 concludes the paper.

Literature review methodology
In this Section, we outline the methodology of the literature review, which is based on the methodology of Tranfield, Denyer, and Smart (2003), widely used in literature reviews for data analytics and information systems (Nguyen et al. 2018; Barbosa et al. 2018; Duan, Edwards, and Dwivedi 2019; Koivisto and Hamari 2019). The methodology is presented in Figure 1.
The adopted methodology consists of three main steps:

• Framing the literature review: This step, presented in Section 3, aims at posing the objectives and the research questions of the literature review. It consists of the scope definition (Section 3.1), the identification of related literature reviews (Section 3.2), the review protocol (Section 3.3), and a bibliometric analysis (Section 3.4).

• Review results: This step, presented in Section 4, aims at analysing the reviewed papers in order to identify the current status of the development of data analytics algorithms in quality management. It consists of the classification of the reviewed papers (Section 4.1), their analysis in terms of the approaches and methods used for quality data analytics (Section 4.2), and their synthesis (Section 4.3).

• Research gaps, challenges, and future directions: This step, presented in Section 5, summarizes and discusses the main conclusions of the literature review, with a focus on the identified research gaps and challenges, which form the directions for future research.

Framing the literature review
In this Section, we present the framing of our literature review: we define the scope, identify related literature reviews, present the review protocol, and provide a bibliometric analysis.

Scope definition
Quality management has a systems perspective that encompasses two prominent views: product-oriented and process-oriented quality (Psarommatis et al. 2020a). Product-oriented quality studies the defects on the actual parts and tries to find a solution, while process-oriented quality studies the defects of the manufacturing equipment and, based on those, can evaluate whether the manufactured products are good or not. The latter lies within the predictive maintenance concept, which has been extensively explored in the literature in the frame of the recent IoT advancements (Lindström et al. 2020; Psarommatis et al. 2020); see, e.g. (Bousdekis et al. 2018b; Carvalho et al. 2019; Zonta et al. 2020). On the other hand, product-oriented quality is mainly governed by manual knowledge-based processes or by statistical quantitative techniques such as Statistical Process Control (SPC) and the Theory of Constraints (TOC) (Chongwatpol 2015; Belhadi et al. 2019).
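To make the statistical quantitative techniques mentioned above concrete, the following minimal sketch (with hypothetical measurements, not drawn from any reviewed paper) shows the kind of rule an SPC individuals chart applies: readings outside the mean ± 3σ control limits are flagged.

```python
import statistics

def control_limits(samples, k=3.0):
    """Centre line and +/- k-sigma limits of a Shewhart-style individuals chart."""
    centre = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return centre - k * sigma, centre, centre + k * sigma

def out_of_control(samples, new_points, k=3.0):
    """Return the new measurements that fall outside the control limits."""
    lcl, _, ucl = control_limits(samples, k)
    return [x for x in new_points if x < lcl or x > ucl]

# Hypothetical in-control baseline measurements of a part dimension (mm)
baseline = [10.01, 9.98, 10.02, 10.00, 9.99, 10.01, 10.00, 9.97, 10.03, 10.00]
print(out_of_control(baseline, [10.01, 10.40, 9.99]))  # prints [10.4]
```

The limitation discussed in the text is visible even here: the rule considers one measurement stream at a time, whereas data analytics approaches aim to fuse many heterogeneous sources.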

Related literature reviews
The first attempts at exploiting data analytics algorithms involved data mining techniques (Köksal, Batmaz, and Testik 2011), whose authors performed a review of data mining applications for manufacturing quality improvement. Since then, the related literature has evolved at a slow pace, while the focus has been on the use of ICT advancements from a strategic point of view. It was only recently that the need to frame quality in the data-rich context of Industry 4.0 arose. Psarommatis et al. (2020a) provided a literature review on Zero Defect Manufacturing (ZDM) and identified four distinctive strategies, i.e. detection, repair, prediction, and prevention. Zonnenshain and Kenett (2020) reviewed research trends in quality engineering and Industry 4.0, and proposed future research directions for quality and reliability engineering. Psarommatis et al. (2020b) performed a literature review on the implementation of the Lean Manufacturing, Six Sigma, Lean Six Sigma, Total Quality Management, and Theory of Constraints philosophies and compared them with the ZDM philosophy. Chiarini (2020) analysed the literature in terms of the relationships between Industry 4.0, quality management, and TQM in order to identify the most relevant aspects of Quality 4.0. Four categories of topics emerged, namely: creating value within the company through quality (big) data and analytics; developing Quality 4.0 skills; customer value co-creation; and cyber-physical systems and ERP for quality assurance and control. Sony, Antony, and Douglas (2020) performed a literature review in order to reveal the key ingredients for the effective implementation of Quality 4.0. The authors concluded with the following: handling big data, improving prescriptive analytics, using Quality 4.0 for effective vertical, horizontal, and end-to-end integration, using Quality 4.0 for strategic advantage, leadership, training, organizational culture, and top management support.
The aforementioned literature reviews are summarized in Table 1. Table 2 presents the review protocol. We limited our search space to journals, books, and conference publications, excluding grey literature such as white papers and blog posts, because its quality may vary and could affect the validity of our results. The keywords were used in various combinations for the different scientific databases. The reviewed papers were selected according to whether they contribute to the development and application of data analytics methods and algorithms in the manufacturing quality domain. In this sense, the resulting number of papers does not include research works on qualitative and subjective quality tools that have not been adapted to the data-intensive modern manufacturing environment. Following this protocol, we arrived at 48 papers to be reviewed.

Bibliometric analysis
In this Section, we provide a bibliometric analysis of the reviewed papers. Figure 2 depicts the number of identified papers on quality data analytics published per year. It is evident that, in recent years, there has been an increasing trend in the development and implementation of data-driven methods and algorithms for quality management.
Moreover, Table 3 presents the journals containing related papers, along with the publisher and the number of papers identified in each one. The journals with more than one related paper are the Journal of Intelligent Manufacturing, the International Journal of Production Research, and Expert Systems with Applications. Table 4 presents the conferences containing related papers, along with the proceedings publisher and the number of papers identified in each one. The conferences with more than one related paper are the CIRP Conference on Manufacturing Systems (CMS), the International Conference on Flexible Automation and Intelligent Manufacturing (FAIM), the CIRP Global Web Conference, and the IEEE International Conference on Industrial Informatics (INDIN). It should be noted that very few papers are derived from quality-related journals, since these journals have focused strongly on extensions, improvements, and applications of traditional and manual approaches relying on human judgement; data analytics algorithms are usually out of their scope. In order to further analyse the selected papers, we used the VOSviewer tool, which constructs and visualizes bibliometric networks. Figure 3 depicts the most influential authors in the portfolio of the reviewed papers, while Figure 4 depicts the main themes mentioned throughout the articles.

Review results
In this Section, we present the results of our review. We classify the reviewed papers, we discuss their proposed approaches and methods, and we create a taxonomy of methods that have been used in quality data analytics.

Classification of reviewed papers
We classified the reviewed papers in five dimensions: type of contribution, manufacturing lifecycle stage, level of intelligence, data & information sources, and industry. Each dimension is presented in the following sub-sections.

Type of contributions
We classified the reviewed papers according to the type of their contribution: conceptual framework, system design and development, algorithm, and comparison of methods. Table 5 presents this classification.

Manufacturing lifecycle stage
We classified the reviewed papers according to the manufacturing lifecycle stage to which they refer. The literature review focuses on the manufacturing phase of the product lifecycle and not on the design and service phases. To this end, we zoom into the manufacturing phase, which we separated into three stages, as depicted in Figure 5: process configuration, in-process quality, and quality control. Table 6 presents this classification along with the respective papers. Most of the research works deal with in-process quality, i.e. quality during the operation of the production line. Process configuration mainly relies on manual processes that are executed based on expert knowledge, due to the high complexity, variety, and criticality of the decisions that need to be taken. Therefore, data analytics algorithms have not been widely used in process configuration. Finally, we found a limited number of papers dealing with the quality control stage, three of them proposing a conceptual framework. Most of these papers use the same dataset. The literature review revealed that, despite the expansion of data analytics algorithms facilitated by the increasing amounts of data, the quality control stage is still governed by traditional and manual approaches, such as Six Sigma and Lean Manufacturing. Even some quantitative methodologies, such as the Theory of Constraints and Statistical Process Control, have limitations in incorporating and analysing data from a variety of data sources in the dynamic and complex manufacturing environment.

Level of intelligence
Data analytics can be categorized into three main stages characterized by different levels of difficulty, value, and intelligence (Lepenioti et al. 2020): (i) descriptive analytics, answering the questions 'What has happened?', 'Why did it happen?', but also 'What is happening now?' (mainly in a streaming context); (ii) predictive analytics, answering the questions 'What will happen?' and 'Why will it happen?' in the future; (iii) prescriptive analytics, answering the questions 'What should I do?' and 'Why should I do it?'. We classified the reviewed papers according to the level of intelligence as derived from the level of data analytics maturity, i.e. descriptive, predictive, and prescriptive analytics. As depicted in Figure 6, each stage is a prerequisite for the next one in order to reach the desired level of intelligence. It is pointed out that data pre-processing is a prerequisite step for transforming the raw data into a format capable of being further processed by the data analytics algorithms (Level 0). Depending on the data format and structure, as well as the implemented algorithms, the complexity of data pre-processing varies. Table 7 presents the classification of the reviewed papers according to the level of intelligence they reach.

Data and information sources
We classified the reviewed papers according to their data and information sources. We identified the following sources: manufacturing sensors, environmental sensors, product tracking technology, video & image, enterprise & operational systems, and process knowledge and specifications. The classification is presented in Table 8. Most of the research works utilize a combination of at least two types of data sources.

Industry
We classified the reviewed papers according to the industry to which they were applied. This classification is presented in Table 9. The steel industry gathers the largest number of quality applications. The automotive, domestic appliances, and electronics industries have also attracted significant research interest.

Analysis of reviewed papers
For each paper, we identified the level of intelligence per manufacturing lifecycle stage, as shown in Table 10. Overall, the majority of research works deal with in-process quality algorithms, systems, and approaches, where they rely on descriptive analytics (Level 1) and on predictive analytics (Level 2). Some research works reach Level 3 (prescriptive analytics), although only two of them (He et al. 2017; Lindström et al. 2020) propose new algorithms; the rest are conceptual frameworks. Process configuration is mainly at Level 1, while we found one paper (Kim and Ryu 2020) reaching Level 2. At Level 3, we found two conceptual research works (Psarommatis et al. 2020a; Zonnenshain and Kenett 2020). As already mentioned, quality control is still governed by manual and statistical approaches, and it has benefitted the least from the expansion of data analytics methods and technologies: we found only two research works at Level 2. Although the potential of data analytics for quality control processes has recently been identified in the literature (Psarommatis et al. 2020a; Zonnenshain and Kenett 2020), the development of related methods, algorithms, and systems is at its early stages. Figure 7 depicts the data sources that are exploited in the reviewed papers per level of intelligence.
Descriptive analytics has drawn the most attention from researchers. However, descriptive analytics approaches rely to a large extent on process knowledge, thus proposing ad-hoc and domain-specific solutions that are not easily extensible to different industries and manufacturing processes. On the other hand, predictive analytics algorithms take advantage of the available data to a larger extent. Prescriptive analytics is the least explored area.

Conceptual frameworks
It was only recently that the literature started to frame quality management within the recent advancements of Industry 4.0 and big data from a conceptual point of view. Psarommatis and Kiritsis (2018) presented a conceptual architecture of a scheduling tool for product-oriented ZDM, which consists of four strategies: detection, repair, prevention, and prediction. The first three strategies are covered to some extent by traditional quality improvement philosophies, such as Lean Manufacturing, Six Sigma, the Theory of Constraints, and Total Quality Management (Psarommatis Giannakopoulos et al. 2020b). The prediction strategy calls for advanced ICT systems and data analytics algorithms capable of identifying and anticipating defects in the product and the process at an early stage (Psarommatis Giannakopoulos et al. 2020b). According to the authors, there are numerous scheduling tools available in the literature, but they are 'machine' oriented instead of 'product' oriented. Psarommatis et al. (2020a) extended this approach by unifying product-oriented and process-oriented ZDM. Zonnenshain and Kenett (2020) proposed a Quality 4.0 framework aiming at structuring the challenges and opportunities of quality and reliability engineering in the Industry 4.0 era. The authors consider the integration of reliability models into quality engineering and the development of predictive and prescriptive algorithms as significant enablers of Quality 4.0. Teucke et al. (2018) outlined an approach for integrating sensor-based quality data into supply chain event management. Franciosa et al. (2020) proposed a digital twin framework for assembly systems with compliant parts, fusing sensor data with deep learning and CAE simulations in order to enable 'Closed-Loop In-Process' (CLIP) quality improvement. Bousdekis et al. (2021) proposed a framework for implementing quality analytics for decision augmentation through optimized human-AI interaction in quality control.
All proposed conceptual frameworks provide a useful tool for structuring the challenges and opportunities of Industry 4.0 technologies in support of quality management and for depicting approaches addressing new and emerging quality problems. Few, however, are capable of being operationalised, and hence their practical applicability is limited.

System design and development
In this Section, we discuss the research works contributing to quality information systems design and development. Peres et al. (2018) proposed a data model applicable to multi-stage ZDM manufacturing for facilitating the reduction of defects, the identification of their root causes, and the elimination of their propagation along the line. Angione et al. (2019) described the integration challenges of a platform employing multi-agent systems, smart on-line inspection tools, data analytics, and knowledge management technologies. Luckow et al. (2018) evaluated architectures, models, and deployment challenges related to the use of deep learning techniques in automotive manufacturing quality and logistics, focusing on computer vision problems. Hamzeh et al. (2020) developed a semi-automatic welding system using a linear track and a Gas Metal Arc Welding (GMAW) power source, as well as a mechanism for data acquisition, integration, and visualization using heterogeneous sensor data. Nalbach et al. (2018) proposed a machine learning-based system that links quality assurance to the design phase in order to support predictive quality. The system embeds several algorithms, namely logistic regression, naive Bayes, random forests, multi-layer perceptron-style neural networks (MLPs), and AdaBoost based on decision trees. System-based approaches are useful primarily for systems engineers and architects because they provide practical support in designing and developing holistic quality management systems, encompassing and combining various algorithms and methods. On the other hand, they have limited applicability on the shop floor, and they are not intended to be used by operators. One of the reviewed works (2020) compared Artificial Neural Networks (ANN), Support Vector Regression (SVR), Decision Trees (DT), Random Forest (RF), and Gradient Boosted Trees (GBT) in order to identify process dependencies and key quality drivers in battery manufacturing.
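The idea of a system embedding several classifiers behind one interface can be sketched with a simple majority vote; the rule-based classifiers, feature names, and thresholds below are purely illustrative stand-ins, not the trained models of any reviewed system.

```python
def vote(classifiers, sample):
    """Majority vote across the embedded classifiers (ties count as defect)."""
    defect_votes = sum(clf(sample) for clf in classifiers)
    return defect_votes * 2 >= len(classifiers)

# Stand-in rule-based classifiers over a feature dict; each returns True for "defect".
# The feature names and thresholds are invented for the example.
rules = [
    lambda s: s["temperature"] > 80.0,
    lambda s: s["vibration"] > 1.5,
    lambda s: s["torque"] < 10.0,
]

sample = {"temperature": 85.0, "vibration": 1.6, "torque": 12.0}
print(vote(rules, sample))  # two of three rules fire -> prints True
```

A production system would plug trained models (e.g. logistic regression, random forests) into the same interface, which is precisely what makes such architectures attractive to systems engineers and opaque to shop-floor operators.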
Process configuration. Chongwatpol (2015) proposed three alternative models for diagnosing the root causes of defects and variations by applying logistic regression, decision trees, and artificial neural networks (ANN), aiming at explaining the characteristics of defects that have a great impact on manufacturing yield and product quality. Lee et al. (2014) presented an intelligent system, using fuzzy association rule mining with a recursive process mining algorithm, to find the relationships between production process parameters and product quality. Their results outperform similar approaches using neural network and empirical models existing in the literature.
Existing literature adequately covers Level 1 in process configuration, providing a plethora of algorithms for describing and generating insights from quality-related data. However, we found only one paper reaching Level 2 in process configuration: Kim and Ryu (2020) applied a Convolutional Neural Network (CNN) in order to derive predictions for quality management in the molding industry. Finally, we found no research works on algorithms for process configuration reaching Level 3, i.e. implementing prescriptive analytics.
In-process. The literature is rich in in-process quality analytics algorithms, mostly descriptive analytics algorithms. Liu et al. (2019) developed a real-time quality monitoring algorithm based on a deep belief network (DBN) for quality spectra. Schreiber et al. (2019) proposed an approach for optical quality assurance using various machine learning algorithms. Lokrantz, Gustavsson, and Jirstrand (2018) proposed a machine learning framework using Bayesian networks combined with expert knowledge, in order to model the causal relationships between manufacturing stages and to identify the root causes of failures and quality deviations. Hao et al. (2016) proposed a model to represent the impact of tool wear on quality degradation. Kim et al. (2012) compared seven novelty detection methods and three different dimensionality reduction methods for detecting faulty wafers in semiconductor manufacturing. Sun, Yang, and Wang (2017) proposed a method based on particle swarm optimization and the kernel extreme learning machine in resistance spot welding, targeting accurate and fast joint quality identification. Teti (2015) applied multisensor signal processing for the extraction and selection of signal features for pattern recognition. Lieber et al. (2013) implemented data pre-processing and feature extraction and combined supervised and unsupervised learning methods to identify operational patterns and quality-related features. Oliff and Liu (2017) proposed a methodology incorporating the rule-based learning algorithms C4.5 and RIPPER (Repeated Incremental Pruning to Produce Error Reduction). Wuest, Irgens, and Thoben (2014) proposed an approach based on clustering and supervised learning for coping with the complexity and high-dimensionality of product state data. Li et al. (2012) proposed the use of SVR (Support Vector Machine for Regression) for improving the cell vernier forecasting model to enhance the production yield in the Color Filter manufacturing process.
Escobar and Morales-Menendez (2018) proposed a pattern recognition methodology supported by a hybrid feature elimination algorithm and optimal classification threshold search algorithm for the detection of rare quality events. Eger et al. (2020) presented an approach to compensate the dimensional deviations of an inner contour of a turbine shaft at an early stage in the aerospace industry. Haleem, Bustreo, and Del Bue (2021) proposed an online testing system for measurement of nep defects by using imaging and computer vision techniques. The developed system directly captures yarn images on a spinning frame and uses Viola-Jones object detection algorithm for real-time detection of nep defects. Scheibel, Mangler, and Rinderle-Ma (2021) proposed an approach to extract dimensioning information from engineering drawings and to integrate this information into the production process to facilitate and optimize quality control. The extraction process is based on 2D clustering.
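Several of these descriptive approaches begin with unsupervised pattern discovery. As an illustrative sketch only (not any reviewed implementation), a plain one-dimensional k-means can separate process readings into operating regimes; the cycle-time data below are invented:

```python
def kmeans_1d(values, k=2, iters=50):
    """Plain Lloyd's algorithm on scalars; returns the sorted cluster centres."""
    # Spread the initial centres across the sorted value range
    centres = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centres]
        for v in values:  # assign each value to its nearest centre
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Recompute each centre as the mean of its cluster (keep empty ones as-is)
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return sorted(centres)

# Hypothetical machining cycle times (s): a normal regime and a slow, suspect one
cycle_times = [10.1, 10.3, 9.9, 10.0, 15.2, 15.0, 14.8, 10.2]
print(kmeans_1d(cycle_times))  # converges to centres near 10.1 and 15.0
```

Once such regimes are found, a supervised model can label them against quality outcomes, which is the combination several of the cited works exploit.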
There is also a considerable number of papers reaching Level 2, i.e. developing and implementing both descriptive and predictive analytics algorithms. Bai et al. (2017) proposed a deep neural network (DNN), consisting of a deep belief network (DBN) at the bottom and a regression layer on top, in order to overcome the challenges of shallow architectures in product quality. Jun, Chang, and Jun (2020) proposed semi-supervised learning, time-series analysis, and classification models within a framework for predicting defects and improving yield in continuous-flow manufacturing. In order to perform quality and yield prediction, they implemented and compared Artificial Neural Networks (ANN), Logistic Regression, Decision Tree, Random Forest, Linear Discriminant Analysis (LDA), Gaussian Naive Bayes (GNB), K-Nearest Neighbours (KNN), and Support Vector Classifier (SVC) algorithms. Chatterjee et al. (2019) applied an adaptive neuro-fuzzy inference system and multi-gene genetic programming in order to predict several performance measures of laser-drilled hole quality. Bai et al. (2018) developed a framework with a deep restricted Boltzmann machine and the stacked autoencoder and compared it with a feedforward neural network with one hidden layer and a least squares support vector machine with no hidden layers. Lee et al. (2018) developed a Cyber-Physical Production System (CPPS) for quality prediction and operation control in metal casting. Several machine learning algorithms, such as decision trees, random forests, artificial neural networks, and support vector machines, were used for quality prediction.
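Among the classifiers recurring in these comparisons, k-Nearest Neighbours is simple enough to sketch in a few lines; the (pressure, temperature) features and labels below are invented for the example and do not come from the reviewed studies.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Label a feature vector by majority vote among its k nearest training points."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))  # squared Euclidean
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# (features, label) pairs: invented (pressure, temperature) readings vs. outcome
train = [
    ((1.0, 200.0), "ok"), ((1.1, 205.0), "ok"), ((0.9, 198.0), "ok"),
    ((2.0, 250.0), "defect"), ((2.1, 255.0), "defect"),
]
print(knn_predict(train, (1.05, 202.0)))  # prints ok
```

In practice the cited works feed such classifiers with dozens of engineered features and report comparative accuracies; the benchmarking discipline, rather than any single algorithm, is their common contribution.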
Jin, Zhang, and Gu (2020) proposed a self-monitoring system based on real-time camera images and deep learning algorithms to classify the various extents of delamination in a part printed through additive manufacturing and to predict the onset of warping. Wang et al. (2019) proposed a generative neural network model for automatically predicting work-in-progress product quality. An autoencoding neural network is trained using raw manufacturing process data, and the extracted features are reformed as time-series and fed into a multi-layer perceptron. Schmitt et al. (2020) developed an integrated solution for predictive model-based quality inspection on the basis of recorded process parameters. Liu et al. (2019b) explored the relationship between the welding process and weld quality by developing a multiple sensor fusion system with principal component analysis and support vector machine. Escobar, Morales-Menendez, and Macias (2020) proposed big data-driven process monitoring aimed at rare quality event detection using the Support Vector Machine, Logistic Regression, Naive Bayes, and k-Nearest Neighbors learning algorithms. Bustillo and Correa (2012) proposed a predictive model based on Bayesian Networks to optimize deep drilling operations under high-speed conditions for the manufacturing of steel components. Lieber et al. (2012) investigated how data mining techniques and intelligent machine-to-machine telematics could be used to predict internal quality issues of intermediate products. Frumosu and Kulahci (2018) proposed an approach that makes use of latent structure-based methods in the pursuit of better predictions. Paul (2016) collected experimental Forming Limit Curve (FLC) and tensile properties of various steel grades from the literature and developed a predictive model based on non-linear regression. Liu et al. (2020) proposed an end-to-end unified product quality prediction framework in order to capture temporal interactions among processes in manufacturing and assembly. To do this, they developed a bidirectional serial-parallel LSTM devised as an instantiated model of the temporal-interactive model.
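The two-stage idea described by Wang et al. (2019), extracting compressed features with an autoencoder and feeding them to a multi-layer perceptron, can be sketched as follows. Here a small `MLPRegressor` trained to reconstruct its input stands in for the autoencoder; the data, architecture, and layer sizes are illustrative assumptions, not the authors' model.

```python
# Sketch: autoencoder-style feature extraction followed by an MLP classifier.
# Synthetic data; an MLPRegressor trained input->input plays the autoencoder.
import numpy as np
from sklearn.neural_network import MLPRegressor, MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))               # raw process signals (synthetic)
y = (X[:, :3].sum(axis=1) > 0).astype(int)   # synthetic quality label

X = StandardScaler().fit_transform(X)
# "Autoencoder": input -> 6 hidden units -> reconstructed input
ae = MLPRegressor(hidden_layer_sizes=(6,), max_iter=3000,
                  random_state=0).fit(X, X)
# Encoded features = hidden-layer activations (ReLU applied manually)
Z = np.maximum(0.0, X @ ae.coefs_[0] + ae.intercepts_[0])

# Downstream multi-layer perceptron on the extracted features
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=3000,
                    random_state=0).fit(Z, y)
train_acc = clf.score(Z, y)
```

A production system would instead use a proper deep-learning framework and reshape the extracted features into time-series windows, as the paper describes.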
The extensive use of heterogeneous data sources on the shop floor has facilitated an increased level of intelligence of in-process quality algorithms, and prescriptive analytics algorithms have just started to emerge. However, these works are limited, and they utilize domain-specific optimization approaches rather than machine learning techniques and algorithms (e.g. Tambe and Kulkarni 2015; He et al. 2017).

Quality control. As already mentioned, quality control is governed by manual and traditional quality procedures. Therefore, some research works investigate their transformation to data-driven approaches. For example, Sanchez-Marquez et al. (2020) proposed a method to study quality management systems and predict key performance indicators of balanced scorecards. Another challenge has to do with the scarcity of related datasets in the literature; the rest of the identified papers use the Bosch dataset, a publicly available but highly anonymized dataset. Carbery, Woods, and Marshall (2018) used Bayesian networks for defect detection. Maurya (2016) performed anomaly detection and binary classification using a Gradient Boosting Machine and Bayesian optimization in order to identify rare defects. Ge et al. (2021) compared Federated Support Vector Machine and Federated Random Forest algorithms with centralized learning techniques for product failure prediction. Pavlyshenko (2016) combined the XGBoost tree-based classifier, the generalized linear model, and a Bayesian approach for logistic regression. Gashi et al. (2021) proposed an approach to improve end-of-line testing when condition monitoring data are missing, in order to predict low-quality products or products with a high probability of failure over time.
To do this, they used a classification model, comparing linear discriminant analysis with shrinkage, Random Forest, and LightGBM, and they employed a multi-component system view for explaining the prediction results. Although quality control is still supported primarily by manual and traditional quality procedures, data-driven approaches offer the potential of fusing a broader spectrum of data and, hence, of globally optimising control and corrective actions. However, they have not been operationalised yet; their reliability and efficacy in real-world conditions are largely untested, and hence their applicability on the shop floor is, for the moment, limited.
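Rare-defect detection of the kind pursued by Maurya (2016), and the optimal classification threshold search of Escobar and Morales-Menendez (2018), can be sketched with a gradient boosting classifier and a simple threshold sweep. The data are synthetic; the real Bosch dataset is far larger and highly anonymized, and the hyperparameters here are illustrative.

```python
# Sketch: gradient boosting on an imbalanced (5% defect) synthetic dataset,
# followed by a search for the F1-optimal decision threshold.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=1000, n_features=15,
                           weights=[0.95, 0.05], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

gbm = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)
proba = gbm.predict_proba(X_te)[:, 1]        # predicted defect probability

# Sweep the classification threshold and keep the F1-maximizing one
# (a proper setup would tune this on a separate validation fold).
thresholds = np.linspace(0.05, 0.95, 19)
f1s = [f1_score(y_te, (proba >= t).astype(int)) for t in thresholds]
best_t = thresholds[int(np.argmax(f1s))]
```

On heavily imbalanced quality data, the default 0.5 threshold is rarely the best operating point, which is why such a search matters for rare-event detection.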

Mapping of reviewed papers to lifecycle stages and analytics methods
In this section, we map the reviewed papers to the respective manufacturing lifecycle stage (i.e. process configuration, in-process, quality control) and category of methods (i.e. statistical correlation, regression analysis, rule-based learning, neural networks, SVM, deep learning, clustering, probabilistic models). Not surprisingly, many papers use combinations of methods to address quality challenges; consequently, these references are associated with more than one category of methods. Moreover, based on the defined taxonomy, for each paper we identified whether the methods used address descriptive, predictive, or prescriptive analytics. For each manufacturing lifecycle stage, we then classified the research areas and their associated categories of methods according to the data analytics stage that they address.
Data pre-processing is a prerequisite step for all data analytics algorithms, so that the data are logically organized, structured, and prepared for feeding into the algorithms. In some cases, this process is straightforward; in others, however, the nature of the data calls for more advanced signal processing and dimensionality reduction techniques. Table 11 presents such techniques as derived from the literature review. Table 12 presents the reviewed works assigned to the related category of methods and manufacturing lifecycle stage. Figure 8 provides a map showing the methods that are used for each data analytics stage (i.e. descriptive, predictive, prescriptive) and for each manufacturing lifecycle stage (i.e. process configuration, in-process, quality control). For each point, it indicates the number of papers that are associated with the aforementioned dimensions.
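A minimal sketch of one common pre-processing chain of the kind catalogued in Table 11, standardization followed by dimensionality reduction with PCA, is shown below; the sensor matrix and component count are illustrative assumptions.

```python
# Sketch: standardize raw sensor vectors, then reduce dimensionality with
# PCA before feeding a downstream analytics algorithm. Synthetic data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# 200 samples of 30 raw sensor channels with very different scales
raw = rng.normal(size=(200, 30)) * rng.uniform(0.1, 10.0, size=30)

prep = make_pipeline(StandardScaler(), PCA(n_components=5))
features = prep.fit_transform(raw)   # 200 samples x 5 principal components
```

Scaling before PCA matters because PCA is variance-driven: without it, the highest-magnitude sensor channels would dominate the extracted components.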

Research gaps, challenges and future directions
In this section, we discuss the main research gaps identified in the literature based on the review outcomes, their corresponding challenges, and the related directions for future research. We identified five main areas: (i) providing data-driven problem solving; (ii) developing predictive and prescriptive analytics algorithms; (iii) combining multiple data sources; (iv) combining data and knowledge; and (v) providing augmented analytics capabilities. As shown in Table 13, the column 'Research Gaps' summarizes the main findings of our literature review; these gaps lead to the research challenges and the future directions.

Providing data-driven problem solving
The increasing use of sensors on the shop floor provides new capabilities for improving quality in the production line and facilitating timely and proactive responses to machine malfunctioning and product defects. To this end, in-process quality algorithms have taken advantage of the large amounts of real-time data in order to provide timely and reliable insights during the actual operation. However, the process configuration and, even more so, the quality control stages have not benefited from the technological advancements of data analytics methods in the frame of Industry 4.0. Process configuration is characterized by high complexity and variability, and thus has sophisticated requirements for expert knowledge. Quality management, on the other hand, is still governed by traditional and manual approaches, relying on human judgement. Even quantitative methodologies, such as Statistical Process Control, have limitations in incorporating and analyzing data from a variety of data sources in the dynamic and complex manufacturing environment, and they cannot adapt to the available data and the specific processes of different industries.
Despite the expansion of Industry 4.0, quality procedures still do not adopt a data-driven approach, which has the potential to facilitate the extraction of unrevealed insights even ahead of time, e.g. by predicting upcoming defects. Even common enterprise systems that have been adopted in previous years, such as ERP, MES, CMMS, end-of-line quality management systems, etc., have not been seen as potential generators of quality insights that can lead to quality process optimization. Future research can focus on the development of data-driven frameworks and algorithms capable of coping with the complexity and variability of process configuration. In addition, there is a need for appropriate algorithms and methods for transforming the procedures of the quality control stage from manual to data-driven ones by exploiting various data sources that currently remain untapped.

Developing predictive and prescriptive analytics algorithms
The current focus in the literature is on in-process quality algorithms. Indeed, there is a large number of descriptive and predictive algorithms, while the literature has only started investigating the potential of prescriptive analytics. Process configuration, on the other hand, is mainly addressed with descriptive analytics algorithms, while quality control has not yet exploited the opportunities of machine learning. However, the existing approaches rely to a large extent on process knowledge as well, whereas existing predictive analytics algorithms take greater advantage of the available data. A related challenge is the scarcity of recorded defect events, which resampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE) and Edited Nearest Neighbors (ENN) have been used to mitigate (Gashi et al. 2021). Prescriptive analytics is the least explored area, because of, among other factors, the scarcity of data-driven prescriptive analytics algorithms (Lepenioti et al. 2020). Predictive analytics in quality control moves beyond traditional quality management approaches by providing the capability to foresee upcoming quality issues ahead of time and to take mitigating actions. Further, prescriptive analytics has the potential to maximize the quality goals and, at the same time, mitigate the likely risks by recommending optimal sequences of actions that consider the organization's quality objectives.
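The core idea behind SMOTE, synthesizing minority-class samples by interpolating between a rare defect record and one of its nearest minority neighbours, can be sketched in a few lines. This is a simplified illustration on synthetic data, not the imbalanced-learn implementation.

```python
# Simplified SMOTE sketch: oversample rare defect records by interpolating
# between minority samples and their k nearest minority neighbours.
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples from X_min."""
    rng = np.random.default_rng(seed)
    # Pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)               # exclude self-distances
    nn = np.argsort(d, axis=1)[:, :k]         # k nearest minority neighbours
    synth = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        j = rng.integers(len(X_min))          # random minority sample
        nb = X_min[rng.choice(nn[j])]         # one of its neighbours
        lam = rng.random()                    # interpolation factor in [0, 1)
        synth[i] = X_min[j] + lam * (nb - X_min[j])
    return synth

rng = np.random.default_rng(1)
defects = rng.normal(loc=3.0, size=(20, 4))   # 20 rare defect records (toy)
new_defects = smote(defects, n_new=80, k=3)
balanced = np.vstack([defects, new_defects])
```

Because each synthetic point lies on a segment between two real defect records, SMOTE densifies the minority region rather than duplicating existing samples; ENN is typically applied afterwards to clean up noisy boundary points.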

Combining multiple data sources
Today's manufacturing environment is rich in data sources providing different levels of information for different operations. For example, equipment-installed sensors generate real-time measurements of indicators of machine degradation; MES provide data related to productivity and efficiency; quality management systems store data related to defects; ERP systems store data related to production management and supply chain management; and CRM systems gather customer-driven data on reviews, complaints, and service defects. In addition, there are usually specific guidelines and specifications for executing certain tasks, which are either recorded in a semi-structured or unstructured format or are based on expert knowledge and not recorded in any system. However, these data reside in silos that typically hold manufacturers back from greater efficiencies and cost savings.
Our literature review revealed that most research works process data derived from sensors and enterprise systems. Other data sources, such as environmental sensors, cameras, and product tracking technology (e.g. RFID), are rarely exploited by existing quality analytics algorithms. In this direction, Wei, Wu, and Terpenny (2020) developed a decision-level data fusion approach that transforms low-dimensional decisions (i.e. predictions) made on the basis of individual sensor data, such as temperature and vibration, into high-dimensional decisions for quality control. Data fusion approaches have also been studied in the context of quality characterization in metal additive manufacturing (Grasso, Gallina, and Colosimo 2018), automating shop floor operations (Chen and Jin 2018), and manufacturing process monitoring (Kong et al. 2020).
It should also be noted that existing quality analytics algorithms require the embodiment of expert knowledge and process specifications. This presumes that users can provide their knowledge and expertise in a structured way, whereas a large volume of knowledge remains tacit. Future research may focus on the development of methodologies and technologies for data and information fusion from heterogeneous sources.
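Decision-level fusion in the spirit of Wei, Wu, and Terpenny (2020) can be sketched by training one classifier per sensor stream and combining their per-sample defect probabilities into a single quality decision. The sensor data, feature counts, and equal fusion weights below are illustrative assumptions.

```python
# Sketch: per-sensor classifiers (temperature, vibration) whose predicted
# defect probabilities are fused into one decision. Synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 300
y = rng.integers(0, 2, size=n)                      # 0 = good, 1 = defect
temp = y[:, None] * 1.5 + rng.normal(size=(n, 3))   # temperature features
vib = y[:, None] * 1.0 + rng.normal(size=(n, 4))    # vibration features

clf_t = LogisticRegression(max_iter=1000).fit(temp, y)
clf_v = LogisticRegression(max_iter=1000).fit(vib, y)

# Fuse: average the per-sensor defect probabilities (equal weights here;
# in practice weights could reflect each sensor's validation accuracy)
p_fused = 0.5 * (clf_t.predict_proba(temp)[:, 1]
                 + clf_v.predict_proba(vib)[:, 1])
decision = (p_fused >= 0.5).astype(int)
fused_acc = (decision == y).mean()
```

Fusing at the decision level rather than concatenating raw features keeps each sensor model independently trainable and replaceable, which suits heterogeneous shop-floor data sources.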

Combining data and knowledge
Pure data-driven quality management approaches have limitations in their practical implementation, especially when their outcomes contradict the operator's knowledge and experience. Optimized human-machine collaboration is enabled when data-driven and knowledge-based systems interact in order to take into account both the engineered knowledge ('voice of experts') and the knowledge extracted from data ('voice of data'), providing non-intrusive decision augmentation. To tackle this issue, several research works incorporate domain knowledge at design time and combine it with data analytics algorithms. For example, hybrid approaches for fault detection and health monitoring combine data-driven analysis with knowledge-based models to overcome a lack of data and to increase fault detection accuracy (Tidriri et al. 2016; Wilhelm et al. 2021). Hybrid approaches have also been applied for dynamic scheduling of shop floor operations (Ma et al. 2022) as well as for enhancing the predictive control capabilities of industrial robots (Chee, Jiahao, and Hsieh 2022). Existing hybrid approaches, however, often lead to domain-specific solutions with low flexibility and scalability, which puts barriers to their wider application. Future research may focus on a generalised framework for fusing data-driven with knowledge-based methods for manufacturing applications such as quality management, in order to provide non-intrusive decision augmentation.
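A minimal sketch of the 'voice of data' plus 'voice of experts' interplay is shown below: a data-driven defect probability is combined with an engineered rule, and disagreement between the two sources is escalated to the operator rather than resolved automatically. The rule, thresholds, and temperature limit are hypothetical illustrations, not taken from the cited works.

```python
# Sketch: hybrid decision combining a model's defect probability with an
# engineered expert rule (hypothetical temperature limit).
def hybrid_decision(defect_proba, temperature, temp_limit=80.0,
                    proba_threshold=0.5):
    """Reject when model and rule agree on a problem; escalate to a human
    review when the two knowledge sources disagree (non-intrusive)."""
    model_flag = defect_proba >= proba_threshold   # voice of data
    rule_flag = temperature > temp_limit           # voice of experts
    if model_flag and rule_flag:
        return "reject"
    if model_flag != rule_flag:
        return "review"                            # ask the operator
    return "accept"

# Model flags a defect, but the expert rule sees nothing wrong -> "review"
outcome = hybrid_decision(defect_proba=0.82, temperature=65.0)
```

Escalating disagreements instead of overriding either source is one simple way to keep the system non-intrusive while still exploiting both knowledge channels.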

Providing augmented analytics capabilities
One barrier to the wider adoption of data analytics in quality management is that machine learning algorithms require advanced analytical skills for their training, configuration, and interpretation. Therefore, transforming data into valuable insights calls for an automated approach that creates an optimized human-machine collaboration in quality procedures. The aim of augmented analytics is to allow manufacturing companies to use machine learning to automatically extract insights and visualize relevant findings from data without having to write algorithms or build complex models. Augmented analytics also aims at optimizing the use of data for decision-making to augment human intelligence and contextual awareness (Gartner 2018). It may include natural language processing and conversational interfaces, allowing all users to interact through spoken and written language (Prat 2019).
In the context of augmented analytics, data analytics can be further enhanced with conversational interfaces as well as the exploitation of human knowledge representation through intelligent digital assistants, allowing all users to easily interact with data and insights. Augmented analytics is an emerging topic that is still in its infancy. Communication with the user by voice poses new challenges to the development and execution of data analytics algorithms. Beyond visualization dashboards, the data analytics outcomes should be structured in a way that can be translated to speech and be comprehensible to the users. Conversely, the algorithms should be able to accept input parameters derived from human speech.
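As a toy illustration of structuring an analytics outcome so that it can be translated to speech, the sketch below renders a hypothetical defect-prediction result as a natural-language sentence; the result fields, wording, and alert threshold are all invented for illustration, and a real system would sit on top of an NLG/text-to-speech pipeline.

```python
# Sketch: turn a structured analytics outcome into a speech-ready summary.
# The result schema below is a hypothetical illustration.
def summarize_prediction(result):
    """Render a defect-prediction result as a natural-language summary."""
    pct = round(100 * result["defect_probability"])
    msg = (f"Line {result['line']}: the predicted defect probability for "
           f"the next batch is {pct} percent.")
    # Only voice a recommendation when the probability exceeds the alert level
    if result["defect_probability"] >= result.get("alert_threshold", 0.3):
        msg += " Recommended action: " + result["recommendation"]
    return msg

text = summarize_prediction({
    "line": "A3",
    "defect_probability": 0.42,
    "recommendation": "inspect the welding torch alignment.",
})
```

The point is that the analytics output is a structured object first, so the same result can feed a dashboard, a spoken summary, or a digital assistant.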

Conclusions
The quality level in manufacturing processes increasingly concerns manufacturing firms, as they respond to pressures such as increasing complexity and variety of products, more complex value chains, and shortened time-to-market. Quality management is one of the evergreen research areas of the modern era, affecting the whole product lifecycle, i.e. the design, manufacturing, and service stages. Our study focuses on quality management in the context of manufacturing. In the future, we plan to extend the scope of analysis of quality management to encompass suppliers, partners, customers and consumers, and the overall production and distribution supply chains.
Data analytics has started gathering the interest of quality researchers and practitioners, who investigate approaches, algorithms, and methods for supporting the manufacturing quality procedures in the context of Industry 4.0. This trend is facilitated by the wide expansion of sensory technology and the accelerated adoption of information systems by the manufacturing firms. Since quality and process control has been identified as one of the major challenges with a high potential of big data analytics, in this paper we investigated the manufacturing quality research field from a data analytics perspective. Specifically, we examined the existing literature, we provided clarity to the Quality 4.0 research field, we synthesized the literature review outcomes, and we identified the research gaps and challenges. On top of them, we proposed directions for future research.

Table 13. Research gaps, challenges, and directions for future research.

Providing data-driven problem solving
Research gaps: Process configuration and quality control stages have not benefited from data-driven approaches.
Challenges: Process configuration is characterized by high complexity and variability. Quality management is still governed by traditional and manual approaches, relying on human judgement.
Future directions: Development of data-driven frameworks and algorithms for the process configuration and quality control stages.

Developing predictive and prescriptive analytics algorithms
Research gaps: Predictive analytics has mainly been exploited for in-process functions and not for the process configuration and quality control stages. Prescriptive analytics is an unexplored area. Existing approaches also rely to a large extent on process knowledge.
Challenges: The training of predictive analytics algorithms is challenging due to the scarcity of recorded past events. Prescriptive analytics has been addressed with domain-specific optimization methods rather than machine learning.
Future directions: Development of predictive and prescriptive analytics algorithms in order to support quality engineers with reliable prescriptions that mitigate upcoming quality issues based on data-driven predictions. Exploitation of machine learning models for generic and scalable solutions.

Combining multiple data sources
Research gaps: Most of the research works deal with data derived from sensors and enterprise systems; other data sources are rarely exploited. Existing quality analytics algorithms usually require the embodiment of expert knowledge and process specifications to a large extent.
Challenges: Today's manufacturing environment is rich in data sources providing different levels of information for different operations, yet data silos typically hold manufacturers back from greater efficiencies and cost savings.
Future directions: Development of methodologies and technologies for data and information fusion from heterogeneous sources.

Combining data and knowledge
Research gaps: Existing literature has not investigated the interaction between data-driven and knowledge-based methods. Existing solutions are domain-specific, with low flexibility and scalability.
Challenges: Pure data-driven approaches have limitations in their practical implementation, especially when their outcomes contradict the operator's experience.
Future directions: Coupling of data-driven with knowledge-based methods in order to augment decision making based on both data and knowledge elicited from experts.

Providing augmented analytics capabilities
Research gaps: Machine learning algorithms require advanced analytical skills for their training, configuration, and interpretation.
Challenges: Transforming data into valuable insights in an automatic way by creating an optimized human-machine collaboration in quality procedures. Supporting operators in their manual tasks with human augmentation technologies.
Future directions: In the context of augmented analytics, data analytics can be enhanced with conversational interfaces through intelligent digital assistants, allowing all users to easily interact with data and insights. Augmented analytics may use machine learning to automatically extract insights and visualize relevant findings from data without having to write algorithms or build complex models.