A data-driven algorithm to predict throughput bottlenecks in a production system based on active periods of the machines

Smart manufacturing is reshaping the manufacturing industry by boosting the integration of information and communication technologies with manufacturing processes. As a result, manufacturing companies generate large volumes of machine data, which can potentially be used to make data-driven operational decisions using informative computerized algorithms. In the manufacturing domain, it is well known that the productivity of a production line is constrained by throughput bottlenecks. The operational dynamics of the production system cause the bottlenecks to shift among the production resources between production runs. Therefore, predicting the throughput bottlenecks of future production runs allows production and maintenance engineers to proactively plan resources to effectively manage the bottlenecks and achieve higher throughput. This paper proposes an active period based data-driven algorithm to predict throughput bottlenecks in the production system for the future production run from large sets of machine data. To facilitate the prediction, we employ an auto-regressive integrated moving average (ARIMA) method to predict the active periods of the machines. The novelty of the work is the integration of the ARIMA methodology with the data-driven active period technique to develop a bottleneck prediction algorithm. The proposed prediction algorithm is tested on real-world production data from an automotive production line. The bottleneck prediction algorithm is evaluated by treating it as a binary classifier problem and adapting the appropriate evaluation metrics. Furthermore, an attempt is made to determine the amount of past data needed to better forecast the active periods.


Introduction
Digital Manufacturing and the Industrial Internet of Things (IIoT) are emerging technologies for increasing productivity in manufacturing (Lee, Lapira, Bagheri, & Kao, 2013). Manufacturing companies collect shop floor data in digital format using Manufacturing Execution Systems (MES), sensor technologies, etc. (Hedman, Subramaniyan, & Almström, 2016). For example, one automotive manufacturing company in Sweden collects 100 data points of machine data per hour through its MES (Subramaniyan, 2015). This means that, over an 8-h operating shift, 800 data points are collected per machine. Such voluminous, high-velocity data, when scaled up to a production system level or a factory level, can be referred to as big data (Lee et al., 2013). With the exponential growth in the data acquired from the machines, new opportunities emerge to leverage data science to enhance the state of manufacturing and enable more data-driven decision making (Shao, Shin, & Jain, 2015). To enable such data-driven decision making, companies need informative analytical algorithms to turn high volumes of fast-moving data into meaningful insights (Lavalle, Lesser, Shockley, Hopkins, & Kruschwitz, 2011; Bokrantz, Skoogh, Berlin, & Stahre, 2017). This necessitates research into data analytics that enables efficient and effective extraction of information from the raw data to derive new knowledge and insights, which can in turn be applied to introduce intelligence into the control of production processes and to improve the system-level operation of manufacturing enterprises (Wuest et al., 2016).
In the manufacturing domain, throughput is an important indicator used to evaluate production system performance. The throughput of a production line depends on the individual throughput of the machines (Chang, Ni, Bandyopadhyay, Biller, & Xiao, 2007). It is often constrained by one or more machines in the production system, which are usually called bottlenecks (Goldratt & Cox, 1990). Prioritizing maintenance and other production improvement activities on bottleneck machines results in substantial improvement in the overall throughput of the production system (Gopalakrishnan, Skoogh, & Christoph, 2013; Wedel, Noessler, & Metternich, 2016). Therefore, the bottleneck machines are critical to the production system performance, especially when there are limited resources.
https://doi.org/10.1016/j.cie.2018.04.024
A number of research efforts have focused on developing data-driven algorithms that use digital historical machine data collected by MES to detect the bottlenecks at a system level. For example, Li, Chang, and Ni (2009) proposed an algorithm that uses the online blockage and starvation data of the machines to detect the bottlenecks, while  proposed an MES based algorithm that uses the active states of the machine to detect the bottlenecks. These algorithms are descriptive in nature, meaning that they detect the past bottlenecks in the production system. The bottlenecks in the production line are dynamic, and the current bottlenecks may not be the bottlenecks for the next period as the machines' behaviour changes. This is due to the system performance degradation caused by random noise and disturbances such as failures (Li, Qing, Xiao, & Ambani, 2011). This change causes the bottlenecks to shift, which in turn affects the overall throughput of the production system. To maximise the throughput of the production system, it is essential to predict the bottleneck shifts in advance of the production run so that those machines can be prioritized for maintenance (e.g. reactive and proactive maintenance strategies on bottleneck machines) and other improvement activities. Hence, a system level decision support tool is required to indicate the bottleneck machines of the future production run (Jin, Weiss, Siegel, & Lee, 2016). The requirement for such system level decision support for maintenance planning in a digitalized manufacturing environment was also identified by Bokrantz et al. (2017) in a Delphi based scenario study of the future of maintenance by 2030. Further, Bokrantz et al. (2017) indicated that real-time data analytics will be used as a potential tool by production and maintenance engineers to make decisions on a system level.
In the past, there have been few research efforts on predicting the future bottlenecks of the production system. Li et al. (2011) proposed a time-series based predictive algorithm that predicts the blockage and the starvation of the machines and thereby detects the bottlenecks using a turning point method. Cao, Deng, Liu, and Wang (2012) proposed an algorithm to predict the bottlenecks in semiconductor manufacturing industries using different variables including the product types, releasing strategies, work in progress, processing times, utilization rate, and buffer length. Wedel et al. (2016) proposed an algorithm that predicts the bottlenecks based on the buffer levels. Although various predictive algorithms exist, they were developed and tested in a simulation environment using a discrete event simulation model of the production system, and their real-world validation using online data collected from the machines on the shop floor is limited. Moreover, the performance of the different predictive algorithms has not been benchmarked against standard algorithms such as the naïve method (Keogh & Kasetty, 2002).
The purpose of the paper is to increase the productivity of manufacturing systems by facilitating data-driven decision making using real-time MES data. The aim is to propose a data-driven algorithm to predict the throughput bottlenecks in a production system using the active periods of the machines and to evaluate the prediction algorithm. The active period method of bottleneck detection is gaining popularity within manufacturing research, as it can detect a group of potential bottlenecks in the production system and can be used in both decoupled and coupled systems (Lima, Chwif, & Barreto, 2008; Roser & Nakano, 2015; Subramaniyan, Skoogh, Gopalakrishnan, & Salomonsson, et al., 2016). However, there is no reported research exploring how the active period method can be used to predict the bottlenecks.

Literature review
In this section, the past research efforts on bottleneck prediction are presented followed by time series analysis and the evaluation metrics used to evaluate the prediction algorithms.

Previous work on bottleneck prediction in manufacturing system
An exhaustive list on the literature of prediction of bottlenecks in manufacturing is presented in Table 1.
The scientific publications presented in Table 1 show that there are many distinct predictive algorithms for predicting the bottlenecks in the production system. However, these algorithms are tested in a simulation environment, i.e. using a discrete event simulation model of the production system. In a simulation environment, Li et al. (2011) achieve a prediction accuracy of 97.38% in predicting the blockage and the starvation periods of the machines, while the ANFIS prediction algorithm proposed by Cao et al. (2012) achieves an accuracy of 92.03% in predicting the main bottlenecks. While simulation is considered to be a vital development tool in production analysis, the input data to the simulation follow well-defined distributions and therefore, using simulation to predict the bottlenecks can yield higher accuracy (Amar & Gupta, 1986). In contrast, real-time data from the machines do not fit any standard distribution due to the random nature of production disturbances. Moreover, the simulation models might not truly represent the real-world dynamics of the production, as updating the models with the changes made in the production system is difficult and time-consuming (Skoogh, 2011). Therefore, to substantiate claims about the performance of the predictive algorithms, and for them to be useful in real-world operations, they need to be validated on real-world datasets (Amar & Gupta, 1986). Validating the algorithms on real-world production lines also enables manufacturing companies to trust and adopt the algorithm (Jin et al., 2016), thus increasing the use of scientific knowledge in industry. The need to validate the algorithms on real-world data sets was also emphasised by Liao, Deschamps, Freitas, and Loures (2017) so as to reduce the widening gap between laboratory environments and industrial applications.
The predictive algorithms presented in Table 1 lack real-world validation and, moreover, there are no clear explanations as to whether the data used to construct the simulation models came from a real-world production line. Also, these algorithms are not compared with standard predictive algorithms such as the naïve method and therefore, it is not possible to evaluate their performance relative to standard algorithms (Keogh & Kasetty, 2002; Hyndman, 2014).
Table 1 Literature collection on bottlenecks prediction in production system.
It can be seen from Table 1 that different bottleneck detection techniques are used to predict the bottlenecks. Compared to the turning point and buffer level methods, the active period method can detect the primary and secondary bottlenecks more accurately and can be used for coupled and decoupled systems (Roser & Nakano, 2015). The turning point method is not accurate in detecting the bottlenecks in a decoupled system, especially when there are minor production disturbances in a machine that do not necessarily cause the upstream machines to be blocked or the downstream machines to be starved. The buffer method can be misleading, especially when the buffers are shared between machines, when the part transfer devices fail, or when the transfer time is long (Li et al., 2009).
Also, it can be seen from Table 1 that no method for predicting the bottlenecks using the active period based method was found in the literature. This was the motivation to derive a new bottleneck prediction method based on the active periods of the machines.

Theory on active period method of bottleneck detection
In this method, a machine is considered to be in the active state when it is engaged in producing a product, under breakdown, in set-up, and so on, and in the inactive state when it is blocked or starved (Roser, Nakano, & Tanaka, 2001), as seen in Fig. 1. The active and inactive states are similar to the equipment dependent and equipment independent states, respectively, proposed by De Ron and Rooda (2005). Equipment dependent states are those in which the machine is in a producing state, an unscheduled down state or a scheduled down state. The equipment independent states include lack of input to the equipment, the blocked state, etc. By computing the percentage of time the machine is active during the scheduled production time and comparing it with that of the other machines, a group of potential bottlenecks can be identified. Because the method is based on the aggregation of the different machine states, it also provides diagnostic insights into the nature of the bottleneck machines. Moreover, this method can be used to detect bottlenecks in different types of production system, from job shop to flow systems, with or without finite buffers (Roser & Nakano, 2015). Furthermore, the active periods of a machine can be converted into time series data with a time resolution of months, weeks, days, shifts, etc. Established time series based prediction methods can then be applied to such data to estimate the future active periods of the machines, and thereby predict the bottlenecks in the production line.

Time series analysis using auto regressive integrated moving average (ARIMA)
A time series is defined as a sequence of data points collected at constant time intervals. Time series data are analyzed to determine trends and forecast the future. The aim of a time series algorithm is to explore and derive hidden patterns and insights that help in making decisions (Brockwell & Richard, 2016). The relationship between the observed sample values and the underlying stochastic process is analogous to the relationship between sample and population in hypothesis testing (Keskin, Taylan, & Terzi, 2006). Therefore, the time series is a sample from the underlying stochastic process that generated the series. Different time series forecasting methods are available in the literature, and they can be grouped into statistical and artificial intelligence based methods (Wang, Wang, Zhang, & Guo, 2012). As the active period percentages can be expressed as a time series when collected for different production runs, and as they depend on their own historical data, an ARIMA statistical method in the form of stochastic linear difference equations can explain the dependency in the active period data of the machines. ARIMA has also been shown to be a robust and efficient method for short-term time series forecasting, especially one-step-ahead forecasts (Kuvulmaz, Usanmaz, & Engin, 2005). Moreover, ARIMA has been used extensively in the field of finance, for example for short-term prediction of stock prices (Adebiyi & Adewumi, 2014).
An ARIMA model can be represented as ARIMA(p, d, q), where p, d and q are non-negative integers that represent the order of the autoregressive (AR), integrated (I), and moving average (MA) parts of the model (Ho & Xie, 1998). In practice, most time series data are non-stationary. Therefore, auto-regressive (AR), moving average (MA) or auto-regressive moving average (ARMA) models cannot be applied directly. One way to convert the data into stationary data is by differencing. If the original data are differenced d times before fitting the ARMA model, then the model for the original time series data is called an ARIMA model, where I (integrated) refers to the differencing step and d denotes the number of differencing operations (Ho & Xie, 1998). ARIMA models provide not only the forecasted value but also related metrics, including the standard errors associated with the forecast and the confidence and prediction intervals. To assess the performance of an ARIMA model, it can be compared to a benchmark forecasting technique such as the naïve method, a simple technique in which the forecast is the same as the last observation (Hyndman, 2014).

Algorithm evaluation metrics
To compare the performance of the predictive algorithm with that of the naïve algorithm, different metrics have been proposed in the literature. The most commonly used metrics are Mean Absolute Percentage Error (MAPE) (Hyndman, 2014) and accuracy, precision and recall (metrics derived from the confusion matrix) (Bradley, 1997). More recently, a new informative metric, Intersection over Union (IoU), was proposed by Microsoft to evaluate prediction algorithms (Ahmed, Tarlow, & Batra, 2015).

Mean Absolute percentage error (MAPE)
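MAPE is a measure of forecast accuracy that compares the forecasted values with the actual values and is widely used in time series forecasting (Hyndman, 2014). MAPE is a scale-independent statistic and expresses the prediction error as a percentage. For a series of forecasted values $(F_1, F_2, F_3, \ldots, F_n)$ and the corresponding actual values $(A_1, A_2, A_3, \ldots, A_n)$, MAPE is calculated as shown in Eq. (1).
\[ \mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right| \tag{1} \]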
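The MAPE calculation described above can be sketched in a few lines of Python. The function name and the sample values are hypothetical and chosen only for illustration; the sketch assumes that no actual value is zero, since MAPE is undefined in that case.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    # Each term is the absolute forecast error relative to the actual value.
    relative_errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(relative_errors) / len(actual)


# Hypothetical active period percentages for three production runs.
actual_pct = [80.0, 75.0, 90.0]
forecast_pct = [76.0, 78.0, 90.0]
error = mape(actual_pct, forecast_pct)
print(f"MAPE = {error:.2f}%")
```

Because the result is a percentage, the same function can be applied to series recorded at different time resolutions and the values compared directly.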

Confusion matrix metrics
The confusion matrix summarises the performance of a classification prediction algorithm (Bradley, 1997). The matrix summarises the results obtained from a prediction algorithm on a classification problem, e.g. a binary classification problem, and represents the number of correct and incorrect predictions. Table 2 shows the confusion matrix. The metrics that can be derived from a confusion matrix are accuracy, precision, and recall as shown in Eqs. (2)-(4). These metrics give a way of assessing how the predicted responses obtained from the algorithm align with the actual responses.
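For reference, the three metrics have the following standard definitions, where TP, FP, TN and FN denote the counts of true positives, false positives, true negatives and false negatives. These symbol names are assumed here for illustration and may differ from the notation used in Table 2.

```latex
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN}
\end{align*}
```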

Intersection over Union (IoU)
IoU is a metric developed by Microsoft and is widely used for the evaluation of the image segmentation algorithms in the computer vision domain (Ahmed et al., 2015). This metric can be used to evaluate the multi-classifier predictive algorithms and compares the degree of overlap between the results obtained from the prediction algorithm with that of the actual objects as shown in Fig. 2. In other words, IoU is the ratio between the intersection of the predicted and the actual objects to the union of the predicted and actual objects. The goal of this metric is to quantify the dissimilarities of the proposed algorithmic solution with respect to the ground truth in a meaningful way (Ahmed et al., 2015). The greater the overlapping area, the greater the algorithm performance.

Methodology
The methodology adopted was based on the Cross-Industry Standard Process for Data Mining (CRISP-DM) (Pete et al., 2000). It provides a structured methodology for a data mining project and is widely used in manufacturing (Gröger, Niedermann, & Mitschang, 2012). CRISP-DM was adapted to design an MES based data-driven predictive algorithm. The methodology can be broadly divided into two phases: the algorithm development phase and the testing phase.

Algorithm development phase
The algorithm development phase includes a literature study, a study of the MES data collected from a real-world production line, and interaction with production domain experts to understand the operational dynamics of the production system in detail. The theory of the active period based bottleneck detection method, as proposed by Roser et al. (2001) using a discrete event simulation model of the production system, was studied in detail. In addition, the step-by-step construction of the ARIMA based time series prediction algorithm and the evaluation metrics for prediction algorithms were also studied. ARIMA was chosen over other time series algorithms because it has been shown to work for short time series data and it can capture the trends found in the historical data and project future values (Kuvulmaz et al., 2005). Moreover, it offers rolling window flexibility, which makes it possible to estimate the required length of the historical data, especially when the data are aggregated, as active periods are (Hyndman & Yeasmin, 2008; Vafaeipour, Rahbari, Rosen, Fazelpour, & Ansarirad, 2014). Hence, ARIMA based forecasting algorithms can also be applied in environments with constrained data storage. A real-world MES data set from a production line, as exemplified in Table 3, was also studied in detail. This was done to determine the relevant data with respect to the active period method within the large set of MES data. The insights gained from the MES data, in combination with the literature studies and the inputs of production and maintenance domain experts, were used to design and develop the prediction algorithm.

Testing phase
In this phase, the algorithm is tested on real-world industrial test data from a production line. The algorithm is applied to the dataset and the active periods for future production runs are forecasted. The forecast accuracy, measured by MAPE as explained in Section 2.4.1, is then calculated to evaluate the performance of the algorithm in forecasting the future active periods of the machines against the naïve method. MAPE is chosen because it is based on percentages, which are easily understood and interpreted, and is therefore well suited to communicating performance to the industrial community. Moreover, because MAPE is scale-independent, it allows the performance of active period time series forecasting models to be compared across different time scales, thus enabling benchmarking of different algorithms using the same metric. From the forecasted active periods of the machines, a group of bottlenecks is predicted. The bottleneck prediction algorithm is then evaluated using accuracy, precision, recall and IoU, and compared with the naïve approach. The results were then discussed with the production and maintenance domain experts. The modeling of the industrial data and the evaluation of the algorithm were carried out by loading the raw MES data into R, a language widely used for data analytics and modeling (Hyndman & Khandakar, 2008).

Real-world industrial test study description
The algorithm was tested on the MES data of an automotive engine production line in Sweden. The layout of the production line is shown in Fig. 3. The production line has 13 machines, M1, M2, …, M13. Each machine has an ANDON light, and the MES collects and stores the ANDON information of the machine across a production run as shown in Table 3. There are four different ANDON lights: Red, Yellow, Green, and White. At any given time, the machine may show one light or a combination of lights. These ANDON lights can be grouped to represent the machine states as shown in Table 4.
The MES stores data for no more than 315 production runs at any point in time. Each production run lasts 17 hours. To understand the degree to which the bottlenecks shift between production runs, the MES data for 215 consecutive production runs were analyzed. The active period method as explained in Section 2.2 was used to identify the bottleneck machines for each production run. The bottleneck machines of each production run were checked against the bottleneck machines of the previous production run, which corresponds to the naïve method. The probability of the bottleneck machine of the previous production run being the true bottleneck of the next production run is 20.8%. This indicates that the bottleneck shifts between production runs. Therefore, it is necessary to predict the bottlenecks of the future production run so that the engineers of the production system can plan and allocate the production and maintenance resources to effectively manage the bottleneck machines. In addition, as the MES stores data for no more than 315 production runs at any point in time, it is desirable to find the size of the historical data that should be used to provide a better forecast for the next production run considering the dynamics of the production system.
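The naïve check described above can be sketched as follows. For simplicity, the sketch assumes a single bottleneck machine per production run, which is an assumption of this example rather than a detail given in the study; the function name and the sample sequence are hypothetical.

```python
def naive_hit_rate(bottleneck_per_run):
    """Share of runs whose bottleneck equals the bottleneck of the previous run."""
    # Pair each run with the run that follows it.
    pairs = list(zip(bottleneck_per_run, bottleneck_per_run[1:]))
    if not pairs:
        raise ValueError("need at least two production runs")
    hits = sum(1 for previous, current in pairs if previous == current)
    return hits / len(pairs)


# Hypothetical bottleneck machine detected in six consecutive production runs.
runs = ["M3", "M3", "M7", "M2", "M2", "M9"]
rate = naive_hit_rate(runs)
print(f"Naive hit rate: {rate:.1%}")
```

A low hit rate of this kind is what motivates forecasting the bottleneck instead of carrying the last observed bottleneck forward.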

Proposed bottleneck prediction algorithm based on active period method
The active period based bottleneck prediction algorithm consists of three steps: (1) compute the active periods of the machines from MES data and convert them into a time series at the desired time resolution of production runs (shifts, days, weeks etc.); (2) use time series techniques to forecast the active periods of the machines for the future production runs; and (3) detect the bottleneck machines in the production run from the corresponding forecasted active periods of the machines. Fig. 4 explains these three steps. Together, the three steps are called the bottleneck prediction algorithm in this paper. This algorithm takes the machine level MES data to predict the bottlenecks at a system level.

Step 1: Time series generation of the active periods of the machine
Let the elapsed time of each active state be $a_{mjn}$, where $m \in \{1, \ldots, M\}$ is the index representing each machine, $j \in \{1, \ldots, I\}$ represents the particular active state of machine m, and $n \in \{1, \ldots, N\}$ is the production run. Let $b_n$ be the scheduled production hours of production run n. The active period percentage (A) for machine m on a given production run n can then be calculated as shown in Eq. (5).
For a time horizon T, the historical data of the active period percentages of a machine can be divided into N production runs as shown in Fig. 5 to form a time series. The active period percentages of a machine are then calculated for the N production runs individually. The assumption made during the construction is that the active period percentages are dependent on their own historical data, i.e. $A_1$ depends on $A_2$ etc. (the active period percentages of each machine during a time t reflect the system dynamics). The active period percentages are calculated for all the machines in the production system.
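As a minimal sketch of Step 1, the code below sums the elapsed time of the active states of each machine per production run and divides by the scheduled hours, i.e. it assumes Eq. (5) has the form $A_{mn} = 100 \cdot \sum_j a_{mjn} / b_n$, which follows from the definitions above. The record layout, the state labels and the function name are hypothetical; the actual grouping of ANDON lights into states is plant-specific.

```python
from collections import defaultdict

# Assumed state labels treated as active; blocked and starved are inactive.
ACTIVE_STATES = {"producing", "breakdown", "setup"}


def active_period_percentages(records, scheduled_hours):
    """Return {machine: [A_m1, A_m2, ...]} ordered by production run.

    records: iterable of (machine, run, state, duration_hours)
    scheduled_hours: {run: b_n}, scheduled production hours of each run
    """
    active = defaultdict(float)
    machines = set()
    for machine, run, state, hours in records:
        machines.add(machine)
        if state in ACTIVE_STATES:
            active[(machine, run)] += hours  # sum of a_mjn over active states j
    runs = sorted(scheduled_hours)
    return {
        m: [100.0 * active[(m, n)] / scheduled_hours[n] for n in runs]
        for m in sorted(machines)
    }


# Hypothetical MES extract for two machines over two 17-hour production runs.
records = [
    ("M1", 1, "producing", 12.0), ("M1", 1, "breakdown", 2.0), ("M1", 1, "blocked", 3.0),
    ("M2", 1, "producing", 8.5), ("M2", 1, "starved", 8.5),
    ("M1", 2, "producing", 10.2), ("M1", 2, "starved", 6.8),
    ("M2", 2, "producing", 13.6), ("M2", 2, "setup", 1.7), ("M2", 2, "blocked", 1.7),
]
series = active_period_percentages(records, {1: 17.0, 2: 17.0})
print(series)
```

Each list in the result is the active period time series of one machine, which is the input expected by the forecasting step.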
Step 2: Time series based forecasting of the active periods of the machines
This step can be further divided into two steps: the first step is to determine the size of the historical data that is required to forecast the active periods of the machines and the second step is to forecast the active periods of the machines. The first step is carried out only when designing the algorithm to fix the size of the historical data. Once the size of the historical data is determined, it can be used to forecast the active periods in the second step without changing the size of the historical data.

Step 2.1: Determination of the size of historical data
As the dynamics of production on the shop floor often change due to continuous improvement activities on the machines, it may not be valid to assume that the ARIMA model parameters (p, d, and q) are constant over time. Also, if the MES stores data for only a limited number of production runs, it is necessary to find the amount of past data that can be used to provide better forecasts of the active periods. With a one-step-ahead rolling ARIMA with fixed window size, the model parameters and regression coefficients are recomputed for each one-step forecast to capture the dynamics of the production system after improvement activities are performed on the machines. When a rolling forecast takes a fixed number k of previously observed values as inputs, it is called a sliding window technique (Vafaeipour et al., 2014), as presented in Fig. 6. The next step is to determine the optimum size of k. For this, the data set is divided into two sets: a training data set and a testing data set. The training data set is used to train the algorithm and the testing data set is used to test the performance of the algorithm (Hyndman, 2014). For a given sliding window of size k and given the active periods of the machine $A_t, A_{t-1}, A_{t-2}, \ldots, A_k$ at time t, the active duration needs to be predicted for t+1. The ARIMA model is applied over the historical data of active periods. The Box-Jenkins methodology for applying ARIMA includes three iterative steps: algorithm identification, parameter estimation and diagnostic checking (Brockwell & Richard, 2016). This three-step algorithm building is typically repeated several times until a satisfactory algorithm is selected. The statistics involved in the three steps are explained in more detail in Brockwell and Richard (2016). The three steps are carried out automatically using the time series libraries in R (Hyndman & Khandakar, 2008).
The ARIMA algorithm procedure is applied over the historical time series data of the active periods of the machine to estimate the algorithm parameters and the coefficients, as represented in Eq. (6).
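Assuming the standard ARMA(p, q) form applied to the stationary (differenced) series, Eq. (6) can be sketched along the following lines. The constant term c and the index names i and j are assumptions of this sketch.

```latex
A_{t+1} = c + \sum_{i=1}^{p} \phi_i \, A_{t+1-i}
            + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t+1-j}
            + \varepsilon_{t+1},
\qquad \varepsilon_t \sim N(0, \sigma^2)
```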
where $A_{t+1}$ represents the forecasted active duration of the next production run, $\varepsilon_t$ is the random disturbance term, assumed independent and identically distributed with $\varepsilon_t \sim N(0, \sigma^2)$, and $\phi$ and $\theta$ are the regression coefficients of the auto-regressive and moving average parts of the ARIMA model, respectively. Eq. (6) represents the ARMA equation after the series of data is made stationary. ARIMA models with different sliding window sizes can be tested on the test data set and compared against each other to select the best size k of the sliding window. The MAPE metric as explained in Section 2.4.1 can be used to compare the forecasted and actual active periods for different sliding window sizes on the test data set. The sliding window size with the minimum MAPE value is the optimum window size, which can then be used in the algorithm to forecast the future values of the active duration. The same procedure is followed for all the machines in the production line to find the optimum window size needed to forecast the active duration for the next production run.
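The window size selection can be sketched as a rolling one-step-ahead evaluation. To keep the example self-contained, the forecaster below is a simple window mean, which is a stand-in and not ARIMA; in practice a fitted ARIMA model would be passed in its place. All function names and the sample series are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def window_mean_forecast(window):
    """Stand-in one-step forecaster (NOT ARIMA): the mean of the window."""
    return sum(window) / len(window)


def rolling_mape(series, k, n_test, forecaster=window_mean_forecast):
    """MAPE of one-step-ahead forecasts over the last n_test points,
    each forecast using only the k observations that precede it."""
    start = len(series) - n_test
    if k < 1 or start < k:
        raise ValueError("not enough history for this window size")
    forecasts = [forecaster(series[t - k:t]) for t in range(start, len(series))]
    return mape(series[start:], forecasts)


def best_window(series, candidate_sizes, n_test, forecaster=window_mean_forecast):
    """Return the window size with the minimum MAPE, plus all scores."""
    scores = {k: rolling_mape(series, k, n_test, forecaster) for k in candidate_sizes}
    return min(scores, key=scores.get), scores


# Hypothetical active period percentages of one machine over eight runs.
history = [60.0, 80.0, 60.0, 80.0, 60.0, 80.0, 60.0, 80.0]
k_best, scores = best_window(history, candidate_sizes=[1, 2], n_test=4)
print(k_best, scores)
```

With a window of size 1 the stand-in forecaster reduces to the naïve method, so the same loop also yields the naïve benchmark.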

Step 2.2: Forecasting of the active periods
Once the sliding window size is determined, the same window size is used to forecast the future active periods of the machines and the size of the sliding window need not be determined every time a new forecast is made. The output from the ARIMA model is the forecasted value of the active period for t + 1 period and the associated standard error of the forecast.

Step 3: Bottlenecks prediction from forecasted active periods of the machines
The forecasted active durations of the machines in the production line, together with the standard errors associated with the forecasts, are compared against each other to identify the group of bottlenecks. Let K be the index of the machine with the highest active period percentage, i.e. $K = \arg\max_m A_{m,t+1}$, $m \in \{1, \ldots, M\}$. To test the statistical significance of the differences between the predicted active period of each machine and that of the machine with the highest active period, two-tailed t-tests as shown in Eq. (7) are performed for all machines $m \in \{1, \ldots, M\} \setminus \{K\}$.
where $A_K$ is the predicted active period percentage of machine K and $A_m$ is the predicted active period percentage of machine m. The estimated standard errors for machines K and m are represented by $SE_K$ and $SE_m$, respectively. The difference in the predicted active period percentages of the two machines is statistically significant if $t_{stat} > 1.96$ or $t_{stat} < -1.96$ at the 95% confidence level (Knezevic, 2008). Otherwise, at the 95% confidence level the mean difference is not statistically significant and hence the machines K and m will both be judged as bottlenecks in the system. The rest of the machines are classified as non-bottleneck machines.
M. Subramaniyan et al. Computers & Industrial Engineering 125 (2018) 533-544
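Step 3 can be sketched as follows. The sketch assumes that Eq. (7) takes the usual form for comparing two estimates with known standard errors, $t_{stat} = (A_K - A_m) / \sqrt{SE_K^2 + SE_m^2}$; this form, the function name and the sample forecasts are assumptions of the example.

```python
import math


def predict_bottlenecks(forecasts, critical_value=1.96):
    """forecasts: {machine: (predicted_active_pct, standard_error)}.

    Returns the machine with the highest forecast together with every machine
    whose forecast is not significantly different from it.
    """
    top = max(forecasts, key=lambda m: forecasts[m][0])
    a_top, se_top = forecasts[top]
    group = {top}
    for machine, (a_m, se_m) in forecasts.items():
        if machine == top:
            continue
        pooled_se = math.hypot(se_top, se_m)  # sqrt(SE_K^2 + SE_m^2)
        if pooled_se == 0.0:
            # No uncertainty at all: only an exact tie counts as a bottleneck.
            if a_m == a_top:
                group.add(machine)
            continue
        t_stat = (a_top - a_m) / pooled_se
        if abs(t_stat) <= critical_value:  # difference not significant
            group.add(machine)
    return group


# Hypothetical forecasts (active period %, standard error) for three machines.
forecasts = {"M1": (92.0, 2.0), "M2": (90.0, 2.5), "M3": (70.0, 3.0)}
bottlenecks = predict_bottlenecks(forecasts)
print(sorted(bottlenecks))
```

Returning a group instead of a single machine reflects the fact that forecasts within each other's uncertainty cannot be ranked reliably.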

Proposed measures for algorithm performance evaluation
The bottlenecks prediction algorithm can be treated as a binary classification algorithm that classifies each machine into bottleneck and non-bottleneck. Therefore, to evaluate the algorithm performance, the metrics derived from confusion matrix and IoU as explained in Sections 2.4.2 and 2.4.3 respectively can be used. The adaptation of the metrics to the bottleneck prediction problem in manufacturing is explained in this section.

Metrics from confusion matrix
The confusion matrix for the bottleneck prediction algorithm can be constructed as shown in Table 5.
The matrix leads to the calculation of the metrics: accuracy, precision and recall as explained in Section 2.4.2.
• Accuracy is the proportion of machines that the proposed algorithm classifies correctly, whether as bottlenecks or as non-bottlenecks.
• Precision is the proportion of the machines predicted as bottlenecks that were actual bottlenecks.
• Recall is the proportion of the actual bottleneck machines that the algorithm also predicted as bottlenecks.
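The three metrics can be illustrated with a short sketch for a single production run. It uses the standard confusion-matrix definitions and the 190th production run reported in the test case (predicted group M1, M6 and M7; actual bottleneck M6; 13 machines). The function name is hypothetical, and returning 0.0 when a denominator is zero is an assumption of this sketch.

```python
def confusion_metrics(machines, predicted, actual):
    """Accuracy, precision and recall for one production run."""
    machines, predicted, actual = set(machines), set(predicted), set(actual)
    tp = len(predicted & actual)             # predicted and actual bottleneck
    fp = len(predicted - actual)             # predicted, but not a bottleneck
    fn = len(actual - predicted)             # actual bottleneck that was missed
    tn = len(machines - predicted - actual)  # correctly left out
    return {
        "accuracy": (tp + tn) / len(machines),
        # Assumption: a metric with a zero denominator is reported as 0.0.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


machines = [f"M{i}" for i in range(1, 14)]  # M1 ... M13
run_190 = confusion_metrics(machines, {"M1", "M6", "M7"}, {"M6"})
```

For this run there is one true positive, two false positives, no false negative and ten true negatives, giving an accuracy of 11/13, a precision of 1/3 and a recall of 1.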

Intersection over union method (IoU Method)
Let the predicted group of bottlenecks in the production system be represented as set P and the actual bottlenecks as set A, such that P ⊂ {M1, M2, M3…} and A ⊂ {M1, M2, M3…}, where M1, M2, M3… ∈ M are the machines in the production line. Let set Z represent the intersection of set P and set A as shown in Eq. (8); it contains the bottleneck machines that were predicted and were also actual bottlenecks. Let set X represent the union of set P and set A as shown in Eq. (9); it contains all distinct bottleneck machines in the predicted bottleneck set and the actual bottleneck set.
IoU measures what proportion of the distinct machines in the predicted and actual bottleneck sets are correctly predicted bottlenecks, i.e. the size of Z relative to the size of X. The IoU for a production run can be calculated as shown in Eq. (10).
This metric penalizes the algorithm for predicting wrong bottlenecks in the production system. The higher the IoU metric, the better the prediction algorithm performance. This metric can also be used for comparing different prediction algorithms, including the naïve algorithm, and will be helpful for making decisions (e.g. trade-offs) when selecting a suitable algorithm for implementation. Even though IoU and the accuracy metric from the confusion matrix look similar, they differ in that IoU is not influenced by the true negatives.
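A minimal sketch of the IoU calculation for one production run is given below, using the sets P, A, Z and X as defined above and the 190th and 191st production runs reported in the test case. The function name is hypothetical, and IoU is expressed here as a fraction between 0 and 1.

```python
def iou(predicted, actual):
    """Intersection over union of predicted (P) and actual (A) bottleneck sets."""
    p, a = set(predicted), set(actual)
    z = p & a  # intersection: correctly predicted bottlenecks
    x = p | a  # union: all distinct predicted or actual bottlenecks
    if not x:
        raise ValueError("at least one predicted or actual bottleneck is required")
    return len(z) / len(x)


iou_190 = iou({"M1", "M6", "M7"}, {"M6"})  # predicted group vs actual bottleneck
iou_191 = iou({"M7", "M9"}, {"M7"})
```

For the 191st run the IoU is 1/2, whereas the accuracy over 13 machines would be 12/13, because the eleven true negatives raise the accuracy but do not enter the IoU.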

Comparison of the performance metrics of the proposed algorithm and naïve algorithm
If the metrics accuracy, precision, recall and IoU are calculated for N production runs, the mean and standard error of each metric for the proposed algorithm can be obtained as shown in Eqs. (11) and (12).
The above metrics can be compared with those of the naïve algorithm to assess the performance of the proposed algorithm. The two-tailed t-test proposed by Knezevic (2008) can be used to test the statistical difference between the mean values of the proposed algorithm and those of the naïve algorithm, as shown in Eq. (13).
where Y_P and SE_P are the mean value and the standard error of the metric for the proposed algorithm, and Y_B and SE_B are the mean value and the standard error of the metric for the naïve algorithm. If t_stat > 1.96 or t_stat < −1.96, then at the 95% confidence level there is a statistically significant difference between the performance of the proposed algorithm and that of the naïve algorithm.
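The comparison can be sketched as follows. The sketch assumes that Eq. (11) is the arithmetic mean over the N runs, that Eq. (12) is the sample standard deviation divided by the square root of N, and that Eq. (13) divides the difference of the two means by the square root of the sum of the squared standard errors. The per-run values are hypothetical.

```python
import math
import statistics


def mean_and_se(values):
    """Mean and standard error of a metric over N production runs."""
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # sample standard deviation
    return mean, se


def t_statistic(mean_p, se_p, mean_b, se_b):
    """Assumed test statistic for proposed (P) versus naive (B) algorithm."""
    return (mean_p - mean_b) / math.sqrt(se_p ** 2 + se_b ** 2)


# Hypothetical per-run recall values for four production runs
proposed_recall = [1.0, 1.0, 0.0, 1.0]
naive_recall = [0.0, 1.0, 0.0, 0.0]

mean_p, se_p = mean_and_se(proposed_recall)
mean_b, se_b = mean_and_se(naive_recall)
t_stat = t_statistic(mean_p, se_p, mean_b, se_b)
significant = abs(t_stat) > 1.96
```

With only four illustrative runs the difference is not significant (t_stat is about 1.41), which shows why the evaluation is carried out over many production runs.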

Industrial test case results
The industrial test case is described in Section 3.3. The MES data was collected from 13 machines over 315 production runs. It is assumed that no structural change happened in the production system during the 315 production runs, meaning that there was no change in the production flow.

Time series generation of the active period percentages of the machine
The ANDON light combinations shown in Table 3 are grouped into three states (Producing, Blocked/Starved and Down) based on detailed discussions with the domain experts of the production line, including the production and maintenance engineers. The Producing and Down states of the machine constitute the active period of the machine. Thereafter, the active period of each machine is calculated for each of the 315 production runs according to Eq. (5), explained in step 1 in Section 4, to form time series data. An example of the active period time series data of machine M1 is shown in Table 6.
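The state grouping can be illustrated with a short sketch. It assumes that the active period percentage of a machine for one production run is the time spent in the Producing and Down states divided by the total recorded time of the run; this is an assumption about the form of Eq. (5). The log format, the function name and the durations are hypothetical.

```python
ACTIVE_STATES = {"Producing", "Down"}  # Blocked/Starved counts as inactive


def active_period_percentage(state_log):
    """Active period percentage for one machine in one production run.

    state_log -- list of (state, duration) pairs, durations in any common unit
    """
    total = sum(duration for _, duration in state_log)
    if total <= 0:
        raise ValueError("the production run has no recorded duration")
    active = sum(duration for state, duration in state_log if state in ACTIVE_STATES)
    return 100.0 * active / total


# Hypothetical state log for one machine, durations in minutes
log = [("Producing", 300), ("Blocked/Starved", 60), ("Down", 40), ("Producing", 80)]
percentage = active_period_percentage(log)
```

Here 420 of the 480 recorded minutes are active, giving 87.5%. Repeating the calculation for every production run yields the time series used for forecasting.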
The data set is thereafter divided into training and test data sets. The training data set consists of the active periods of each machine for the first 185 production runs, and the test data set consists of the active periods for the remaining 130 production runs. A relatively large test data set is used so that the prediction performance can be evaluated over many production runs.

Forecasting of the active periods
The first step is to determine the sliding window size of the ARIMA model that can be used to forecast the future active periods of the machines, as explained in step 2.1 in Section 4. Thereafter, the future active periods of the machines are forecasted using the ARIMA model with that sliding window size, as explained in step 2.2 in Section 4.

Determination of the size of historical data
To estimate how much past data should be used to construct a forecasting algorithm that provides a better forecast, the one-step prediction sliding ARIMA explained in Section 4 was carried out with four different fixed window sizes. The first window consists of 50 data points (production runs 1 to 50), the second of 100 data points (production runs 1 to 100), the third of 150 data points (production runs 1 to 150) and the fourth of 185 data points (production runs 1 to 185). The MAPE values of the sliding-window ARIMA models for all machines, evaluated on the testing dataset so that the number of test data points is the same across window sizes, are presented in Table 7.
The most notable feature of the MAPE values presented in Table 7 is that they are nearly constant across the different window sizes. This indicates that there is strong randomness in the active period percentages of the machines, and that it is difficult to forecast them with very low error using only the past active period percentages drawn from MES data. From Table 7, the MAPE for one-step sliding ARIMA with a window size of 100 is the lowest among the different window sizes, except for machines M4, M5, and M11. For these machines, the MAPE for window size 50 is only slightly lower than for window size 100, by 0.15%, 0.14% and 0.02% respectively. Also, the average MAPE for the production line is lower for a window size of 100. A trade-off was made between the MAPE values of window sizes 50 and 100 for M4, M5, and M11 in order to have a homogeneous forecasting algorithm across the machines in the production system. Therefore, the sliding ARIMA algorithm with a window size of 100 is selected for all the machines in the production system.
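The window size comparison can be sketched as below. The sketch is one interpretation of the procedure: every point in the test range is forecast one step ahead from the observations immediately preceding it, so that all window sizes are scored on the same test points. To keep the example self-contained, a window mean is used as a placeholder where an ARIMA model would be fitted, and the naïve forecast is assumed to be the last observed value. All names and numbers are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def sliding_one_step(series, window, first_test_index, forecaster):
    """One-step-ahead forecasts for series[first_test_index:], each made from
    the `window` observations immediately before the forecasted point."""
    if window > first_test_index:
        raise ValueError("window must not exceed first_test_index")
    return [forecaster(series[t - window:t])
            for t in range(first_test_index, len(series))]


def naive_forecaster(history):
    # Assumption: the naive forecast repeats the last observed value.
    return history[-1]


def window_mean_forecaster(history):
    # Placeholder standing in for "fit ARIMA on history, forecast one step".
    return sum(history) / len(history)


# Hypothetical active period percentages of one machine
series = [80.0, 82.0, 78.0, 81.0, 79.0, 83.0, 80.0, 82.0]
first_test_index = 4  # the same test points are used for every window size
actual = series[first_test_index:]

mape_by_window = {
    w: mape(actual, sliding_one_step(series, w, first_test_index, window_mean_forecaster))
    for w in (2, 4)
}
naive_forecasts = sliding_one_step(series, 1, first_test_index, naive_forecaster)
naive_mape = mape(actual, naive_forecasts)
best_window = min(mape_by_window, key=mape_by_window.get)
```

In the test case the same comparison would be run per machine with window sizes 50, 100, 150 and 185, with a fitted ARIMA model in place of the placeholder forecaster.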

Forecasting of the active periods
The sliding ARIMA model with a window size of 100 is used to forecast the active periods of the machines in the production system. An example of the forecasted active periods of all 13 machines for the 190th and 191st production runs is summarised in Table 8. The MAPE values of the sliding ARIMA model with a window size of 100 over the test data set are compared with those of the naïve method in Fig. 7. From Fig. 7, the MAPE of the sliding ARIMA model with a window of 100 historical production runs is consistently lower than that of the naïve method for all machines in the production system, indicating that it is the better forecasting model.

Bottlenecks prediction by ARIMA algorithm
The machine with the highest forecasted active period percentage for the next production run is the bottleneck machine. T-tests at the 95% confidence level are run for the other machines to test whether their forecasted active period percentages differ significantly from that of the bottleneck machine, as explained in step 3 in Section 4. From this, a group of potential bottleneck machines for the next production run is estimated. The t-test results for the 190th and 191st production runs are shown in Table 8. For the 190th production run, machines M1, M6 and M7 form the predicted group of bottlenecks, and the actual bottleneck for that production run is M6. For the 191st run, M7 and M9 form the predicted group of bottlenecks, and the actual bottleneck for that run is M7.

Algorithm evaluation
The predicted bottlenecks are compared with the actual bottlenecks to calculate the accuracy, precision, recall and IoU metrics. When the actual bottleneck for a particular production run is identified based on the highest active period of the machines, the result is one machine and not a group of bottleneck machines. This is because the standard error cannot be calculated when computing the actual active period percentage of the machines for a single production run. In other words, the sample size for calculating the actual bottleneck for a production run is one. However, the fixed window size sliding ARIMA computes the variance associated with the forecast based on the historical data, and hence a group of bottlenecks can be predicted as shown in Section 4. Therefore, the algorithm is evaluated in two different ways, as explained below, and the metrics accuracy, precision, and recall are calculated for each.
Method 1: Evaluating whether the actual bottleneck machine is within the group of predicted bottleneck machines.
Method 2: Evaluating the actual bottleneck machine against the machine with the highest predicted active period percentage (without identifying the group of bottlenecks).
T-tests at the 95% confidence level were carried out to test the statistical significance of the improvement achieved in each of the metrics, as explained in Section 5.3. The results are summarised in Table 9.
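The difference between the two evaluation methods can be sketched for a single production run. The predicted group and the actual bottleneck are those reported for the 190th run; the machine with the highest forecast is not stated in the text, so the value of `top_machine` below is a hypothetical choice made only for illustration, as are the function and variable names.

```python
def run_metrics(machines, predicted, actual):
    """Accuracy, precision, recall and IoU for one production run."""
    machines, predicted, actual = set(machines), set(predicted), set(actual)
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = len(machines - predicted - actual)
    return {
        "accuracy": (tp + tn) / len(machines),
        # Assumption: a metric with a zero denominator is reported as 0.0.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "iou": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
    }


machines = [f"M{i}" for i in range(1, 14)]
predicted_group = {"M1", "M6", "M7"}  # 190th run, from the test case
actual_bottleneck = "M6"              # 190th run, from the test case
top_machine = "M7"                    # hypothetical: not stated in the text

method_1 = run_metrics(machines, predicted_group, {actual_bottleneck})
method_2 = run_metrics(machines, {top_machine}, {actual_bottleneck})
```

Under this hypothetical choice both methods reach the same accuracy (11/13), but method 1 has a recall of 1 while method 2 has a recall of 0, because in method 2 a single miss leaves no predicted machine to match the actual bottleneck.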

Domain experts review of the results
From Table 9, based on the t-values for method 1, the proposed algorithm has significantly higher mean accuracy, precision, recall, and IoU than the naïve method. However, in method 2, the algorithm does not yield a statistically significant improvement in recall and IoU compared to the naïve method. Method 1 of predicting the group of bottlenecks has several advantages for operational reasons when compared to method 2. Among the metrics in Table 9, the domain experts, including the production and maintenance engineers of the test company, were interested in the recall metric, as it measures the proportion of the actual bottlenecks that were also predicted as bottlenecks by the proposed algorithm. The proposed algorithm has a statistically significantly higher recall than the naïve method, as seen in Table 9: a recall of 62.53% against 24.69% for the naïve method, i.e. an improvement of 37.84 percentage points. The accuracy of the proposed algorithm is also statistically significantly higher than that of the naïve method, by 9.27% for method 1. Even though the performance metrics of the proposed algorithm in detecting the group of bottlenecks (as shown in method 1 of Table 9) are not 100%, the algorithm has surpassed the naïve method of bottleneck detection for this production system. This indicates that the algorithm provides additional value in predicting the bottlenecks of the future production run.

Discussion
The aim of this paper is to propose a data-driven algorithm to predict throughput bottlenecks of the future production run. The proposed algorithm was tested on a real-world production line. A wide range of metrics was used to evaluate the algorithm performance by comparing it with the naïve method. The proposed algorithm when tested over the real-world production line predicts a group of bottlenecks with a recall metric of 62.53% compared to the naïve method of 24.69%.

Contributions to the interdisciplinary research field of production and data sciences
The research work presented here on developing a data-driven algorithm for bottleneck prediction makes significant contributions to the interdisciplinary field of production and data sciences. Compared to the existing methods in the literature shown in Table 1, in which the predictive algorithms are developed and tested in a simulation environment, this paper uses real-world MES data of the machines to develop and test the algorithm. The simulation environment is a highly controlled environment where the variables that affect the throughput are pre-defined as inputs to a simulation model, and therefore it is possible to achieve higher accuracy in predicting the bottlenecks, as shown by Li et al. (2011) and Cao et al. (2012). When predicting the bottlenecks in a real-world scenario as presented in Section 6, it was not possible to achieve an accuracy level higher than that achieved in simulation. This suggests a note of caution: total dependence on simulation results is risky unless those algorithms are also tested on real-world production data sets. Also, using real-world production data sets to develop and test the bottleneck prediction algorithm will enhance the credibility of the results among practitioners and manufacturing companies (Amar & Gupta, 1986; Jin et al., 2016; Liao et al., 2017). Moreover, the existing methods in the literature were limited in how they evaluated the performance of the bottleneck prediction algorithm, whereas the proposed algorithm is evaluated on a wide range of metrics including accuracy, precision, recall, and IoU. Thus, engineers in manufacturing companies can easily understand the algorithm performance based on these metrics. Moreover, this enables the optimisation of the algorithm based on the selected metrics that are more important to the type of decision support that the engineers need about the production line.
This facilitates the integration of the data sciences domain with production for making informed decisions. The overall evaluation framework can also be used as a benchmarking tool by researchers to assess the performance of different algorithms and to select the more appropriate metric.
Production improvements to the machines or other production resources are continuous in the production system (Roser & Nakano, 2015). Due to these changes, the data from very old production runs may no longer be representative of the current conditions and dynamics of the production system, and using that data increases the chance that the predictive algorithm covers the dynamics of more production runs than it can accurately represent. Moreover, as presented in the real-world industrial test study, the MES stores the data for only the past 315 production runs. In such cases, determining the size of the historical data to be used in forecasting the active period is essential. In the test study presented, the MAPE for sliding windows of different sizes was estimated using the ARIMA model, and it was found that the data from the past 100 production runs is slightly better at predicting the performance of the machines for the next production run than the other window sizes for the time unit of interest. Compared with the current prediction algorithms in the literature, this is the first approach to determine how much of the past active duration data is a good representative for predicting the future active durations of the machines.

Potential impact on management decisions of manufacturing companies
The proposed algorithm can have a significant impact on production and maintenance management decisions. The algorithmic predictions of the potential group of bottlenecks will help engineers to understand where the production bottlenecks will be in the production line for the next production run, so that they can frame effective strategies to mitigate them. The algorithm not only gives indications of the probable future bottlenecks but also gives valuable information on the possible non-bottlenecks of the production system. The alerts given by the algorithm on bottlenecks and non-bottlenecks can be evaluated by the production and maintenance domain experts, who have years of experience with their production system, to decide whether they are logical and actionable for framing strategies to mitigate the bottlenecks. The aim of the predictive algorithm is to complement their efforts to manage the true bottlenecks more effectively. The combined insights can then be used in the production planning and management meetings before the production run (e.g. morning meetings before the start of production) to effectively plan proactive actions. For example, strategies could be made for reducing speed in the non-bottlenecks, which can increase tool life etc. (Roser & Nakano, 2015), and for prioritizing the bottleneck machines for reactive maintenance. In this way, the engineers' knowledge of production systems can be combined with data sciences to get more transparency on the expected dynamics of the machines for the future production run and to plan strategies. Hence, this type of system-level decision support tool helps the engineers to focus more on the bottleneck machines in the production system, by which the throughput and the productivity can be increased (Jin et al., 2016; Bokrantz et al., 2017).

Methodological discussions
The active period method used in this study detects the bottlenecks using only the states of the machine. Quality parameters, which account for scrapped products or rework, are not taken into account when detecting the bottlenecks. Hence, the active period method determines the bottleneck from a utilisation perspective. This perspective is still important for improving the production flow and maintenance planning on bottleneck machines (Wedel et al., 2016).
The intention of this research work is to provide an algorithm for bottleneck prediction in a production system using real-world MES type of data and to provide a generic framework to evaluate the algorithm based on different metrics. The strength of the proposed algorithm has been demonstrated using a real-world test study. In that test study, the MES data set is divided into training and testing data, and the algorithm is tested on the testing data set. This type of approach prevents data bias (Keogh & Kasetty, 2002). Though the proposed algorithm has not been tested on the MES data sets of multiple production systems, the authors are convinced that the algorithm works in any environment where the MES records the different active states of the machines and their timestamps during the production runs and across different production runs. These are the only inputs to the proposed algorithm. The same algorithm can also be used for modeling the active periods of transportation resources if the information on the states and the corresponding timestamps is available in the MES. For example, when a transportation resource moves to a pickup location or a drop-off location, the state of this resource is comparable to the producing state of a machine and hence can be classified as an active state. Similarly, when the resource is waiting for parts to be picked up, its state can be classified as inactive (Roser et al., 2001). When this is analyzed in combination with the active periods of the machines, it can be investigated whether the transportation resources act as bottlenecks or not. Also, though the proposed algorithm is demonstrated for a time resolution of a complete production run, the same algorithm can be used for different time resolutions by aggregating the active period data to the required resolution.

Future work
The proposed bottleneck detection algorithm was developed, tested and validated with the historical data of a real-world production system, and its performance was evaluated. However, the algorithm also needs to be evaluated in a real-time scenario, and the potential effects need to be studied further. This will allow the institutionalization of predictive analytics by manufacturing companies to enable more effective data-driven decisions at the shop floor level.
Also, as this paper is focused on predicting the bottlenecks in the production system using MES data, the future steps will be to get diagnostic insights on the predicted bottleneck to understand the reasons for the machine disturbances. Other time series algorithms, such as neural networks, can be tried in order to compare their prediction accuracies with that of the ARIMA-based algorithm for bottleneck detection. Moreover, variety-based analytics, including the quality parameters of the machine etc., can be combined with the active period information to predict the bottlenecks, so that improving the throughput of the bottlenecks also ensures high quality. The results from the prediction algorithm can be augmented with expert opinions and embedded into the system to improve prediction accuracy.

Conclusion
In this paper, a data-driven algorithm is proposed to predict the throughput bottlenecks in a production system based on the active periods of the machines. The inputs to the algorithm are the states of the machines and the corresponding timestamps of those states across different production runs. The algorithm is tested on MES data sets of a real-world production system and its performance is evaluated over a wide range of metrics, thus enabling manufacturing companies to trust the algorithm. In the test study, the proposed algorithm outperformed the naïve method on recall, the metric of interest for the engineers, with an improvement of 37.84 percentage points. In contrast to the current techniques in the literature, an attempt is made to determine how much of the historical data should be used to predict the bottlenecks. The approach presented was developed with active involvement from domain experts in the machine learning and production fields. The lessons learned are to formulate the problem appropriately by considering the nature of the data and the real-world constraints, and to incorporate appropriate metrics for evaluation. In addition, this paper provides a generic framework for the evaluation of different algorithms. This framework can be used as a benchmarking tool by other researchers and engineers to compare and evaluate the performance of different algorithms. From the indications on predicted bottlenecks combined with the knowledge of the production and maintenance engineers, fact-based decisions can be made to mitigate the bottlenecks and thus increase the throughput and productivity of the system. This research work contributes to the cross-disciplinary field of production and data sciences.