A data-driven methodology for separately quantifying the effects of tool wear of upper and lower tool on the quality of cut surfaces in shear cutting processes

Shear cutting processes are characterised by physical parameters that change in time and space as a result of process effects. Some of these parameters can be measured and recorded by sensors and assigned to causal process effects through detailed evaluation based on analytical models. If the analytical model can classify the causal process effects in detail, the quality of the processed sheet metal component can be monitored on the basis of knowledge about tool wear influences, so that consistent part quality can be ensured by a preventive tool change. The quality of the cut surface of a shear cut component is influenced by the wear of both the upper and the lower tool, and both have a significant influence on the different dimensional and surface characteristics of the cut surface. Analytical models for such an analysis are often not available or, where available, deliver only approximate results that need improvement for the respective process effects. Previous procedures for this prediction purpose are frequently based on large and complex neural networks that require and process a huge amount of data. In this regard, a novel evaluation method has shown great potential for the data-driven quantification of tool wear effects and appears capable of far outperforming previous process evaluation methods under different process conditions. In previous studies, upper and lower tool wear were not considered separately, and a detailed inspection of the finished cut surface could therefore not be carried out. This paper investigates how a data-driven approach can be used to evaluate specific wear effects of upper and lower tools in a differentiated manner, and in what degree of detail the main quality characteristics of the cut surface in a shearing process can thus be assessed.


Introduction
Due to the expected increase in complexity in production and the customer requirement to deliver specific products in ever shorter production times, operators must increasingly be supported by adaptive assistance systems in the execution of their work. This calls for a continuous increase in the degree of automation of the corresponding manufacturing processes. One of the leading tasks in production technology is to continuously develop digital networking and make it even more economical, precise and future-proof. By collecting and analysing process-related data directly in the manufacturing process, data-driven monitoring systems can provide important information about the current quality of the manufactured parts at any time, and this information can even be used to implement adaptive control strategies. If the analytical model is able to classify the causal process effects in detail, the quality of the processed sheet metal component can be monitored on the basis of knowledge about tool wear influences, so that consistent part quality can be ensured by a preventive tool change. Through the targeted use of assistance systems, both the flexibility of a system and its efficiency can be significantly increased, resulting in a decisive competitive advantage.
IOP Publishing, doi: 10.1088/1757-899X/1238/1/012065
In the case of shear cutting processes, the product quality in the manufacturing process is particularly influenced by deviations in the cutting geometry of the tools caused by built-up edges, rounding and edge breakouts [1]. The influences of these cutting geometry deviations, which negatively affect the cut surfaces, can be detected with suitable measurement technology as temporal and spatial changes in the characterising physical process variables. The TRUMPF TruPunch 5000 machine tool used here sets new standards in terms of productivity and process reliability. Up to 1,600 cutting operations per minute are realised in the machine during machining, which poses special challenges for the evaluation speed.
In this context, time series, which are composed of consecutive measurements over time, are gaining popularity and are increasingly becoming the focus of data analysis. A time series is thus the sequence of values of one or more variables in temporal order. Depending on the number of variables, time series can be divided into univariate and multivariate time series. The added value of the acquired process signals depends on the signal quality, the measurement concept and the localization [2]. Previous research has shown that the sensor signals for the process force and the stroke movement show strong deviations in their time series even at constant process conditions [3]. The main reasons are time delays due to the drive concept, influences of the machine dynamics and cycle times in the machine control. As cumulative disturbances, these lead to latencies that cause deviations of more than 30 per cent in the time series data of the sensor signals under constant conditions. The bandwidth of these disturbances prevents the use, for classification, of models that are computationally inexpensive and can be trained quickly or do not have to be pre-computed at all, such as the k-nearest neighbours (KNN) algorithm. Previous methods that did allow a classification for this prediction purpose are based on large and complex neural networks that require and process a huge amount of data. Current approaches therefore require enormous computing capacity and long computing times for model training in return for only slightly better performance [4]. Furthermore, they make an embedded implementation in the control of the machine tool impossible, and because of the extensive model complexity it is nearly impossible to explain or reconstruct misjudgements of the model. To solve these problems, a new method based on automatic feature extraction has recently been proposed to quantify the effects of tool wear [5].
By linking the recorded process signals, the inherent information content about the tool wear condition can be recovered, and signal interference is significantly reduced. The linking approach is a two-stage procedure that considers the physical origin of the individually recorded process signals in the multivariate time series data and combines them into a new physical quantity through smart pairing. The procedure not only greatly reduces the amount of signal data required, but also far surpasses previous work in classification performance. However, it is still an open question how suitable the proposed method is for separately assessing the wear effects of the upper and lower tool from the process signals. Die wear was not considered in the earlier series of tests [5]; each class of cutting edge preparation on the punch was used in combination with an unused die. A differentiated evaluation is nevertheless necessary if the resulting cut surface quality is to be assessed on the basis of the tool wear effects, because the quality of the cut surface of a shear cut component is influenced by the wear of both the upper and the lower tool, and both have a significant influence on the different dimensional and surface characteristics of the cut surface. This paper investigates how a data-driven approach can be used to evaluate specific wear effects of upper and lower tools in a differentiated manner, and in what degree of detail the main quality characteristics of the cut surface in a shearing process can thus be assessed. Furthermore, a metamodel is presented from which quality maps can be derived that describe and evaluate the relationship between tool wear and cut surface quality. In combination with the classification approach presented, these maps can be used as a link for a direct quality assessment based on process signals.

Overview of time series classification
Comprehensive data collection and analysis are no longer novel trends and can be considered standard practice in many fields. A large part of this practice deals with time series data from sensor measurements or other types of monitoring, which are generally the most significant data sources for automation and the vision of "Industry 4.0". However, owing to the nature of time series data (high dimensionality and large data size, especially for multivariate time series), their analysis often remains a difficult problem. Representation learning and classification of time series continue to receive attention, and the approaches can be divided into three categories according to the classification scheme: model based, distance based, and feature based [6].

Distance based classification
The first category is based on distance. After a distance function has been defined, the similarity between two time series is determined. The key point of these approaches is therefore how to define the distance function. The most significant representatives of this category are the KNN and the support vector machine (SVM) classifiers, which have already been used successfully in a wide variety of applications [7] [8] [9] [10]. Both can be applied directly to the raw data for classification. The basic idea of the SVM is to map a sequence into a feature space and find the hyperplane with the largest margin that separates two classes. The kernel function specifies the higher-dimensional feature space. Depending on the selected kernel function, the SVM can serve two purposes: feature extraction and measuring a distance [6]. The KNN classifier is a lazy learning method and does not pre-compute a classification model. For the KNN classifier, a variety of ways to define the distance function are already known [11]. However, this method has the serious disadvantage that the two series must be of equal length, and it is sensitive to distortions in the time dimension [6]. To deal with this disadvantage, an alternative distance function has been proposed, the dynamic time warping (DTW) distance [12]. For this purpose, the sequences are non-linearly "warped" in the temporal dimension to determine a measure of their similarity that is independent of certain non-linear deviations in the temporal dimension. An overview of the different classifiers for distance-based time series classification, together with a discussion of the strengths and weaknesses of each method, can be found in [13]. Overall, it has been shown that the KNN classifier with the DTW distance generally achieves higher accuracy than a variety of other classifiers, such as neural networks, SVM and hidden Markov models (HMM) [7].
These approaches remain in common use today, but they tend to fail on long or noisy time series.
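The combination of a KNN classifier with the DTW distance described above can be illustrated with a minimal sketch. It is not taken from the cited works: the function names `dtw_distance` and `knn_dtw_predict` are hypothetical, the local cost is assumed to be the absolute difference, and no warping window or other speed-up is applied.

```python
import numpy as np


def dtw_distance(a, b):
    """Plain dynamic time warping distance between two 1-D series.

    The series may have different lengths. Runtime is O(len(a) * len(b)).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i, j] = local + min(
                cost[i - 1, j],      # a advances, b repeats a sample
                cost[i, j - 1],      # b advances, a repeats a sample
                cost[i - 1, j - 1],  # both advance
            )
    return cost[n, m]


def knn_dtw_predict(train_series, train_labels, query, k=1):
    """Lazy KNN: no model is pre-computed, all work happens at query time."""
    dists = np.array([dtw_distance(s, query) for s in train_series])
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_labels)[nearest], return_counts=True)
    return labels[np.argmax(counts)]  # majority vote among the k neighbours
```

Because every query is compared with every stored series, the cost grows with both the series length and the size of the training set, which is one reason such approaches struggle with long time series.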

Model based classification
The second category of time series classification approaches is based on models. The underlying assumption is that the time series in a class are generated by a common model, whose parameters are fitted to the training examples of that class. As a result, a different model is obtained for each class. New time series can then be compared with the models to determine to which class they belong. The best-known representatives in this category are autoregressive models [14] [15] [16], Markov models and hidden Markov models [17] [18].

Feature based classification
The selection of temporal features is the key aspect of feature-based classification of time series data, and it is a difficult task because numerical data do not allow the features to be enumerated. These methods are mainly divided into two categories: handcrafted feature methods and learned feature methods. An overview of feature-based classification algorithms for multivariate time series is presented in [19], where the principles and procedures of these methods are shown and the advantages and disadvantages of each method are discussed. In the category of learned features, neural networks, especially those based on deep learning, have gained popularity in recent years [20] [21] [22]. One of the deep learning architectures successfully used in computer vision is the convolutional neural network (CNN). Beyond their original domain of computer vision, CNNs are increasingly used for the classification of time series data with good performance [23]. Inspired by the CNN framework for image recognition, these approaches are based on the transformation of a multivariate time series into a vector by singular value decomposition and other transformations [24]. CNNs can discover and extract the appropriate internal structure to automatically detect and generate deep features of the inputs through convolution and pooling operations. In contrast, handcrafted methods require experience and domain knowledge and are a complex and challenging task, which fuels the popularity of learned feature methods.

Machine design and tooling concept
In the investigations in this paper, a hydraulically driven punching machine with a flexible tool system was used. The tool concepts on the upper (a) and lower (b) side are shown in figure 1. The force is introduced into the punching process via a hydraulic unit. For the vertical stroke movement, pressure is applied to the two differently sized surfaces on the upper and lower side of the ram in the punching head of the machine. Depending on the pressure ratio applied, the ram moves up or down.

Applied Material
The test material investigated in this study was the deep-drawing steel DC01 (material number 1.0330) with nominal sheet thicknesses of 1.0 mm and 3 mm and a sheet format of 1000 mm × 1000 mm. To characterise the mechanical properties of the material, uniaxial tensile tests were carried out at various orientations to the rolling direction. The resulting mechanical properties are shown in table 1. The lowest value of ultimate tensile strength occurred against the rolling direction. Diagonally to the rolling direction, the value was significantly higher.

Measurement methods and setup of the experiments
The data sets used in the experiments were a selection of three sensor signals, each captured at a fixed sampling rate between 2 and 20 kHz in the manufacturing process. The signals were collected by a measurement amplifier and recorded time-synchronously. The sensor signals used comprise the process force, the movement of the tool and the elastic deflection of the machine frame. The tool movement, referred to as the stroke movement of the punch, was measured using a linear measuring system on the ram rod and was sampled at 2 kHz. The cutting force F_Cut required for the cutting process was measured via three rotationally symmetrically arranged load cells (KISTLER type 9021A), which were integrated in the die between the die holder and the cutting unit. The construction of the segmented die for force measurement is shown in figure 1 b). The piezoceramic sensors were installed with a preload force of 20 percent of the measuring range and were sampled at 20 kHz. To measure the elastic components of the machine and the tools, strain gauges were applied to the C-frame of the machine. The strain gauge sensors were calibrated in advance so that the measured strain in the C-frame reflects the resulting change in distance between punch and die, which occurs under the applied process force because of the compliance of the machine structure and the tool components. For the entire investigation, round punches with a diameter of 8 mm were used, and the cutting clearance to the die was always 10 percent of the sheet thickness. For the wear classification, the individual classes of the upper and lower tools are each considered with an unprepared counterpart, and in addition combined stages of the respective classes of the two tools are considered. This results in a total of 13 different combinations that are examined.

Data pre-processing
First, a pre-processing step is required in which the raw data are cropped to obtain a large number of equal-length samples for each class of tool wear from the recorded data set. The time interval to be cropped from the multivariate time series data is triggered by the initial stroke movement of the tool: as soon as a predefined limit value is reached, all process signals within the common time stamps are cropped out. The end of the interval is defined by a fixed period, and the start value of every cropped interval is reset to zero. This converts a long time series of sequential cutting processes into many individual data sets, each of which contains exactly one cutting process that always starts at the same time. The linking approach proposed in [5] is used for the investigation. It starts from the generally accepted assumption that the mechanical work required for identical processes must also be identical. For mechanical manufacturing processes, the literature recommends representing the energy transferred to or from the tool by the application of force along the stroke movement for this purpose [25]. To actually implement this proposal, all tool and machine components in the force flow that are elastically deformed along the stroke movement must be considered. The linking approach was implemented as a two-stage process and is shown in figure 2.
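The cropping step described at the beginning of this section could be sketched as follows. This is only an illustration under stated assumptions: the function name `crop_strokes`, its parameters and the rising-threshold trigger are hypothetical, all signals are assumed to be available on one common time base with sampling rate `fs`, and the reset start value is interpreted as the time axis of each segment.

```python
import numpy as np


def crop_strokes(stroke, signals, threshold, fs, window_s):
    """Split one long recording into equal-length single-stroke segments.

    stroke    : stroke signal used as trigger
    signals   : dict name -> array, sampled on the same time base as stroke
    threshold : limit value of the stroke signal that starts a segment
    fs        : common sampling rate in Hz
    window_s  : fixed segment duration in seconds
    """
    stroke = np.asarray(stroke, dtype=float)
    n = int(round(window_s * fs))  # samples per segment, identical for all
    segments = []
    i = 1
    while i + n <= len(stroke):
        # Trigger on an upward crossing of the limit value.
        if stroke[i - 1] < threshold <= stroke[i]:
            seg = {"t": np.arange(n) / fs}  # time axis restarts at zero
            seg["stroke"] = stroke[i:i + n]
            for name, values in signals.items():
                seg[name] = np.asarray(values, dtype=float)[i:i + n]
            segments.append(seg)
            i += n  # continue searching after the cropped interval
        else:
            i += 1
    return segments
```

Counting samples instead of comparing floating-point time stamps keeps every segment exactly the same length, which later steps such as feature extraction rely on.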
In the first step, the elastic components of the machine and the tool are removed from the stroke movement measured on the ram rod. For this step, the time series considered must be available at the same sampling frequency, i.e. a measured value must be stored for each of the common time values in the multivariate time series. To meet this requirement and to account for non-linear effects in the signals, the point density of the series recorded at a sampling rate of 2 kHz is increased by a factor of ten using the scikit-learn class SplineTransformer. The result is the actual distance covered by the upper and lower tools in relation to the workpiece to be processed. In the second step, the signals are converted into mechanical work, i.e. the energy transferred to or from the tool via the application of force along the stroke movement.
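A compact sketch of the two stages may help to make the data flow concrete. It is not the authors' implementation: the upsampling uses linear interpolation with `np.interp` as a simple stand-in for the spline-based approach named above, the elastic deflection is assumed to be subtracted from the measured stroke, and the names `upsample` and `mechanical_work` are hypothetical.

```python
import numpy as np


def upsample(t_coarse, y_coarse, factor=10):
    """Increase the point density of a uniformly sampled series.

    Stand-in: linear interpolation instead of a spline fit.
    """
    t_coarse = np.asarray(t_coarse, dtype=float)
    dt = (t_coarse[1] - t_coarse[0]) / factor
    n_fine = (len(t_coarse) - 1) * factor + 1
    t_fine = t_coarse[0] + dt * np.arange(n_fine)
    return t_fine, np.interp(t_fine, t_coarse, y_coarse)


def mechanical_work(force, stroke, deflection):
    """Link force and stroke into cumulative mechanical work.

    Stage 1: remove the elastic deflection of machine and tool from the
             measured stroke (assumed here to be a plain subtraction).
    Stage 2: integrate force over the resulting tool travel (trapezoidal rule).
    With force in kN and travel in mm the result is in joules.
    """
    force = np.asarray(force, dtype=float)
    travel = np.asarray(stroke, dtype=float) - np.asarray(deflection, dtype=float)
    increments = 0.5 * (force[1:] + force[:-1]) * np.diff(travel)
    work = np.concatenate(([0.0], np.cumsum(increments)))
    return travel, work
```

In this sketch the 2 kHz stroke signal would first be passed through `upsample` so that it shares the 20 kHz time base of the force signal before both are handed to `mechanical_work`.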

Feature extraction
In the next step, the features are extracted from the cropped data sets. Good features for classification describe the process completely and emphasise, as far as possible, the changes in the signals that distinguish the classes to be recognised. In this context, the features S_Fstart, S_Fend, S_Fi50, S_Fd50, S_Fi80 and S_Fd80 were found to be excellent representatives, as the corresponding measured values show the smallest variance within a class. The start value S_Fstart and the end value S_Fend are determined at 10 percent of F_max on the increasing and decreasing flank, respectively. The extraction of the other features follows the same principle: the values on the increasing and decreasing flank are extracted at 50 and 80 percent of the maximum process force required for the respective cutting process (i = increase, d = decrease). These values have proven to be a good trade-off between keeping large differences as small as possible and still describing the entire process as far as possible. An additional significant feature is the work W_total that must be expended to separate the sheet. It is equal to the area under the signal between the values S_Fstart and S_Fend. A total of 7 features are thus extracted from the multivariate time series for each cutting process.
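The seven features could be extracted from a single cropped cutting process roughly as sketched below. The sketch assumes that each S value is the tool travel at which the force first reaches (increase) or last holds (decrease) the given fraction of F_max, and that no interpolation between samples is applied; the function name `extract_features` is hypothetical.

```python
import numpy as np


def extract_features(travel, force):
    """Return the 7 features of one cutting process as a dict."""
    travel = np.asarray(travel, dtype=float)
    force = np.asarray(force, dtype=float)
    peak = int(np.argmax(force))
    f_max = force[peak]

    def rising(level):
        # First sample before the peak at which the force reaches the level.
        return int(np.nonzero(force[:peak + 1] >= level * f_max)[0][0])

    def falling(level):
        # Last sample after the peak at which the force still holds the level.
        return peak + int(np.nonzero(force[peak:] >= level * f_max)[0][-1])

    i_start, i_end = rising(0.10), falling(0.10)
    seg_f = force[i_start:i_end + 1]
    seg_s = travel[i_start:i_end + 1]
    # Area under the force-travel curve between S_Fstart and S_Fend.
    w_total = float(np.sum(0.5 * (seg_f[1:] + seg_f[:-1]) * np.diff(seg_s)))

    return {
        "S_Fstart": travel[i_start],
        "S_Fi50": travel[rising(0.50)],
        "S_Fi80": travel[rising(0.80)],
        "S_Fd80": travel[falling(0.80)],
        "S_Fd50": travel[falling(0.50)],
        "S_Fend": travel[i_end],
        "W_total": w_total,
    }
```

Stacking the returned values of all cutting processes row by row gives the feature matrix with seven columns that is used for the classification.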

Classification of wear effects
After the features have been defined, they are normalised in a final step. For uniformly distributed data, min-max scaling is generally recommended. First, the entire data set is divided into training and test data at a ratio of 80 to 20, as the test data should only be scaled with the min-max values of the training data. The min-max scaler scales and translates each individual feature so that it lies in the range between zero and one on the training set. The estimator is therefore fitted to the training data and then applied to the test data, so the feature values of the test data can fall outside the interval [0, 1]. This procedure is preferred because it guarantees that the models are independent of the test data. To visualise the decision boundaries of the model, the 7 features are reduced to two dimensions by principal component analysis (PCA). The two remaining components together explain more than 99 percent of the variance. The first investigation was carried out for the DC01 material with a sheet thickness of s = 1 mm. To find the best performing model for the classification, a grid search was implemented in which the models were evaluated using 5-fold cross-validation. The hyperparameters for the grid search were weights ∈ {uniform, distance} and n_neighbors ∈ [1, 10]. For the dimension reduction, the division into training and test data, the normalisation and the classification, the standard methods from the scikit-learn library were used. For all hyperparameters of the applied classes that are not explicitly specified, the default values were adopted.
The best classification performance, an accuracy of 95.4 percent, is achieved by the model with n_neighbors = 3 and weights = "distance". It is shown in figure 3 with the decision boundaries and the data distribution (o = training, * = test). On the right side of figure 3, the heatmap for the respective tool wear classes has been added. It is easy to see that the misclassifications of the model occur more often between neighbouring die wear classes (Class1 to Class4). Because the chamfers on the cutting edges of the dies are ten times smaller than those on the punches, it is also much more challenging for the model to extract the information needed to distinguish the wear conditions from the process signals. The exceptions are the maximum punch-side wear, Class8, and the 4th combined wear class, Class11. By far the most misclassifications are recorded for these two classes, which suggests that very similar feature patterns are generated for the 4th class of combined tool wear (Class11) and the maximum wear on the punch (Class8). A similar result was obtained for the sheet thickness of s = 3 mm, where the accuracy of the best model was slightly higher at 96.2 percent. Overall, an accuracy of 95.4 to 96.2 percent for 13 wear classes shows the excellent capability of the presented procedure to evaluate specific wear effects of upper and lower tools in a differentiated manner.
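The classification procedure of this section can be reproduced in outline with standard scikit-learn components. The following is a sketch rather than the original code: the order of scaling before PCA, the stratified split, the random seed and the synthetic stand-in data (seven features generated from two hypothetical latent wear variables) are assumptions made only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def fit_wear_classifier(X, y, seed=0):
    """80/20 split, min-max scaling, PCA to 2-D and KNN tuned by grid search."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Inside the pipeline, scaler and PCA are fitted on training folds only.
    pipe = Pipeline([
        ("scale", MinMaxScaler()),
        ("pca", PCA(n_components=2)),
        ("knn", KNeighborsClassifier()),
    ])
    grid = {
        "knn__n_neighbors": list(range(1, 11)),
        "knn__weights": ["uniform", "distance"],
    }
    search = GridSearchCV(pipe, grid, cv=5)  # 5-fold cross-validation
    search.fit(X_train, y_train)
    return search, search.score(X_test, y_test)


# Synthetic stand-in for the 13 wear classes (hypothetical data, not measured).
rng = np.random.default_rng(0)
latent = np.array(
    [(0, 0)]
    + [(0, d) for d in range(1, 5)]   # die-side wear only
    + [(p, 0) for p in range(1, 5)]   # punch-side wear only
    + [(c, c) for c in range(1, 5)],  # combined wear
    dtype=float,
)
mixing = np.array([
    [1.0, 0.5, -0.3, 0.8, 0.2, -0.6, 0.9],
    [0.2, -0.7, 1.0, 0.4, -0.9, 0.5, 0.3],
])
n_per_class = 100
X = np.vstack([
    c @ mixing + 0.02 * rng.standard_normal((n_per_class, 7)) for c in latent
])
y = np.repeat(np.arange(len(latent)), n_per_class)

search, test_accuracy = fit_wear_classifier(X, y)
```

After fitting, `search.best_params_` holds the selected hyperparameters, and `search.best_estimator_.named_steps["pca"].explained_variance_ratio_` can be inspected to check how much variance the two components retain.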

Influence of tool wear effect on the characteristics of the cut surface
Of the one hundred cutting operations processed in a wear class, every tenth perforation was cut out of the processed sheet for sample preparation and grinding for the cut surface assessment. To avoid irregularities and distorted values of the cut surface characteristics due to deviations in the concentric alignment of the upper and lower tools, the samples taken from the perforations were characterised at two opposite points with regard to their topology, which gives 20 assessments for each class of tool wear. VDI 2906 [26] was used to determine the cut surface characteristics, and only those parameters were considered that are relevant for the vertical topology structure: rollover height, clean cut height, fracture height and burr height. Figure 4 summarises the results of the cut surface measurement with the characteristic micrographs (s = 1 mm) for the respective wear classes. For the punch-side wear, it can be stated unambiguously that it is mainly the rollover height that increases with increasing wear. In contrast, the resulting burr height remains almost identical across all wear levels. For the wear on the die side, an increase in the wear effects leads to a strong increase in the burr height. In contrast to the wear on the punch side, where the rollover increased strongly with rising wear effects, the rollover remains almost constant with increasing wear effects on the die. For the combined wear, it is first noticeable that the behaviour of the rollover is almost identical to that determined for the punch-side wear. The same holds for the burr height when the combined wear conditions are compared with the wear conditions on the die side. Overall, it can also be observed that the influences on the cut surface are more pronounced for thinner sheets than for thicker ones.

Metamodel and quality plot for the surface quality parameter burr height
The metamodel is intended to provide the still missing bridge between the separate quantification of the effects of upper and lower tool wear from the process signals and the correlated formation of the burr height on shear cut components. For the resulting model, all characteristic values of the burr heights determined in chapter 5.2 were entered in a diagram as a function of the two wear conditions on the punch and die. The x-axis shows the wear condition on the punch side, the y-axis the wear condition on the die side, and the z-axis the burr height. Subsequently, the minimum (green diamond), maximum (red diamond) and average values (blue diamond) of the measured burr heights were determined. To model the burr topology, mesh grids are first generated for all three burr levels, which are then approximated with quadratic functions and increased in their point density. The result is the minimum, maximum and average burr topology as a function of the wear effects on the upper and lower tools. To derive the quality maps for the burr height, the isohypses (contour lines of equal height) are drawn in 0.02 mm steps on the respective surfaces of the model. From the burr height quality maps, the expected burr heights can be read off quickly and easily for the different effects of upper and lower tool wear. Figure 5 shows the metamodel and the derived quality maps for a sheet thickness of 1 mm. In principle, this procedure can be applied to each of the characteristic quality parameters, so that the entire topology of the cut surface can be described and evaluated after the shearing process.
The metamodel and the three derived quality plots for the burr height assessment as a function of the wear effects on both tools for the sheet thickness of 3 mm are shown in figure 6.
Figure 6. Metamodel for the material DC01 (s = 3 mm) and the corresponding quality maps.
Compared to figure 5, the isohypses of all three quality maps in figure 6 rise almost in parallel with the wear effects on the die side. While the wear on the punch side still had a significant influence on the burr height at a sheet thickness of 1 mm, the resulting burr height is influenced only by the wear effects on the die side at a sheet thickness of 3 mm. In addition, the higher density of the burr level lines shows that the absolute burr height increases much more strongly with increasing wear effects on the die side. Relative to the sheet thickness, however, thinner sheets tend to produce higher burr levels.
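The construction of one burr surface of the metamodel and its contour levels can be sketched as follows. The sketch assumes a full quadratic polynomial in the two wear variables fitted by ordinary least squares; the paper only states that quadratic functions are used, so the exact form is an assumption, and all function names are hypothetical.

```python
import numpy as np


def design_matrix(x, y):
    """Terms of a full quadratic surface z = c0 + c1*x + c2*y + c3*x^2 + c4*x*y + c5*y^2."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])


def fit_quadratic_surface(punch_wear, die_wear, burr_height):
    """Least-squares fit of the burr height over punch-side and die-side wear."""
    z = np.asarray(burr_height, dtype=float).ravel()
    coeffs, *_ = np.linalg.lstsq(design_matrix(punch_wear, die_wear), z, rcond=None)
    return coeffs


def evaluate_surface(coeffs, punch_grid, die_grid):
    """Evaluate the fitted surface on a (denser) mesh grid."""
    values = design_matrix(punch_grid, die_grid) @ coeffs
    return values.reshape(np.shape(punch_grid))


def contour_levels(surface, step=0.02):
    """Height levels in fixed steps (mm) that lie inside the surface range."""
    first = np.ceil(surface.min() / step)
    last = np.floor(surface.max() / step)
    return step * np.arange(first, last + 1)
```

Fitting this once each to the minimum, maximum and average burr heights would give three surfaces. Passing each surface and its levels to a contour plotting routine then yields a quality map of the kind shown in figures 5 and 6.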

Conclusion
In the experimental investigations carried out as part of this work, a data-driven methodology for separately quantifying the effects of upper and lower tool wear was presented for the first time, together with its use for evaluating the cut surface quality during the machining process. First, it could be shown that with the linking approach proposed in [5] for pre-processing the process signals, not only the wear condition on the punch side of the tools can be uniquely identified, but the tool wear effects on the die side can also be clearly assigned. Moreover, it could be shown that not only one-sided wear effects can be clearly assigned, but combined wear effects on the upper and lower tool can also be identified with the applied method. Afterwards, the tool wear effects used in the experimental investigations were analysed regarding their impact on the characteristics of the shear cut surfaces. For the distinctive quality characteristics, a procedure for the development of metamodels was presented that can describe and evaluate the correlation between tool wear on punch and die, considered separately, and the cut surface quality. The procedure has been demonstrated by way of example on the most significant characteristic, the burr height. By extending the study to a further sheet thickness, the recommended method could be validated, and the effectiveness of the approach was demonstrated as part of a proof of concept. This offers a clear opportunity for the first data-driven quantification of the quality of shear cut components during the shear cutting process. It could also be established that the influence of the punch-side wear effects on the burr height decreases with increasing sheet thickness. So far, the proof of concept has only been applied to one material and one tool contour. It is therefore recommended to test the transferability of the proposed method to other materials and tool contours in further investigations.