Tool path planning of consecutive free-form sheet metal stamping with deep learning

Sheet metal forming technologies, such as stamping and deep drawing, have been widely used in the automotive, rail and aerospace industries for the manufacture of lightweight metal components. These processes require specially customised presses and dies, which are very costly, particularly for low-volume production of extra-large engineering panel components. In this paper, a novel recursive tool path prediction framework, built around a deep learning model, is developed and instantiated for the forming sequence planning of a consecutive rubber-tool forming process. The deep learning model recursively predicts the forming parameters, namely punch location and punch stroke, for each deformation step, which yields the optimal tool path. Three series of deep learning models, namely single feature extractor, cascaded networks (including state-of-the-art deep networks) and long short-term memory (LSTM) models, are implemented and trained with two datasets with different amounts of data but the same data diversity. The learning results show that the single LSTM model trained with the larger dataset has the best learning capability and generalisation among all models investigated. The promising results from the LSTM indicate the potential of extending the proposed recursive tool path prediction framework to the tool path planning of more complex sheet metal components. The analysis of different deep networks provides instructive references for model selection and model architecture design for sheet metal forming problems involving tool path design.

Highlights:
• Proposed a novel, heuristics-respecting recursive tool path prediction framework

Introduction
Sheet metal forming technology has been developed for centuries for the manufacture of lightweight metal components and is vastly employed in the automotive and aerospace industries today. Among all the sheet metal forming techniques, cold stamping is believed to be the most commonly used one (Zheng et al., 2018). Owing to its short forming cycle, stamping is widely used for mass production. However, subject to its one-step forming pattern, the forming flexibility of this technique is limited. In addition, the high capital cost of customised punches and dies can shrink the profit margins of products, especially for low-volume production. As a consequence, these constraints necessitate the development of consecutive free-form stamping processes which could achieve the same forming results as one-step stamping, and the optimisation of the tool path plays a dominant role in this development.
Over the last two decades, machine learning technology, especially deep learning, has seen a resurgence and burgeoning development in various areas thanks to the revolutionary increase in the computation power of computer processors. The areas receiving the most benefit from machine learning include Computer Vision, Natural Language Processing (NLP) and Medical Imaging. For example, Zhao et al. (2019) introduced the powerful capability of deep neural networks in learning semantic, high-level and deeper features from images by reviewing different network architectures for object detection. Cambria and White (2014) reported that the speed of NLP analysis of a sentence had been boosted from 7 minutes per sentence to less than a second with the rapid development of deep learning. A quick advancement of Medical Image Analysis (MIA), especially in identifying, classifying and measuring patterns in medical images, was also brought about by the recent improvements in deep learning, as suggested by Buettner et al. (2020). Machine learning algorithms are commonly classified into three taxonomies: supervised learning, unsupervised learning and reinforcement learning (Monostori et al., 1996). In the sheet metal forming industry, supervised learning dominates the machine learning related applications.
As of today, the most prevailing machine learning applications to sheet metal stamping include process monitoring, fault diagnosis and surrogate-assisted optimisation. In terms of process monitoring and fault diagnosis, machine learning models are developed to predict the process condition or manufacturing defects based on process-relevant parameters. For example, García (2005) used two separate multilayer perceptrons (MLPs) for wrinkle and crack detection in a sheet metal stamping process. The MLPs were embedded in a coupled sensors-based monitoring system which proved to be useful for defect detection of deep drawing parts. Chen et al. (2019) analysed the punch sounds from a stamping process with a 5-layer neural network to estimate the life period of a stamping press. They constructed a prototype which was proved to realise real-time estimation. Similarly, Huang and Dzulfikri (2021) developed a one-dimensional convolutional neural network (CNN) to monitor the tool health during the real-time stamping process. In their work, the vibration signal during the stamping process was measured for training the CNN to predict the class of the tool wear condition. For surrogate-assisted optimisation, an end-to-end prediction is realised by the machine learning model for fast forming result estimation, with which a shorter optimisation cycle can be achieved. Hart-Rawung et al. (2020) exploited a deep neural network (DNN) with 50 hidden layers as the surrogate model to replace the traditional austenite decomposition model for faster hot stamping simulations. The results showed that the surrogate model managed to predict the final phase fraction within the desired accuracy. Zhou et al. (2022) compared the performance of a U-Net based surrogate model and an MLP based surrogate model in predicting the plastic strain fields of stamping simulations. Instead of using traditional hand-crafted features to describe the workpiece geometry, they used image-based inputs to describe the design parameters and found that the image-based method is advantageous over the traditional method in accuracy, generalisability, robustness and informativeness. For more state-of-the-art surrogate-assisted methods, Wang et al. (2017) surveyed a series of models and discussed the performance and potential development of these methods in sheet metal forming design.
Although many studies have been reported on machine learning applications to metal forming, few have focused on tool path prediction for consecutive sheet metal stamping. Most of the path generation optimisations with machine learning are applied to the incremental sheet forming (ISF) technique because of its continuous deformation process. Hartmann et al. (2016) developed a neural network, composed of four neural layers, to learn the desired workpiece shape and predict the optimal tool movement path in an incremental sheet metal free-forming process. The input to the network is a processed form of the target workpiece geometry; they claimed that the forms of inputs and outputs have to be carefully designed to achieve good learning efficiency. Störkle et al. (2016) exploited a reinforcement learning algorithm to adjust the tool path in an ISF process and managed to increase the geometric accuracy of the products. Opritescu and Volk (2015) established a shallow neural network to predict the tool path strategy for a driving process, which deformed an L-shaped sheet metal by local material stretching and shrinking. The sheet metal shape deformed by the strategy from the neural network was found to be highly accurate only for short profile lengths. Liu et al. (2020) developed an overall learning algorithm with the incorporation of a reinforcement learning algorithm for the prediction of the optimal tool path of a free-form sheet metal stamping process. However, a shared issue hindering applications of reinforcement learning in engineering problems is that the computation for finite element analysis (FEA) of each forming operation can be prohibitively expensive. A sophisticated fast FE simulation approach, such as the Knowledge-Based Cloud FE simulation method developed by Wang et al. (2017), is to be developed to reduce computation requirements. As a consequence, due to the huge path search space for complex component deformation, the slow learning process of machine learning models may not guarantee the discovery of an optimal tool path.
The aim of this research is to fill the gap of intelligent path planning for consecutive free-form stamping. In this paper, a recursive tool path prediction framework was proposed, with a deep learning model as the core component. This method was instantiated on a 2-dimensional (2-D) consecutive rubber-tool forming process, which is a forming technique designed for producing sheet metals with a Class A surface condition. Three different series of deep learning models, namely single feature extractor, cascaded network and long short-term memory (LSTM) models, were developed and compared in predicting the optimal tool path and the corresponding forming parameters at each forming step, based on their topology learning from the shape discrepancy between the input workpieces. Two forming parameters, punch location and punch stroke, were used to quantitatively describe the forming operation. For a more comprehensive comparison, two datasets with maximum deformation counts of two and three were used for training the deep learning models.
The main contributions of this paper are as follows: 1) filling the gap of tool path planning for free-form sheet metal stamping processes with deep learning technologies in the metal forming industry; 2) proposing a novel heuristics-respecting recursive tool path prediction framework, with which the performances of several adapted state-of-the-art deep learning networks in tool path prediction for a free-form sheet metal stamping process were compared; 3) providing instructive references for model selection and model architecture design for sheet metal forming problems involving tool path design.

Sheet metal forming test
In this research, a novel rubber-tool forming process was proposed to deform sheet metals so that the product has a Class A surface condition. Figure 1 shows the test setup, FE model and FE mesh of the rubber-tool forming. The workpiece, lying on a large piece of silicone rubber, is deformed by a rubber-wrapped rigid punch. During forming, the workpiece is only in contact with rubber and is thus protected from scratches and dents. As shown in Figure 1a, the punch is only allowed to translate along the Y-direction. The bottom of the workbench rubber is supported by a rigid body and its mid-point is fixed in all directions. The forming experiments were performed computationally in Abaqus/2019, with the mesh shown in Figure 1b. The punch was assured to deform the workpiece at the expected location by ensuring the coincidence of three lines: the mid-lines of the punch rubber and workbench rubber, and the line passing through the punch location in the thickness direction of the workpiece. The specification of the test setup in Figure 1a is presented in Table 1. For simplicity, the dimension in the third direction was assumed infinite and therefore the numerical experiments were performed in 2-D space. The workpiece and the rubbers were meshed with 8-node biquadratic plane strain quadrilateral elements with reduced integration, CPE8R. The punch was meshed as a rigid body. The workpiece and the two pieces of rubber were set to be homogeneous. The friction coefficient between the workpiece and rubber was 0.76. To determine the mesh size of the components in the FE simulation, a grid-independence study was performed. Based on this study, a total of 103,716 mesh elements were used.
The materials of the workpiece and rubber used in this research were AA6082 (Rong et al., 2019) and natural rubber (Cambridge University Engineering Department, 2003). The properties of these materials used for simulation are shown in Table 2.

Consecutive forming process
To realise the consecutive forming process, automatic repositioning plays a role of great importance. For ease of geometry data processing and assurance of forming flexibility, 1001 evenly distributed discrete locations on the upper surface of the workpiece were selected as possible punch locations, the same number as the mesh nodes on this surface in the computational domain. Figure 2a shows a schematic diagram of all the selectable punch locations during forming. The punch locations were numbered from left to right, starting from 1 to 1001. The denotation of a deformed workpiece is in the form "location-stroke" for a single punch and "location1-stroke1_location2-stroke2_..._locationN-strokeN" for multiple punches. For example, "501-5mm" denotes a single punch at location 501 with a stroke of 5 mm; "301-5mm_501-9mm" denotes a sequential deformation with a stroke of 5 mm at location 301 and 9 mm at location 501. Figure 2b shows the rubber-tool forming process, in which the flat sheet is sequentially deformed to its target shape. During the forming process, the deformed workpiece at each step is reserved as the workpiece shape for the next forming step. The workpiece is repositioned over the workbench rubber to assure that the punch location is aligned with the central axis of the punch. The repositioning of the sheet metal during forming was realised by the alignment of the three lines shown in Figure 1b. At the start of each consecutive forming operation, the punch and workbench rubber were reset to their initial positions while the workpiece was rotated and translated for alignment.
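The "location-stroke" denotation above can be parsed mechanically. A minimal sketch in Python (the function name `parse_tool_path` is illustrative, not from the paper):

```python
def parse_tool_path(denotation: str) -> list[tuple[int, float]]:
    """Parse a denotation such as '301-5mm_501-9mm' into a list of
    (punch location, punch stroke in mm) pairs."""
    steps = []
    for step in denotation.split("_"):
        location, stroke = step.split("-")
        steps.append((int(location), float(stroke.rstrip("m"))))
    return steps

# A single punch at location 501 with a 5 mm stroke:
print(parse_tool_path("501-5mm"))          # [(501, 5.0)]
# A two-step sequence:
print(parse_tool_path("301-5mm_501-9mm"))  # [(301, 5.0), (501, 9.0)]
```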

Data pre-processing and data structure
It is common knowledge that deep neural networks (DNNs), especially convolutional neural networks (CNNs), have achieved significant success in localising and extracting prediction-related features from image-based data, as suggested by Johnson et al. (2016). In this context, rather than feeding the original coordinate data of the workpiece from the simulations into the machine learning models, these data were transformed into binary pixel-based images. Figure 3 shows the data representations of the workpiece before and after pre-processing. Based on the workpiece size, the size of the binary image was 288×576, with a spatial distance of 0.2 mm between two contiguous pixels.

Figure 3. Examples of workpiece representations as a graph drawn from coordinate data (left) and as a binary image (right).
The pre-processing algorithm is presented in Algorithm 1.

Algorithm 1: Workpiece rasterisation
1. Create a zero matrix M of size 288×576 (pixel spacing of 0.2 mm)
2. Create a mapping function fm converting spatial coordinates to pixel positions: i = 196 − 5y, j = 5x + 288, where (x, y) is the spatial coordinate and (i, j) is the pixel position
3. for mesh node n = 1, N do
4.   if node n is on the outer surface of the workpiece then
5.     find the pixel closest to node n: p(ik, jk)
6.     set M(ik, jk) = 1
7.   end if
8. end for
9. Export the binary image matrix M

In machine learning, unlike classification, which predicts a label present in the dataset, regression predicts a continuous quantity that need not appear in the training data. In real cases, the selection of punch location and stroke on the workpiece can be continuous, for which their prediction is considered a regression problem. The machine learning model is expected to capture the intrinsic causal relationship between the selection of punch location/stroke and the resulting workpiece shape. For this reason, a series of FE simulations were performed with five selectable punch locations and corresponding punch strokes. Figure 4 shows all the punch locations and corresponding punch strokes selected in the simulations, which also indicates the structure of the dataset. In the simulations, the sheet metals were sequentially deformed at the selectable punch locations with selectable strokes (up to 3 combinations of location and stroke). Every possible deformation result was collected with the selectable punch locations and strokes, and useful results were sifted for model training. First of all, infeasible workpiece shapes, which conflict with real manufacture, were removed from the dataset. Figure 5 shows two examples of infeasible workpiece shapes. One of them has insufficient opening between the workpiece ends to leave room for the punch, and the other workpiece intersects with itself, which is not practical. After excluding all the infeasible results, the numbers of geometries for workpieces deformed with one, two and three punches are 23, 422 and 5737, respectively. The models were trained with two datasets, respectively termed Dataset within 2 punches and Dataset within 3 punches, consisting of the images of workpieces deformed by a maximum of 2 (including 1 and 2) and 3 (including 1, 2 and 3) strokes. Dataset within 2 punches includes 445 workpiece geometries and the one within 3 punches includes 6182 geometries.
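Algorithm 1 can be sketched in Python with NumPy. The mapping coefficients follow the 0.2 mm pixel spacing described above (5 pixels per mm); the variable names here (including reading the coefficients as i = 196 − 5y and j = 5x + 288, with the subscripts lost in the source) are a reconstruction, not verbatim from the paper:

```python
import numpy as np

def rasterise(surface_nodes, height=288, width=576):
    """Convert (x, y) coordinates of outer-surface mesh nodes (in mm)
    into a binary image with 0.2 mm pixel spacing (5 pixels per mm)."""
    M = np.zeros((height, width), dtype=np.uint8)
    for x, y in surface_nodes:
        i = int(round(196 - 5 * y))  # row index from the y coordinate
        j = int(round(5 * x + 288))  # column index from the x coordinate
        if 0 <= i < height and 0 <= j < width:
            M[i, j] = 1              # mark the pixel closest to the node
    return M

image = rasterise([(0.0, 0.0), (10.0, 2.0)])
print(image.sum())  # 2 pixels set
```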
The 6182 data were reallocated and assembled to conform to the different model architectures. As the deep learning models were used for learning a regression problem, the training data cannot include one-to-many data, i.e., the same input features with different labels, which would cause the predicted value to lie between these labels. Although a target workpiece could be deformed through different tool paths, only one of them is reserved in the training data. Thus, when configuring the training data, the extra data causing the one-to-many problem were trimmed, in such a way that only the label with the lowest value of punch location was reserved whenever the one-to-many problem appeared.
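The trimming rule above, keeping only the label with the lowest punch location when one input maps to several labels, can be expressed compactly. A sketch; the data layout (input key paired with a location/stroke label) is assumed, not taken from the paper:

```python
def trim_one_to_many(samples):
    """samples: list of (input_key, (location, stroke)) pairs.
    For duplicated inputs, keep only the label with the lowest punch
    location, so the dataset contains no one-to-many mappings."""
    kept = {}
    for key, label in samples:
        if key not in kept or label[0] < kept[key][0]:
            kept[key] = label
    return kept

data = [("shapeA", (501, 5.0)), ("shapeA", (301, 9.0)), ("shapeB", (701, 3.0))]
print(trim_one_to_many(data))
# {'shapeA': (301, 9.0), 'shapeB': (701, 3.0)}
```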

Consecutive tool path prediction framework
In a real workshop-scale sheet metal forming process, the tool path planning can determine the delivery quality of the final product. During forming, it is common to take the next forming operation at the location where the local shape of the current deformed workpiece has salient differences from its target. For example, when making a metal bowl, blacksmiths would put the sheet metal on a sandbag and start the strike from the middle part of the metal. In this regard, the data pipeline through the deep learning model was developed to respect this heuristic. Figure 6 presents the data pipeline through the deep learning models at the training phase and the application phase. It is noted that the inputs-outputs pair describes a single step of the whole tool path prediction process, of which the three words highlighted in red are the denotations signifying temporal characteristics. The "current" workpiece is the most recently obtained workpiece shape after deformation, and the "target" workpiece introduces itself literally. The "immediate" punch location/stroke signifies the forming parameters of the punch on the "current" workpiece. At the application phase, the well-trained models were used to predict the tool path of a new "target" workpiece. At each forming step, the prediction of the "immediate" forming parameters was used to perform an FE simulation. The new workpiece geometry, retrieved from the simulation, was then pre-processed into a binary image. The pre-processed geometry substituted the "current" workpiece in the first channel of the inputs, which was then concatenated with the "target" workpiece and fed into the deep learning models again for new "immediate" punch parameters. The recursive prediction yields the tool path from blank sheet metal to its desired geometry, terminating once the "current" workpiece shape conforms to the "target" workpiece to a prescribed extent.
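The recursive framework described above can be summarised as a loop. In this sketch the model, simulator, pre-processor and convergence check are stand-ins for the trained network, the FE simulation, the binary-image conversion and the prescribed shape-conformity criterion; all names are illustrative:

```python
def predict_tool_path(model, simulate, preprocess, blank, target,
                      converged, max_steps=10):
    """Recursively predict 'immediate' forming parameters until the
    current workpiece conforms to the target to a prescribed extent."""
    current, path = blank, []
    for _ in range(max_steps):
        if converged(current, target):
            break
        # Feed the concatenated current/target images to the model and
        # obtain the "immediate" punch location and stroke.
        location, stroke = model(current, target)
        path.append((location, stroke))
        # The FE simulation returns the new workpiece geometry, which is
        # pre-processed into a binary image for the next iteration.
        current = preprocess(simulate(current, location, stroke))
    return path

# Stub demonstration: a "workpiece" is a counter, each punch advances it
# by 1, and convergence is exact equality with the target.
demo_path = predict_tool_path(
    model=lambda cur, tgt: (501, 1.0),        # always punch 501, 1 mm
    simulate=lambda cur, loc, stroke: cur + 1,
    preprocess=lambda shape: shape,
    blank=0, target=3,
    converged=lambda cur, tgt: cur == tgt,
)
print(demo_path)  # three identical punches
```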

Model training and prediction accuracy metric
The deep learning model f in Figure 6 can be defined as follows:

ŷ = f(x; θ)     (1)

where ŷ = f(x; θ) is the prediction from the deep learning model, denoting the predictions of punch location and stroke; x ∈ ℝ^(h×w×d) is the concatenated binary image input to the network model, of which h and w denote the height and width of the binary image while d denotes the number of images concatenated. For the example shown in Figure 6, h, w and d are 288, 576 and 2, respectively. θ represents the trainable parameters, i.e., weights and biases, and hyperparameters, such as the learning rate.
As introduced above, the tool path prediction is configured as a regression problem. Therefore, Mean Square Error (MSE), which measures the Euclidean distance between the prediction from the model and the ground truth, is used as the objective function (Bishop, 2006) for training the deep learning models:

J(θ) = (1/m) Σᵢ (yᵢ − ŷᵢ)²     (2)

where J(•) denotes the objective function; y is composed of yᵢ, the ground truth value of punch location or stroke for a given scenario i; ŷᵢ is the predicted value of the punch location or stroke from the model, computed from ŷ = f(x; θ) using equation (1); m is 2 in this research, averaging the errors of the predictions from the two learning tasks.
As suggested by Lecun et al. (2015), the internal parameters, weight matrices and bias vectors, of deep learning models are updated using the backpropagation algorithm and the chain rule. The goal is to iteratively update the model parameters to obtain the lowest possible objective function value before overfitting occurs. The optimisation problem can be defined as follows:

θ* = argmin_θ J(θ)     (3)

where θ* denotes the optimal value of θ, with which the objective function value is lower than with any other value of θ. The commonly used optimisation algorithms, which have been applied to various optimisation problems over the last decade, include stochastic gradient descent (SGD), root mean square propagation (RMSProp) (Hinton et al., n.d.) and Adam (Kingma and Ba, 2015).
In this research, the networks were trained using Keras with TensorFlow v2. An exponentially decaying learning rate scheme was used, which is defined as follows:

η = η₀ · γ^(s/S)     (4)

where η and η₀ represent the current and initial learning rate, respectively; γ is the decay rate, chosen as 0.96; S and s are the total decay steps and current decay step, respectively. S was chosen as 100 000.
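Equation (4) corresponds to the exponential decay schedule available in Keras as `tf.keras.optimizers.schedules.ExponentialDecay`. A dependency-free sketch of the same formula (the initial learning rate of 1e-3 here is an assumed example value, not taken from the paper):

```python
def decayed_learning_rate(step, initial_lr=1e-3, decay_rate=0.96,
                          decay_steps=100_000):
    """Exponentially decaying learning rate: eta = eta0 * gamma**(s / S)."""
    return initial_lr * decay_rate ** (step / decay_steps)

print(decayed_learning_rate(0))        # 0.001
print(decayed_learning_rate(100_000))  # ≈ 0.00096 (one full decay period)
```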
The prediction accuracy of punch location and stroke is measured as follows:

Accuracy = 1 − |ŷ − y| / y     (5)

where ŷ is the predicted value and y is the ground truth value. It is noted that if the prediction accuracy is calculated to be a negative value, i.e. the prediction is more than two times away from its ground truth, the accuracy is set to zero.
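Equation (5), with negative values clipped to zero, is a direct transcription (function naming is illustrative):

```python
def prediction_accuracy(predicted, truth):
    """Accuracy = 1 - |y_hat - y| / y, clipped at zero when the error
    exceeds the ground truth value itself."""
    return max(0.0, 1.0 - abs(predicted - truth) / truth)

print(prediction_accuracy(9.0, 10.0))   # 0.9 (10 % relative error)
print(prediction_accuracy(25.0, 10.0))  # 0.0 (error larger than truth)
```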

Deep learning models
For tool path prediction, several deep learning models have been investigated in this research, which can be separated into two groups: models able to process spatial features, such as CNNs, and those able to process temporal features, such as LSTMs. In the former category, in addition to using a single feature extractor, cascaded networks were also developed to improve the learning performance. The cascaded networks include an image reconstruction network and a convolutional net, which generate the next deformed workpiece shape before predicting the "immediate" forming parameters. Figure 6 shows the image reconstruction models studied in this paper, including the CNN autoencoder, fully convolutional network (FCN), U-Net and generative adversarial network (GAN).

a) The architectures of image reconstruction models
There are six image reconstruction models investigated in this experiment: CAE, FCN4s (Shelhamer et al., 2017), semi-FCN4s, U-Net (Ronneberger et al., 2015), Res-SE-U-Net (Nie et al., 2020) and cGAN (Isola et al., 2017). The architectures of the six models and details of their parameters are shown in Figure 8. In terms of the selection of FCN, instead of using FCN8s, which was stated to outperform other FCNxs models in semantic segmentation according to Shelhamer et al. (2017), FCN4s was selected in this study as it was found, unlike FCN8s, to be able to capture and predict the minimum workpiece curvature deformed by a 1 mm stroke.
As shown in Figure 8b, a model named "semi-FCN4s", which adds 1000 neurons as the latent space before upsampling, was developed to exploit the prominent feature learning ability of the CNN autoencoder and the semantic segmentation ability of FCN4s, respectively. Res-SE-U-Net is a special form of U-Net with several Res-SE blocks between the encoder and decoder blocks, which has been found to have salient performance in engineering problems, as suggested by Nie et al. (2020) and Zhou et al. (2022). In this work, six Res-SE blocks were used with a downscale ratio of 16 in each block. cGAN, unlike all the previous generative models, includes an additional discriminator model to classify whether the image generated by the generator model is "real" or "fake". It is noted that only the generator in the cGAN was used as the image reconstruction model in the cascaded model. As shown in Figure 8d, the generator has the same network architecture as the U-Net. Thus, the two models differ only in the training process. As such networks are known to be superior in performing end-to-end and pixel-to-pixel classification, the "next" workpiece prediction is configured as a classification problem for all image reconstruction models for conformity. In addition, the performances of the other models in learning classification and regression problems were compared; the differences are negligible and do not affect the qualitative analysis.
Two classes were created, with "0" denoting the black background pixels in the image and "1" denoting the workpiece pixels. A pixel-wise binary cross-entropy function (Goodfellow et al., 2016) was used as the objective function in training the image reconstruction models, which can be simplified and defined as follows:

L = −(1/n) Σᵢ Σⱼ yᵢⱼ log(ŷᵢⱼ)     (6)

where n denotes the number of pixels, c denotes the number of classes, i indexes the pixels and j indexes the classes.
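The pixel-wise cross-entropy of equation (6) can be sketched in NumPy, assuming one-hot ground truth labels and predicted class probabilities (the data layout is illustrative):

```python
import numpy as np

def pixelwise_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean over pixels of -sum_j y_ij * log(p_ij).
    y_true: (n, c) one-hot labels; y_pred: (n, c) class probabilities."""
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1)))

# Two pixels: one background (class 0), one workpiece (class 1).
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])
loss = pixelwise_cross_entropy(y_true, y_pred)
print(round(loss, 4))  # ≈ 0.1643
```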
Unlike CAE, FCN and U-Net, GAN has a different training process. In the training of GANs, an input image is classified as "real" if it comes from the dataset or "fake" if it was generated by the generator. The discriminator is trained directly with input features and labels, while the generator is updated via the discriminator. The two models are simultaneously trained in an adversarial process until the predictions from the generator can fool the discriminator. In this research, a conditional generative adversarial network (cGAN) was developed, which was trained with the objective function suggested by Isola et al. (2017). The cGAN uses a convolutional "PatchGAN" classifier as the discriminator, which classifies whether each patch in the input image is real or fake. The patch size in this research was 18×36. In practical training, the discriminator and the generator are updated with the loss functions of binary cross-entropy and mean absolute error (MAE), respectively.

LSTM models
LSTM was initially developed by Hochreiter and Schmidhuber (1997). To further evaluate the learning performance of the CNNs, they were trained to predict the forming parameters for a single punch without a forming sequence. The input data to the models, shown in Figure 6, were redesigned to replace the "target" workpiece with the "next" workpiece. The "next" workpiece denotes the workpiece geometry deformed from the "current" workpiece with a single punch, whose forming parameters were predicted by the model. In this context, both the classic CNN and VGG16 were trained with the new dataset without forming sequences. Table 5 presents the key parameters used in the training. To evaluate the accuracy of predictions from the image reconstruction models, intersection over union (IoU) is used to measure the similarity between the prediction and its ground truth image. IoU for workpiece prediction (class of 1) is defined as follows:

IoU = |target ∩ prediction| / |target ∪ prediction| = TP / (TP + FP + FN)     (7)

where target and prediction denote the pixel sets of the target and predicted workpiece images, and TP, FP and FN denote True Positive, False Positive and False Negative, respectively. Table 7 shows the key parameters used for training all six image reconstruction models. To evaluate the performance of the six image reconstruction models, the IoU score density distributions over the whole score domain from these models are compared in the corresponding figure. Compared with the path prediction accuracy of a single feature extractor VGG16 shown in Table 4, the cascaded models with FCN4s and CAE have even worse performance, and most of their prediction accuracies are under 80 %.
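IoU as defined in equation (7) can be computed from binary masks. A NumPy sketch:

```python
import numpy as np

def iou(target, prediction):
    """Intersection over union for the workpiece class (pixels equal to 1):
    IoU = TP / (TP + FP + FN)."""
    target = target.astype(bool)
    prediction = prediction.astype(bool)
    intersection = np.logical_and(target, prediction).sum()
    union = np.logical_or(target, prediction).sum()
    return intersection / union

t = np.array([[1, 1, 0], [0, 1, 0]])  # ground truth workpiece pixels
p = np.array([[1, 0, 0], [0, 1, 1]])  # predicted workpiece pixels
print(iou(t, p))  # 2 shared pixels over 4 in the union -> 0.5
```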
The semi-FCN4s, although developed to improve the learning ability, only has similar performance to a single VGG16. However, the cascaded models with cGAN, U-Net and Res-SE-U-Net exceed the single VGG16 in generalisation for stroke prediction by about 5 %. In addition, cGAN has the best performance on training-set learning among all models investigated so far. The training data for the LSTM models were prepared following the rules introduced in section 2.3. The key parameters for training both single and stacked LSTM models are summarised in Table 9.

Learning results from Dataset within 3 punches
To further evaluate the deep learning models in predicting workpieces of more complex shape, a new dataset was created with additional data of workpieces deformed by 3 punches. All the deep learning models were trained again with the new dataset and their learning results are compared in this section. Table 12 shows the learning results of the two CNN models in predicting the tool path of workpieces deformed within 3 punches. It can be seen that, unlike the models trained with Dataset within 2 punches shown in Table 4, the single feature extractors can hardly learn the true function of the recursive prediction of tool paths, with prediction accuracies of only about 70 % in all evaluations. Consequently, a single feature extractor is not qualified for tool path prediction of more complex sheet metal workpieces. Figure 14 compares the performance of all investigated models trained with Dataset within 3 punches. From these learning results, the single LSTM is found to have the best learning capability for tool path planning.

LSTM performance analysis
To further analyse the performance of the LSTM models, the prediction accuracies on each punch location and stroke were obtained for misprediction analysis.Both single and stacked LSTMs were analysed and the predictions for different locations and strokes are compared in Figure 15.
Figure 15a shows that the prediction accuracies for location 101 are the worst among the 5 locations, at just over 95 % and 90 % from the single and stacked LSTM, respectively. This could be because the data labelled with location 101 account for a larger percentage of the whole dataset than those with other locations. This imbalance originates in the data acquisition stage: as introduced in section 2.3 on data trimming, the lower location number is prioritised for retention during trimming. As each training datum represents a unique forming scenario, the learning complexity for this location is greater than for other locations. This problem could be avoided by dataset structure design in the future.
From Figure 15b, it is evident that, except for the 1 mm stroke, the prediction accuracies for the other strokes are above 90 %. The prediction accuracies for the 1 mm stroke are only around 80 % and 75 % from the single and stacked LSTM, respectively. The misprediction for the 1 mm stroke could be due to the minor deformation a 1 mm stroke can produce. It is much harder for the models to discern the difference between the workpiece shapes before and after a deformation with a stroke of 1 mm than for those deformed by larger strokes. This problem could be mitigated by adding an additional feature extractor to the models to improve local feature recognition.

Conclusions
In this research, a novel recursive tool path prediction framework is proposed for the forming sequence planning of a consecutive rubber-tool forming process. A deep learning model is the major component in the framework, which predicts the optimal tool path by learning the topological relationship between the current and target workpiece shapes. Three series of deep learning models, namely single feature extractor, cascaded networks and LSTM models, were developed, of which the single feature extractor and cascaded networks can learn from the spatial features in input images while the LSTM can learn both spatial features and temporal characteristics from the inputs. These models were trained with datasets of workpieces deformed within 2 punches and 3 punches, and the performances of all the models investigated were compared. From the learning results, the following conclusions can be drawn: 1) A single feature extractor is not able to predict the optimal tool path for the consecutive rubber-tool forming process, while it has nontrivially better performance in learning the forming parameters for a single deformation, with about 15 % improvement in generalisation compared to that trained with data including multiple deformations.
2) With Dataset within 2 punches, the cascaded network, which includes an extra image reconstruction sub-model, outperforms the other two series of models. In addition, cGAN is found to exceed all other sub-models in the accuracy and consistency of reconstructing the target image. The cascaded network with cGAN and the pre-trained VGG16 performs best among the models trained with Dataset within 2 punches.
3) With Dataset within 3 punches, both the single feature extractor and the cascaded networks can hardly learn the true function of tool path prediction and fail to predict the optimal forming parameters. However, both single and stacked LSTM models exhibit promising prediction results with generalisation accuracies over 90 %, clearly surpassing the best model trained with Dataset within 2 punches. In addition, the single LSTM marginally outperforms the stacked LSTM by 1 % - 2 % in this study.
4) The performance of the LSTM models is improved by over 15 % when more training data are provided, even though the additional training data do not enrich the data diversity of the dataset but only include forming results from 3 punches in the same forming pattern. Moreover, with the additional data, the LSTMs manage to learn the optimal tool path of a more complex deformation process. However, as the learning complexity increases, the performance of both the single feature extractor and the cascaded networks deteriorates markedly, by over 10 %.
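The recursive tool path prediction described above can be sketched as a simple loop that alternates between the deep learning model and the forming step. This is an illustrative sketch only: the callables `predict_step`, `apply_punch` and `shapes_match` are hypothetical placeholders standing in for the trained model, the rubber-tool forming step and the shape comparison, none of which are specified at code level in the paper.

```python
# Sketch of the recursive tool path prediction framework.
# All three callables are hypothetical stand-ins, not the paper's code:
#   predict_step(current, target) -> (punch_location, punch_stroke)
#   apply_punch(current, location, stroke) -> deformed workpiece
#   shapes_match(current, target) -> bool

def plan_tool_path(current, target, predict_step, apply_punch,
                   shapes_match, max_steps=10):
    """Recursively predict (location, stroke) pairs until the workpiece
    matches the target shape or the step budget is exhausted."""
    path = []
    for _ in range(max_steps):
        if shapes_match(current, target):
            break
        location, stroke = predict_step(current, target)  # deep learning model
        path.append((location, stroke))
        current = apply_punch(current, location, stroke)  # forming step
    return path
```

A toy usage with integer "shapes" shows the control flow: each predicted stroke deforms the workpiece one unit closer to the target, and the loop terminates once they coincide.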
developed a one-dimensional convolutional neural network (CNN) to monitor the tool health during the real-time stamping process. In their work, the vibration signal during the stamping process was measured to train the CNN to predict the class of the tool wear condition. For surrogate-assisted optimisation, an end-to-end prediction is realised by the machine learning model for fast estimation of the forming results, with which a shorter optimisation cycle can be achieved. Hart-Rawung et al. (2020) exploited a deep neural network (DNN) with 50 hidden layers as the surrogate model to replace the traditional austenite decomposition model for faster hot stamping simulations. The results showed that the surrogate model managed to predict the final phase fraction within the desired accuracy. Zhou et al. (2022) compared the performance of a U-Net based surrogate model and an MLP based surrogate model in predicting the plastic strain fields of stamping simulations. Instead of using traditional hand-crafted features to describe the workpiece geometry, they used image-based inputs to describe the design parameters and found that the image-based method

Figure 1. The test setup, plane strain FE model and FE mesh in the forming test.

Figure 2. Punch locations and rubber-tool forming process.

Figure 4. Illustration of possible punch locations and stroke options.

Figure 5. Infeasible workpiece shapes: insufficient opening for the punch in (a) and self-intersection in (b).

Figure 6. Data pipeline through the deep learning models at the training phase (solid square) and the application phase (dashed square). The filled arrows denote the data flow during the training of the deep learning model.
All models were implemented in Keras with TensorFlow 2.0 as backend on a facility with an NVIDIA Quadro RTX 6000 GPU and 24 GB of RAM. All deep learning models in this study were trained with the Adam optimisation algorithm, using the default values of its hyperparameters (β1, β2, ε) in Keras. For optimisations using the gradient descent algorithm, the model parameter update at each learning step is the learning rate multiplied by the gradient of the objective function with respect to the model parameters. Once the model parameters approach their global/local optimum, a low learning rate is recommended to facilitate convergence and reduce oscillation of the solution at the end of training. Thus, an exponentially decaying learning rate was adopted.
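The exponentially decaying schedule can be written out explicitly. The sketch below is a minimal pure-Python version of the rule implemented in Keras by `tf.keras.optimizers.schedules.ExponentialDecay`; the specific initial rate, decay rate and decay interval used in the paper are not stated here, so the values in the test are illustrative only. (For reference, the Keras Adam defaults mentioned above are β1 = 0.9, β2 = 0.999 and ε = 1e-7.)

```python
def exponential_decay(initial_lr, decay_rate, decay_steps, step):
    """Exponentially decaying learning rate:
    lr(step) = initial_lr * decay_rate ** (step / decay_steps).

    Equivalent in form to Keras's ExponentialDecay schedule with
    staircase=False; the actual hyperparameter values are assumptions.
    """
    return initial_lr * decay_rate ** (step / decay_steps)
```

At step 0 the schedule returns the initial rate, and after each `decay_steps` interval the rate shrinks by the factor `decay_rate`, which damps the late-training oscillation described above.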

2.6.1
Figure 7 shows the data pipeline through the cascaded model, and the overall data pipeline can be obtained by substituting the cascaded model into the architecture shown in Figure 6.

Figure 7. Data pipeline through the cascaded network. A target workpiece "301-9mm_501-5mm_701-9mm" is used as an example for illustration. The filled arrows denote the data flow during the training of the image reconstruction model. The top long unfilled arrow denotes that the "current" workpiece is also an input to the CNN model. The CNN model is pre-trained in advance and implemented as a feature extractor in the cascaded network.

Figure 8. Network architectures and parameter details of the six image reconstruction models used in the cascaded network. Conv and pooling layers are specified in the format (channels, kernel, stride) and (pool, stride), respectively. ConvT denotes a transposed convolution layer. In all architectures, every convolutional layer is followed by a ReLU activation layer, which is not shown in the figures for brevity. "Same" padding is used in the conv and convT layers of all models. As the learning is configured as a classification problem, the outputs of the models have two channels.
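For encoder-decoder reconstruction models of this kind, the spatial bookkeeping can be checked with the standard "same" padding formulas: a conv layer with stride s maps a size n to ceil(n/s), and a transposed conv with stride s maps n back to n*s. The sketch below is not the paper's code, only a shape-arithmetic helper; the 288×576 input size is taken from the caption of Figure 10.

```python
import math

def conv_out(size, stride):
    """Output spatial size of a conv/pooling layer with 'same' padding."""
    return math.ceil(size / stride)

def convt_out(size, stride):
    """Output spatial size of a transposed conv (ConvT) with 'same' padding."""
    return size * stride
```

For example, two stride-2 conv layers reduce a 288×576 input to 72×144, and two stride-2 ConvT layers restore it to 288×576, so the two-channel output mask aligns pixel-for-pixel with the input image.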
The LSTM was designed to deal with the vanishing gradient problem, which frequently occurs during the training of a vanilla recurrent neural network (RNN). The vanishing gradient problem can incur loss of information from the previous time history and even impede the neural network from further training when the network becomes deep. Inspired by the model architecture used for image captioning developed by Vinyals et al. (2015), which takes an image and a partial caption sequence as inputs to predict the next word describing the image, the model used in this experiment also has a many-to-one configuration. Figure 9 shows the data pipeline through the deep learning model when the LSTM model is used, and the overall data pipeline can be obtained by substituting the LSTM model into the architecture shown in Figure 6. The deep learning model takes two inputs, the target workpiece image and the partial forming sequence images, to predict the next punch location and stroke. The workpiece images are encoded into vectors of dimension 256 by a feature extractor before being fed into the LSTM or other operations. The LSTM takes an input sequence of four time steps, each of which is an encoded vector of a workpiece shape image in the deformation history. If the deformation history contains fewer than four time steps, the remaining vectors in the input sequence are zero-padded. Thereafter, the output from the LSTM is added to the encoded vector of the target workpiece and fed into a feed-forward network. The LSTM model was trained to minimise the MSE objective function, as shown in equation (2).
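The assembly of the LSTM input sequence can be sketched as follows. This is a minimal NumPy sketch of the zero-padding step described above, not the paper's implementation; the text does not specify whether the padding precedes or follows the history, so post-padding (history first, zeros after) is assumed here.

```python
import numpy as np

SEQ_LEN, FEAT = 4, 256  # four time steps of 256-dim encoded shape vectors

def build_lstm_input(history_vectors):
    """Pack the encoded deformation history into a fixed-length
    (SEQ_LEN, FEAT) sequence; unused trailing time steps stay zero
    (post-padding is an assumption, not stated in the paper)."""
    seq = np.zeros((SEQ_LEN, FEAT), dtype=np.float32)
    for t, vec in enumerate(history_vectors[:SEQ_LEN]):
        seq[t] = vec
    return seq
```

With a single-step history, only the first row of the sequence is populated; the three padded rows let one fixed-shape LSTM handle histories of any length up to four.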

Figure 9. Data pipeline through the CNN-LSTM. A target workpiece "301-9mm_501-5mm_701-9mm" is used as an example for illustration. The input includes the workpiece shapes at the first three deformation time steps, and the model predicts the forming parameters at the fourth time step.

Figure 10 shows the architectures of the two LSTM models used in this research.

Figure 10. The architecture of the single LSTM model (without the layers in the dashed square) and the stacked LSTM model (with the layers in the dashed square). Each of the four inputs at the four time steps has a dimension of 288×576×1. The LSTM layer has 256 units.
As discussed in section 3.1.1, a single feature extractor is insufficient for the tool path prediction. The growing success of deep learning in image-to-image translation, or image reconstruction, has boosted its application in areas such as satellite map translation by Enomoto et al. (2017), image semantic segmentation by Xia et al. (2019) and medical image synthesis by Lee et al. (2019). In this paper, a cascaded network is developed, which cascades an image reconstruction model and a CNN to form the deep learning model in the recursive prediction framework shown in Figure 6. The cascaded network architecture is shown in Figure 7.
Figure 12. The distribution of IoU results over the whole score domain for the six image reconstruction models.
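The IoU score used to rank the reconstruction models in Figure 12 can be computed for a pair of binary workpiece masks as below. This is a standard definition sketched in NumPy, not code from the paper; the convention that two empty masks score 1.0 is an assumption.

```python
import numpy as np

def iou(pred, target):
    """Intersection-over-Union between two binary masks: the ratio of
    the pixel count of their overlap to that of their union."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks match perfectly
    return np.logical_and(pred, target).sum() / union
```

An IoU of 1 means the reconstructed target image matches the ground-truth workpiece mask exactly, while values towards 0 indicate little overlap.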

Figure 13 shows the comparison of the performance of all investigated models.

Figure 14. Bar chart of the prediction accuracy of all models on the training set and test set of the Dataset within 3 punches, respectively. The learning results are averaged over all three trainings and the error bars show the performance variation. All trainings are provided with the same amount of raw data from the original workpiece image dataset; these data are reallocated and assembled to conform to the different model architectures.
training and test accuracy on each punch location and stroke from the single and stacked LSTM models. The accuracies were averaged over all trainings.

5) A high ratio of mispredictions from the LSTM models is found on location 101 and stroke 1 mm, which could be improved by dataset structure design and by incorporating additional local feature extractors. 6) Models capable of learning both spatial and temporal features from the inputs outperform those learning only spatial features. Consequently, the promising results from the LSTM models indicate the potential of implementing LSTMs for the tool path planning and optimisation of more complicated sheet metal workpieces. The limitation of the LSTM model is that its tool path prediction for a target workpiece completely different from those in the training and test datasets does not conform to the ground truth. Future work resides in improving the inference ability of the LSTM model and applying the model to tool path predictions for more complex target sheet metals. In addition, to improve the generalisation of the deep learning models to forming patterns different from those in the training data, algorithms comparing the input workpiece shapes might be used to guide the deep learning process. Using the AI-based feature selection and extraction method developed by Ghahramani et al. (2020) might also help gain useful insights into the free-form stamping process from the input images.

Table 3. Key parameters for CNN training with the Dataset within 2 punches.
Table 4 summarises the tool path predictions from the classic CNN and VGG16. It can be seen that overfitting occurs in the training of both CNNs, given the difference of around 10 % between the prediction results on the training and test datasets. In addition, for both models, the test loss is more than 10 times greater than the training loss. Although VGG16 slightly outperforms the classic CNN in the evaluations on both the training and test datasets, neither model fulfils the industrial precision requirement, in view of their low prediction accuracies. The poor performance of both CNNs indicates that a single feature extractor model is incapable of predicting the tool path from the "current" workpiece to the "target" workpiece based on topology learning.

Table 5. Key parameters for training with the new dataset without forming sequences.
Table 6 summarises the predictions of the punching parameters without forming sequences from the classic CNN and VGG16. Compared with the training results shown in Table 4, both models trained with the dataset without forming sequences clearly outperform those trained with the dataset including forming sequences, by around 10 %, and both the training and test losses are reduced by a factor of 10. In this regard, a single feature extractor model shows promising performance in topology learning between successively deformed workpieces. In addition, VGG16 outperforms the classic CNN in generalisation accuracy by about 5 % on the predictions of both punch location and stroke.

Table 10.
It can be seen that the stacked LSTM, which has one more LSTM layer than the single LSTM model, slightly outperforms the single LSTM. The stacked LSTM exceeds the single one in training-set learning and in the generalisation of the location prediction by about 5 % and 3 %, respectively. Compared with the performance of the cascaded models shown in Table 8, both LSTMs have inferior prediction accuracy to the cascaded models with U-Net and cGAN on both the training and test datasets.

Table 11 presents the key parameters for training the single feature extractor with the Dataset within 3 punches.

Table 12. Learning results from the single feature extractors trained with the Dataset within 3 punches on the prediction of the tool path.
the feature extractor in the cascaded network in this experiment. Table 13 shows the key parameters for training the cascaded models with the Dataset within 3 punches. The learning results of the two LSTM models trained with the Dataset within 3 punches are shown in Table 16. It can be seen that both LSTM models are markedly superior to all the other models discussed above. In addition, the generalisation losses from these models are only slightly higher than the training losses, which indicates that negligible overfitting occurs during the training process. The prediction accuracies of both models evaluated on the training and test sets exceed 90 %. It is also worth noting that, unlike the LSTM models trained with the Dataset within 2 punches, the generalisation ability of both LSTMs is significantly improved, by over 10 %. Besides, the performance of the LSTM model deteriorates with one more stacked layer of LSTM cells, by about 2 % in location prediction and 1 % in stroke prediction, respectively.