A multi-objective optimization method based on Gaussian process simultaneous modeling for quality control in sheet metal forming

A new modeling method, related to multiple inputs and multiple outputs (MIMO), simultaneously based on Gaussian process (GP), is proposed to optimize the combinations of process parameters and improve the quality control for multi-objective optimization problems in sheet metal forming. In the MIMO surrogate model, for the use of the system information in processing and the accuracy of the model, quantitative and categorical input variables are both taken into account in GP simultaneously. Firstly, a general method is proposed for constructing covariance functions for GP simultaneous MIMO surrogate model based on correlation matrices. These covariance functions must be able to incorporate the valid definitions of both the spatial correlation based on quantitative input variables and the cross-correlation based on categorical input variables. Secondly, the unrestrictive correlation matrices are constructed by the hypersphere decomposition parameterization, thus directly solving optimization problems with positive definite constraints is needless, and the computational complexity is simplified. Compared with independent modeling method, the proposed GP simultaneous MIMO model has higher accuracy and needs less number of estimated parameters. Moreover, the cross-correlation between the outputs (quality indexes) obtained by proposed model provides some reference to further develop quality intelligent control strategies. Finally, a drawing-forming process of auto rear axle housing is taken as an example to validate the proposed method. The results show that the proposed method can effectively decrease the crack and wrinkle in sheet metal forming.


Introduction
The drawing forming of large-and medium-sized panels is always the emphasis and difficulty of the stamping process. Sheet metal forming is affected by many factors: die structure, blank shape, material thickness, material properties, filet radius, blank holder force, drawbead layout, lubrication condition, and so on. Therefore, it is difficult to meet the requirement of high accuracy through parameters selection by only experience. For quality monitoring and control, it often needs to model the product fabrication process. However, sheet metal forming is a complex process involving the three nonlinear problems in mechanics [1]. Thus, rigorous derivation of such mathematically-physically derived models for sheet metalforming processes can be a formidable task. The model building process is very lengthy, tedious, and error prone or can even be impossible to derive analytically. Recently, considering the deficiency of the mathematically-physically derived models and cost reduction in trial and error, many researchers and engineers devote their attention on finite element approach [2][3][4] based on design of experiments (DOEs) [5].
However, a large finite element simulation of sheet metal forming can take hours or even days to run, together with different experiment designs and processing parameters may have plenty of combinations. It will raise huge computational cost in simulation and optimization. Although the solution obtained from DOEs can get a feasible combination of given process levels. But it is usually a local optimum, not the global one [6]. Recently, some scholars use surrogate models in optimization procedure of sheet metal forming to establish a mathematical or statistical approximation which will replace the expensive simulation analyses, such as response surface model [7,8], artificial neural network [9,10], and support vector regression [11]. Surrogate models are widely used to study complex systems which are time-consuming or expensive to be physically experimented on. Compared with physical models, high-fidelity surrogate models provide an efficient alternative for design of complex engineering systems and have much higher flexibility.
Recently, nonparametric models and techniques enjoy a growing popularity in the field of machine learning. Among these Bayesian frameworks [12], the Gaussian process (GP) model has recently received significant attention [13]. Zhou et al. presented an integrated simulation-based optimization system based on GP approach [14,15]. The architecture of process optimization based on Gaussian process regression and parameter correlation system were put forward in Liao's article [16].
In order to monitor and control multiple quality indexes, it needs to develop the multiple inputs and multiple outputs (MIMO) surrogate modeling methods [17,18]. And recently, intelligent optimization algorithms [19,20] for multiobjective problems [21] have developed rapidly, such as Cuckoo search algorithm [22], bee colony optimization approach [23], particle swarm optimization approach [24], differential evolution algorithm [25,26], and so on. However, the disadvantages of traditional MIMO models are mainly included in the following two points. Firstly, traditional methods assume that all the inputs are quantitative, so that the existing models are designed to deal with only quantitative input factors (quantitative input variables). In fact, the number of investigated outputs is given, and each output is categorical by nature (categorical input variables). This kind of input information can determine some important properties of the models. Because it do not take into account both types of input variables in modeling, the traditional methods for MIMO models often fail to take full advantage of the data abundance. Secondly, those outputs of the MIMO models often bear similarities with each other. But the traditional methods ignore possible correlations among the outputs at the same input values for the quantitative input variables [27].
In this data-rich environment, the surrogate modeling methods for MIMO models with both quantitative and categorical input variables have drawn increasing attention. But the research on such simultaneous MIMO surrogate models [28] is still very limited. Qian et al. proposed a general framework for building general stochastic processes models with qualitative and quantitative factors [29]. In material processing, just like sheet metal forming [30][31][32], the investigated quality indexes often have some correlation with each other. So a novel surrogate modeling method based on GP that can incorporate both quantitative input variables and categorical input variables is proposed in this paper. As a key to the development of the new method is developing, a valid and physically meaningful covariance functions, which must be able to incorporate both the spatial correlation based on quantitative input variables and the crosscorrelation based on categorical input variables for the GP simultaneous MIMO surrogate model. The cross-correlation based on categorical input variables is in the form of correlation matrices, so the proposed models are also called GP simultaneous MIMO surrogate model based on correlation matrices. Correlation matrices can be divided into the unrestrictive and the restrictive [29,33,34], common restrictive matrices are isotropic correlation matrices, multiplicative correlation matrices, group correlation matrices, ordinal correlation matrices, and so on. It is possible to significantly simplify the computational complication by taking a restrictive correlation matrix. But such one usually can be applicable to specific process or system, thus lacking the flexibility of capturing various types of correlations of the outputs.
In this paper, by using hypersphere decomposition parameterization, a novel GP simultaneous MIMO surrogate modeling framework based on unrestrictive correlation matrices is proposed. The hypersphere decomposition parameterization [35] is used to construct unrestrictive correlation matrices. This new parameterization essentially turns some optimization problems with positive-definite constraints into standard and easy-to-compute optimization problems with box constraints. Such approach inherits the flexibility of the unrestrictive correlation matrices but replaces the complicated estimation and optimization procedure used in traditional approach. The estimation and optimization procedures of hyperparameters and correlation parameters in the covariance functions are also developed. Comparing with traditional MIMO models through numerical examples, a case study of drawing forming is used to validate the proposed method. This new surrogate modeling method is simpler, has higher prediction accuracy, and requires minimal physical knowledge of the underlying system to obtain the correlations between outputs. Combined with the correlations of each process parameter for one certain single objective mentioned in Xia et al.'s literature about canonical correlation analysis based on GP [36], more comprehensive strategies are further developed for multi-objective optimization in sheet metal forming and quality intelligent control.
The rest of this paper is composed as follows. The following sections give a brief mathematical description of GP. GP simultaneous MIMO surrogate model based on correlation matrices is proposed. Then, unrestrictive correlation matrices through hypersphere decomposition parameterization are introduced. The estimation and optimization procedures of hyperparameters and correlation parameters in the models are also developed. Comparisons between the proposed MIMO models and traditional MIMO models through two numerical examples are presented in Section 3. The superiorities of the proposed method in accuracy, correlation capture, and Pareto fronts obtained from optimization algorithm through different modeling methods has been demonstrated, respectively. The effectiveness and feasibility of the proposed method to reduce the crack and wrinkle of an auto rear axle housing is illustrated in Section 4 followed by conclusions (Section 5).
2 The GP surrogate model based on unrestrictive correlation matrices with the hypersphere decomposition parameterization

The Gaussian process regression
The theory of GP can be dated back to the classical statistical method by O'Hagan [37]. However, the application of GP as a regression (and classification) technique was not common until the late 1990s, when rapid development of computational power facilitated an implementation of GP for large datasets. Recently, successful applications of GP models have been reported in various fields, including biomass estimation in batch biotechnological processes [38], chemical process modeling [39], modeling of dynamic systems for gas-liquid separator [40], and mechanical system modeling and optimization [41]. For the followed theoretical description, in this subsection, a brief overview of GP regression technique is given, including the formulation and implementation of the model.
Gaussian process is a flexible, probabilistic, and nonparametric model with uncertainty predictions in probability statistics. Its usage and modeling properties are reviewed in Rasmussen's literature [42].
Consider a collection of random variables y = [y 1 ,…, y N ] (N observations) that have a joint multivariate Gaussian distribution. The data are often noisy as well, from measurement errors and so on. Thus, each observation y can be viewed as a function related to an underlying function f (x ) through a Gaussian noise model: where f(x i ) is the corresponding random variable of Gaussian ε i is the noise that have independent identical Gaussian distribution: In general, it is assumed that the mean of the Gaussian process { f(x)} is zero. Thus, what relates one observation to another in such cases is just the covariance function, Cov ( y p ,y q )=k (x p ,x q ), and the covariance function fully specify the GP. Note that the covariance function k(,) can be any function having the property of generating a positivedefinite covariance matrix.
A common choice is the squared exponential covariance function: where θ =[w 1 ,…,w D ,v 0 ,v 1 ] T are the hyperparameters of the covariance functions, v 0 is the estimated white noise variance, v 1 is the estimate of the vertical scale of variation, D is the input dimension, and δ pq =1, if p =q and δ pq =0 otherwise. For a given case, the parameters are learned through the training dataset. And then the importance of the relevant input components can be measured by the w parameters: if w d is zero or near zero, it means that the inputs in dimension d contain little information and could possibly be removed. Consider a set of N-dimensional input vectors X =[x 1 ,x 2 , …,x N ] T and a vector of output data y = [y 1 ,y 2 ,…,y N ] T . Based on the data (X, y), and given a new input vector X * , we wish to find the predictive distribution of the corresponding output y * . Unlike other models, there is no model parameter determination within a fixed model structure. In this model, tuning the parameters of the covariance function is taken as the key. It can be resolved by maximizing the log-likelihood of the parameters, which is computationally relatively demanding, however, the number of parameters to be optimized is small (D +2; see Eq. 3), which means that the optimization convergence might be faster and that the "curse of dimensionality" is circumvented or at least decreased.
The described approach can be easily utilized for a regression calculation. Based on the training set X, among all possible combinations of these points, we calculate the covariance function, (3), summarizing the findings in the three matrices: where K is a covariance matrix (N ×N), K * is the covariances between the test and the training cases (N ×1), and K * * is the covariance between the test input and itself. For a new test input, the predictive distribution of the corresponding output is y * |(X, y), X * , and it is Gaussian, with a mean and a variance: The hyperparameters of the covariance function are not known in advance, and they must be determined by using the training data. The hyperparameter θ can be estimated by maximizing the marginal log-likelihood function: This is a nonlinear optimization problem which can be solved by using gradient-based methods, e.g., the conjugate gradient method. These methods require calculating the derivative of log-likelihood with respect to each hyperparameter θ , which is: where ∂C/∂θ can be obtained from the covariance function. This method also is called maximum a posteriori approach [30].

A general method for constructing covariance functions for GP simultaneous MIMO surrogate model based on correlation matrices
In many engineering applications, the surrogate models for multi-objective are often necessary, i.e., MIMO models. However, the conventional way to build a MIMO model is to build a surrogate model for each output individually using methods such as GP. In such way, the MIMO model is essentially treated as a group of independent single output models, ignoring the correlations between the outputs in a MIMO model. In fact, the number of objectives is known, and it should be considered as inputs for modeling as well as training set (quantitative input variables). This type of input information ignored in traditional MIMO modeling is called categorical input variables z in this paper. If no special illustration, categorical input variables z is not ordinal here.
To take into account the distinct natures and roles of quantitative and categorical input variables in a single GP MIMO model, a general method for constructing covariance functions based on GP is proposed in this subsection, which must be able to incorporate the valid definitions of both the spatial correlation based on quantitative input variables and the cross-correlation based on categorical input variables.
Consider an input vector h =(x t , z c ) t , where x t is quantitative input variable, z is categorical input variable, denoted by c =1,…m , which is the number of outputs, namely objectives. Similar to (1), the model assumes the following: the basic hypothesis of this model is consistent with the standard GP model. The spatial correlation based on quantitative input variables has been defined in covariance function (3). Thus, it only needs to define a valid cross-correlation based on categorical input variables in the covariance function, then the GP simultaneous MIMO surrogate model is fully specified. Let ε c (x)=ε((x t ,c) t ), for c =1,…,m, and Assuming that ε * (x)=Aη(x), where A =(a 1 ,…,a m ) t is an m ×m nonsingular matrix with unit row vectors (i.e., a c t a c =1 for c =1,…m ), and η (x)=(η 1 (x),…η m (x)) t , where η 1 (x), …η m (x) are independent Gauss processes with the same variance and covariance function k(x p ,x q ). Thus, Cov(η(x p ), η(x q ))=k(x p ,x q )I m , with I m being the m ×m identity matrix. Then, for the number of outputs is two, i.e., m =2, the covariance function is defined to be Let τ r,s =a r t a s , where r,s =1,…,m . Then, T ={τ r,s }=AA t is an m ×m positive definite matrix with unit diagonal elements (PDUDE), also called correlation matrix. In fact, any PDUDE can be written as BB t , where B is a nonsingular matrix with unit row vectors. Thus, for any PDUDE T ={τ r,s } and any k (x p ,x q ), all of them have a relevant valid covariance function For the squared exponential covariance function is chosen, as following where τ z 1 ;z 2 is the cross-correlation between categories z 1 and z 2 . Through the described general method for constructing valid covariance functions, it can be known that the crosscorrelation based on categorical input variables is in the form of correlation matrix T. The proposed modeling method mainly has the following advantages. Firstly, comparing with the traditional GP MIMO models, this proposed modeling method is more efficient. Considering a GP MIMO model with six design variables and three outputs, the traditional way has to build three GP models for each output individually, which involves 24 (3×(D +2)) parameters to optimization. In contrast, the proposed modeling method only needs to build one MIMO model, which involve only 11 D þ 2 ð Þþ m m−1 ð Þ 2 À Á parameters (where D is the number of quantitative input variables and m is the number of categorical input variables). Secondly, due to the valid definitions of both types of input variables in the covariance function, the GP simultaneous MIMO model can be fully specified. The purpose of taking full advantage of the data abundance is achieved, thus improving the accuracy. Thirdly, from the optimized correlation matrix T, the correlations between outputs also can be obtained.

Unrestrictive correlation matrices by using hypersphere decomposition parameterization
For (14) to be a valid covariance function, the matrix T must be a valid matrix (i.e., PDUDE) which is positive definite. Standard methods used in GP models for maximizing a likelihood function involving positive definite matrix constraints. These constraints then convert to a series of nonlinear inequalities involving the elements of the matrix. Then, an optimization problem is solved with the resulting nonlinear inequalities as the constraints and the elements of the matrix as the optimization variables. This "element-oriented" approach involves many complicated nonlinear inequalities and a huge number of optimization variables even when the dimension of the matrix is not very large, making it computationally complex or even infeasible. Considering the limitation of restrictive correlation matrices, and in order to address the positive definite constraints on matrices, the unrestrictive correlation matrix is constructed by hypersphere decomposition parameterization in this subsection. This approach inherits the flexibility of the unrestrictive correlation matrix but replaces the complicated computation of optimization problem with a clever parameterization. It means that the approach essentially turns some optimization problems with positive definite constraints into standard and easy-to-compute optimization problems with box constraints. The hypersphere decomposition parameterization provides a simple yet flexible way to construct a PDUDE matrix. It consists of two steps.
In step 1, a Cholesky-type decomposition is applied to T given by where L ={l r,s } is a lower triangular matrix with strictly positive diagonal entries. In step 2, each row vector (l r,1 ,…,l r,r ) in L is modeled as the coordinate of a surface point on an r dimensional unit hypersphere described as follows. For r =1, let l 1,1 =1 and for r =2,…,m , use the following spherical coordinate system l r;1 ¼ cos ϕ r;1 À Á ; l r;s ¼ sin ϕ r;1 À Á ⋯sin ϕ r;s−1 À Á cos ϕ r;s À Á ; s ¼ 2; …; r−1; l r;r ¼ sin ϕ r;1 À Á ⋯sin ϕ r;r−2 À Á sin ϕ r;r−1 À Á ; where ϕ r,s ∈(0,π ), all ϕ r,s are denoted by Φ. Because each ϕ r,s is restricted to take values in (0,π), the diagonal entry l r,r in L is strictly positive, thus guaranteeing that T is a positive definite matrix. In addition, τ r;r ¼ ∑ s¼1 r l 2 r;s r ¼ 1; …; m ð Þ by (16), implying that T must have unit diagonal elements. Thus, the matrix T under this parameterization is always a PDUDE. For illustration, consider the case with three outputs m =3. In step 1, a 3×3 correlation matrix is decomposed as In step 2, (l 21 ,l 22 ) are transformed into a 2D spherical coordinate system as and (l 31 ,l 32 ,l 33 ) are transformed into a 3D spherical coordinate system as where ϕ r,s can be calculated based on the following relations: In (18), (l 21 ,l 22 ) are the coordinates of a point on the half unit circle given by l 21 2 +l 22 2 =1 and l 22 >0; (l 31 ,l 32 ,l 33 ) are the coordinates of a surface point on the unit hemisphere given by l 31 2 +l 32 2 +l 33 2 =1 and l 33 >0 in (19), as shown in Fig. 1. According to the description above, the unrestrictive correlation matrix using hypersphere decomposition parameterization has several advantages. Firstly, comparing with the general method for constructing unrestrictive correlation matrix, it turns the complicated PDUDE constraint on T into simple box constraints ϕ r,s ∈(0,π ). Secondly, because ϕ r,s take values in (0,π), the elements in T can be either positive or negative, thus possible to capture various correlations across different outputs. Thirdly, any PDUDE matrix and Φ has a one-to-one correspondence, i.e., a PDUDE matrix with any arbitrary structure can be parameterized using a set of Φ values and any given Φ always gives a PDUDE matrix. It means that Φ only need to be optimized in box constraints (0,π ), thus obtaining the optimized T.

The parameters optimization for GP simultaneous MIMO model
The parameters estimation and optimization of the proposed method are presented in this subsection. Similar to the standard GP model, the elements in T, i.e., the correlations between different outputs are not known in advance (besides autocorrelation coefficients as the diagonal elements are all 1). Thus, it needs to give an initial value for T, consider a model with three outputs.
Setting like this means that, at the beginning of the fitting, the correlations between different outputs are very small. And then the values (correlations) of the elements in T are adjusted according to the training set in the coming optimization. All of the explanation for the parameters optimization is in this assumption.
The maximum likelihood method is used for the parameters estimation and optimization. The covariance matrix of the proposed method is denoted by K m , which depends on the hyperparameters Θ and T, where T has a one-to-one correspondence with a set of Φ through the hypersphere decomposition parameterization. Thus, the parameters in the model to be optimized are Θ and Φ .
Note that h =(x t ,z c ) t is a cross-array of X and z c through DOEs, like Latin hypercube, where X is a N ×D design matrix for the training set, and z c is an m ×1 design matrix for the categorical input variables. Consequently, h consists of all level combinations between those in X and z c . Hence, h has n =Nm rows (runs). This cross-array structure [43] can significantly simplify the computations of the parameters optimization in K m : the K m can be expressed as the Kronecker product K m =T ⊗K of the correlation matrix T and the covariance matrix K defined in (4). By the properties of Kronecker product [44], considering m =3 and N =6, it can be given by it means that the K m transformed into an Nm ×Nm block matrix. And then a series of transformation is carried out, From (25), the correlation matrix T and the covariance matrix K are mutually independent. In addition, the T has a one-to-one correspondence with a set of Φ through the hypersphere decomposition parameterization, thus similar to (8), then Consequently, the optimizations of Θ and Φ in the proposed model can be done separately according to (26) and (27).

Numerical examples
In this section, two numerical examples are provided to demonstrate the effectiveness and feasibility of the proposed modeling method. For comparison purposes, we consider the following two modeling methods.

(a) The individual Gaussian process method is denoted by
IG. This method is to build a surrogate model for each output separately using the standard Gaussian process model. (b) The simultaneous Gaussian process method is denoted by SG. This method is to build a GP simultaneous MIMO surrogate model using the proposed method described above.

A numerical example with both positive and negative cross-correlations
This example considers one quantitative input variable x, taking values on x ∈[0,1] and one categorical input variable z with three levels. The response is taken from the following functions: For each level of z, the training set are obtained by using a Latin hypercube design of ten runs for x ∈[0,1], and the testing data are then taken at 20 equally spaced points of 0; 1 19 ; 2 19 ; …; 1 . Figure 2 compares the three fitted curves of the functions. Intuitively, these curves are all similar to another. The second equation is negatively correlated with another two, while the first equation is positively correlated with the third one.
The root mean squared errors (RMSEs) of the testing data are calculated for the two models to assess prediction accuracy. This procedure of data generation, model fitting, and prediction accuracy assessment is repeated 100 times. And then each RMSE of the IG and SG methods, respectively, is plotted as box plots. Figure 3 compares the RMSEs of the two methods.
The mean values of the 100 RMSEs for the IG and SG methods are 0.0419 and 0.0214, respectively, indicating that the SG method achieves the better prediction performance and that the model has much higher accuracy.
In addition, average estimates of the cross-correlations parameters τ 12 ,τ 13  It is clear that the SG method correctly captures both the positive and negative cross-correlations.
This numerical example demonstrates that, comparing with IG method, the SG method has much higher prediction accuracy. Moreover, the SG method takes full advantage of the data abundance, furthermore capturing correctly both the positive and negative cross-correlations.

A numerical example with a multi-objective problem
Due to the proposed modeling method that will be applied to the multi-objective problems in engineering, one more numerical example is required to demonstrate the superiority of the method in multi-objective problems. The multi-objective problem in this numerical example is described as following: When the multi-objective problem is solved by using the NSGA II, the new generated populations are substituted into the analytic formulas described above for the corresponding calculations (called the analytic formula-substitution method in this example), thus the Pareto fronts are obtained.
However, many multi-objective problems in engineering cannot be derived by the analytic formula, for example sheet metal drawing forming. Consequently, the qualities of the Pareto fronts obtained by IG method are compared with those of Pareto fronts obtained by SG method, which is different from the way of verifying the performance of optimization algorithms. The specific work is as followed: the training set are obtained by using a Latin hypercube design of 20 runs for x ∈[0,1], surrogate models are built for the analytic formulas f 1 and g(x 1 ,x 2 ) in the multi-objective problem by using the IG and SG methods, respectively.
The purpose of this way is that, while the optimization algorithm is solving the multi-objective problem, the corresponding calculations is to use the surrogate model predictive values for the new generated populations other than being based on the analytic formula-substitution method. It means that, for the same optimization algorithm, the qualities of the obtained Pareto fronts depend on the prediction accuracy of surrogate models.
In this example, the NSGA II is used for all cases. And the population size is 50, the crossover ratio is 0.8, the mutation ratio is 0.3, and the number of iterations is 200. The experimental results are shown in Fig. 4; a, b, and c is the Pareto front for using the analytic formula-substitution method, the Pareto front for using the IG method, and the Pareto front for using the SG method, respectively.
For the reliable optimization algorithm, the Pareto front for using the analytic formula-substitution method can be regarded as the standard front. From Fig. 4, the Pareto front for using the IG method has a very big deviation on the boundary and uneven solutions. In contrast, the Pareto front for using the SG method is quite precise near the boundary. Although the distribution of the solutions near the boundary is not as good as Fig. 4a, the front coincides overall better with Fig. 4a than the Pareto front for using the IG method, and its distribution of the solutions is more uniform, too.
According to these two numerical examples, it can be demonstrated that the proposed modeling method has much higher prediction accuracy, and takes full advantage of the data abundance, thus achieving the better performance for solving multi-objective problems.
4 Applications of the GP surrogate model used in sheet metal forming

The optimization problem
Taking an auto rear axle housing as an example, its material is 08F, and the material properties are given in Table 1. The size of the part is 200 mm in diameter, 63 mm in height, and 2.5 mm in thickness. It is a typical sheet metal-forming part, shown in Fig. 5. Crack and wrinkle are two main defects illustrated in Fig. 6. For assembly, crack is not allowed to exist in the shell structure. For welding fabrication, wrinkle is not also allowed to exist in the flange. However, many factors may cause severe crack and wrinkle. So, it is not easy to find good solutions to reduce both crack and wrinkle furthest by try-and-error method. In a workshop, for a given determined die structure and material, crack and wrinkle defects have emerged, when a 500-T hydraulic press is used. So it is given priority to reduce the crack and wrinkle by adjusting the process parameters.

Definitions of objective functions
In this subsection, the quantization objective functions of crack and wrinkle are defined according to forming limit diagram (FLD) as shown in Fig. 7. For the finite element analysis of sheet metal drawing forming, a curve is usually defined as the safety forming limit curve ϕ (ε 2 ): where Δε is the crack safety margin, it is usually 10 % of the FLD 0 . If an element in the forming area is below the curve ϕ(ε 2 ), no matter how much the distance between this element and the curve ϕ(ε 2 ) is, the forming for this element is safe, thus having no possibility of crack. While an element is above the forming limit curve (FLC) φ(ε 2 ), it will crack, and the further distance between this element and the FLC φ(ε 2 ), the more severe the crack is. When an element is between the ϕ(ε 2 ) and the φ(ε 2 ), the element has risk of crack. In order to quantify the contribution degrees of crack for elements on the different zones on FLD and the distances from the safety forming limit curve, the objective function of crack is defined as where i is the number of the grid elements on sheet metal. Similar to the definition of the objective function of crack, firstly, the wrinkle limit curve is defined as where θ is the wrinkle safety margin; it is 15 % in this application. If an element is below the curve η (ε 2 ), this element has the contribution degree of wrinkle, otherwise has none. So the objective function of wrinkle is defined as where i is the number of the grid elements on sheet metal. The curve φ (ε 2 ) and w (ε 2 ) can be obtained according to Keeler's empirical formula [45].   For this case, the blank holder force (BHF) is an important design parameter. For a given determined hydraulic press, it is easy to adjust BHF. Lower BHF cannot control effectively material flow, thus causing wrinkle. While higher BHF may avoid wrinkle, but the trend of crack is increasing significantly.
The blank size is another important design parameter. Smaller diameter of the blank sheet metal cannot supply material for forming area, while bigger diameter may cause crack.
Because sheet metal has relative motion with the die and blank holder, different lubrication conditions have significant influence for the stress-strain distribution in forming, thus affecting the forming process.
Here, four design parameters are taken into account. They are BHF, diameter of the blank sheet metal D , static friction coefficient between sheet metal and die μ 1 , and static friction coefficient between sheet metal and blank holder μ 2 . Most of the engineers consider these process parameters to optimize the crack and wrinkle in the adjustment stage. In fact, filet radius of female die also has significant influence. Because the female die filet is involved in the forming process, the filet radius is fixed to 5 mm. Therefore, the optimization problem can be written as For this optimization problem with four design parameters, 30 initial process parameter combinations are selected by the enhanced translation propagation Latin hypercube design [46]. By using sheet forming computer-aided engineering analysis, the objective function values of these combinations can be obtained.
The proposed GP simultaneous model is adopted to approximate the relationship between process parameters and objective function values with these 30 initial samples as the training set. The squared exponential covariance function and correlation matrix T are proposed to construct the valid covariance function. By giving a initial value for the T, the surrogate model is fitted. Consequently, the RESM of the D C and D W is 0.104 and 0.635, respectively. There are many intelligent optimization algorithms that can solve multi-objective problems like hybrid particle swarm optimization approach [47] and hybrid immune algorithm [48]. In this paper, to solve this multi-objective optimization, the Gaussian mutation hybrid genetic algorithm (GGA-h) [49,50] is used, while the population size is 50, the crossover ratio is 0.9, the mutation ratio is 0.1, and the number of iterations is 300. Figure 8 shows the obtained Pareto front after 300 iterations. And for this case, the cross-correlation parameter τ D C ;D W between D C and D W is −0.4393. It means that they are negatively correlated but not perfectly negatively correlated. Figure 8 shows that the distribution of the solutions is really well, which the solutions near the boundary are quite close to extreme values, resulting in good scalability. And the crack and wrinkle have nonlinear relationship with each other in this case. According to the cross-correlation τ D C ;D W , they are negatively correlated but not perfectly negatively correlated. Namely, as the more D C decreases, the more D W increases. When the compression reaches a certain level, the sheet metal loses stability, and the in-plane strain transforms into the bending deformation. Consequently, the material flow is blocked, resulting in crack.

Discussion and verification
To verify the obtained solutions by the GP simultaneous model GGA-h method, the model algorithm prediction values and the corresponding CAE verification values of three optimization parameter combinations selected from the Pareto front are enumerated in Table 2. These data shows that the model algorithm prediction values are very close to the corresponding CAE verification values. Although all the values have the similar deviations, the variation trends of two results are accordant. So the optimization solutions can be consultable for the auto rear axle housing. Table 3 shows three set of data as comparisons selected from the Pareto front and initial samples, respectively, which their D C values are almost equal. Comparing with their D W values, corresponding values of the Pareto front have varying degrees of decreases. If the data, whose D C values are almost equal, are selected, the result is the same. It demonstrates that the GP simultaneous model-GGA-h method can optimize both quality indexes of crack and wrinkle at the same time.
The above conclusions indicate that the proposed modeling method can meet the requirement of accuracy in sheet metal forming, combining with the multi-objective optimization algorithm GGA-h, the satisfactory Pareto front also can be obtained. Then, one optimization process parameter combination can be selected according to specific requirements of the product to balance two objectives.
Point P in Fig. 8 is selected as the satisfactory solution, its D C and D W value is 0.01307 and 4.518, respectively. The corresponding values of BHF, D , μ 1 , and μ 2 is 241.7, 286, 0.117, and 0.119, respectively. Simulating with this   optimization combination, the FLD is shown in Fig. 9 and the trends of crack and wrinkle are all very small. The minimum thickness is 1.81 mm and the maximum thinning rate is 27.6 %, thus meeting the product quality requirements. The process parameters of the hydraulic press under consideration are set according to this optimization combination, but the values are only modified slightly according to real setting accuracy. The produced part is shown in Fig. 10; the forming quality is quite well and consistent with result of the simulation. It demonstrates that the proposed method in this paper can achieve good performance for solving multiobjective problems in sheet metal forming.

Conclusions
In this study, the GP simultaneous model GGA-h method is introduced to minimize the crack and wrinkle of sheet metalforming parts simultaneously, and has achieved stage result in modeling innovation and engineering application.
1. The spatial correlation based on quantitative input variables is defined with distance-based functions, while the cross-correlation based on categorical input variables is defined by correlation matrices. Then, a general method for constructing covariance functions based on correlation matrices is proposed and a novel surrogate modeling method is also developed, which can take into account both types of input variables. Furthermore, the hypersphere decomposition parameterization is used to construct unrestrictive correlation matrices. This approach maintains the flexibility of unrestrictive correlation matrices, which can turn some optimization problems with complex positive-definite constraints into standard and easy-tocompute optimization problems with box constraints.
The numerical examples demonstrate that the GP simultaneous model has much higher prediction accuracy than standard GP models. Therefore, the GP simultaneous model is used to approximately construct the relationship between quality indexes and process parameters, which can replace the expensive computation in optimization. The optimization algorithm GGA-h is used to solve this multi-objective optimization problem, thus obtaining the Pareto optimal set. Then, an optimization process parameter combination is selected according to specific requirements of the product to balance two objectives. 2. As an application, an auto rear axle housing is investigated. For the Pareto optimal set, when one objective value and the corresponding value in the samples are almost equal, the other objective value always has varying degrees of improvement. Hence, the GP simultaneous model GGA-h method can be considered to minimize the crack and wrinkle of sheet metal-forming parts simultaneously, thus reducing defects as possible and obtaining products with good formability. Meanwhile, the crosscorrelations between quality indexes provides some consultable reference to further develop quality intelligent control strategies, which can be obtained from the outputs of GP simultaneous model. Finally, according to verifications, optimization process parameters are consultable in real production.  . 9 FLD of the auto rear axle housing after optimization. This artwork is generated by Dynaform. It is the FLD result with the optimal parameters for rear axle housing after simulation Fig. 10 The product based on optimization design. An entity is produced with the parameters optimized. As is shown, the forming quality is very well 3. The proposed method can be expanded to any other material processing, such as injection molding and aluminum extrusion, which the investigated quality indexes often have some correlation with each other. Due to the flexibility of unrestrictive correlation matrices by using the hypersphere decomposition parameterization, the computational complexity does not increase significantly with the increasing number of investigated quality indexes. Therefore, this method can be easily expanded to three and even more objectives problems. And further studies will be performed in these fields.