Optimizing the Rail Profile for High-Speed Railways Based on Artificial Neural Network and Genetic Algorithm Coupled Method

Abstract: Though the high-speed railways are seen as a sustainable form of transportation, the fact that the rail wear in high-speed railways negatively affects the running safety and riding comfort, as well as the maintenance of railways, has drawn a wide range of concerns among researchers and scholars. In order to reduce the rail wear and achieve the goal of sustainable transportation, this paper proposes an ingenious optimization program of rail profiles based on the artificial neural network (ANN) and genetic algorithm (GA) coupled method. The candidate solutions of the nonlinear GA programming model are regarded as the inputs of the trained ANN model. Meanwhile, the outputs of the trained ANN model serve as the objective functions of the GA model. The computational results show that the optimized rail profile not only has superior performances in terms of the wheel/rail wear and contact conditions, but also maintains good dynamic performances. Therefore, this study can provide the theoretical and practical basis for the design and the preventive grinding of rails in the high-speed railways. Also, the ANN-GA coupled model can be extended and further employed on the optimization of other rail profiles.


Introduction
High-speed railways in China had reached 35,000 km by the end of 2019 (http://society.people. com.cn/GB/n1/2020/0101/c1008-31530914.html). With the rapid growth of the operating mileage and train density, non-uniform rail wear has become one of the critical damages to track infrastructures, and the costs of maintenance and replacements of rails caused by wheel/rail wear are billions of CNY (China Yuan) per year in China [1]. Moreover, the deteriorated running stability and safety as well as the ride comfort of trains will be incurred by the severe rail wear. The root cause of the rail wear is the wheel-rail profile matching because the wheel/rail profiles will directly affect the interaction between them. However, at present, very few existing researches have focused on the optimization of rail profiles aiming at reducing rail wear in high-speed railways. Considering the peculiarities of high-speed railway, such as large volumes of passenger flows and busy lines [2], the time for maintenance is short and the workload is heavy. As a consequence, it is necessary to reduce the rail wear by optimizing rail profiles with a proper method. Besides, the optimized rail profiles, which will eliminate the defects of the rails' decarburized layer and reduce wheel/rail wear, can achieve the goal of prolonging the rail service life and cutting down the expenses of maintenance.
Traditionally, the trial-and-error method was employed to design the rail/wheel profile, which was mainly based on railway engineers' experience. Recently, many advanced design methods of rail/wheel profiles have been proposed, benefiting from the rapid development of optimization techniques. Cui

Rail Profile Design
This section presents the mathematical descriptions for the rail profile design.

Variables Design
The Chinese 60N rails have been widely employed in China high-speed railways. Also, according to a large amount of the onsite observations, it is found that the width of the contact bands is usually 20 to 30 mm at the top of rails [16]. Consequently, the points-to-be-optimized are chosen among the −20 mm to +20 mm region at the top of rails, as shown in Figure 2. This region is divided into n + 1 curve segments. The points at both ends are fixed at y and z coordinates, but the points-tobe-optimized are only fixed at y coordinates. As a result, the changes of these target points in z directions are able to generate different rail profiles by cubic splines. Moreover, the number of pointsto-be-optimized not only impacts on the accuracy of the optimized results but also the scale of the optimization procedure. Considering that, 10 points-to-be-optimized are chosen.

Objective Function
The wear of rails which will change the profiles may have an impact on the running safety and ride comfort. Therefore, it is important to develop an objective function that can accurately describe the wear of the region to be optimized. The objective function developed in this paper is expressed as follows:

Rail Profile Design
This section presents the mathematical descriptions for the rail profile design.

Variables Design
The Chinese 60N rails have been widely employed in China high-speed railways. Also, according to a large amount of the onsite observations, it is found that the width of the contact bands is usually 20 to 30 mm at the top of rails [16]. Consequently, the points-to-be-optimized are chosen among the −20 mm to +20 mm region at the top of rails, as shown in Figure 2. This region is divided into n + 1 curve segments. The points at both ends are fixed at y and z coordinates, but the points-to-be-optimized are only fixed at y coordinates. As a result, the changes of these target points in z directions are able to generate different rail profiles by cubic splines. Moreover, the number of points-to-be-optimized not only impacts on the accuracy of the optimized results but also the scale of the optimization procedure. Considering that, 10 points-to-be-optimized are chosen.
Sustainability 2020, 12, 658 3 of 23 Points-to-be-optimized chosen among the major contact area of the Chinese 60N rail Rail wear calculated employing the multibody dynamics-rail wear (MDRW) model The ANN model trained and tested by training and testing sets The rail profile optimized by means of the GA model developed in this paper

Rail profiles satisfying the constrains
Coordinates of points-tobe-optimized as input data Output data Objective functions related to rail wear as output data Candidate solutions Figure 1. The framework of the optimization method in this paper.

Rail Profile Design
This section presents the mathematical descriptions for the rail profile design.

Variables Design
The Chinese 60N rails have been widely employed in China high-speed railways. Also, according to a large amount of the onsite observations, it is found that the width of the contact bands is usually 20 to 30 mm at the top of rails [16]. Consequently, the points-to-be-optimized are chosen among the −20 mm to +20 mm region at the top of rails, as shown in Figure 2. This region is divided into n + 1 curve segments. The points at both ends are fixed at y and z coordinates, but the points-tobe-optimized are only fixed at y coordinates. As a result, the changes of these target points in z directions are able to generate different rail profiles by cubic splines. Moreover, the number of pointsto-be-optimized not only impacts on the accuracy of the optimized results but also the scale of the optimization procedure. Considering that, 10 points-to-be-optimized are chosen.

Objective Function
The wear of rails which will change the profiles may have an impact on the running safety and ride comfort. Therefore, it is important to develop an objective function that can accurately describe the wear of the region to be optimized. The objective function developed in this paper is expressed as follows:

Objective Function
The wear of rails which will change the profiles may have an impact on the running safety and ride comfort. Therefore, it is important to develop an objective function that can accurately describe the wear of the region to be optimized. The objective function developed in this paper is expressed as follows: where m is mth type of vehicle, α m represents the weight of the mth type of vehicle, in which the influences of the types of vehicles, running speed, and other factors can be considered, and the value of it can be settled based on demand. z 1 , z 2 , z 3 , . . . , z n are the z coordinates of the design variables, w m i denotes the rail wear of the mth type of vehicle's ith point-to-be-optimized, and w m is the average rail wear of the mth type of vehicle's all points-to-be-optimized.
The rail wear mentioned in the objective function is evaluated by the means of the MDRW model, and the MDRW model consisting of the multibody dynamics model and the wear model is introduced in the following subsections.

Multibody Dynamics Model
The vehicle model is built with detailed suspensions, as described, which consists of one car body and two bogies, and each bogie is made up of one frame, two wheelsets, and four axel boxes. The whole vehicle model established by SIMPACK is made up of 15 rigid bodies and 50 degrees of freedom in all, as listed in Table 1. Moreover, the force elements within the primary and secondary suspensions are nonlinear force elements, including spring stiffnesses and dampers. Further parameters of the vehicle model can be found in Xin et al. [17]. In order to simulate actual operational conditions, irregularities of the high-speed railways measured on site are adopted in the model, as shown in Figure 3. Furthermore, the moving track model is applied in this paper, which consists of rails, fasteners, and the track slab, and the detail parameters of the moving track model can be found in Zhai [18], as shown in Figure 4. In this model, dynamic simulations are conducted on a track including tangent and curved sections, and the parameters of the track are listed in Table 2. The results of dynamic simulations, such as contact points positions, contact forces, creepages, and so on, are input to the wear model for the calculations of rail wear.
where m is mth type of vehicle, αm represents the weight of the mth type of vehicle, in which the influences of the types of vehicles, running speed, and other factors can be considered, and the value of it can be settled based on demand. z1, z2, z3, …, zn are the z coordinates of the design variables, m i w denotes the rail wear of the mth type of vehicle's ith point-to-be-optimized, and m w is the average rail wear of the mth type of vehicle's all points-to-be-optimized.
The rail wear mentioned in the objective function is evaluated by the means of the MDRW model, and the MDRW model consisting of the multibody dynamics model and the wear model is introduced in the following subsections.

Multibody Dynamics Model
The vehicle model is built with detailed suspensions, as described, which consists of one car body and two bogies, and each bogie is made up of one frame, two wheelsets, and four axel boxes. The whole vehicle model established by SIMPACK is made up of 15 rigid bodies and 50 degrees of freedom in all, as listed in Table 1. Moreover, the force elements within the primary and secondary suspensions are nonlinear force elements, including spring stiffnesses and dampers. Further parameters of the vehicle model can be found in Xin et al. [17]. In order to simulate actual operational conditions, irregularities of the high-speed railways measured on site are adopted in the model, as shown in Figure 3. Furthermore, the moving track model is applied in this paper, which consists of rails, fasteners, and the track slab, and the detail parameters of the moving track model can be found in Zhai [18], as shown in Figure 4. In this model, dynamic simulations are conducted on a track including tangent and curved sections, and the parameters of the track are listed in Table 2. The results of dynamic simulations, such as contact points positions, contact forces, creepages, and so on, are input to the wear model for the calculations of rail wear.   The wear model is established based on Archard's wear law [19], which has been widely used to calculate the wear caused by the contact friction between objects. The original formula is expressed as follows: where Vwear is the wear volume, kwear is the wear coefficient, N is the normal contact force, s is the sliding distance, and H is the surface hardness of the softer material. The FASTSIM algorithm is used to analyze the tangent contact within contact patches, and Hertz's contact theory is employed to analyze the normal contact. The contact ellipse is discretized first into many rectangle elements, and then the normal stress at the center of a discrete element can be calculated as follows: where (x, y) denotes the coordinates of the center of a discrete element in the contact patch's coordinate system, and a and b represent the lengths of the semi-major axis and the semi-minor axis of the elliptical contact patches, respectively. In order to calculate the wear depth in a discrete element, the normal stress at the center of a discrete element is assumed to be the normal stress of the element, and then the wear depth in the element is evaluated as follows:

Wear Model
The wear model is established based on Archard's wear law [19], which has been widely used to calculate the wear caused by the contact friction between objects. The original formula is expressed as follows: where V wear is the wear volume, k wear is the wear coefficient, N is the normal contact force, s is the sliding distance, and H is the surface hardness of the softer material. The FASTSIM algorithm is used to analyze the tangent contact within contact patches, and Hertz's contact theory is employed to analyze the normal contact. The contact ellipse is discretized first into many rectangle elements, and then the normal stress at the center of a discrete element can be calculated as follows: where (x, y) denotes the coordinates of the center of a discrete element in the contact patch's coordinate system, and a and b represent the lengths of the semi-major axis and the semi-minor axis of the elliptical contact patches, respectively. In order to calculate the wear depth in a discrete element, the normal stress at the center of a discrete element is assumed to be the normal stress of the element, and then the wear depth in the element is evaluated as follows: where ∆d is the elastic deformation in time interval ∆t and it can be calculated by Equation (5): where S = [s x s y ] T is the total slip speed obtained by the FASTSIM algorithm, V c is the velocity of the center of a discrete element related to the contact patch, and ∆x represents the length of a discrete element in the running direction. Substituting Equations (3) and (5) into Equation (4), the wear depth in each discrete element can be obtained as follows: where the wear coefficient k wear can only be obtained from a large number of onsite experiments. However, due to the large volumes of passenger flows and busy lines of high-speed railways in China, not enough experiments can be conducted. Therefore, there is no wear coefficient for high-speed railways in China at present. However, many researchers and scholars [20][21][22][23] have used the wear coefficient obtained from the commuter rail system in Stockholm to study the wheel/rail wear, which indicates that this wear coefficient can be suitable for various conditions, including the wheel/rail wear in China high-speed railways. Therefore, the wear coefficient adopted in this paper is depicted by Figure 5 according to the commuter rail system in Stockholm (Jendel [24]).
where Δd is the elastic deformation in time interval Δt and it can be calculated by Equation (5): where S = [sx sy] T is the total slip speed obtained by the FASTSIM algorithm, Vc is the velocity of the center of a discrete element related to the contact patch, and Δx represents the length of a discrete element in the running direction. Substituting Equations (3) and (5) into Equation (4), the wear depth in each discrete element can be obtained as follows: where the wear coefficient kwear can only be obtained from a large number of onsite experiments. However, due to the large volumes of passenger flows and busy lines of high-speed railways in China, not enough experiments can be conducted. Therefore, there is no wear coefficient for highspeed railways in China at present. However, many researchers and scholars [20][21][22][23] have used the wear coefficient obtained from the commuter rail system in Stockholm to study the wheel/rail wear, which indicates that this wear coefficient can be suitable for various conditions, including the wheel/rail wear in China high-speed railways. Therefore, the wear coefficient adopted in this paper is depicted by Figure 5 according to the commuter rail system in Stockholm (Jendel [24]). For each time step of the calculation (2 × 10 −3 s), the wear of all discrete elements in contact patches along the running direction is summed, and it is regarded as the total wear of the rail profile caused by one rolling circle of a wheel. By summing up the wear of four wheels on the same side, the distributions of rail wear can be obtained when a vehicle passes one time.

Model Verification
The running speed of the vehicle, the radius of the curve, the length of the transition curve, and the super-elevation are: 250 km/h, 6000 m, 180 m, and 70 mm, respectively. In addition, the German high-speed track spectrum is adopted as irregularities. The operating conditions are the same as those in Zhai [18], and the dynamic responses of the multibody dynamics model established in this paper and Zhai [18] are listed in Table 3. For each time step of the calculation (2 × 10 −3 s), the wear of all discrete elements in contact patches along the running direction is summed, and it is regarded as the total wear of the rail profile caused by one rolling circle of a wheel. By summing up the wear of four wheels on the same side, the distributions of rail wear can be obtained when a vehicle passes one time.

Model Verification
The running speed of the vehicle, the radius of the curve, the length of the transition curve, and the super-elevation are: 250 km/h, 6000 m, 180 m, and 70 mm, respectively. In addition, the German high-speed track spectrum is adopted as irregularities. The operating conditions are the same as those in Zhai [18], and the dynamic responses of the multibody dynamics model established in this paper and Zhai [18] are listed in Table 3. It can be seen that the maximum values of the established model are close to those of Zhai [18]. Therefore, it indicates the correctness of the multibody dynamics model established in this paper.

Constraint Function
To achieve optimization goals that conform to the actual conditions, it is necessary to apply constraints on variables.
Constraint function a: The determination of rail profiles must satisfy the requirement of the convex curve, that is, the slope of adjacent points decreases with the increase of y coordinates. The constraint function is expressed as follows: where (y i , z i ) are the coordinates of the ith point-to-be-optimized in the optimized region of the rail profile. Constraint function b: Optimizations of rail profiles are carried out based on the original rail profiles, so the optimized profiles should be located below the original rail profiles. Moreover, optimized rail profiles should satisfy the requirement of the maximum grinding depth, which is limited to 2 mm according to the management measures in China. Therefore, the constraint functions are expressed as follows: where l i and u i are the lower and upper limits of the ith point-to-be-optimized respectively, and ∆z i is the difference in z coordinate of the ith point-to-be-optimized between the original profile and the new one.

Optimization Model
Considering that the objective function of each generation needs to be obtained in the optimization process of the GA model, it is very cumbersome and difficult to calculate the rail wear of each generation and then substitute it into the objective function to obtain the objective function of each generation. Therefore, some sets of the variables and corresponding objective function data are obtained to train the ANN model, and then the cumbersome process mentioned above can be replaced with the trained ANN model. The inputs to this trained ANN model are the candidate solutions of the non-linear constrained GA model, and the outputs of this ANN model are the objective functions of the GA model as well. As a consequence, the functions of predicting the objective functions related to rail wear and optimizing the rail profiles are both significant to achieve the optimization process of rail profiles. Moreover, the ANN-GA coupled method makes the optimization process simple and effective.

ANN Model
Similar to the biological neural networks that constitute human/animal brains, an ANN is based on a collection of connected units or nodes called artificial neurons, as shown in Figure 6. In an ANN, these nodes are essentially a series of mathematical functions that receive weighted inputs. After processing the inputs, these nodes transfer the outputs to other nodes of the next layer (if any). An ANN is trained to predict the outputs by modifying the weights [25][26][27].
The prediction accuracy of an ANN is usually determined by activation functions, training algorithm, the number of hidden layers and nodes, and arrangement of nodes [28]. Multi-layer perceptron (MLP) is one of the most widely used ANNs for the regression analysis [29]; therefore, MLP is employed in this paper. As for the MLP model developed in this paper, the hyperbolic tangent sigmoid function and the linear function are adopted in the hidden layer and output layer, respectively, and the Levenberg-Marquardt algorithm is employed for training. According to Hornik et al. [30], a three-layer BPNN, which has one hidden layer, can approximate any function at any precision well, given a sufficient number of hidden-layer nodes. Therefore, one hidden layer is applied to the MLP model proposed in this paper, and the number of nodes in the hidden layer is determined by the means of the trial-and-error method based on the training data, as shown in Figure 7. It can be found from Figure 7 that when the number of hidden-layer nodes is 15, the value of mean square error (MSE), which is expressed by Equation (9), becomes minimum. Also, the variation tendency of MSE is rising with the increase of the number of hidden-layer nodes. As a result, the number of hidden-layer nodes is taken as 15 in the MLP model developed in this paper.
where Q is the number of the dataset, T q is the qth target value of the dataset, and y q is the qth predicted value of the dataset.
Sustainability 2020, 12, 658 8 of 23 processing the inputs, these nodes transfer the outputs to other nodes of the next layer (if any). An ANN is trained to predict the outputs by modifying the weights [25][26][27]. The prediction accuracy of an ANN is usually determined by activation functions, training algorithm, the number of hidden layers and nodes, and arrangement of nodes [28]. Multi-layer perceptron (MLP) is one of the most widely used ANNs for the regression analysis [29]; therefore, MLP is employed in this paper. As for the MLP model developed in this paper, the hyperbolic tangent sigmoid function and the linear function are adopted in the hidden layer and output layer, respectively, and the Levenberg-Marquardt algorithm is employed for training. According to Hornik et al. [30], a three-layer BPNN, which has one hidden layer, can approximate any function at any precision well, given a sufficient number of hidden-layer nodes. Therefore, one hidden layer is applied to the MLP model proposed in this paper, and the number of nodes in the hidden layer is determined by the means of the trial-and-error method based on the training data, as shown in Figure  7. It can be found from Figure 7 that when the number of hidden-layer nodes is 15, the value of mean square error (MSE), which is expressed by Equation (9), becomes minimum. Also, the variation tendency of MSE is rising with the increase of the number of hidden-layer nodes. As a result, the number of hidden-layer nodes is taken as 15 in the MLP model developed in this paper.
where Q is the number of the dataset, Tq is the qth target value of the dataset, and yq is the qth predicted value of the dataset. ANN is trained to predict the outputs by modifying the weights [25][26][27]. The prediction accuracy of an ANN is usually determined by activation functions, training algorithm, the number of hidden layers and nodes, and arrangement of nodes [28]. Multi-layer perceptron (MLP) is one of the most widely used ANNs for the regression analysis [29]; therefore, MLP is employed in this paper. As for the MLP model developed in this paper, the hyperbolic tangent sigmoid function and the linear function are adopted in the hidden layer and output layer, respectively, and the Levenberg-Marquardt algorithm is employed for training. According to Hornik et al. [30], a three-layer BPNN, which has one hidden layer, can approximate any function at any precision well, given a sufficient number of hidden-layer nodes. Therefore, one hidden layer is applied to the MLP model proposed in this paper, and the number of nodes in the hidden layer is determined by the means of the trial-and-error method based on the training data, as shown in Figure  7. It can be found from Figure 7 that when the number of hidden-layer nodes is 15, the value of mean square error (MSE), which is expressed by Equation (9), becomes minimum. Also, the variation tendency of MSE is rising with the increase of the number of hidden-layer nodes. As a result, the number of hidden-layer nodes is taken as 15 in the MLP model developed in this paper.
where Q is the number of the dataset, Tq is the qth target value of the dataset, and yq is the qth predicted value of the dataset.  In this paper, the prediction steps of the MLP model are as follows [25]: In this paper, the prediction steps of the MLP model are as follows [25]: Step 1: The input of a node in the hidden layer is summed up with the weighted inputs and a constant bias, which is formulated by Equation (10): where p and r are the pth node of the input layer and rth node of the hidden layer respectively, P and R are the number of nodes in the input layer and the hidden layer, x p is the input to the MLP model, w pr and b r are the weight and constant bias of the hidden layer respectively, and n r is the input to the rth node of the hidden layer.
Step 2: After applying the activation function to all the nodes of the hidden layer, the output of a node in the hidden layer is expressed as follows: where g(x) is the activation function, which is the hyperbolic tangent sigmoid function, and H r is the output of the rth node in the hidden layer.
Step 3: The activation function of the output layer is linear. Therefore, the output of a node in the output layer is summed up with weighted outputs from the hidden layer and a constant bias, which is expressed by Equation (12): where k is the kth node of the output layer, K is the number of nodes in the output layer, w rk and b k are the weight and constant bias of the output layer respectively, and y k is the output of the kth node in the output layer.
Step 4: The updates of weights in the hidden layer and the output layer are expressed by Equations (13) and (14), respectively: where η is the learning rate.
Step 5: The updates of biases in the hidden layer and the output layer are formulated by Equations (15) and (16), respectively: Following the above steps, the MLP model can be mathematically expressed as follows:

GA Model
GA is a heuristic search that is inspired by Charles Darwin's theory of natural evolution. Compared with tradition techniques, the advantages of GA are as follows: (1) it can search the solution globally and not easily fall into local minima and (2) it can effectively avoid differential operation on variables [31].
GA begins with a population that represents a potential set of solutions to a problem, while a population consists of a certain number of individuals encoded by genes. Each individual is actually an entity with characteristic chromosomes, which determine the external representation of shapes of individuals. Therefore, mappings from the phenotype to the genotype, that is, encoding work, need to be implemented at the beginning. Since the gene encoding is complex, it must usually need to be simplified, such as the binary encoding. After the initial generation of the population, according to the survival-of-the-fittest principle, the evolution of each generation produces better and better approximate solutions. In each generation, it is based on the fitness of the individual to make selections. The new population is generated utilizing genetic operators, which consist of combination, crossover, and mutation. This process will cause the descendant population to be more adaptable to the environment than the previous population, and the best individual in the last generation population is decoded, which can be used as the approximate optimal solution to a problem [26].
Generally speaking, the penalty function is employed to solve optimization problems with nonlinear constraints [32,33]. The penalty makes sure that any search space which violates nonlinear constraints will be abandoned. The nonlinear constraints are usually handled by creating an augmented Lagrangian form called subproblem for the original problem [34], which is formulated by Equation (18): where F(x) is the objective function, d and e are the number of inequality and equality nonlinear constraints respectively, C(x) and Ceq(x) are, separately, inequality and equality nonlinear constraints, λ is the non-negative Lagrange multiplier, l is the slack variable, ρ is the penalty parameter, and the augmented Lagrangian subproblem is minimized in the nonlinear constrained GA.
In each iteration, the Lagrange multipliers are fixed. Therefore, the subproblems do not reach minima until the termination principles are achieved. What calls for special attention is that since the subproblem extracts the nonlinear constraints from the primal problem, only linear constraints and bounds need to be satisfied. The procedure for the optimization of rail profiles employing the ANN-GA coupled optimization method is shown in Figure 8. an entity with characteristic chromosomes, which determine the external representation of shapes of individuals. Therefore, mappings from the phenotype to the genotype, that is, encoding work, need to be implemented at the beginning. Since the gene encoding is complex, it must usually need to be simplified, such as the binary encoding. After the initial generation of the population, according to the survival-of-the-fittest principle, the evolution of each generation produces better and better approximate solutions. In each generation, it is based on the fitness of the individual to make selections. The new population is generated utilizing genetic operators, which consist of combination, crossover, and mutation. This process will cause the descendant population to be more adaptable to the environment than the previous population, and the best individual in the last generation population is decoded, which can be used as the approximate optimal solution to a problem [26]. Generally speaking, the penalty function is employed to solve optimization problems with nonlinear constraints [32,33]. The penalty makes sure that any search space which violates nonlinear constraints will be abandoned. The nonlinear constraints are usually handled by creating an augmented Lagrangian form called subproblem for the original problem [34], which is formulated by Equation (18): where F(x) is the objective function, d and e are the number of inequality and equality nonlinear constraints respectively, C(x) and Ceq(x) are, separately, inequality and equality nonlinear constraints, λ is the non-negative Lagrange multiplier, l is the slack variable, ρ is the penalty parameter, and the augmented Lagrangian subproblem is minimized in the nonlinear constrained GA.
In each iteration, the Lagrange multipliers are fixed. Therefore, the subproblems do not reach minima until the termination principles are achieved. What calls for special attention is that since the subproblem extracts the nonlinear constraints from the primal problem, only linear constraints and bounds need to be satisfied. The procedure for the optimization of rail profiles employing the ANN-GA coupled optimization method is shown in Figure 8.

Results and Discussion
In the optimization process, all parameters are consistent with the actual operational conditions, which are listed in Table 2, and the detailed geometric parameters of the Chinese 60N rail profile are illustrated in Figure 9. Moreover, three types of vehicle models mainly operating in high-speed railways are employed in the optimization process, which are equipped with S1002CN, XP55, and LMA wheel profiles, respectively. The three types of wheel profiles are shown in Figure 10. The influences of different wheel profiles are considered by weighting in their objective functions, as expressed in Equation (1), in which the weight values of three types of vehicles are all set as 1/3.

Results and Discussion
In the optimization process, all parameters are consistent with the actual operational conditions, which are listed in Table 2, and the detailed geometric parameters of the Chinese 60N rail profile are illustrated in Figure 9. Moreover, three types of vehicle models mainly operating in high-speed railways are employed in the optimization process, which are equipped with S1002CN, XP55, and LMA wheel profiles, respectively. The three types of wheel profiles are shown in Figure 10. The influences of different wheel profiles are considered by weighting in their objective functions, as expressed in Equation (1), in which the weight values of three types of vehicles are all set as 1/3.

Results and Discussion
In the optimization process, all parameters are consistent with the actual operational conditions, which are listed in Table 2, and the detailed geometric parameters of the Chinese 60N rail profile are illustrated in Figure 9. Moreover, three types of vehicle models mainly operating in high-speed railways are employed in the optimization process, which are equipped with S1002CN, XP55, and LMA wheel profiles, respectively. The three types of wheel profiles are shown in Figure 10. The influences of different wheel profiles are considered by weighting in their objective functions, as expressed in Equation (1), in which the weight values of three types of vehicles are all set as 1/3.

Simulation Results of MDRW Model
According to the constraint functions, 20 sets of satisfactory rail profiles were generated, as shown in Figure 11. The profiles were input to the MDRW model to calculate rail wear and objective functions, and the results are described in Figures 12 and 13, respectively. concentrated at the rail heads, which is from −20 mm to 10 mm. Also, due to the deficient superelevation in the curved section, the wear of the left rails as the outer rails is more severe than that of the right rails. However, the deficient super-elevation is only 1.7 mm, so the rail shoulders of the left rails will not appear worn. With the growth of the distances from new profiles' rail heads to the original ones, the wear of both side rails performs as decreasing previously and increasing later. Therefore, the optimal solution should appear in the range of 20 sets of rail profiles. Figure 13 indicates the objective functions of the left rail, right rail, and the sum of them, and it can be observed that the trends of objective functions are in good agreement with the wear of 20 sets of rail profiles. As a consequence, the objective functions developed in this paper can effectively demonstrate the mapping relationship between design variables and rail wear. Considering that the optimized profile should be suitable for left and right rails, the sum of left and right rail's objective functions is regarded as the final objective function in this paper.  Figure 11. The satisfactory rail profiles. Figure 11. The satisfactory rail profiles.
It can be observed from Figure 12 that the main wear regions of different rail profiles are concentrated at the rail heads, which is from −20 mm to 10 mm. Also, due to the deficient super-elevation in the curved section, the wear of the left rails as the outer rails is more severe than that of the right rails. However, the deficient super-elevation is only 1.7 mm, so the rail shoulders of the left rails will not appear worn. With the growth of the distances from new profiles' rail heads to the original ones, the wear of both side rails performs as decreasing previously and increasing later. Therefore, the optimal solution should appear in the range of 20 sets of rail profiles. Figure 13 indicates the objective functions of the left rail, right rail, and the sum of them, and it can be observed that the trends of objective functions are in good agreement with the wear of 20 sets of rail profiles. As a consequence, the objective functions developed in this paper can effectively demonstrate the mapping relationship between design variables and rail wear. Considering that the optimized profile should be suitable for left and right rails, the sum of left and right rail's objective functions is regarded as the final objective function in this paper. (e) (f)

Prediction Accuracy of the ANN Model
In this paper, the input data are the z coordinates of points-to-be-optimized, and the output data are the corresponding values of the objective function. Moreover, 20 sets of data including 220 data points are adopted to train and test the ANN model, as listed in Table 4. Among these data, 80 percent of the data are taken randomly as the training set and the remaining data are treated as the testing set, Sustainability 2020, 12, 658 14 of 23 respectively. The relative errors of prediction results with target values based on testing data are listed in Table 5, and the mathematical expression of relative error is expressed by Equation (19).
where x p is the prediction result and x t is the target value.

Prediction Accuracy of the ANN Model
In this paper, the input data are the z coordinates of points-to-be-optimized, and the output data are the corresponding values of the objective function. Moreover, 20 sets of data including 220 data points are adopted to train and test the ANN model, as listed in Table 4. Among these data, 80 percent of the data are taken randomly as the training set and the remaining data are treated as the testing set, respectively. The relative errors of prediction results with target values based on testing data are listed in Table 5, and the mathematical expression of relative error is expressed by Equation (19).
where xp is the prediction result and xt is the target value.   It can be observed from Table 5 that the relative errors are all controlled below 3%. Moreover, the relative errors show that the trained MLP model can effectively illustrate the mapping relationship between inputs and outputs. As a result, the outputs of the MLP model can be employed as the objective functions of the GA model.

Optimization Process
The initial population is generated randomly in the range of constraint function b, and the optimization process is conducted based on the initial population. The candidate solutions of the GA model are input to the trained ANN model, and the outputs from the trained ANN model are regarded as the objective functions of the GA model. The two-directional iterative process of the ANN-GA coupled model is conducted until the minimum value of the objective function is obtained. The variation trend of the fitness with the increase of the generation is illustrated in Figure 14. It can be seen that the maximum fitness and the average fitness all remain constant after 100 generations. The optimization result can be obtained after 100 iterations, and the optimized rail profile is generated by cubic splines, as shown in Figure 15. In addition, the objective functions obtained by the ANN-GA model and the MDRW model are 11.936 × 10 −7 mm and 11.785 × 10 −7 mm respectively, and the relative error is only 1.28%. Therefore, the optimization result of the ANN-GA model is in perfect agreement with that of the MDRW model. It can be observed from Table 5 that the relative errors are all controlled below 3%. Moreover, the relative errors show that the trained MLP model can effectively illustrate the mapping relationship between inputs and outputs. As a result, the outputs of the MLP model can be employed as the objective functions of the GA model.

Optimization Process
The initial population is generated randomly in the range of constraint function b, and the optimization process is conducted based on the initial population. The candidate solutions of the GA model are input to the trained ANN model, and the outputs from the trained ANN model are regarded as the objective functions of the GA model. The two-directional iterative process of the ANN-GA coupled model is conducted until the minimum value of the objective function is obtained. The variation trend of the fitness with the increase of the generation is illustrated in Figure 14. It can be seen that the maximum fitness and the average fitness all remain constant after 100 generations. The optimization result can be obtained after 100 iterations, and the optimized rail profile is generated by cubic splines, as shown in Figure 15. In addition, the objective functions obtained by the ANN-GA model and the MDRW model are 11.936 × 10 −7 mm and 11.785 × 10 −7 mm respectively, and the relative error is only 1.28%. Therefore, the optimization result of the ANN-GA model is in perfect agreement with that of the MDRW model.

Wear Performances
The wear of both side rails and leading wheelsets are separately calculated by the MDRW model, in which the parameters of the track are listed in Table 2. The wheel wear of the leading wheelsets is depicted in Figure 16. It can be concluded that, due to the use of the optimized rail profile, the left and right wheels' wear depth becomes smaller than those of the original rail profile. In addition, the maximum wear depth appears to be on the left wheel of XP55. After applying the optimized rail profile to three types of vehicles, the wear regions of both side wheels basically become larger, but they are still in the center regions of the wheel profiles.

Wear Performances
The wear of both side rails and leading wheelsets are separately calculated by the MDRW model, in which the parameters of the track are listed in Table 2. The wheel wear of the leading wheelsets is depicted in Figure 16. It can be concluded that, due to the use of the optimized rail profile, the left and right wheels' wear depth becomes smaller than those of the original rail profile. In addition, the maximum wear depth appears to be on the left wheel of XP55. After applying the optimized rail profile to three types of vehicles, the wear regions of both side wheels basically become larger, but they are still in the center regions of the wheel profiles.

Wear Performances
The wear of both side rails and leading wheelsets are separately calculated by the MDRW model, in which the parameters of the track are listed in Table 2. The wheel wear of the leading wheelsets is depicted in Figure 16. It can be concluded that, due to the use of the optimized rail profile, the left and right wheels' wear depth becomes smaller than those of the original rail profile. In addition, the maximum wear depth appears to be on the left wheel of XP55. After applying the optimized rail profile to three types of vehicles, the wear regions of both side wheels basically become larger, but they are still in the center regions of the wheel profiles. With regards to the wear of both side rails, the results are presented in Figure 17. After applying the optimized profile to three types of vehicles, the wear depths of both side rails are smaller than those of the original profile. Also, because of the deficient super-elevation in the curved section, the wear depth of left rails as the outer rails is bigger than that of right rails, whether employing the optimized rail profile or not. The maximum wear depth also appears to be on the left rail of XP55. For both side rails, the wear areas are extended after employing the optimized profile. Therefore, the wear distributions of rail heads are more uniform, and the occurrence of concentrated wear is decreased. With regards to the wear of both side rails, the results are presented in Figure 17. After applying the optimized profile to three types of vehicles, the wear depths of both side rails are smaller than those of the original profile. Also, because of the deficient super-elevation in the curved section, the wear depth of left rails as the outer rails is bigger than that of right rails, whether employing the optimized rail profile or not. The maximum wear depth also appears to be on the left rail of XP55. For both side rails, the wear areas are extended after employing the optimized profile. Therefore, the wear distributions of rail heads are more uniform, and the occurrence of concentrated wear is decreased. With regards to the wear of both side rails, the results are presented in Figure 17. After applying the optimized profile to three types of vehicles, the wear depths of both side rails are smaller than those of the original profile. Also, because of the deficient super-elevation in the curved section, the wear depth of left rails as the outer rails is bigger than that of right rails, whether employing the optimized rail profile or not. The maximum wear depth also appears to be on the left rail of XP55. For both side rails, the wear areas are extended after employing the optimized profile. Therefore, the wear distributions of rail heads are more uniform, and the occurrence of concentrated wear is decreased.

Dynamic Performances
To compare and analyze the dynamic performances when employing the optimized and original rail profiles, the vehicle-track coupled dynamic model is established, as mentioned in Section 2.2.1. The parameters of the track are listed in Table 2.
Considering the running safety and stability, it is worth noting that the evaluation indexes include the vertical/lateral wheel-rail force, derailment coefficient, wheel load reduction ratio, and area of contact patch. The peak values of these evaluation indexes when employing the original profile and the optimized profile are illustrated in Figure 18. As for the area of contact patch, the average values are also listed. The statistical histograms of the vertical/lateral wheel-rail force, derailment coefficient, and the wheel load reduction ratio are illustrated in Figure 19, and the time history curves of contact patch area are shown in Figure 20. Moreover, the derailment coefficient and the wheel load reduction ratio can be calculated by Equations (20) and (21).
where FL is the lateral wheel-rail force, FV denotes the vertical wheel-rail force, α is the angle of wheel flange, and μ denotes the wheel-rail friction coefficient.
where ΔW is the quantity of the wheel load reduction, W is the average static wheel load, and W1 and W2 denote the two sides of wheel loads, respectively. From Figures 18 and 19, it can be found that very small differences, when comparing the original profile with the optimized profile under three conditions, exist in the vertical/lateral wheel-rail force, derailment coefficient, and wheel load reduction ratio. Moreover, the statistical distributions of these indexes mentioned above all show tends of normal distributions, approximately. Except for the lateral wheel-rail force of S1002CN, the peak values of the other three evaluation indexes all declined. As for the lateral wheel-rail force of S1002CN, the peak values of the optimized profile only increased by 3.13%. However, the peak values of the vertical wheel-rail force, derailment coefficient, and wheel load reduction ratio were much smaller than the limiting values in corresponding regulations [35].

Dynamic Performances
To compare and analyze the dynamic performances when employing the optimized and original rail profiles, the vehicle-track coupled dynamic model is established, as mentioned in Section 2.2.1. The parameters of the track are listed in Table 2.
Considering the running safety and stability, it is worth noting that the evaluation indexes include the vertical/lateral wheel-rail force, derailment coefficient, wheel load reduction ratio, and area of contact patch. The peak values of these evaluation indexes when employing the original profile and the optimized profile are illustrated in Figure 18. As for the area of contact patch, the average values are also listed. The statistical histograms of the vertical/lateral wheel-rail force, derailment coefficient, and the wheel load reduction ratio are illustrated in Figure 19, and the time history curves of contact patch area are shown in Figure 20. Moreover, the derailment coefficient and the wheel load reduction ratio can be calculated by Equations (20) and (21).
where F L is the lateral wheel-rail force, F V denotes the vertical wheel-rail force, α is the angle of wheel flange, and µ denotes the wheel-rail friction coefficient.
where ∆W is the quantity of the wheel load reduction, W is the average static wheel load, and W 1 and W 2 denote the two sides of wheel loads, respectively.      (j) (k) (l) Figure 19. The statistical histograms of (a-f) vertical/lateral wheel-rail force, (g-i) derailment coefficient, and (j-l) wheel load reduction ratio of S1002CN, LMA, and XP55, respectively. (The red and blue bars refer to optimized profiles and the original profiles, respectively).
Unlike the other evaluation indexes, it can be observed from Figure 18e,f, and Figure 20 that the differences of the contact patch area are obvious when comparing the optimized profile with the original profile under three conditions. After using the optimized profile, the contact patch area becomes larger and the maximum increment can reach up to 46.87%, which appears in the peak value of LMA. As a result, employing the optimized profile can improve the contact conditions between rails and wheels. To sum up, the differences are very small on the whole, though sometimes the dynamic responses of the optimized profile are bigger than those of the original profile, except for the area of contact patch. Moreover, the peak values of both profiles are much smaller than limiting values. As a result, the optimized profile not only achieves the optimization goal but also maintains good dynamic performances.

Effectiveness Evaluation
To evaluate the optimization effectiveness of the ANN-GA coupled method proposed in this paper, the increment of the area of contact and the decrease of the maximum contact pressure are compared with that obtained by the method developed in Cui et al. [36]. The comparisons of the two evaluation indexes are listed in Table 6. It can be observed that the method proposed in this paper performs better than that developed in Cui et al. [36].  Figure 19. The statistical histograms of (a-f) vertical/lateral wheel-rail force, (g-i) derailment coefficient, and (j-l) wheel load reduction ratio of S1002CN, LMA, and XP55, respectively. (The red and blue bars refer to optimized profiles and the original profiles, respectively). (j) (k) (l) Figure 19. The statistical histograms of (a-f) vertical/lateral wheel-rail force, (g-i) derailment coefficient, and (j-l) wheel load reduction ratio of S1002CN, LMA, and XP55, respectively. (The red and blue bars refer to optimized profiles and the original profiles, respectively).

Conclusions
Unlike the other evaluation indexes, it can be observed from Figure 18e,f, and Figure 20 that the differences of the contact patch area are obvious when comparing the optimized profile with the original profile under three conditions. After using the optimized profile, the contact patch area becomes larger and the maximum increment can reach up to 46.87%, which appears in the peak value of LMA. As a result, employing the optimized profile can improve the contact conditions between rails and wheels. To sum up, the differences are very small on the whole, though sometimes the dynamic responses of the optimized profile are bigger than those of the original profile, except for the area of contact patch. Moreover, the peak values of both profiles are much smaller than limiting values. As a result, the optimized profile not only achieves the optimization goal but also maintains good dynamic performances.

Effectiveness Evaluation
To evaluate the optimization effectiveness of the ANN-GA coupled method proposed in this paper, the increment of the area of contact and the decrease of the maximum contact pressure are compared with that obtained by the method developed in Cui et al. [36]. The comparisons of the two evaluation indexes are listed in Table 6. It can be observed that the method proposed in this paper performs better than that developed in Cui et al. [36].  From Figures 18 and 19, it can be found that very small differences, when comparing the original profile with the optimized profile under three conditions, exist in the vertical/lateral wheel-rail force, derailment coefficient, and wheel load reduction ratio. Moreover, the statistical distributions of these indexes mentioned above all show tends of normal distributions, approximately. Except for the lateral wheel-rail force of S1002CN, the peak values of the other three evaluation indexes all declined. As for the lateral wheel-rail force of S1002CN, the peak values of the optimized profile only increased by 3.13%. However, the peak values of the vertical wheel-rail force, derailment coefficient, and wheel load reduction ratio were much smaller than the limiting values in corresponding regulations [35].

Conclusions
Unlike the other evaluation indexes, it can be observed from Figure 18e,f, and Figure 20 that the differences of the contact patch area are obvious when comparing the optimized profile with the original profile under three conditions. After using the optimized profile, the contact patch area becomes larger and the maximum increment can reach up to 46.87%, which appears in the peak value of LMA. As a result, employing the optimized profile can improve the contact conditions between rails and wheels.
To sum up, the differences are very small on the whole, though sometimes the dynamic responses of the optimized profile are bigger than those of the original profile, except for the area of contact patch. Moreover, the peak values of both profiles are much smaller than limiting values. As a result, the optimized profile not only achieves the optimization goal but also maintains good dynamic performances.

Effectiveness Evaluation
To evaluate the optimization effectiveness of the ANN-GA coupled method proposed in this paper, the increment of the area of contact and the decrease of the maximum contact pressure are compared with that obtained by the method developed in Cui et al. [36]. The comparisons of the two  Table 6. It can be observed that the method proposed in this paper performs better than that developed in Cui et al. [36].

Conclusions
In this paper, an ANN-GA coupled optimization method was developed to optimize the rail profile. The ANN model was developed by feeding forward back-propagation neural networks with the MLP structure. The input data were the coordinates of points-to-be-optimized and the output data were the objective functions related to the rail wear. The prediction model was able to predict the objective functions with accuracy above 97%. Therefore, it can indicate that the prediction model is accurate and efficient.
The trained ANN model coupled with the GA model was developed in this paper, which has multiple nonlinear constraints. The candidate solutions of the GA model were input to the trained ANN model, and the outputs from the trained ANN model were regarded as the objective functions of the GA model. The two-directional iterative process of the ANN-GA coupled model were conducted until the minimum value of the objective function was obtained.
The related error of the optimized rail profile's objective function predicted by the ANN-GA coupled model was only 1.28%, compared with that calculated by the MDRW model. Therefore, the validity of the optimized rail profile can be proved. Then, the optimized profile and the original profile were systematically compared in aspects of the wheel/rail wear and dynamic performances, and the MDRW model developed in this paper was used to conduct the comparative study. It can be found from computational results that the wear depth of the optimized rail profile was smaller, and the wear distributions were more uniform. The wear depth of wheels of the leading wheelset became smaller after adopting the optimized rail profile as well. Furthermore, the differences of the vertical/lateral wheel-rail force, derailment coefficient, and wheel load reduction ratio were very small. Moreover, they were much smaller than the limiting values. As for the contact patch area, the differences in the contact patch area were obvious. After using the optimized rail profile, the contact patch area became larger, and the increment can reach to 46.87%. Therefore, employing the optimized profile can improve the contact conditions between rails and wheels.
Conclusions can be drawn that the optimized rail profile not only has superior performances in terms of the wheel/rail wear and contact conditions but also maintains good dynamic performances. The present work in this paper aimed to propose the ingenious ANN-GA coupled method theoretically and then the optimized rail profile was compared and analyzed by simulating. Finally, this study can provide the theoretical and practical basis for the design and the preventive grinding of rails used in the high-speed railways. Moreover, the ANN-GA coupled model can also be used for the optimization of other rail profiles.