Energy-efﬁcient train operation with steep track and speed limits: A novel Pontryagin’s maximum principle-based approach for adjoint variable discontinuity cases

In this study, an energy-efﬁcient speed trajectory planner is proposed for high-speed trains traveling on tracks with steep gradients and speed limits, especially for situations in which the speed limit has been reached, which causes adjoint variable discontinuity during calculation. New optimal switching rules at points where the speed limit is reached on steep tracks are derived by analysing the jump condition of state-constrained Pontryagin’s maximum principle. Accordingly, a novel two-step algorithm for high-speed trains, including an optimal-cruise minimum-time algorithm and search-substitution algorithm, is designed to solve dynamic train models considering time-energy and space-energy conversions, respec-tively. Practical case studies demonstrates that the proposed method can save energy by approximately 3 % and 10 % in comparison to the approximate-optimal time-satisﬁed and minimum running time strategies, respectively. Moreover, the proposed method approximately consumes 0.98% and 1.62% of the computation time taken by discrete dynamic programming and reinforcement learning, respectively.


FIGURE 1
Schematic scenario for this study two most promising measures for energy-saving methods [2]. The EEDS involves calculating the optimal speed profile and the sequence of its corresponding control modes, thus reducing the consumed energy maximally [3]. The energy recovery theory involves recycling the electric energy converted from kinetic energy during the braking process to realise the goal of energy saving [4]. However, the EEDS for high-speed trains equipped with ATO and energy recovery devices on undulating terrains is more intricate in comparison to that for normal trains. It must manage tracks with steep grades that frequently appear in the real world [2] and prevent trains from overspeeding, as shown in Figure 1.
Considering the nonlinear characteristics of a dynamic train model in steep tracks, intelligent algorithms are widely used owing to their universality, flexibility, and generality. Genetic simulated annealing, firefly, and Big Bang-Big Crunch algorithms were employed in the light rail network in Eskisehir, Turkey, to determine the optimal switching points in the optimal operation trajectory [5]. Yin et al. [6] developed intelligent train operation systems based on expert systems and reinforcement learning (RL) to generate the optimal speed profile. Wang et al. [7] addressed the intelligent driving problem for heavy haul trains in long steep downhill sections based on the fusion of expert human knowledge and machine learning methodologies. Lu et al. [8] proposed a new graphic model that can be used for the application of more general optimisation algorithms for comparative studies. Considering the operation rules of fixed-block signalling systems, ant colony optimisation was used to manage operational constraints [9]. Track alignment, speed limit, and schedule adherence were studied using the simulated annealing algorithm [10]. Genetic algorithm-based methods have been widely applied in train control to find the optimal switching point between different modes [11][12][13][14]. Although intelligent algorithms can be easily implemented, they can only obtain suboptimal solutions and are prone to getting trapped in local optima. Artificial intelligence methods, in particular, require significant computing power to train a model, whose hardware costs are high.
Analytical and numerical methods, based on mathematical derivation results, can prevent invalid search for intelligent algorithms in calculating energy-efficient speed profiles, such as sequential quadratic programming [15] and pseudospectral method [16,17]. Pontryagin's maximum principle (PMP) is an analytical method that can be traced to the calculus of variations (COV). However, in comparison to COV, the classical PMP is widely used because it can provide the necessary conditions FIGURE 2 Speed limit violated and cases where the limit is not reached at a typical steep track when using the energy-efficient strategy when the control variables are constrained for global optimisation by maximising the Hamiltonian [18].
The scheduling and control group at the University of South Australia studied an energy-efficient operation method on flat routes based on the PMP [19]. The analysis demonstrated that optimal operations should strictly obey the power-cruise-coastbrake rule on flat tracks. Based on the PMP, Liu and Golovitcher [20] further derived that an adjoint variable can be expressed as an algebraic function of speed, and provided the optimal speed control conversion diagram in normal tracks. Khmelnitsky [21] established the train state-space equation with kinetic energy and potential energy as state variables, and pointed out the relationship between the optimal traction cruise speed and optimal braking cruise speed. In this study, a framework based on the concept of linking functions is presented. Albrecht et al. [22,23] divided a track into a gentle slope section, steep uphill section, and steep downhill section, and analysed the optimal driving strategy for steep slopes when the speed limits were not reached. Converting the linking function to an approximate solution based on the equivalent travel time criterion, Yang et al. [24] proposed a new numerical method for cases where the speed limit constraints were not reached. Lin et al. [25] analysed the energy-efficient operation of freight trains in long-distance steep downhill sections with air-filled time constraints. Zhou et al. [26] studied the optimal control strategy on continuous change gradient steep downgrades. However, the classical PMP cannot solve the state-constrained problem. Therefore, the classical PMP-based EEDS method can only be used for slow or freight trains, whose scheduled time is significantly larger than the minimum running time and where the speed limit will not be reached in EEDS operation.
However, in daily operation, the scheduled running time for a high-speed train is always close to or slightly longer than the minimum running time owing to efficiency and capacity, implying that the running speed of high-speed trains is close to or slightly less than the speed limit. Hence, speed limits are frequently reached while using the EEDS, especially in tracks with various speed limits and steep slopes. Therefore, the classical PMP-based EEDS methods are not suitable to solve cases where the speed limit is reached when the train runs through a steep section using the EEDS, or their operations may violate the speed limit, as shown in Figure 2. The state-constrained PMP is more complex than the classical PMP that is used in cases where the limit is not reached, and it needs to solve the discontinuity of adjoint variables, which has not been well researched yet. This study contributes to improve the energy efficiency of high-speed trains in tracks with steep grades and speed limits, especially for cases in which the speed limit is reached during EEDS operation. Using the state-constrained PMP and jump condition proposed by Hartl et al. [27], new optimal energyefficient mode switching rules for cases where the speed limit is reached on steep tracks are proposed to solve adjoint variable discontinuity during calculation. Based on these aforementioned new rules, a novel two-step speed trajectory optimisation algorithm is developed that utilises information such as the dynamic speed limit, schedule, steep tracks and train operation rules. The effectiveness and flexibility of the proposed method in comparison to other methods have been demonstrated through real-world case studies.
The remainder of this paper is organised as follows. Section 2 presents the description of the optimisation problem. Section 3 shows the formulation of the state-constrained PMP and jump condition. In Section 4, the structure of the two-step algorithm is explained. Section 5 discusses results of case studies. Finally, Section 6 presents the conclusions of this work. Examples in the Appendix illustrate the calculation process of the proposed method. The notations used in this paper, including parameters, auxiliary variables, and decision variables are listed in the Nomenclature section.

Dynamic train model
The movement of a train can be formulated as the state space formulation as follows [28,29]: where independent variable x represents the front position of the train in the track profile, running time t and speed v are the state variables. u(x) is the controlled output force per unit mass, which is positive for traction output, negative for brake output, and zero for no output. w 0 (v) is running resistance force per unit mass, g(x) is the gradient resistance force per unit mass due to the track gradient, w r (x) is curve resistance force per unit mass. The details of each force are explained as follows: Controlled output force per unit mass: The output force per unit mass cannot exceed the range of the maximum traction force U (v) > 0 and the maximum braking force per unit mass U (v) < 0. U (v) and U (v) are the functions of the speed, as shown in Figure 3.
Running resistance force per unit mass: The running resistance force per unit mass w 0 (v), as shown in Figure 3, is calculated by the Davis formulation [30]: where A, B and C are empirical coefficients fitted from field data, and they are positive constants. g is the gravity acceleration. Gradient resistance force per unit mass: Because the slope value is numerically small, the relationship between gradient and slope can be further formulated as sin( ) ≈ tan( ) = . Therefore, the component of gravitational acceleration is defined as: where L is the train length, (x) is the slope along the route, dl is the derivative of the train length. The positive gradient of uphill tends to slow the train, and the negative gradient of downhill makes the train prone to accelerate. Curve resistance force per unit mass: The curve resistance force per unit mass is given by [31]: where R(x) is the radius along the route. D is the empirical coefficient fitted from field data. If the track alignment in position x is a straight line, then R(x) equals ∞.

Optimisation problem
The goal of EEDS in tracks with various speed limits and steep gradients is to find an energy saving speed profile so that the train, equipped with energy recovery devices, can complete the transportation task.
Cost function: The energy consumed by the auxiliary equipment is constant when the running time is fixed and the power of auxiliary equipment is unchanged. Therefore, the total energy consumption in this paper is defined as the mechanical energy, used for traction and cruise, minus the energy recovered in the braking process. The energy-efficient speed profile optimisation problem is formulated as follows: The average energy recovery and utilisation efficiency in the objective function (6) is , ∈ [0, 1]. If equals 0, it means no braking energy is recycled. The closer is to 1, the higher the utilisation rate of recovered energy. The constraints of the problem are listed below. Train dynamic constraints: The movement of the train should obey the physical laws, which are shown in Equations (1) and (2).
Schedule constraints: The train is required to depart from the starting station according to the timetable, and arrive at the terminal station within the scheduled time.
t ( Control output constraint: The value of control output per unit mass at every position should be between the maximum traction and the maximum braking force per unit mass.
Speed limit constraint: The speed limit constraint is also the state constraint, which makes sure that the point of the train space occupancy dose not violate its corresponding speed limit.
The definitions of the steep sections are illustrated as follows.
Steep downhill: The train running at the steep downhill section with speed v still tends to accelerate when the control output is zero.
Steep uphill: The train running at the steep uphill section with speed v still tends to decelerate when the control output is U (v).

METHODOLOGY
The difficulty of solving the optimisation model proposed in Section 2 is that the state variable v has a constraint and the classical PMP can only analyse cases where the control variable u is constrained. Because no information can be derived to satisfy the state constraint, only the first-order necessary conditions on the extremality of the Hamiltonian are used. Therefore, this paper uses the state-constrained PMP and jump condition proposed by Hartl et al. [27] to get some extra remarks.

State-constrained maximum principle
Necessary condition of the state-constrained PMP: If the optimal trajectory of state variables, time t * (x) and velocity v * (x), and the corresponding optimal control u * (x) are found, then there exist adjoint variables , multipliers , and , which are not always zero for every x and the following condition holds: The Hamiltonian ℋ is obtained by using Equations (6), (1) and (2): where 1 and 2 are adjoint variables. The Lagrangian ℒ is obtained by using Karush-Kuhn-Tucker conditions to handle the constraints (9) and (10): where 1 , 2 and are multipliers.
To find an optimal control that maximises the Hamiltonian subject to constraints, we have: The canonical equations for the Lagrangian are as follows: It can be seen that 1 is a constant. Based on Equations (13)- (21), conclusions shown in Remarks 1 to 3 can be obtained by using a similar proof process described in [19][20][21][22][23] based on the classical PMP. Therefore, the proof process for these remarks is simplified here. These remarks are used for designing the algorithm in Section 4. The reader can refer to the above studies for details. Remark 1. The optimal driving strategy consists of five driving modes listed in Table 1. The switching rule of the optimal driving strategy depends on the value of the auxiliary adjoint variable , which is defined as Because v(x) is a continuous variable, the study of the adjoint variable 2 (x) can be equivalently converted to the study of the auxiliary adjoint variable (x).
Remark 2. The traction cruise mode is used to maintain the lower speed of the optimal traction cruise speed v bq and speed limit V . The brake cruise mode is used to maintain the lower speed of the optimal braking cruise speed v bz and speed limit V . The relationship between the optimal traction cruise speed v bq and the optimal braking cruise speed v bz is given as follows: where the auxiliary function (v) is a non-negative strictly increasing function, which is defined as Proof. When traction cruise mode is used, (x) = 1 and d (x)∕dx = 0 hold. The following equation is obtained by solving the simultaneous equations of Equation (21) and the differentiated result of Equation (22).
For v(x) < V (x), (x) = 0 holds according to Equation (18). Therefore, (v) = − 1 has only one solution v bq . In this case, the control output is calculated as For In this case, the control output is calculated as When brake cruise mode is used, (x) = and d (x)∕dx = 0 hold. The following equation is obtained by solving the simultaneous equations of Equation (21) and the differentiated result of Equation (22).
For v(x) < V (x), (x) = 0 holds according to Equation (18). Therefore, (v) = − 1 ∕ has only one solution v bz . In this case, the control output is calculated as For v(x) = V (x), (x) ⩾ 0 holds according to Equation (18). Therefore, (v) = − 1 ∕ − v 2 (x) ⋅ (x)∕ has only one solution V (x). In this case, the control output is calculated as According to Equation (23), once we know one of the v bq and v bz , the other one can be determined. □ Remark 3. The auxiliary adjoint variable (x) in the continuous regular control section can be expressed as follows: Proof. Equation (27) is derived by substituting Equation (24) and (v bq ) = − 1 into the simultaneous equations of Equation (21) and the differentiated result of Equation (22). Therefore, the value of the auxiliary adjoint variable (x) at most points can be obtained by using the speed at the corresponding position. □

FIGURE 4
Six basic auxiliary adjoint variable discontinuity cases

Jump condition
Next, the properties of the discontinuity of the adjoint variable are analysed by using the jump condition proposed by Hartl et al. [27] to gain some extra useful optimal mode switching rules.
Jump conditions: For any position where x = triggers or releases the speed limit constraint (10), the adjoint variable may have a discontinuity following the jump conditions: It can be seen that the multiplier cannot be determined in the jump condition if v * ( ) = V ( ) and dv * ∕dx = dV ∕dx, which makes the jump value of the adjoint variable 2 in Equation (29) impossible to calculate. Hence, it is not easy to use the jump condition Equations (28)-(31) directly to calculate the gap value between 2 ( − ) and 2 ( + ), which is also a limitation of the PMP as mentioned before. Therefore, Remark 4 to Remark 6 are proposed, based on Equation (28)- (31). Proof. The following inequality is obtained by putting the condition ( ) ⩾ 0 in Equation (31) into Equation (29).
Therefore, the adjoint variable 2 may only jump up, and the auxiliary adjoint variable may also only jump up, according to Equation (22). □ Remark 5. Under the premise of using the coast mode on both sides of the position at which the speed limit reached , if the discontinuity of the adjoint variable, 2 ( − ) < 2 ( + ), occurs, which also means ( − ) < ( + ), then this position must be the end point of the steep downhill section (ESDS), as shown in Figure 4(a). Under the premise of using the full power mode on both sides of the position at which the speed limit reached , if the discontinuity of the adjoint variable, 2 ( − ) < 2 ( + ), occurs, which also means ( − ) < ( + ), then this position must be the starting point of the steep uphill section (SSUS), as shown in Figure 4 Proof. Considering this remark has two similar parts, we prove the necessities of the two parts in order and the sufficiency.
The train reaches the local maximum speed at the ESDS by using coast mode. At this point, the acceleration of the train is zero,̇v( ) = 0. Besides, V is constant. The limits of the forces implemented in the train at both sides of are the same, namely u( According to the definition of the steep downhill section mentioned in Equation (11), at the ESDS, the formula −w 0 ( − ) − g( − ) − w r ( − ) = −w 0 ( + ) − g( + ) − w r ( + ) = 0 always holds. Therefore, we have the following 1 is a constant variable according to Equation (20). Hence, the limits of the Hamiltonian on both sides are equal. Therefore, Equation (30) and the other equations in the jump condition hold. The necessity of the first part is proved.
The difference between the necessity proof process of the second part and the first part is that the train reaches the local maximum speed at the SSUS by using full power mode. At this point, the acceleration of the train is zero. According to the definition of the steep uphill section mentioned in (12), = 0 always holds. Therefore, Equation (33) also holds here. The limits of the Hamiltonian on both sides are equal. Hence, Equation (30) and the other equations in the jump condition hold. The necessity of the second part is proved. Now, to prove the sufficiency by contradiction, assume that except for the ESDS and SSUS, there are other positions where the train can use the same control mode at both sides of the point, , at which the speed limit was reached, which also results in the discontinuity, 2 ( − ) > 2 ( + ). The speed will be maintained at the cruise speed and the adjoint variable is a constant according to Remark 2, if the control mode used on both sides of is traction cruise mode or brake cruise mode, which obviously does not match the basic condition that the adjoint variable is discontinuous. The speed limit will be violated, if the control mode used on both sides of is one of the remaining modes because once the speed limit is reached on one side of , it must exceed the speed limit on the other side. This is also the reason why using coast mode at the starting of steep downhill section (SSDS) and using full power mode at the end of steep uphill section (ESUS) are not allowed when the speed limit is reached at these two points. Therefore, the assumption must be false and thereby the sufficiency is proved. □ Remark 6. Under the premise of using different modes with the same limits of control output on both sides of the position at which the speed limit was reached , if the discontinuity of the adjoint variable, 2 ( − ) < 2 ( + ), occurs, which also means ( − ) < ( + ), then the limits of the control outputs u( + ) and u( − ) must be zero and the point must be ESDS, as shown in Figure 4(c-e), or the limits of the control outputs u( + ) and u( − ) must be U and the point must be SSUS, as shown in Figure 4(f).
Proof. The proof of the necessity of the first part is similar to that of the necessity of the first part in Remark 5. The difference is that although the limits of the control output u( + ) and u( − ) are both zero in two cases, they belong to different modes here. There are three kinds of possible mode sequences that satisfy the jump condition and above description, which are brake cruise mode to coast mode (see Figure 4(c)), coast mode to traction cruise mode (see Figure 4(d)), and brake cruise mode to traction cruise mode (see Figure 4(e)). The other limits of the forces implemented in the train on both sides of are the same, namely w 0 ( − ) = w 0 ( + ), g( − ) = g( + ), w r ( − ) = w r ( + ). According to the steep downhill section definition (11), at the ESDS, the formula −w 0 ( = 0 always holds. Therefore, Equation (33) holds. The remaining part is the same as the necessity proof process of the first part in Remark 5.
The necessity proof process of the second part is similar to that of the second part in Remark 5. The difference is that although the limits of the control output u( + ) and u( − ) are both U in two cases, they belong to different modes here. There is only one kind of possible mode sequence that satisfies the jump condition, which is traction cruise mode to full power mode (see Figure 4(f)). According to the steep uphill section definition (12), at the SSUS, the formula u( To prove the sufficiency, we divide the origin remark into two parts and prove the sufficiencies one by one by contradiction. First, assume that if the discontinuity of the adjoint variable, 2 ( − ) < 2 ( + ), occurs, then it must not be the ESDS or SSUS, under the premise of using different modes with the same limits of control output on both sides. The acceleration at must be 0 because the train will exceed the speed limit, if its speed at is V and the acceleration at is not equal to 0. The limits of the forces implemented in the train at both sides of are the same, namely u( − ) = u( + ), w 0 ( − ) = w 0 ( + ), g( − ) = g( + ), w r ( − ) = w r ( + ). As the point is a point on the track other than the SSDS, ESDS, SSUS and ESUS, the formula u( − ) − w 0 ( − ) − g( − ) − w r ( − ) = u( + ) − w 0 ( + ) − g( + ) − w r ( + ) ≠ 0 holds. However, the train reaches its local minimum speed at the SSDS and ESUS because it will exceed the speed limit, if it reaches V at the SSDS or ESUS, which is not allowed. According to the discontinuity condition, 2 , which contradicts Equation (30). Therefore, the assumption must be false and the first part of the sufficiency is proved.
Then, assume to the contrary of the output that if the discontinuity of the adjoint variable, 2 ( − ) < 2 ( + ), occurs, then the limits of control output at the ESDS are not zero, and the limits of control output at the SSUS are not U . According to the definition of the steep downhill section (11), if the limits of the control output at the ESDS are not zero, then the formula u( − ) − w 0 ( − ) − g( − ) − w r ( − ) = u( + ) − w 0 ( + ) − g( + ) − w r ( + ) ≠ 0 holds. According to the definition of the steep uphill section (12), if the limits of the control output at the SSUS are not U , then the above formula also holds in this situation. The formulas mentioned above can derive , which contradicts Equation (30). Therefore, the assumption must be false and the second part of the sufficiency is proved. □ Based on Remark 4 to 6 of the above analyses, the optimal switching diagram at the adjoint variable discontinuity point is summarised in Figure 5.

SOLUTION ALGORITHM
Based on the new operation rules derived from the stateconstrained PMP and the jump condition in Section 3, this section develops a two-step algorithm to handle the energyefficient speed profile with steep grades and speed limits for high-speed train, especially for the cases where speed limit was reached on the steep slope. The framework of the two-step algorithm is shown in Figure 6, which contains the optimalcruise minimum-time algorithm and the search-substitution algorithm.
Estimate v bq by timetable: According to Remark 2, once the optimal traction cruise speed v bq is decided, the optimal braking cruise speed v bz is also determined. However, v bq cannot be calculated directly. Therefore, in the first iteration, v bq is estimated by the timetable: Estimate v bq by error: According to the error between the calculated running time t r k in the kth iteration and the scheduled running time t (x 0 ) − t (x f ), if the error does not satisfy the requirement, v bq k is further adjusted by Equation (35). where is the offset factor whose initial value is recommended in [0%, 4%], and is the scale factor whose initial value is recommended in [1,1.4]. These two factors are used to accelerate the search process because t r 1 calculated by the average speed in Equation (34) must be larger than the scheduled time. So the calculated running time of the first few iterations must be larger than the scheduled time as well, which means v bq k should be larger. Therefore, for the first few iterations, each v bq k adjustment size should be proportional to the running error, and the positive offset factor can be used to speed up the convergence. After that, the running time error has been greatly reduced compared to the previous iterations, so the offset factor is set to 0, otherwise, it may lead to the fluctuation of t r k . Although the estimated optimal traction cruise speed v bq k varies from the actual optimal traction cruise speed v bq , it will not be misleading to record v bq k as v bq in Sections 4.1 and 4.2 because the following subsections only have a relationship with the estimated optimal traction cruise speed in a certain iteration. In addition, this helps the reader understand the relationship with the previous analysis.

1st step: Optimal-cruise minimum-time algorithm
The optimal-cruise minimum-time algorithm is obtained by assuming that the energy recovery rate equals 1, based on the information derived from Section 3. According to Table 1 in Remark 1, if = 1, the optimal EEDS operation set consists of four modes except the coast mode, and v bq = v bz . Therefore, this situation is much easier compared with the other situations where ≠ 1. The work flow of the algorithm is described as follows: Step 1: Track division: The cruise target speed at each position is defined as v target (x) = min{v bq , V (x)}. Add v target (x) into Equations (11) and (12) to divide the route into three categories, steep downhill section, steep uphill section and gentle slope section. Actually, v target (x) and the actual running speed are close in size, which means the division here is roughly accurate. The division needs further fine adjustment, if the actual running speed is not v target (x). Positions with consecutively identical target speed are grouped into the same section. The ith section's target speed is denoted as v target i . To initiate and terminate the algorithm, the departure point and arrival point are also added as additional sections without length.
Step 2: Precomputation & backward calculation: According to the EEDS driving rules described in Section 3, full power mode is used to accelerate to the cruise speed and to travel through the steep uphill. Traction cruise mode and brake cruise mode are used to maintain the optimal cruise mode in gentle section and steep downhill, respectively. Full braking mode is used to decelerate. The details are presented in Algorithm A.1, see Appendix A.1.

Step 3: Generation of cut-in adjoint variable:
The cut-in adjoint variable c (x) is denoted as 1 except in the area using full braking mode, where it is denoted as 0. Both traction cruise mode and brake cruise mode belong to singular control whose corresponding auxiliary adjoint variable is 1, if = 1 (see Table 1). The other modes are deemed as the branches to link different cruising areas. So, set the cut-in adjoint variable as the basis first, and then correct it to the auxiliary adjoint variable using the following search-substitution algorithm.

2nd step: Search-substitution algorithm
The main function of the search-substitution algorithm is to optimise the steep track section and the deceleration section of the approximate-optimal solution obtained by the optimal-cruise minimum-time algorithm to the optimal solution. The newly obtained part is called inserted interval, which is used to substitute the corresponding interval in the approximate-optimal solution. The trial inserted interval (TII) is the trial result in the process of finding the inserted interval, starting from a point of the approximate-optimal solution and calculated with the possible optimal control mode order.

Search
One-dimensional search algorithms such as Fibonacci search, Dichotomous and Exhaustive search, are used to find the optimal mode switching point of the TII. To better adapt the those methods to this problem, some new concepts and functions are proposed. In this paper, we use the Golden-section Search (GSS) to calculate [32]. (2) and (27), whose initial value is the value at the starting point of the inserted interval in the approximate-optimal solution. The control output u(x) can refer to Remark 1 and 2.

Function of the inserted interval: The function of the inserted interval consists of Equations
Evaluation function: Evaluation function E (x) is designed as a kind of penalty function to score the start points of different TIIs. The general form of the evaluation function used to score the first type of the TII, which is used to keep running, is as follows: The general form of the evaluation function used to score the second type of the TII, which is used to decelerate, is as follows: where the independent variable x of the evaluation function is the starting point of the TII, which is also the mode switching point.

Substitution
The substitution part refines every steep section and deceleration section in the approximate-optimal solution, according to the modes composition and the optimal control switching rules in Remark 1 to 6. The algorithm for steep uphill section is shown in Algorithm 1. According to the relationship between V , v bq , and v bz , steep downhill section can be further classified into three cases: whose details are presented in Algorithms 2 to 4. For gentle slope section with deceleration process, the substitution procedure is similar to the previous one and therefore is not repeated here. Take a simple example to illustrate the above process. The high-speed train is scheduled to travel across a steep downhill section bc in Figure 7(a). The optimal cruise speed v bq , shown with a pink solid line, is obtained in the first step of the two-step method by assuming = 1. The steep downhill section bc is determined by the current running speed, v bq . Algorithm 3 is used owing to v bq ≤ V < v bz in the steep downhill section.
According to the algorithm, first, the coast mode is assumed as the optimal control mode of the inserted interval. As is shown in Figure 7(a), the set of orange lines corresponds to Equation (36c) in E 1 (x, v bq , d ESDS , d ESDS , 1), where the train ALGORITHM 2 Steep downhill section, with V ≤ v bq < v bz stops before entering the steep downhill section. The set of purple lines corresponds to Equation (36a), where the train could enter the steep downhill section, but could not leave the ESDS at a speed higher than v bq . The sets of green and blue lines correspond to Equation (36b), where the train could not only enter the steep downhill section but also leave the ESDS at a speed higher than v goal , namely v bq . This solution is shown with a red dashed line, which is also the result of the classical PMP. This solution violates the speed limit. Hence, this example belongs to the case of adjoint variable discontinuity, which should be further solved by the state-constrained PMP. According to Remark 5, the position where the speed limit is triggered should be the ESDS, where the optimal control mode is the coast mode (see Figure 4(a)). Therefore, the optimal solution is obtained by first updating the ESDS position from point c to point f because the actual speed is V rather than the target speed v bq , followed by using the coast mode, starting from the ESDS with V to link the left-hand and right-hand speed trajectories. After determining the optimal switching points, e and g, the limits of the corresponding auxiliary adjoint variable (x) at the discontinuity position are finally obtained. The final inserted interval is plotted in a red solid line in Figure 7(b).
Explanations for Algorithm 1, 2 and 4 are illustrated in Appendix A.2 to A.4.

CASE STUDY
The proposed method was applied to real-world high-speed rails in western China, as shown in Figures 8-10. The scheduled time of the high-speed train in three cases was approximately 0.6-1.5 min longer than the minimum running time, which is 3.40-6.43% of the minimum running time for each case. Various tracks with steep gradient were included in these cases, ranging from −20% to 25% . The algorithms described in this paper were implemented in MATLAB, and the experiments were performed on a Windows 10 personal laptop with Intel(R) Core(TM) i7-8750H CPU @ 2.20 GHz and 32.00 GB RAM. In case study 1, a temporary speed restriction (TSR) zone (from K28 + 140 to K33 + 900) was implemented in the track with steep gradients to test the flexibility of the algorithm in handling a vast speed limit gap, which may be as great as 100 km/h in the case of daily operation. A common route for the high-speed train with steep gradients was exhibited in case study 2, and the TSR zone was not implemented. Therefore, this scenario can be used to test the performance of the algorithm under normal conditions. Considering that the energy recovery ratio has a significant impact on total energy consumption, in case study 3, we selected a standard high-speed train route with steep gradients, moreover, the energy recovery ratio was much larger than that in the previous cases. The parameters for each case are summarised in Table 2.

Baselines
For comparison purposes, the minimum running time (MRT) strategy and the approximate-optimal time-satisfied (AoTs) strategy are tested to demonstrate the effectiveness of the proposed method. These two strategies can be realised based on the two-step method. To further show the flexibility and advantage performance of this method, two other typical methods are also executed.

ALGORITHM 4
Steep downhill section, with v bq < v bz ≤ V i. Algorithm for the MRT strategy: To maintain the speed as high as possible, the control output of this strategy is prone to be the full power mode as long as the speed limit is not violated, and its cruise speed is usually the speed limit. Coast mode is not allowed in this strategy. The adjustment part of the twostep algorithm is to set v bq as a constant that satisfies v bq > max V (x), ∀x, and all the steps, except for the 1st step, are skipped. ii. Algorithm for the AoTs strategy: AoTs strategy fully utilises the given time and tries to use cruise mode as long as possible. This strategy can be realised by skipping the 2nd step of the two-step method and keeping the remaining steps unchanged. iii. Discrete Dynamic Programming (DDP): DDP method is a direct method that transforms the original continuous problem into a multi-stage decision process, divides the state space

Results analysis
The results of the aforementioned methods are demonstrated in Figure 11, and the details regarding the speed trajectory and control output are shown in Figures 8-10. For the MRT strategy, as shown in Figures 8(b), 9(b) and 10(b), the least amount of time is required to travel from the departure station to the arrival station (see Figure 11(a)). The full power mode is frequently used if the current speed is lower than the speed limit. The energy expended during acceleration and cruise in each case is 1211.15, 952.61 and 765.93 kWh. The recycled energy during deceleration is 184.05, 122.28 and 242.10 kWh, as shown in Figure 11(c). The MRT strategy consumes the maximum energy compared with others, i.e. 1027.10, 830.32 and 523.83 kWh in each case, as shown in Figure 11(b).

FIGURE 11
Performance of MRT strategy, AoTs strategy, DDP method, RL method and two-step method For the AoTs strategy, as shown in Figures 8(c), 9(c) and 10(c), the given time is completely utilised. Therefore, its cruise speed is 186.59 km/h in case 1, 205.73 km/h in case 2, and 184.71 km/h in case 3, which is significantly less than the cruise speed in the MRT strategy. In comparison to the energy consumed by the MRT strategy, it saves 8.43%, 4.86% and 6.98% in each scenario.
The DDP method is theoretically recognised as a method that can determine the global optimum. The results of the DDP method are extremely sensitive to the size of lattice, namely the resolution, especially for high-speed rail. Furthermore, the DDP method requires significantly larger memory in comparison to other methods, as shown in Figure 11(f). The speed range, route length, and running time for high-speed rail are considerably larger than those for other types of rail transit, such as urban railway. Therefore, under the same resolution, the total amount of calculation for high-speed rail increases exponentially. We attempted to set the minimum lattice and fully utilise the equipment. The results still contained some turbulence, which reduces the comfort, as shown in Figures 8(d), 9(d) and 10(d).
The RL method obtains similar results as the DDP method. However, in case 1, it takes longer to meet the requirement. This phenomenon can be attributed to the TSR zone, which acts like a wall blocking the path of the agent. Therefore, the agent spends a significant amount of time learning to bypass the TSR area. Then, the agent can learn to improve the quality of the solution. The RL and DDP methods share the same default, i.e. there are too many states to calculate.
For the two-step method, as shown in Figures 8(f), 9(f) and 10(f), the optimal traction cruise speed, v bq , is 199.87, 214. 19, and 196.88 km/h. The corresponding optimal braking cruise speed, v bz , in each case is 238.43, 255.40 and 208.26 km/h, respectively. Although the cruise speed of the two-step method is higher than that of the AoTs strategy, implying that more energy is spent to overcome the running resistance, the coast mode is used in the steep downhill section and deceleration process, which do not consume any energy in these sections. This is why the energy recycle percentage of the two-step method is considerably lower than that of others, which is approximately 5% in cases 1 and 2, and approximately 10% in case 3, as shown in Figure 11(d). The energy expended during acceleration and cruise is 926.10, 786.49 and 560.28 kWh in each case. The corresponding recycled energy is 46.07, 38.41 and 89.38 kWh, respectively. The total energy consumption in the two-step method in case 1 is 880.03 kWh, which saves energy by 5.89% and 14.32% in comparison to the AoTs and MRT strategies, respectively. In case 2, the two-step method saves energy by 5.05% and 9.90% compared with the AoTs and MRT strategies, respectively. In case 3, the two-step method saves energy by 3.13% and 10.11% compared with the AoTs FIGURE 12 Effects of different parameters of the two-step algorithm on the convergence rate and MRT strategies, respectively. Because the energy recovery ratio in case 3 is much closer to 1 than that in cases 1 and 2, the final energy consumption of the two-step method in case 3 is not significantly decreased when compared with that under the AoTs strategy. Furthermore, for the same reason, the recycled energy and energy recycle rate in case 3 are much larger than those of cases 1 and 2. Its computation time is only 0.39-1.69% of the DDP method and 0.06-3.20% of the RL method. RAM usage for the two-step method is of the same order of magnitude as that for the MRT and AoTs strategies, which is less than 1% than the DDP and the RL methods.
These scenarios are typical adjoint variable discontinuity cases, in which the speed limit is frequently reached, as shown in Figures 8(g), 9(g) and 10(g). In case 1, according to Remark 5, it can be observed that adjoint variable discontinuity first occurs at the ESDS of the first steep downhill section (from K8 + 886 to K13 + 925), where the optimal control mode should be the coast mode and the value of the auxiliary adjoint variable has the following relationship: < − < + < 1. Then, adjoint variable discontinuity occurs at the ESDS of the second steep downhill section (from K30 + 889 to K33 + 033). According to Remark 6, the auxiliary adjoint variable has the following relationship: = − < + = 1, and the optimal control sequence is from the brake cruise to traction cruise mode. Two other discontinuities were observed at points where the speed limit changed (K28 + 140 and K34 + 101), and the speed of the train was equal to the speed limit. The control mode sequence was observed from the full braking to traction cruise mode and from the traction cruise to full power mode. For case 2, according to Remark 5, adjoint variable discontinuity occurs at the SSUS of the first steep uphill section (from K9 + 772 to K15 + 686), where the optimal control mode should be the full power mode and the limits of the auxiliary adjoint variable have the following relationship: 1 < − < + . According to Remark 5, another auxiliary adjoint variable discontinuity occurs at the ESDS of the steep downhill section (from K27 + 910 to K32 + 584). This situation is similar to case 1, where the optimal mode should be the coast mode and the auxiliary adjoint variable at the discontinuity point has the following relationship: < − < + < 1. Auxiliary adjoint variable discontinuity also occurs at the speed limit change point (K44 + 232), where the speed of the train becomes equal to the speed limit. In case 3, the discontinuity point first occurs at the ESDS of the first steep downhill section (from K5 + 621 to K8 + 84), where the auxiliary adjoint variable jumps from = − to + = 1 according to Remark 6, and the optimal control sequence is obtained from the brake cruise to traction cruise mode. The discontinuity at the ESDS of the second steep downhill section (from K24 + 446 to K29 + 510) is similar to the previous and will not be repeated here.

Sensitivity analysis
Nine orthogonal experiments of different combinations of offset and scale factors are tested to explore the effects on the convergence rate, as shown in Figure 12. Take the parameters in case 1 as an example. The parameters of the two-step method used in Figures 8-10 are = 1.00 and = 2%, and five iterations were run. A proper combination of and can effectively improve the convergence rate, for example, the combination of = 1.4 and = 2% only needs four iterations taking 382.41 s which is 87.69% that of the previous experiment. The greater and make the convergence process faster when the calculated running time is larger than the scheduled run- in certain kth iteration, which needs more iterations to converge. However, the orthogonal experiments can only point out the best combination of and for this case study. A more reasonable and acceptable method is to relax the tolerance error of calculated running time, such as changing it from 1 to 10 s, which is marked in grey dashed line in Figure 12. Irrespective of the combination of and , the process can always finish within five iterations. Some of them can even achieve in only three iterations, which saves considerable time and has very little impact on the whole railway system.

CONCLUSION
In this study, a novel energy-efficient speed trajectory planner was developed for high-speed trains traveling on tracks with steep gradients and speed limits, especially for the cases in which the speed limit is reached during energy-efficient operation. Unlike the previous studies, wherein the adjoint variable was continuous when using the classical PMP, this study solved the cases in which the speed limit is reached in the steep section, which causes discontinuity in the adjoint variable.
The optimal switching rules and switching positions at the point where the speed limit is reached point in the steep track were derived by additionally analysing the jump condition of state-constrained PMP. A new control mode switching diagram was proposed based on the above rules, which utilised information such as dynamic speed limit, schedule, steep tracks, and train operation rules. A novel two-step algorithm for these cases was proposed. In the first step, the optimal-cruise minimumtime algorithm was designed to obtain the approximate-optimal solution from the perspective of time-energy conversion. In the second step, the search-substitution algorithm adjusted the approximate-optimal solution from the perspective of spaceenergy conversion.
Real-world high-speed rail case studies with an additional TSR zone and different energy recovery ratios were used to demonstrate the effectiveness and flexibility of the proposed method. The results showed that the proposed method saved up to 10% and 3% operational energy in comparison to the MRT and AoTs strategies, respectively. The computing time of the two-step method was significantly better than that of the DDP and RL methods; approximately 0.98% and 1.62% of the computation time of the DDP and RL methods was required. In addition, the proposed method only required a small amount of memory and it can run on most devices.
Future work will further enhance its performance and make it more versatile. For example, the optimal cruise speed in this study was obtained by iteration. With the development of AI, the optimal cruise speed may be obtained directly using static parameters, such as speed limit and track alignment, which may accelerate the convergence rate. Moreover, in some cases, the scheduled running time for different parts of the task may have their own requirements. This method can be modified to calculate these scenarios by transforming the original problem into a multi-stage planning problem with extra connecting constraints.
the optimal solution is obtained by first updating the SSUS position from point b to point f because the actual running speed is V rather than the target speed v bq , followed by using the full power mode, starting from the SSUS with V to link the lefthand and right-hand approximate-optimal speed trajectories on both sides. After determining the optimal switching points, e and g, the limits of the corresponding auxiliary adjoint variable, (x), at the discontinuity point are finally obtained. The final inserted interval is plotted in a red solid line in Figure A.1(b). The results of the first step can be considered the optimal solution if v bq ≥ V , which corresponds to the case in Remark 6 (see Figure 4(f)).

A.3
Algorithm 2: Steep downhill case, with V ≤ v bq < v bz In this example, the high-speed train is scheduled to travel across a steep downhill section, where V ≤ v bq < v bz . The optimal cruise speed, v bq , shown with a pink solid line, is obtained in the first step of the two-step method by assuming = 1. The steep downhill section, bc, is determined by the current running speed, v bq . Algorithm 2 is used owing to V < v bq in the steep downhill section.
According to the algorithm, first, the coast mode is used. The set of orange lines corresponds to Equation (36c) in E 1 (x, V , d ESDS , d SSDS , ), where the train switches the coast mode too early; consequently, it cannot enter the downhill section. The set of purple lines corresponds to Equation (36a), where the train can enter the steep downhill section but its exit speed at the ESDS is lower than the target speed, i.e. v(d c ) ≤ V . The set of blue lines corresponds to Equation (36b), where the train can enter the steep downhill section and leave the section at a speed higher than V . In this example, the optimal TII, v 1 , evaluated by Equation (36) has a non-zero minimum value, indicating that v 1 (arg x v = V ) > . Therefore, the brake cruise mode is not required in this section. This search process is shown in Figure A.2(a).
According to Remark 6, auxiliary adjoint variable discontinuity should occur at the ESDS (see Figure 4(d)). The search results are the substitution part, as shown in Figure A.2(b).

A.4
Algorithm 4: steep downhill case, with v bq < v bz ≤ V The example in this section can be considered as an extension of the example in Appendix A.3. The high-speed train is scheduled to travel across a steep downhill section, be, which is determined by the current running speed, v bq , by assuming = 1 in the first step. This example is solved by Algorithm 4 owing to v bz ≤ V in the steep downhill section.
The search process for the ac interval is skipped because it shares a similar process as the previous example. The difference lies in the manner in which the train leaves the steep downhill section. According to Algorithm 4, E 1 (x, v bq , d ESDS , d ESDS , 1) is used to find the possible optimal solution. The set of blue and green lines corresponds to Equation (36b), where the train exits the steep downhill section at a speed higher than v goal , namely v bq . This solution is shown with a red dashed line, which is also the solution using the classical PMP. However, this solution violates the speed limit at the ESUS. Therefore, this example belongs to the cases of adjoint variable discontinuity, which should be further solved by the stateconstrained PMP. According to Remark 5, the position where the speed limit was reached should be the ESDS (see Figure 4(a)). The optimal solution is obtained by first updating the ESDS from point e to point h because the actual running speed here is V rather than the target speed, followed by using the coast mode, starting from the ESDS with V to link the left-hand and right-hand speed trajectories. After determining the optimal switching points, g and i, the limits of the corresponding auxiliary adjoint variable, (x), are obtained. The gap between intervals ac and gi is linked using the brake cruise mode. The final solution is plotted in a red solid line in