Model Predictive Control of Permanent Magnet Synchronous Motor Based on State Transition Constraint Method

Permanent magnet synchronous motors are widely used and have broad development prospects in the drive systems of electric vehicles. Traditional model predictive control (MPC) methods have been shown to achieve good control performance by tracking the d- and q-axis currents and limiting the current amplitude. However, traditional MPC does not consider the dynamic response or the current harmonics that arise during switching. Therefore, this paper proposes an MPC that effectively improves control performance by including the switch transfer sequence, through a switch constraint module, in the improved model. The state transition error is obtained from the switch constraint module according to the current switch state and the transition probability. It is then integrated into a cost function that also accounts for the driving error, tracking error, and constraint error. A reinforcement learning (RL) algorithm is used to obtain the weight coefficient of the transition error term in the constraint module, so that the cost function automatically determines the best switch state for the next control period. Simulation tests under 20 Nm at 1000 rpm show that the total harmonic distortion (THD) of the phase current is 978.4% with the improved MPC, compared with 2843.0% with the traditional MPC. The torque response time of the motor is reduced by 0.026 s, and simulation results indicate that the 0-100 km/h acceleration performance of an electric vehicle is improved by 9.9%.


Introduction
Permanent magnet synchronous motors (PMSMs) are widely used owing to their small size and high efficiency, and they have strong development prospects for electric vehicle drive systems. Their dynamic response and high-precision control depend directly on the control system, which therefore places high demands on motor control [1][2][3]. Moreover, PMSMs are capable of continuous operation and fault tolerance. Nevertheless, because PMSM parameters vary in practical operation, the control characteristics are generally time-varying and nonlinear.
Model predictive control (MPC) is an advanced control algorithm that is widely used in industrial control. It uses a mathematical model of the controlled object to govern motor motion through rolling (receding-horizon) prediction. The performance objective function determines the optimal control vector by minimizing the error between the actual response vector and the expected vector [4][5][6][7][8][9]. A single-loop MPC combined with moment-of-inertia identification based on the forgetting-factor method, which achieves better dynamic performance than traditional MPC, is proposed in [10]. To obtain better steady-state performance while maintaining good dynamic characteristics, [11] proposes a current prediction method that observes the back electromotive force, which is estimated from the historical stator voltage. A torque predictive control method has been proposed that avoids the difficulty of selecting weight factors in traditional torque predictive control [12]. A simplified finite-control-set method is proposed to reduce the heavy computational burden of traditional finite-control-set methods [13]. To simultaneously predict the future stator current and the disturbances caused by parameter mismatch and current measurement error, a novel current and disturbance observer was proposed [14]. In [15], a deadbeat direct current control method was proposed in which two adjacent active voltage vectors and one zero-voltage vector were applied to an interior PMSM in each control cycle. By considering the sign of the error between the stator current components and their reference values, the method achieved superior performance in torque ripple, stator ripple, and stator-current THD. However, this method considered only PWM modulation and did not consider switch transitions.
An integrated harmonic reference generator has been presented whose control framework allows both fast dynamic torque response during transients and maximum utilization of the drive system over the entire operating range, without switching between control strategies [16]. However, this method requires a highly accurate drive model and must integrate online parameter identification or observer models, which makes the control model more complex to apply.
Reinforcement learning methods are highly effective in path-following control problems, such as the continuous control systems used for online control of hydraulic cylinders [17], intelligent electric motor control [18], and deterministic promotion RL for vehicles [19].
The key advantage of RL lies in its ability to learn from received rewards rather than from ground truth. This allows the controller to respond to unforeseen states of the environment.
In summary, traditional control strategies in motor control systems rely heavily on developer experience. However, an AC motor is a nonlinear, strongly coupled, multivariable control system whose parameters usually change during operation, which degrades control performance. Meanwhile, the reliability and response of the control system are usually the most problematic issues. In addition, the motor control process contains random factors that cannot be captured by an accurate mathematical model. In traditional current MPC, the switch sequence is determined by the control vector, and the resulting current harmonics directly cause torque ripple and speed fluctuation, which can lead to vibration, noise, and deteriorated driving comfort. In particular, no accurate mathematical model can be established for the stochastic factors that also affect performance, and the stochastic factors that affect switch transfer have not been considered in the existing literature.
Hence, this study proposes an improved MPC method that combines a switch constraint module, which considers the switch transfer sequence, with the MPC model. The state transition error is obtained from the switch constraint module according to the current switch state and the transition probability, and an RL algorithm is used to obtain the weight coefficient of the transition error term in the constraint module. The state transition error is integrated into the cost function, which also considers the driving error, tracking error, and constraint error. The optimal control vector, or motor operating state, is then obtained, so that the PMSM dynamic response is improved by constraining the switch state transitions and suppressing current harmonics. The remainder of this paper is organized as follows: the PMSM state-space model, transition predictive control, and cost function are discussed in Section 2; numerical simulation tests are presented in Sections 3 and 4; and the conclusions are summarized in Section 5.

Motor Model and Random Transition
The PMSM is modeled in the dq frame as a state-space system with state variable $x = [i_d, i_q]^T$. In the state matrix, $R$ represents the stator resistance, and $L_d$ and $L_q$ denote the d-axis and q-axis inductances, respectively; $\omega$ denotes the electrical angular speed of the motor, and $\psi$ represents the permanent magnet flux. The d- and q-axis currents are the directly controlled variables. The switch state of the converter, combined with the predicted value of the controlled variable, is the output control variable, given as $u = [u_d, u_q]^T = MDU$. The switching-state vector $U$ is expressed in a three-phase representation system, where $\theta$ denotes the electrical angle of the rotor, and $U$ is determined by the DC supply voltage and the switch status $S$. Here, $S = [S_a, S_b, S_c]^T$ is the switch status vector, $S_y$ ($y = a, b, c$) is the switching state of inverter leg $y$, $U_{dc}$ is the DC supply voltage, and $F$ is a coefficient matrix. The model is completed by the electromagnetic torque equation, the differential equation of the electrical motor speed, and the current prediction model, in which $p$ is the number of pole pairs, $T_l$ is the load torque, $J_m$ is the moment of inertia, $i_d$ and $i_q$ are the d-axis and q-axis current components, respectively, and $N_p$ denotes the prediction step size.
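The current-prediction step can be illustrated with a minimal sketch. It assumes the standard dq-frame PMSM equations $L_d\,di_d/dt = u_d - R i_d + \omega L_q i_q$ and $L_q\,di_q/dt = u_q - R i_q - \omega L_d i_d - \omega\psi$, a forward-Euler discretization with a hypothetical sampling period `Ts`, one prediction step ($N_p = 1$), an amplitude-invariant Clarke transform for $D$ and a Park rotation for $M$, and the common two-level-inverter matrix $F = \frac{1}{3}[[2,-1,-1],[-1,2,-1],[-1,-1,2]]$. All function names and parameter values are illustrative rather than taken from Table 1. The ordering of the eight switch states is also arbitrary here and need not match the paper's numbering 1 to 8.

```python
import itertools

import numpy as np

# Assumed phase-voltage coefficient matrix for a two-level inverter with an
# isolated star point: u_abc = U_dc * F @ S
F = np.array([[2.0, -1.0, -1.0],
              [-1.0, 2.0, -1.0],
              [-1.0, -1.0, 2.0]]) / 3.0

# The 8 switch states S = [Sa, Sb, Sc]; each leg is 0 (lower) or 1 (upper switch on)
SWITCH_STATES = [np.array(s, dtype=float) for s in itertools.product((0, 1), repeat=3)]


def abc_to_dq(x_abc, theta):
    """Amplitude-invariant Clarke transform followed by the Park rotation."""
    a, b, c = x_abc
    x_alpha = (2.0 / 3.0) * (a - 0.5 * b - 0.5 * c)
    x_beta = (b - c) / np.sqrt(3.0)
    d = x_alpha * np.cos(theta) + x_beta * np.sin(theta)
    q = -x_alpha * np.sin(theta) + x_beta * np.cos(theta)
    return np.array([d, q])


def predict_current(x, u_dq, omega, R, Ld, Lq, psi, Ts):
    """One forward-Euler step of the dq current model, x = [i_d, i_q]."""
    i_d, i_q = x
    u_d, u_q = u_dq
    di_d = (u_d - R * i_d + omega * Lq * i_q) / Ld
    di_q = (u_q - R * i_q - omega * Ld * i_d - omega * psi) / Lq
    return np.array([i_d + Ts * di_d, i_q + Ts * di_q])


def predict_all_states(x, theta, omega, U_dc, params):
    """Predicted [i_d, i_q] at step k+1 for each of the 8 switch states (Np = 1)."""
    predictions = []
    for S in SWITCH_STATES:
        u_abc = U_dc * F @ S  # phase voltages produced by this switch state
        u_dq = abc_to_dq(u_abc, theta)
        predictions.append(predict_current(x, u_dq, omega, **params))
    return np.array(predictions)
```

In a finite-control-set MPC loop, the eight predictions returned here are the candidates that the cost function ranks. A multistep horizon ($N_p > 1$) would repeat the Euler step along each candidate switch sequence.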

Cost Function and State Transition Matrix.
To select the switch transfer state for the next time period, the output basic voltage vector was optimized and updated on a rolling basis by constructing an objective function $g$ based on the state transition. Thus, a historical data sequence of switch states over a long period was obtained through repeated operation. The cost function used in the MPC comprises the following terms. $C_D = \lambda_{id} (i_d)^2$ denotes the driving error, which is determined by the d-axis current error. $C_Q = \lambda_{iq} (i_q^* - i_q)^2$ represents the tracking error, which is determined mainly by the q-axis current error, where $i_q^*$ denotes the reference q-axis current. $C_R$ denotes the constraint error, which requires the current magnitude $\sqrt{i_d^2 + i_q^2}$ to stay below the maximum current $I_{max}$. $\lambda_{(\cdot)}$ denotes a weighting coefficient determined from the authors' experience. The definitions of $C_D$, $C_Q$, and $C_R$ follow [20]. $C_T$ denotes the state transition constraint, which is determined by the state transition probability: when the transition probability is large, a smaller weight coefficient is adopted; otherwise, a larger value is adopted. The state transition error $C_T$ is the constraint condition for the state transition, where $P_{ij}$ is the state transition probability calculated using the Markov chain and $\lambda_T$ denotes the weight coefficient of the state transition, which can be obtained from experience or by an intelligent algorithm. RL algorithms were introduced to determine this weight coefficient. As depicted in Figure 1, the basic RL setting consists of an agent and an environment, where the environment can be seen as the problem setting and the agent as the problem solver. At every time step $t$, the agent performs an action $a_t \in A$ on the environment, which affects the state of the environment.
The environment state is then updated from $s_t \in S$ to $s_{t+1}$ based on the previous state and the action $a_t$. Afterward, the agent receives a reward $r_{t+1}$ for taking this action, and the environment presents the agent with a new observation $o_{t+1}$.
For example, in motor control environments, the observations are a concatenation of environmental states and references. Based on this new observation, the agent calculates the next action $a_{t+1}$. The agent's goal is to find an optimal policy $\pi: S \to A$, where $\pi$ maps the set of states $S$ to the set of actions $A$, such that the expected cumulative reward over time is maximized. Owing to the dynamics of the environment, the state and reward at time step $t$ depend on previously taken actions. Therefore, the reward for taking an action is often delayed by multiple time steps.
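The delayed-reward point can be made concrete with the discounted return $G_t = \sum_k \gamma^k r_{t+k+1}$, which is the cumulative reward that policy-gradient agents such as DDPG typically maximize. The following is a minimal sketch that assumes a designer-chosen discount factor `gamma`. The value used in this study is not stated here, and the function name is hypothetical.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = sum_k gamma**k * r_{t+k+1} for every step t of an episode.

    rewards[t] is the reward received after the action taken at step t.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate backwards so each step reuses the return of the following step
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For example, `discounted_returns([0.0, 0.0, 1.0], 0.9)` gives approximately `[0.81, 0.9, 1.0]`. A reward that arrives only at the last step still credits the earlier actions, discounted by their distance from it.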
The RL algorithm was used to calculate the weight coefficient, with the speed fluctuation error, torque ripple error, and current harmonic component as its inputs. The reward was distributed according to the performance of the control system, and the weight coefficient of the cost function was finally obtained. The transfer of the switch between states was treated as a Markov chain process, which describes the relationship between the current state and the next state of a system. $P_{ij}$ denotes the probability of state $i$ of variable $X$ at time step $n + 1$ given the current state $j$ at time step $n$ [21]:

$$P_{ij} = P(X_{n+1} = i \mid X_n = j).$$

Mathematical Problems in Engineering

Because the two-level inverter feeding the three-phase PMSM has only eight switching states, the Markov chain describing the transfer process has only eight states, numbered 1 to 8.
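The transition probabilities defined above can be estimated from the recorded switch-state history by counting observed transitions. The following is a minimal sketch. It assumes that states are numbered 1 to 8 as in the text and that the matrix is indexed `[next - 1, current - 1]` to follow the $P_{ij}$ convention, so each column with observed departures sums to 1. Current states never observed in the history are left as all-zero columns, which mirrors the near-zero probabilities that sparse history produces. The function name is hypothetical.

```python
import numpy as np

N_STATES = 8  # two-level three-phase inverter: 2**3 switch states


def transition_matrix(history):
    """Estimate P[i, j] = P(X_{n+1} = i | X_n = j) from a switch-state sequence.

    history: sequence of state numbers in 1..8, oldest first.
    Returns an 8x8 array indexed [next - 1, current - 1].
    """
    counts = np.zeros((N_STATES, N_STATES))
    for current, nxt in zip(history[:-1], history[1:]):
        counts[nxt - 1, current - 1] += 1
    totals = counts.sum(axis=0)  # number of departures from each current state
    P = np.zeros_like(counts)
    observed = totals > 0
    # Normalize each observed column so it forms a probability distribution
    P[:, observed] = counts[:, observed] / totals[observed]
    return P
```

Because the stored history grows over time, the matrix can be recomputed, or its counts updated incrementally, at each control period.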
These eight states can transfer to one another, but the specific transfer direction and target are determined by the transfer probability. Figure 2 shows a schematic of the state transitions, with arrows indicating the transition directions. The transition time sequence, composed of successive transition states, was input into the inverter as a control parameter. The transition probability matrix was updated in real time from the historical data of the transfer sequence. Some state transfer events did not occur because the amount of historical data was small. As shown in the red box in Figure 3, when the current state is 2 and the next state is 3, the probability of transferring from state 2 to state 3 is 0.9961. However, some state transitions have a probability close to 0, for example, the transition from state 6 to state 7. Such a small transition probability means that this transition almost never occurs in the historical data. The calculation of the improved MPC cost function is shown in Figure 4. First, the state transition probability was calculated from the transfer sequence in the switch history to obtain the state transition matrix. Second, the possible transition states in the next time period were estimated from the current state and the state transition probability. Then, both the possible transition states and their transition probabilities were input into the cost function, and the value of the state transition constraint term was obtained from (9). Analogously, the reference and feedback currents of the d- and q-axes, the motor speed, and the rotation angle were input into the MPC module to obtain the driving error, tracking error, and constraint error. The value of the objective function was then obtained by substituting the state transition error into the cost function.
Finally, the optimal switching sequence, determined by the minimum value of the objective function, was selected as the actual switching output of the next control period.
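The selection step described above can be sketched as follows. The exact forms of $C_T$ in (9) and of $C_R$ are not reproduced in this section, so both are assumptions here. $C_T = \lambda_T (1 - P_{ij})$ is a hypothetical form that penalizes improbable transitions, and $C_R$ is taken as a large penalty whenever $i_d^2 + i_q^2 > I_{max}^2$. The d-axis reference is taken as zero, which is consistent with $C_D = \lambda_{id}(i_d)^2$. The weights, the penalty value, and the function names are all illustrative.

```python
import numpy as np


def cost(i_pred, iq_ref, p_transition, I_max,
         lam_id=1.0, lam_iq=1.0, lam_T=1.0, big_penalty=1e6):
    """g = C_D + C_Q + C_R + C_T for one candidate switch state (assumed forms)."""
    i_d, i_q = i_pred
    C_D = lam_id * i_d ** 2                      # driving error (i_d reference = 0)
    C_Q = lam_iq * (iq_ref - i_q) ** 2           # q-axis tracking error
    C_R = big_penalty if i_d ** 2 + i_q ** 2 > I_max ** 2 else 0.0  # current limit
    C_T = lam_T * (1.0 - p_transition)           # hypothetical transition term
    return C_D + C_Q + C_R + C_T


def select_next_state(i_preds, iq_ref, P, current_state, I_max, **weights):
    """Return the switch state (1..8) that minimizes g.

    i_preds: (8, 2) array of predicted [i_d, i_q] for candidate states 1..8.
    P: transition matrix indexed [next - 1, current - 1].
    """
    costs = [
        cost(i_preds[k], iq_ref, P[k, current_state - 1], I_max, **weights)
        for k in range(len(i_preds))
    ]
    return int(np.argmin(costs)) + 1
```

When two candidates track the reference equally well, the transition term breaks the tie in favor of the historically more probable transition. Raising `lam_T` strengthens that preference, and in this study that weight is what the RL agent tunes.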
Stability is one of the most important topics in control. The Markov chain can be used to analyze practical stability, and the method can also be extended to stochastic systems whose inputs, outputs, and disturbances contain random or uncertain factors [22,23].

Numerical Simulations
The improved MPC strategy and the traditional MPC were simulated on the MATLAB/Simulink platform. The traditional MPC considered only three types of error: the driving error, tracking error, and constraint error. Its weighting coefficients were set empirically. A simulation diagram of the control strategy is shown in Figure 5. It includes the MPC module, inverter module, power supply module, motor model module, historical data collection module, state transition probability module, and observer module. The transition state of the power transistors was controlled by the switch sequence, which is the input signal of the inverter in the next control period. Switch-state data were stored in data storage, and the amount of historical data grew over time. Therefore, the calculated transition probabilities became increasingly accurate. The size of the data set can be determined from experience according to practical demand or calibrated experimentally. The historical data in the simulations were applied to a single PMSM only.
A power battery was used as the power supply, and the inverter converted its direct current into alternating current. The switch signal, or pulse-width-modulated signal, was used as the ON/OFF signal of the switching transistors.
The three-phase currents of the PMSM measured by the current sensor were input into the observer module. The phase currents were then passed to the Clarke and Park transformation module to obtain the d- and q-axis feedback currents.

Analogously, the motor angle, actual rotation speed, target speed, and target torque were input into the observer module.
Then, the d- and q-axis reference currents were obtained from this module and passed to the MPC module.
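The current feedback path described above, a Clarke transformation followed by a Park transformation, can be sketched as follows. The amplitude-invariant (2/3) scaling is an assumption because the scaling used in the model is not stated here, and the function names are illustrative.

```python
import math


def clarke(i_a, i_b, i_c):
    """Amplitude-invariant Clarke transform: abc -> alpha-beta."""
    i_alpha = (2.0 / 3.0) * (i_a - 0.5 * i_b - 0.5 * i_c)
    i_beta = (i_b - i_c) / math.sqrt(3.0)
    return i_alpha, i_beta


def park(i_alpha, i_beta, theta):
    """Park transform: alpha-beta -> rotor-aligned d-q (theta = electrical angle)."""
    i_d = i_alpha * math.cos(theta) + i_beta * math.sin(theta)
    i_q = -i_alpha * math.sin(theta) + i_beta * math.cos(theta)
    return i_d, i_q


def abc_to_dq(i_a, i_b, i_c, theta):
    """Measured phase currents -> d- and q-axis feedback currents."""
    return park(*clarke(i_a, i_b, i_c), theta)
```

With this scaling, balanced sinusoidal phase currents of amplitude $I$ that are in phase with the rotor angle map to the constant values $i_d = I$ and $i_q = 0$. This is why the MPC can regulate DC-like quantities in the dq frame.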
The parameter values of the motor control system used in the numerical simulation and bench test are listed in Table 1.
This study verified the correctness of the method under different conditions in MATLAB. The target was to drive the motor speed from 0 to 1000 rpm. Figures 6 and 7 show the torque response characteristics of the traditional MPC and the improved MPC. The simulation results show that the improved MPC responded faster in torque than the traditional MPC and effectively reduced torque fluctuation. Torque and speed were corrected when the transition probability was included in the improved MPC model alongside the driving error, tracking error, and current constraint error. Figure 6 shows that both the improved MPC ((b) and (d)) and the traditional MPC ((a) and (c)) respond quickly to the target speed. The steady-state response time was 0.021 s for the improved MPC and 0.032 s for the traditional MPC without the state transition constraint. Panels (c) and (d) show that the phase current fluctuation of the improved MPC was smaller than that of the traditional MPC without the state transition constraint. Figure 7 compares the torque and phase current of the two methods under a load torque of 20 Nm at 1000 rpm. As shown in (e) and (f), the improved method has the lower current ripple. Comparing (a) and (c) with (b) and (d) shows that the dynamic torque response time of the improved MPC was 0.005 s, which is 0.026 s faster than the 0.031 s of the traditional MPC. The proposed control method therefore improves the response speed of the motor and allows it to reach the target torque faster. Figure 8 compares the current responses of the two methods, and the simulation results show that the improved MPC effectively reduced current fluctuation. The q-axis current comparison shows that the probability constraint error term in the cost function had a positive influence on the current response.
The same trend was also found in the simulation results for the d-axis current. In essence, MPC here is current predictive control. Traditional model predictive control considers only the tracking, driving, and constraint errors of the current. In the improved model, a state transition constraint was added to the prediction, and its weight coefficient was obtained through reinforcement learning. To obtain a good weight coefficient, the harmonic terms in the RL reward function were restricted. The frequency spectrum of the A-phase current was analyzed for both control models. As shown in Figure 9, the THD of the phase current was 978.4% with the improved MPC, compared with 2843.0% with the traditional MPC. This verifies the effectiveness of the proposed control method in suppressing current harmonics and torque ripple. The switch transfer states, speed fluctuation, and torque ripple were treated as objectives in the optimization function, and this mechanism steers the RL training in a favorable direction: if the speed fluctuation or torque ripple decreases, the reward factor in the RL reward function is increased. Hence, the control vectors output by the improved MPC yielded better PMSM performance than those of the traditional MPC.
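The THD figures above come from a spectrum analysis of the A-phase current. The following is a minimal FFT-based sketch. It assumes a uniformly sampled record spanning an integer number of fundamental periods and uses the common definition $\mathrm{THD} = \sqrt{\sum_{h \ge 2} I_h^2} / I_1$ truncated at a chosen harmonic order. The analysis window and harmonic range used in the paper are not given here, and the function name is illustrative.

```python
import numpy as np


def thd(signal, fs, f1, max_harmonic=40):
    """THD = sqrt(sum of squared harmonic amplitudes, h = 2..max_harmonic) / fundamental.

    Assumes `signal` is sampled at `fs` Hz and spans an integer number of
    periods of the fundamental `f1`, so each harmonic h*f1 lands on an FFT bin.
    """
    n = len(signal)
    amplitudes = np.abs(np.fft.rfft(signal)) * 2.0 / n  # single-sided amplitude spectrum
    bin_width = fs / n

    def harmonic_amplitude(h):
        k = int(round(h * f1 / bin_width))
        return amplitudes[k] if k < len(amplitudes) else 0.0

    fundamental = harmonic_amplitude(1)
    harmonics = np.array([harmonic_amplitude(h) for h in range(2, max_harmonic + 1)])
    return float(np.sqrt(np.sum(harmonics ** 2)) / fundamental)
```

Multiplying the result by 100 gives THD in percent. If the record does not span whole periods, spectral leakage inflates the result, and a window function or synchronous resampling would be needed.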
The deep deterministic policy gradient (DDPG) algorithm was used to determine the weight coefficient. Figure 10 shows the training progress of the DDPG agent. The lines represent the episode reward, the average reward, and the discounted long-term reward estimate (episode Q0) during training. The horizontal axis indicates the number of training episodes.
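The reward shaping described above gives a larger reward when speed fluctuation, torque ripple, or current harmonics decrease. One way to express this is a negative weighted sum of the three performance errors over an evaluation window. This form is hypothetical: the reward function used for the DDPG agent is not stated here, and the error metrics (RMS speed error, peak-to-peak torque) and weights are illustrative choices.

```python
import numpy as np


def window_reward(speed, speed_ref, torque, current_thd, weights=(1.0, 1.0, 1.0)):
    """Hypothetical reward: higher when speed fluctuation, torque ripple, and THD are lower."""
    speed = np.asarray(speed, dtype=float)
    torque = np.asarray(torque, dtype=float)
    speed_fluctuation = float(np.sqrt(np.mean((speed - speed_ref) ** 2)))  # RMS speed error
    torque_ripple = float(torque.max() - torque.min())                    # peak-to-peak torque
    w_speed, w_torque, w_thd = weights
    # Negative weighted penalty: shrinking any error term raises the reward
    return -(w_speed * speed_fluctuation + w_torque * torque_ripple + w_thd * current_thd)
```

Because the three terms have different units, the weights effectively normalize them. Their relative size determines which performance aspect the agent prioritizes when it adjusts $\lambda_T$.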

Simulation Case of Electric Vehicle Acceleration
To verify the effectiveness of the proposed control method, an electric vehicle dynamics model and control model were established on the MATLAB/Simulink platform, and a 0-100 km/h acceleration test of the vehicle was performed. The driving equation of the vehicle is expressed by (13) and (14).
where $F$ is the sum of the vehicle resistances; $\lambda$ is the conversion coefficient of the vehicle's rotating mass; $m$ is the vehicle mass; $a$ is the longitudinal acceleration of the vehicle; and $T$, $i_g$, and $i_0$ are the driving torque from the engine or motor, the transmission ratio, and the final drive ratio, respectively. $\eta$, $r$, and $G$ denote the mechanical transmission efficiency, the wheel rolling radius, and the vehicle weight, respectively, whereas $f$, $\alpha$, $C_D$, $A$, and $v$ represent the rolling resistance coefficient, the road gradient, the aerodynamic drag coefficient, the frontal (windward) area, and the vehicle speed, respectively. The vehicle parameters used in the simulations are listed in Table 2. As shown in Figure 11, the 0-100 km/h acceleration time was 10.8 s with the traditional MPC and 9.395 s with the improved MPC, an improvement of 9.9%.
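The acceleration-time estimate can be sketched by integrating the longitudinal force balance. This sketch assumes the common form $T i_g i_0 \eta / r = G f \cos\alpha + G \sin\alpha + \tfrac{1}{2}\rho C_D A v^2 + \lambda m a$ in SI units with an assumed air density $\rho$, a forward-Euler time step, and a caller-supplied motor torque-speed curve. Equations (13) and (14) may be written differently (for example, with $v$ in km/h), and every numerical value used with this function is hypothetical rather than taken from Table 2.

```python
import math


def time_to_speed(torque_fn, m, lam, i_g, i_0, eta, r, f, alpha, C_D, A,
                  v_target=100 / 3.6, rho=1.2, g=9.81, dt=1e-3, t_max=60.0):
    """Integrate lam*m*dv/dt = F_t - F_resist until v reaches v_target (m/s).

    torque_fn(omega_motor) returns motor torque (N*m) at shaft speed omega_motor (rad/s).
    alpha is the road gradient angle in radians.
    """
    G = m * g  # vehicle weight
    v, t = 0.0, 0.0
    while v < v_target:
        if t > t_max:
            raise RuntimeError("target speed not reached within t_max")
        omega_motor = v * i_g * i_0 / r  # motor shaft speed from wheel speed
        F_t = torque_fn(omega_motor) * i_g * i_0 * eta / r  # tractive force
        F_resist = (G * f * math.cos(alpha) + G * math.sin(alpha)
                    + 0.5 * rho * C_D * A * v ** 2)  # rolling + grade + aero
        a = (F_t - F_resist) / (lam * m)
        v += a * dt
        t += dt
    return t
```

A controller that reaches the target torque faster can be represented by a different `torque_fn` (or by adding a torque rise-time model). Comparing the two returned times then gives the relative improvement in acceleration time.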

Conclusion
This paper proposes an improved MPC for a PMSM drive system. The transition probability is calculated from historical switch transfer data, and a probability constraint error term based on it is included in the MPC cost function, which determines the output control vector of the PMSM system. The Markov chain method was used to compute the switch state transfer probabilities from historical data in real time, and the transition constraint error term was obtained by combining the present state and the next state with this probability. The driving error, tracking error, constraint error, and transition constraint error were included in the cost function to optimize the switch transition vector, and RL was used to obtain the weight coefficient of the transition constraint error term. The effectiveness of the proposed method was verified through numerical simulation tests. The torque response simulations showed that the torque response time of the motor was reduced by 0.026 s. The frequency spectrum of the A-phase current showed that, under 20 Nm at 1000 rpm, the THD of the phase current with the improved MPC was 1864.6 percentage points lower than with the traditional MPC (978.4% versus 2843.0%). In addition, the effectiveness of the proposed motor control method was verified through a 0-100 km/h acceleration simulation.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.