Article

Feature-Based MPPI Control with Applications to Maritime Systems †

1 Institute of System Dynamics, HTWG Konstanz—University of Applied Sciences, 78462 Konstanz, Germany
2 Department of Microsystems Engineering (IMTEK), University of Freiburg, 79110 Freiburg, Germany
3 Department of Mathematics, University of Freiburg, 79110 Freiburg, Germany
* Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in Proceedings of the 6th IEEE Conference on Control Technology and Applications (CCTA), Trieste, Italy, 22–25 August 2022, with the title “Feature-Based Proposal Density Optimization for Nonlinear Model Predictive Path Integral Control”.
Machines 2022, 10(10), 900; https://doi.org/10.3390/machines10100900
Submission received: 13 September 2022 / Revised: 27 September 2022 / Accepted: 29 September 2022 / Published: 6 October 2022
(This article belongs to the Special Issue Nonlinear Control Applications and New Perspectives)

Abstract

In this paper, a novel feature-based sampling strategy for nonlinear Model Predictive Path Integral (MPPI) control is presented. Using the MPPI approach, the optimal feedback control is calculated by solving a stochastic optimal control problem (OCP) online by evaluating the weighted inference of sampled stochastic trajectories. While the MPPI algorithm can be parallelized excellently, the closed-loop performance strongly depends on the information quality of the sampled trajectories. To draw samples, a proposal density is used. The solver's, and thus the controller's, performance is high if the sampled trajectories drawn from this proposal density are located in low-cost regions of the state-space. In classical MPPI control, the explored state-space is strongly constrained by assumptions that refer to the control value's covariance matrix, which are necessary for transforming the stochastic Hamilton–Jacobi–Bellman (HJB) equation into a linear second-order partial differential equation. To achieve excellent performance even with discontinuous cost functions, in this novel approach, knowledge-based features are introduced to constitute the proposal density and thus the low-cost region of the state-space for exploration. This paper addresses the question of how the performance of the MPPI algorithm can be improved using a feature-based mixture of base densities. Furthermore, the developed algorithm is applied to an autonomous vessel that follows a track and concurrently avoids collisions using an emergency braking feature. Therefore, the presented feature-based MPPI algorithm is applied and analyzed in both simulation and full-scale experiments.

1. Introduction

Recently, empowered by progress in the efficiency of optimization algorithms and in the available computing power, sample-based nonlinear model predictive control (NMPC) approaches can be used to solve nonlinear stochastic OCPs in real time [1]. Using the path integral control framework [2], the stochastic HJB equation that corresponds to a restricted class of stochastic OCPs can be transformed into a linear second-order partial differential equation [3]. This class is characterized by arbitrary but input-affine dynamics containing additive white Gaussian noise (AWGN) at the inputs, and the cost function is only restricted to be quadratic in the controls [4]. According to [3], a Feynman–Kac path integral can be used to express the solution for both the optimal controls and the optimal value function. Thus, the solution of the transformed stochastic HJB equation is formulated as a conditional expectation value with respect to the system dynamics. As a result, the optimal control can be estimated using Monte Carlo methods by drawing samples of stochastic trajectories [5]. While in general the resulting optimal feedback control function has an unknown structure, there are different approaches for its representation. Besides reinforcement learning (RL) approaches based on learning a parametrized policy offline [6,7], NMPC has become the de facto technological standard [8,9]. Based on path integrals, a new type of sample-based NMPC algorithm was presented in [8]. Using a free energy definition described in [10,11], the input-affine requirement is completely removed in [4]. Because MPPI is based on Monte Carlo simulation, the information content of the drawn samples depends strongly on the proposal density [12,13]. Due to this fact, the system's performance can be significantly improved by providing a good proposal density. While a robust MPPI version [14], a covariance steering approach [15] and a learning-based algorithm [16] have recently been introduced, in this paper, a novel feature-based MPPI extension is presented. In Section 2, the basics of the MPPI approach are described. This is followed by the derivation of a feature-based extension of the MPPI algorithm in Section 3. In Section 4, an application scenario is defined containing the equations of motion of a vessel and cost functions, which are combined into an OCP. Furthermore, an emergency braking feature is defined. In Section 5, the controller design is presented, including the architecture, the controller parameters and the cost function parameters. A comparison of standard MPPI control and its feature-based extension in a simulation environment is given in Section 6. In Section 7, full-scale experiments with the research vessel Solgenia are used to compare the MPPI algorithm and its feature-based extension. This is followed by the conclusions of this paper, including ideas for future work, in Section 8.

2. Model Predictive Path Integral Control

Recently, a sample-based NMPC algorithm was derived by [4] using the path integral framework. This so-called MPPI approach calculates the optimal control inputs by numerically solving the time-discrete nonlinear stochastic OCP
$\min_{U}\; \mathbb{E}_{\mathbb{Q}}\Big[\, \phi(X_T) + \sum_{t=0}^{T-1}\big[\, C(X_t) + u_t^{\top} R\, u_t \,\big] \;\Big|\; X_0 = x_0 \Big]$ (1a)
with $X_{t+1} = F(X_t, v_t)$, $t \in \{0, 1, \ldots, T-1\}$, and $v_t = u_t + \epsilon_t$, $\epsilon_t \sim \mathcal{N}(0, \Sigma)$, $R = \lambda \Sigma^{-1}$, $\lambda \in \mathbb{R}^{+}$ (1b)
in real time, where $F: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ denotes the time-discrete nonlinear system dynamics of the stochastic system state $X_t \in \mathbb{R}^n$, and the actual system input denoted by $v_t \in \mathbb{R}^m$ is given by the commanded system input $u_t \in \mathbb{R}^m$ with time-discrete AWGN $\epsilon_t \in \mathbb{R}^m$ with covariance matrix $\Sigma \in \mathbb{R}^{m \times m}$. The so-called temperature of the stochastic system is denoted by $\lambda$. The objective function is given by the expected value of the terminal costs denoted by $\phi(X_T)$, the instantaneous stage costs denoted by $C(X_t)$ and a quadratic input term with weighting matrix $R \in \mathbb{R}^{m \times m}$. The objective function to be minimized in (1a) evaluates the expected costs subject to the commanded system inputs $U = \{u_0, u_1, \ldots, u_{T-1}\} \in \mathbb{R}^{m \times T}$ under the measurement distribution $\mathbb{Q}$ corresponding to the probability density function (PDF)
$q(V) = \prod_{t=0}^{T-1} \big[(2\pi)^m |\Sigma|\big]^{-\frac{1}{2}} \exp\!\Big( -\tfrac{1}{2}\, (v_t - u_t^{P})^{\top} \Sigma^{-1} (v_t - u_t^{P}) \Big),$ (2)
where $V = \{v_0, v_1, \ldots, v_{T-1}\}$ denotes the sequence of actual input values. The distribution of the controlled system denoted by $\mathbb{Q}$ corresponds to an open-loop sequence of manipulated variables $u_t = u_t^{P}$ for $t = 0, 1, \ldots, T-1$. According to [17], the expectation value under a measure is defined as $\mathbb{E}_{\mathbb{Q}}(\rho(V)) := \int \rho(V)\, q(V)\, \mathrm{d}V$, where $\rho$ denotes a scalar function. In this section, the basics of this novel algorithm are described before it is extended in the next section. According to [4], the density function of the uncontrolled system denoted by $\mathbb{P}$ with zero input $u_t = 0$, $t = 0, 1, \ldots, T-1$, leads to the PDF
$p(V) = \prod_{t=0}^{T-1} \big[(2\pi)^m |\Sigma|\big]^{-\frac{1}{2}} \exp\!\Big( -\tfrac{1}{2}\, v_t^{\top} \Sigma^{-1} v_t \Big).$ (3)
An initial state $x_0$ and a realized sequence of input values $V_k$ can be uniquely assigned to a trajectory without stochastic influence by recursively applying (1b). Introducing the cumulated state-dependent path costs
$S(V_k) = \phi(x_{k,T}) + \sum_{t=0}^{T-1} C(x_{k,t})$ (4)
according to [4], the value function of the OCP (1a) is given by
$V(t, x_t) = -\lambda \log \mathbb{E}_{\mathbb{P}}\Big[\, e^{-\frac{1}{\lambda} S} \,\Big|\, X_t = x_t \Big],$ (5)
with respect to the uncontrolled dynamics [4]. To express the value function with respect to $\mathbb{Q}$, the likelihood ratio $p(V)/q(V)$ must be introduced, which yields
$V(t, x_t) = -\lambda \log \mathbb{E}_{\mathbb{Q}}\Big[\, \tfrac{p(V)}{q(V)}\, e^{-\frac{1}{\lambda} S} \,\Big|\, X_t = x_t \Big].$ (6)
Applying Jensen’s inequality according to [4] yields
$V(t, x_t) \leq -\lambda\, \mathbb{E}_{\mathbb{Q}}\Big[\, \log\!\Big( \tfrac{p(V)}{q(V)}\, e^{-\frac{1}{\lambda} S} \Big) \,\Big|\, X_t = x_t \Big],$
where the bound is tight with
$q^{*}(V) = \frac{1}{\eta} \exp\!\Big( -\frac{1}{\lambda} S(V) \Big)\, p(V),$ (7)
where η R denotes a normalization constant. In [4], it is shown that the associated optimal control values are given by
$u_t^{*} = \arg\min_{u_t} D_{\mathrm{KL}}\big( \mathbb{Q}^{*} \,\|\, \mathbb{Q} \big) = \mathbb{E}_{\mathbb{Q}^{*}}[\,v_t\,],$ (8)
where D K L denotes the Kullback–Leibler divergence, and Q * denotes the abstract optimal distribution. According to [4], the optimal input is given by
$u_t^{*} = \int_{\Omega_V} q(V)\, \underbrace{\tfrac{q^{*}(V)}{p(V)}\, \tfrac{p(V)}{q(V)}}_{\omega(V)}\, v_t \,\mathrm{d}V$ (9a)
$\phantom{u_t^{*}} = \mathbb{E}_{\mathbb{Q}}\{ \omega(V)\, v_t \}$ (9b)
minimizing (1a), where Ω V denotes the image of the sample space, and the importance weighting
$\omega(V) = \frac{1}{\eta} \exp\!\Big( -\frac{1}{\lambda} S(V) + \sum_{t=0}^{T-1} \big[\, \tfrac{1}{2}\, u_t^{\top} \Sigma^{-1} u_t - v_t^{\top} \Sigma^{-1} u_t \,\big] \Big)$ (10)
can be calculated using the PDFs (2), (3) and (7). Using Monte Carlo simulation, (9a) can be estimated via the iterative update law
$u_t^{i+1} = u_t^{i} + \sum_{n=1}^{N} \omega(V_n)\,\big( v_t^{n} - u_t^{i} \big),$ (11)
where N samples are drawn from the system dynamics (1b) with the commanded control input sequence U = { u 0 , u 1 , , u T 1 } . The iterative procedure described in Algorithm 1 is used to estimate the optimal commanded control input, and to improve the required importance sample distribution Q simultaneously.
Algorithm 1 Optimize Control Sequence (OCS) acc. to [4]
Input: F : Transition model;
  K: Number of samples;
  T: Number of timesteps;
   U I : Initial control sequence;
   x 0 : Recent state estimate;
   Σ , ϕ , C , λ : Control hyper-parameters;
Output: U O : Optimized control sequence
   S ¯ : Average costs;
1:  for $k \in \{0, 1, \ldots, K-1\}$ do
2:      $x_{k,0} \leftarrow x_0$;
3:      Sample $\{\epsilon_0^{k}, \epsilon_1^{k}, \ldots, \epsilon_{T-1}^{k}\}$;
4:      $S(k) \leftarrow 0$;
5:      for $t \in \{1, 2, \ldots, T\}$ do
6:          $x_{k,t} \leftarrow F(x_{k,t-1},\, u_{t-1}^{I} + \epsilon_{t-1}^{k})$;
7:          $S(k) \mathrel{+}= C(x_{k,t}) + \lambda\, (u_{t-1}^{I})^{\top} \Sigma^{-1} \epsilon_{t-1}^{k}$;
8:      end for
9:      $S(k) \mathrel{+}= \phi(x_{k,T})$;
10:  end for
11:  $\beta \leftarrow \min_{k} [S(k)]$;
12:  $\eta \leftarrow \sum_{k=0}^{K-1} \exp\!\big( -\tfrac{1}{\lambda}(S(k) - \beta) \big)$;
13:  for $k \in \{0, 1, \ldots, K-1\}$ do
14:      $\omega(k) \leftarrow \tfrac{1}{\eta} \exp\!\big( -\tfrac{1}{\lambda}(S(k) - \beta) \big)$;
15:  end for
16:  for $t \in \{0, 1, \ldots, T-1\}$ do
17:      $u_t^{O} \leftarrow u_t^{I} + \sum_{k=0}^{K-1} \omega(k)\, \epsilon_t^{k}$;
18:  end for
19:  $\bar{S} \leftarrow \tfrac{1}{K} \sum_{k=0}^{K-1} S(k)$;
20:  return $U^{O} = \{u_0^{O}, u_1^{O}, \ldots, u_{T-1}^{O}\}$ and $\bar{S}$
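For illustration, a minimal NumPy sketch of Algorithm 1 under the stated assumptions could look as follows. The transition model F, the stage costs C, the terminal costs phi and the random generator rng are placeholders, and all names are illustrative rather than taken from the original implementation.
```python
import numpy as np

def optimize_control_sequence(F, C, phi, U, x0, Sigma, lam, K, T, rng):
    """Minimal sketch of Algorithm 1 (OCS): one MPPI update of the commanded
    control sequence U (shape (T, m)). F, C and phi are placeholder callables
    for the transition model, stage costs and terminal costs."""
    m = U.shape[1]
    Sigma_inv = np.linalg.inv(Sigma)
    S = np.zeros(K)                                                 # path costs S(k)
    eps = rng.multivariate_normal(np.zeros(m), Sigma, size=(K, T))  # AWGN samples

    for k in range(K):                               # lines 1-10: K noisy rollouts
        x = np.array(x0, dtype=float)
        for t in range(T):
            x = F(x, U[t] + eps[k, t])               # actual input v_t = u_t + eps_t
            S[k] += C(x) + lam * U[t] @ Sigma_inv @ eps[k, t]
        S[k] += phi(x)                               # line 9: terminal costs

    beta = S.min()                                   # line 11: baseline for numerics
    w = np.exp(-(S - beta) / lam)                    # lines 12-15: importance weights
    w /= w.sum()

    U_new = U + np.einsum('k,ktm->tm', w, eps)       # lines 16-18: weighted update
    return U_new, S.mean()                           # line 19: average costs
```
In each controller cycle, such a routine refines both the commanded control sequence and, implicitly, the proposal density used in the next iteration.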

3. Extension to Feature-Based Proposal Density

In this section, the exploration problem of standard MPPI control is described, the idea of the feature-based extension is introduced and the resulting feature-based MPPI algorithm is presented.

3.1. Exploration Problem of Classical MPPI Control

To improve the sampling efficiency, both the value function (5) and the optimal control sequence (9b) can be calculated by sampling trajectories under the probability measure $\mathbb{Q}$ with the proposal PDF (2). Due to the restrictive assumption $R = \lambda \Sigma^{-1}$, the proposal PDF is only parametrized by a sequence of inputs $U^{P} = \{u_0^{P}, u_1^{P}, \ldots, u_{T-1}^{P}\}$. The sampling efficiency depends on the quality of the proposal density [13]. In [4], the recent estimate $U_t$ of the optimal control sequence $U^{*}$ is used to determine the proposal PDF for the next sampling iteration. The objective function (1a) is not necessarily convex. Using an infinite number of samples, the MPPI algorithm is a global minimizer. However, due to a finite number of drawable samples, the explored state-space is concentrated around the last approximation. In order not to remain in a local minimum, the explored state-space must be enlarged. An exemplary possibility to enlarge the explored state-space is shown in Figure 1.

3.2. Feature-Based Extension of the Search Space

An enlargement of the explored state-space is possible by introducing an additional feature-based proposal density to draw a part of the samples from. Therefore, a feature is implicitly defined by solving the artificially designed OCP
$U^{F} = \arg\min_{U}\; \mathbb{E}_{\mathbb{Q}_F}\Big[\, \phi_F(X_T) + \sum_{t=0}^{T-1}\big[\, C_F(X_t) + u_t^{\top} R\, u_t \,\big] \;\Big|\; X_0 = x_0 \Big]$ (12a)
with $X_{t+1} = F(X_t, v_t)$ and $v_t \sim \mathcal{N}(u_t, \Sigma)$, $R = \lambda \Sigma^{-1}$, (12b)
where $\phi_F: \mathbb{R}^n \to \mathbb{R}$ denotes the terminal costs of a feature and $C_F: \mathbb{R}^n \to \mathbb{R}$ denotes the instantaneous costs of a feature. Thus, the feature-based proposal density is given by
$q_F(V) = \prod_{t=0}^{T-1} \big[(2\pi)^m |\Sigma|\big]^{-\frac{1}{2}} \exp\!\Big( -\tfrac{1}{2}\, (v_t - u_t^{F})^{\top} \Sigma^{-1} (v_t - u_t^{F}) \Big),$ (13)
parametrized by a sequence of inputs $U^{F} = \{u_0^{F}, u_1^{F}, \ldots, u_{T-1}^{F}\}$. Analogous to the previous section, the stochastic OCP (12a) can be solved by using the MPPI algorithm.

3.3. Resulting Feature-Based MPPI Algorithm

The resulting feature-based MPPI algorithm is presented in Algorithm 2. First, the control sequence is optimized using Algorithm 1, and the predicted costs are evaluated. Then, all feature control sequences are improved, and their performances regarding the main cost function are evaluated. The best control sequence is chosen to be the main control sequence. Then, the first element of the control sequence is applied. Subsequently, the sequence is shifted, and the last element is initialized. Note: In the implementation, lines 4 and 5 can be combined to reduce the computational effort.
Algorithm 2 Feature-Based MPPI Control
Input: F : Transition model;
  K: Number of samples;
  T: Number of timesteps;
   U 0 : Initial control sequence;
   x 0 : Recent state estimate;
   Σ , ϕ , C , λ : Control hyper-parameters;
  I: Number of features with feature index i = { 1 , 2 , , I } ;
   U i : Initial control sequence of i th feature;
   ϕ i , C i : State dependent costs of i th feature;
   K i : Number of i th features samples;
1:  while Controller is active do
2:      $[U^{0}, \bar{S}_{0}] \leftarrow \mathrm{OCS}(K, T, U^{0}, x_0, \Sigma, \phi, C, \lambda)$
3:      for $i \in \{1, 2, \ldots, I\}$ do
4:          $[U^{i}, \sim] \leftarrow \mathrm{OCS}(K_i, T, U^{i}, x_0, \Sigma, \phi_i, C_i, \lambda)$
5:          $[\sim, \bar{S}_{i}] \leftarrow \mathrm{OCS}(K_i, T, U^{i}, x_0, \Sigma, \phi, C, \lambda)$
6:      end for
7:      $i^{*} \leftarrow \arg\min_{i} \bar{S}_{i}$, with $i = 0, 1, \ldots, I$
8:      $U^{0} \leftarrow U^{i^{*}}$
9:      SendToActuator $u_0^{0}$;
10:     for $t \in \{1, 2, \ldots, T-1\}$ do
11:         $u_{t-1}^{0} \leftarrow u_t^{0}$
12:     end for
13:     $u_{T-1}^{0} \leftarrow$ Initialize $u_{T-1}^{0}$
14:  end while
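Analogously, one cycle of Algorithm 2 can be sketched in code, reusing the optimize_control_sequence() sketch from Section 2. The feature cost pairs, the state estimate and the actuator interface are placeholders; the re-initialization of the last element by simple repetition is one possible choice, not necessarily the one used in the original implementation.
```python
import numpy as np

def feature_based_mppi_step(F, C, phi, features, U, U_feat, x0,
                            Sigma, lam, K, K_feat, T, rng):
    """One cycle of Algorithm 2. `features` is a list of (C_i, phi_i) cost pairs,
    U the main sequence and U_feat the list of feature sequences (shape (T, m))."""
    # Line 2: optimize the main sequence and record its predicted average costs
    U, S_bar = optimize_control_sequence(F, C, phi, U, x0, Sigma, lam, K, T, rng)
    candidates = [(S_bar, U)]

    # Lines 3-6: improve each feature sequence on its own costs, then score it
    # with the main cost function (in an implementation both calls can be fused)
    for i, (C_i, phi_i) in enumerate(features):
        U_feat[i], _ = optimize_control_sequence(F, C_i, phi_i, U_feat[i],
                                                 x0, Sigma, lam, K_feat, T, rng)
        _, S_bar_i = optimize_control_sequence(F, C, phi, U_feat[i],
                                               x0, Sigma, lam, K_feat, T, rng)
        candidates.append((S_bar_i, U_feat[i]))

    # Lines 7-8: select the candidate with the lowest predicted average costs
    _, U_best = min(candidates, key=lambda c: c[0])

    u_apply = U_best[0]                            # line 9: first control to actuators
    U = np.vstack([U_best[1:], U_best[-1:]])       # lines 10-13: shift; last element
    return u_apply, U, U_feat                      # is simply repeated in this sketch
```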

4. Maritime Application Scenario

Autonomous mobility of maritime systems is a central problem of our time. Due to advances in science and technology, the performance requirements are constantly increasing. For example, recent publications such as [18,19,20] show how NMPC algorithms are used in the maritime environment to solve complex control engineering problems. Moreover, algorithms in the context of RL such as Deep Q-learning [21], Deep Deterministic Policy Gradients [22] or Actor–Critic [23] are increasingly used and achieve good results for various applications, but the majority of them are only validated in simulation. While MPPI control was used to autonomously dock a fully actuated vessel in full scale [24], a gap in this research area is the feature-based MPPI control of an autonomous vessel for collision avoidance. In order to also be able to perform autonomous maneuvers, such as heading for a defined target outside the port, it must be ensured that the hazards during these maneuvers are minimized. As regulated by the International Regulations for Preventing Collisions at Sea (COLREG) [25], in the event of a predicted collision, it is mandatory to take evasive action. In poor visibility conditions such as fog or heavy rain, or due to obstacles appearing through the water surface, there are scenarios in which the vessel's sensors will not detect potential collision partners until a short distance away. In such a scenario, using the standard MPPI approach, all sampled trajectories would result in a collision. This overloads the controller and makes it impossible to avoid a collision. Thus, the question arises how feature-based MPPI control can be applied to avoid a collision even in such a scenario. To answer this question, the feature-based MPPI control algorithm is applied in an emergency braking scenario for an autonomous vessel. In the scenario being discussed, the vessel is traveling at full speed along a predefined path, where an obstacle appears at a certain point. Therefore, the vessel has to autonomously perform a so-called last-minute maneuver to avoid a collision. To provide an overview, first the state and dynamics of the fully actuated research vessel Solgenia are presented. Then, a standard MPPI controller is parameterized, which causes the vessel to follow a predefined path. This is then extended to a feature-based MPPI controller by introducing an emergency braking feature.

4.1. Dynamics of the Vessel

In [26], a detailed description of the dynamics of the research vessel Solgenia, including the identification of the model parameters and the calculation of the actuator thrusts, is outlined. Nevertheless, a short description of the modeling is given in the following, since the model is an elementary component of the MPPI approach used in the scenario. A photograph of the research vessel Solgenia is shown in Figure 2a. According to [27], the vessel's state vector is given by a combination of the 2D pose $\eta = (x\;\, y\;\, \psi)^{\top}$ and the velocity vector $\nu = (u\;\, v\;\, r)^{\top}$ in body-fixed coordinates, shown in Figure 2b. This model was extended in [24] to also consider the dynamics of the actuators. Thus, the transfer properties of the $i = 1, 2, 3$ controlled actuators can each be modeled by a first-order low-pass filter with an equivalent time constant $T_E$, limited slope $\tilde{a}$ and dead time $T_T$, represented by
$\dot{a}_i(t) = \begin{cases} \tilde{a}_i & \text{if } -\dfrac{a_i(t) - w_i(t - T_{T_i})}{T_{E_i}} > \tilde{a}_i \\[4pt] -\tilde{a}_i & \text{if } -\dfrac{a_i(t) - w_i(t - T_{T_i})}{T_{E_i}} < -\tilde{a}_i \\[4pt] -\dfrac{a_i(t) - w_i(t - T_{T_i})}{T_{E_i}} & \text{else,} \end{cases}$ (14)
where $w_i$ denotes the $i$-th actuator's desired state and $a_i$ denotes the $i$-th actuator's actual state. Thus, the actual actuator states $a = (a_1\;\, a_2\;\, a_3)^{\top} = (n_{\mathrm{AT}}\;\, \alpha\;\, n_{\mathrm{BT}})^{\top}$, where the speed of the azimuth thruster (AT) is denoted by $n_{\mathrm{AT}}$, the orientation of the AT is denoted by $\alpha$ and $n_{\mathrm{BT}}$ denotes the speed of the bow thruster (BT), and the desired actuator states $w = (n_{d,\mathrm{AT}}\;\, \alpha_d\;\, n_{d,\mathrm{BT}})^{\top}$, denoting the desired values of $a$, become part of the system state
$x = (\eta^{\top}\;\, \nu^{\top}\;\, a^{\top}\;\, w^{\top})^{\top} \in \mathbb{R}^{12}.$ (15)
The whole dynamic model of the vessel is given by
$\dot{w} = u$ (16a)
$\dot{a} = d(a, w)$ (16b)
$M \dot{\nu} + C_{\mathrm{RB}}(\nu)\, \nu + N(\nu)\, \nu = \tau_c(a, \nu) + \tau_d$ (16c)
$\dot{\eta} = J(\psi)\, \nu,$ (16d)
where in (16a) the desired variable vector $w$ includes the integral action $u$. In (16b), the actuators' dynamics are considered by $d: \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}^3$, whose components are given in (14). According to [27], the time derivative of the body-fixed velocity $\dot{\nu}$ is implicitly described in (16c) as a function of the body-fixed velocity $\nu$, the mass matrix
$M = \begin{pmatrix} m - X_{\dot{u}} & 0 & 0 \\ 0 & m - Y_{\dot{v}} & m x_g - Y_{\dot{r}} \\ 0 & m x_g - N_{\dot{v}} & J_{\mathrm{comp}} \end{pmatrix},$ (17)
the Coriolis matrix
$C_{\mathrm{RB}}(\nu) = \begin{pmatrix} 0 & -m r & -m x_g r \\ m r & 0 & 0 \\ m x_g r & 0 & 0 \end{pmatrix}$ (18)
and the damping matrix
$N(\nu) = \begin{pmatrix} -X_u & 0 & 0 \\ 0 & -Y_v & -Y_r \\ 0 & -N_v & -N_r \end{pmatrix}$ (19)
including the system parameters listed in Table 1, the input vector τ c and disturbance vector τ d . The kinematics Equation (16d) describes the transformation of the body-fixed velocity into local coordinates as a function of the rotation matrix J ( ψ ) . The influence of unmodeled effects and environmental disturbances are represented by the disturbance vector τ d . The controlled force vector
$\tau_c(a, \nu) = \begin{pmatrix} F_{\mathrm{AT}}(n_{\mathrm{AT}}, \nu) \cos\alpha \\ F_{\mathrm{AT}}(n_{\mathrm{AT}}, \nu) \sin\alpha + F_{\mathrm{BT}}(n_{\mathrm{BT}}, \nu) \\ F_{\mathrm{BT}}(n_{\mathrm{BT}}, \nu)\, L_{\mathrm{BT}} - L_{\mathrm{AT}}\, F_{\mathrm{AT}}(n_{\mathrm{AT}}, \nu) \sin\alpha \end{pmatrix}$ (20)
depends on the geometric parameters L BT , L AT and α and the thrusts F AT and F BT , shown in Figure 2b. The thrust F AT is generated by the AT, and F BT denotes the thrust generated by the BT. These thrust forces can be modeled according to [28] dependent on various physical constants, the body fixed velocity vector ν and the actual states of the actuators a . Thus, the force generated by the BT is given by
$F_{\mathrm{BT}} = K_T\, \rho\, d_{\mathrm{BT}}^4\, n_{\mathrm{BT}} |n_{\mathrm{BT}}|\, e^{-c_b u^2},$ (21)
where K T denotes a unitless constant with
$K_T = \begin{cases} d_1 & \text{if } n_{\mathrm{BT}} \geq 0 \\ d_2 & \text{if } n_{\mathrm{BT}} < 0, \end{cases}$ (22)
$\rho$ denotes the density of the water, and the diameter of the propeller is denoted by $d_{\mathrm{BT}}$. Furthermore, the quadratic damping of the thrust dependent on the surge velocity component is scaled by a coefficient denoted by $c_b$. Using this modeling structure, the influence of the relative speed in the axial direction of the BT is assumed to be small and is hence neglected. This assumption cannot be used when modeling the force $F_{\mathrm{AT}}$, since the relative axial velocity component of the AT is large in the given application scenario; it is given by
$u_a = u \cos(\alpha) + (v - r L_{\mathrm{AT}}) \sin(\alpha),$ (23)
where L AT denotes the distance between the AT and the vessel’s center of gravity (CG). According to [26], the force generated by the AT is modeled by
$F_{\mathrm{AT}} = c_1\, \rho\, d_{\mathrm{AT}}^4\, n_{\mathrm{AT}} |n_{\mathrm{AT}}| - c_2\, \rho\, d_{\mathrm{AT}}^3\, u_a |n_{\mathrm{AT}}|,$ (24)
where d AT denotes the AT’s diameter, and the coefficients c 1 and c 2 are given by
$(c_1\;\, c_2) = \begin{cases} (a_1\;\, b_1) & \text{if } n_{\mathrm{AT}} \geq 0 \,\wedge\, u_a \geq 0 \\ (a_1\;\, 0) & \text{if } n_{\mathrm{AT}} \geq 0 \,\wedge\, u_a < 0 \\ (a_2\;\, b_2) & \text{if } n_{\mathrm{AT}} < 0 \,\wedge\, u_a < 0 \\ (a_2\;\, 0) & \text{if } n_{\mathrm{AT}} < 0 \,\wedge\, u_a \geq 0, \end{cases}$ (25)
where a 1 , a 2 , b 1 , b 2 R denote the coefficients of the AT. The parameters identified by [26] are listed in Table 2. Note that, according to [26], the parameters b 1 , b 2 and c b are eliminated during the evaluation phase of the identification process. For more detailed information about the dynamics, the reader is referred to [24,26,27].
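As an illustration, the thrust models (21)–(25) can be written as the following minimal sketch. The parameter names mirror the symbols above and are illustrative; the actual identified values are those listed in Table 2.
```python
import numpy as np

def bow_thruster_force(n_bt, u, rho, d_bt, d1, d2, c_b):
    """Bow thruster thrust (21)-(22): quadratic in propeller speed,
    damped exponentially by the surge velocity u."""
    K_T = d1 if n_bt >= 0 else d2
    return K_T * rho * d_bt**4 * n_bt * abs(n_bt) * np.exp(-c_b * u**2)

def azimuth_thruster_force(n_at, alpha, nu, rho, d_at, L_at, a1, a2, b1, b2):
    """Azimuth thruster thrust (23)-(25), depending on the relative axial
    inflow velocity u_a at the thruster location."""
    u, v, r = nu
    u_a = u * np.cos(alpha) + (v - r * L_at) * np.sin(alpha)   # Eq. (23)
    if n_at >= 0:
        c1, c2 = (a1, b1) if u_a >= 0 else (a1, 0.0)
    else:
        c1, c2 = (a2, b2) if u_a < 0 else (a2, 0.0)
    return c1 * rho * d_at**4 * n_at * abs(n_at) - c2 * rho * d_at**3 * u_a * abs(n_at)
```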

4.2. Inequality Constraints

Due to the maximum speed of the actuators, two inequality constraints
$h(x, u) = \begin{pmatrix} |n_{\mathrm{AT}}| - |n_{\mathrm{AT,max}}| \\ |n_{\mathrm{BT}}| - |n_{\mathrm{BT,max}}| \end{pmatrix} \leq 0, \quad \forall\, t \in [t_0,\, t_0 + T],$ (26)
are defined. By considering the actuator dynamics (14) in the model, no further inequality constraints are required.

4.3. Equality Constraints

In the treated scenario, the most important aim is to avoid collisions. In order to achieve this, first the indicator function
$\zeta(\eta) = \begin{cases} 0 & \text{if } \eta \in \eta_{\mathrm{fine}} \\ 1 & \text{if } \eta \in \eta_{\mathrm{col}} \end{cases}$ (27)
is defined, where η col is the subset of the state-space where the vessel causes a collision and η fine is a disjoint subset. Consequently, the equality constraint to be fulfilled is given by
$g(x) = \zeta(\eta),$ (28)
where only the indicator function (27) is used. A minimal sketch of these constraints is given below; in the next part, the derivation of the cost function is presented for the given scenario.
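A minimal sketch of the constraints (26)–(28) might look as follows; the geometric collision test in_collision_region() is a placeholder, since the collision subset depends on the scenario.
```python
import numpy as np

def inequality_constraints(n_at, n_bt, n_at_max, n_bt_max):
    """Actuator speed limits (26): each component of h(x, u) must be <= 0."""
    return np.array([abs(n_at) - abs(n_at_max),
                     abs(n_bt) - abs(n_bt_max)])

def collision_indicator(eta, in_collision_region):
    """Indicator function (27): 1 if the 2D pose eta lies in the collision
    subset eta_col, 0 otherwise; in_collision_region() is a placeholder test."""
    return 1.0 if in_collision_region(eta) else 0.0
```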

4.4. Cost Function

The instantaneous state-dependent cost function $C(x)$ significantly governs the behavior of the controlled system, since it is used to evaluate the sampled stochastic trajectories. While for a complex task the stage cost function can be learned from expert behavior using inverse reinforcement learning [29,30,31], for the given scenario we can define linguistic criteria for the cost function, express them as simple equations and subsequently add them up. The vessel should move on a predefined trajectory shown in Figure 3 and should avoid collisions. This task is complicated by significant current, wind and further disturbance effects that occur when moving in the Rhine River. For good system behavior, the vessel should meet the following criteria, which are sorted in descending order of importance:
  • A collision should be prevented.
  • The actuators must not be overloaded.
  • The position should follow a predefined trajectory.
  • The orientation should be in line with the trajectory.
  • While the surge velocity component should match a reference, the absolute value of the sway component and yaw-rate component should be minimized.
While criteria 1 and 2 are already treated by introducing the inequality and equality constraints, to meet criteria 3–5, the quality of sampled trajectories is evaluated based on two parts
$C(x) = C_{\mathrm{pos}}(\eta) + C_{\mathrm{vel}}(\nu),$ (29)
where $C_{\mathrm{pos}}(\eta)$ denotes the position- and orientation-dependent costs used to meet criteria 3 and 4, and $C_{\mathrm{vel}}(\nu)$ denotes the part of the cost function that is dependent on the body-fixed velocity $\nu$ and thus is introduced w.r.t. meeting criterion 5. In the following, the parts of the cost function (29) are presented.

4.4.1. Costs Dependent on the Position

For a clear presentation of the costs dependent on the position, these are defined in transformed local coordinates, which are generated by the linear transformation
$\bar{\eta} = (\bar{x}\;\, \bar{y}\;\, \bar{\psi})^{\top} = J^{-1}(\psi_{\mathrm{start}})\, (\eta - \eta_{\mathrm{start}}),$ (30)
where the start orientation of the trajectory is denoted by ψ start and η start denotes the 2D pose located at P1 given in Table 3. Subsequently, the costs dependent on the transformed position are given by
$C_{\mathrm{pos}}(\bar{\eta}) = c_x\, |x_{\mathrm{des}} - \bar{x}| + c_y\, |\bar{y}| + c_{\psi}\, |\bar{\psi}|,$ (31)
where $c_x, c_y, c_{\psi} \in \mathbb{R}^{+}$ denote weighting coefficients, and $x_{\mathrm{des}} \in \mathbb{R}^{+}$ denotes the distance between $P_1$ and $P_2$ shown in Figure 3.

4.4.2. Costs Dependent on the Velocity

The difference between the body-fixed velocity components and the desired reference velocity is penalized using
$C_{\mathrm{vel}}(\nu) = c_u\, |u_{\mathrm{des}} - u| + c_v\, |v| + c_r\, |r|,$ (32)
where $c_u, c_v, c_r \in \mathbb{R}^{+}$ denote the weighting coefficients. Note that this velocity-dependent part of the cost function is minimized by the optimal velocity $\nu^{*} = (u_{\mathrm{des}}\;\, 0\;\, 0)^{\top}$. Thus, a drifting behavior of the vessel is penalized. An illustrative sketch of the combined stage costs is given below; the resulting problem formulation is presented in the next section.
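As announced, a minimal sketch of the stage costs (29), (31) and (32) is the following; the dictionary p of coefficients and references is an assumption, with names chosen to mirror Table 5.
```python
def stage_cost(eta_bar, nu, p):
    """Sketch of the stage costs (31)-(32) in transformed local coordinates.

    eta_bar = (x_bar, y_bar, psi_bar), nu = (u, v, r); p is a dict holding the
    weighting coefficients and references (names are illustrative)."""
    x_bar, y_bar, psi_bar = eta_bar
    u, v, r = nu
    C_pos = (p['c_x'] * abs(p['x_des'] - x_bar)
             + p['c_y'] * abs(y_bar)
             + p['c_psi'] * abs(psi_bar))          # Eq. (31)
    C_vel = (p['c_u'] * abs(p['u_des'] - u)
             + p['c_v'] * abs(v)
             + p['c_r'] * abs(r))                  # Eq. (32)
    return C_pos + C_vel                           # Eq. (29)
```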

4.5. Resulting Problem Formulation

To minimize the cumulated costs (29) subject to the given dynamics (16a)–(16d), for the inequality constraints (26) and the equality constraints (28) using MPPI control, the OCP has to be formulated in the assumed structure (1a)–(1b). Consequently, the equality and inequality constraints must be considered in the cost function with
$C_{\mathrm{MPPI}}(x) = C(x) + c_{\mathrm{pen}}\, \max[0, h(x)] + g(x)$ (33)
being defined, where c pen R + denotes coefficients of the penalty terms. In addition, the terminal cost function ϕ ( x ) must be determined. Because this function is not needed in the given scenario, it is defined as
$\phi(x) = C_{\mathrm{MPPI}}(x).$ (34)
In MPPI control according to [4], discrete-time system dynamics with additive input noise are assumed in (1b). Therefore, the time-continuous system dynamics given in (16a)–(16d) are discretized using the explicit fourth-order Runge–Kutta method with a step size h, where the input signal is chosen to be piecewise constant within this step size. Consequently, the discrete-time vessel dynamics that maps a system state to its successor is given by
$X_{t+1} = F_v(X_t, u_t).$ (35)
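A minimal sketch of this discretization, assuming a placeholder continuous-time model vessel_ode(x, u, tau_d) that implements (16a)–(16d), could look as follows; the step size of 40 ms is taken from Section 5.
```python
def rk4_step(vessel_ode, x, u, tau_d, h=0.04):
    """Explicit fourth-order Runge-Kutta step with piecewise-constant input u,
    yielding the discrete-time dynamics F_v in (35); h = 40 ms as in Section 5."""
    k1 = vessel_ode(x, u, tau_d)
    k2 = vessel_ode(x + 0.5 * h * k1, u, tau_d)
    k3 = vessel_ode(x + 0.5 * h * k2, u, tau_d)
    k4 = vessel_ode(x + h * k3, u, tau_d)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```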
Finally, the assumption that the input of the system is disturbed with time discrete AWGN yields
$u_t \sim \mathcal{N}(u_{\mathrm{des},t}, \Sigma),$ (36)
where u t denotes the actual and u des , t the desired system input at time instance t. The covariance matrix of the AWGN is denoted by Σ . Using these assumptions, the resulting stochastic OCP is given by
$\min_{U_{\mathrm{des}}}\; \mathbb{E}_{\mathbb{Q}}\Big[ \sum_{t=0}^{T-1}\big[\, C_{\mathrm{MPPI}}(X_t) + u_{\mathrm{des},t}^{\top} R\, u_{\mathrm{des},t} \,\big] \;\Big|\; X_0 = x_0 \Big]$ (37a)
with $X_{t+1} = F_v(X_t, u_t)$, $t \in \{0, 1, \ldots, T-1\}$, and $u_t \sim \mathcal{N}(u_{\mathrm{des},t}, \Sigma)$, $R = \lambda \Sigma^{-1}$, (37b)
where the sequence of desired inputs is denoted by U des = { u des , 0 , u des , 1 , , u des , T 1 } . In the next part, the usual MPPI approach is extended to the feature-based MPPI to improve the system’s behavior.

4.6. Feature Definition

To use the presented feature-based MPPI algorithm, at least one feature has to be chosen. As already discussed in Section 4.4, the selection of the features can be done by inverse reinforcement learning [30,31] or by the definition of a linguistic quality criterion, which is subsequently formulated mathematically. It is important to note that only well-chosen features lead to an improvement of the system's behavior. In this context, well-chosen means that, in some cases, sampling trajectories with controls that minimize the feature costs explores low-cost regions of the state-space with respect to the original cost function (33). For the given scenario, the following linguistic quality criterion is defined: a reduction in speed can mitigate or even prevent a collision with an obstacle. For this purpose, an emergency braking feature is defined by choosing
$C_1(x) = \phi_1(x) = \nu^{\top} Q\, \nu, \qquad Q = \mathrm{diag}(c_{F_u}, c_{F_v}, c_{F_r}),$ (38)
where $c_{F_u}, c_{F_v}, c_{F_r} \in \mathbb{R}^{+}$ denote the coefficients of the velocity-dependent feature costs. Consequently, a control sequence that minimizes this feature leads to the exploration of the region of the state-space in which the vessel's velocity decreases; a minimal sketch of this feature cost is given below. In the following section, the presented scenario is used to compare the performance of classical MPPI and feature-based MPPI control.
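The announced sketch of the emergency braking feature cost (38) is straightforward; the argument and parameter names are illustrative.
```python
import numpy as np

def emergency_braking_feature_cost(nu, c_Fu, c_Fv, c_Fr):
    """Feature costs (38): quadratic penalty on the body-fixed velocity vector
    nu = (u, v, r), used as both stage costs C_1 and terminal costs phi_1."""
    nu = np.asarray(nu, dtype=float)
    Q = np.diag([c_Fu, c_Fv, c_Fr])
    return nu @ Q @ nu
```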

5. Controller Design

In this section, the design of both the standard MPPI controller presented in Section 2 and the feature-based MPPI controller presented in Section 3 is shown with application to the maritime scenario presented in Section 4. For this purpose, the modularized and distributed control structure used is described. Note that an Unscented Kalman Filter (UKF) according to [33] is used to estimate the system state based on the sensor signals and the actuators' states; the latter are controlled in subordinate control loops. This is followed by a description of how the control parameters are determined. Finally, the selection of the cost function's coefficients is presented.

5.1. Architecture

An overview of all parts of the used control architecture is given in Figure 4. The state estimation and the communication are performed by the rapid prototyping system MicroAutoBox2 (MABX2). While the supervisory MPPI algorithm is running on the MABX3, three subordinate 1232SE control units by Curtis are used to control the drives. An RTK-GPS and an IMU sensor send their measurement data to the MABX2 using a UDP protocol. Using a UKF, this information is fused with the measured actuator state a to estimate the system’s 2D pose denoted by η ^ , the body-fixed velocity vector ν ^ and the disturbance forces τ ^ d , which are sent to the MABX3 via UDP protocol. The MPPI algorithm uses the provided state estimation to calculate the desired system input u . Because the system input is defined as the zero-order-hold time derivative of the desired values of the actuator control loops, an explicit Euler integrator can be used to calculate w implemented on the MABX3. Subsequently, w is sent via UDP protocol to the MABX2, which then provides w to the subordinated actuator controllers via CAN protocol. The encoders’ measurements of a are denoted by a m and collected with high quality; therefore, no additional signal processing is needed. The measured actuator states are sent from the encoders to the 1232SE drive controllers, which then send them via UDP to the MABX2 for state estimation.

5.2. Controller Parameters

The choice of controller parameters determines the properties of the sampled trajectories, which are realizations over Q . They have a significant influence on the quality of the estimation of the optimal control sequence.

5.2.1. Numerical Solution of the Initial Value Problem

As already discussed, the corresponding initial value problem can be solved numerically using the explicit fourth order Runge–Kutta method with a step size of h = 40 ms to sample the trajectories. It should be noted that, in principle, there is no information about the future course of the disturbance variables. Due to the river current, the disturbance vector τ d is assumed to be constant within the predicted horizon.

5.2.2. Choice of Controller Step Size

The controller step size denoted by $T_{\mathrm{CTRL}}$ should be selected as large as possible to reduce the dimension of the control sequence in the optimization problem, but also small enough to be able to induce a desired system behavior. Furthermore, it has to be considered that $T_{\mathrm{CTRL}}$ is a multiple of h = 40 ms. Simulations have shown that good performance can be achieved with $T_{\mathrm{CTRL}} = 360$ ms.

5.2.3. Prediction Horizon

The dimension of the control sequence, as well as the influence of model inaccuracies, increases with a rising prediction horizon denoted by T. However, a sufficiently large prediction horizon is essential for evaluating the result of the chosen control sequence. Based on the vessel's dynamics, the prediction horizon is determined by the time until the maximum surge velocity is reached. To reach $u_{\max} = 3$ m/s with a maximal acceleration of 0.2 m/s², a duration of 15 s is needed, and thus a prediction horizon of T = 41 time steps is chosen.
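As a quick check of these numbers (using the controller step size $T_{\mathrm{CTRL}} = 360$ ms from the previous subsection and the stated maximal acceleration):
$t_{\mathrm{acc}} = \frac{u_{\max}}{a_{\max}} = \frac{3\ \mathrm{m/s}}{0.2\ \mathrm{m/s^2}} = 15\ \mathrm{s}, \qquad T \cdot T_{\mathrm{CTRL}} = 41 \cdot 0.36\ \mathrm{s} \approx 14.8\ \mathrm{s} \approx t_{\mathrm{acc}}.$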

5.2.4. Covariance Matrix of Additive Noise

In the path integral control framework [12], the choice of the covariance matrix $\Sigma$ of the AWGN determines the quadratic input costs via $R = \lambda \Sigma^{-1}$. The quadratic penalty term $u^{\top} R\, u$ relates to the accelerations of the propellers and the rotational velocity of the AT. We assume uncorrelated noise processes to excite the system's behavior, and thus the covariance matrix
$\Sigma = \begin{pmatrix} \sigma_{\mathrm{AT}}^2 & 0 & 0 \\ 0 & \sigma_{\alpha}^2 & 0 \\ 0 & 0 & \sigma_{\mathrm{BT}}^2 \end{pmatrix}$ (39)
is chosen. The elements of $\Sigma$ are chosen proportionally to the corresponding actuator dynamics listed in Table 1. While the maximum dynamics of the AT is given directly, the maximum acceleration of the BT is calculated using a first-order Taylor approximation of the delay time, which yields an equivalent time constant of $T_T + T_E = 350$ ms. Thus, the variances of the AWGN are given by
$(\sigma_{\mathrm{AT}}^2\;\, \sigma_{\alpha}^2\;\, \sigma_{\mathrm{BT}}^2)^{\top} = c\, (71.66\ \mathrm{s^{-4}}\;\; 1.4966\ \mathrm{rad^2/s^2}\;\; 188.57\ \mathrm{s^{-4}})^{\top},$ (40)
where parameter c is introduced to scale the variances. Regarding the explore-exploit dilemma [12], the optimal value c * = 0.09 is approximated by numerically solving the minimization problem
$c^{*} = \arg\min_{c} \sum_{k=0}^{T_{\mathrm{sim}}} \big[\, C_{\mathrm{MPPI}}(\tilde{x}_k) + \tilde{u}_k^{\top} R\, \tilde{u}_k \,\big],$ (41)
where $T_{\mathrm{sim}}$ denotes the number of simulation steps, $\tilde{u}_k$ denotes the approximated optimal system input using MPPI control and $\tilde{x}_k$ denotes the resulting state at the time instance k with $\tilde{x}_{k+1} = F(\tilde{x}_k, \tilde{u}_k)$. Using the initial state $\bar{\eta}_0 = \nu_0 = a_0 = w_0 = 0$ and $T_{\mathrm{sim}} = 3T = 123$ simulation steps, $c^{*} = 0.09$ is found, which parameterizes the MPPI algorithm leading to the minimal cost of the scenario. Note that, using the MPPI update law (11), the AWGN and thus its scaling factor c have a huge influence on the approximated optimal controls $\tilde{u}$. A minimal sketch of such a scalar search is given below.
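The scalar search for c mentioned above could be sketched as follows; simulate_closed_loop(c) is a placeholder that runs the MPPI-controlled simulation for $T_{\mathrm{sim}}$ steps and returns the accumulated cost in (41), and the candidate grid is an illustrative choice, not the one used by the authors.
```python
import numpy as np

def tune_noise_scale(simulate_closed_loop, candidates=np.linspace(0.01, 0.5, 50)):
    """Scalar search for the noise-scaling factor c in (40)-(41).

    simulate_closed_loop(c) is assumed to run the closed-loop simulation and
    return the accumulated cost; the candidate with minimal cost is returned."""
    costs = [simulate_closed_loop(c) for c in candidates]
    return candidates[int(np.argmin(costs))]
```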

5.2.5. Temperature

According to [12], the temperature λ with
$R = \lambda \Sigma^{-1}$ (42)
scales the quadratic input costs matrix R . The relation between R and C MPPI has a significant impact on the resulting system behavior. However, C MPPI has already been scaled freely. Thus, λ = 1 has been chosen.

5.2.6. Number of Predicted Trajectories

Using the MPPI update law (11), the expected value is approximated by the mean over the sampled trajectories. Thus, the quality of the approximation increases with the number of drawn samples used. However, the required computing time also increases with the number of realizations. A buffer $t_{\mathrm{buf}} > 15$ ms is targeted to make the real-time capability of the controller robust. Empirically determined, one core of the TI AM5K2E04 processor with four ARM Cortex-A15 cores at 1.4 GHz on the rapid prototyping system MABX3 requires 343 ms to simulate and evaluate 9000 trajectories. Thus, the real-time capability of the controller is ensured with a buffer of $t_{\mathrm{buf}} = 17$ ms > 15 ms. This concludes the choice of all controller parameters determined in this subsection. To provide a structured overview, the parameters are listed in Table 4. Note that, for an objective comparison of the standard MPPI and the feature-based MPPI approaches, the parameters are chosen to be equal.

5.2.7. Cost Function Parameters

The structure of the cost function (33) and the structure of the feature costs (38) were already derived in previous sections. Since their coefficients have a huge influence on the behavior of the system, they must be determined with similar diligence as the control parameters. The units of the coefficients are chosen in such a way that the corresponding product is unitless. With respect to the parameters $c_x$, $c_y$ and $c_{\psi}$, which are associated with the positions, reaching the desired position $x_{\mathrm{des}} = 409.9$ m and keeping the desired orientation are weighted more than leaving the path orthogonally. Thus, $c_x = 500$ 1/s > $c_y = 200$ 1/s and $c_{\psi} = 2500$ 1/rad are chosen. With reference to $c_u = 5000$ s/m, $c_v = 10$ s/m and $c_r = 7500$ s/rad, which are assigned to the velocity components, errors of the surge speed w.r.t. the desired surge speed $u_{\mathrm{des}} = 2$ m/s and of the yaw rate are weighted more heavily than the occurrence of a sway speed component. The feature cost coefficients with $c_{F_u} = 300$ s/m, $c_{F_v} = 30$ s/m and $c_{F_r} = 10$ s/rad are chosen to weight the surge speed component by far the most, since it can be assumed that this component dominates in the application scenario. Subsequently, it is assumed that there is an obstacle which, in the case of $\bar{x} > x_{\mathrm{collision}} = 80$ m, would cause a collision. Furthermore, it is assumed that the vessel detects the obstacle if its position is $\bar{x} > x_{\mathrm{appear}} = 60$ m. An overview of the coefficients is given in Table 5.

6. Simulation Results

The MPPI controller presented in Section 2 and its feature-based extension introduced in Section 3 are embedded in a simulation environment to compare their performances when controlling the vessel model in the maritime application scenario described in Section 4. Therefore, the parameters derived in Section 5 are used. The vessel model including the actuator dynamics, the UKF according to [33], the MPPI controller and the feature-based MPPI controller are implemented in a simulation environment. According to [33], a constant river current is assumed with a velocity of 0.45 m/s and an orientation of 200° to the $\bar{x}$-axis. The simulation time is set to 60 s. To compare the vessel's behavior, both the standard MPPI approach and subsequently the feature-based MPPI approach are used to control the vessel. The resulting trajectories are compared in Figure 5. Furthermore, to visualize the orientation of the vessel along the trajectory, the vessel's contour is plotted at the time instances $t_i = i\,\Delta T$ with $i = \{0, 1, 2, 3, 4, 5, 6\}$ and $\Delta T = 10$ s.
In the absence of an obstacle, both trajectories are quite similar in orientation and position. Note that the position difference in $\bar{y}$ occurs since it is only penalized slightly in (1a). After the obstacle appears, the two control approaches cause different behavior. While the system controlled by the standard MPPI controller causes a collision with the obstacle, the feature-based MPPI control approach can prevent this collision by performing a braking maneuver. The time courses of the body-fixed surge velocity and the vessel's velocity in the direction of the $\bar{x}$-axis are shown in Figure 6. In the absence of the obstacle, for t ≤ 44 s, the plotted velocity components show the same behavior. After a learning phase of about 5 s immediately after activating the controllers, the vessel accelerates until it reaches the desired surge velocity. After the acceleration phase, the surge velocity is kept exactly at 2 m/s. For t > 44 s, using the feature-based approach, the vessel decelerates and reaches $\dot{\bar{x}} = 0$ m/s at t = 59 s. A comparison of the desired and the actual actuator values is given in Figure 7. Furthermore, in Figure 8, the costs caused by the different approaches are plotted over time. In contrast, using the standard MPPI controller, the vessel's velocity does not change after the obstacle appears. This effect results from the fact that, due to the small explored state-space using conventional MPPI control, all predicted trajectories cause a collision for t > 44 s.
Consequently, the vessel controlled by the standard MPPI approach collides with the obstacle at full speed at t = 52.5 s. Regarding the input trajectories shown in Figure 7, the desired and actual actuator values match well. Furthermore, in the absence of an obstacle, for t ≤ 44 s, the input trajectories show nearly the same behavior. To keep the surge velocity at 2 m/s, the AT is driven at about 950 rpm after it reaches its maximal value during the acceleration phase. While the AT is used to reach and keep the desired velocity, the BT is used to compensate the disturbances due to the river current. Using the feature-based MPPI approach, for t > 44 s, the actuators' trajectories lead to the braking behavior of the system.
To reach the maximal deceleration of the vessel, the AT's orientation is inverted while its speed is maximized. Regarding the time series of the stage costs shown in Figure 8, in the absence of an obstacle for t ≤ 44 s, both approaches cause the same costs. Both controllers find inputs leading to decreasing costs. After the obstacle is detected, the feature-based control approach applies the learned inputs corresponding to the defined feature. These inputs lead to increasing costs due to the decelerating surge velocity; however, in contrast to the standard MPPI approach, a collision is prevented and thus the high-cost region of the state-space is avoided. In this section, it was shown that, using the feature-based MPPI control approach, a significant improvement of the vessel's behavior in a maritime application scenario is achieved.

7. Full-Scale Results

To validate the simulation results, full-scale experiments are used. For this purpose, the algorithms used in the simulation are transferred to the rapid prototyping systems of the research vessel Solgenia. A distributed structure of the components is used as already described in Section 5. In addition, the same controller parameters and cost functions are used as in the previous sections. The data are recorded directly during the experiments on the vessel. Regarding the disturbances, immediately before the experiments a river current of 0.46 m/s was determined in the vicinity of the trajectory. This current has an orientation of about $\psi_{\mathrm{curr}} = 165°$ and is thus almost opposite to the movement of the vessel on the trajectory. However, it should be noted that, although the current values were measured by allowing the research vessel to drift uncontrolled in the vicinity of the trajectory, they vary along the river and can therefore only be regarded as rough estimates, which, in reality, are overlaid by a stochastic component. In the experimental phase, the task described in Section 4 was executed once with a standard MPPI controller and once with a feature-based MPPI controller. For this purpose, the vessel's position is stabilized at the beginning of the trajectory using feedback linearization according to [33] until the MPPI controller is activated. Then, the MPPI controller is activated for 60 s because both a collision and a collision avoidance maneuver could occur within this time span. In Figure 9, a comparison of the vessel's behavior is shown for both experiments. In this figure, the vessel's trajectories are plotted in local coordinates. Furthermore, the vessel's contour and thus its orientation are shown at the time instances $t_i = i\,\Delta T$, with $i = \{0, 1, 2, 3, 4, 5, 6\}$ and $\Delta T = 10$ s. While using the standard MPPI control, visualized in the upper plot of Figure 9, the vessel collides with the obstacle; using feature-based MPPI control, a collision can be prevented. Compared to the simulation results presented in the previous section, in the full-scale experiments, the vessel tracks the reference trajectory with higher errors. This behavior is to be expected due to model inaccuracies and the disturbance vector assumed to be constant within the prediction horizon.
In the application example described, special focus is placed on the surge velocity component, since it is weighted the most in the selection of the coefficients of the cost function and is thus the most important property in the absence of an obstacle. However, if an obstacle is detected in this scenario, it is interesting to see how the speed develops in the direction of this obstacle. Thus, the surge velocity in the body-fixed frame denoted by u and the velocity in the direction of the obstacle denoted by $\dot{\bar{x}}$ are shown in Figure 10. The corresponding actuators' signals are shown in Figure 11. The minimal path costs $\beta$ are plotted over time in Figure 12 in logarithmic scale. Using the described subordinated actuator control loops, the actual values of the actuators follow the desired values well. Both the conventional MPPI controlled system and the feature-based MPPI controlled system show almost the same behavior for t ≤ 44.6 s in the absence of an obstacle.
At t = 5.1 s, the surge velocity increases until a maximum of u = 1.93 m/s is reached at t = 15.5 s. After that, the surge velocity is kept at about 1.9 m/s by both controllers. It should be noted that both controllers achieve a good response to the disturbances. The difference between the desired and actual surge velocity is assumed to be due to the limitation of the propeller speed of the AT and model inaccuracies in this state region of high speed. At t = 44.6 s, a collision with the obstacle appears within the prediction horizon. The standard MPPI controller does not decelerate after the detection of the obstacle. This effect results from the fact that, due to the small explored state-space using conventional MPPI control, all predicted trajectories cause a collision for t > 51.2 s. Consequently, predicted trajectories with a surge velocity close to $u_{\mathrm{des}}$ cause lower costs. In contrast, the feature-based MPPI controlled system explores more important areas of the state-space, as shown in Figure 1.
This allows the feature-based MPPI controller to apply the learned braking trajectory, which is optimal regarding the emergency braking feature (38), as soon as the obstacle is detected. Thus, it can also react excellently for t > 44.6 s and brake in time to avoid a collision. As can be seen in Figure 9, braking is achieved by a counterclockwise rotation of the vessel, which allows the hydrodynamic effects of the vessel to be utilized in addition to the thruster forces of the propellers. As a consequence, while using the standard MPPI controller the vessel's velocity in the direction of the obstacle is 1.85 m/s at the moment of hitting the obstacle, the feature-based MPPI controller decelerates the velocity in the direction of the obstacle down to 0.05 m/s and thus avoids a collision. The minimum path costs, which are shown in Figure 12, are equal up to the detection of the obstacle due to the equivalent system behavior of the two control approaches. Using the standard MPPI approach, after the detection, a low-cost region of the state-space next to $\eta_{\mathrm{col}}$ is explored. However, this leads to the fact that, already for t > 51 s, all predicted trajectories cause a collision. In contrast, using feature-based MPPI control, directly after the obstacle detection, the explored state-space is located in a region around the braking trajectory. This yields higher costs but prevents the much higher state-dependent costs caused by a collision. Using an MPC approach, the consideration of costs directly provides a quantitative quality criterion. Thus, if the cost curves in Figure 8 and Figure 12 are compared, it can be seen that the quality with regard to the cost function to be minimized is almost identical. In this section, it was shown that feature-based MPPI control in full-scale conditions can greatly expand and improve the behavior of a vessel and prevent collisions without increasing the computational cost.

8. Conclusions

In this paper, an extension of the MPPI algorithm [4] is presented that improves the sample efficiency by a knowledge-based feature definition. For the first time, this provides the possibility to incorporate scenario-specific information into the control algorithm in addition to the usual cost function. The conventional MPPI and the presented feature-based MPPI control approaches are compared in a collision avoidance scenario. Therefore, a stochastic nonlinear OCP for enabling a vessel to follow a track and avoid collisions is presented and subsequently solved with both approaches. The performances of the algorithms are evaluated using data from a simulation environment and from full-scale experiments. In both simulation and full-scale experiments, while the standard MPPI controller would collide with an obstacle, the feature-based MPPI controller can prevent the collision, although the same number of trajectories were drawn in both cases. In the absence of an obstacle, the feature-based MPPI controller learns an optimal braking trajectory that minimizes the feature costs subject to the modeled system dynamics. When the obstacle is detected, this trajectory is then successfully applied to avoid a collision. Thus, by inserting an emergency braking feature, the performance and usability of the algorithm could be increased significantly. Compared with other NMPC approaches, the MPPI control algorithm is characterized by a very good parallelization capability, since the samples can be drawn on different computing kernels. This property is not affected by the presented feature-based extension. Thus, the number of features that can be explored and learned can be increased by adding further parallel computing power. In future work, an evaluation of how the presented feature-based approach can be used in other sample-based NMPC algorithms could be scientifically valuable.

Author Contributions

Conceptualization, H.H., S.W. and J.R.; methodology, H.H.; software, H.H. and S.W.; validation, H.H. and S.W.; formal analysis, H.H.; investigation, H.H.; resources, J.R. and S.W.; data curation, H.H. and S.W.; writing—original draft preparation, H.H.; writing—review and editing, H.H., S.W., M.D. and J.R.; visualization, H.H.; supervision, S.W., M.D. and J.R.; project administration, J.R. All authors have read and agreed to the published version of the manuscript.

Funding

The article processing charge was funded by the Baden-Württemberg Ministry of Science, Research and Culture and the HTWG Konstanz—University of Applied Sciences in the funding programme Open Access Publishing.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The recorded data from simulations and real experiments, which are presented in this paper, can be downloaded using the link https://www.htwg-konstanz.de/fileadmin/pub/ou/isd/Regelungstechnik/Regelungstechnik_Data/MDPI_Machines22_Feature_Based_MPPI_Data_Simulation_and_Experiment.zip. Accessed on 23 May 2022.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AWGN	Additive white Gaussian noise
CG	Center of gravity
HJB	Hamilton–Jacobi–Bellman
MABX	MicroAutoBox
MPPI	Model predictive path integral
NMPC	Nonlinear model predictive control
OCP	Optimal control problem
RL	Reinforcement learning
UKF	Unscented Kalman filter

References

  1. Homburger, H.; Wirtensohn, S.; Reuter, J. Feature-Based Proposal Density Optimization for Nonlinear Model Predictive Path Integral Control. In Proceedings of the 6th IEEE Conference on Control Technology and Applications (CCTA), Trieste, Italy, 22–25 August 2022. [Google Scholar]
  2. Kleinert, H. Path Integrals in Quantum Mechanics, Statistics, Polymer Physics, and Financial Markets, 5th ed.; World Scientific Publishing Ltd.: Singapore, 2009. [Google Scholar]
  3. Kappen, H.J. Path Integrals and Symmetry Breaking for Optimal Control Theory. J. Stat. Mech. Theory Exp. 2005, 2005, P11011. [Google Scholar] [CrossRef] [Green Version]
  4. Williams, G.; Wagener, N.; Goldfain, B.; Drews, P.; Rehg, J.; Boots, B.; Theodorou, E. Information Theoretic MPC for Model-Based Reinforcement Learning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May 2017–3 June 2017. [Google Scholar]
  5. Schaal, S.; Atkeson, C. Learning Control in Robotics. IEEE Robot. Autom. Mag. 2010, 10, 20–29. [Google Scholar] [CrossRef]
  6. Theodorou, E.A.; Buchli, J.; Schaal, S. A Generalized Path Integral Control Approach to Reinforcement Learning. J. Mach. Learn. Res. 2010, 11, 3137–3181. [Google Scholar]
  7. Kappen, H.J.; Ruiz, H. Adaptive Importance Sampling for Control and Inference. J. Stat. Phys. 2016, 162, 1244–1266. [Google Scholar] [CrossRef] [Green Version]
  8. Gómez, V.; Thijssen, S.; Symington, A.; Hailes, S.; Kappen, H.J. Real-Time Stochastic Optimal Control for Multi-agent Quadrotor Systems. In Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS 16), London, UK, 12–17 June 2016. [Google Scholar]
  9. Homburger, H.; Wirtensohn, S.; Reuter, J. Swinging Up and Stabilization Control of the Furuta Pendulum using Model Predictive Path Integral Control. In Proceedings of the 30th Mediterranean Conference on Control and Automation (MED), Athens, Greece, 28 June–1 July 2022. [Google Scholar]
  10. Theodorou, E.A.; Todorov, E. Relative entropy and free energy dualities: Connections to path integral and KL control. In Proceedings of the IEEE 51st Annual Conference on Decision and Control (CDC), Grand Wailea Maui, HI, USA, 10–13 December 2012; pp. 1466–1473. [Google Scholar]
  11. Theodorou, E.A. Nonlinear Stochastic Control and Information Theoretic Dualities: Connections, Interdependencies and Thermodynamic Interpretations. Entropy 2015, 17, 3352–3375. [Google Scholar] [CrossRef]
  12. Kappen, H.J. An Introduction to Stochastic Control Theory, Path Integrals and Reinforcement Learning. Coop. Behav. Neural Syst. Ninth Granada Lect. 2007, 887, 149–181. [Google Scholar]
  13. Thijssen, S.; Kappen, H.J. Path Integral Control and State Dependent Feedback. Phys. Rev. E 2015, 91, 032104. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  14. Gandhi, M.S.; Vlahov, B.; Gibson, J.; Williams, G.; Theodorou, E. Robust Model Predictive Path Integral Control: Analysis and Performance Guarantees. IEEE Robot. Autom. Lett. 2021, 6, 1423–1430. [Google Scholar] [CrossRef]
  15. Yin, J.; Zhang, Z.; Theodorou, E.; Tsiotras, P. Improving Model Predictive Path Integral using Covariance Steering. arXiv 2021, arXiv:2109.12147. [Google Scholar]
  16. Kusumoto, R.; Palmieri, L.; Spies, M.; Csiszar, A.; Arras, K.O. Informed Information Theoretic Model Predictive Control. In Proceedings of the International Conference on Robotics and Automation (ICRA), Montreal, Canada, 20–24 May 2019; pp. 2047–2053. [Google Scholar]
  17. Williams, G.; Drews, P.; Goldfain, B.; Rehg, J.M.; Theodorou, E. Information-Theoretic Model Predictive Control: Theory and Applications to Autonomous Driving. IEEE Trans. Robot. 2018, 34, 1603–1622. [Google Scholar] [CrossRef] [Green Version]
  18. Abdelaal, M.; Hahn, A. NMPC-based trajectory tracking and collision avoidance of unmanned surface vessels with rule-based colregs confinement. In Proceedings of the IEEE Conference on Systems, Process and Control (ICSPC), Hammamet, Tunisia, 16–18 December 2016; pp. 23–28. [Google Scholar]
  19. Lutz, M.; Meurer, T. Optimal trajectory planning and model predictive control of underactuated marine surface vessels using a flatness-based approach. arXiv 2021, arXiv:2101.12730. [Google Scholar]
  20. Bärlund, A.; Linder, J.; Feyzmahdavian, H.R.; Lundh, M.; Tervo, K. Nonlinear MPC for combined motion control and thrust allocation of ships. In Proceedings of the 21st IFAC World Congress, Berlin, Germany, 11–17 July 2020. [Google Scholar]
  21. Zare, N.; Brandoli, B.; Sarvmaili, M.; Soares, A.; Matwin, S. Continuous Control with Deep Reinforcement Learning for Autonomous Vessels. arXiv 2021, arXiv:2106.14130. [Google Scholar]
  22. Martinsen, A.; Lekkas, A. Curved Path Following with Deep Reinforcement Learning: Results from Three Vessel Models. In Proceedings of the IEEE Oceans MTS, Charleston, SC, USA, 22–25 October 2018; pp. 1–8. [Google Scholar]
  23. Yin, Z.; He, W.; Yang, C.; Sun, C. Control Design of a Marine Vessel System Using Reinforcement Learning. Neurocomputing 2018, 311, 353–362. [Google Scholar] [CrossRef] [Green Version]
  24. Homburger, H.; Wirtensohn, S.; Reuter, J. Docking Control of a Fully-Actuated Autonomous Vessel using Model Predictive Path Integral Control. In Proceedings of the 20th European Control Conference (ECC), London, UK, 12–15 July 2022. [Google Scholar]
  25. Lloyd. Articles of the Convention on the International Regulations for Preventing Collisions at Sea, 1972; Lloyd’s Register of International Maritime Organization: London, UK, 2005. [Google Scholar]
  26. Kinjo, L.M.; Wirtensohn, S.; Reuter, J.; Menard, T.; Gehan, O. Trajectory tracking of a fully-actuated surface vessel using nonlinear model predictive control. In Proceedings of the 13th IFAC Conference on Control Applications in Marine Systems, Robotics, and Vehicles (CAMS), Oldenburg, Germany, 22–24 September 2021; pp. 51–56. [Google Scholar]
  27. Fossen, T.I. Marine Control Systems: Guidance, Navigation and Control of Ships, Rigs and Underwater Vehicles, 1st ed.; Marine Cybernetics: Trondheim, Norway, 2002. [Google Scholar]
  28. Blanke, M. Ship Propulsion Losses Related to Automatic Steering and Prime Mover Control, 1st ed.; Technical University of Denmark: Lyngby, Denmark, 1981. [Google Scholar]
  29. Ramachandran, D.; Amir, E. Bayesian Inverse Reinforcement Learning. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, 6–12 January 2007. [Google Scholar]
  30. Ng, A.Y.; Russell, S. Algorithms for Inverse Reinforcement Learning. In Proceedings of the 17th International Conference on Machine Learning (ICML), San Francisco, CA, USA, 29 June–2 July 2000. [Google Scholar]
  31. Arora, S.; Doshi, P. A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress. Artif. Intell. 2021, 297, 103500. [Google Scholar] [CrossRef]
  32. Google Maps. Rhine River. Available online: https://www.google.com/maps/@47.6679863,9.1762762,16.96z (accessed on 5 May 2022).
  33. Wirtensohn, S.; Hamburger, O.; Homburger, H.; Kinjo, L.M.; Reuter, J. Comparison of Advanced Control Strategies for Automated Docking. In Proceedings of the 13th IFAC Conference on Control Applications in Marine Systems, Robotics, and Vehicles (CAMS), Oldenburg, Germany, 22–24 September 2021; pp. 295–300. [Google Scholar]
Figure 1. Explored state-space in a full-scale experiment, visualized by sampled trajectories drawn from proposal densities with mean U_0 (red) and mean U_F (blue), where U_F represents the emergency braking feature. The actuators’ force vectors are drawn in red. The desired trajectory is visualized in green.
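Figure 1 illustrates how the sample budget is split between a nominal proposal and the emergency braking feature proposal. The following sketch is a minimal, hypothetical illustration of such a two-component proposal, assuming Gaussian base densities with a shared per-step input covariance Sigma and the sample counts K and K_1 listed in Table 4; it is not the authors’ implementation.

```python
import numpy as np

def sample_mixture_proposal(U0, UF, Sigma, K=8500, K1=500, rng=None):
    """Draw input-sequence samples from a two-component proposal:
    K samples around the nominal mean U0 and K1 samples around the
    feature (emergency braking) mean UF.
    U0, UF: (T, m) mean input sequences, Sigma: (m, m) per-step covariance."""
    rng = np.random.default_rng() if rng is None else rng
    T, m = U0.shape
    L = np.linalg.cholesky(Sigma)
    # Correlated input noise for all samples and time steps, shape (K + K1, T, m)
    eps = rng.standard_normal((K + K1, T, m)) @ L.T
    samples = np.empty((K + K1, T, m))
    samples[:K] = U0[None, :, :] + eps[:K]   # samples around the nominal mean
    samples[K:] = UF[None, :, :] + eps[K:]   # samples around the feature mean
    return samples
```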
Figure 2. Photograph and technical drawing of the research vessel Solgenia. (a) Photograph of the research vessel Solgenia on Lake Constance in front of the Constance harbor; (b) technical drawing of the research vessel Solgenia with local and body-fixed coordinate systems, geometrical parameters, and thruster forces.
Figure 3. Bird’s-eye view of the Rhine river between the docking area and the open lake [32]. The locations of P_1 and P_2 are specified in Table 3. The desired path is drawn in green. The borders of the inadmissible area next to the river bank are visualized in red. For this application scenario, the width and orientation of the river are approximated by 2b = 120 m and ψ_start = −31.96°.
Figure 4. Distributed and modularized control architecture for full-scale experiments on the research vessel Solgenia. The connections are realized via the UDP and CAN protocols.
Figure 5. Comparison of the resulting trajectories of the vessel in the simulation environment. While the upper plot shows the performance of standard MPPI control, the lower plot shows the performance of the feature-based MPPI control approach. Both plots use transformed coordinates calculated by (30). The green lines visualize the reference trajectories in both plots. After passing the black line, the vessel detects the obstacle visualized as a red line.
Figure 6. Comparison of the body-fixed surge velocity u and the velocity ẋ in simulation.
Figure 7. Comparison of the resulting input trajectories of the vessel in the simulation environment.
Figure 8. Comparison of costs in the simulation scenarios plotted over time. Note the logarithmic representation of the plotted costs.
Figure 9. Comparison of the resulting trajectories of the vessel in full-scale experiments on the Rhine river in Constance. Both plots use transformed coordinates calculated by (30). The green lines visualize the reference trajectories in both plots. After passing the black line, the vessel detects the obstacle visualized as a red line.
Figure 10. Comparison of the body-fixed surge velocity u and the velocity ẋ in the x̄ direction in the local frame.
Figure 11. Comparison of the resulting input trajectories of the vessel in the full-scale experiments.
Figure 12. Comparison of costs plotted over time in full-scale experiments. Note the logarithmic representation of the cost.
Table 1. Identified vessel parameters of the dynamic model [26] and the parameters of the actuators’ models [24].
Vessel’s Dynamics Parameter    Value           Actuators’ Dynamics Parameter    Value
m                              3100 kg         |n_AT,max|                       2300 rpm
x_g                            0 m             T_E,1                            105 ms
J_comb                         21,179 kg m²    T_T,1                            240 ms
X_u̇                            155 kg          ã_1                              71.66 1/s²
Y_v̇                            1070 kg         |α_max|                          -
Y_ṙ                            1008 kg m       T_E,2                            95 ms
N_v̇                            3328 kg m       T_T,2                            160 ms
X_u                            86.5 Ns/m       ã_2                              1.496 rad/s
Y_v                            796 Ns/m        |n_BT,max|                       3800 rpm
N_r                            5230 Ns/m       T_E,3                            270 ms
Y_r                            896 Ns/m        T_T,3                            80 ms
N_v                            958 Ns/m        ã_3                              10,000 1/s²
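As a reading aid for Table 1, the short sketch below assembles a 3-DOF mass matrix in the standard surge-sway-yaw notation of Fossen [27]. It assumes that the listed hydrodynamic derivatives are the (positive) added-mass entries and that J_comb already combines the rigid-body and added yaw inertia; these conventions are assumptions for illustration, not the paper’s model equations. The Coriolis-centripetal and damping terms built from X_u, Y_v, Y_r, N_v, and N_r are omitted here.

```python
import numpy as np

# Parameters from Table 1 (vessel dynamics); units as listed there.
m_rb   = 3100.0    # rigid-body mass [kg]
x_g    = 0.0       # longitudinal centre of gravity [m]
J_comb = 21179.0   # combined yaw inertia [kg m^2] (assumed: rigid body + added)
X_udot = 155.0     # surge added mass [kg]
Y_vdot = 1070.0    # sway added mass [kg]
Y_rdot = 1008.0    # sway-yaw added-mass coupling [kg m]
N_vdot = 3328.0    # yaw-sway added-mass coupling [kg m]

# 3-DOF mass matrix M = M_RB + M_A in surge-sway-yaw coordinates,
# assuming Fossen's standard structure and positive added-mass entries.
M = np.array([
    [m_rb + X_udot, 0.0,                0.0               ],
    [0.0,           m_rb + Y_vdot,      m_rb*x_g + Y_rdot ],
    [0.0,           m_rb*x_g + N_vdot,  J_comb            ],
])
```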
Table 2. Identified parameters of the actuators’ thrust models according to [26].
Parameter    Value      Parameter    Value
a_1          0.9047     d_1          0.21
a_2          0.6545     d_2          0.24
d_AT         0.36 m     ρ            1000 kg/m³
d_BT         0.23 m     L_AT         2.9 m
Table 3. Significant points specifying the application scenario.
Point    Latitude              Longitude            Description
P_1      47.668279218023123    9.174018424704204    Start position
P_2      47.666282640256334    9.178596676673507    End position
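As a plausibility check of the scenario geometry, the start and end points in Table 3 can be projected onto a local flat-earth frame. The equirectangular projection and the convention of measuring the course from the local east axis are assumptions made only for this illustration; the resulting track length of roughly 408 m and course of roughly −33° are close to x_des = 409.9 m in Table 5 and ψ_start = −31.96° in Figure 3.

```python
import numpy as np

# P1 (start) and P2 (end) from Table 3, in decimal degrees.
lat1, lon1 = 47.668279218023123, 9.174018424704204
lat2, lon2 = 47.666282640256334, 9.178596676673507

# Simple equirectangular (flat-earth) projection around P1 -- an
# approximation that is adequate for a track of a few hundred metres.
R_EARTH = 6371000.0  # mean Earth radius [m]
lat0 = np.deg2rad(lat1)
east  = np.deg2rad(lon2 - lon1) * R_EARTH * np.cos(lat0)
north = np.deg2rad(lat2 - lat1) * R_EARTH

track_length = np.hypot(east, north)           # approx. 408 m
course = np.rad2deg(np.arctan2(north, east))   # approx. -33 deg from the east axis
print(f"length = {track_length:.1f} m, course = {course:.1f} deg")
```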
Table 4. Controller parameters of the standard MPPI and the feature-based MPPI approaches.
Parameter    MPPI             Feature-Based MPPI
K            9000             8500
T            41               41
c*           0.0167           0.0167
λ            1                1
C(X_t)       Equation (33)    Equation (33)
ϕ(X_T)       0                0
K_1          -                500
C_1(X_t)     -                Equation (38)
ϕ_1(X_T)     -                Equation (38)
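The parameters K, T, and λ in Table 4 enter the sample-based optimization in the usual MPPI fashion: each of the K input sequences over the horizon of T steps is scored by its trajectory cost, and the updated control sequence is the exponentially weighted average with inverse temperature 1/λ. The sketch below shows only this generic weighting step, not the paper’s feature-based algorithm.

```python
import numpy as np

def mppi_weighted_update(costs, samples, lam=1.0):
    """Generic MPPI-style update: exponentially weight the K sampled input
    sequences by their trajectory costs and return the weighted mean.
    costs: (K,) trajectory costs, samples: (K, T, m) input sequences."""
    beta = costs.min()                          # shift for numerical stability
    weights = np.exp(-(costs - beta) / lam)     # softmax-type weights
    weights /= weights.sum()
    return np.einsum('k,ktm->tm', weights, samples)
```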
Table 5. Coefficients of the costs (33) and the feature costs (38).
Parameter    Value         Parameter      Value
c_x          500 1/m       x_des          409.9 m
c_y          200 1/m       u_des          2 m/s
c_ψ          2500 1/rad    c_Fu           300 s/m
c_u          5000 s/m      c_Fv           30 s/m
c_v          10 s/m        c_Fr           10 s/rad
c_r          7500 s/rad    x_appear       60 m
c_pen        1 × 10⁸       x_collision    80 m
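The units of the coefficients in Table 5 (1/m, 1/rad, s/m, s/rad) suggest a running cost that weights tracking errors, augmented by the large penalty c_pen when the vessel enters the inadmissible region. The exact running and feature costs are given by Equations (33) and (38); the function below is only a hypothetical example of such a structure, with the error terms and the admissibility test chosen for illustration.

```python
# Coefficients from Table 5; the cost structure itself is only an
# illustrative guess, not the paper's Equation (33) or (38).
c = dict(cx=500.0, cy=200.0, cpsi=2500.0, cu=5000.0, cv=10.0, cr=7500.0,
         cpen=1e8, x_des=409.9, u_des=2.0)

def illustrative_stage_cost(x, y, psi, u, v, r, inadmissible):
    """Hypothetical tracking cost in transformed coordinates: weighted
    absolute errors plus a large penalty when the state is inadmissible."""
    cost = (c['cx'] * abs(x - c['x_des']) + c['cy'] * abs(y)
            + c['cpsi'] * abs(psi) + c['cu'] * abs(u - c['u_des'])
            + c['cv'] * abs(v) + c['cr'] * abs(r))
    return cost + (c['cpen'] if inadmissible else 0.0)
```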