Machine Learning Enhancement of Manoeuvring Prediction for Ship Digital Twin using Full-scale Recordings

Digital Twins have much attention in the shipping industry, attempting to support all phases of a vessel’s life cycle. With several tools appearing in Digital Twin software suites, high-quality manoeuvring and performance prediction remain cornerstones. Propulsion efficiency is in focus while in service. Simulator-based training is in focus to ensure safety of manoeuvring in confined waters and harbours. Prediction of ships’ velocity and turn rate are essential for correct look and feel during training, but phenomena like dynamic inflow to propellers, bank and shallow water effects limit simulators’ accuracy, and master mariners often comment that simulations could be in better agreement with actual behaviours of their vessel. This paper focuses on digital twin enhancements to better match reality. Using data logged during in-service operation, we first consider a system identification perspective, employing a first-principles model structure. Showing that a complete first-principles model is not identifiable under the excitation met in service, we employ a Recurrent Neural Network to predict deviations between measured velocities and the model output. The outcome is a hybrid of a first-principles model with a machine learning generic approximator add-on. The paper demonstrates significant improvements in prediction accuracy of both in-harbour manoeuvring and shallow water passage conditions.


Introduction
Accurate digital twin models of ships are crucial for training master mariners in harbour manoeuvring and berthing, for safe navigation in other confined waters, for design of new port and bridge facilities and also essential for monitoring of propulsion performance and cost.The common practice for specifying a manoeuvring model is to use wellestablished mathematical procedures to determine parameters from model tank tests with a scale model of the vessel and propellers.Models are refined by detailed scrutiny of various full-scale effects, but updating based on full-scale trials remains a challenge.This is evident in particular during berthing, where in-harbour and shallow water effects need to be updated by data from real life experiments.
Modelling of a ship's dynamic properties is a classical problem based on first-principle physics and hydrodynamics, describing the forces and moments due to flow past the hull and effects of linear and angular accelerations, see Abkowitz (1964) and Newman (1977).
Parameters in these open-water models are captured in model-scale tests using Planar Motion Mechanisms (PMM) for manoeuvring, towing The main contributions of this research are twofold: (i) it suggests a novel hybrid implementation of the digital twin manoeuvring model that combines dynamical models with data-driven models fitted on real-world ferry operation data.This significantly differentiates from previous studies in the domain of hybrid models that often used simulation data.A few used lab experiments but did not accept time-varying real input.(ii) The proposed digital twin model outperforms state-ofthe-art approaches in prediction accuracy of full-scale ship behaviour and yields significantly improved predictions for the ship's surge, sway, and yaw velocities.
The remainder of this article is structured as follows: This introductory section is elaborated by a discussion of related work in Section 2.Then, Section 3 gives a description of the surface vessel under study, its currently considered digital twin, discusses accuracy and identifiability challenges and finally formulates the problem at hand.As motivated by the presented evidence and analyses, Section 5 details the hybrid framework proposed in this study for obtaining more accurate digital twins.Section 6 presents the obtained results and, finally, conclusions are drawn in Section 7.
A list of the abbreviations and symbols used throughout the paper is provided in Tables 1 and 2, respectively.

Digital twins in the maritime domain
With hybrid simulation technology being a cornerstone in the spacecraft industry since the early steps towards Space, offering complete mission simulations, training for astronauts, validation of real-time control and communication, NASA defined a new buzzword, the Digital Twin (Glaessgen and Stargel, 2012), for a system of systems: an integrated multi-physics, multi-scale, probabilistic simulation of a vehicle or system that uses the best available physical models, sensor updates, fleet history, etc., to mirror the life of its [...] twin.The concept of system of systems simulation has also become a focus of the maritime sector, including the Open Simulation Platform software (OSP, 2020) for co-simulation (Hatledal et al., 2020).In the marine domain, Digital Twins combine state-of-the-art engineering model and analytics with asset specific operational data to create digital simulation and information models (Danish Maritime Authority, 2018).Manoeuvring and performance prediction are essential parts (Øvereng et al., 2021;Schirmann et al., 2019;Skulstad et al., 2021).The information updating over the lifetime of a vessel would also include maintenance and condition overviews for machinery components, but this research focus on enhanced prediction capabilities of manoeuvring-and ship speed prediction.This research investigates how manoeuvring and speed performance prediction can be assessed, and prediction accuracy be improved.We first discuss system identification as a tool to assess the parameters of nonlinear hydrodynamic models, with structure given by first physical principles.We subsequently turn to investigate how ''black box'' neural network approximation techniques could be employed to predict behaviours that are not captured by the classical models, but are learned by training against observed behaviours of the full scale vessel in service conditions.

System identification for manoeuvring models
System identification using the model topology given by physics and hydrodynamic knowledge (grey box models), has been investigated by several authors.Research in this area, over a long period of time, has shown that manoeuvring models have several challenges: Up-scaling of model parameters to a full-size vessel often comes with some loss of accuracy.Hull resistance and power to speed relation has uncertainties in actual conditions of sea, wind and shallow water.Fine-tuning of parameters in a model with data collected on-board of operating vessels has limits due to the available amount of excitation.Open water parameter identification of steering dynamics were conducted by Åström and Källström (1976) for a linear discrete-time recursive model, with parameters that were not directly related to hydrodynamic models.Continuous-time models are closer to physics and several linear hydrodynamic parameters were estimated in Abkowitz (1981) using an Extended Kalman Filter (EKF).This approach was extended to nonlinear hydrodynamic models in Zhou and Blanke (1989), who showed bias-free estimation of parameters that occur linearly in a nonlinear model, using an instrumental reformulation of the EKF, the innovations filter (Ljung, 1979).Where Abkowitz (1981) used the standard zig-zag tests and could identify selected parameters, Blanke and Jensen (1997) used dedicated excitation (multi-frequency binary test sequence on the rudder angle and in closed heading control loop), giving rise to better identifiability.The issue of identifiability of parameters was investigated by several authors and findings summarized in the book (Walter and Pronzato, 1997).The cases of marine vehicles was the focus in Blanke and Knudsen (1999) and Blanke and Knudsen (2006).It was demonstrated that sufficient input excitation is a fundamental limitation for parameter identification in both linear and nonlinear systems and also showed that changes in model structure and parametrization impact the identifiability of a model.Richer model structures might hence exist that better approximate observed behaviours for a given excitation.

Data-driven models for ship motion estimation
Data-driven models is an alternative approach to the pre-determined structure of first-principle models.Statistical methods in machine learning describe a range of nonlinear approximators that in principle can approximate any nonlinear relationship.Deep Learning (DL) methods have shown exceptional results in different domains, e.g. for object detection and recognition, realistic image generation (Karras et al., 2019), and text-to-speech conversion (van den Oord et al., 2016).In the realm of ship manoeuvring, Moreira and Soares (2003) tested the feasibility of using RNNs as an alternative to manoeuvrability models for simulation, using data obtained from simulations of tactical circles and zigzags by a manoeuvrability model.They found that RNNs can learn to robustly and accurately reproduce the simulation results.They later pursued testing their method with data from full-scale trials of a catamaran and concluded that the RNN was capable of predicting the manoeuvres of the ship with satisfactory accuracy (Moreira and Soares, 2012).However, the quality was dependent on the influence of the environment and neglected input.A probabilistic approach for tuning hydrodynamic parameters in a vessel model was employed using simulated sensor data by Han et al. (2021)

Hybrid models
Both modelling methods have challenges that limit the obtainable accuracy.Purely data-driven approaches do not adhere to physical principles, potentially resulting in infeasible models outside the domain covered by the available data.First-principle models do not capture all phenomena affecting ship motions.Attempts of explicitly describing such phenomena result in models with high complexity.This research investigates the feasibility of using a hybrid where a deep learning model captures the residues between first-principles model and observed data.We refer to this as a hybrid model (Rai and Sahu, 2020).The approach is to employ Deep Learning to estimate the velocity discrepancy between the measured velocity and the manoeuvring model's prediction.Combining the two models utilize their individual strengths, creating a hybrid model with the approximation capabilities of a Deep Learning model and the adhesion to physical principles of the first-principles model.
Several methods have been employed as a means to incorporate physical knowledge into data-driven models.One line of research has sought to provide NNs with prior knowledge by integrating them into the differential equations to ensure physical plausibility of the model (Rueden et al., 2019).Looking at satellite reaction wheel dynamics, (Cen et al., 2011) used NNs to estimate input and output nonlinearities in a state-space representation.They further compared their model against other types of NN for Fault Detection and showed superior performance.Training of the NNs was performed separately in a static manner.(Raissi et al., 2019) used NNs to estimate the nonlinearity of Burger's equation and extended the loss function to include a loss that penalizes breaking of initial and boundary conditions.The training was done using numerical differentiation.Similarly, (Lutter et al., 2019) encoded physical knowledge into NNs by using them as approximators of unknown functions in a second-order Ordinary Differential Equation (ODE) derived using Lagrangian Mechanics, and in doing so, extended the loss function to include the ODE, enforcing the NN to comply with the laws of physics.
Another line of research seeks to embed knowledge into the datadriven models by using the output of a physics-based model together with the data-driven model.For instance, Wang et al. (2017) used the Reynolds-Averaged Navier-Stokes equations to predict pressure, velocity, turbulent kinetic energy, and distance to the nearest wall from which they constructed a set of input features.They then used Random Forest regression to model the discrepancy of the Reynolds stresses between the RANS and high fidelity Direct Numerical Simulation.Qraitem et al. (2020) used Long Short-Term Memory with Time Distributed Dense layers to predict simulated two-dimensional fluid flows.They simulated the case of a simplified model of a real system, using two dynamical models, one that served as the ground truth and one that represented the known dynamics.They used the second model's output as an input to their NN and tested three different sources of inaccuracies: difference in system parameters, difference in system forcing functions and missing terms in the differential equations of the system.In all three cases, the NN significantly improved the prediction accuracy.By applying the machine learning as a compensator of the unmodelled behaviours or inaccuracies of a conventional dynamic model, Skulstad et al. (2021) made an investigation of in-harbour manoeuvring, showing significant improvements in the path prediction during docking.This was done by adding nonlinear compensation as part of the ship dynamic model.The authors used a RNN with LSTM cells to model the error of their vessel model and obtained new predictions by combining the output of both models.Further overviews of hybrid models were published in Karpatne et al. (2016) and Rai and Sahu (2020).
While the research direction presented in this study is related to that of Skulstad et al. (2021), the focus here is on velocity estimation.Due to the nature of dynamical systems, prediction errors do not accumulate for velocity estimates in the same way they do for position.Therefore, the validity of the estimation is not limited to a finite time horizon.Furthermore, Skulstad et al. (2021) focuses on prediction ahead of time, assuming constant model inputs.This work evaluates the proposed model's accuracy by using measured input at every time-step.

Problem formulation
Ship dynamics is very well understood and the existence of mechanisms for reproducing the main hydrodynamics effects in towing tanks facilitate accurate manoeuvring representations for research and comparison between hull designs.However, all real life conditions are not captured in standard model tests and up-scaling from model to the real ship is subject to uncertainty.These mainly relate to propeller performance behind ship, hull resistance description, prediction of directional stability with trim and loading condition, shallow water and bank effects and flow phenomena in harbour manoeuvring.
Two areas are of particular concern for the use of digital twins.One is the accuracy of ship speed prediction in shallow water, another is manoeuvring accuracy for use in simulator training.This study addresses the improvement of manoeuvring models via the use of a hybrid representation of the ship dynamics.

Test case -Scandlines M/F Berlin
This research is based on data provided by Scandlines for the ferry M/F Berlin, seen in Fig. 1, connecting the Danish harbour of Rødby with the German harbour of Rostock and back.M/F Berlin is a roll-on/rolloff hybrid ferry with two 4500 kW main engines and one 4500 kW hybrid engine.It has a 13,500 kW Controllable Pitch Propeller (CPP), two 3500 kW azimuth propellers and two 1350 kW bow thrusters.The propeller configuration is given in Fig. 2. It is 169.5 m long with a width of 25.4 m.The logged data stems from 53 separate transits, each with a duration of approximately two hours, collected between the 7th and 23rd of October, 2019.Several sensors were used to collect the data, resulting in different sampling rates for different measurements, ranging between 0.5 Hz and 2 Hz.Furthermore, each sensor's sampling frequency fluctuates with ±1 Hz from the mean frequency.A list of all relevant measurements is given in Table 3.
Since M/F Berlin operates a single route, the manoeuvres captured in the collected data tend to be similar for voyages heading in the same direction.Each voyage can roughly be split into 5 phases: (1) Manoeuvring out of the starting port, (2) acceleration, (3) cruise speed, (4) deceleration and (5) manoeuvring into destination port and docking.All 5 manoeuvres are illustrated in Fig. 3.The first and last phases happen at low speeds (< 5 m/s) and are where all major turning manoeuvres occur.When the ferry leaves Gedser, it reverses out of the harbour and performs a 180 • turn.The ferry also performs a 180 • turn inside Rostock before reversing into the ferry berth, but this manoeuvre is performed almost in place and at a lower speed.Examples of both manoeuvres are seen in Fig. 4. When leaving Rostock, the ferry has to steer from the berth onto the pathway exiting the port.It performs no major turning manoeuvres when entering Gedser.During these phases, the azimuth propellers are the primary source of propulsion, except during certain manoeuvres where the bow thrusters are turned on.During the acceleration phase, the centre propeller is turned on, which is illustrated in Fig. 3.The ferry is sailing on a straight path in the acceleration/deceleration and cruise speed phases, only performing slight turning manoeuvres to avoid oncoming traffic if required.

Digital Twin of the M/F Berlin
The degree of realism experienced by mariners during simulation training depends on the digital twins' ability to predict the ship's velocity.In this paper, the velocity components of interest are the surge speed, the sway speed and the yaw-rate of the ship.Accurate prediction of the velocity vector requires a rigorous representation of the ship's dynamics.The 3 degrees-of-freedom (3 DOF) kinetics model of a surface marine vehicle (e.g. a ship) is based on the general equations of motion for a rigid body developed by Newton and Euler and was first developed by Abkowitz (1981).It is given in the ''robot form" by Fossen ( 2011) where  ≜ [  ]  is the position and heading of the vessel in the North-East-Down (NED) coordinates system,  ≜ [  ]  is the vector of velocities as well as the yaw-rate,  ≜ [      ]  is the vector of thrusts forces and moments,  are the time-varying wind and currents forces and moments, () is the rotation matrix by  about the vertical axis (  in the body frame) and   =  −   ()  is the vessel-current relative velocity vector, with   being the current velocity.
A constant mass-inertia matrix for the vessel is assumed in this study.This assumption is not overly conservative since variations in the vessel's mass due to loading of passengers or on-board vehicles can be estimated and residual uncertainty is negligible compared to the vessel's mass.
Given that the mass-inertial matrix coefficients are usually known by carrying out offline hydrodynamic computations based on the ship geometry data and Strip theory (Faltinsen, 1993), the most important parameters for fully identifying the vessel model are the elements of the damping and Coriolis matrices, namely the hydrodynamic coefficients.Apart from the large number of parameters (more than 20), which implies quite demanding excitation conditions, another obstacle to accurate parameter estimation is the fact that the hydrodynamic coefficients significantly vary with the water depth, bank effects, hydrodynamic angles, etc.
An alternative parametrization of surface vessel dynamics used tabular representation of hydrodynamic forces and moments.This ap-

Digital twin comparison with vessel
The accuracy of the baseline model is evaluated by simulating 53 transits and comparing the velocity output with measured data.Table 4 presents the overall accuracy of the Dk1 model in each of the velocities in surge, sway and yaw.
Overall, the Dk1 simulator model (Jensen et al., 1993), is capable of predicting the speed of M/F Berlin.However, the accuracy of prediction is not uniform through the entire journey.The predictions of surge speed of the ferry during the cruise speed phase and the turning rate during manoeuvres lack accuracy.This is illustrated in Figs. 3 and 4. Some reasons for these discrepancies becomes clearer in the following subsections, and the paper will suggest a hybrid structure of the DT solution that improves the accuracy.

Model identification for a Digital Twin
System identification of a model aims to minimize the difference between measured output, () = (|(), ()) and estimated output, ŷ() = ŷ(|, ()), where  denotes discrete time samples, () is measured input, () is unknown input and  a vector of parameters in the model, then the output error is (2) The goodness of fit between measured and estimated output is commonly expressed by the mean square output error, where  is a positive-definite scaling matrix that ensures contribution of the same order of magnitude for each of the elements in the output error vector.
The system identification problem is then to determine a parameter estimate that minimizes the cost function   , i.e. the mean square output error, θ = arg min    (|, ). (4) Since   is quadratic, the parameter estimate is the vector θ that gives a zero value of the gradient of   , The gradient of the mean square output error, the left hand side of (5), is essential for identification of any model.The parameter estimate θ is obtained by iteration, where the updating term in Eq. ( 6) determines identifiability of a model and confidence in identified parameters.The first and second derivatives of the mean square output error with respect to parameters, the gradient and the Hessian, determine these properties.

Identification experiments
Full parameter estimation of surface vessels using conventional model-based system identification poses challenges due to the large number of parameters and inability to provide sufficient excitation with available manoeuvring devices on a full-scale vessel.Test manoeuvres for manoeuvring model identification include the classical zig-zag and turning tests, which explore course stability and speed reduction during a turn.Tests to particularly excite the linear dynamics include a random noise, sinusoidal and the multi-frequency binary excitation, which by far outperformed others (Blanke and Knudsen, 1999).Parametrization of the PMM-based models (Abkowitz, 1964), Blanke and Jensen (1997) show parameters that are co-linear, i.e. can vary together without any visible effect on model output (Blanke and Knudsen, 2006).
The achievable model quality, including uncertainty of parameters and goodness of fit to measured output, can be assessed by analysis of identifiability and model parameter sensitivity.This is the topic of the next subsection.

Identifiability assessment
Identifiability of the classical first-principles based models, where parameters were the hydrodynamic derivatives, have been addressed in several studies (Blanke and Jensen, 1997;Abkowitz, 1981;Blanke and Knudsen, 2006).These indicated that overparameterization was commonly encountered, i.e. the parameters that could be identified given the available input excitation for a surface vessel was a small subset of the model parameters set.The alternative parametrization, introduced by Chislett (1996), is studied below, where tabular representation is used for hydrodynamic forces and moments on the hull.In this representation, the nonlinear elements of (  ) in Eq. ( 1) are expressed as the superposition of look-up tables, each of which captures the contribution of a specific factor (e.g.hydrodynamic angles, water depth, thruster efficiency or Froude number).As an example, the relation between hull resistance  and Froude number   is captured by a look-up table   describing the surge resistance coefficient as a function of   .Fig. 5 shows the resistance coefficient as a tabular function of   .
Six different tables are considered for the identifiability analysis.The selected tables describe surge, sway and yaw hull resistance coefficients as functions of the corresponding velocities , , , the ''correction" on the surge and yaw hull resistance curves due to the hydrodynamic angles for drift  = arctan (

−𝑣 𝑢
) and yaw  = arctan ( 0.5  ) as defined in Delefortrie et al. ( 2016) and finally, the thruster efficiency.An example of the effect of varying the nominal value of surge resistance on the predicted forward speed is plotted in Fig. 6.Perturbation of tables is done by uniform variation of all elements in a table, which is equivalent to applying a gain factor on the respective force or moment.Changes were made by ±5%, ±10% of the nominal values of the gains, which are 1.The Sensitivity analysis is performed on the Dk1 model in order to quantify the contribution of selected parameters to the system's outputs, which are the rates of change in surge , sway  and yaw .The unknown parameter vector  has 6 components and is defined as, where   ,   ,   correspond to the gain factors applied to the surge, sway and yaw hull resistances tables, respectively,  ,  ,  ,  to the surge and yaw resistance correction tables and   to the thruster efficiency table.
Let  * be a nominal value for the parameter vector.The absolute sensitivity of the output   to the parameter   is defined as the variation of the output to a small perturbation of the parameter Walter and Pronzato (1997) and Saltelli et al. ( 2008) which is numerically approximated by Here  , denotes the perturbed th output due to variation of the th parameter by   .This quantity can be scaled by a suitable factor to become a non-dimensional sensitivity defined by where   has been chosen to be the root mean square value of output   less its average, ȳ .With   samples and sampling period   ,   is given by where   denotes the time at which the th sample was obtained.
Remark 4.1.Each parameter in  is a gain factor for an entire table.
For calculating  , , only the relative variation is needed for each component of .The latter equals the chosen percentages of variation (±5% or ±10%).
The Dk1 model is simulated with each of the parameters being perturbed one at a time by ±5% and ±10% of its nominal value.The mean square index    defined by Walter and Pronzato (1997) The vector  , contains all  , values calculated at each sample from Eq. ( 9), is used to rank the parameters for each system output according to their contribution to it.The cumulative -mean square index is a metric of the overall significance of a parameter for the system model and it is defined for each parameter as where   = 3 is the number of measured outputs (surge, sway velocities and yaw rate).Parameters with -mean square index smaller than a threshold, usually defined as a percentage of the maximum    (Blanke and Knudsen, 1999), are not sufficient to allow identification with reasonable confidence in the parameter estimates.
Both model structure and the actual input signal have paramount importance on the sensitivity measure and therefore on the identifiability of a model (Blanke and Knudsen, 2006).Fig. 7 shows the ranking of the 3DOF vessel tabular parameters based on the cumulative -mean square indices.The threshold is defined as 2% of the maximum   .
The results of the sensitivity analysis show that the correction tables for the surge and yaw hull resistance curves that correspond to variations of  and  are rather insignificant for the model and, therefore, difficult to estimate accurately.Specifically, the -mean square index of  ,   is below the identifiability threshold, while   and   dominate in the outputs of the system.This confirms the case of challenging accurate identification of the vessel parameters even in this consolidated tabular parametrization.The preceding analysis motivates a different approach to the problem of obtaining a sufficiently accurate description of the vessel's digital twin via learning-based methods.This leads to the following problem formulation: Problem statement: Obtain a refined data-driven manoeuvring model for a marine surface vessel based on an existing digital twin for manoeuvring and the signals listed in Table 3.
The subsequent sections demonstrate that incorporating operational data together with a machine learning model to provide corrections to the manoeuvring model can improve both the overall accuracy and the accuracy in the specific areas where predictions are generally more complicated.

Generic approximators
Approximation of nonlinear relations between input and output, without regard to model structure, is possible using methods from semi-parametric statistics and smoothing (Hastie et al., 2017), chapter 11.
The idea is to let machine learning estimate combinations of input to form derived features, referred to as  , and then approximate output as a nonlinear function   of these.With  being the input vector of dimension  and   ,  = 1, 2, … , , -dimension vectors of weights, then derived features are calculated as     .The function   and the vector   are both iterated to minimize an index  , (13) using training data Neural networks are approximators in several layers, each using a similar approach.Based on the outputs of the previous layer, each new layer produces a latent feature vector  () with  being the layer index.The output a network with  layers,  (), is then given by,  (1) =  (1) ( (1) ) ( 14) For simplicity, the equations above uses the full weight matrix  = , with also includes the bias term.Note that number of latent features in each layer can vary, i.e.   is not necessarily the same for all layers.In the same way, neural networks are not confined to a single output and  () is a vector of length , with  > 1 for multivariate regression.Common choices for all elements of (⋅) is the sigmoid function ( 17) or the hyberbolic tangent function.
In regression models, (⋅) can be a linear transformation mapping the hidden state of the previous layer to the desired number of output values, .
For regression, one typically uses a mean square error as measure of fit, often referred to as the loss function, Minimizing  is done using gradient descent.The gradient of the loss is expressed using a compact notation for parameters,  = { (1) ,  (2) , … ,  () }.The parameter update scheme is then, where   is referred to as the learning rate and  denotes the current iteration.The process of computing the gradient of the loss-function with respect to the network parameters is called backpropagation as it is necessary to backtrack through each layer of the network.A common approach during gradient descent is to make more frequent parameter updates based on a smaller batches of the training data, to avoid premature convergence to local minima, as the loss-function surface often is non-convex.Once trained on a set of training data [   ,    ]  , the quality of this type of black box model is done using mean square measures between new data [   ,    ]  and predicted output ŷ, where new data do not comprise the training set.

Neural networks for time-series data
For sequential input, such as a time-series, the input becomes a matrix,  ∈ R  × , where the superscript  denotes the number of samples in the time-series.Recursive Neural Networks (RNNs) are variants of the neural network, designed to process sequential data.RNNs process each step of the input iteratively and transfers information between steps through a cyclic connection from the latent features of the previous iteration.In RNNs, the latent features  is referred to as the hidden state of the network.In RNNs, Eqs. ( 14)-( 16) take the form  (1)   = ( (1)     +  (1)    (1)  −1 ) (20) ⋮   (  ) = ( ()    (−1) with the initial value of the hidden state,  ()  0 commonly being set to zero.The size of the hidden state   is determined by the number of rows in the   and   matrices of the respective layer and does not have to be the same as the size of the input vector or the hidden state of the previous layer.For regression models, a choice for (⋅) could simply be a linear transformation.As with the neural network, Eq. ( 19) is used to update the parameters of the RNN.Because the same parameters are used to compute the latent features at each step of the sequence, the derivation of the gradient requires repeated application of the chain rule.This form of backpropagation is referred to as backpropagation through time.For instance, the partial derivative of the loss function with the respect to   is calculated as The details of Eqs. ( 23) and ( 24) can be found in Ref. (Pascanu et al., 2013).For simplicity, they show the derivative for a single layer RNN.The recursive multiplications in Eq. ( 24) can cause gradients to approach either zero or infinity, a phenomenon known as the vanishing/exploding gradient problem.This makes training of RNNs difficult.To overcome this, Hochreiter and Schmidhuber (1997) suggested the Long-Short Term Memory-cell, which was later refined by Gers et al. (1999).The LSTM has an additional feedback connection called the cell-state,  and multiplicative gates that control updating.The gates are referred to as input , forget  , and output  gates.These are defined by, with the ⊙ operator used to denote element-wise multiplication.Cho et al. (2014) suggests an alternative to the LSTM, the Gated Recurrent Unit (GRU) that employs a similar strategy but without the additional cell-state.The gates of the GRU are referred to as update-gate  and reset -gate , defined by, ) ) For a more in depth explanation of RNNs the reader is referred to Goodfellow et al. (2016), chapter 10.The following shows how a black box approximator is combined with a first-principles model to form a hybrid entity between machine learning and the first-principles model.

Hybrid model
Given a set of control and environmental inputs, the ferry produces measurable outputs, where the velocity   = [      ]  is of particular interest.The subscript  is used to denote the actual (measured) velocity.Given the same inputs, the Digital Twin will produce a prediction of the velocity,   , with the subscript  indicating the velocity output of the manoeuvring model.As explained in Section 4, there will be some difference  =   −   between the measured and predicted velocity.Assuming that  is primarily caused by effects that concern the set of measured input data, a data-driven model can learn to estimate this error, with  being a time series of inputs to the ferry and the Digital Twin,  is the model parameters and  ℎ is the velocity discrepancy prediction of the hybrid model.If  ℎ can be predicted, such that new and more accurate model can be achieved by adding the output of the manoeuvring model and the data-driven model.
The Dk1 model will serve as the foundation of the hybrid model, predicting the speed in the surge, sway and yaw directions of the ferry.Fig. 8 shows a diagram of how the two models are combined.While it is possible to have a single multi-input, multi-output deep learning model that predicts corrections for all three velocity components, initial experiments revealed that this yielded worse performance compared to single-output models that only modelled one velocity component error.Therefore, the suggested hybrid model will include an RNN for compensation predictions for each of the three velocity components.

Hyper-parameter selection for recursive neural networks
A set of hyper-parameters is chosen for each model, before training of the RNNs begin.Hyper-parameters specify the architecture of the model: The number of layers, and the number of units in layers' hidden states and similar.These design factors are important and are chosen with care.The number of hidden units in each layer does not have to be identical, however was chosen to be so to reduce the number of hyper-parameters.Another design consideration is the length of the input time-series.While it is possible to train a network on entire time-histories of recorded transits, this would be impractical as longer sequences increase the likelihood of exploding/vanishing gradients, and only relatively recent events have relevance for the current velocity.The error prediction  ℎ, at time  is based on a fixed length of past inputs and velocity predictions Here,  denotes the sequence length, i.e. the number of past samples used for the prediction.The optimal value of  depends on how much of the past influences current events and is a part of the hyper-parameter set.
Other hyper-parameters select type of recurrent cell, GRU or LSTM, and the optimization algorithm used to tune model parameters during training.Early results indicated that adaptive variants of the gradient descent algorithm achieved good accuracy, therefore two such schemes were considered: Adam (Kingma and Ba, 2014) and RM-Sprop (Hinton et al., 2012).Table 5 provides a complete list of all the hyper-parameters included in the search.
Hyper-parameter tuning is the process of choosing an appropriate hyper-parameter configuration.It consists of training a model with different configurations and comparing the resulting performance.As the training of neural networks is a stochastic process, it is necessary to use cross-validation for each configuration.The hyper-parameter tuning of the RNNs used in the hybrid model was done using the Bayesian optimization algorithm suggested by Snoek et al. (2012).The implementation is that of KerasTuner (O'Malley et al., 2019).The RNNs were built and trained using Keras (Chollet et al., 2015) with TensorFlow 2.0 (Abadi et al., 2015).

Validation on full-scale data
Each sensor used to collect data has deviation in sampling frequency across the collection period and data were hence re-sampled at 0.5 Hz using Linear Interpolation.Measured Speed Over Ground (SOG) was used to identify individual transits; when SOG rises above 0.5 knots, a transit starts, and stops when it falls below 0.15 knots.
The hybrid model uses data-driven models as compensators for the prediction error made by the manoeuvring model, which implies an  underlying assumption that prediction errors are due to inaccuracy of the manoeuvring model.If the error used as the regression target for the data-driven models is affected by other potential error sources, such as e.g.measurement errors causing input to differ from actual input to the ferry, the compensation would be invalid.Precautions were taken to avoid this.First, the Speed Through Water (STW) measurements in the surge direction of the ship, used to calculate ocean current, proved to have a significant bias and be unreliable during acceleration phases.Therefore, the bias was estimated and removed, and the STW measurements during acceleration were set to that of SOG, such that the corresponding current estimate would be zero.Secondly, due to the ship's heading being close to constant for most of the transit, the manoeuvring model would start to diverge because of a difference in equilibrium between rudder angle and wind effect from the actual ferry.
Consequently, the hybrid model was only used to predict the yawrate at surge speeds below 9.72 knots (5 m/s).Almost all manoeuvres are performed below this speed, except slight turns to avoid oncoming traffic.
The training data for the RNNs was obtained by simulating each transit with the Dk1 simulator and subtracting the resulting output velocity from the measured velocity.As mentioned in Section 5.3, using separate deep learning models to estimate the compensation term for each velocity component gave superior accuracy compared to a single deep learning model estimating compensation terms for all the velocity components.Thus, three separate RNNs had to be trained, one for each velocity component.A suitable set of hyper-parameters for each RNN was found by running the Bayesian optimization algorithm for 60 iterations with 3-fold cross-validation performed at each iteration.The hyper-parameter tuning process included 41 of the 53 trips, whereas the remainder were kept for evaluation.Subsequently, the models were re-trained with all 41 trips, using the identified hyper-parameter sets.The hyper-parameter tuning resulted in several models with similar performance, so for each velocity, the three best models were re-trained on the larger dataset and the best performing model was selected.The resulting configuration of each is given in Table 6.
The Mean Absolute Error (MAE) was used as the model loss for training of the RNNs, both during hyper-parameter tuning and retraining.MAE was chosen instead of the mean squared error since the gradient of the latter decreases faster for small values of the estimation error, which in turn, prevents any substantial updating of  (36).Training was stopped if the validation loss did not improve by at least 1 × 10 −3 for ten consecutive epochs.All data were normalized using MinMax-normalization, as shown in Eq. ( 37).

Velocity prediction in 3 degrees of freedom
The following section will compare the prediction accuracy of the manoeuvring and the hybrid models to illustrate the potential of the suggested approach.The hybrid model predictions for surge speed, sway speed and yaw-rate are obtained by adding the velocity predictions of the manoeuvring model and the respective compensation predictions by the RNNs.The overall Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE) for the training and test transits for both the manoeuvring model and the hybrid model are given in Table 7. Considering the test set, the correction provided by the hybrid model offers a decrease in mean absolute surge speed error by more than a factor of two.Fig. 9 shows an example surge speed plot in both directions of travel and the corresponding velocity predictions.Overall, the surge speed of the manoeuvring model follows the measured surge speed but with a varying offset.In most cases, the correction estimates of the deep learning model manage to decrease this offset.
Insight into during which parts of the transit showed the most significant improvements were obtained by dividing the transit between Gedser and Rostock into separate phases.These comprise the initial manoeuvring phase (out of the harbour and onto the route), an acceleration phase, a transit phase in open sea with close to steady speed, a deceleration phase, and final manoeuvring into the ferry berth.The distinction of phases is based on surge velocity, with the manoeuvring phases being identified as passage with surge speed exceeding a 5 m/s threshold.The acceleration and deceleration phases are the sequences between the manoeuvring phase and 90% of maximum achieved velocity on a specific transit.Fig. 9 shows vertical dotted lines to separate the phases.The analysis furthermore considers the direction of travel, as manoeuvres are generally similar on transits in the same direction.
Fig. 10 shows the accuracy of the manoeuvring and hybrid models in the five different phases.Considering the plots in Fig. 9, it would  seem that a significant error in the surge speed prediction of the manoeuvring model is what resembles a steady-state error in the transit phase.This is also visible in the rightmost plot in Fig. 10, where the combined test set prediction error shrinks from 0.98 knots to 0.24 knots, making it the phase with the most improved surge speed prediction.
Two other challenging phases of the journey are the ferry leaving Gedser and during acceleration outside of Rostock.In both cases, the hybrid model improves predictions.In the manoeuvring phases around Gedser, the accuracy of both models is lower than in Rostock.This is explained by the fact that the majority of manoeuvring at Gedser happens outside the harbour, where the ship is exposed to wind and current, and on top of this the water outside Gedser is shallow.Together, these factors interact in complex ways that are not captured by the mathematical model whereas machine learning tracks some of the complex behaviours.The impact in the results is clearly visible.
The prediction accuracy of sway speed is shown in Table 8, indicating a decrease in the prediction error by a factor in favour of the hybrid model.Plots of the measured sway speed and corresponding     12).These particular transits were subject to substantial cross-wind, which the Dk1 seems to underestimate.The test error for the hybrid model, on the other hand, is the same for transits in both directions, indicating that it is capable of incorporating the effects of wind on sway speed properly.A reason could be that wind turbulence disturbs anemometer readings but the neural net approximator is able to capture this in prevailing wind conditions.
Predictions of the vessel's turning rate are only considered in the first and last phase of the journey when the vessel's surge speed is below 5 m/s.Table 9 provides the overall yaw-rate prediction accuracy of the hybrid and manoeuvring models.An overall improvement by a factor of approximately two is seen, as was the case with the linear velocities.
When the ferry leaves Gedser, it reverses out of the harbour and performs a 180 • turn in the turning basin.When the ferry enters Rostock, it will travel along a fairway until it reaches the outside of the ferry berth, where it makes a 180 • turn before reversing into the berth.Following the fairway in both directions inside Rostock also requires a slight turning manoeuvre to get to the right dock.Examples of the trajectory of both manoeuvres can be seen in Fig. 13.When entering Gedser, the ferry does not make any significant turns.Examples of the prediction in all 4 scenarios are given in Fig. 14.A significant improvement in yawrate can be seen inside Rostock both during entering and departing.As already stated, the prediction task is more challenging in Gedser due to less protection from environmental effects and shallow water.This is also clear from Fig. 15, where the combined prediction error of the test set in Rostock is reduced from 8 deg/min to 3 deg/min compared to a reduction from 8 deg/min to 5 deg/min in Gedser.

Conclusions
This paper proposed a hybrid manoeuvring model for accurately predicting the speed of a ferry under model uncertainty and varying operating conditions.The hybrid model comprised an augmentation of a first-principle 3 DOF manoeuvring model, Dk1 (Jensen et al., 1993), with a learning-based representation of the velocity error.The superposition of the two models facilitated a much more accurate estimation   The paper has thus demonstrated how significant improvements in ability to predict behaviours were obtained by adding a nonlinear deep learning network as an approximator to capture complex phenomena, which are not part of a first-principles hydrodynamic model.
Ability to track propulsion and manoeuvring behaviours over time, based on data logging from in-service operation, will be important for condition assessment, planning of hull cleaning and propeller polishing, which impact fuel economy.

Fig. 2 .
Fig. 2. Propulsion configuration and underwater hull outline of M/F Berlin.

Fig. 3 .
Fig. 3. Top: Time histories of surge speed: measured (blue) and predicted by Dk1 (orange), using actual input commands to actuators.Bottom: Two input signals, centre propeller pitch (blue) and centre propeller shaft speed (orange).(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 6 .
Fig. 6.Surge speed of the Dk1 model for different variation to the   table.

Fig. 7 .
Fig. 7. Parameters ranked according to their identifiability based on the −mean square index.

Fig. 8 .
Fig. 8.The suggested hybrid model framework, showing data flow of input and output.The blue box Deep Learning Compensator comprise different Deep Learning models, one for each velocity component.

Fig. 10 .
Fig. 10.Training and test results for different phases.Surge speed prediction accuracy (kn) for manoeuvring (MM) and hybrid (HM) models.

Fig. 12 .
Fig. 12. Training and test: sway speed prediction errors of Dk1 and hybrid models.
R.E.Nielsen et al.

Fig. 14 .
Fig. 14.Yaw-rate inside the harbours.Upper right and lower left plot are the same time-series as shown in Fig. 4. The hybrid model achieves a very significant improvement.
The NN compensator is restricted by availability of training data and will not predict accurately beyond conditions covered by training, i.e. if faced with manoeuvring scenarios that are significantly different from what has been used for training.Finally, future research could further improve in-harbour manoeuvring prediction by having the NN compensator learning flow phenomena related to proximity to the hull of banks and piers.CRediT authorship contribution statementRasmus E. Nielsen: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writingoriginal draft, Writing -review & editing, Visualization.Dimitrios Papageorgiou: Software, Validation, Formal analysis, Investigation, Writing -original draft, Writing -review & editing, Visualization.Lazaros Nalpantidis: Conceptualization, Methodology, Investigation, Writing -original draft, Writing -review & editing, Supervision.Bugge T. Jensen: Resources, Writing -review & editing.Mogens Blanke: Conceptualization, Methodology, Formal analysis, Investigation, Writing -original draft, Writing -review & editing, Supervision, Project administration, Funding acquisition.

Table 2
Nomenclature: Use of symbols.

Table 3
Variables used as input and output to models.

Table 4
Prediction error of Dk1 model over 53 transits.
Chislett (1996)oduced inChislett (1996)as the DEN-Mark 1 (Dk1) model structure.Non-dimensional hydrodynamic forces are represented via tables, each of which represents specific hydrodynamic forces and moments in the model.One table, as example, gives non-dimensional hull resistance, others make ''corrections'' to account for different factors, e.g.water depth.These tables describe velocity-resistance curves in relevant conditions and were obtained via a combination of model tank experiments and subsequent cubic-spline interpolation between model tank data.The training simulator model of the ferry M/F Berlin employs the Dk1 model structure and tabular data from model tests of the ferry.The accuracy of this model is assessed in the following.

Table 5
Hyper-parameters used for tuning of the Recurrent Neural Networks.

Table 6
Hyper-parameter sets used for surge speed, sway speed, and yaw-rate compensation models.

Table 7
Surge speed prediction error for Dk1 and Hybrid Models.

Table 8
Sway speed prediction errors for Dk1 and Hybrid Models.
model obtains significant improvements during passage at open sea and in the major manoeuvring phases (leaving Gedser and entering Rostock).As demonstrated in Section 3.3, the yaw-rate predictions of Dk1 can be inaccurate during major turns.This is also a reason for the inaccurate sway speed predictions in the manoeuvring phases.Despite the poor quality of the STW measurements used to estimate the current, it appears that the hybrid model can capture its effect on sway speed quite well, as this is one of the primary causes of sway

Table 9
Turn-rate prediction error for Dk1 and Hybrid Models.