Estimating the effect of biofouling on ship shaft power based on sensor measurements

ABSTRACT Marine biofouling on a ship's hull and propeller increases the resistance of the ship moving through water and reduces the propulsion efficiency of the ship. Estimating the effect of fouling is difficult, as the biomass is rarely measured. In this paper, we present a new data-driven model for the total shaft power use of a large containership, in order to estimate the unobserved effect of fouling. Due to the limitations of both physical models and machine learning models, we develop a Bayesian generalized additive model for our purpose. We discuss issues of representative training data for the model. Further, we subset and subsample the data to a representative sample. Models are compared by out-of-sample predictive quality, physical appropriateness, and through autocorrelation of residuals. The Bayesian generalized additive model combined with computational inference using integrated nested Laplace approximations gives a robust estimate of the biofouling effect over time. It also allows a decomposition of the total shaft power use into effects of speed, weather, and other conditions. This model can be used to understand the effectiveness and timing of different hull and propeller treatments.


Introduction
Marine biofouling (fouling) on a ship's hull and propeller increases the roughness of the surfaces in contact with water, and thus the frictional resistance of the ship (Schultz et al. 2011). This significantly increases the resistance of the ship moving through water and seriously influences the propulsion efficiency and fuel use of the ship. The efficiency lost due to biofouling can be 5-15% of the total shaft power and fuel usage (Molland et al. 2014;Wang and Lutsey 2013;Hakim et al. 2017). Coraddu et al. (2019) state that this contributes to approximately 2.7% of the global anthropogenic carbon dioxide (CO 2 ) emissions (Smith et al. 2015).
Fouling is removed by proper hull and propeller cleaning in order to maintain efficiency, but such cleaning is costly and time consuming. Moreover, cleaning may damage the anti-fouling system and hence accelerate further marine growth. Therefore, hull and propeller cleaning should only be performed when necessary and the optimal time to clean the hull is a trade-off between the cost of cleaning and the efficiency gain achieved. In order to make qualified decisions regarding cleaning, one needs models to estimate the effect of biofouling on the ship resistance and propulsion efficiency, and to separate the effect of fouling from other factors such as ship speed, loading condition and weather conditions. Such models can also be used to analyse the effectiveness of different anti-fouling systems, including the decision of which paint to use.
There has recently been a regulatory push for emission reduction and more environmentally friendly shipping. There are global energy-efficiency requirements from the International Maritime Organization (IMO) as well as global and regional caps on air pollution from ships (MARPOL annex VI (IMO 2020)). In 2018, IMO adopted a strategy for the reduction of GHG emissions from ships, aiming at reducing total GHG emissions from international shipping by at least 50% by 2050 (IMO 2018). Hence efficiency is high on the agenda in the maritime industries, and optimized interventions to reduce the effect of biofouling could contribute to reduce emissions and meet IMO goals. A number of technical measures to reduce fuel consumption in shipping, both in design and operation, are discussed in Sharifi et al. (2017). Furthermore, the International Convention on the Control of Harmful Anti-Fouling Systems on Ships (IMO 2005), adopted in 2001 and entered into force in 2008, prohibits the use of some anti-fouling systems that have previously been effectively used to reduce biofouling. Hence, stricter requirements on anti-fouling systems contribute to the challenge of optimizing interventions to reduce biofouling.
Due to the importance of fuel efficiency and emission reduction in shipping, a number of numerical models have been developed to assess ship performance. Such models include various components related to, e.g. calm water resistance, air resistance, added resistance in waves and wind, propulsion and propeller efficiency, see e.g. Tvete et al. (2020). The effect of biofouling would be an important part of such models, but, as pointed out in Tvete et al. (2020), this effect is still not fully understood and there is a lack of reliable models, see also Yeginbayeva and Atlar (2018). Hence, this paper proposes a data-driven model to account for the effect of biofouling, and this can be incorporated as an important sub-model to improve the accuracy of ship performance models.
For studying propulsion efficiency and fuel consumption, models for relevant physical relations are often proposed. A ship's speed-power curve is typically determined by sea trials and model tests prior to commissioning. Sea trials are ideally carried out under steady conditions, with deep waters, no wind, no current and no waves and with specified draught and trim conditions. Different methods can be used to correct the effect of waves and wind during such trials (Tsujimoto et al. 2021). Power curves are important in the calculation of the attained energy-efficiency design index (EEDI) (Tu et al. 2018) and in the overall assessment of a ship performance and fuel consumption (Jalkanen et al. 2009;Tillig et al. 2018;Bialystocki and Konovessis 2016).
Data-driven models for fuel consumption have become more and more popular due to their ability to take into account many phenomena by using large amounts of data together with limited assumptions based on physical knowledge about the problem. They can also take into account ship-specific and environmental phenomena better than purely physical models. A review of various data-driven methods can be found in Gkerekos et al. (2019), who attempt to find efficient ways to implement the fuel consumption model. This includes Linear Regression (both with and without regularization), Decision Tree Regressors, Random Forest Regressors, Extra Trees Regressors, Support Vector Regressors, K-Nearest Neighbours, Artificial Neural Networks and ensemble method algorithms. One of the conclusions of Gkerekos et al. (2019) was that the quality of the model output correlates with the quality of its training input. This is an important finding that applies to all data-driven models. A critical assumption is that the conditions under which the model is trained are representative for the time period where the model is used to perform predictions. Various data-driven regression models were explored by Brandsaeter and Vanem (2018) to account for the effect of environmental conditions on ship speed. Newer studies that use various data-driven methods can be found in Laurie et al. (2021). They use linear regression, decision trees (AdaBoost), Knearest neighbours, artificial neural network and random forest to predict shaft power to identify performance deterioration due to fouling. Kim et al. (2021) use linear regression and an artificial neural network to predict fuel consumption. The importance of data preprocessing and filtering when using datadriven models to predict ship propulsion is highlighted in Karagiannidis and Themelis (2021).
The aim of this paper is to analyse, based on sensor data collected from ships in operation, the effect of biofouling on the performance of the ship, and the additional propulsion power that is needed due to this effect. Flexible Bayesian models based on the sensor data will be established, avoiding the pitfalls of the machine learning methods mentioned above. As we will use some physical knowledge when we formulate the model, we first outline the necessary background related to the resistance of a ship in water. Thereafter, we comment on our experiences with the machine learning approaches mentioned above and advocate the use of more interpretable models.
In Section 2, we present the data, while the Bayesian generalized additive model is introduced in Section 3, the results in Section 4, and finally we conclude the paper in Section 5.

Ship resistance
The resistance of a ship moving through water is typically described by the relationship between speed and power demand, and is expressed by a speed-power curve. Essentially, a ship's resistance is a sum of the resistance from several sources, such as water and air resistance. It is influenced by the ship's speed, displacement and hull form. In particular, the frictional resistance from water depends on the ship's wetted area, wave conditions, speed and fouling on the hull.
A simple model for the relationship between ship speed, displacement and propulsion power demand is expressed through the Admiralty coefficient (Brown and Aldridge 2019). The Admiralty coefficient, A, is a constant for a given ship and is defined as follows where ∇ denotes the displacement, v is the ship speed and P is the propulsion power need. It is generally acknowledged that this is an approximate model. Further, it assumes a cubic relationship between power demand and speed, given displacement. Hence, the Admiralty law translates to the following approximate relationship between power need and ship speed where C A is a constant (depending on displacement). Another simplified model for the relationship between power need and speed is the propeller law (Brown and Aldridge 2019), which is based on the power required to drive a propeller with speed v p P = cv n p , where c and n are coefficients defining the relationship. It is seen that for n = 3 and c = C A , and by replacing the engine power and propeller speed with the ship power need and ship speed, the two laws are in general agreement. Resistance-based models take other factors into account and will typically be on the form (Brown and Aldridge 2019) where C T is the total resistance coefficient, ρ is the sea water density, S is the hull-wetted surface area and h T denotes the engine efficiency. A model for the instantaneous power used by a ship as a function of velocity and technical parameters of the ship is expressed as follows in Jalkanen et al. (2009), where CF, CR, CA and CAA are the frictional, residual, appendage and air resistances, respectively, S is the wet surface area and 1 0 is the propulsive coefficient. In summary, different models based on physical considerations suggest using a cubic relationship between propulsion power and speed, but these models are all acknowledged to be approximations, see e.g. Tu et al. (2018). The statistical modelling presented in this paper will use an adaptive spline for the speed-power relationship, to alleviate potential inaccuracies in the cubic model, especially for slow or fast speeds. However, for the effect of biofouling, we will assume a proportionality with speed to the power three.

Machine learning by training on clean hull data and estimating the fouling effect from residuals
One intuitive approach to the problem of estimating the effect of fouling is to use a two-step procedure as in Coraddu et al. (2019) in order to obtain predictions of fouling. First, we use a Random Forest (RF) (Breiman 2001) to build a model for shaft power based on the available covariates. This RF is trained on clean hull data and used as a reference model. Secondly, we use the estimated RF model to predict shaft power in the entire period and compare this prediction with observed shaft power. The difference between the predicted and observed shaft power is then assumed to represent the biofouling effect. The critical assumptions here are that the estimated model is fitted to data where marine fouling is essentially absent and that the combination of covariate values in the training period is representative for the entire time period. In the training period with clean hull data, we need to observe the ship in many operational and environmental conditions, and these conditions must be representative for the entire time period. It is not clear how to define the training period in order to balance the need of a clean hull and a sufficient number of representative samples, and for some cases, there exists no good training dataset. In our case, the prediction model varied a lot depending on how we defined the training period. If we use the first year of data to train the model, we find that the ship speed was overall much lower in the training interval than in the succeeding period (see Figure 1 and some additional details in Appendix). Hence, the training period was not long enough to contain representative values for the ship speed. At the same time, the training period was already too large, as our main results show that the hull was far from clean at the end of the period.
The problems we encountered when fitting RF models show that when using black box machine learning models, we depend heavily on having representative training data. It is also difficult, if not impossible, to correct for this in retrospect. In fact, the results of the RF model were so unreliable and unstable that they were not even comparable to the results of the main model in this paper. We believe this was not a problem with RF, as both Gkerekos et al. (2019) and Laurie et al. (2021) had good results with RF, but a problem with the approach of training on clean hull data. The main model we propose will be based on a completely different approach; on fitting an interpretable model to the entire dataset.

Interpretable, flexible additive models
Instead of dividing the dataset into training and prediction sets, we chose to fit an interpretable model to the entire dataset, to infer the unobserved effect of fouling. Since we do not have a measured covariate for the fouling level, this is not a typical regression problem. Instead, we assume a flexible parametric form for the effect of fouling, and use this as a timevarying model component in a generalized additive model (GAM) (Hastie and Tibshirani 1986).
Our proposed model is inherently interpretable, which is a great advantage. As Rudin (2019) states, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially cause great harm to society. The way forward is to design models that are inherently interpretable.
In the problem of estimating fouling, this is especially relevant. In this application, there is no clear prediction target where we can choose whichever model is the best at attaining good predictions. Rather, we aim for a model that gives a good approximation of the true data generating mechanism, and then we use the fouling estimate from this model. In order to investigate whether our fouling estimate is good, we must be able to interpret and criticize the model. The main assumption in a GAM is that the model components f j add together to give the predictor which is then used to fit the observations y through a likelihood, or loss function, L(y|h). This additive structure enables interpretable model building, where relevant covariates and physical understanding is used to build the model components. The main purpose of using GAM for modelling fouling is to adjust for all covariates unrelated to fouling, and simultaneously model the residual effect of fouling.

Data
For this study, a 2500 TEU containership of about 26,000 gross tonnage with a length of approximately 200 m and breadth approximately 30 m has been analysed. We utilize data from several sources. The primary data source for ship performance is various sensors onboard the ship in operation. These data are supplemented by combining Automatic Identification System (AIS) data for the location of the vessel with weather data from the ERA 5 reanalysis (Hersbach et al. 2020) to obtain the prevailing weather and sea state conditions at all times. The covariates we used from the ERA 5 reanalysis were Wave Direction, Wind Speed, Wave Period and Wave Height. In addition, six dates at which cleaning was performed were obtained from the ship owner, where the last date was the end of our dataset. Unfortunately, we were not able to obtain precise information about the type of each cleaning, e.g. whether it was hull or propeller cleaning.
Based on these data sources, a large number of covariates were available. From these, we selected a set of covariates based on expert judgement and preliminary analysis. The covariates sea water temperature, air temperature, dew point temperature, and fraction of sea ice, were available from the ERA 5 reanalysis but excluded based on lack of significance in preliminary models. See Table 1 for the response and all the  included covariates, together with a brief explanation. Note that the last part of the code name gives the measurement unit. The six cleaning events were converted to the covariate Cleaning_period, an integer representing the six time periods before the cleaning events. We note that wind direction was not included directly, but used to compute Wind Speed relative to ship direction.

Data preprocessing and cleaning
We have approximately 5 years of data. The sampling frequency varied between the different variables, from 1 minute up to 15 minutes. In addition, the sampling frequency for the performance monitoring data varied throughout the period, but it was typically 15 minutes. Hence, we aggregated all variables into 15-min intervals using the mean value in the interval. We used the circular mean value for the variables that were measured in degrees.
In this paper, we study fouling by modeling shaft power as a function of ship speed, weather and other conditions. The fouling effect is a slowly varying component that is best explained in steady-state conditions, like when the ship is in transit. The behaviour of shaft power is more complex when we have low ship speed or high accelerations, which is typically when the ship is manoeuvring close to shore. Hence, to reduce noise in the data, we disregard all time points where the ship speed was lower than 5 knots, and similarly, we disregard all time points where the linear acceleration was greater than 1 knot per 15 minutes and the turning acceleration greater than 2.5 knots per 15 minutes.

Model definitions
We denote the models we propose as BGAM, for Bayesian generalized additive models. The main model we present, BGAM-main, is where y is the observed Shaft Power in MW, ε is the residuals, η is the predictor, and the other terms are referred to as model components. We use linear effects (Xb) for the variables Linear Acceleration, Turning Acceleration, Draft Trim, Draft Mean, Wind Speed, Wave Height and Wave Period. The β's have approximately flat priors. The effect f rw2 (Speed) is for the Ship Speed, and is a second order spline, called a random walk order 2 spline (see e.g. Gómez-Rubio (2020) Chapter 9). We chose this flexible spline, instead of e.g. a Speed 3 curve, to account for any model misspecification at low or high speeds. The scaling parameter s rw2 of this spline is given an exponential prior, based on the penalizing complexity (PC) framework . The effect f 2 is for Wave Direction and is a two-parameter trigonometric function, a 1 sin(x) + a 2 cos(x). We could not use a linear effect for Wave Direction because the minimal angle and the maximum angle should have approximately the same effect, as they represent the same direction. The two parameters have approximately flat priors. The residuals are modelled as iid. Gaussian variables, The biofouling effect h(t * ) is a piecewise linear function in t * , which is time at rest since the last cleaning event, The a k and b k are given approximately uniform priors. The fouling effect is multiplied with Speed 3 following the theory in Section 1.1. The piecewise linear structure is due to our expectation of a jump in the fouling effect after a cleaning event, and then a gradual increase until the next cleaning event.
In addition to the main model, we present a few secondary models for comparison. The model BGAM-T is the same as BGAM-main, except that the time t * used for the fouling effect, is clock time since the last cleaning (not just when the ship was standing still). The models BGAM-V1 and BGAM-V2 are the same as BGAM-main, except for the exponent of the Speed in Speed 3 h(t * ), which is changed to Speed 1 and Speed 2 , respectively. From the physical principles in Section 1.1, we find it natural to use Speed 3 , but we wanted to investigate alternatives. However, as we see in Figure 2, the effect of using a different exponent should not be large for this dataset.

Bayesian inference
We follow a Bayesian approach to inference for the models we propose. We denote the prior model p(y, h), giving a joint distribution for the predictor η and the response y. This prior model is built as p(y|h)p(h), where the first term is referred to as the (observation) likelihood, and the second as the prior. The prior of η is defined through the prior for all the components of η as described above. The posterior is To interpret the model, we compute the posterior estimate and uncertainty for the individual model components that make up η.
We present one main model and several other candidate models. It is impossible to find the 'one true model', as no model is a perfect representation of reality. What we want is a good representation of reality that is useful for answering questions and provides reasonable uncertainty estimates. We approach this in three ways. The first is to build the model structure in line with a physical understanding of the process. The second is to study the quality of out-of-sample predictions, in Section 4.1. The third is to study whether the statistical assumptions are appropriate, e.g. whether the residuals are independent.
For computational inference, we use the R package INLA (Rue et al. 2009, short for Integrated Nested Laplace Approximations. The INLA algorithm is based on quadratic approximations of the likelihood function and sparse matrix representations of Gaussian Markov random fields (GMRFs) (Rue and Held 2005). These models can also be extended to spatial models within the same framework Krainski et al. 2018).

Subsampling to correct bias and variance
When we use a Gaussian likelihood, showed that ε has a strong autocorrelation in time.
Informally, this autocorrelation hints that we are reusing the data, as adjacent observations contain overlapping information about the system we are modelling.
As an extreme example, consider measuring the ship variables for only a single day, but every millisecond, producing a large dataset. Such a dataset would not be informative of fouling as the data would be mostly redundant. Ignoring this problem could cause the models to have a local bias, where different observation frequencies, e.g. due to missing data, imply different levels of informativeness on different days. Worse, however, is that all uncertainty estimates would be rendered useless. Inference on repeated data gives results which are overconfident about their own certainty. Fitting models to the full dataset gave very small uncertainty estimates which were clearly wildly inaccurate.
To reduce or eliminate this correlation, we subsampled the data. First, denote by subset 1 the full, processed data set. Then subset 2 is this data subsampled to every hour, subset 3 to every 4 hours, and subset 4 to every 24 hours (daily). Later we present results for the different levels of subsampling. Even with subset 4, there were missing data.

Size decomposition of additive models
Consider the model in Equation (7) as a sum of vectors, we can then take the length, or norm, of each of the vectors, and consider whether they sum up If we take − to be the Euclidean norm, after centring the vectors, i.e. y = i (y i − y) 2 , and assume that all the vectors are parallel, i.e. correlation 1, then the equation holds. This − is the empirical standard deviation (StDev), and leads us to decompose the standard deviation of y in terms of standard deviations of the model components. Because the correlation is not 1, the sum of the standard deviations will not equal the standard deviation of y. However, presenting the numbers gives us an understanding of the relative sizes. Alternatively, if we take − to be the Euclidean norm squared, i.e. y = i (y i − y) 2 , and assume that all the vectors are orthogonal, i.e. uncorrelated, then this equation is the Pythagorean theorem. This − is the empirical variance (Var), and leads us to decompose the variance of y in terms of variances of the model components. Because the variables are not independent, the sum of the variances will not equal the variance of y. However, presenting the numbers again gives us an understanding of the relative sizes.

Cross validation for out of sample performance
We use out of sample predictions to study the performance of the models. We name the scheme CVC, for cross validation of cleaning intervals. Here, for each cleaning interval, we fit the model to all the data except for the last 1/3 of observations in that cleaning interval. Because we have 6 cleaning intervals, we get 6 training-test combination sets.
To summarize the predictive quality we use the mean squared error (MSE)

Results and discussion
In this section, we compare the different candidate models, we select the main model, and then we detail the results of the main model.

Model comparison and selection
We give an overview of all the models' out-of-sample predictive quality in Table 2, with the assessment described in Section 3.5. The first row of the table contains the model we will give detailed results on. We also report the measure of the single worst fold in parenthesis. Because we are mainly interested in robust estimation of the fouling effect, the worst performance across the folds is as interesting as the mean performance. If this were a prediction problem, and we knew that our predictive scheme and our predictive RMSE summary matched exactly the goal of the model, we would have picked the model with the lowest RMSE. However, in this application, we want to estimate the hidden fouling effect, thus we want a model with good overall behaviour. In Table 2, the four first lines are with the same model, but with different subsets (see Section 3.3). The less subsetting, the higher the mean RMSE and MAE for prediction accuracy. This is expected due to using more data for prediction. However, the worst-case RMSE is the best for BGAM-main on subset 3. For the smaller subsets, both the 90% coverage is better (larger), and the autocorrelation of the residuals is better (smaller). We chose subset 3 as a good tradeoff between predictive accuracy, reasonable coverage and low residual autocorrelation. However, we find subset 4 to also be a reasonable choice, due to the residual autocorrelation being substantially lower. Lines 5 and 6 in Table 2 represent the models with different exponents for the Speed multiplier in the fouling component. We observe that using a smaller exponent (1 or 2) gives better predictive performance and coverage. However, the improvement is minimal, so due to the physical arguments in Section 1.1, we did not select one of these models.
BGAM-T is clearly worse than the main model, which is the reason why we used t * (time at rest) in this paper. Our experimentation showed that t * was superior to using clock time across a wide range of models.
The last row of Table 2 uses a different approach to subsetting, where more of the data was removed for being 'extreme' before the data was subsampled. Here, Ship Speed > 7 knots, Linear Acceleration < 1, Turning Acceleration < 2.5, Scaled Wind Speed < 1.9 and Scaled Wave Height < 1.9. A scaled variable has the mean removed and is divided by the standard deviation. After this, the observations were subsetted to every 4 hours (similarly to subset 3). We note that the performance is improved across all criteria. However, the results from this row are not directly comparable with the other rows, due to the test set being substantially different; informally, this is an easier prediction task.

Results from the selected Bayesian generalized additive model
We present the results from the main model (BGAMmain) fitted to subset 3. We present the posterior estimates of the various model components and the variance decomposition in Table 3. The variance decomposition is used to understand how much each model component contributes to the model, and to compare effect sizes, as introduced in Section 3.4. The row named Sum is the sum of the above variances (similar for Stdev). If the model components were independent, the sum of variances should equal the variance of the predictor, which they do not, hence they are not independent. If the model components were completely dependent, the sum of Stdev should equal the Stdev of the predictor, which they do not, hence they are not completely dependent. The predictor variance and the residual variance do sum to the total Shaft Power variance, meaning that the predictor and the residuals are (approximately) independent.
Care must be taken when interpreting the size of the model components according to the variance decomposition. As an example, take the residuals.
With a variance contribution of 4.1%, one could intuitively say that the model explains 95.9% of the data variability and that 4.1% is not explained by the model. However, by looking at the Stdev decomposition we see that the variability of the residuals is roughly 1/5 of the variability of the Shaft Power, which does not agree with only 4.1% residual variation. When comparing sizes, it is important to have the right measurement unit, which in this case is MW, and not MW 2 as it would be for the variance.
From the variance decomposition we conclude that the Speed covariate is by far the most important predictor of Shaft Power, which comes as no surprise. The only other component more important than Fouling is the Wind Speed. Fouling is more important than any other covariate, even more important than the Draft. Using the Stdev decomposition, we see that the Fouling effect represents roughly 15% of the total variation of the raw data, measured in MW.
For the linear effects, we see that they are all statistically significant at the level of 95% credible intervals. From the posterior estimates of the linear effects, we see that increased acceleration, Trim, Wind Speed and Wave Height all lead to increased shaft power use, which is sensible. A larger Draft Trim leads to slightly less shaft power use. This is a questionable relationship, and cannot be true for extreme values of the Draft Trim variable. The Draft Trim effect could be related to other operational conditions that are strongly correlated with Draft Trim. However, the effect is small, approximately 1% of the data variability. A larger wave period results in slightly less shaft power use, but again the effect is very small. Figure 3 shows the estimated effect of ship speed. This effect is adjusted for weather, draft, fouling, and other effects. In reality, this should be a strictly increasing function, with an increasing derivative, which is close to true for this graph. However, there are some discrepancies between the theory and this graph, especially near the endpoints. This can be due to unmeasured effects or biases at low and high speeds, and the estimate is not trustworthy at these extreme values. In the statistical model, an unmeasured effect at low speed can be accounted for by this model component. This is in contrast to models with strong physical constraints, where such unmeasured effects are carried into the estimation of fouling instead. From the figure, we see that the uncertainty is large at the endpoints. This implies that there is not enough data in these regions to support a good estimate. However, it also implies that the estimate in these regions will not greatly influence the model. Figure 4 shows the estimated fouling effect. This effect is more complicated to interpret than other effects as it depends on both speed and fouling. To make interpretation easier, we infer the fouling effect at 12 knots and show this in Figure 5. This figure shows what the fouling effect would have been, according to our model, had the ship been going at 12 knots. The estimates are linear in t * , which is the time the ship was at rest. The slopes of the fouling trends are different in the different time intervals. This could be due to operational conditions causing either faster biological growth or a larger impact of the fouling under some operational conditions. The general trend is that the biofouling effect worsens over time, is reduced by cleaning, but never resets to the low effect of a new ship. The most important source of efficiency loss is the gradual loss, i.e. that the ship never returns to its initial biofouling effect. This gradual worsening might be partly due to other factors than biofouling. If there are other age-related inefficiencies, e.g. small damages to the hull, or inefficiencies in the propeller, these will be included in our global biofouling effect. If a cleaning event completely restores the hull to the conditions of a new ship, the biofouling effect in our plots still may not revert to zero after the cleaning, due to these other inefficiencies. The red points in Figure 5 shows the residuals added to the estimated effect. These illustrate the variability from which the estimates and uncertainty are computed. Note that the blue lines are not prediction intervals, and so the red points are not supposed to be comprised by the blue lines. Figure 6 illustrates how the fouling effect varies with different ship speeds. Each piecewise function represents one speed scenario, the lowest speed gives the smallest fouling effect, and the highest speed gives the largest fouling effect. This model implies that the fouling effect becomes extremely problematic at very high speeds.  We considered presenting the model applied to subset 4, instead of subset 3. Therefore, we show the inferred fouling effect in that model in Figure 7. The general trends of the fouling effect are very similar to those of subset 3, but the uncertainty is significantly larger. Additionally, our results for the fouling effect from the other models in Table 2 were qualitatively similar to the effects presented in Figures 5 and 7, hence the estimation of the fouling effect is very robust.

Conclusion
In this paper, we presented a Bayesian Generalised Additive model (BGAM) for modelling shaftpower as a sum of model components and for obtaining a robust estimate of the fouling effect.
The BGAM presented here is a good tool for decomposing shaft power into model components in roder to describe how shaft power increases or decreases as a result of the speed, operational conditions and weather conditions. We have shown how Figure 6. Estimated fouling at constant speeds, assuming the same conditions as in the dataset. The five curves are with speed 10 knots (lowest black), 12 knots (lowest green), 14 knots (middle black), 16 knots (highest green), and 18 knots (highest black). The results are from the model BGAM-main on the subset 3 data. to deal with too few or too many samples on any given day by fitting the model jointly to a subsampled dataset with missing values.
The BGAM is a good tool for estimating the fouling effect. We get robust and statistically significant estimates of the fouling effect, together with reasonable uncertainty bands. There is an obvious positive shortterm effect of every cleaning, and the cleaning is more impactful when the fouling is larger (see Figure 5). After each cleaning, we get a visualization of how fast the fouling effect returns, and we observe a gradual worsening throughout the lifetime of the ship.
The BGAM can be a useful tool for decision makers as an input to cost/benefit analysis of cleaning. For any ship where similar data are collected, a figure similar to our Figure 5 can be produced and used to argue for less or more cleanings. Such an analysis can also be used to retrospectively study the effect of different types of cleaning on the overall fouling trend. The practice of reducing speed when there is a lot of fouling can be quantified for improved decision making on the choice of speed. More efficient strategies for dealing with fouling will give both economic and environmental benefits. These models can also increase the understanding of the environmental impacts of fouling through increased energy demand.
There is more work that can be done to improve the BGAM. We have presented a model of limited complexity on data which we believe are attainable for a large range of ships. In situations with less noisy or larger datasets, one should consider more covariates, more complex interactions between covariates, and also, other functional forms for h(t). For example, sea and air temperature may be included. Additionally, all the covariates measured here have some level of measurement uncertainty. If one can quantify it, it is possible to extend the model to incorporate measurement uncertainty, through the use of multiple likelihoods. We can gain a deeper understanding of the fouling and biological growth by measuring the amount/type of biological material before and after each cleaning. Then we could combine the results from BGAMs on several ships to understand how fast the biomass grows on the hull and propeller, and again combine this with AIS and weather data to understand under which conditions the biomass grows faster and slower.

Disclosure statement
No potential conflict of interest was reported by the author(s).