Machine learning in sustainable ship design and operation: A review

The shipping industry faces a major challenge, as it needs to significantly lower its Greenhouse Gas (GHG) emissions. Traditionally, the fuel consumption of ships has been reduced during the design stage and, once a ship is built, through optimisation of ship operations. In recent years, ship efficiency improvements using Machine Learning (ML) methods have progressed quickly, facilitated by available data from remote sensing, experiments and high-fidelity simulations. These data have been successfully used to extract intricate empirical rules that can reduce emissions, thereby helping achieve green shipping. This article presents an overview of applying ML techniques to enhance ships' sustainability. The work covers the ML fundamentals and applications in relevant areas: ship design, operational performance, and voyage planning. Suitable ML approaches are analysed and compared on a scenario basis, and their scope for improvement is also discussed. Meanwhile, a reminder is given that ML has many inherent uncertainties and hence should be used with caution.


Background
About 70% of the Earth's surface is covered by water, and approximately 90% of all transport is waterborne (Wu, 2020). However, as of 2012, global shipping emissions were approximately 938 million tonnes of CO2 and 961 million tonnes of CO2e (combining CO2, CH4 and N2O), representing around 2.2% of global anthropogenic Greenhouse Gases (GHGs) (Smith et al., 2014). By 2050, the maritime transport sector needs to reduce its total annual GHG emissions by 50% compared to 2008 to limit the global temperature rise to no more than 2 °C above the pre-industrial level (Cames et al., 2015). In this context, waterborne transport's role becomes critical, revealing an urgent need to promote sustainable shipping.
Optimising maritime transport has a long history and has been an ongoing task for ship operators, designers, and builders. For hundreds of years, naval architects have sought better hull forms so that ships experience less resistance when operating in water. Although those approaches were mainly empirical and based on simplified classical physics, they established the fundamental theories of naval architecture, significantly improving hull design and instigating several centuries of flourishing maritime transport. Since the Industrial Revolution, these improvements have been accompanied by the optimisation of marine engines.
More recently, the development of high-fidelity Computational Fluid Dynamics (CFD) techniques and High-Performance Computing (HPC) units has allowed the simulation of very complex maritime scenarios such as vessels in rough seas (Jasak, 2017; Dashtimanesh et al., 2020), vessels in sea ice (Fig. 1), water-entry processes (Huang et al., 2021a), and the flow control of hydrofoils (Pena et al., 2019). As a particular example, the work of Pena et al. (2020a, 2020b) revealed the turbulent intensity in the flow around a full-scale cargo ship for the first time (Fig. 2), which is facilitating the development of an energy-saving device that has been shown to significantly reduce ship resistance (Pena, 2020). In another instance, Ni et al. (2020) simulated the icebreaking process of a ship in level ice, achieving for the first time a level of fidelity at which the overall icebreaking process is coupled with the water underneath; such simulations are important for reducing GHG emissions, as the energy consumption of icebreakers is extremely high. These computational techniques have also been incorporated into ship design processes, helping to find optimal configurations without the need for extensive experiments.

The development of machine learning in shipping
Meanwhile, enhancements in satellite observation have allowed ships to plan their voyages based on weather observation and prediction, improving maritime sustainability and safety by choosing the optimal route (Li et al., 2020a). During the last decade, big data in this field has been established based on geospatial data systems such as the Copernicus Marine Environment Monitoring Service (CMEMS) (von Schuckmann et al., 2018). These systems integrate historical weather data and provide future projections to support voyage planning. On top of that, ship fuel consumption corresponding to specific weather conditions can also be recorded. Traditional manual methods may no longer be able to handle such data; instead, ML is used to identify efficient integrated operations that can reduce GHG emissions. As symbolised in Fig. 3, the concept of ML in the shipping industry relies on a data stream from and to the ships, which can be analysed directly through a ship bridge system or indirectly through separate computers (CIAOTECH Srl, 2020). This process can be further enhanced by incorporating real-time data to inform dynamics and control, decision support, and performance optimisation (Anderlini et al., 2018a; Rehman et al., 2021). Apart from real-world data, CFD can be a cost-effective alternative source of valuable data for training ML and other optimisation algorithms (Garnier et al., 2021; Pena and Huang, 2021). Therefore, Advanced Computing techniques, including the combination of CFD, algorithm optimisation and ML, have entered the shipping industry and are transforming it in unprecedented ways.
Shipping will need ML in the future because ship performance is influenced by numerous factors, such as trim, draft, pitch angles, winds, waves, currents, biofouling and engine efficiency, alongside extensive hull geometric parameters. The links between these parameters are not straightforward, which makes it hard to use traditional regression techniques to perform holistic design, prediction, analysis and optimisation. The main advantage of ML is its ability to decode complex patterns, which makes it suitable for advanced shipping applications. In ship design, ML can complement CFD and experiments. Industrial members (e.g. a classification society or a ship design company) conduct numerous projects per year, which can provide quality CFD and measurement data to train ML models to sufficient accuracy. It is envisioned that, after sufficient training, ML will be able to provide reliable assessments independently, and that its capability can be further expanded as more projects accumulate over time. This means that a wide range of ship parameters can be covered with confidence. The early design stages will benefit significantly from ML, since vast numbers of configurations need to be tested there, which would be prohibitive to evaluate using CFD or experiments. ML can provide rapid estimates that account for nonlinearities, overcoming the inaccuracies of the linear analytical methods currently used in early design stages.
For ship operation, route optimisation can be achieved through ML models by connecting the ship parameters with real-time climate data. The speed of ML will enable route optimisation in real time. Operational strategies will also improve, as ML can inform them based on engine and sea conditions. Moreover, ML models will enable continuous monitoring of voyages, providing the ability to report risks, structural fatigue and engine faults, thus improving maintenance and repair strategies over the lifecycle.

Scope and literature scan approach of this review
To facilitate the sustainable development of shipping, this article reviews how ML and its combination with other Advanced Computing have been applied in this field. Three main areas that have been found particularly relevant will be discussed in detail: ship design, operational performance, and voyage planning. The work aims to demonstrate how these techniques can be used to inform the enhancement of waterborne efficiency and eventually help achieve a zero-emission future.
The literature scan for this work was performed on Web of Science using the word co-occurrence method, where the search condition was "Machine Learning" occurring together with "Ship" in any paper. In total, 1050 papers were found; their distribution is shown in Fig. 4.
Overall, it can be seen that this research field mostly started after 2015, with the majority of papers published after 2020. The keywords were then used to guide the identification of applications of different ML methods in the three research categories (design, operational performance and voyage planning). At least one paper is reviewed in detail for each method used in each research category. Among papers using a similar approach, the selection criterion is sufficient training data, which is important for the comparative purpose of this review; this will be further discussed in the following sections.
This paper is organised as follows: Section 2 covers the fundamentals of ML currently in use or with the potential to be used in the marine industry, as this is an emerging branch that is less well known than traditional methods. Section 3 reviews the marine industry's ML advancements with respect to ship design, operational performance, and route optimisation. Section 4 provides a discussion on ML's achievement in sustainable shipping, and points out the aspects that require special attention and future work. Section 5 summarises this review with its key points.

Machine learning fundamentals
As introduced by Kretschmann (2020), ML consists of different algorithms that learn dependencies through pattern recognition in data sets and use the identified patterns to make predictions (Nelli, 2018). The basis for solving a task is a dataset in which ML methods identify underlying relationships to derive generalising rules used for completing a given task (Chollet, 2018). ML methods are thus particularly useful for determining correlations and patterns in complex data sets. In comparison to statistical methods, an advantage of ML is that it can represent both linear and nonlinear relationships without being bound by the restrictive premises or assumptions of some statistical tests (Poh et al., 2018); in comparison with high-order methods such as CFD, ML can overcome the limitation of computational speed and can be incorporated into real-time applications (Anderlini et al., 2020a). However, a prerequisite of any ML model is a sufficient and informative set of data from which to learn the inherent correlations. In addition, a primary drawback of ML algorithms is the lack of physics-informed foundations, which leads to large uncertainty in the predictions; further discussion on this limitation is given in Section 4.
Depending on how the learning task is achieved, ML algorithms can be classified into Supervised Learning, Unsupervised Learning, Semi-supervised Learning and Reinforcement Learning. A detailed tree diagram is given in Fig. 5, and more details about each technique are covered in the following sub-sections.

Supervised learning
Supervised learning consists of learning the mapping between input and output variables from sampled input-output pairs (Meijering, 2002), see Fig. 6. As labelled data are required for the process to achieve the desired goal, supervised learning is considered a task-driven approach. This is useful when a certain pattern in the data is already known, and the prediction will be more specific as undesired relationships can be filtered out. There are two types of outputs: numerical and categorical. Numerical outputs are given as exact numbers (regression), and categorical outputs are given as a classification, e.g. whether a ship's engine is faulty or not.
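To make the categorical case concrete, a minimal sketch of a supervised classifier is given below, assuming a logistic regression trained by batch gradient descent in plain Python. The two features (a normalised exhaust temperature and vibration level) and the six labelled samples are hypothetical and serve only to show how labelled input-output pairs drive the learning.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this labelled sample.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x):
    """Return 1 (faulty) if the predicted probability exceeds 0.5, else 0 (healthy)."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > 0.5)

# Hypothetical, normalised features: [exhaust temperature, vibration level]
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]  # labels: 0 = healthy, 1 = faulty
w, b = train_logistic(X, y)
print(predict(w, b, [0.15, 0.25]), predict(w, b, [0.85, 0.8]))
```

The same structure applies to numerical outputs if the sigmoid is dropped and the log-loss is replaced by a squared error.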
Supervised learning methods date back to the linear regression solutions proposed by Carl Friedrich Gauss (Meijering, 2002) and later to logistic regression. Supervised learning algorithms can also be classified into two subcategories: parametric and non-parametric models. On the one hand, parametric models have a fixed number of parameters; classical methods include linear regression, logistic regression, Least Absolute Shrinkage and Selection Operator (LASSO) (Park and Casella, 2008), Linear Discriminant Analysis (LDA) (Izenman, 2013) and ensembles of boosted (Chen and Guestrin, 2016) and bagged trees (Biau and Scornet, 2016). On the other hand, in a non-parametric model, the parameters are not fixed beforehand, and their number can grow with the training data as the model identifies influential parameters during training. Some non-parametric models include Gaussian Process (GP), k-Nearest Neighbours (KNN) and Support Vector Machines (SVM) (Hearst et al., 1998). Non-parametric models are more flexible and typically present higher accuracy for small datasets, but their training cost becomes excessive as dataset size increases.
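The practical difference between the two subcategories can be sketched with hypothetical one-dimensional data following a quadratic trend. The parametric model (ordinary least squares for a straight line) always has two parameters, however many samples are supplied, whereas the non-parametric KNN regressor keeps every training sample and consults them at prediction time; this is also why the cost of non-parametric models grows with dataset size.

```python
def fit_linear(xs, ys):
    """Parametric: ordinary least squares for y = a*x + c (always two parameters)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    c = my - a * mx
    return lambda x: a * x + c

def fit_knn(xs, ys, k=3):
    """Non-parametric: store all samples and average the targets of the k nearest."""
    data = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict

xs = [i / 10 for i in range(-10, 11)]
ys = [x ** 2 for x in xs]  # hypothetical nonlinear relationship
linear, knn = fit_linear(xs, ys), fit_knn(xs, ys)
print(round(linear(0.0), 3), round(knn(0.0), 3))
```

The straight line cannot follow the curvature (it predicts the mean value everywhere for this symmetric data), while KNN tracks it closely inside the sampled range.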
Since AlexNet was introduced in 2012 (Krizhevsky et al., 2012), classical machine learning algorithms have been largely superseded by deep learning methods based on Neural Networks (NN) (LeCun et al., 2015). Artificial neural networks are inspired by the biological brain and comprise multiple artificial neurons, each of which receives a signal, processes it and sends it to the neurons connected to it. The signal, which is represented by a number, is computed by a nonlinear function, and each connection is assigned a weight whose value is adjusted as learning takes place (Yegnanarayana, 2009). Neurons are typically grouped into layers, and the signals travel from the input layer to the output layer (in recurrent architectures, a signal can pass through layers multiple times). Networks with multiple hidden layers are called Deep Neural Networks (DNN), and the subject is known as deep learning. Stacking multiple layers with nonlinear activation functions enables different levels of abstraction to extract the hidden connections in the data, producing much higher prediction accuracy than classical machine learning algorithms. For instance, AlexNet provided a 9.4% improvement in prediction accuracy for image classification over the 2009 ImageNet dataset comprising 1.5M sample points and 1000 categories (Krizhevsky et al., 2012). In 2015, ResNet further improved accuracy by 12.8%, to 96.4%, with 152 layers. Although deeper networks learn better representations, they are much more complex; e.g. AlexNet has 60,954,656 weights and 612,432,416 connections. Hence, practical and theoretical improvements have been made over the years to improve the computational performance of DNNs and keep training costs within acceptable limits, even for the largest datasets. For example, Amazon's reference dataset of 82M product reviews (i.e., samples) is a benchmark for natural language processing and a typical example of algorithms being improved to handle a very large dataset at low training cost (He and McAuley, 2016).
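The role of the nonlinear activation functions can be illustrated with the classic XOR problem, which no single linear layer can represent. The sketch below is a forward pass through a 2-2-1 feedforward network with hand-picked weights (chosen purely for illustration, not learned) and a ReLU activation in the hidden layer; in practice the weights would be adjusted by training, e.g. by backpropagation.

```python
def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each neuron sums its weighted inputs plus a bias."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs

# Hand-picked (not trained) weights for a 2-2-1 network that computes XOR.
W_hidden = [[1.0, 1.0], [1.0, 1.0]]
b_hidden = [0.0, -1.0]
W_out = [[1.0, -2.0]]
b_out = [0.0]

def network(x):
    hidden = dense(x, W_hidden, b_hidden, relu)  # nonlinear hidden layer
    return dense(hidden, W_out, b_out)[0]        # linear output layer

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, network(x))
```

Removing the ReLU collapses the two layers into a single linear map, which cannot reproduce the XOR pattern; this is the smallest example of why depth only helps in combination with nonlinearity.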
This has led to the introduction of different types of NN. The state-of-the-art methods and implementations can be found in .
Typical current classification tasks involve image recognition, object detection, text digitalisation, video captioning, sentiment analysis, recommendation systems and threat detection. Deep learning is commonly used in regression tasks for market forecasting, logistics and operations planning. The last layers of a classification or regression network are typically simple feedforward neural networks, which comprise layers with feedforward connections (Goodfellow et al., 2014). Computer vision tasks typically involve the use of Convolutional Neural Networks (CNN), in which layers of incrementally decreasing size (for all but the last few layers) pool information and reduce the computational effort by sharing weights, as images can have a large number of pixels and three colour channels, leading to a very large input space (Krizhevsky et al., 2012). Time series data, which are of particular interest for engineering applications, can be modelled with recurrent neural networks, where the output of the neurons is fed back into the network to maintain a memory effect. Long short-term memory (LSTM) networks are particularly popular as they solve problems associated with vanishing gradients (Hochreiter and Schmidhuber, 1997). Recently, however, LSTMs have been replaced in natural language processing by attention-based models, or transformers, whose training can be parallelised (Vaswani et al., 2017).
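The memory effect of a recurrent network can be sketched with a single Elman-style cell with a scalar hidden state, h_t = tanh(w_x x_t + w_h h_{t-1} + b). The weights below are arbitrary illustrative values rather than trained ones. Feeding in a single impulse followed by zeros shows that the hidden state retains information about the impulse, while its influence shrinks at every step; a similar repeated multiplication during backpropagation through time underlies the vanishing-gradient problem that LSTMs were designed to mitigate.

```python
import math

def rnn(sequence, w_x=1.0, w_h=0.8, b=0.0):
    """Scalar Elman-style recurrent cell: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = 0.0  # initial hidden state (the network's "memory")
    states = []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + b)  # the previous state is fed back in
        states.append(h)
    return states

# A single impulse at t = 0 followed by zeros (e.g. a hypothetical sudden load).
states = rnn([1.0, 0.0, 0.0, 0.0, 0.0])
for t, h in enumerate(states):
    print(t, round(h, 3))
```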
Supervised learning solutions typically present the highest prediction accuracy, but require correct data labelling, which can be extremely expensive for data-intensive applications.

Unsupervised learning
Unsupervised learning finds structures in non-labelled data with no human supervision required during the training, see Fig. 7. This makes unsupervised learning attractive in applications with a large amount of data or where data labels are simply not available (Barlow, 1989). The best-known unsupervised learning techniques are clustering and dimensionality reduction.

Clustering
Clustering, or cluster analysis, is a well-known unsupervised learning technique that organises data into clusters or groups by identifying similarities. This technique is particularly useful for identifying underlying patterns which might not be visible or logical to humans. Clustering is mainly used for pattern recognition, market research, image analysis, information retrieval, robotics, and even crime analysis (Celebi and Aydin, 2016). The best-known classical algorithms are k-Means Clustering (KMC) and KNN (Celebi and Aydin, 2016). KMC partitions data into k clusters; an observation belongs to the cluster with the nearest centroid, resulting in a partitioning of the data space into Voronoi cells. KNN, on the other hand, is a non-parametric model that is strictly a supervised classifier rather than a clustering method: it assigns a new observation to the class most common among its k nearest labelled neighbours (Gareth et al., 2013). Further options include the Expectation-Maximization (EM) method (Murray and Perera, 2021) and the DBSCAN method, both of which are increasingly active research topics.
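The assignment and update steps of k-means can be sketched in plain Python as follows (Lloyd's algorithm). For reproducibility, the centroids are initialised deterministically with a farthest-point (maximin) rule; random or k-means++ initialisation is more common in practice, and k-means can converge to a poor local optimum depending on the start. The six points, which might be read as hypothetical normalised (speed, fuel rate) pairs from two operating modes, are illustrative only.

```python
import math

def k_means(points, k, iterations=20):
    """Lloyd's algorithm; points are tuples of coordinates."""
    # Farthest-point (maximin) initialisation, used here for reproducibility.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(coords) / len(members) for coords in zip(*members))
    return centroids, clusters

# Hypothetical normalised (speed, fuel rate) samples from two operating modes.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
centroids, clusters = k_means(points, k=2)
print(centroids)
```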

Dimensionality reduction
Dimensionality reduction is particularly useful for extracting the principal features of extremely large datasets with a complex input space. Principal Component Analysis (PCA) is a common classical method that identifies the correlation between features and obtains lower-dimensional data while preserving as much of the data's variation as possible (Celebi and Aydin, 2016). Popular deep learning solutions for dimensionality reduction comprise autoencoders, which include an encoder DNN with layers of decreasing size that extract the fundamental features into a latent space (Kingma and Welling, 2013). The latent representation is then reconstructed into the original signals by a decoder that mirrors the encoder. Hence, the autoencoder is trained to reproduce the input signal at its output. Variational autoencoders, whose latent space is described by a probability distribution, are particularly effective and represent the state of the art (Kingma and Welling, 2019). Autoencoders are extremely useful for anomaly detection and for denoising the original signals.
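A minimal PCA sketch is given below, assuming plain Python and a power iteration on the sample covariance matrix to find the first principal component (numerical libraries would normally use an eigendecomposition or SVD instead). The two strongly correlated, hypothetical features roughly follow y = 2x, so the first component should point along (1, 2)/√5 ≈ (0.447, 0.894), and projecting onto it reduces each sample to a single score.

```python
import math

def first_principal_component(data, iterations=100):
    """Return the leading principal direction and the 1-D scores of `data` (rows = samples)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the centred data.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance matrix and normalise.
    # This converges to the eigenvector with the largest eigenvalue (the direction of
    # maximum variance), assuming the start vector is not orthogonal to it.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centred sample onto the component: 2-D -> 1-D.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centred]
    return v, scores

# Hypothetical, strongly correlated pair of features (roughly y = 2x).
data = [[x, 2 * x + 0.05 * (-1) ** i] for i, x in enumerate(range(10))]
v, scores = first_principal_component(data)
print([round(c, 3) for c in v])
```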

Semi-supervised learning
Semi-supervised learning is an approach to machine learning that combines a small amount of labelled data with a large amount of unlabelled data during training, see Fig. 8. Semi-supervised learning falls between unsupervised learning (with no labelled training data) and supervised learning (with only labelled training data). Unlabelled data, when used in conjunction with a small amount of labelled data, can produce considerable improvements in learning accuracy (Zhu and Goldberg, 2009). The acquisition of labelled data for a learning problem often requires a skilled human agent (e.g. transcribing an audio segment) or a physical experiment (e.g. determining the 3D structure of a protein or determining whether there is oil at a particular location). Thus, the cost associated with the labelling process may render large, fully labelled training sets infeasible, whereas the acquisition of unlabelled data is relatively inexpensive. In such situations, semi-supervised learning can be of great practical value. Semi-supervised learning is also of theoretical interest in machine learning as a model for human learning.
Generative models are the best-known semi-supervised learning approaches and include generative adversarial networks (Goodfellow et al., 2014) and variational autoencoders (Kingma and Welling, 2019). Generative approaches to statistical learning first seek to estimate the distribution of data points belonging to each class, p(x|y). The probability that a given point x has label y is then obtained from Bayes' rule, being proportional to p(x|y)p(y). Semi-supervised learning with generative models can be viewed either as an extension of supervised learning (classification plus information about the input distribution p(x)) or as an extension of unsupervised learning (clustering plus some labels). Common applications include fault diagnostics that account for unseen failure modes, and the editing of images, videos and text.
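To illustrate how unlabelled data can sharpen a generative model, the sketch below fits a two-class, one-dimensional Gaussian mixture with the Expectation-Maximization algorithm, where only one point per class is labelled. This is a simple stand-in for the generative approach described above, not a GAN or variational autoencoder: the E-step computes p(y|x) for unlabelled points via Bayes' rule, labelled points keep their known class, and the M-step re-estimates the class-conditional distributions p(x|y) and the priors p(y). All data values are hypothetical.

```python
import math

def gaussian(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_class1(x, params):
    """Bayes' rule: p(1 | x) = p(x | 1) p(1) / sum over classes c of p(x | c) p(c)."""
    p0 = params[0][2] * gaussian(x, params[0][0], params[0][1])
    p1 = params[1][2] * gaussian(x, params[1][0], params[1][1])
    total = p0 + p1
    return p1 / total if total > 0 else 0.5  # guard against numerical underflow

def semi_supervised_em(labelled, unlabelled, iterations=50):
    """labelled: list of (x, class) with class in {0, 1}; unlabelled: list of x."""
    # Initialise each class (mean, variance, prior) from its labelled points only.
    params = []
    for c in (0, 1):
        xs = [x for x, y in labelled if y == c]
        params.append([sum(xs) / len(xs), 1.0, 0.5])
    for _ in range(iterations):
        # E-step: labelled points keep r = their class; unlabelled get r = p(1 | x).
        data = [(x, float(y)) for x, y in labelled]
        data += [(x, posterior_class1(x, params)) for x in unlabelled]
        # M-step: weighted re-estimation of each class's mean, variance and prior.
        for c in (0, 1):
            weights = [r if c == 1 else 1.0 - r for _, r in data]
            total = sum(weights)
            mu = sum(w * x for w, (x, _) in zip(weights, data)) / total
            var = sum(w * (x - mu) ** 2 for w, (x, _) in zip(weights, data)) / total
            params[c] = [mu, max(var, 1e-6), total / len(data)]
    return params

labelled = [(0.0, 0), (5.0, 1)]
unlabelled = [-0.5, 0.3, 0.8, -0.2, 4.6, 5.3, 5.8, 4.9]
params = semi_supervised_em(labelled, unlabelled)
print([[round(p, 3) for p in row] for row in params])
```

With only two labels, the variances could not be estimated at all; the unlabelled points supply that information about p(x).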

Reinforcement learning
Reinforcement Learning (RL) involves goal-directed interactions of a software agent with its environment, as shown in Fig. 9. Unlike supervised learning, reinforcement learning paradigms do not need labelled input/output pairs to be presented, nor do they need suboptimal actions to be explicitly corrected. Instead, the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge) (Kaelbling et al., 1996).
The RL environment can be formalised as a Markov Decision Process (MDP). The central elements of RL are the agent's states, an environment, a set of actions and a reward received after the transition from one state to a new state. An RL agent interacts with its environment in discrete time steps. At each time step, the agent receives the current state and reward. It then chooses an action from the set of available actions and sends it to the environment. The environment moves to a new state, and the reward associated with the transition is determined. The goal of a reinforcement learning agent is to learn a policy that maximises the expected cumulative reward.
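The agent-environment loop above can be sketched with tabular Q-learning, the same algorithm family later used by Cui et al. (2012) for structural optimisation. The environment here is a hypothetical five-state chain in which the agent moves left or right and receives a reward of 1 only on reaching the final state; the epsilon-greedy rule implements the exploration-exploitation balance, and each update moves Q(s, a) towards r + γ max Q(s', a').

```python
import random

N_STATES = 5        # hypothetical chain of states 0..4; state 4 is the goal
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right

def step(state, action):
    """Environment transition: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(q_row, rng, epsilon):
    """Epsilon-greedy: explore at random with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(q_row)
    return rng.choice([i for i, value in enumerate(q_row) if value == best])  # random tie-break

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = choose_action(q[state], rng, epsilon)
            next_state, reward, done = step(state, ACTIONS[a])
            # Temporal-difference target: reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q = q_learning()
policy = ["right" if row[1] > row[0] else "left" for row in q[:-1]]
print(policy)
```

Deep RL replaces the table `q` with a neural network so that continuous or very large state spaces can be handled.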
Deep learning has revolutionised RL research by enabling the treatment of continuous state and action spaces as well as the incorporation of computer vision (Levine et al., 2020). Hence, deep reinforcement learning (DRL) is actively being investigated for decision making and robotics applications, including autonomous driving, humanoid robot locomotion, robot manipulation and computer games. Offline DRL, where the agent learns from pre-sampled data similarly to supervised learning, is at present a topic of high interest to mitigate the risks associated with exploration.

Ship design
In terms of naval architecture, the design of a vessel is an essential task for achieving superior hydrodynamic performance and minimising fuel consumption. Designing a ship relies on sophisticated experimental and computational techniques to evaluate the hydrodynamic performance of multiple hull sizes and shapes. Traditionally, naval architects perform regression analyses to predict a new ship's hydrodynamic performance based on existing hull forms. This approach can provide an approximate estimate of the new ship's performance but may become inadequate at later design stages, since such a regression approach normally considers only a few primary parameters and neglects advanced parameters such as highly nonlinear hull surfaces. During a later optimisation process, which is a critical step in improving the performance of vessels, ship designers would have to rely on their personal experience to revise the hull plan. This depends largely on the designer's skills and makes it hard to find the optimal configuration. Therefore, there has been a strong impetus to turn a tedious ship design procedure into a much simpler process. Such efforts have been facilitated by the fast development of Artificial Intelligence (AI) and the availability of HPC, so that an ML-assisted ship design process has now become realistic. Holtrop and Mennen's empirical algorithms, which present a statistical method to approximate ship resistance based on the results of multiple model basin tests, could be considered the first application of ML in this regard (Holtrop and Mennen, 1982; Holtrop, 1984). This method is only applicable to hull forms resembling an average ship described by the main hull dimensions and form coefficients used to build the regression. Because of this limitation, it must be emphasised that the calculated resistance can deviate significantly from the actual hull resistance (Holtrop and Mennen, 1982; Holtrop, 1984). Therefore, this method is only recommended during the concept design stage.
Ray et al. (1995) were other pioneers in the area of assisted ship design, presenting for the first time a global optimisation strategy in ship design. Specifically, separate optimisations of resistance, weight, freeboard, building cost, and stability were integrated by developing a system handler. Classical naval architecture equations were used in the calculation, and constraints were added according to the sailing requirements of a 136-TEU containership. Nonetheless, the optimisation objectives were given equal weights in the decision-making process, and the authors identified a more intelligent decision-making strategy as future work. Another limitation of this work was that only one candidate hull was considered, owing to the technology available at that time. Yu and Wang (2018) substantially advanced the ship design process by creating extensive hull geometries using a principal component analysis approach. The hydrodynamic performance of these hull forms was evaluated, and the results were then used to train a DNN to accurately establish the relation between different hull forms and their associated performance. Then, based on the fast, parallel DNN-based hull-form evaluation, an optimal hull form was searched for a given operating condition. Using this approach, the authors demonstrated a novel application of ML in ship design and optimisation, which allows the creation of an extensive database and fast results through searching. Ao et al. (2021) advanced the state of the art by developing a fully connected DNN to predict the total resistance of ship hulls in the initial design process based on control points of the CAD geometry. The flowchart of Ao et al.'s work is given in Fig. 10. The authors reported an average error of less than 4% when compared to CFD data. This high accuracy can be attributed to the fact that Ao et al.'s model relies on hundreds of control points on the CAD geometry as input, whereas other models such as that of Yu and Wang (2018) relied only on principal parameters of the hull. Similarly, Abbas et al. (2022) developed a geometry-based DNN approach to predict the wind resistance of ships, and the ML-predicted results are very close to CFD results. Nonetheless, only resistance was considered in these three studies. If this approach were coupled with other performance parameters such as stability and structural integrity, it could become a popular automatic ship design approach.
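The surrogate-then-search workflow described for Yu and Wang (2018) can be sketched as below. As assumptions for illustration only, a cheap analytic function stands in for the expensive CFD evaluation of a single normalised design variable, and a least-squares quadratic response surface stands in for the DNN. The surrogate is trained on five "expensive" samples, searched densely, and the chosen design is then checked with one further call to the expensive model.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_quadratic(xs, ys):
    """Least-squares surrogate y = c0 + c1*x + c2*x^2 via the normal equations."""
    basis = [[1.0, x, x * x] for x in xs]
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    aty = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(3)]
    c = solve(ata, aty)
    return lambda x: c[0] + c[1] * x + c[2] * x * x

def expensive_model(x):
    """Hypothetical stand-in for a CFD resistance evaluation of a design variable x."""
    return (x - 0.3) ** 2 + 0.05 * x ** 3 + 1.0

samples = [0.0, 0.25, 0.5, 0.75, 1.0]        # a handful of "expensive" runs
surrogate = fit_quadratic(samples, [expensive_model(x) for x in samples])
grid = [i / 1000 for i in range(1001)]        # cheap, dense search on the surrogate
x_best = min(grid, key=surrogate)
print(round(x_best, 3), round(expensive_model(x_best), 4))  # verify with one expensive call
```

The design choice mirrors the reviewed work: the costly model is called only a few times, while the search itself runs entirely on the fast surrogate.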
De Winter et al. (2021) used a database of existing vessels to establish performance indicators through a Random Forest (RF), which was then used for designing new vessels. The RF helped identify influential indicators such as lightship weight, deadweight, and maximum continuous rating. They trained this ML-based model with 1219 known container ships and used it to predict the performance of another 1219 container ships. The comparison showed a maximum deviation of 2.6% and an average deviation of 1.8%. However, the authors acknowledged two uncertainties: the quality of the training data, and the dependence or independence between parameters.
On the other hand, accurately modelling nonlinear hydrodynamic phenomena for ship design, such as propeller performance, is known to be a highly sophisticated task that requires examining several design variants. NNs could therefore help facilitate such a complicated process. For example, Xue et al. (2022) presented an NN approach for the design optimisation of propellers. They discretised the propeller surface into numerous small elements, yielding geometric input parameters including chord length, skew, rake, and blade thickness. A simple hydrodynamic method was used to calculate the outputs, i.e. the propeller's performance, which were then linked with the inputs through the NN. A Genetic Algorithm (GA) was used to generate geometries and select the optimal ones. CFD was used to verify the NN prediction and showed a deviation of 3.2%. Lim and Kim (2022) proposed a CNN-based model to detect the vortex-induced vibration and structural resonance of propellers. Their results showed a success rate of 82% in predicting vortex-induced vibration phenomena. This example demonstrates the possibility of predicting vortex-induced vibration in a fraction of the time required by complex fluid-structure interaction simulations.
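The GA step in workflows such as that of Xue et al. (2022) can be sketched with a small real-coded genetic algorithm using tournament selection, blend crossover, Gaussian mutation and elitism. The objective below is a hypothetical quadratic stand-in for an NN-predicted performance measure of two normalised geometric parameters, with its optimum placed at (0.6, 0.4) for illustration; it does not reproduce the authors' setup.

```python
import random

def objective(genes):
    """Hypothetical stand-in for an NN-predicted loss (lower is better); optimum at (0.6, 0.4)."""
    x, y = genes
    return (x - 0.6) ** 2 + (y - 0.4) ** 2

def genetic_algorithm(pop_size=30, generations=60, mutation_sd=0.05, seed=1):
    rng = random.Random(seed)
    population = [[rng.random(), rng.random()] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=objective)
        next_gen = ranked[:2]  # elitism: carry the two best designs over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the better of two random individuals becomes a parent.
            p1 = min(rng.sample(population, 2), key=objective)
            p2 = min(rng.sample(population, 2), key=objective)
            # Blend crossover followed by Gaussian mutation, clipped to [0, 1].
            w = rng.random()
            child = [w * a + (1 - w) * b + rng.gauss(0, mutation_sd) for a, b in zip(p1, p2)]
            next_gen.append([min(max(g, 0.0), 1.0) for g in child])
        population = next_gen
    return min(population, key=objective)

best = genetic_algorithm()
print([round(g, 2) for g in best], round(objective(best), 5))
```

In a real workflow, `objective` would call the trained NN, so each of the many GA evaluations costs milliseconds rather than a CFD run.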
Grech La Rosa et al. (2021) proposed a KNN model to design bulbous bows for ships, which can help reduce drag and thus save energy. By collecting in-service data from ships with bulbous bows, the authors developed a supervised ML algorithm (labelled based on design parameters for bulbous bows) that can recommend whether to install a bulbous bow on a new vessel and, if needed, provide an optimised bow geometry. Pena (2020) demonstrated that hydrofoils, which have been widely applied in sailing (Souppez et al., 2019), may be installed as a type of energy-saving device for cargo ships. Direct ML optimisation of hydrofoils for ships has not been found in the existing literature, but it can be inferred that the reviewed ML methods for geometry optimisation could be used for this purpose. For example, Yeo et al. (2022) used a GA to optimise the blade geometry of tidal turbines, which is similar to the geometry of hydrofoils.
Models inspired by human learning were successfully introduced by Cui et al. (2012), who proposed a Q-learning RL optimisation approach to improve the search process typically followed during the concept design phase. This RL-based approach, integrated with Java and the ABAQUS structural software, was successfully applied to improve the structural optimisation of a bulk carrier with two objectives: weight and fatigue. Their algorithm showed great potential for minimising a ship's structural weight, which could reduce fuel consumption through improved ship efficiency. This method could then be integrated with other aspects of ship design, such as hydrodynamics and stability, to further automate the ship design process.
In terms of stability, Turan and Cui (2012) presented a hybrid evolutionary algorithm that uses RL to guide a multi-objective evolutionary search for the optimal ship design. The authors analysed a RoPax damage stability problem, demonstrating the effectiveness of their approach, which can also be applied to other areas of naval architecture such as structures.
Cepowski (2020) developed an NN model to estimate added resistance in regular head waves, with training data obtained from model test experiments. The study showed that the added wave resistance values calculated by the NN correlated well with the measured data, which could be particularly useful during the first stages of design to minimise ship hydrodynamic resistance in rough seas. More recently, Yang et al. (2022) developed an innovative Data-driven and Physics-based Symbiotic Model, which combines a 2D strip theory method with an NN to predict added resistance in head waves. Their physics-based NN model showed an average error of 8% for wave resistance, whilst the parallel pure ML model showed an error of 20% for the same prediction. These results indicate that incorporating physics into ML models can improve model reliability and minimise uncertainty, especially when the physics is relatively complicated.
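Why adding physics helps can be sketched with a toy additive hybrid, assuming that a simplified physics model captures the dominant trend (here a hypothetical R = 0.5v²) and that the ML component only learns the residual between the measurements and that model. This is a generic residual-correction scheme, not Yang et al.'s strip-theory model. Compared with a purely data-driven k-nearest-neighbour regressor, the hybrid extrapolates sensibly beyond the training speeds because the physics carries the trend.

```python
def physics_model(v):
    """Hypothetical simplified physics: resistance grows with speed squared."""
    return 0.5 * v ** 2

def observed(v):
    """Hypothetical 'measurements': physics plus an effect the simple model misses."""
    return 0.5 * v ** 2 + 0.3 * v

train_v = [1.0, 2.0, 3.0, 4.0, 5.0]
train_r = [observed(v) for v in train_v]

def pure_data_model(v, k=2):
    """Purely data-driven k-nearest-neighbour regression on the raw measurements."""
    nearest = sorted(zip(train_v, train_r), key=lambda p: abs(p[0] - v))[:k]
    return sum(r for _, r in nearest) / k

# Hybrid: learn only the residual (measurement minus physics) with a least-squares slope.
residuals = [r - physics_model(v) for v, r in zip(train_v, train_r)]
slope = sum(v * e for v, e in zip(train_v, residuals)) / sum(v * v for v in train_v)

def hybrid_model(v):
    return physics_model(v) + slope * v

v_test = 8.0  # outside the training range
print(observed(v_test), pure_data_model(v_test), hybrid_model(v_test))
```

In practice the residual would be noisy and nonlinear, and an NN rather than a single slope would typically be fitted to it.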
As the physics becomes progressively more complex (from ships in calm water, to ships in waves, to ships in ice-infested seas), the shortcoming of pure ML methods in ignoring the underlying physics becomes more influential. In ship-ice interactions, for example, ML does not capture how ships break up sea ice or how the broken ice then interacts with the ship. Sun et al. (2022) and Zhou et al. (2022) used NNs to predict ship resistance and propulsion power in ice, with some errors exceeding 30%, which is significantly larger than the error level of NNs applied to ships in calm water and waves. Therefore, it is recommended to incorporate relevant physics into ML predictions of ship-ice interactions, following an approach such as that of Yang et al. (2022).
It should be noted that some of the above ML models are based on model-scale data, e.g. Cepowski (2020), and are therefore subject to scaling issues. The scale effect between model and full-scale ships has been discussed in detail by Terziev et al. (2022), who showed that the inherent errors in model-scale prediction and extrapolation remain an outstanding problem for various ship design purposes. These errors arise from the incorrect reproduction of geometrical features, the false prediction of flow properties such as turbulence and wave characteristics, and the disparities in force ratios acting on model-scale and full-scale structures (Terziev et al., 2019). ML therefore has the potential to address such complex relationships between model and full scales. Based on a thorough review, Terziev et al. (2022) demonstrated that direct full-scale CFD is a promising way to gather data that are not subject to scaling issues, which may then be combined with model-scale CFD to inform ML models. However, ITTC has only provided a CFD guideline at model scale (ITTC, 2014a), which implies that a standard approach to full-scale ship CFD has yet to be established before ML can be widely and confidently applied to address ship scalability.
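The core of the scaling problem can be shown with a short calculation: when a model is tested at the same Froude number as the ship, its Reynolds number is far lower, so viscous effects do not scale. The sketch assumes a hypothetical 200 m ship at 10 m/s, a 1:40 model, and approximate viscosity values for fresh and sea water at 15 deg C; the function names are illustrative.

```python
import math

# Approximate kinematic viscosities at 15 deg C in m^2/s (assumed values).
NU_FRESH = 1.1386e-6  # towing-tank fresh water
NU_SEA = 1.1883e-6    # sea water


def froude_number(v, length, g=9.81):
    return v / math.sqrt(g * length)


def reynolds_number(v, length, nu):
    return v * length / nu


def model_conditions(v_ship, l_ship, scale):
    """Model speed and length under Froude similarity (equal Fn)."""
    l_model = l_ship / scale
    v_model = v_ship / math.sqrt(scale)
    return v_model, l_model


v_s, l_s, scale = 10.0, 200.0, 40.0  # hypothetical ship and model scale
v_m, l_m = model_conditions(v_s, l_s, scale)
re_ratio = reynolds_number(v_s, l_s, NU_SEA) / reynolds_number(v_m, l_m, NU_FRESH)
print(f"Fn ship = {froude_number(v_s, l_s):.4f}, Fn model = {froude_number(v_m, l_m):.4f}")
print(f"Re(ship) / Re(model) = {re_ratio:.0f}")
```

With equal Froude numbers the Reynolds-number ratio is scale^1.5 times the viscosity ratio, i.e. two orders of magnitude here, which is why model-scale training data carry a systematic bias into full-scale predictions.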

Operational performance
A ship's performance can be expressed as mathematical relationships with relevant variables, such as operating speed, weather and maintenance conditions. Such relationships can be built upon empirical equations fitted to extensive data from experiments or simulations, and ML can derive such equations. As shown in Fig. 11, apart from hull design and engine condition, a ship's efficiency also depends on its current trim and fouling; choosing an optimised route is also essential for time and fuel savings. Tillig (2020) proposed a Ship Performance Model (SPM), a generic ship energy systems model that predicts fuel consumption under operational conditions. The model has two main parts: (i) a static part for calm-water power prediction, based on empirical methods and standard propeller and hull series, with all required ship dimensions and properties estimated from empirical formulas; and (ii) a dynamic part that analyses the power required under realistic operational conditions, including the effects of wind, waves, current, temperature differences, fouling and shallow water. Coraddu et al. (2017) used physics-based white-box models, as well as black-box models including LASSO, RF and RLS, to predict the fuel consumption of a Handymax chemical tanker from high-frequency continuous monitoring data. By combining the white- and black-box models into a hybrid grey-box model, they achieved the same performance as the black-box model with less historical data, yielding an effective system for optimising trim in real operational conditions.
Previous studies have widely demonstrated the use of ML for predicting fuel consumption, with a key step being the identification of the most influential variables (known as features in ML). Such a procedure has been described in detail by Soner et al. (2019), who investigated 27 features using LASSO and identified the most influential ones. They suggested that the variance of starboard level, trim, port pitch and starboard pitch has considerable effects on a ship's fuel consumption. As another example, Coraddu et al. (2019) established a relationship between a ship's speed loss and marine fouling; their approach provides an indicator for scheduling hull and propeller cleaning to mitigate fouling and gave better predictions than the current ISO guideline. The importance of features was further investigated by Laurie et al. (2021) to determine the optimal set of variables for propulsive power prediction for containerships that captures biofouling effects, with the addition of 'Days Since Clean' and 'Significant Wave Height' increasing prediction accuracy by 0.07% and  Fig. 12, where the authors demonstrated excellent practice from data pre-processing, feature selection, data training, model evaluation, to the final analysis model. In addition, Yu et al. (2019) applied RL to support decision-making for ship mooring in varying sea environments.
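LASSO's feature-selection behaviour comes from the L1 penalty, which drives the coefficients of weak features exactly to zero. Below is a minimal cyclic coordinate-descent sketch of the standard LASSO objective (1/2n)||y - X b||^2 + lam ||b||_1 with standardised features and a centred response; it is not the code used by Soner et al. (2019), and the synthetic data are hypothetical.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; values within [-gamma, gamma] become 0."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * beta[j]  # remove feature j's contribution
            rho = X[:, j] @ residual / n   # correlation with the partial residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * beta[j]  # add the updated contribution back
    return beta


# Synthetic example: only features 0 and 2 influence the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise features
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(0.0, 0.1, 200)
y = y - y.mean()                          # centre the response
beta = lasso_coordinate_descent(X, y, lam=0.1)
print(np.round(beta, 3))
```

Raising `lam` removes more features at the cost of more shrinkage on the retained ones; in practice it is usually chosen by cross-validation.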
Another key question is which ML strategies are best for predicting ship fuel consumption. Earlier work by Pedersen and Larsen (2009) and Beşikçi et al. (2016) used NNs to predict propulsive power from noon report data, showing good accuracy with a predictive error of 7%. Tarelko and Rudzki (2020) later showed improved NN accuracy in this regard, with a deviation of 0.8-2.8%. Petersen et al. (2012) developed an improved NN to predict fuel consumption using open-source ferry data from continuous monitoring systems, yielding a model with an error of only 1.50% and outperforming GP and Gaussian Mixture Models (GMM). Petursson (2009) applied KNN and SVM to Petersen's data set to predict shaft power; both algorithms exhibited high predictive accuracy but are difficult to compare with Petersen's NN because the target variable differs. Chaal (2018) employed KNN, decision tree regression and NN on continuous monitoring data, with the algorithms yielding very similar results. Soner et al. (2019) explored tree-based methods further, in the form of bagging, boosting and RF approaches, using the same ferry data set as Petersen. Direct comparison with Petersen's NN showed that RF obtained a lower error of 43.5 L/h compared with the 47.2 L/h achieved by the NN. Laurie et al. (2021) further compared multiple linear regression, decision trees, KNN, NN and RF for predicting the power consumption of a class of seven containerships from high-frequency, continuous monitoring data augmented with satellite sea-state information. The RF model was most effective in predicting shaft power, with an error of 1.17%, low enough to capture fouling effects. Wang et al. (2018) proposed a LASSO regression to predict the fuel consumption of several container ships, with ship and weather features extracted from a fleet management system; in their case, LASSO regression produced better results than SVM, NN and GP regression. Gkerekos et al. (2019) also performed a comparative study for predicting the fuel consumption of two ships, evaluating SVM, RF, ETR and NN, and introduced an Automated Data Logging & Monitoring (ADLM) system to improve efficiency and accuracy. In their results, ETR and RF performed best in the investigated cases. Taken together, the work of Wang et al. (2018) and Gkerekos et al. (2019) shows that the most suitable ML technique can differ from case to case, although all of the methods perform similarly well in this application.
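A fair comparison between ML methods requires them to be evaluated on the same data splits with the same error metric. The sketch below compares ordinary least squares with k-nearest-neighbour regression using k-fold cross-validation and the mean absolute percentage error (MAPE); the synthetic cubic speed-power data and all function names are hypothetical, and whichever method wins here says nothing general about either method.

```python
import numpy as np


def kfold_indices(n, k, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)


def ols_model(X_tr, y_tr, X_te):
    """Linear regression with an intercept, solved by least squares."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ coef


def knn_model(X_tr, y_tr, X_te, k=5):
    """Average the targets of the k nearest training samples (Euclidean)."""
    d2 = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_tr[nearest].mean(axis=1)


def cv_mape(model, X, y, k=5):
    """Mean absolute percentage error averaged over k folds."""
    scores = []
    for test_idx in kfold_indices(len(y), k):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        pred = model(X[train], y[train], X[test_idx])
        scores.append(np.mean(np.abs(pred - y[test_idx]) / np.abs(y[test_idx])))
    return float(np.mean(scores))


rng = np.random.default_rng(1)
speed = rng.uniform(8.0, 16.0, 300)                  # hypothetical speeds (kn)
power = 0.8 * speed**3 + rng.normal(0.0, 50.0, 300)  # toy cubic power law (kW)
X = speed.reshape(-1, 1)
for name, model in [("OLS", ols_model), ("KNN", knn_model)]:
    print(name, f"{100 * cv_mape(model, X, power):.2f}%")
```

Using identical folds (same `seed`) for every candidate is the important design choice: otherwise differences in the reported error may simply reflect different splits.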
As ships' performance varies greatly, fuel consumption data collected from different ships may depend on other variables to varying degrees. Using such highly variable data as training and validation sets could bias evaluations of a method's applicability. Although multiple ML methods have shown promising results in existing publications, these publications have so far been limited to data from a handful of ships. From the perspective of this review, previous work has proved that ML is a workable and potentially accurate method for fuel consumption prediction. However, comparative studies cannot demonstrate that one ML method is superior to another if the methods are compared purely on data from one or a few ships. Therefore, to confidently evaluate which ML methods are suitable for ship fuel prediction, crucial future work is to establish a sufficiently large database of real ship data. Developing standard datasets, as in other fields (e.g. ImageNet (Krizhevsky et al., 2012)), is highly recommended to provide an unbiased benchmark.
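Once data from several ships are available, one way to reduce the bias described above is to validate on ships the model has never seen. A minimal leave-one-ship-out splitter is sketched below in plain Python; the function name and the grouping of samples by a ship identifier are illustrative assumptions rather than a procedure taken from the reviewed studies.

```python
from collections import defaultdict


def leave_one_ship_out(ship_ids):
    """Yield (held_out_ship, train_indices, test_indices) for each ship.

    All samples from one ship form the test set, so no ship contributes
    to both training and testing within the same split.
    """
    by_ship = defaultdict(list)
    for i, sid in enumerate(ship_ids):
        by_ship[sid].append(i)
    for sid in sorted(by_ship):
        test_idx = by_ship[sid]
        train_idx = [i for i, s in enumerate(ship_ids) if s != sid]
        yield sid, train_idx, test_idx


ids = ["ship_A", "ship_A", "ship_B", "ship_C", "ship_B", "ship_C"]
for ship, train, test in leave_one_ship_out(ids):
    print(ship, "train:", train, "test:", test)
```

A random row-level split would instead let near-duplicate samples from the same ship appear in both sets, which tends to overstate how well a model generalises across a fleet.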
Propulsion efficiency is crucial, as it governs how much of the energy in the consumed fuel is converted into ship movement. The efficiency is not static: it depends on the conditions in which the ship is operating, including its speed and the ocean conditions. The problem may be more complex than establishing the relationship between propulsion efficiency and operating conditions, as the total energy consumption of a ship also depends on the support systems that supply electricity, heating, ventilation and other auxiliary demands. ML methods can be applied similarly here. For example, Yang et al. (2018) created an NN for predicting waste heat recovery performance, and Raptodimos and Lazakis (2018) applied ML to link monitoring data with situations in which machinery failure could occur, thus enabling diagnostics. Cipollini et al. (2018) developed an ML-based condition-monitoring solution for the complex propulsion plant of a frigate, using virtual data and a wide range of unsupervised and supervised ML methods. For a ship propulsion system influenced by many variables, the challenge is to select the most influential features for the ML algorithms. To achieve sufficient accuracy with the least computational complexity, it is recommended to apply LASSO (L1-regularised) regression before establishing the model. This implicitly performs feature selection and can significantly improve the final algorithm's efficiency by balancing fit and sparsity, similar to the approach introduced for fuel consumption prediction.
ML can also help crews by providing control strategies for ships' propulsion systems. Perera et al. (2016) designed an ML-based automation system with a power management architecture for engine and propulsion control systems covering various engine room operations. It achieved coupled control of the power of different engines, ship speed, shaft speed and the corresponding fuel consumption. A marine engine-centred data flow chart was also established to handle large-scale data sets. In this way, they forged a big data solution that can automatically improve the quality of engine strategies and advise the bridge crew on decisions such as speed selection. Nonetheless, a shortcoming here is that different ML approaches can provide notably different accuracies in engine performance prediction. In this context, Yuan and Wei (2018) compared the outcomes of NN and GP in this procedure and found that GP provides more accurate predictions. However, as Petersen et al. (2012) indicated, there is still a lack of benchmark cases for verifying different methods; the conclusion of Yuan and Wei therefore represents one case and does not generally mean that GP is the best option here. Ongoing work in this area will focus on improving these models, including the possibility of combining multiple methods.
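For readers unfamiliar with GP regression, the sketch below computes the standard GP posterior mean and variance with a squared-exponential (RBF) kernel via a Cholesky factorisation. It is a generic textbook formulation, not the model of Yuan and Wei (2018); the kernel hyperparameters are fixed by hand here, whereas in practice they would be fitted, and the one-dimensional toy data are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-2, **kernel_args):
    """Posterior mean and variance of a zero-mean GP with Gaussian noise.

    `noise` is the observation-noise variance added to the kernel diagonal.
    """
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, **kernel_args)
    K_ss = rbf_kernel(x_test, x_test, **kernel_args)
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var


x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)
mean, var = gp_posterior(x_train, y_train, np.array([2.5, 20.0]))
print(mean, var)
```

A practical attraction of GP regression over a plain NN is this built-in predictive variance: it stays small inside the training data (x = 2.5) and returns to the prior variance far from it (x = 20), flagging extrapolation.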
Nikolopoulos and Boulougouris (2020) developed a holistic approach to ship operational performance with lifecycle considerations, integrating modules that take geometrical variables as inputs and output indices of stability, strength, safety and economics. These inputs are based on big data from onboard measurements on the parent vessels, coupled with weather data used to model the operating conditions. This work is a particularly good example of integrating different ship performance parameters.
In terms of biofouling, Demirel et al. (2017) presented a high-fidelity CFD method to predict the effect of biofouling (or marine coatings) on ship resistance, considering the roughness effects on the resistance and effective power of a full-scale ship. By contrast, underwater cameras and image analysis have enabled fast and convenient approaches to assessing biofouling (Bloomfield et al., 2021; First et al., 2021).
On the other hand, marine diesel engines operating on heavy fuel oil or marine diesel oil are not a viable powering solution if the shipping industry is to achieve the required reductions in GHGs and pollutants. There has been a trend towards developing green and renewable energy sources to power ships. Planakis et al. (2022) developed an energy management system for hybrid battery and diesel engine ship propulsion that incorporates a clustering ML technique to switch between power sources based on operational scenarios. Based on their results, the ML-managed hybrid system can reduce fuel consumption and emissions by around 8.5%.
Wu and Bucknall (2020) designed a hybrid fuel cell and lithium-ion battery propulsion system for vessels. The system uses RL to coordinate the two power sources: since the fuel cell responds slowly, the lithium-ion batteries cover acceleration and deceleration, whereas, because the battery is slow to recharge, the fuel cell serves as the primary energy source. The authors simulated previous voyages and demonstrated that the hybrid system can achieve a GHG emission reduction of at least 65%. Subsequently, Wu et al. (2021) proposed a Double Deep Q-Network RL approach for energy management, achieving a further 5.5% cost reduction with a 93.8% decrease in training time. Their work is based on a recent type of hydrogen fuel cell and assumes that hydrogen can be replenished overnight without refuelling during operation. Limited by the total amount of energy that can be carried, such a novel system is only applicable to coastal vessels making short voyages. Hydrogen is one of the most promising green alternatives for maritime operations, and it can be produced offshore, which opens the possibility of hydrogen charging stations for ships. However, hydrogen's application as a direct ship fuel is currently still limited by its transportation and storage (Masoudi Soltani et al., 2021; Jenkins et al., 2022). For global cargo shipping, today's batteries and hydrogen fuel cells still lack the energy density to power ships on long-distance voyages (Wittels, 2020); using ML methods to optimise traditional engine efficiency and reduce waste will therefore remain an important research area.
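The difference between a Double Deep Q-Network and a standard DQN lies mainly in how the bootstrap target is formed: the online network selects the next action and a separate target network evaluates it, which reduces the overestimation bias of taking a plain maximum. The batch computation is sketched below with NumPy; the Q-value arrays would normally come from two neural networks, which are omitted here, and this is not the energy management code of Wu et al. (2021).

```python
import numpy as np


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Standard DQN target: max over the target network's Q-values."""
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN target: the online net selects, the target net evaluates."""
    best_actions = np.argmax(next_q_online, axis=1)
    evaluated = next_q_target[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated


# Toy batch of two transitions with two actions (e.g. two hypothetical
# power-split settings); the second transition ends the episode.
rewards = np.array([1.0, 0.5])
next_q_online = np.array([[5.0, 1.0], [0.0, 2.0]])
next_q_target = np.array([[2.0, 9.0], [4.0, 3.0]])
dones = np.array([0.0, 1.0])
print(dqn_targets(rewards, next_q_target, dones, gamma=0.9))
print(double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.9))
```

For the first transition the standard target uses the target network's maximum (9.0), whereas the Double DQN target evaluates the action preferred by the online network (value 2.0), giving a more conservative estimate.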

Voyage planning
With the benefits of reducing marine incidents and optimising energy efficiency, automated route planning, currently supported by weather routing and radar systems, has been increasingly deployed. In this approach, environmental factors such as wave height and direction, wind and currents, as well as the densities and temperatures of air and water, are considered. Radars, meanwhile, are generally used to identify other vessels and obstacles to ensure safety. Voyage Planning Tools (VPTs) based on weather systems are usually built upon an SPM, in which the ship's response surface is linked with various input conditions, as introduced in the previous section.
Based on an SPM, a VPT can map out the fuel consumption of all potential routes and choose the best one, like a "Google Maps" for the oceans. Yuan et al. (2021) developed a Long Short-Term Memory NN to predict the real-time fuel consumption rate. The ML model was then used to re-plan a historical voyage with historical metocean data, and the revised route saved approximately 30% fuel compared with the original. Li et al. (2020a) developed a VPT that combines energy saving with obstacle avoidance. In their application, an SPM was linked with ice conditions to guide ship navigation in the Arctic. While the VPT chooses a route with low fuel consumption, it also avoids significant ice features such as icebergs and ice ridges; hence, the calculated route may be sub-optimal from the perspective of fuel consumption alone. In validation, the fuel consumption predicted by their model agreed well with full-scale measurement data. The work of Li et al. demonstrates the excellent potential of AI techniques in this area to handle non-negligible ice data and risks, motivated mainly by the opening of Arctic shipping routes in recent years (Huang et al., 2019, 2021b). Another example of applying ML to predict ship speed in ice fields is given by Milaković et al. (2019). Huang et al. (2021c) presented a coherent development procedure from a low Technology Readiness Level (TRL) computational simulation to a high-TRL VPT. As computational simulations are fairly slow to run (such as the one presented in Fig. 1), it is impractical to run a simulation every time a ship performance prediction is requested, which has limited simulation work to low-TRL applications. To overcome this speed limit, Huang et al. (2021c) regressed systematic computational simulations to reveal how ship performance relates to different ships and environmental conditions, and derived an empirical equation to swiftly predict the ice-floe resistance of a given ship in a given condition. The rapid equation can be incorporated into a set of Arctic SPMs and VPTs that link with real-time weather systems to predict a ship's fuel consumption in ice-infested seas and dynamically suggest the route with the least safety concern and fuel consumption. The design flowchart of their procedure is portrayed in Fig. 13. Overall, the work of Huang et al. (2021c) is a good demonstration of leveraging advanced computation to effectively support shipping sustainability. In addition, ML could be used to achieve a realistic match between simulated and real ice layouts. In , a GA was used to optimise the shape, size distribution and random locations of ice floes in ice-floe fields, with the optimisation criterion being the closest match between the generated floe fields and measurement data; the generated floe fields can then be imported into simulation tools to replace oversimplified floe fields (e.g. fixed sizes, regular locations).
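Regressing a set of systematic simulations into a rapid empirical equation is often done by assuming a power-law form and fitting it by linear least squares in log space. The sketch below assumes a hypothetical form R = a * V^b * h^c (speed V and, say, ice thickness h); it is not the equation derived by Huang et al. (2021c), whose functional form and variables may differ, and the "simulation" data are synthetic.

```python
import numpy as np


def fit_power_law(features, response):
    """Fit response = a * prod_i(x_i ** b_i) by least squares on logarithms.

    features: (n_samples, n_vars) array of strictly positive inputs
    response: (n_samples,) array of strictly positive outputs
    """
    A = np.column_stack([np.ones(len(response)), np.log(features)])
    coef, *_ = np.linalg.lstsq(A, np.log(response), rcond=None)
    return np.exp(coef[0]), coef[1:]


def predict_power_law(a, exponents, features):
    """Evaluate the fitted equation; cheap enough to call inside a VPT loop."""
    return a * np.prod(np.asarray(features) ** exponents, axis=-1)


# Synthetic "simulation matrix": all combinations of speeds and thicknesses.
V, H = np.meshgrid([2.0, 4.0, 6.0, 8.0], [0.5, 1.0, 1.5])
features = np.column_stack([V.ravel(), H.ravel()])
response = 2.0 * features[:, 0] ** 1.5 * features[:, 1] ** 0.8
a, b = fit_power_law(features, response)
print(a, b)
```

Once fitted, each prediction costs a handful of arithmetic operations instead of a full simulation, which is what makes such equations usable inside real-time SPMs and VPTs.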
To conduct ship voyage planning, a prerequisite is that ship operation data are effectively collected and classified. A particular challenge in this context is developing algorithms that efficiently classify a ship's data into different movement types, e.g. static, normal navigation and manoeuvring. To achieve this, Chen et al. (2020) developed a CNN method to classify ships' Automatic Identification System (AIS) data. The underlying concept is to train a CNN on labelled AIS data, so that unlabelled AIS data can then be classified effectively by the trained network. The results demonstrate that this CNN method works very well for classifying AIS data, achieving an accuracy of 92.35%, higher than parallel tests using KNN (70.2%) and SVM (80.4%).
ML applications in ship voyage planning have benefited the autonomy of ships, especially small boats in coastal regions. Because coastal regions typically have dense boat traffic, boat routes are more complex and tasks change quickly. Liu and Bucknall (2015) designed a route planning algorithm for Unmanned Surface Vehicles (USVs) that avoids obstacles. They applied the Fast Marching (FM) method, which identifies the corresponding safe navigation areas and forbidden areas in real time to ensure that the planned trajectory does not encounter any obstacles. The method works in both static environments (with natural obstacles, offshore structures, etc.) and dynamic environments (with other moving vessels). Chen et al. (2019) demonstrated the use of RL to train USVs, in which the vessels are rewarded according to how rational their decisions are, and route optimisation is achieved by choosing the best reward value; however, their work only considered a static environment and thus still needs to incorporate a dynamic environment, as Liu and Bucknall (2015) did. Similar examples can also be found using Deep Learning (Perera, 2020). A challenge for USVs is operating multiple vehicles simultaneously. This requires identifying the tasks and routes of all vehicles in the operating area and making them account for each other, which dramatically increases the algorithm's complexity and accuracy requirements. As an example of applying ML to multi-USV systems, Ma et al. (2021) proposed an unsupervised learning method for coordinated multi-task allocation of USVs. In their work, unsupervised learning with an improved KMC was used to assign different tasks within a multi-USV system; a self-organising map was then implemented to handle task execution for the tasks assigned to each USV. However, the model of Ma et al. (2021) assigns a specific region to each USV, assuming that a USV will not work in another's region. This limitation is due to the lack of an advanced algorithm that can navigate multiple USVs in a crowded environment. Meng et al. (2022a) developed a GP-based navigator that uses an onboard camera to let a USV identify obstacles and navigate in real time. The work showed promising results in a typical operating scenario of USV navigation in a wind farm, as shown in Fig. 14. This approach has the potential to address the operation of multiple USVs in one region, but more verification is required. For example, the performance and risk level of the image-based approach are unclear when vessels operate at high speed or when the line of sight to one structure is blocked by another.
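The region-per-vehicle allocation discussed above can be illustrated with plain k-means clustering of task locations, with one cluster per USV. This is a generic k-means (Lloyd's algorithm) with a deterministic farthest-point initialisation, not the improved KMC or self-organising map of Ma et al. (2021); the task coordinates are hypothetical.

```python
import numpy as np


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the farthest point."""
    centres = [points[0]]
    for _ in range(1, k):
        dist = np.min(
            np.linalg.norm(points[:, None, :] - np.array(centres)[None, :, :], axis=2),
            axis=1,
        )
        centres.append(points[np.argmax(dist)])
    return np.array(centres)


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns a cluster label per point and the centres."""
    centres = farthest_point_init(points, k)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)  # assign each task to nearest centre
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres


# Hypothetical task locations (km) around three areas; one USV per cluster.
rng = np.random.default_rng(0)
areas = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
tasks = np.vstack([c + rng.normal(0.0, 0.5, size=(20, 2)) for c in areas])
labels, centres = kmeans(tasks, k=3)
for usv in range(3):
    print(f"USV {usv}: tasks {np.flatnonzero(labels == usv).tolist()}")
```

The output is a hard partition, so each USV works only within its own cluster; this mirrors the one-region-per-vehicle limitation noted above rather than solving it.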
ML has also been applied to the weather routing of ships. Grifoll et al. (2022) developed comprehensive A* path planning software that optimises the route as a function of wind, wave and current data from the CMEMS service. They re-navigated several previous intercontinental voyages and demonstrated up to 9% time saving and 28% CO2 reduction. Ryan et al. (2021) performed similar research but also included sea ice data from the Met Office. Meng et al. (2022b) developed a combined model of GP and the FM method that accounts for the influence of real-time ocean currents on the fuel consumption of autonomous ships, and demonstrated energy savings through route optimisation.
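A* search, as used in the weather-routing software mentioned above, can be sketched on a small grid in which each cell carries a traversal cost, for example a penalty derived from forecast wind and waves. The grid, the costs and the `a_star` function below are hypothetical and greatly simplified compared with the software of Grifoll et al. (2022); the Manhattan-distance heuristic is admissible here only because every cell costs at least 1.

```python
import heapq


def a_star(cost, start, goal):
    """A* on a 4-connected grid.

    cost[r][c] is the cost of entering cell (r, c) (>= 1), or None if blocked.
    Returns (path as a list of cells, total cost), or (None, inf) if unreachable.
    """
    rows, cols = len(cost), len(cost[0])

    def heuristic(cell):
        # Manhattan distance; admissible because every step costs at least 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    best_g = {start: 0}
    parent = {start: None}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], g
        if g > best_g[cell]:
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and cost[nr][nc] is not None:
                new_g = g + cost[nr][nc]
                if new_g < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = new_g
                    parent[(nr, nc)] = cell
                    heapq.heappush(open_heap, (new_g + heuristic((nr, nc)), new_g, (nr, nc)))
    return None, float("inf")


# Hypothetical 3 x 4 sea area; the two 9s mark cells with heavy forecast weather.
weather_cost = [
    [1, 1, 1, 1],
    [1, 9, 9, 1],
    [1, 1, 1, 1],
]
path, total = a_star(weather_cost, (1, 0), (1, 3))
print(path, total)
```

The planner detours around the heavy-weather cells because the longer route has a lower accumulated cost, which is the same trade-off a real VPT makes between distance, time and fuel.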

Summary
The ML methods used to facilitate shipping sustainability are summarised in Table 1, which maps each application to the ML approaches applied to it; one ML method may be suitable for multiple applications. The table summarises and compares the methods in terms of required samples, computational cost, accuracy level, suitability, limitations and research gaps. Note that sample sizes are not comparable across papers, because the samples required by different applications and methods differ too greatly. To minimise the uncertainty associated with sample size, the literature selected for this paper is based on sufficient training data, as mentioned in Section 1.3; the sample size of all reviewed papers is therefore assumed to have converged.

Proven applications of machine learning in shipping
Based on this survey of recent work, the advancement of green shipping has highlighted successful implementations of ML in three main fields, namely ship design, operational performance and voyage planning:
• In ship design, ML algorithms based on statistical regression have traditionally been used as part of the design process. New applications have made a semi-automatic ship design process possible from a hydrodynamic or structural perspective, including data-driven optimisation of a hull form based on its CAD geometry.
• Ship performance can be related to engine power, ship speed, shaft speed, energy waste and weather data, from which an optimal operational setup can be advised for a given navigation condition; ML has been widely applied for this purpose. Moreover, green power options such as fuel cells and batteries have been developed as hybrid systems combined with traditional marine diesel engines, in which ML can be used to optimise the mode-selection strategy.
• Automated route planning, which considers factors such as weather forecasts and obstacles that ships may encounter, has been increasingly deployed. ML techniques have demonstrated the ability to achieve considerable fuel savings through VPTs.
A summary of different types of ML versus their shipping applications is given in Fig. 15.

Outlook of future developments
Although ML algorithms have demonstrated their capabilities in improving ship efficiency, traditional knowledge still dominates the maritime industry. One reason could be that the algorithms rely on large amounts of data and require high computational costs (Hastie et al., 2009). The marine industry has been conservative and is still reluctant to openly share data to support the training and validation of ML. Creating, optimising and maintaining relevant algorithms will still require extensive human input and expertise, given that errors in ship design calculations and marine operations could have catastrophic consequences. Therefore, human factors will need to be studied and well balanced in the foreseeable AI future. As ML replaces manual tasks, the shipping industry is expected to transition slowly into an automated process that requires only a set of inputs to find the most efficient solution with little human intervention. For example, ports can leverage ML for the real-time handling of cargo containers. Once this happens, the industry should be prepared for the corresponding job cuts, as large design teams will no longer be required. This will only be possible with the availability of powerful infrastructure.

Table 1 (excerpt):
... | Most models only consider resistance as the optimisation standard. Other aspects such as stability and structural integrity should be included (Q-learning is recommended in this regard).
DNN, RL | Geometry analysis to identify the coordination of the geometry | High (requires the generation of CAD files) (GA) (Xue et al., 2022) | Can be applied to designing novel geometries
Resistance prediction | NN | Model tests or CFD data of different hulls under different operating conditions | Low (uses numerics) | Highly accurate for calm water and wind resistance (>95%) (Ao et al., 2021); less accurate for wave resistance (>80%) and ice resistance (>60%) | The scaling issue is not resolved; it is recommended to combine with physics for wave or ice resistance prediction
... | Can handle multiple vessels in one region (Meng et al., 2022a)
One key piece of infrastructure is the sensors on ships, which are a precondition for sourcing relevant data such as structural response and flow details. An advanced measurement network is also required to monitor and coordinate the measurement data. The development of HPC resources and the availability of the fifth-generation mobile network (5G) are also expected to facilitate the process, particularly for route planning, which requires remote data transmission. The development of sensors may also enable the measurement of air emissions and underwater noise from ships, providing direct strategies to improve environmental and ecological sustainability.
As large databases become available to the scientific community, linear approaches and old-fashioned empirical functions in ship design will slowly be substituted by ML algorithms that consider conventionally neglected nonlinearities. This is expected to happen for the guideline formulae currently in use, such as the ITTC-57, ITTC-78, Katsui et al. and Grigson empirical friction lines (ITTC, 2014b; ITTC, 1978), which calculate resistance coefficients based on empirical results from model-scale experiments on old ship geometries. Considering the prohibitive cost of obtaining large experimental datasets for training ML, CFD can be particularly useful for generating training data, supported by validation against limited experimental data. It should be noted that combining ML with CFD requires the CFD to be well validated, as poorly validated CFD models of ships could yield low-quality data. The constantly improving resolution and numerical schemes of CFD will provide increasingly significant support for ML. For example, the method of Yu and Wang (2018) could see further accuracy improvements when combined with the most advanced turbulence modelling approaches for detailed and complex analyses of the flow around ships (Pena et al., 2020a, 2020b).
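For reference, the ITTC-1957 model-ship correlation line mentioned above gives the frictional resistance coefficient as a function of Reynolds number alone, C_F = 0.075 / (log10 Re - 2)^2. The short function below evaluates it. Adding a learned correction on top of such a line is one possible way to capture conventionally neglected effects; the `correction` argument of `corrected_cf` is a hypothetical placeholder for such a model, not an established method.

```python
import math


def ittc57_cf(reynolds):
    """ITTC-1957 model-ship correlation line for the friction coefficient C_F."""
    if reynolds <= 100.0:
        raise ValueError("Reynolds number outside the meaningful range")
    return 0.075 / (math.log10(reynolds) - 2.0) ** 2


def corrected_cf(reynolds, correction=lambda re: 0.0):
    """Hypothetical hybrid: empirical line plus a data-driven correction term."""
    return ittc57_cf(reynolds) + correction(reynolds)


for re in (1e6, 1e7, 1e9):
    print(f"Re = {re:.0e}: C_F = {ittc57_cf(re):.6f}")
```

Because the line depends only on Reynolds number, it cannot represent hull-form or roughness effects by itself, which is where a correction trained on validated CFD or full-scale data could contribute.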

The uncertainty and formal procedures
Given the above progress, an enthusiastic scientific community, along with many learners, believes that ML can resolve any kind of practical problem. However, this group should still be very careful, since a data-based solution can easily produce a model that is not linked to classic physical and mathematical rules. ML efficiency and efficacy depend on the algorithm as well as on the training used as part of the process. Factors such as the quality and quantity of data, the selection of variables and the algorithms must be reasonably accounted for through proper steps.
ML users should avoid training an algorithm with insufficient data. For instance, Kretschmann (2020) tested different amounts of historical data to predict the accident risk of a ship: when 12 months of data were used to train the ML model, the accident frequency was predicted to be 21%, whereas with 6 months of data the predicted frequency fell to 11%. This example shows that the validity of an ML model should be confirmed through a comprehensive verification process. A sensitivity study on the amount of training data is required to ensure that the model is well informed (similar to mesh sensitivity studies in CFD). The criterion for such a dataset-size sensitivity study should be that the dataset is large enough that further increasing it does not notably change the prediction. However, such a sensitivity test is missing in many of the reviewed works, as they are normally based on a given, limited database.
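The dataset-size sensitivity study recommended above can be automated in the same spirit as a mesh study: refit the model on nested subsets of increasing size and check whether the prediction at some query points still changes. The sketch below uses a linear least-squares model purely as a placeholder for any ML model; the fractions, the 1% tolerance, the function names and the synthetic data are assumptions, not values from the reviewed works.

```python
import numpy as np


def fit_predict_linear(X, y, X_query):
    """Placeholder model: linear least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.column_stack([np.ones(len(X_query)), X_query]) @ coef


def data_size_sensitivity(X, y, X_query, fractions=(0.25, 0.5, 0.75, 1.0),
                          tol=0.01, seed=0):
    """Refit on nested subsets; report predictions and a convergence flag.

    Subsets are prefixes of one random permutation, so each larger subset
    contains the smaller ones (analogous to successive mesh refinements).
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    history = []
    for f in fractions:
        idx = order[: max(2, int(f * len(y)))]
        history.append((len(idx), fit_predict_linear(X[idx], y[idx], X_query)))
    last, prev = history[-1][1], history[-2][1]
    rel_change = np.max(np.abs(last - prev) / np.maximum(np.abs(prev), 1e-12))
    return history, bool(rel_change < tol)


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, 400)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.1, 400)
history, converged = data_size_sensitivity(x, y, np.array([1.0, 2.0]))
for n, pred in history:
    print(n, np.round(pred, 4))
print("converged:", converged)
```

If the flag is false, the honest conclusion is that more data are needed before the model's predictions can be trusted, just as an unconverged mesh study calls for further refinement.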
Once a user has a valid dataset, the next step is to select a suitable ML method; Sections 2 and 3 are intended to help with this selection. In certain cases, multiple ML models may all be valid and yield similar results. For example, Gkerekos et al. (2019) compared ETR, RF, SVM and NN approaches to predict ship fuel consumption, and their accuracy levels were all similar. However, significant uncertainty exists in how the user processes the data, such as labelling the data, building links and setting inputs and outputs. Another group of researchers could choose significantly different setups, thus changing the results. This step is currently a "grey area" in ML, which makes it hard to evaluate and compare ML studies, so it becomes unreliable to comment on which ML method is best for a given application. One way to improve this is for reputable associations to provide standard ML procedures, such as "ITTC Recommended Procedures and Guidelines in Machine Learning". It is also recommended that ML studies carry out a formal procedure for estimating and reporting uncertainty, e.g. (Celik et al., 2008). Such a procedure quantifies uncertainties, thereby providing an index for comparing different ML studies.
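Celik et al. (2008) describe a three-grid procedure, based on Richardson extrapolation, for estimating and reporting discretisation uncertainty in CFD. The sketch below implements its core steps for a constant refinement ratio, where the correction term q(p) vanishes. How "refinement levels" should be mapped onto an ML study (for example, successively larger training sets) is an open assumption rather than established practice, and the input values are synthetic.

```python
import math


def gci_three_grids(phi1, phi2, phi3, r, safety_factor=1.25):
    """Three-level uncertainty estimate in the style of Celik et al. (2008).

    phi1, phi2, phi3: solutions on the fine, medium and coarse levels
    r: constant refinement ratio between successive levels (r > 1)
    Returns the apparent order p, the extrapolated value and the fine-level GCI.
    """
    e21 = phi2 - phi1
    e32 = phi3 - phi2
    # With a constant refinement ratio the correction term q(p) is zero.
    p = abs(math.log(abs(e32 / e21))) / math.log(r)
    phi_ext = (r**p * phi1 - phi2) / (r**p - 1.0)
    rel_err = abs((phi1 - phi2) / phi1)
    gci_fine = safety_factor * rel_err / (r**p - 1.0)
    return p, phi_ext, gci_fine


# Synthetic second-order convergence: phi = 1 + 0.01 * h^2 with h = 1, 2, 4.
p, phi_ext, gci = gci_three_grids(1.01, 1.04, 1.16, r=2.0)
print(f"p = {p:.3f}, extrapolated = {phi_ext:.4f}, GCI = {100 * gci:.2f}%")
```

Reporting the apparent order, the extrapolated value and the GCI together gives readers a single, comparable uncertainty figure, which is the kind of index the text argues ML studies currently lack.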

Conclusions
This paper has given an overview of how ML has been applied to assist sustainable shipping, with respect to ship design, ship operational performance and voyage planning. These applications demonstrate that ML can process large datasets and extract connections between various elements. Extensive examples show that ML can facilitate shipping sustainability through these applications, and that it enables complex functions that humans would be unlikely to perform. ML has already demonstrated a very promising contribution to green shipping, and such applications are expected to grow enormously in the near future.
On the other hand, there are also non-negligible concerns about current ML shipping technologies. The maritime world will still rely on first-principles methods to govern geometric design and secure operational safety, and ML is no substitute for understanding physics and engineering. This means that ML, as a highly variable and data-oriented method, will not be an all-in-one solution that seamlessly replaces traditional empirical, computational and experimental methods. Properly combining physical methods with ML can help secure reliability.
In the ongoing digital revolution, the way forward is to appropriately distinguish the pros and cons of various ML methods and incorporate them as appropriate components within the workflows. Meanwhile, uncertainties in datasets and model training processes need to be addressed. This requires the relevant regulators and associations to develop formal standards and procedures for ML applications.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
All data underlying the results are available as part of the article and no additional source data are required.