APPLICATION OF AN ARTIFICIAL NEURAL NETWORK AND MULTIPLE NONLINEAR REGRESSION TO ESTIMATE CONTAINER SHIP LENGTH BETWEEN PERPENDICULARS

Container ship length was estimated using artificial neural networks (ANN), as well as a random search based on Multiple Nonlinear Regression (MNLR). Two alternative equations were developed to estimate the length between perpendiculars based on container number and ship velocity using the aforementioned methods and an up-to-date container ship database. These equations could have practical applications during the preliminary design stage of a container ship. The application of heuristic techniques for the development of an MNLR model by variable and function randomisation leads to the automatic discovery of equation sets. It has been shown that an equation elaborated using this method, based on a random search, is more accurate and has a simpler mathematical form than an equation derived using an ANN.


INTRODUCTION
The ship design process consists of three main stages: preliminary, contract, and detailed design. The key characteristics of a ship are determined during the preliminary design stage, based on the main requirements of the shipowner. Parametric design and geometric design are the main phases of preliminary design. Watson [1], Rawson and Tupper [2], and Papanikolaou [3] have argued that selecting a ship's main dimensions, such as length, breadth, and draught, is the main objective of parametric design. However, at this stage there is not enough detailed technical information about the ship to estimate these parameters accurately. The ship designer must resolve this problem and select these characteristics based on the requirements of the shipowner, the character of the ship's mission, and various formal maritime rules and regulations [1][2][3].
Ship design is an iterative process that follows a design spiral, originally introduced in 1959 by Evans [4] and modified in 1985 by Andrews [5]. Inaccurate estimation of ship dimensions during the parametric design stage increases the number of design spiral loops and, with it, the design time and project cost. As noted by Papanikolaou [3] and Chądzyński [6], for a standard ship type this problem may be addressed with statistical, empirical, or regression methods based on similar built ships.
Chądzyński [6] and Papanikolaou [3] argued that initially, the length of a cargo ship is estimated based on cargo capacity, such as its deadweight, hold, or TEU capacity. Other ship dimensions are usually estimated based on this length at later stages.
Linear or nonlinear equations developed from a container ship database have often been used for the initial estimation of a ship's length. Piko [7], Kristensen [8], and Papanikolaou [3] each prepared a set of equations for estimating a container ship's main dimensions. Piko's equations were based on statistical data of container ships built up to 1980. Papanikolaou used data of container ships built prior to 2005, and Kristensen used a database of container ships built before 2013. Linear and nonlinear regression methods were used in these studies.
Over the past 20 years, container ship design trends have been influenced by market and trade demands. The financial crisis of 2007-2008, together with fuel price changes and strict emission requirements, has influenced these trends. Container ships are usually categorised as volume carriers in common design procedures. Container number and velocity are the main requirements of any container ship owner. Economic and environmental factors may change these requirements and, consequently, the design process in the future. Figure 1 shows container ship age profiles by size and mean speed. The Sea-web Ships database [9] of all container ships built from 2000 to 2020 was used in this analysis. Figure 1 shows that container ship capacity and speed have fluctuated to a large degree over the last 6-7 years. Moreover, the latest container ships usually have a higher TEU capacity and a lower Froude number. Old regression formulas developed before 2014 are therefore inadequate for capturing the design trends of modern container ships.
[Figure 1: container ship age profiles; horizontal axis: year of delivery]
A literature review showed that the equations for key container ship characteristics were developed based on either deadweight or container number capacity. While Piko [7] and Papanikolaou [3] used deadweight capacity, Kristensen [8] used the number of TEU containers to estimate a ship's characteristics. Deadweight capacity includes the mass of the cargo, ballast, and ship stores. To calculate the cargo mass, the number and mass of containers must be known. Container mass is based on the number of containers. An estimate based on the number of TEU containers does not account for ballast and store mass; on the other hand, it makes a separate deadweight calculation from the required number of containers unnecessary.
The literature review did not show which parameter, deadweight or container number capacity, led to the smallest estimation error of a design characteristic. The formulas of Piko, Papanikolaou, and Kristensen for estimating container ship length did not consider velocity, despite it being one of the ship owner's main requirements. These formulas were developed using regression methods. Initially, Piko implemented nonlinear approximation methods such as power and 2nd-degree polynomial regression models. Deadweight was used as the independent variable in these equations. Piko compared the results and concluded that the functions developed using the power regression model provided reliable estimates of the parameters over a wider range of deadweights.
Papanikolaou [3] presented a theory and a detailed compendium of knowledge on practical methods for the preliminary design of a ship. This book also provides an equation, developed using a power regression model, for estimating the length between perpendiculars of a container ship from its deadweight. In the studies by Piko and Papanikolaou, the exponent values of the nonlinear regression functions were similar, at almost 0.4. The Kristensen approximations were based only on TEU capacity and were developed using linear, 2nd-degree polynomial, and power regression models. In Kristensen's study, the exponents of the power regression models were 0.38, 0.55, and 0.34 for Small, Panamax, and Post-Panamax container ships, respectively. There have been no publications in the scientific literature on the use of artificial neural networks (ANN) to determine the length of a container vessel. Only Gurgen et al. [10] applied an ANN to predict the main dimensions of chemical tankers using deadweight capacity and speed. An ANN based on a multi-layer perceptron structure with 13 neurons in a hidden layer was used in that research. Gurgen et al. [10] argued that neural networks may create more accurate models for complex systems than standard statistical methods.
Therefore, the aim of this study was to develop an empirical equation to estimate the length between perpendiculars for container ships built since 2014, considering the container number and velocity.

MATERIALS AND METHODS
The container ship characteristics used in this study are defined as follows:
Length between perpendiculars (LBP) - the horizontal distance measured parallel to the baseline from the aft to the forward perpendicular.
Ship velocity (V) - service speed in knots, which is less than the maximum ahead service speed.
TEU - the maximum number of 20-foot standard containers below and above the deck.
Deadweight (DWT) - the maximum deadweight of the ship immersed at the summer load line in water with a density of 1.025 t/m³.
Froude number (Fn) - calculated as follows:
$$Fn = \frac{V}{\sqrt{g \cdot L}} \qquad (1)$$
where: V - ship velocity in m/s, g - standard gravity, L - length of the ship (it was here assumed that L = LBP).
The data of the 120 most recent new-build container ships, delivered from 2014 to 2020, were used in this study. The source of the data was Sea-web Ships [9]. Sister ships with identical or similar characteristics were removed from the data set. The range and mean values of the ship samples that remained after verification are shown in Table 1. In this study, ANNs and a random search method based on nonlinear regression and heuristic techniques were used to estimate the container ship's length. The second aim of the research was to compare the accuracy of these methods for estimating the container ship's length.
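The Froude number definition above takes the velocity in m/s, while the service speed in the data set is given in knots. The following minimal Python sketch shows the unit conversion and the calculation with L = LBP. The function name and the example ship (22 knots, 300 m) are illustrative assumptions, not values from the data set.

```python
import math

KNOT_TO_MS = 1852.0 / 3600.0  # one knot is one nautical mile (1852 m) per hour
G = 9.80665                   # standard gravity in m/s^2


def froude_number(speed_knots: float, lbp_m: float) -> float:
    """Froude number Fn = V / sqrt(g * L), with L assumed equal to LBP."""
    v_ms = speed_knots * KNOT_TO_MS  # convert service speed to m/s
    return v_ms / math.sqrt(G * lbp_m)


# Hypothetical example ship: 22 knots, LBP = 300 m
fn = froude_number(22.0, 300.0)
print(round(fn, 3))
```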

ARTIFICIAL NEURAL NETWORKS (ANN)
In recent years, ANNs have been used in several scientific publications on ship design theory. For example, Alkan et al. [11] calculated the initial stability parameters of a fishing vessel using neural networks, which were developed from sample ship data to estimate the vertical centre of gravity, the transverse metacentre height above the keel, and the vertical centre of buoyancy of the ship. Gurgen et al. [10] created an ANN to estimate chemical tanker dimensions. In that paper, the main ship parameters, such as overall length, length between perpendiculars, breadth, draught, and freeboard, were estimated based on deadweight and vessel speed. Gurgen et al. [10] argued that the initial main particulars of chemical tankers could be determined using ANNs, offering results which were much more accurate than those obtained with sample ship data. Ekinci et al. [12] used 18 computational intelligence methods (including neural network methods) to estimate the main design parameters of oil/chemical tankers. Abramowski [13] developed a neural network model for determining the effective power of a ship. Cepowski [14] applied ANNs to estimate added resistance in regular head waves using ship design parameters such as length, breadth, draught, and Froude number. To create a reliable model, only experimental data from model test measurements were used to train the neural network. Song et al. [15] used a radial basis function ANN to predict a ship's rolling motion; the disturbing moment and roll time series were estimated with this method. Sahin et al. [16] used an ANN model linked to the main ship parameters to estimate the dilution factors in the preliminary design. Gross and deadweight tonnage, passenger number, freeboard, engine power, propeller number, and block coefficient values were used to estimate the likely dilution factors. Luan et al. [17] used ANNs to estimate the fuel consumption of container vessels. Cheng et al. [18] presented a comparative study of the sensitivity analysis and simplification of the ANN for a ship's motion prediction. Indeed, the use of ANNs has provided excellent results in several research experiments.
An ANN is modelled on the functioning of the biological nervous system, a structure consisting of neurons and the connections linking them. A numerical model of the neural network is developed based on this structure and its signal transmission method. The network is built from an input layer, an output layer, and one or more hidden layers that consist of neurons [19]. Values from previous layers are passed through neurons which are connected with weights. These weights determine the relationships between input and output data [20]. The main problem in developing an artificial neural network is selecting an optimal network structure and calculating the neuron weight values. Therefore, different types of neural networks and methods of learning can be used. Multilayer nonlinear neural networks and the backpropagation learning algorithm are often applied to predict technical parameters.
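The layered structure described above can be illustrated with a forward pass through a small multilayer perceptron. This is a minimal sketch only: the weights are random placeholders rather than trained values, the inputs are assumed to be already scaled to the range 0-1, and the use of a logistic output neuron is an assumption. The size of 11 hidden neurons matches the network reported later in the paper.

```python
import math
import random


def logistic(x: float) -> float:
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer: each neuron sums its weighted inputs and applies the activation."""
    hidden = [
        logistic(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return logistic(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Placeholder (untrained) weights for a 2-11-1 network
rng = random.Random(0)
n_inputs, n_hidden = 2, 11
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = rng.uniform(-1, 1)

# Hypothetical scaled inputs (e.g. scaled TEU and scaled velocity)
y = mlp_forward([0.4, 0.7], w_hidden, b_hidden, w_out, b_out)
print(y)
```

Training would adjust `w_hidden`, `b_hidden`, `w_out`, and `b_out` so that the scaled output matches the scaled target value; that step is not shown here.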
Overfitting phenomena are an additional problem in the development of neural networks. This phenomenon occurs when a statistical model has too many parameters in relation to the data sample size used in the creation of the model. A test set method is usually used to detect this phenomenon. Unfortunately, about 25-30% of randomly selected data is wasted to test the model in this method. Additionally, about 25-30% of data are used to validate the neural network during the learning process. This means that about half of the dataset is wasted throughout the neural network development process.
In this study, the search process for the best neural network included the following steps:
• creating a neural network topology,
• training a network,
• testing a network,
• making an accuracy assessment of a network's model based on the test results.
The mean absolute error (MAE) was used for the accuracy assessment, Eq. (2):
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|LBP_i - LBP_{e,i}\right| \qquad (2)$$
where: LBP - length between perpendiculars from the data set, LBP_e - length between perpendiculars estimated using a neural network, n - number of ships in the data set.
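As an illustration of the MAE criterion, the sketch below computes the mean of the absolute differences between data set lengths and estimated lengths. The three ship lengths are made-up numbers used only to show the arithmetic.

```python
def mean_absolute_error(lbp, lbp_est):
    """MAE: average absolute difference between actual and estimated LBP."""
    if len(lbp) != len(lbp_est) or not lbp:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - e) for a, e in zip(lbp, lbp_est)) / len(lbp)


# Hypothetical values in metres, not taken from the data set
lbp_data = [150.0, 250.0, 350.0]
lbp_pred = [155.0, 246.0, 353.0]
mae = mean_absolute_error(lbp_data, lbp_pred)
print(mae)  # (5 + 4 + 3) / 3 = 4.0
```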
To develop these neural networks the following assumptions were made:
• sum of squares as the error function,
• backpropagation as demonstrated by [19,21,22], conjugate gradient descent [19], and Levenberg-Marquardt [23,24] as training algorithms,
• logistic sigmoid function as the activation function,
• validation and test sets of 30 cases each (60 cases in total).
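The split implied by the last assumption (120 ships, with 30 cases each for validation and testing) can be sketched as a random partition of case indices. The function name and the fixed seed are illustrative assumptions; the paper does not state how the random selection was implemented.

```python
import random


def split_cases(n_cases, n_validation, n_test, seed=0):
    """Randomly partition case indices into training, validation, and test sets."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    validation = indices[:n_validation]
    test = indices[n_validation:n_validation + n_test]
    train = indices[n_validation + n_test:]
    return train, validation, test


train, validation, test = split_cases(120, 30, 30)
print(len(train), len(validation), len(test))  # 60 30 30
```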

RANDOM SEARCH METHOD
A random search method based on Multiple Nonlinear Regression (MNLR) and a heuristic algorithm was applied to develop an alternative estimate of the length between perpendiculars. A general MNLR model is given by the relations in Eq. (3) [25,26].
The authors defined the following general model Eq. (4) to develop the relationship between the length between perpendiculars (LBP), velocity (V), and the number of containers (TEU) based on the model Eq. (3): where: α -intercept, β -coefficient, f -base function, such as power, logarithmic, or exponential function, n, m, k, z -the number of functions or β coefficient.
A set of 400 power, logarithmic, or exponential functions was used in this study. Finding the best TEU and V combinations in this model and selecting the best-fitting functions from the function set led to a large number of possible variants. For example, if we assume the simplest model Eq. (5): which consists of four function combinations (f1, …, f4) selected from a collection of 400 base functions, we get the total combination number (n):
Searching through all these possible variants using an exact algorithm is computationally expensive and time-consuming, and thus a heuristic approach was applied to solve this problem. The disadvantage of this approach is that, unlike an exact search, it is not guaranteed to find the optimal solution. However, multiple searches allow the user to find a solution which is almost optimal.
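To give a sense of the size of this search space, the short sketch below counts the variants under one explicit assumption: each of the four slots f1, …, f4 may independently take any of the 400 base functions, so repetition is allowed and order matters. The expression for n is not preserved in this text, so this count is an illustration and not necessarily the authors' figure.

```python
N_BASE_FUNCTIONS = 400  # size of the base function set stated in the text
N_SLOTS = 4             # f1, ..., f4 in the simplest model

# Assumption: every slot independently takes any base function
n_variants = N_BASE_FUNCTIONS ** N_SLOTS
print(f"{n_variants:,}")  # 25,600,000,000
```

Each variant also requires a regression fit, which is why an exhaustive search becomes impractical as the model grows.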
An algorithm can be developed in which variables and base function combinations are randomised during the first step. Then, the model's fit to data is checked, and statistical errors are calculated. Finally, the best functions and variable combinations are selected through looped searching.
The authors implemented selected parts of this algorithm in the ndCurveMaster computer program [27] which was used to support equation searching.
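The idea of randomising base functions and keeping the best-fitting combination can be sketched in a heavily simplified form. The example below is not the ndCurveMaster algorithm and does not reproduce the authors' model. It assumes a single-term model LBP = α + β · TEU^p · V^q, draws the exponents p and q at random from a small hypothetical candidate list, fits α and β by ordinary least squares, and keeps the combination with the highest absolute correlation coefficient R. The data are synthetic.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least squares for y = alpha + beta * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - beta * mx, beta


def random_search(teu, v, lbp, exponents, n_trials, rng):
    """Randomise the exponents, score each candidate by |R|, and keep the best one."""
    best = None
    for _ in range(n_trials):
        p, q = rng.choice(exponents), rng.choice(exponents)
        term = [t ** p * s ** q for t, s in zip(teu, v)]
        r = abs(pearson_r(term, lbp))
        if best is None or r > best[0]:
            alpha, beta = fit_line(term, lbp)
            best = (r, p, q, alpha, beta)
    return best


# Synthetic data generated from a known relationship (illustration only)
teu = [1000, 2500, 4000, 6000, 9000, 12000, 15000, 20000]
v = [17, 19, 18, 21, 22, 20, 23, 22]
lbp = [60 + 0.12 * t ** 0.5 * s for t, s in zip(teu, v)]

exponents = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]  # hypothetical candidates, zero excluded
best = random_search(teu, v, lbp, exponents, n_trials=300, rng=random.Random(1))
print(best)
```

A fuller version would randomise several terms at once and fit all coefficients by multiple regression, which is the part that makes the search space so large.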
Increasing the number of model elements improves accuracy but may lead to overfitting with this method. Therefore, the next problem was to detect and prevent overfitting.
A test set was randomly selected from the data set to detect overfitting. The following two data sets were selected:
• data set A contained 75% of all data (90 cases) used for model development,
• test data set B contained 25% of all data (30 cases) used for overfitting detection.
For data sets A and B, the root mean squared errors RMSE(A) and RMSE(B) were calculated using the following formula:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(LBP_i - LBP_{e,i}\right)^2} \qquad (7)$$
where: LBP - length between perpendiculars from the data set, LBP_e - estimated length between perpendiculars, n - number of ships in set A or B.
In this study it was assumed that overfitting occurs when the root mean squared error related to test set B is more than 20% higher than the root mean squared error related to set A. To detect overfitting, the ratio of the error RMSE(B) to RMSE(A) was calculated.
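The overfitting criterion can be expressed in a few lines. The sketch below assumes the standard root mean squared error definition and flags overfitting when RMSE(B)/RMSE(A) exceeds 1.2. The sample values are invented for the illustration.

```python
import math


def rmse(actual, estimated):
    """Root mean squared error between actual and estimated values."""
    n = len(actual)
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimated)) / n)


def is_overfitted(rmse_a, rmse_b, limit=1.2):
    """True when the test-set error exceeds the development-set error by more than 20%."""
    return rmse_b / rmse_a > limit


# Hypothetical lengths in metres for development set A and test set B
rmse_a = rmse([100.0, 200.0, 300.0], [103.0, 196.0, 300.0])
rmse_b = rmse([150.0, 250.0], [154.0, 247.0])
ratio = rmse_b / rmse_a
print(round(ratio, 3), is_overfitted(rmse_a, rmse_b))
```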
The algorithm schemes are shown in Figures 2 and 3. As shown in Figure 2, during the first step, data for sets A and B were randomly selected, and the simplest model (5) was initially defined. Next, the best functions f1-f4 and regression coefficient values were discovered through random searching, based on the algorithm shown in Figure 3. Better functions were selected based on a higher correlation coefficient R value.
As shown in Figure 2, after the initial development of the model Eq. (5), the standard error (SE) value was checked. The SE limit was assumed to be 7.7 m (ship length).
If the SE was greater than this limit value, the model was randomly expanded in the next step. After this expansion, the occurrence of overfitting phenomena was checked by computing the ratio RMSE(B)/RMSE(A). If this ratio value was greater than 1.2, the least statistically significant component was removed from the model. This procedure was looped until these two conditions were met.
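The two stopping conditions can be summarised as a small decision function. This is a sketch of the control logic only: the order in which the two checks are applied and the action names are assumptions, and the actual expansion and pruning of the model are not implemented here.

```python
SE_LIMIT_M = 7.7        # standard error limit stated in the text, in metres
RMSE_RATIO_LIMIT = 1.2  # overfitting threshold for RMSE(B)/RMSE(A)


def next_action(se_a: float, rmse_ratio: float) -> str:
    """Decide the next step of the search loop from the two criteria."""
    if rmse_ratio > RMSE_RATIO_LIMIT:
        return "prune"   # remove the least statistically significant component
    if se_a > SE_LIMIT_M:
        return "expand"  # randomly add a component to the model
    return "stop"        # both conditions are met


print(next_action(9.0, 1.05))  # expand
print(next_action(7.0, 1.35))  # prune
print(next_action(7.5, 1.10))  # stop
```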

Fig. 2. The general algorithm scheme, where: A - data set, B - test data set, SE(A) - standard error related to set A, RMSE(A) - root mean squared error related to set A, RMSE(B) - root mean squared error related to test set B

ARTIFICIAL NEURAL NETWORKS
Among all the neural network types, the multilayer perceptron (MLP) with two neurons in the input layer, 11 neurons in the hidden layer, and one neuron in the output layer was the most accurate. Table 2 shows the statistical data of this neural network broken down by training, validation, and testing sets. The mathematical form of this network is given by the formulas in Eqs. (8)-(13). LBP= c 0.00318 +0.22 (2) where: c - the variable, calculated as follows:
Figures 4 and 5 show the process of discovering an equation for estimating the length between perpendiculars using the random search method. The SE value related to set A, the ratio RMSE(B)/RMSE(A), and the number of model elements are shown through the model evolution. In the first phase, the model was inaccurate and the SE value was high. Therefore, the model was expanded to seven elements in the next phase. This extension reduced the SE but increased the RMSE(B)/RMSE(A) ratio, which indicated that the model was overfitted. In the next phase, the model was reduced to four elements to avoid overfitting. However, the model accuracy was reduced after this procedure. Finally, increasing the number of model elements to five and finding the most accurate functions allowed the successful completion of this search. Eq. (14) was discovered:
$$LBP = 54.296 + 2.656 \cdot TEU^{1/2} + 1.4 \cdot 10^{-6} \cdot V^{5.6} - 2.821 \cdot 10^{-21} \cdot TEU^{5.6} \cdot V^{-1.8} - 1.116 \cdot 10^{8} \cdot TEU^{-1.3} \cdot V^{-4} - 1.007 \cdot 10^{-4} \cdot TEU^{0.4} \cdot V^{3.1} \qquad (14)$$
[...] (8) and (14). The surfaces presented in these figures look similar, though for the extreme values of velocity and TEU capacity, the length calculated using an ANN was slightly larger than the one obtained using the random search method. Figure 8 compares the length calculations using both methods for ship sample data for selected speeds. As shown in Figure 8, both methods gave similar results; Eq. (14) provided slightly more accurate results in relation to sample ship data at a speed of 11 kts.
Figure 9 illustrates the estimates obtained using both methods compared to test sample ship data. The test data covered the full range of ship lengths. This figure shows that the length values calculated using both methods were close to the perfect fit line. Table 3 shows the root mean squared error (RMSE) and Pearson R-squared coefficient values relating to regression Eq. (14) and the developed neural network for the entire data set (including the training, validation, and test sets). This table shows that Eq. (14) is characterised by an RMSE estimation error 2 metres smaller than Eq. (8).

Tab. 2. The values of the root mean squared error (RMSE) and Pearson R-squared coefficients relating to regression Eq. (14)

AN EVALUATION OF METHODS IN TERMS OF EASE OF COMPUTATION AND MODEL SIMPLICITY
The equations presented in this article were developed for practical use by a ship designer. In this respect, one important factor was the possibility of using these formulas for manual calculations by a person with a basic knowledge of computer techniques. For this reason, model simplicity was an important criterion.
Research showed that Eq. (14) developed using the random search method had a simpler form than Eq. (8) developed using neural networks. Eq. (14) is based on only seven basic functions and six coefficients. The analytical relationships of independent variables are also clearly shown in this equation. The practical use of this equation requires only a basic knowledge of computer techniques. A scientific calculator or a simple spreadsheet can be used for calculating the LBP length based on Eq. (14).
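As an illustration of how little is needed to apply Eq. (14), the sketch below evaluates it as a single expression. The signs and exponents follow a reading of the equation as printed in this text, and V is assumed to be the service speed in knots, as in the variable definitions. The example ship (10,000 TEU at 22 knots) is hypothetical, and the result is only meaningful within the design characteristic ranges of Table 1.

```python
def lbp_eq14(teu: float, v_knots: float) -> float:
    """Length between perpendiculars in metres from Eq. (14).

    Assumes V is the service speed in knots.
    """
    return (
        54.296
        + 2.656 * teu ** 0.5
        + 1.4e-06 * v_knots ** 5.6
        - 2.821e-21 * teu ** 5.6 * v_knots ** -1.8
        - 1.116e08 * teu ** -1.3 * v_knots ** -4
        - 1.007e-04 * teu ** 0.4 * v_knots ** 3.1
    )


# Hypothetical example: 10,000 TEU at 22 knots, approximately 307.6 m
lbp = lbp_eq14(10000, 22)
print(round(lbp, 1))
```

The same expression can be typed into a spreadsheet cell with TEU and V as cell references.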
In contrast, Eq. (8), which was developed using an ANN, is much more complex. Several calculations must be performed using the formulas in Eqs. (8)-(13) to estimate the length between perpendiculars in this case. The relationships between TEU capacity, ship velocity, and ship length presented in Eqs. (9)-(12) are unclear. Eq. (8) is more difficult for a typical user to implement. Indeed, the user must have an advanced knowledge of computer techniques to implement the ANN model, or alternatively, specialised computer software may be used. Using a scientific calculator to estimate LBP with Eqs. (8)-(13) is more complicated and time-consuming than simply using Eq. (14).

THE EVALUATION OF METHODS FOR THE USE OF DATA
Both methods presented here use heuristic techniques, so neither guarantees an optimal solution. The complex equations developed using an ANN and a random search method may lead to overfitting. Therefore, in both methods overfitting was detected using a test set which included 25% of the data. The ANN and the random search method are equivalent in this respect.
However, an additional validation data set was used to develop the ANN. This validation set also included 25% of the data. In this research, the ANN therefore set aside 50% of the data in total for validation and testing.
In contrast, the random search method did not require a validation set and lost only 25% of the data during the overfitting detection. In this regard, the random search method provides a more effective use of the data set than the ANN. Table 4 shows a summarised method comparison in terms of different properties.

CONCLUSIONS
In ship design, only DWT or TEU capacity is usually used to estimate a container ship's length. Over the last few years, economic and environmental factors have affected ship owner requirements. At present, ship velocity may be a second key design parameter in addition to TEU or DWT capacity. Therefore, alternative design equations for estimating a container ship's length based on TEU capacity and ship velocity have been proposed in this research. This provides a new approach in naval engineering. The equations presented in this work were developed based on the data of the most recent standard container ships built since 2014.
In this article, ANNs and a random search method based on MNLR were applied to estimate a container ship's length. The conclusions drawn from the comparison of these methods may be summarised as follows:
• Both methods are characterised by a high estimation accuracy. The random search method is slightly more accurate and offers an RMSE error value less than 2 metres in length.
• Eq. (14) developed using the random search method is simpler and easier to compute than Eq. (8) developed using an ANN.
• The random search method used the data set more effectively than the ANN.
• The random search method also only used 25% of the data for testing while the ANN needed 50% of the data for validation and testing.
Estimates acquired with an equation developed using Multiple Nonlinear Regression (MNLR) may be as accurate as ones obtained using ANNs. The application of heuristic techniques for the development of MNLR by variable and function randomisation automatically enables the discovery of a set of equations.
The methods presented in this article may be used to estimate the parameters of a container ship, such as breadth, side depth, or draught. In general, in the case of volume carriers, these dimensions are primarily determined by the ratio of length to breadth and side depth, and the displacement of the vessel. However, it seems that the use of these estimates could be helpful to assess the accuracy of the design calculations. The results presented in this paper and in [4] indicate the possibility of developing a neural network to predict all dimensions of a container ship and, potentially, other types of ships.
The algorithms described here may have practical applications for the commercial design of container vessels. However, these formulas can be inaccurate for the design of an innovative container ship and should only be used to estimate the length of container ships whose design characteristics fall within the ranges listed in Table 1. Using them to determine a ship's length for characteristics outside these ranges may yield less reliable results.