DATA DRIVEN PERFORMANCE EVALUATION IN SHIPBUILDING

Rapid developments in data science continue to open up uses for data in shipbuilding, in both product development and production, much as Industry 4.0 has transformed many other industries. As in other industries, performance evaluation is a key to success in shipbuilding and is closely connected to productivity and lower costs. Data mining and analysis techniques are used to create algorithms that evaluate performance, including through parametric cost estimation. However, it is usually not clear how data are collected, organised and prepared for analysis and for deriving useful knowledge and algorithms. In most cases, obtaining such data requires continuous investment in expensive software or in external expertise, neither of which is generally available to small and medium sized shipyards. In this study, considering the needs of small and medium sized shipyards, a step-by-step methodology is proposed that can be applied with widely available, low budget software. The application is demonstrated with a case that evaluates the performance of early phase structural design using a data driven cost estimation algorithm.


Introduction
Shipbuilding is a very complicated process because the activities take a long time, the human factor and safety issues play an important role, and the ship itself is a complex product. Estimating a correct budget in the bidding stages is essential in order to compete under the best circumstances. If the company presents too high a budget, this could result in a loss of competitiveness for the tender [1]. Calculating the cost of the required labour, which is one of the hardest items to estimate, necessitates not only historical data but also a good organization of past data. Although data on material and labour cost exist in shipyards, it is difficult to find the time and suitable tools to handle the data and make it useful for future projects, even though the amount of available data is huge. This study is about the easy application of data driven performance evaluation in shipbuilding and its connection to cost and data analysis. The intention of the current work is to demonstrate how small and medium shipyards could utilise data to assess and increase their performance. For this purpose, a methodology is proposed to implement a data driven performance evaluation system with a considerably low budget solution.
The shipbuilding industry has been continuously improved with machines, software and newly implemented organizational restructuring, but it still faces difficulties: a large number of changes during construction and a large number of ship series have led to a loss of control over costs [2]. Cost estimation relies on data. Recent developments in data analysis methods and tools offer many possibilities to make use of historical data. Different data sets, which were collected for different purposes such as budgeting, warehouse control and design work, could be easily connected and analysed to derive relations. Some of these relations might give previously unknown insights into the productivity of a company. In addition, the harvested data could be numerically formulated for predictive purposes, e.g. for cost estimation or scheduling. Several researchers have published data analysis techniques as well as results of their application. In these publications, the data extraction/collection processes are either done through costly software or are too complex to be performed without expert knowledge.
It remains unclear how an organisation could start exploring its own data without making costly investments in the latest software and/or in the expert knowledge needed to perform this task for the shipyard. Small and Medium Enterprise (SME) size shipyards, which are already struggling to stay in business, require low budget solutions combined with an easy to apply methodology [3].
This study proposes a low budget and easy to apply methodology, which is specifically developed for SME size shipyards to structure and analyse their own historical data for the evaluation of performance and other prediction related purposes. A parametric cost estimation model is created from real historical data of a shipyard by using Microsoft Excel, so that the model stays within easy reach.

Literature Review
Definition of performance evaluation is a topic where no concrete consensus exists. For this study, the performance model and definition proposed by Slack et al. [4] as shown in Fig 1 will be considered and applied. This model has five performance objectives with their internal and external effects where the internal effects of performance objectives lead to high total productivity and consequently to reduced cost. It could be said that when cost is well evaluated, then so is performance.
There are different means to estimate cost. Parametric cost estimation is suggested as a method, together with Cost Estimation Relations (CERs) derived from previous projects. As can be seen in [5], [6], [7], CERs are the most suitable tool for the shipbuilding industry, especially when only sparse data is available. Parametric cost estimation requires an understanding of the cost structure and cost drivers, and a statistical analysis of historical data to derive the CERs.
Recent developments in data science have made it possible to analyse larger amounts of data for designing better ships and for building them with higher quality. Operational data from shipping companies are collected, analysed and used to improve the product design, whereas data stored at the shipyards are analysed to help improve shipbuilding performance. Using the stored data for better estimations and assessments leads to an increase in knowledge and insight about the key relations affecting the performance. Kaluzny et al. [8] applied data mining methodologies to the data from 57 ships of 16 classes of 6 nations and developed a satisfactory algorithm to estimate the cost of naval ships. Kolich et al. [9] developed a model to predict the cost of interim block assembly by use of historical shipyard data. Moreover, Huijgens et al. [10] utilised historical project data to develop an extrapolation method for predicting the work content and stressed the importance of operational data, which could have increased the accuracy of their study. Bao et al. [11] proposed an algorithm to develop erection planning, where the algorithm was based on design data and management data. Bao et al. [11] also noted that organisations store both structured and unstructured data and that it is sometimes difficult to obtain a structured data model. Huijgens et al. [10] described gathering data, especially production man-hour data, as a challenge. Therefore, the question is not whether there is data, but how to handle it in the available form.
Major maritime companies have already started investing in research on methods for implementing the digital revolution, also referred to as Industry 4.0 or Shipbuilding 4.0 [2]. There are off-the-shelf software solutions which integrate several management functions such as enterprise resource planning, scheduling, etc. However, software technology is changing very fast and usually requires the help of external experts for implementation. A small/medium sized shipyard is unlikely to invest money in a system which will be outdated before a single shipbuilding project is completed. This calls for software with which the shipyard is already familiar and which is still capable of gathering and analysing the data. In this study, Microsoft Excel (Excel) is used for this purpose. Data analysis add-ins of Excel are offered without any additional cost when the product is purchased with a professional licence. In fact, most shipyards have already invested in this software, so it is a common, low cost and familiar tool for the engineers and experts.

Definition of Boundaries for Performance Evaluation
Shipbuilding is a heavy manufacturing industry. The initial project starts with requirements, which are related to market conditions. In every step from a ship design to construction, there are many decision-making milestones. These decisions directly affect the cost. Therefore, design and production processes should be integrated all together including data collection and organization.
The proposed methodology looks into the performance evaluation of a shipbuilding project from the perspective of a shipyard and focuses on the evaluation of the internal effects of the performance. Although it is difficult to clearly define boundaries between phases, departments and other steps in the shipbuilding process, a generic illustration of the critical decisions and the most significant performance objectives is given in Fig 2. Fig 2 is not intended to be comprehensive and its timeline is limited by the end of the Contract Design stage. When a contract is signed with a potential owner, this is usually the first milestone of a shipbuilding project on which time and budget estimations rely. Therefore, the performance of the shipyard is pre-defined at this stage, involving both technical and commercial concerns, assumptions and estimations, which will further become the basis for all performance evaluations of that project. As shown in Fig 2, every critical decision has an impact on the cost related performance. Recalling the performance model by Slack et al. [4], cost is directly related to the total productivity and internal performance. Therefore, when evaluating the shipyard performance, it is very important to assess each aspect from the cost perspective and to correctly estimate the building cost.

Data-Driven Approach for Performance Evaluation
Recent developments in data science and relevant tools make it possible to derive more insight and knowledge from existing data and point the way to further data extraction requirements. Algorithms for the performance evaluation of SME sized shipyards could be developed by following these steps:
• Step 1 Background Study: By means of unstructured interviews, accompanied with direct observation of the facilities, general information is to be collected about production processes, information flow and personal interactions, which are crucial to clearly define the problem before collecting and analyzing the data.
• Step 2 Pre-defining Cost Drivers: Obviously, a shipyard has no control over the prices of material and labour, which highly depend on the market conditions. The focus of the analysis should be on the parameters which are under the control of the shipyard and directly related to its performance. Therefore, technical parameters and organizational strategies are to be evaluated by an expert group from the shipyard for identifying the most important cost drivers and understanding how these drivers are decided and controlled.
• Step 3 Data Collection and Handling: Cost and relevant technical data are to be identified, selected and collected in a pre-defined format as far as practical limitations allow. These could include the data from previously built ships and projects, such as ship main particulars, material quantities, equipment characteristics, purchase costs, applied unit costs, man-hours, technical details like weight, volume, area, number of parts, building stages, etc.
• Step 4 Analysis, Relations and Adjustment Factors: Statistical significance is to be checked for the relations for pre-defined cost drivers as well as between technical parameters. Mainly, the linear regression method will be used, as it is a commonly accepted tool for showing relations. When the statistical significance is found satisfactory, further analysis is to be performed to create Cost Estimation Relations (CERs) and adjustment factors.
• Step 5 Setting up an Algorithm: Based on pre-defined cost drivers, analysis and the cost structure, an algorithm is developed with a combination of CERs and adjustment factors.
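As an illustration of Step 5, the sketch below combines a single linear CER with adjustment factors. It is only a minimal example under stated assumptions: the multiplicative form of the adjustment, the function names `make_linear_cer` and `apply_adjustments`, and all coefficients and factor values are hypothetical and are not taken from the shipyard data.

```python
from typing import Callable, Mapping


def make_linear_cer(intercept: float, slope: float) -> Callable[[float], float]:
    """Return a linear CER of the form: measure = intercept + slope * driver."""
    def cer(driver: float) -> float:
        return intercept + slope * driver
    return cer


def apply_adjustments(base_value: float, factors: Mapping[str, float]) -> float:
    """Scale a CER output by adjustment factors.

    Assumption: factors act multiplicatively; a yard may prefer another form.
    """
    adjusted = base_value
    for factor in factors.values():
        adjusted *= factor
    return adjusted


# Hypothetical CER: production man-hours as a function of net steel weight (t).
manhours_cer = make_linear_cer(intercept=5000.0, slope=40.0)

# Hypothetical adjustment factors, e.g. a sister-ship effect and a
# penalty for high work content.
factors = {"sister_ship": 0.9, "work_content": 1.1}

estimate = apply_adjustments(manhours_cer(2000.0), factors)
print(f"Adjusted man-hour estimate: {estimate:.0f}")
```

Keeping the CER and the adjustment factors separate means that either can be revised by the shipyard experts without touching the other.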

Considerations About Cost
Before developing a data driven algorithm for the estimation and/or assessment of cost, the following issues need to be addressed:

Cost structure
It is important to understand how the cost is structured in a shipyard in relation to the source of the data to be analyzed. Although larger shipyards follow a more structured breakdown of the work and cost, this may not be the case for small/medium size shipyards. Different departments (design, production, procurement) and different purposes (cost estimation, cost control, etc.) may each use a different cost structure. When used for performance evaluation, the breakdown of the cost should focus only on the relevant measures. Multiplying these measures by the actual unit prices of material, labour or other relevant variables should provide an acceptable level of accuracy for the cost and consequently for the performance of the work in question. A high level generic building cost structure could be divided into major cost items such as material and equipment, production labour, design and engineering cost, overhead cost and energy consumption cost.

Cost Adjustment Factors
Adjustment factors should be defined and selected based on the cost drivers defined for the specific shipyard in question. Some of the adjustment factors are listed below based on some major cost groups [6]:
• Material related: Type, distribution, waste, sister ships, actual unit cost
• Equipment related: Maker, type, sister ships, characteristic, actual unit cost
• Labour related: Assembly stage, lead time, work content/density, productivity, sister ships, producibility, actual unit cost
Due to the long project periods of the shipbuilding process and economic fluctuations, it is rather hard to rely on cost figures. Uncertainties and bias could be reduced by use of technical parameters instead of monetary ones. Cost normalization is a key issue when adapting CERs into a cost assessment algorithm. In this way the monetary figures are only used as a multiplier, in the form of a unit price, to reflect the effective cost. This requires gathering different data and knowledge together in order to properly define the cost drivers. For instance, instead of analyzing the lump-sum cost of the assembly of a double bottom block, it makes more sense to go deeper and seek possibilities to extract data on spent man-hours, total weight of the steel, welding length, etc., which would give insight into the work content and the major cost drivers.
Another aspect of cost normalization is to remove the effect of price changes. Instead of using the direct cost of material like welding consumables, primer, etc., these costs could be normalized, e.g. with the effective steel plate unit price at the time of the purchase of that material. In this way, instead of a monetary figure, there will be an equivalent amount of steel as a cost indicator, which could later be multiplied by the actual steel price at the time of the analysis to get an understanding of the total cost. Cost normalization could be made with unit prices or with some dimensionless index which reflects the market conditions. Some examples are oil/steel prices, stock market indices, Clarkson's shipbuilding index, etc.
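The steel-equivalent normalization described above can be sketched as follows. This is a minimal illustration, not the shipyard's procedure: the function names and all prices are hypothetical.

```python
def to_steel_equivalent(cost_eur: float, steel_price_at_purchase: float) -> float:
    """Express a monetary cost as equivalent tonnes of steel plate,
    using the plate unit price (EUR/t) valid at the time of purchase."""
    if steel_price_at_purchase <= 0:
        raise ValueError("steel price must be positive")
    return cost_eur / steel_price_at_purchase


def to_current_cost(steel_equivalent_t: float, steel_price_now: float) -> float:
    """Convert a steel-equivalent quantity back to money at today's plate price."""
    return steel_equivalent_t * steel_price_now


# Hypothetical figures: 12,000 EUR of welding consumables bought when
# plate cost 600 EUR/t, re-priced at an analysis-time price of 750 EUR/t.
equivalent_t = to_steel_equivalent(12_000.0, 600.0)
cost_today = to_current_cost(equivalent_t, 750.0)
print(equivalent_t, cost_today)
```

Storing the intermediate steel-equivalent quantity, rather than the original monetary figure, is what keeps records from different years comparable.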

Cost related parameters and measures
Although statistical significance is important for defining the CERs, it is not necessarily a proof of causation. Expert opinion should always be sought in order to avoid unnecessary or misleading analyses and relations. Conversely, some relations might be expected to be significant and yet not appear within the available data sets. In such cases the data and the analysis need to be checked for mistakes, or other parameters and measures need to be considered.

Case Study
This case study was carried out within the scope of the EU funded Holistic Optimisation of ship design and for life cycle (HOLISHIP) Project (2016-2020) [12]. Optimization of structural design is also a part of the project where building cost is identified as the major performance indicator for design. Within the project hundreds of structural design alternatives needed to be evaluated from the cost perspective. Therefore, an algorithm and a tool were required in order to estimate the cost and rank the design alternatives based on performance.
Uljanik Shipyard/Croatia supported the project, helped with the development of the methodology in this study and provided data. Unstructured interviews were held with shipyard experts to translate the above problem into the practical shipyard environment. The following steps were carried out at Uljanik Shipyard.

Step 1 Background Study
During the early design phases, potential owners would like to evaluate several design alternatives before undertaking the responsibility of a large investment. Shipyards are requested to provide alternative designs, where for merchant ships the most important performance evaluation criterion is the cost. Structural design at the contract phase defines the largest portion of the weight, which is a key measure. This calls for an automated process for the performance evaluation, which could be simplified as the cost assessment of the structural design alternative.

Cost Structure
It is important to define a cost structure with the measures that have the highest impact [13]. Therefore, the aim was not to calculate or estimate the final cost, but rather to rank the design alternatives from cost point of view. For the structural design case, the most important parameters were identified as the material quantity and the production effort. Based on this assumption, evaluation of the performance for structural design could be reduced to the cost of material and cost of production which could be formulated as follows:
$$BC_{steel} = MPC_{steel} + PLC_{steel} \qquad (1)$$
$BC_{steel}$ : Building Cost for steel structure
$MPC_{steel}$ : Material Purchase Cost for steel structure
$PLC_{steel}$ : Production Labour Cost for steel structure

Parameters and Adjustment Factors
A huge amount of data is stored in different databases of the shipyard. Before collecting the data, the requirements were identified. Shipyard experts were invited to define and select the critical parameters, measures and adjustment factors which are necessary for creating the cost evaluation algorithm. The focus was placed on the case study problem. Major parameters and adjustment factors for material and effort related costs for steel structure are listed below in Table 1 [14].

Data Scope and Source
The scope of the collected data was limited in order to ease the analysis processes and to increase the confidence in the derived relations. For this case study, data was collected for 12 Ro-Ro and similar types of ships with a Length (overall) between 99.8 m and 210 m. The collected data-sets were organized in 6 Excel tables and are briefly explained below based on the source of information/data:
• ERP System: Data-set named "Material and Effort" in

Pre-Processing and Data Cleaning
The collected data were mostly in spreadsheet format so they were mostly structured. Before making any analysis, so called 'data cleaning' should be performed to explore the data, removing mistakes to avoid errors and reducing the data for the required computing power. Data cleaning is defined as the process of identifying and removing errors in a data set [15]. Many researchers define this process as the most time consuming and challenging part of data analysis since most of the errors are discovered upon completing the analysis which requires repeating the previous tasks.
For this case study, all data was imported into Excel and some cleaning steps were applied in Excel Power Query, where the user can shape and clean the data without making any changes to the original data-set. In addition, the applied steps can be viewed and changed at any time. Some of these steps include arranging the column headers, assigning the data types correctly (integer, string, etc.), cleaning the "null" values, creating new columns by altering the existing ones, translating terms to English, assigning new groupings, filtering irrelevant data and removing unnecessary data. The advantage of using Excel for this data cleaning is that the user is not required to have any previous coding skills.
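For readers who prefer to see the cleaning steps written out, the sketch below mirrors them in plain Python rather than Power Query. It is an illustration only: the column names, the sample rows and the translated terms are invented, and the case study itself used Excel.

```python
from typing import Optional

# Invented raw rows standing in for an exported spreadsheet.
RAW_ROWS = [
    {"Yard No": " 501 ", "Grupa": "Trup", "Weight (kg)": "1200.5"},
    {"Yard No": "502", "Grupa": "Trup", "Weight (kg)": "null"},
    {"Yard No": "503", "Grupa": "Oprema", "Weight (kg)": "300"},
]

# Hypothetical header arrangement and term translation tables.
HEADER_MAP = {"Yard No": "yard_number", "Grupa": "group", "Weight (kg)": "weight_kg"}
TRANSLATIONS = {"Trup": "Hull", "Oprema": "Equipment"}


def clean_row(row: dict) -> Optional[dict]:
    """Rename headers, trim text, drop 'null' weights, assign data types."""
    renamed = {HEADER_MAP[key]: value.strip() for key, value in row.items()}
    if renamed["weight_kg"].lower() in ("", "null"):
        return None  # missing value: drop the row
    return {
        "yard_number": int(renamed["yard_number"]),
        "group": TRANSLATIONS.get(renamed["group"], renamed["group"]),
        "weight_kg": float(renamed["weight_kg"]),
    }


def clean(rows: list, keep_group: str = "Hull") -> list:
    """Return cleaned copies of the rows; the input rows are left untouched."""
    cleaned = (clean_row(row) for row in rows)
    return [row for row in cleaned if row is not None and row["group"] == keep_group]


CLEANED = clean(RAW_ROWS)
print(CLEANED)
```

As in Power Query, the cleaning is a repeatable sequence of steps applied to a copy, so the original data-set stays available if a step has to be revised.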

Data Modelling
Since the data was collected from different sources, it was necessary to create a data model to define the connections and relations between these data-sets. The required data model was created with Excel Power Pivot (see Fig 3) based on the six data-sets explained in the previous section. This model uses three primary keys to connect the data-sets. The main primary key is the Yard Number, which is the building number given by the shipyard and is applicable to all six data-sets. The Yard Macro Space (used for defining the major divisions of the ship) and Yard Group (used for production groups and workshops) data-sets provide a different level of connection. This connection is applicable only to the Materials, Effort and Detailed Design data-sets, which include further bottom up details. The data model reflects both bottom up and top down approaches in order to find the proper cost estimation relations based on the given set of data from twelve ships.
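The role of the Yard Number as the main key can be illustrated with a small join in plain Python. This is a sketch under assumptions: the table contents and field names are invented, and the actual model was built in Excel Power Pivot, not in code.

```python
# Invented ship-level table: one row per Yard Number.
SHIPS = {
    501: {"length_overall_m": 120.0},
    502: {"length_overall_m": 180.0},
}

# Invented detail table: many rows per Yard Number.
MATERIALS = [
    {"yard_number": 501, "macro_space": "Cargo", "weight_kg": 1000.0},
    {"yard_number": 501, "macro_space": "Engine room", "weight_kg": 500.0},
    {"yard_number": 502, "macro_space": "Cargo", "weight_kg": 2500.0},
    {"yard_number": 999, "macro_space": "Cargo", "weight_kg": 1.0},  # no matching ship
]


def join_on_yard_number(details: list, ships: dict) -> tuple:
    """Attach ship-level attributes to each detail row via the Yard Number.

    Rows whose Yard Number is unknown are returned separately so that
    they can be inspected instead of being silently dropped.
    """
    joined, orphans = [], []
    for row in details:
        ship = ships.get(row["yard_number"])
        if ship is None:
            orphans.append(row)
        else:
            joined.append({**row, **ship})
    return joined, orphans


JOINED, ORPHANS = join_on_yard_number(MATERIALS, SHIPS)
print(len(JOINED), "rows joined;", len(ORPHANS), "orphan row(s)")
```

Reporting the orphan rows is a cheap consistency check on the keys before any relation is derived from the joined data.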

Creating Measures
Within Excel Power Pivot it is also possible to create measures, which are defined in the DAX language (Data Analysis Expressions). The functions and operators of this language, as well as the formula-writing principles, are very similar to those of standard Excel formulas. Therefore, an average Excel user would require only some basic training to build complex formulas and operations, such as column based selective calculations over a larger set of data. For this case study, the measures were created and then combined with cube functions in Excel to prepare summary tables. These summary tables brought together all the data collected from different sources, departments and experts, making it possible to perform the statistical analysis described in the next step.
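Conceptually, a measure is an aggregation that is re-evaluated for whatever filter is currently applied. The sketch below reproduces that idea in plain Python instead of DAX. The rows, the field names and the helper functions `total` and `manhours_per_kg` are hypothetical and only illustrate the principle.

```python
# Invented effort rows; each row records man-hours spent and weight processed.
EFFORT = [
    {"yard_number": 501, "yard_group": "Panel line", "manhours": 100.0, "weight_kg": 5000.0},
    {"yard_number": 501, "yard_group": "Block assembly", "manhours": 300.0, "weight_kg": 5000.0},
    {"yard_number": 502, "yard_group": "Panel line", "manhours": 150.0, "weight_kg": 10000.0},
]


def total(rows, column, **filters):
    """Sum one column over the rows that match every equality filter."""
    return sum(
        row[column]
        for row in rows
        if all(row[key] == value for key, value in filters.items())
    )


def manhours_per_kg(rows, **filters):
    """Measure: man-hours per kg under the given filter; None if no weight matches."""
    weight = total(rows, "weight_kg", **filters)
    if weight == 0:
        return None
    return total(rows, "manhours", **filters) / weight


# Summary table: the same measure evaluated once per Yard Number.
SUMMARY = {
    yard: manhours_per_kg(EFFORT, yard_number=yard)
    for yard in sorted({row["yard_number"] for row in EFFORT})
}
print(SUMMARY)
```

The same measure can be sliced by another key without being rewritten, for example `manhours_per_kg(EFFORT, yard_group="Panel line")`.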

Step 4 Analysis, Relations and Adjustment Factors
In the previous steps, the collected data was cleaned, sorted, brought together in a data model and summarized with measures. In this step the summarized data was analysed and investigated for relations between parameters by use of Regression Analysis, which is also available in Excel. The relations were checked through fitted regression lines and the corresponding coefficient of determination R², which shows how well the line fits the sample. Higher R² values (0.6 and above) were taken to indicate a better relation and higher statistical significance.
Analysis showed, as expected, a strong relation (R²=0.8019 for linear function) between the total net steel weight and overall length of the ships (Fig 4). Besides several other expected relations, also other less expected and yet logical relations were found. For instance a significant relation (R²=0.9056) was found between the adjusted labour productivity (manhours/kg) and the ratio of weight of profiles to the weight of plates (Fig 4 and Fig 5).
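The single-parameter check can be reproduced outside Excel with an ordinary least squares fit. The sketch below is illustrative only: the five data points are invented and are not the shipyard data, and the 0.6 threshold is the screening value quoted above.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1.0 - ss_res / ss_tot


R2_THRESHOLD = 0.6  # screening value used in the text

# Invented sample: overall length (m) against net steel weight (t).
lengths_m = [100.0, 130.0, 160.0, 190.0, 210.0]
weights_t = [1500.0, 2600.0, 3100.0, 4300.0, 4600.0]

slope, intercept, r2 = fit_line(lengths_m, weights_t)
keep_relation = r2 >= R2_THRESHOLD
print(f"slope={slope:.3f}, intercept={intercept:.1f}, R2={r2:.4f}, keep={keep_relation}")
```

A high R² on a sample of this size should still be reviewed by shipyard experts before the relation is used as a CER.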
Accuracy of the predictions could be further increased by use of Multiple Linear Regression Analysis. For example, a linear function for predicting the labour productivity from three parameters (Length Overall, total net weight and the weight ratio of profiles to plates) gave better statistical results (R² = 0.9969) than the relations having only one parameter (Fig 6). These findings not only demonstrate the success of the data collection and analysis processes, but also provide the necessary predictive functions for the next step.
Based on the above cost structure and the analyses, an algorithm was developed to assess the design by calculating a dimensionless cost index. The monetary values were normalized by the most significant cost item. For this case, the unit cost of A grade steel material was selected for normalization. A higher cost index refers to a relatively higher cost. The calculated cost indices were verified by checking the relations with the most obvious parameters. As expected, the cost index increases as the net weight increases and decreases as the labour productivity increases. These relations and their significance are also given in Fig 7.
In the final stage of the study, the requested input data was divided into two main groups. The first group is the general data, which is applicable to all new design alternatives and includes the A Grade Steel Plates Price (€/ton), the A Grade Profiles Price (€/ton) and the Price Increment Factor for Steel Material other than A Grade (%). The second group is the design specific data and includes the weight of A grade steel plates (tons), the weight of Holland profiles (tons), the ratio of the weight of profiles to the weight of plates and the total net weight. Based on these inputs from new design alternatives and the algorithms developed from the previous ships, the dimensionless cost index was calculated for each design option.
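The exact form of the cost index is not reproduced in this text, so the sketch below shows only one possible reading of it: the material and labour cost of equation (1), divided by the A grade plate unit price. Several elements are assumptions made for illustration: the hourly labour rate (which is not among the listed inputs), the pricing of other-grade steel as plate price plus the increment factor, the treatment of the remaining net weight as other-grade steel, and all numerical values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    plate_eur_per_t: float         # A grade plates
    profile_eur_per_t: float       # A grade profiles
    other_grade_increment: float   # e.g. 0.2 for +20 %
    labour_eur_per_hour: float     # assumption: not among the listed inputs


@dataclass(frozen=True)
class Design:
    plates_a_t: float       # weight of A grade plates (t)
    profiles_t: float       # weight of profiles (t)
    net_weight_t: float     # total net steel weight (t)
    manhours_per_kg: float  # labour productivity, e.g. from a regression


def cost_index(design: Design, prices: Prices) -> float:
    """Material plus labour cost, normalized by the A grade plate unit price."""
    # Assumption: steel that is neither A grade plate nor profile is other-grade.
    other_t = max(design.net_weight_t - design.plates_a_t - design.profiles_t, 0.0)
    other_price = prices.plate_eur_per_t * (1.0 + prices.other_grade_increment)
    material = (
        design.plates_a_t * prices.plate_eur_per_t
        + design.profiles_t * prices.profile_eur_per_t
        + other_t * other_price
    )
    labour = design.manhours_per_kg * design.net_weight_t * 1000.0 * prices.labour_eur_per_hour
    # Dividing by a EUR/t price expresses the cost in steel-equivalent units.
    return (material + labour) / prices.plate_eur_per_t


prices = Prices(600.0, 700.0, 0.2, 30.0)           # hypothetical values
base = Design(1000.0, 300.0, 1500.0, 0.02)         # hypothetical design
print(f"cost index: {cost_index(base, prices):.1f}")
```

Under these assumptions the index behaves as described in the text: it rises with net weight and falls when fewer man-hours per kg are needed.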
Another part of the case study explores whether the new design is easy to produce or not. The output file therefore consists of the cost index and the production score per design, which allows the designer to rank the alternatives. Fig 9 shows a partial view of an output file that in fact covers over 200 design alternatives.
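How the two outputs are combined into a ranking is not specified here. As a minimal sketch, the example below assumes that designs are ordered by ascending cost index and that a higher production score breaks ties. The ranking rule, the field names and the sample values are all hypothetical.

```python
def rank_designs(designs):
    """Order designs by ascending cost index, then by descending production score.

    Assumption: a lower cost index is better and a higher production score
    means the design is easier to produce.
    """
    return sorted(designs, key=lambda d: (d["cost_index"], -d["production_score"]))


# Invented output rows for three design alternatives.
DESIGNS = [
    {"design_id": "A", "cost_index": 3100.0, "production_score": 0.70},
    {"design_id": "B", "cost_index": 2950.0, "production_score": 0.60},
    {"design_id": "C", "cost_index": 2950.0, "production_score": 0.80},
]

RANKED = rank_designs(DESIGNS)
print([d["design_id"] for d in RANKED])
```

A weighted combination of the two values would be an equally plausible rule; the lexicographic order is used here only because it needs no extra parameters.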

Conclusions
Performance evaluation and cost are very important factors for ship production when the product price is taken into account. Higher performance is an outcome of higher productivity and lowered costs. For the steel hull construction of a ship it is very useful to evaluate the performance based on the cost drivers such as steel weight, distribution of steel materials and labour productivity. This requires handling and analysing a huge amount of data from different sources at a shipyard.
In this study, the importance of data analysis for turning data into knowledge was examined. A step-by-step guideline demonstrated how this could be done for small and medium sized shipyards. A data-driven performance evaluation method was described and implemented in a case study for the contract phase of a structural design work.
For the case study the real shipyard data were collected, organised and implemented in a data model. Based on this model it was possible to investigate relations between different parameters and to develop predictive algorithms. It is obvious that the costs are increasing when the net steel weight increases. Similarly, it was expected to see that the cost reduces when labour productivity increases. However, without the use of data-driven approach, it would not be possible to turn these relations into functions that are based on real data and open for flexible improvement. These functions were then used to calculate the cost index and to rank hundreds of different structural design alternatives.
It is shown that, even without costly investments, it is possible to create data-driven algorithms which may lead to automated cost estimation and performance evaluation. With the selected tools it is easy to change the cost calculation method and the parameters and variables used, at any time and without any prior coding knowledge or special expertise. The flexibility of the tools also allows bias to be reduced when evaluating the performance of a specific unit, by implementing filters, normalisation of monetary values, etc. in the analyses.
A shipyard could prepare its own analysis as well as standard algorithms by following the steps and the generic methodology presented in this study. It should be noted that it is crucial to have shipyard experts involved in all steps, and to support them with training, in order to obtain a comprehensive data model and to change and improve the algorithms when needed. The suggested model could further be enriched to assess investment decisions for new technologies and materials and their impact on the cost of the product. It is believed that the proposed model will encourage SME sized shipyards to keep records of their production related data and to consider how they could improve their productivity.
The herein selected case study covers the cost estimation for the steel production. The proposed model could be implemented also for the outfitting phase based on major zones like engine room, cargo space and accommodation area. A relation could be investigated between parameters that are affected by the ship main particulars e.g. volume of the space, number of decks, number of passengers, lane meters etc. and the amount and type of outfitting material and consequent effort to install them, e.g. insulation, piping, cabling, etc.
This study was intended as a pioneering work for the small and medium sized shipyards to benefit from recent developments in data science and data analytics tools. This is entirely within the scope of Industry 4.0 that should be also implemented in smaller maritime technology enterprises. Future steps might also include extending the analysis with more data, or a data model which is directly connected to the existing databases, or implementing machine-learning techniques to gather further insights and knowledge.