A method for the direct assessment of ship collision damage and flooding risk in real conditions

Collision accidents may lead to significant asset damage and human casualties. This paper introduces a direct analysis methodology that makes use of Automatic Identification System (AIS) data to estimate collision probability and generate scenarios for use in ship damage stability assessment. Potential collision scenarios are detected from AIS data by an avoidance behaviour-based collision detection model (ABCD-M), and the probability of collision is estimated for various routes within a specific area of operation. Damage extents are idealised by the Super-Element (SE) method, accounting for the influence of surrounding water in way of contact. Results are presented for a Ro-Pax ship operating from 2018 to 2019 in the Gulf of Finland. It is confirmed that collision probability is extremely diverse among voyages and that the damages obtained correlate well with those adopted by the UN IMO Regulatory Instrument SOLAS (2020). It is concluded that the method is by nature sensitive to traffic features in the selected case study area. Yet, it is useful for the evaluation of flooding risk for ships operating in real hydro-meteorological conditions.


Introduction
Ships are Systems of Systems (SoS) operating in complex traffic situations and extreme environmental conditions. Ship-to-ship collisions may have devastating consequences such as capsizing or sinking, leading to oil spills and human fatalities. The latter is of particular relevance to passenger shipping operations and to the mitigation of risks associated with ship damage stability following serious flooding events. To mitigate risks in real traffic scenarios and environmental conditions, it is necessary to develop rapid maritime risk assessment tools.
To date, the evaluation of the probability of collision and of the associated consequences has been based on historical accident records and expert judgment (e.g., Huang et al., 2020a; Fan et al., 2020). Examples of methods used are Fault Tree Analysis (FTA) and Event Tree Analysis (ETA) (e.g., Zhang et al., 2019; Martins and Maturana, 2010); Bayesian Networks (BNs) (e.g., Kelangath et al., 2012; Zhang et al., 2020a; Montewka et al., 2014); hybrid methods combining Fault Tree Analysis and Bayesian Networks, namely Hybrid Causal Logic (HCL) (e.g., Ramos et al., 2020; Wang et al., 2020); and maritime traffic simulation (e.g., Goerlandt and Kujala, 2014; Rawson and Brito, 2020; Jiang et al., 2021). These studies illustrate the factors that influence collision risk and may help identify the probability of occurrence of extreme events that may influence ship safety during operations. However, they do not account for traffic complexity and collision-based probabilistic damages in real operations. Thus, they are not reliable in terms of estimating the collision risk level and possible consequences to crew during operations.
To assess the risk of collision in real conditions it is useful to idealise complex traffic scenarios using Automatic Identification System (AIS) data. Methods that evaluate collision probability using big data include: ship domain (e.g., Zhang et al., 2015a; Szlapczynski and Szlapczynska, 2017); Vessel Conflict Ranking Operator (VCRO) (e.g., Fang et al., 2018; Zhang et al., 2015b); DCPA and TCPA (e.g., Lopez-Santander and Lawry, 2017; Zhao et al., 2016); and Velocity Obstacle (VO) (Chen et al., 2020; Du et al., 2020a; Huang and van Gelder, 2020). Collision risk is defined as the combination of the consequence of a damage and its probability of occurrence (Goerlandt and Montewka, 2015). Results from these approaches illustrate the collision probability/frequency in a specific area but underestimate accidental risk indices (i.e., indices linked to the occurrence of an accident). This is because they do not consider collision consequences (i.e., the possible damage breach and the damage stability following serious flooding events) during real traffic operations. To explore collision risk in real conditions once the potential collision scenarios have been detected, a rapid direct method is necessary. Such a method could be used to assess both collision probability and possible damage distributions, and to provide a convincing justification for Risk Control Options (RCOs).
To date, SOLAS 2020 damage stability assessment is based on probabilistic distributions of damage characteristics that originate from a pooled analysis of collision accidents on all types of ships available from accident statistics (IMO, 2018). These damage distributions do not explicitly consider the differences in structural design of each ship and ignore the influence of real traffic situations. To assess the consequences of ship-ship collision, nonlinear structural analysis by finite element methods is essential (e.g., Amdahl, 1982; Wierzbicki and Abramowicz, 1983; Simonsen and Ocakli, 1999; Liu et al., 2018). This approach provides accurate results and allows for a refined investigation of the impact process (e.g., Le . However, the models should be sufficiently refined to accurately capture the crushing mechanisms. As a result, numerical simulations become exceptionally time demanding. To overcome these challenges, simplified approaches (empirical or analytical) have been developed (e.g., Pedersen, 2010; Liu et al., 2018; Kim et al., 2021). Another idea has been to model vessels with very large-sized structural units (the so-called Super-Elements, SE) and to derive closed-form analytical formulations of the resistance of each unit (Buldgen et al., 2012, 2013; Le Sourne et al., 2021; Conti et al., 2021). Then, by properly combining the individual resistances, it is possible to rapidly calculate the dimensions of the breach on the struck ship's hull. The SE method opens the way toward the development of direct assessment methods for the evaluation of ship collision probability as well as collision damage reflecting traffic situations. Such methods can explicitly consider the influence of the ship structural design on collision damage distributions to be used within the context of damage stability analyses.
This paper proposes a method that brings together knowledge from big data analytics for the estimation of collision probability and the assessment of ship damage stability following a collision event, possibly leading to serious flooding under real operating conditions (see Section 2). In Section 3, a set of realistic collision scenarios is identified by processing AIS data for all ships operating in the Gulf of Finland from 2018 to 2019. Then, the ship to Ro-Pax ship collision probability is estimated for various routes considering traffic uncertainty, and structural crashworthiness is accounted for by fully coupling the external dynamics and the internal mechanics of the struck ship via the SE method. Damage stability analysis focuses on the case of FLOODSTAND SHIP B (Luhmann, 2009)  Scaling factor from those embedded in IMO SOLAS (IMO, 2006) in which the damage distributions result from accident statistics, mainly related to cargo ships. This allows better insight into the nature of collision risk from both probability and consequence perspectives. It also provides information to support the decision making of the crew during operations and to strengthen ship resilience under off-design conditions and throughout the ship lifecycle.

Methodology
The high-level framework for ship-to-ship collision detection, collision probability evaluation, and structural damage breach simulation in real operational conditions using AIS data is shown in Fig. 1.
A detailed discussion of the methodology associated with collision scenario detection is presented in Zhang et al. (2021). The collision risk estimation process (Fig. 1) comprises three steps:
• Step (i): Ship Trajectories (STs) are reconstructed using AIS data that contain static voyage and dynamic navigation details. The trajectories of the struck ships are then clustered, using K-means for static voyage features and DB-SCAN for dynamic navigation features.
• Step (ii): Collision scenarios are identified for each cluster using the proposed avoidance behaviour-based collision detection model (ABCD-M). The collision probability is estimated with a focus on ship to Ro-Pax ship collisions.
• Step (iii): For each collision scenario, collision breaches are evaluated using the SHARP model of the struck ship.
In the method presented, potential collision scenarios, evasive actions, and crash scenarios are defined as follows:
• A potential collision scenario is a critical situation that triggers the ship to take evasive action, in which a collision accident may occur if no evasive action is taken.
• An evasive action is an operational routine that encompasses the changes in speed, course, or their combination that should be carried out to avoid a collision event.
• A crash scenario is a critical situation that reflects the eventuality of a potential collision event when the evasive action is underestimated. In a crash scenario the parameters for crash analysis are defined based on AIS data describing the navigation patterns of the ships involved (i.e., striking ship type, striking ship initial surge velocity, etc.).
Based on the above, collision damages and flooding risk are illustrated and quantified in real conditions.

Step i: clustering of ship trajectories
To estimate ship collision probability in various routes, the ship trajectories should first be clustered by similarity measurements. This section demonstrates the use of methods for clustering ship trajectories using AIS traffic data.

AIS traffic data
Fig. 1. The logic of collision probabilistic damage assessment using big data analytics.
The automatic identification system (AIS) is an automatic tracking system that uses transceivers on ships and is used by vessel traffic services (VTS). Use of AIS has been required by the International Maritime Organization (IMO) since December 31, 2004. The regulation requires AIS to be fitted aboard all ships of 300 gross tonnage and upwards engaged on international voyages, cargo ships of 500 gross tonnage and upwards not engaged on international voyages, and all passenger ships irrespective of size. Automatic tracking systems may be used to identify and locate ships through data exchange with nearby ships, AIS base stations, and satellites. AIS big data streams contain multiple parameters related to static voyage features (e.g., departures/destinations, voyage length) and dynamic navigation features (e.g., speed, course, motion parameter variation, and ship trajectory spatial distance). The availability of AIS data is of course a critical factor (Fig. 2 and Table 1).
In an AIS data stream, trajectory paths are defined as follows:
$$Tr_i = \{p^i_1, \ldots, p^i_j, \ldots, p^i_n\}$$
where $i$ is the trajectory number; $j$ is the index of the timestamp (and hence of the point) along ship trajectory $Tr_i$; $T_j$ is the $j$-th timestamp; $p^i_j$ denotes the point of ship trajectory $Tr_i$ at $T_j$ in a multidimensional space that contains the IMO number, MMSI of the ship, timestamp, geographical position, speed, course, heading, ship type, ship length, ship width, and draft; $n$ is the total number of points in trajectory $Tr_i$; and $p^i_1$, $p^i_n$ represent the departure and destination points of ship trajectory $Tr_i$.
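As a minimal sketch of this data structure, a trajectory can be held as static ship details plus a time-ordered list of AIS points. The class and field names (`AisPoint`, `Trajectory`, `departure`, `destination`) are illustrative assumptions rather than names from the source, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class AisPoint:
    """One trajectory point p_j (field names are illustrative)."""
    timestamp: float    # T_j, seconds since an arbitrary epoch
    lat: float          # geographical position, degrees
    lon: float
    speed_kn: float     # speed over ground, knots
    course_deg: float
    heading_deg: float


@dataclass
class Trajectory:
    """Trajectory Tr_i: static ship details plus time-ordered points."""
    imo: int
    mmsi: int
    ship_type: str
    length_m: float
    width_m: float
    draft_m: float
    points: List[AisPoint] = field(default_factory=list)

    def add(self, point: AisPoint) -> None:
        # Keep points sorted by timestamp so that the first point is the
        # departure and the last point is the destination.
        self.points.append(point)
        self.points.sort(key=lambda p: p.timestamp)

    @property
    def departure(self) -> AisPoint:
        return self.points[0]

    @property
    def destination(self) -> AisPoint:
        return self.points[-1]


# Invented example values, for illustration only.
tr = Trajectory(imo=1234567, mmsi=230000000, ship_type="Ro-Pax",
                length_m=180.0, width_m=28.0, draft_m=6.5)
tr.add(AisPoint(600.0, 59.60, 24.80, 20.0, 180.0, 181.0))
tr.add(AisPoint(0.0, 60.15, 24.95, 12.0, 175.0, 176.0))
n = len(tr.points)
```

Sorting on insertion keeps the departure and destination points at the two ends of the list even when AIS messages arrive out of order.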

Ship trajectories clustering methods
Ships navigating in various routes will encounter complex traffic. To explore collision probability, ship trajectories should be clustered based on similarity measurements accounting for static voyage features (e.g., departures/destinations, voyage length) and dynamic navigation features (e.g., speed, course, motion parameter variation, and ship trajectory spatial distance); see Appendix B.
Although IMO numbers and call signs can be used as labels to separate the ship trajectories (STs) of different ships, existing methods do not offer automatic means of clustering ship trajectories by voyage. This is because it is difficult to derive labels that fully capture both static and dynamic navigation features of STs in complex traffic scenarios. Thus, when using information directly from historical AIS data (i.e., MMSI, IMO number, call signs), ship voyages cannot be separated automatically. Big data clustering may be useful for grouping STs by measuring the similarity between available data streams (Rong et al., 2020).
In this work K-means and DB-SCAN are selected and employed to cluster STs (see Appendix A). K-means algorithm is a distance partition method (e.g., see Zhen et al., 2017;Zhang et al., 2020a;Cai et al., 2020) used to evaluate static voyage features. Density-Based Spatial Clustering of Applications with Noise (DB-SCAN) is a density-based clustering non-parametric algorithm (e.g., see Zhao et al., 2017;Rong et al., 2020) used for clustering dynamic navigation features.
The method comprises three steps, namely: (a) reconstruction of STs; (b) grouping of static data by K-means; and (c) clustering of dynamic data by DB-SCAN. Fig. 3 shows an example of the ST clustering process for one ship with 6 STs (voyages) sailing in a given area. ST1 is opposite to ST2, and likewise ST3,4 are opposite to ST5,6. Although ST3 and ST4 describe trajectories of ships navigating between the same departure and destination points, they are different. Similarly, ST5 and ST6 head in the same direction, but ships on ST5 are faster than those on ST6. Separation of the STs and exploration of the collision risk is achieved as follows:
• The K-means algorithm is used to classify the STs into 4 clusters using static voyage features (departure, destination, voyage length). In this way, ST1, ST2, ST3,4, and ST5,6 are split into different clusters.
• The DB-SCAN algorithm is employed to re-cluster the results using dynamic navigation data (ship speed, course, motion parameter variation and trajectory spatial distance). In this way, ST3 and ST4 (and likewise ST5 and ST6) are split into different sub-clusters.
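The two-stage procedure can be sketched in a few lines of plain Python. This is a hedged illustration under simplifying assumptions, not the authors' implementation: the feature vectors are invented and reduced to a few numbers per trajectory, the K-means initialisation (farthest-point seeding) is chosen only to make the example deterministic, and the DB-SCAN parameters `eps` and `min_pts` are arbitrary. In practice the features would normally be normalised before clustering.

```python
import math


def kmeans(points, k, iters=20):
    """Plain K-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return labels


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, -1 marks noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1            # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:    # core point: keep expanding
                queue.extend(nb)
    return labels


# (name, static features, dynamic features); all values are invented.
# static  = (departure_x, destination_x, voyage_length), arbitrary units
# dynamic = (mean_speed_kn, lateral_offset_nm)
trajectories = [
    ("ST1", (0.0, 10.0, 10.0), (15.0, 0.0)),
    ("ST2", (10.0, 0.0, 10.0), (15.0, 0.0)),
    ("ST3", (0.0, 40.0, 40.0), (18.0, 0.0)),
    ("ST4", (0.0, 40.0, 41.0), (18.0, 5.0)),
    ("ST5", (40.0, 0.0, 40.0), (22.0, 0.0)),
    ("ST6", (40.0, 0.0, 41.0), (14.0, 0.0)),
]

# Stage 1: K-means on static voyage features.
static_labels = kmeans([s for _, s, _ in trajectories], k=4)

# Stage 2: DBSCAN on dynamic features inside each static cluster.
final = {}
for cluster in sorted(set(static_labels)):
    idx = [i for i, lab in enumerate(static_labels) if lab == cluster]
    sub = dbscan([trajectories[i][2] for i in idx], eps=2.0, min_pts=1)
    for i, s in zip(idx, sub):
        final[trajectories[i][0]] = (cluster, s)
```

With these invented inputs, the first stage groups ST3 with ST4 and ST5 with ST6, and the second stage separates each pair, so that every trajectory ends with its own (cluster, sub-cluster) label.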

Step ii: collision detection and probability estimation
This section demonstrates the use of methods for potential collision scenarios detection and collision probability estimation for the clustered ship trajectories of Ro-Pax ships outlined in Section 2.1.

Ship to Ro-Pax ship collision scenarios detection
The model developed to detect potential collision scenarios using AIS traffic data in real operational conditions is shown in Fig. 4.
Ship evasive actions take place when ship manoeuvres result in motion changes. In Fig. 4, ship trajectories Tr_i and Tr_(i+γ) relate to the struck ship (own ship) and the striking ship (target ship), respectively. A struck ship may encounter the target ship over the four stages shown at the top of Fig. 4, namely: (a) unconstrained navigation; (b) encounter; (c) collision avoidance; (d) clearance. During stages (a) and (d) there is no collision likelihood between the two ships, because there are no ships within a 6 nm radius or the ship trajectories diverge. At stage (b), when the rate of change of the bearing angle Δβ relative to the struck ship falls within [-2.00°, +2.00°], a collision accident may occur if no evasive action is taken. As the distance between the ships keeps shrinking, the rate of change of the bearing angle Δβ leaves the range [-2.00°, +2.00°], indicating that the give-way ship changes her course to avoid collision (stage c). At the collision avoidance stage (c), the minimum distance between the striking and struck ships is below 3 nm, the minimum DCPA is below 1 nm and the minimum TCPA lies within 0-30 min. The endpoint of the collision avoidance stage is defined as the point where TCPA becomes 0. Once TCPA is below 0, there is no longer any collision risk (stage d). The collision detection method comprises: (i+γ); where [T(j), T(j+m)] denotes the timestamp interval of the two series;
• Part B, during which ship encounters are determined based on ship course, bearing angles, TCPA, DCPA, rate of turn (ROT), and the difference between the headings (Fig. 4);
• Part C, where collision scenarios are classified as per COLREGs (Zhang et al., 2021).
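The DCPA and TCPA quantities used above follow from the relative position and relative velocity of the two ships. The sketch below computes them for straight-line motion on a flat local plane and applies only the DCPA and TCPA thresholds quoted for stage (c). It is a partial illustration, not the full ABCD-M, which also uses bearing rate, rate of turn and heading differences. The function names and the example geometry are assumptions.

```python
import math


def cpa(own_pos, own_vel, tgt_pos, tgt_vel):
    """Return (DCPA in nm, TCPA in minutes) for two ships on straight courses.

    Positions are (x, y) in nautical miles, velocities (vx, vy) in knots.
    """
    rx, ry = tgt_pos[0] - own_pos[0], tgt_pos[1] - own_pos[1]
    vx, vy = tgt_vel[0] - own_vel[0], tgt_vel[1] - own_vel[1]
    v2 = vx * vx + vy * vy
    if v2 == 0.0:
        # Identical velocities: the range never changes.
        return math.hypot(rx, ry), math.inf
    tcpa_h = -(rx * vx + ry * vy) / v2
    dcpa = math.hypot(rx + vx * tcpa_h, ry + vy * tcpa_h)
    return dcpa, tcpa_h * 60.0


def meets_avoidance_thresholds(dcpa_nm, tcpa_min, max_dcpa=1.0, max_tcpa=30.0):
    """Check the DCPA < 1 nm and 0 < TCPA < 30 min thresholds only."""
    return dcpa_nm < max_dcpa and 0.0 < tcpa_min < max_tcpa


# Invented near head-on geometry: own ship northbound at 12 kn,
# target 6 nm ahead and 0.5 nm to starboard, southbound at 12 kn.
dcpa, tcpa = cpa((0.0, 0.0), (0.0, 12.0), (0.5, 6.0), (0.0, -12.0))
flag = meets_avoidance_thresholds(dcpa, tcpa)
```

A negative TCPA means the closest point of approach has already passed, which corresponds to the clearance stage (d).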
To analyse the collision avoidance behaviours, the ship trajectories Tr target ship and Tr own ship during evasive action were defined as follows:

Ship to Ro-Pax ship collision probability estimation
The objective of collision risk analysis is to find out what might happen, how probable it is, and what the consequences are. In probabilistic risk analysis (PRA) and probabilistic safety analysis (PSA), collision risk is the product of the probability of the unexpected event and its consequences should it occur. The collision probability is often expressed as the expected number of collisions per year or, equivalently, as the number of years per accident.
Collision scenarios are determined according to the International Regulations for the Prevention of Collisions at Sea (COLREGs) convention (Johansen et al., 2016). The analysis accounts for the relative speed, position, heading, bearing angle, and course of ships (see Fig. 5).
The probability of a collision between two ships is estimated according to the model of Fujii and MacDuff (Fujii and Shiobara, 1971; MacDuff, 1974; Gil et al., 2020) as:
$$P^k = N_a^k \times P_c^k$$
where $k$ denotes the encounter type (crossing, overtaking, or head-on); $N_a^k$ is the geometrical probability, i.e. the probability of being on a collision course, for encounter type $k$; and $P_c^k$ is the causation probability, i.e. the probability of failing to avoid the accident while being on a collision course.
The causation probability can be estimated on the basis of different scenarios or by the so-called synthesis approach. The former is sensitive to available accident data (e.g., Pedersen, 2002), which is why most methods use the synthesis approach. In this approach the probability of error is found by applying Bayesian Belief Networks (BBNs) (e.g., Kelangath et al., 2012; Martins and Maturana, 2013) or Fault Tree Analysis (e.g., Zhang et al., 2019). As shown in Table 2, the causation probability in the Gulf of Finland was studied using the synthesis approach by Kujala et al. (2009), Goerlandt and Kujala (2011) and Montewka et al. (2014). The results indicate a causation probability of 1.3 × 10⁻⁴ for crossing encounters and 4.9 × 10⁻⁵ for head-on and overtaking encounters.

Step iii: SHARP modelling of collision scenarios
The crash analysis assumes that, in each potential collision scenario, the evasive action is underestimated, so that the ships fall into the eventuality and a collision accident happens along the route. This assumption is used to determine the collision parameters from the AIS information of the ships involved. This section presents the methodology developed to determine the collision parameters and to perform the crash analysis using SHARP (Besnard and Buannic, 2014).

Collision scenarios analysis
A ship-ship encounter comprises four stages, namely (a) unconstrained navigation; (b) encounter; (c) collision avoidance; (d) clearance. In such scenarios, an accident would occur if no evasive manoeuvres were made. Based on AIS data streams, the mass can be roughly inferred from the ship size and ship specification, which may be related to the consequences of a collision. As shown in Fig. 6, the collision and possible relative striking positions are classified as: (a) Front-side, (b) Head-head, (c) Frontal, (d) Front-side, (e) Rear-end. Consequently, the anticipated relative collision location along a ship hull can be estimated for use in crashworthiness analysis.

Collision crash analysis using the SE method
The SE method is used to model the ship into very large-sized structural units (the so-called Super-Elements), for which closed-form analytical formulations have been derived, see e.g. Amdahl (1982), Wierzbicki and Abramowicz (1983) or Simonsen and Ocakli (1999). These formulations, based on plastic limit analysis and experimental data, characterise the resistance/energy dissipation of the SE depending on its type and deformation mechanism.
As the impacting vessel moves forward into the struck structure, the super-elements are successively activated and their contribution to the total collision force is evaluated. The force F leading to the collapse of a given structural component is obtained by the so-called upper-bound theorem (Jones, 1989). To derive the force F analytically, the following assumptions are made:
• The constitutive material is assumed to be perfectly rigid-plastic.
• Shear effects near the plate edges are neglected, so that the total internal energy rate is obtained by summing the contributions of bending and membrane effects, which are assumed to be completely uncoupled.
For example, for a plate in a plane-stress state, assuming that bending effects are confined inside m plastic hinges, the bending energy rate Ė_b and the membrane energy rate Ė_m can be calculated by using the following two expressions: where M_0 is the fully plastic bending moment, σ_0 is the flow stress, A and t_p are respectively the area and the thickness of the plate, and θ_k and l_k are the rotation and the length of hinge number k.
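The upper-bound relation and the two energy-rate expressions mentioned above are not written out in this text. As a hedged reminder, the forms commonly used in plastic limit analysis of plates, written with the symbols defined above, are given below. The penetration rate $\dot{\delta}$, the internal energy rate $\dot{E}_{int}$ and the in-plane strain rates $\dot{\varepsilon}_{xx}$, $\dot{\varepsilon}_{yy}$, $\dot{\varepsilon}_{xy}$ are notational assumptions, and the membrane term assumes a von Mises material in plane stress; the original expressions may differ in notation.

```latex
\begin{align}
F\,\dot{\delta} &= \dot{E}_{int} = \dot{E}_b + \dot{E}_m \\
\dot{E}_b &= M_0 \sum_{k=1}^{m} \dot{\theta}_k\, l_k ,
  \qquad M_0 = \frac{\sigma_0\, t_p^{2}}{4} \\
\dot{E}_m &= \frac{2\,\sigma_0\, t_p}{\sqrt{3}} \iint_A
  \sqrt{\dot{\varepsilon}_{xx}^{2} + \dot{\varepsilon}_{yy}^{2}
      + \dot{\varepsilon}_{xy}^{2}
      + \dot{\varepsilon}_{xx}\,\dot{\varepsilon}_{yy}}\; dA
\end{align}
```

The first line equates the external power of the crushing force with the internal energy rate, which is what allows F to be obtained in closed form once a deformation mechanism has been postulated for the super-element.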
The SE method is implemented in the software SHARP. As illustrated in Fig. 7, four types of super-elements are used to model the struck ship:
• Hull and longitudinal bulkheads
• Vertical frames and transverse bulkheads
• Secondary stiffeners
• Stringers, decks and bottom
Within SHARP, the resistance of each SE has been derived by Buldgen et al. (2012, 2013) for oblique ship collisions. Then, by combining the individual resistances, it is possible to obtain a global evaluation of the ability of both the striking and struck ships to withstand a collision event. The SHARP internal mechanics solver has been coupled by Le Sourne (2007) with an external dynamics program named MCOL, which idealises global ship motions, taking into account the forces exerted by the surrounding water. The first version of MCOL was developed and included in LS-DYNA FEA software by Mitsubishi (LSTC, 2018). The current version has been entirely rewritten to take into account large rotational movements driven by the crushing force and hydrodynamic forces (water added mass, wave radiation damping, and restoring forces) and to introduce drag damping effects (Ferry et al., 2002). Implemented in LS-DYNA in 2001, this new version was used to simulate the large rotational movement of submarines impacted by surface ships (Le Sourne et al., 2001), and to study surface ship collisions (Le Sourne et al., 2003). Some practical applications of the SHARP/MCOL tool can be found in Le  and Paboeuf et al. (2015, 2016). To consolidate the reliability of this approach, a benchmark study has recently been carried out in which SHARP/MCOL calculations have been compared to finite element results (Le .
In SHARP, a collision scenario such as the one presented in Fig. 8 is defined by the following parameters: After solving the collision problem, SHARP provides an estimate of the damage bounding box, based on the computed penetration as well as the shape of the striking ship's bow.

Case study
In this section, considering all large RoRo/Passenger ships (Ro-Pax) (46,124 GT > Gross tonnage > 10,000 GT; 218.8 m > Length > 120 m) as struck ships, a case study is carried out by using AIS data covering a 13-month ice-free period (year 2018 and 2019) in the Gulf of Finland (see Fig. 9). The types of ships operating in the Gulf of Finland during this period are shown in Fig. 10. Notably, 9.8% of the STs involve Ro-Pax ships.

Collision scenarios modelling
In this Section, ship trajectories of Ro-Pax ships in the Gulf of Finland are clustered and probabilities are estimated in various routes. The results are validated using historical collision accidents.

Ship trajectories clustering
To analyse the ship to Ro-Pax ship collision risk in various voyages, the K-Means algorithm and DB-SCAN were used to cluster voyage details of ship trajectories of potential struck ships (Ro-Pax ships). According to the STs clustering procedure outlined in Fig. 3, the STs of Ro-Pax ships were extracted from AIS database. The 12,214 ship voyages of struck ships were divided into 17 clusters (16 complete clusters and 1 incomplete cluster); see Fig. 11.

Potential collision detection for each cluster
For each of the clusters outlined in Fig. 11, potential collision scenarios were identified. Fig. 12 shows the locations of Ro-Pax ships during the 13-month ice-free period (2018-2019). It was found that 50% of the potential collisions occur in cluster 13 (i.e., after leaving the port of Tallinn and towards Helsinki). The number of potential collisions per journey was calculated for each cluster as the number of potential collisions detected in the cluster divided by the number of journeys in that cluster. Fig. 13 shows that the number of potential collisions per journey is highest in cluster 11 (3.03 potential collisions per journey). This indicates that collision risk estimates may be extremely diverse among voyages. Clusters 6 and 13 are located on the same route between Helsinki and Tallinn, but in opposite directions. In cluster 6, 0.25 potential collisions occur per journey, i.e. one potential collision per 4.0 Ro-Pax journeys. In cluster 13, however, 1.13 potential collisions occur per journey, which is 4.52 times the value observed in cluster 6.
Fig. 7. Different types of super-elements.
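The per-journey figures quoted above are simple ratios, as the short sketch below illustrates. The function name and the cluster counts passed to it are hypothetical; only the rates 0.25 and 1.13 come from the text.

```python
def collisions_per_journey(n_potential_collisions, n_journeys):
    """Average number of potential collisions per journey in one cluster."""
    if n_journeys <= 0:
        raise ValueError("a cluster needs at least one journey")
    return n_potential_collisions / n_journeys


# Hypothetical counts chosen to give the cluster 6 rate quoted in the text.
rate_example = collisions_per_journey(250, 1000)

# Rates quoted in the text for clusters 6 and 13.
rate_c6, rate_c13 = 0.25, 1.13
journeys_per_collision_c6 = 1.0 / rate_c6
ratio_c13_to_c6 = rate_c13 / rate_c6
```

The last two quantities correspond to the "one potential collision per 4.0 journeys" and "4.52 times" statements.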

Probability of collision
To estimate the ship to Ro-Pax ship probability of collision, the number of potential collision scenarios ($N_a^k$) and the causation probability ($P_c^k$) should be taken into consideration. Following the method presented in Section 2.2, 9240 potential collision scenarios (Crossing: 6213; Overtaking: 125; Head-on: 2902) were detected (see Fig. 12). Consequently, $P_c^k$ could be evaluated for all ship types using the synthesis approach of Section 2.3. Assuming a TSS (traffic separation scheme) for all ship types, the causation probability for head-on ($P_c^h$) and overtaking ($P_c^o$) encounters is 4.9 × 10⁻⁵ and that for crossing encounters is $P_c^c$ = 1.3 × 10⁻⁴. However, the aim of this study has been to evaluate ship to Ro-Pax ship collisions only, during a 13-month ice-free period. Thus, the evaluation of causation probabilities considered the following factors:
• The ratio of Ro-Pax ships to the number of other ships navigating in the GoF is 0.098 (see Figs. 9 and 10). So, the causation probability $P^k_{ship-RoPax}$ between a ship and a Ro-Pax ship equals 9.8% × $P_c^k$.
• If a 50% chance is assigned to scenarios idealising a Ro-Pax ship as either the struck or the striking ship, the causation probabilities $P^k_{RoPax}$ involving a Ro-Pax ship as the struck ship are 9.8% × 0.5 × $P_c^k$.
• To evaluate the collision probability or frequency per year (12 months), a calibration coefficient TT should be considered.
Accordingly, the collision probability involving a Ro-Pax ship as the struck ship was estimated by Equations (10)-(12):
$$P^k_{RoPax} = N_a^k \times 9.8\% \times 0.5 \times P_c^k \times TT, \qquad k \in \{o, h, c\} \qquad (10\text{-}12)$$
where $o$, $h$ and $c$ denote overtaking, head-on and crossing encounters. Thus, the model gave 4.32 × 10⁻² accidents per year, or one collision per 23.1 years, comprising 2.77 × 10⁻⁴ accidents per year for overtaking, 6.43 × 10⁻³ accidents per year for head-on, and 3.65 × 10⁻² accidents per year for crossing encounters. This observation leads to the conclusion that potential scenarios can be evaluated, focusing on various clusters (voyages) in more detail. The conditional probability is evaluated according to: The probability of collision was estimated for 16 clusters (see Fig. 13). This resulted in 0.021 accidents per year for cluster 13 (see Table 3).
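The yearly figures can be checked from the scenario counts and causation probabilities quoted in this section. The sketch below is a hedged reconstruction: the value of the calibration coefficient, TT = 12/13 (rescaling 13 months of data to 12 months), is an assumption that is not stated explicitly in the text, although it reproduces the quoted crossing contribution and the quoted total. Variable names are illustrative.

```python
# Potential collision scenarios detected per encounter type (from the text).
N_A = {"crossing": 6213, "overtaking": 125, "head-on": 2902}

# Causation probabilities per encounter type (from the text).
P_C = {"crossing": 1.3e-4, "overtaking": 4.9e-5, "head-on": 4.9e-5}

ROPAX_SHARE = 0.098    # share of Ro-Pax ships in the traffic (from the text)
STRUCK_SHARE = 0.5     # chance that the Ro-Pax is the struck ship (from the text)
TT = 12.0 / 13.0       # ASSUMPTION: 13 months of data rescaled to one year

# Yearly collision frequency per encounter type, Ro-Pax as struck ship.
per_type = {
    k: N_A[k] * P_C[k] * ROPAX_SHARE * STRUCK_SHARE * TT
    for k in N_A
}
total = sum(per_type.values())
years_per_collision = 1.0 / total

for k, v in per_type.items():
    print(f"{k:10s} {v:.3e} accidents per year")
print(f"total      {total:.3e} accidents per year, "
      f"one collision per {years_per_collision:.1f} years")
```

Under this assumption the three contributions sum to about 4.32 × 10⁻² accidents per year, i.e. roughly one collision per 23.1 years, with crossing encounters contributing about 3.65 × 10⁻².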
To validate the probability estimates, historical collision accidents in the Gulf of Finland were reviewed (see Fig. 14). Based on the casualty reports of FMA, BMEPC and HELCOM, only one Ro-Pax ship accident took place during the ice-free period over 21 years. This corresponds to 4.76 × 10⁻² accidents per year. This value is in good agreement with the estimated collision probability (4.32 × 10⁻² accidents per year).
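The comparison above amounts to the following arithmetic. The relative difference is computed here only as an illustration of what "good agreement" means numerically; it is not a statistic reported in the source, and with a single recorded accident the observed frequency itself carries large uncertainty.

```python
observed = 1 / 21        # one Ro-Pax accident in 21 years of records
estimated = 4.32e-2      # model estimate, accidents per year (from the text)

# Relative difference with respect to the observed frequency.
rel_diff = abs(observed - estimated) / observed
```

Here `observed` rounds to 0.0476 accidents per year and the two values differ by a little under 10%.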

Damage breach simulation
In this section, all the potential collision scenarios identified are simulated with the crash analysis software SHARP, using the reference ship as the struck ship.

Description of collision scenarios
Results confirm that if a struck ship's course falls into the detected potential collision scenarios (eventualities) outlined in Section 2.2, action should be taken to avoid the collision. Fig. 15 shows the manoeuvres of the give-way ship along a red track with the aim of avoiding a collision accident. Give-way ships may fail to take evasive action along the initial trajectory, and this may in turn result in damage following a collision event. If the evasive action is underestimated for each of the potential collision scenarios detected in Section 3.1.2, ships may fall into these eventualities and the collision will happen along the original route. For the case study presented in this paper, potential collisions during evasive actions were accounted for (see p TS (j+t) on Fig. 15) and the motion features of the ships involved were used as input to the crash analysis. In doing so, possible collision consequences can be used to support risk mitigation in advance. The results outlined may strengthen ship resilience under off-design conditions and throughout the lifecycle.
As shown in Fig. 13, 9240 potential collision scenarios were detected using the proposed ABCD-M over the 13-month ice-free period (year 2018 and midterm 2019) in the Gulf of Finland. Of these, 9012 potential collision scenarios were assigned to the 16 clusters obtained from the STs clustering. In this section, the 3491 potential collision scenarios from 2019 that involve the selected large Ro-Pax ships (46,124 GT > Gross tonnage > 10,000 GT; 218.8 m > Length > 120 m) as struck ships are used as real-traffic input to the direct approach.
The objective of the direct approach is to obtain a probabilistic description of the damage breach for a specific ship in specific operational conditions. Within the current SOLAS framework (IMO, 2018), the underlying damage distributions are conditional on a breach having occurred following a collision event. Therefore, the collision probabilities identified in Table 3 do not need to be considered as such. Instead, the relevant information for crash analysis is the set of parameters defining each potential collision, assuming that this collision happened. Given the way the potential scenarios are obtained, they are assumed to be equiprobable (i.e., p = 1/N_c). From AIS data analysis, the following data were made available for the description of each collision scenario: the probability of the event (assumed to be equal to 1/N_c), the length and width of the striking ship, the surge velocity and draft of the struck and striking ships, the displacement of the striking ship, the collision angle, the type of collision (head-on, overtaking, crossing), and the type of striking ship (Group 1: Tankers, Group 2: Passenger ships, Group 3: Bulk carriers, Group 4: Container ships, Group 5: General cargo ships, Group 6: Other ships such as tug boats, trawlers, research vessels); see Table 4.
The SHARP analysis shows that the outcome of a low-energy collision event with small striking ships is a low penetration. However, it is difficult to estimate accurately whether such a low penetration corresponds to a hull breach or merely a dent (i.e., no breach in the hull). Furthermore, minor collisions with small ships and no hull breach are believed to be under-reported in damage statistics. Therefore, filtering has been applied and all striking ships with lengths below 50 m have been removed (90% of which have L < 33 m, B < 10 m, displacement < 850 tons). The data have been further filtered by excluding head-on and overtaking scenarios, thus keeping only crossing collision scenarios. After filtering, 2124 collision scenarios remained available for crash analysis. The distributions of striking ship type, length, width, speed, and displacement obtained after filtering are shown in Fig. 16 and Fig. 17.
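The two filtering rules can be expressed compactly. The sketch below is illustrative only: the `Scenario` record and its field names are assumptions, and the four sample scenarios are invented. Each retained scenario is given the same probability, in line with the equiprobability assumption made for the detected scenarios.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scenario:
    """Minimal collision scenario record (field names are illustrative)."""
    striking_length_m: float
    encounter: str          # "crossing", "head-on" or "overtaking"


def filter_scenarios(scenarios: List[Scenario],
                     min_length_m: float = 50.0) -> List[Scenario]:
    """Drop striking ships shorter than 50 m and keep crossing cases only."""
    return [s for s in scenarios
            if s.striking_length_m >= min_length_m
            and s.encounter == "crossing"]


# Invented sample data.
scenarios = [
    Scenario(30.0, "crossing"),     # removed: striking ship too small
    Scenario(120.0, "head-on"),     # removed: not a crossing encounter
    Scenario(90.0, "overtaking"),   # removed: not a crossing encounter
    Scenario(150.0, "crossing"),    # kept
]
kept = filter_scenarios(scenarios)

# Equiprobable weights for the retained scenarios.
weight = 1.0 / len(kept) if kept else 0.0
```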
In Fig. 18, the distributions of initial surge velocities for the struck and striking ships are plotted. The distribution of the struck ship surge velocity at the onset of the collision event is observed to differ significantly from the one considered by Lützen (2001), which was based on collision accident statistics. Lützen (2001) assumed a triangular decreasing distribution with a most likely value of zero speed, whereas in the present work a large majority of collision scenarios involve a struck ship with an initial surge velocity between 14 and 25 knots. Regarding the distribution of the striking ship surge velocity at the onset of the collision event, three peaks emerge (at approximately 10, 17 and 22 knots). It can also be observed that the scenarios with high striking ship velocity correspond to Group 2 striking ships (passenger ships); see the blue bars on the right of Fig. 18.

SHARP modelling
Based on the detected collision scenarios, realistic striking ship models are to be built. In the definition of all the potential collision scenarios, the striking ships are defined by their length, width, type, displacement and draft. However, when it comes to striking ship modelling in SHARP, the actual bow shape as well as the hydrodynamic properties must be defined. In order to build realistic striking ship models in SHARP, the existing models from the SHARP striking ships database (see Fig. 19 and Table 5) are reused and scaled according to the required striking ship properties for each potential collision scenario. Accordingly, for each striking ship to be modelled, the reference ship with the closest properties was selected and an average scaling factor was defined as per Equation (15). In order to check that the striking ships can satisfactorily be defined using this approach based on homothetic scaling, the real length and breadth ratios $L/L_{ref}$ and $B/B_{ref}$ have been compared with the averaged ratio $\lambda_h$. On average, for all the modelled striking ships, the difference is about 10%, which is deemed acceptable. An example of a scaled striking ship bow shape is shown in Fig. 20. Since the scantlings of the striking ships are not known, the striking ships have been considered to be structurally rigid during the collision. This provides a conservative estimate of the computed damages on the struck ship, since no energy can be absorbed by deformation of the striking ship's bow. The reference ship design was then scaled to the striking ship to be modelled, both in terms of geometrical bow shape and in terms of hydrodynamic properties (the scaling factor is used as the length scale for Froude scaling in this case).
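The scaling check described above can be illustrated with a short sketch. Equation (15) is not reproduced in this section, so taking $\lambda_h$ as the arithmetic mean of the length and breadth ratios is an assumption, and the function names are hypothetical. The Froude factors are the standard ones for geometric similarity in the same fluid.

```python
def homothetic_scale(length_m, breadth_m, ref_length_m, ref_breadth_m):
    """Return (lam, max_relative_deviation) for a striking ship and its reference ship.

    lam is ASSUMED here to be the mean of L/L_ref and B/B_ref.
    """
    ratio_l = length_m / ref_length_m
    ratio_b = breadth_m / ref_breadth_m
    lam = 0.5 * (ratio_l + ratio_b)
    # Largest relative gap between the true ratios and the averaged factor
    max_dev = max(abs(ratio_l - lam), abs(ratio_b - lam)) / lam
    return lam, max_dev


def froude_factors(lam):
    """Standard Froude scale factors for a length scale lam (same fluid density)."""
    return {
        "length": lam,
        "time": lam ** 0.5,
        "velocity": lam ** 0.5,
        "mass": lam ** 3,
        "force": lam ** 3,
        "moment_of_inertia": lam ** 5,
    }
```

For example, a 120 m by 20 m striking ship mapped onto a 100 m by 20 m reference ship gives a factor of 1.1 under this assumption, with a relative deviation of about 9%.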
Regarding the definition of the hydrodynamic properties of the striking ships to be modelled, multiple tests with SHARP/MCOL showed that, for the striking ship, the inertia matrix is by far the most influential hydrodynamic parameter on the outcome of the collision analysis. Based on the known length, breadth and displacement of the striking ship, together with correlations providing the ship gyration radii, the inertia matrix can be completely defined as per Equation (16), with no need for scaling from a reference ship, where:
• $\nabla$ is the striking ship displacement
• $R_{xx} = 0.35 \times B$ is the striking ship gyration radius in roll
• $R_{yy} = 0.25 \times L$ is the striking ship gyration radius in pitch
• $R_{zz} = 0.25 \times L$ is the striking ship gyration radius in yaw
The remaining hydrodynamic properties to be defined in MCOL (position of the centre of gravity and centre of buoyancy, added inertia matrix, stiffness matrix and damping matrices) are specified by scaling the hydrodynamic properties of the reference ship attached to the striking ship to be modelled. Since multiple physical units are involved, Froude scaling laws are used, considering $\lambda_h$ as the length scale.
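As an illustration of how the gyration radius correlations define the inertia matrix, the sketch below builds a 6 × 6 rigid-body inertia matrix. It assumes a diagonal matrix expressed at the centre of gravity (no products of inertia) and a displacement given as a mass; the exact form of Equation (16) may differ, and the function name is hypothetical.

```python
def inertia_matrix(mass, length, breadth):
    """6x6 diagonal inertia matrix (surge, sway, heave, roll, pitch, yaw).

    ASSUMPTIONS: products of inertia are neglected and `mass` is the
    displacement expressed as a mass in units consistent with the lengths.
    """
    r_xx = 0.35 * breadth  # gyration radius in roll
    r_yy = 0.25 * length   # gyration radius in pitch
    r_zz = 0.25 * length   # gyration radius in yaw
    diag = [
        mass, mass, mass,
        mass * r_xx ** 2,
        mass * r_yy ** 2,
        mass * r_zz ** 2,
    ]
    return [[diag[i] if i == j else 0.0 for j in range(6)] for i in range(6)]
```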
Each potential collision scenario has been modelled based on the initial surge velocities and drafts of the struck and striking ships and on the collision angle. To idealise the longitudinal position of the impact, a section of 100 m length representing the parallel ship body was modelled in SHARP and the longitudinal impact position was drawn randomly from a uniform distribution over a reduced 32 m length. Such an approach yields a conservative estimate of the damage, as little energy can be dissipated through the yaw motion of the struck ship.
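A minimal sketch of the uniform sampling of the longitudinal impact position is given below. Centring the 32 m window on the 100 m section is an assumption (the text does not state where the reduced length is located), and the function name and seed handling are illustrative.

```python
import random


def sample_impact_positions(n, window_m=32.0, section_m=100.0, seed=0):
    """Draw n longitudinal impact positions uniformly over a reduced window.

    ASSUMPTION: the window is centred on the modelled section; positions
    are measured from the aft end of the section.
    """
    rng = random.Random(seed)
    centre = section_m / 2.0
    half = window_m / 2.0
    return [rng.uniform(centre - half, centre + half) for _ in range(n)]
```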

SHARP struck ship modelling
For the case study, all the calculations have been carried out considering a struck ship with the main particulars given in Table 6. The Super Element (SE) structural description has been modelled over a 100 m length centred on the mid-ship section. SE have been defined for the side shell, decks, transverse bulkheads and longitudinal bulkheads. In order to keep the model workable for intensive computations, decks were modelled as continuous (i.e. no holes) with a homogeneous thickness, floors and girders within the double bottom were not modelled, and secondary rooms were omitted.
All materials have been modelled as rigid-perfectly plastic with S235 mild steel properties (see Table 7). In the SE approach, the membrane strain is calculated for the impacted SE subjected to membrane tension (e.g., side shell, longitudinal bulkheads) and compared to the material failure strain. A value of 10% was initially proposed for mild steel by Lützen (2001). The SE representing decks and bulkheads are assumed to deform first by concertina splitting, then by tearing along the edges. For these SE, the normal and tangential components of the resistant forces are calculated at the element edges and compared to empirical threshold values (determined from weld bead thickness and length). Expressions for the threshold forces are given in the SHARP user's manual (Besnard and Buannic, 2014). The SHARP SE model of the struck ship is illustrated in Fig. 21. Her hydrodynamic properties (inertia, added inertia, restoring and damping matrices), as required by MCOL, have been obtained using BV Hydrostar (2019) in infinite water depth with no forward speed. The input data considered for the hydrodynamic calculations are summarised in Table 8.

Damage breach characterisation
Before presenting the damages obtained from crash analysis, the framework for damage characterisation must be defined. Geometrically, a collision-type damage breach is represented as a box with two faces parallel to the waterplane, two faces parallel to the ship transverse plane, and two faces following the longitudinal shape of the hull at the waterline. Furthermore, the damage box crosses the waterline as well as one side of the ship. In the general case, the damage is modelled using the following six geometrical parameters $(ind_{side}, X_c, L_x, L_y, z_{UL}, z_{LL})$; see Fig. 22. In the SOLAS framework (IMO, 2006), the damage lower vertical limit $z_{LL}$ is not considered as a random variable. Instead, a worst-case approach is used for the computation of the s-factor in case of horizontal subdivision below the waterline. As an extension of the SOLAS framework, Bulian et al. (2019a) introduced a probabilistic description of this variable. The present paper considers this extended framework with a probabilistic description of the damage parameter $z_{LL}$.

Crash simulation results
In this section, the results of the crash simulations from the case study are presented. The A-index is used to assess the impact of the damage breaches obtained from direct simulations.

Simulated damage breaches
After simulating the collision scenarios detected in Section 3.2, data filtering has been applied before deriving damage distributions. Indeed, it was observed that for more than 30% of the scenarios, the lower limit of the damage vertical position is above the waterline. This is typically due to scenarios with collision angles close to 0° or 180° and limited penetration. To remain consistent with the framework defined by SOLAS (IMO, 2006) and its extension by Bulian et al. (2019a) regarding the damage vertical limits, the scenarios leading to these damages were discarded. After filtering, 1284 damages were available for post-processing and analysis.
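The waterline filter described above can be sketched as follows. The dictionary layout, the key name `z_ll` and the convention that heights are measured from the baseline (so that the waterline sits at the draft) are assumptions made for illustration.

```python
def discard_above_waterline(damages, draft_m):
    """Keep damages whose lower vertical limit z_LL is at or below the waterline.

    ASSUMPTION: z_ll is measured upwards from the baseline, so the
    waterline is located at z = draft_m.
    """
    return [d for d in damages if d["z_ll"] <= draft_m]


def discarded_fraction(n_before, n_after):
    """Share of damages removed by a filter."""
    return 1.0 - n_after / n_before
```

If all 2124 filtered scenarios were simulated, retaining 1284 damages corresponds to discarding about 40% of them, which is consistent with the "more than 30%" stated above.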

A-index calculation example
To assess the impact of damage breaches obtained from direct simulations, damage stability analysis was carried out using the non-zonal Monte Carlo method for the design draft condition. This method allows the calculation of the A-index based on each of the individual simulated damages (i.e. one damage for Monte Carlo analysis corresponds to one crash analysis result), preserving all potential correlations between the damage parameters. The use of this non-zonal framework has been described, for instance, by Bulian et al. (2016), Bulian et al. (2019b) and Krüger and Dankowski (2019). For a given ship draft, the calculation of the damage stability attained index using the non-zonal Monte Carlo method relies on:
• The generation of a list of n individual breaches $(ind_{side}, X_c, L_x, L_y, z_{UL}, z_{LL})_k$, $1 \le k \le n$, sampled according to the desired probabilistic distributions, using for instance inverse transform sampling from the cumulative distribution functions.
• The generation of damages corresponding to the rooms broken by the breaches.
• The calculation, using dedicated software, of the survivability factor $S_k$ associated with each individual sampled damage.
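Inverse transform sampling, mentioned in the first step above, can be sketched as follows for a single damage parameter. The tabulated CDF and the piecewise-linear inversion are illustrative assumptions; the actual SOLAS distributions and the sampling scheme used in the study are not reproduced here.

```python
import bisect


def make_inverse_cdf_sampler(xs, cdf):
    """Return a function mapping u in [0, 1] to a sample value.

    xs  : ascending abscissae of a tabulated CDF
    cdf : non-decreasing CDF values at xs, ending at 1.0
    The CDF is inverted by linear interpolation between tabulated points.
    """
    def sample(u):
        i = bisect.bisect_left(cdf, u)
        if i == 0:
            return xs[0]
        if i >= len(xs):
            return xs[-1]
        x0, x1 = xs[i - 1], xs[i]
        c0, c1 = cdf[i - 1], cdf[i]  # c0 < u <= c1, so c1 > c0
        return x0 + (u - c0) * (x1 - x0) / (c1 - c0)
    return sample
```

Feeding the returned function with uniform random numbers (for instance from `random.random()`) yields samples that follow the tabulated distribution.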
Mathematically, the calculation of the partial attained index A for a given draft on the reference ship can be written as:
$$A = \sum_{k=1}^{n} P_k S_k, \qquad S_k = f\left((ind_{side}, X_c, L_x, L_y, z_{UL}, z_{LL})_k\right)$$
where:
• $P_k = 1/n$ is the probability of occurrence of each individual damage
• $S_k$ is the survivability factor associated with an individual damage
• $f$ is the survivability function, computed by damage stability software
For the damage stability analysis, two calculations have been performed using NAPA software. In these calculations, the s-factor from SOLAS (IMO, 2018) was used to measure the survivability of each damage:
• A reference calculation, considering damages sampled from the SOLAS underlying distributions, extended by the probabilistic description of the damage vertical lower limit of Bulian et al. (2019a).
• A calculation for which each damage results directly from the crash analyses. In this case, the longitudinal position of the damage centre has been resampled assuming this random variable to be uniformly distributed along the ship length, in accordance with the SOLAS underlying assumption.
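The Monte Carlo estimate of the partial attained index reduces to an average of survivability factors when every damage has probability 1/n. The sketch below shows this; the `survivability` callable is a hypothetical stand-in for the s-factor calculation performed by the damage stability software.

```python
def attained_index(breaches, survivability):
    """Partial attained index for equiprobable breaches (P_k = 1/n).

    breaches      : sequence of breach descriptions (any type)
    survivability : callable returning a factor in [0, 1] for one breach;
                    a HYPOTHETICAL stand-in for the stability software
    """
    n = len(breaches)
    return sum(survivability(b) for b in breaches) / n
```

Because each simulated breach is evaluated individually, any correlation between damage length, penetration and vertical extent is carried through to the index without further modelling.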

Discussion
This paper proposes a method that combines big data analytics for ship-ship collision detection and scenario analysis with structural damage simulations using the SE method. The results and findings from the case study are summarised and discussed below.
Ro-Pax ship collision scenarios were detected on various routes using AIS data. The number of occurrences of potential collisions per voyage during the period, that is, the collision frequency (collision probability), was calculated (Fig. 13 and Table 3). This has shown that the voyage may be the key factor contributing to collision risk, which is ignored in traditional models. Therefore, the proposed collision risk analysis model may assess the effectiveness of a navigational safety strategy in more detail when it is further combined with ship traffic on various routes, which could be a possible strategy to enhance navigational safety in the Gulf of Finland. Ro-Pax ship collision probability was evaluated by the ABCD-M model (see Table 3). To validate the results, accident statistics from various previous studies were considered (see Table 10). The collision probability obtained from the ABCD-M model was close to that presented by Otto et al. (2002). However, the ship to Ro-Pax ship collision probability delivered by this study is slightly higher than the one obtained by Montewka et al. (2014). Differences from the results of Goerlandt and Kujala (2011) could be attributed to the fact that their study evaluated the collision probability based on all ship types, whereas the present study considered only Ro-Pax ship collisions.
Figs. 23-24 illustrate the damages simulated from crash analyses, both as scatter plots and as distributions. Collision scenarios and damages were obtained by simulation rather than from real collision accidents. The damages obtained using the simulation method introduced in this paper show distributions that are broadly consistent with SOLAS, even if more severe. However, contrary to the SOLAS distributions, the simulated damage distributions are generally not strictly decreasing. This could be explained by the fact that the collision scenarios include a significant share of Group 2 striking ships (passenger ships), which lead to severe damages due to their high kinetic energy. The distribution of the damage vertical position lower limit differs somewhat from the description proposed by Bulian et al. (2019a) based on analysis of accident statistics. This is due to the large proportion of Group 2 striking ships (see Table 4 in Section 3.2.1), which entail severe damages down to the bottom of the struck ship.
In Table 9, the partial A-index obtained from the directly simulated damage breaches is compared with the one obtained considering damage distributions from accident statistics. The A-index resulting from simulated damages is significantly lower than the partial A-index computed with conventional damages from accident statistics. Such a decrease of the A-index was expected, since the damage distributions obtained by direct simulations are more severe than the ones derived from accident statistics (see Figs. 24 and 25). This difference may be attributed to the severity of the collision scenarios considered (i.e. half of the potential striking ships considered were passenger ships sailing at full speed, thus leading to high-energy collisions).

Conclusions
This paper proposed a big data analytics framework for ship-to-ship collision detection, collision scenario analysis, damage breach simulation and damage stability assessment reflecting the influence of traffic complexity in real hydrometeorological conditions. A method for modelling collision scenarios using the Avoidance Behaviour-based Collision Detection Model (ABCD-M), followed by collision probability evaluation and crash analysis using the SE method (SHARP software), has been developed and tested.
• For the case study presented, it is confirmed that collision probability is extremely diverse among voyages. As this is ignored by traditional models, it may be concluded that safety in navigation practice and standards would benefit from the method proposed.
• It is assumed that evasive actions are underestimated. Ship damage distributions are then evaluated. The results outlined may provide important support for risk mitigation in advance during shipping.
• The A-indices obtained from the proposed method appear to be more conservative than those obtained when using SOLAS (2020). This could be attributed to the fact that the collision scenario distribution used is for a specific area, i.e. the damage breaches originate from simulations of Ro-Pax ships in the Gulf of Finland. This differs from current SOLAS regulations, which are based on global statistics of historical accidents involving mainly cargo ships.
• The damage stability results should be interpreted with care because the method presented is purely based on damage breach simulations that reflect traffic situations in a specific area. Since the proposed method is by nature sensitive to traffic patterns and hydrometeorological conditions, it would better suit the evaluation of flooding risk under ship- and operation-specific scenarios. This logic is aligned with that of the safety case used by the offshore industry.
Future research could focus more on the estimation of ship collision damage and flooding risk by integrating some manoeuvre-based prediction techniques into the proposed approach (Gil et al., 2019, 2020a). In addition, it would be insightful to explore the global ship collision damage and flooding risk of multiple ship types, to assist surveillance

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment
The research presented in this paper has received funding from European Union project FLooding Accident REsponse (FLARE) number 814753, under H2020 program. The authors express their gratitude for this support. The views set out in this paper are those of the authors and do not necessarily reflect the views of their respective organisations.

Appendix A. The ship trajectories clustering methods
• K-means algorithm
K-means is a clustering algorithm based on Euclidean distances that is easy to understand and implement and can handle large datasets. However, clustering tests show that K-means does not perform well when more than three parameters are considered for ST clustering. This is because the K-means algorithm struggles to handle both the static voyage features and the dynamic navigation features of complex ship trajectories. Thus, to explore the differences between ship trajectories in more detail, dynamic navigation features of STs are mined after K-means clustering for use in DB-SCAN clustering.

Table 11
K-Means algorithm for STs clustering.
• DB-SCAN algorithm
In contrast to the K-means method, which applies to static point datasets, DB-SCAN is an algorithm that forms data clusters based on regular and irregular dense data, as presented in Table 11. However, DB-SCAN may not work well with the static voyage features (distance point datasets) of STs. This is the reason why both the K-means algorithm and the DB-SCAN algorithm are used to cluster STs in this paper.
where $(lon_1, lat_1)$ and $(lon_n, lat_n)$ denote the locations of the departure and destination points, respectively; $S^{p}_{dd}$ is a set including the distances between the ship departure points $(p^{i}_{1}, p^{i+y}_{1})$ and destination points $(p^{i}_{n}, p^{i+y}_{n})$; n is the number of waypoints of ST i. Consequently, the similarity parameter $S_l$ and the similarity parameters $S^{p}_{dd}$ and $S^{st}_{dd}$ are used to represent the similarity of STs for K-means clustering.
(2) For navigation features
To explore the differences between ship trajectories in more detail, dynamic navigation features of STs are mined after K-means clustering for use in DB-SCAN clustering. SOG (Speed Over Ground), COG (Course Over Ground), and statistics of those (e.g., average value, median value, and variance) are considered for ship trajectory similarity measurement.
• The similarity parameters $S_{sog}$, $S_{cog}$ and $S_{mpv}(Tr_i, Tr_{i+y})$ are defined as: where $sog_{mean}$ and $sog_{median}$ denote the average and median values of SOG, respectively; $cog_{mean}$ and $cog_{median}$ denote the average and median values of COG, respectively; $sog_{interval}$, $sog_{std}$, $cog_{interval}$ and $cog_{std}$ represent the variation interval and standard deviation of SOG and COG; $Tr_i$ and $Tr_{i+y}$ denote different STs.

(3) For spatial distance features of STs
• The spatial similarity parameter of two different STs is defined as Equation (B.9) and is calculated using the Hausdorff distance algorithm (Kumar et al., 2020).
Fig. B1. Illustration of the Hausdorff distance algorithm for the spatial similarity calculation of STs (Zhang et al., 2021).
The similarities of voyage features, navigation features and spatial distances of trajectories are measured for DB-SCAN clustering.
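For illustration, the Hausdorff distance between two trajectories can be computed with the brute-force sketch below. It assumes waypoints given as planar (x, y) coordinates in a projected system, so that Euclidean distance is meaningful; raw longitude/latitude pairs would need a projection or a great-circle distance instead. The function names are hypothetical and Equation (B.9) itself is not reproduced.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of trajectory a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two waypoint sequences."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

The cost is proportional to the product of the two trajectory lengths, which may call for downsampling or spatial indexing on large AIS datasets.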