Automatic traffic scenarios generation for autonomous ships collision avoidance system testing

The Collision Avoidance (CA) system constitutes a key enabling technology for the Maritime Autonomous Surface Ships (MASS), the appropriate functionality of which is critical for assuring the navigation safety. Although several techniques including testing of the collision scenarios in a virtual environment can be employed, the trust of testing phase results depends on the number of tested scenarios and their coverage. This study aims at proposing a systematic and automatic process for the generation of the traffic scenarios that can be employed for the CA system testing. First, the range of the investigated parameters is defined, and samples of potential traffic parameters are generated using Sobol sequences. Subsequently, hazardous traffic scenarios are identified from the initially generated scenarios by using predefined rules. For these hazardous scenarios, a risk vector considering weather conditions and traffic conditions is calculated. A clustering algorithm is employed to identify the groups of traffic conditions that can be encountered based on each scenario risk vector and COLREGs traffic scenarios. For each of these groups, the riskiest scenario is provided as input for the test cases development, thus, simplifying the selection process of testing scenarios. The process is applied to a theoretical Short Sea Shipping autonomous vessel, whereas the derived results are employed to discuss the advantages and disadvantages of the developed process.

The MASS autonomy degree can be classified into four types according to International Maritime Organisation (IMO), ranging from ships with automated processes to fully autonomous ships (IMO, 2020).The research and development initiatives pertinent to MASS extend to multiple directions.Several research studies focused on the development of intelligent machinery systems in the context of MASS, such as in (Abaei et al., 2021;BahooToroody et al., 2022;Bolbot et al., 2021b;Tsoumpris and Theotokatos, 2022).The required regulatory aspects for MASS were investigated in (Bačkalov, 2020;Erceg, 2018;Nzengu et al., 2021;Ringbom, 2019).The aspects related to MASS remote control were studied in (Basnet et al., 2022), whilst m the Collision Avoidance systems were investigated in (Huang et al., 2020;Utne et al., 2020).
The Collision Avoidance (CA) system is considered a critical system for MASS, as it is expected to make decisions, affecting the safety of the own and surrounding ships (Bolbot et al., 2021a).Several approaches have been demonstrated for the design of CA systems with a variety of algorithms and models being considered, such as holonomic (Degre and Lefevre, 1981), kinematic (Vincent, 1977), dynamic models (Fossen, 2002) for ship motion modelling in CA manoeuvres, physics-based (Fossen, 2018), manoeuvre-based (Peel and Good, 2011), interaction-based (Hu et al., 2008) for ship trajectory prediction, CA based on collision risk estimation (Kearon, 1997), and artificial intelligence algorithms (Yu-Hong and Chao-Jian, 2005).A considerable number of the previous studies focused on the development of CA systems for conventional ships and MASS, as for example in (Brcko et al., 2021;Huang and van Gelder, 2020b;Mizythras et al., 2021;Namgung and Kim, 2021), or on the estimation of collision risk metrics (Goerlandt and Kujala, 2014;Huang and van Gelder, 2020a;Liu et al., 2019;Silveira et al., 2021;Tam and Bucknall, 2010;Zhang et al., 2017).
The existence of a plethora of advanced design methods and CA system solutions is an indicator of complexity and challenges associated with the safety assurance for CA systems.This statement is supported by the real-world accidents, which have occurred in other industries involving similar CA functions, due to defects in such systems (Guiochet et al., 2017;National Transportation Safety Board (NTSB), 2017).A prerequisite for avoiding malfunctions is to ensure adequate coverage of the potential scenarios that may be encountered by the ship during testing.This requirement stems from the problem of environmental complexity (Alexander et al., 2015), which indicates that MASS should interact properly with all the anticipated objects, under various conditions and traffic scenarios (Bolbot et al., 2019;Sørensen and Ludvigsen, 2017).
The exhaustive testing of all the potential traffic conditions to determine proper functionality of the CA seems to be a challenging endeavour, which is almost impossible to be achieved under the present state of technology (Torben et al., 2022).The number of scenarios increases exponentially with the number of parameters, therefore, there exists a critical threshold beyond which, exhaustive testing cannot be performed (Torben et al., 2022).The testing procedure is also restricted by the fact that the CA designers would prefer to conceal the details of their CA system, to avoid leakages of proprietary information (Pedersen et al., 2020).
This dual problem can be addressed by reducing the required number of tested scenarios, use of gradual approaches for the verification of the CA system, as well as by developing black box testing and verification techniques that would provide sufficient confidence to the CA system without the disclosure of proprietary information.In other words, only once an adequate number of scenarios are tested in a virtual or real environment, it can be assured that the CA system will not jeopardise safety when operating.Moreover, black-box functional or performance testing is desirable, as these techniques do not require the release of sensitive intellectual property information (Nidhra and Dondeti, 2012;Pedersen et al., 2020).
The navigation of ships is primarily regulated by the Convention on the International Regulations for Preventing Collisions at Sea (COL-REGs) (COLREGS, 1972).However, as COLREGs requirements were developed considering crew on-board (and not autonomous systems), they do not provide numerical criteria for crew actions, and their implementation relies on the crew judgement.Therefore, they cannot be used to develop a comprehensive set of testing scenarios (Porathe, 2019;Woerner et al., 2019), they do not include information on the frequency of traffic scenarios, or provide a comprehensive list of objects with which the MASSs will interact.These COLREGs limitations in relation to MASSs have been discussed at International Maritime Organisation (IMO) level (IMO, 2021), whilst some of the national authorities have already issued recommendations for application of specifically adapted for a MASS version of COLREGs (Russian Federation, 2021).
Automatic Identification System (AIS) data received from ship traffic systems can be a valuable data source for identifying such testing scenarios.It has been widely used for the analysis of traffic conditions, grouping of scenarios, identification of the most probable collision scenarios and development of safety domain as reported in (Gao and Shi, 2020;Goerlandt et al., 2017;Han et al., 2021;Kulkarni et al., 2020;Mieczyńska and Czarnowski, 2021;Mou et al., 2010).However, the quality of AIS data exhibit some limitations, as the ships with switched off their transponder and small recreational ships that are not required to have AIS, are not visible (IMO, 2015).Moreover, objects other than ships and buoys are not included in these data sets.Testing scenarios generated based on AIS data will inherit these limitations, resulting in incomplete number of scenarios being finally considered.
Several studies focused on testing cases generation, such as in automotive (Fremont et al., 2020;Khastgir et al., 2021;Li et al., 2016;Riedmaier et al., 2021), aviation (Lindvall et al., 2017), and other industries (Arora and Bhatia, 2018;Clark et al., 2021).Relatively few studies focused on both the testing process, testing cases generation, or formal verification in the maritime industry, as reported in (Fedorowski et al., 1979;Foster et al., 2020;Minne, 2017;Park and Kim, 2020;Pedersen et al., 2020;Rokseth et al., 2019;Shokri-Manninen et al., 2020;Stankiewicz and Mullins, 2019;Torben et al., 2022;Woerner, 2014;Yang et al., 2007).To the best of authors' knowledge, only (Torben et al., 2022) proposed a way to automatically generate testing scenarios.However, this approach requires the use of simulators and application of search algorithms resulting in a significant number of simulations for testing of the CA systems.This is an important limitation that need to be addressed.This study aims to develop a process that specifies traffic scenarios for the ship CA system testing in automatic fashion, whilst reducing the required number of scenarios for testing.The novel contribution of this research stems from the integration of methods used in statistics, ship hydrodynamics and big data analytics for solving a test scenarios generation problem, and includes: (a) the development of a novel automatic process for traffic scenarios generation independent from CA system with focus on scenarios that need to be tested; (b) the recommendation of novel risk metrics for use in a CA system testing; (c) the process application to the Short Sea Shipping autonomous vessel considering operations close to shore.
The remaining of the article is organised as follows.Section 2 presents the developed process methodology.Section 3 provides the investigated case study parameters.Section 4 reports and discusses the derived results.Finally, Section 5 summarises the main findings from this study and provides recommendations for future studies.

Rationale, assumptions, and steps
Considering that the potential traffic scenarios space is infinitely large, it is necessary to identify representative scenarios as well as to demonstrate how these representative traffic scenarios depict the traffic scenarios that were not accounted for.The intention of this study is also to recommend testing scenarios without considering the CA system properties under testing and the CA system evaluation.Another objective is to identify testing scenarios, which would require as limited as possible customisation of the CA system.
To achieve the set aim and objectives, this study follows the steps depicted in Fig. 1.During step 1, uniform sampling of the potential traffic scenarios over the potential traffic space parameters values is conducted.These traffic encounter scenarios can be considered as representative for all the potential traffic encounters based on the efficiency of sampling technique.
Subsequently, an automatic risk assessment is conducted in steps 2 and 3, with the support of several selected risk metrics and rule-based criteria.The risk assessment is employed to identify those traffic scenarios, which are the riskiest in terms of navigational safety, and to filter out the safe scenarios.Apparently, there is no value in testing safe traffic scenarios, which does not require actions from the CA system.It should be noted though that "safe" traffic scenarios can be selected for testing if the CA system overreacts to safe traffic scenarios, however, this is outside the scope of the present research.
In step 4, the various collision risk metrics organised in a risk vector are analysed using a clustering technique to identify similarities between the selected traffic scenarios of step 3 in terms of risks.Some additional criteria are also employed to distinguish the different traffic scenarios, such as overtaking/overtaken, crossing, and head on.Based on these similarities, different groups of traffic scenarios can be identified, which are similar in terms of safety and traffic conditions.
During the last step, a representative scenario is selected for each identified traffic scenarios group.The main criteria for selecting a representative scenario include the values in the risk vector.It is assumed that the riskiest scenario in a group/cluster is adequate to depict the CA system behaviour in other traffic scenarios belonging to the same group.Therefore, if the CA system passes the test criteria for the riskiest traffic scenario, it will also pass the criteria for the less risky traffic scenarios.In this way, reduction of the considered test scenarios is ensured, without omitting a specific category of traffic encounter scenarios.
The effectiveness of the derived results depends on the assumptions reported in the preceding paragraphs, which are typically employed in pertinent research studies.For clarity, these assumptions are summarised below to ensure better understanding of the limitations of this study and the derived results.
• The sampled traffic scenarios offer a comprehensive coverage of the potential traffic scenarios.• The selected rules for filtering the traffic scenarios are adequate and rational.• The selected risk metrics accurately represent the safety of the encounter scenarios.• Clustering and grouping of traffic scenarios are accurate.
• Holonomic models, which function under the assumption that the ships retain their speed and direction (Huang et al., 2020), are considered for the identification of the hazardous scenarios.• Use of a circular domain is considered as a reasonable approximation of the ship domain.• The riskiest scenario in each group is considered representative for each group.• The elements of the risk vector as elaborated in section 2.5 are considered as equivalently important to each other.• A ship cannot instantly change its speed to the desired level, and an arbitrarily threshold is considered as elaborated in section 2.5.

Identification of parameters influencing safety of navigation
The collision between the ships is usually an emergent event dependent on a large number of risk influencing factors that are related to route, human factors, ship design, equipment characteristics and organisational factors (Chen et al., 2019;Pedersen, 2010;Puisa et al., 2018).As the CA system reacts to the navigational factors of the route, only these factors are considered in this study.
The lists of navigational factors and pertinent metrics are provided in Table 1.These factors were derived based on the information reported in (Abebe et al., 2021;Bakdi et al., 2020;Bye and Aalberg, 2018;COL-REGS, 1972;Goerlandt et al., 2015;Li et al., 2021;Mizythras et al., 2021;Silveira et al., 2021;Woerner et al., 2019).The identified metrics that can be used to represent these navigational factors are derived based on the pertinent literature (Chen et al., 2019).These metrics and their ranges can be determined with the assistance of AIS data, meteorological data, geospatial services, operator prior knowledge, according to the methodology described in (Bolbot et al., 2022).
The first three metrics of Table 1 can be considered as outcome of several basic parameters, such as, the ships location, heading, and speed.Hence, the latter parameters (ships location, heading, and speed) are selected for the sampling, instead of the metrics of Table 1.For demonstration purposes, we also selected the environmental conditions in our sampling.For the sake of this study brevity, we omitted the other remaining parameters/metrics, which can be considered in follow up studies.

Step 1: Development of traffic scenarios using sampling techniques
Once the sea traffic parameters are specified, the traffic scenarios can be generated using any sampling technique, e.g., Sobol sequences (Sobol', 1967), random Monte Carlo sampling (Metropolis et al., 1953), Latin hypercube sampling (McKay et al., 1979).In this study, Sobol  Space required/available for ship manoeuvres Ship domain (Fujii and Tanaka, 1971;Szlapczynski and Szlapczynska, 2017) Velocity space (Degre and Lefevre, 1981;Fiorini and Shiller, 1998) Collision Threat Parameter Area ( Lenart, 1983( Lenart, , 2015) ) Velocity obstacle derivatives (Huang et al., 2018(Huang et al., , 2019 sequences were selected, as they demonstrate a number of useful properties.First, Sobol sequences provide an uniform coverage of the parameters space (Sobol', 1967).Second, they provide greater repeatability in results compared to the random and Latin hypercube sampling techniques without sacrificing accuracy, as demonstrated by several previous studies (Bolbot and Theotokatos, 2021;Burhenne et al., 2011;Kucherenko et al., 2015;Qian and Mahdi, 2020).Third, they do allow for a better repeatability of the results due to their quasi-random nature (Sobol', 1967).The sampling is implemented over a continuous and not discrete space.Therefore, for the estimation of the sampling effectiveness, it is required to discretise the space of the potential values.This is implemented with the assistance of parameter dif min , depicting the average normalised minimum difference in the parameter p l for the range (p l,max − p l,min ) with sampling over a number of p parameters.The physical meaning of dif min is also illustrated in Fig. 2.There exist several other measures for assessing the sampling uniformity, such as, discrepancy-based, point-to-point based and volumetric (Ong et al., 2012;Zaremba, 1968); however, parameter dif min was selected, as its physical meaning is more readily perceived.
The number of potential combinations T that can be sampled for p parameters, by considering the parameter dif min is estimated according to the following equation: The sampling ratio (R) (Analytic Technologies, 2010) is defined according to the following equation, as the ratio between the N sample points and the total number of points that can be generated: For an existing sample, the estimation of dif min can be approximated using the following formula: where p is the total number of parameters, x p,i,n is the i normalised sample over p parameters and j is used to denote the p points having the least distance from the ith point.
For the Sobol sequences, due their pseudorandom nature, eq. ( 3) can be simplified as follows (which was determined after a series of numerical tests):

Step 2: Identification of the hazardous scenarios
Τhe identification of hazardous scenarios is implemented by analysing the traffic scenarios generated using the sampling techniques from the previous step.
For this purpose, a number of metrics from Table 1 is employed, such as, the geometric distance (D i,k ) between the Own Ship (OS) and kth Target Ship (TS k ), the time to the closest point of approach between the OS and TS k (TCPA i,k ), the distance at the closest point of approach between the OS and TS k (DCPA i,k ), and the safety domain around the OS that is represented by a circle with radius a i,1 .These are the metrics that were employed in a considerable number of previous studies (Szlapczynski and Szlapczynska, 2017).
However, some more advanced and effective metrics, such as, the Degree of Domain Violation (Szlapczynski, 2006) and Time to Domain Violation (Lenart, 2015;Szlapczynski and Szlapczynska, 2016), which were included in Table 1 could be also employed in a follow up study.In this study, due to the use of circular domain, these metrics (Degree of Domain Violation and Time to Domain Violation) practically give the same information with TCPA i,k and DCPA i,k , as indicated by (Szlapczynski, 2006).This is not valid for cases where different domains were employed, such as, elliptical domains (Szlapczynski, 2006).This is one of the limitations of the study.
It should be noted that the use of circular domain constitutes an oversimplification and a very conservative approach to the safety domain definition.More advanced approaches define the safety domain as an ellipse (Namgung and Kim, 2021) (Coldwell, 1983;Davis et al., 1982;Fujii and Tanaka, 1971;Hansen et al., 2013;Liu et al., 2016;Pietrzykowski and Wielgosz, 2021;Szlapczynski and Szlapczynska, 2021), as block areas (Kijima and Furukawa, 2003), as quaternion (Wang, 2010), or as polygons (Bakdi et al., 2020;Hansen et al., 2013;Pietrzykowski and Uriasz, 2009;Szlapczynski and Szlapczynska, 2015).This simplification is adopted to facilitate the implementation and investigation of the overall automatic scenarios generation process.The consideration of alternative representations for the safety domain and the selection of the most appropriate is proposed as a subject for future research.A comprehensive review of safety domains and their characteristics can be found in (Du et al., 2021a;Szlapczynski and Szlapczynska, 2017).
All these metrics are valid under the assumption that the ship movements are holonomic.This assumption is not valid in real conditions, as the ships will eventually interact and communicate with each other and change their route based on the manoeuvres of the other ships (Huang et al., 2020).However, this assumption can be considered as plausible and sound approximation of traffic scenarios and traffic risks, as (a) the ships exhibit rather slow dynamics, and (b) the distance between the ships considered during the sampling is small relative to their dynamics.The calculation of more detailed safety metrics, considering ships interactions and simulations, would be rather too computationally expensive (potentially causing memory overflow issues), therefore, it would be challenging to implement the proposed analysis for a large sampling number.
The hazards identification is based on a set of criteria/rules, as presented in the pseudocode form provided in Table 2. First, for each traffic scenario generated using samples, the previously mentioned metrics are estimated.Then, for the ship that is the closest to the OS, the violation of all safety criteria/rules is examined.The following criteria/ rules are considered: • Firstly, TCPA i,k > 0, as we are not interested in scenarios where the closest encounter occurred in the past, and the ships are expected to diverge from each other, according to the holonomic assumption.• Secondly, DCPA i,k < a i,1 , as we are interested in scenarios where the two ships are anticipated to be close enough to each other in the future (TS k in the safety domain of OS k ).• Thirdly D i,k < d s , where d s is a set distance threshold, as we are interested in ships which are in proximity to each other.• Fourthly, TCPA i,k < t s , as the focus is on potential collision scenarios in the near future, depicted using t s as threshold.• Lastly, D i,k > a i,1 , as we are not interested in the scenarios, where TS k is already in the safety domain of OS k and the collision is considered as unavoidable.
The use of these rules is quite similar to the ones employed in (Goerlandt et al., 2015;Szlapczynski and Szlapczynska, 2017) for the risk assessment purposes; it is much simpler though herein.This can be attributed to the fact that the rules are used to filter out the scenarios on a high level, whilst a more detailed analysis and grouping is implemented in step 4.
The values of d s and t s are set to 1 nm and to 10 min in line with the information provided in (Namgung and Kim, 2021) and (Rødseth et al., 2020).The d s can be set to 3 nm as proposed in (Namgung and Kim, 2021), 2 nm as reported in (Federation, 2021), or 12 nm, which is the normal range of radar operation (Du et al., 2021b) and is also influenced by the operational area size and characteristics, such as the presence of shore.Other studies proposed the use of 15 min (Federation, 2021), 20 min, or 30 min for the t s , and 3 nm whereas 6 nm for d s depending on the traffic scenarios (Wang et al., 2021).A constraint that needs to be considered is the satisfaction of the holonomic assumption, therefore the values of d s and t s should not be too large.
When the shore is considered, the same set of criteria can be employed.The difference is that the shore is considered as an aggregation of different points with zero speed and the compliance against the set criteria is verified by considering if any of the shore points satisfies these criteria.The shore characteristics are represented using the shore point with the least DCPA in the next step.A graphical explanation is also provided in Fig. 3.
The equations for estimating the employed safety metrics are provided in eqs.( 5)-( 9), where (x, y) denote the location coordinates for each ship, u is speed in m/s, φ is the ship direction in degrees, 0 and k denotes the OS and TS k , respectively based on (Namgung and Kim, 2021;Pedersen et al., 2020;Woerner, 2014).
The radius of the safety domain is calculated by the following equation (Namgung and Kim, 2021): where L is ship length in (m).
The advantage of eq. ( 10) is that it renders the safety domain dependent not only on generic manoeuvrability characteristics expressed by the ship length, but also on the ship speed.

Step 3: Risk vector calculations
For the hazardous scenarios identified in the previous step, a risk vector with relevant risk metrics is calculated.The purpose of the risk vector is to characterise the risk type of the investigated hazardous scenarios, to support the scenarios clustering in the next step, and to rank the different hazardous scenarios.The navigational risk metrics estimated in this section include the normalised versions of the metrics presented in Table 1 to allow for the comparison of the selected scenarios on the same scale.The normalisation is implemented to have values in the range from 0 (if the risk metric does not reveal any hazardous condition) to 1 (indicating that collision is imminent).Exponential relationships between the metrics of Table 1 and the used risk metrics was preferred.This is implemented to assign higher values to those scenarios, which are closer to collision.
Based on the above considerations, the risk vector (RV) for each traffic scenario includes the components according to eq. ( 11).

Table 2
Pseudocode for hazardous scenarios identification.
Algorithm: Hazardous scenarios identification 1: Procedure: Hazardous scenarios identification 2: Input: Potential traffic scenarios generated using Sobol sequences in previous step 3: For i = 1:N% for all the sample points 4: Estimate D i,k %distance between OS and TS k TCPA i,k %time of closest approach DCPA i,k %distance of closest approach ai,1 %estimation of safety domain radius for OS 5 Find min (D i,k ), i = constant 6: For Scenario should be considered as hazardous Store the scenario parameters 8: End for D i,k 9: End for i 10: End procedure Fig. 3. Explanation for shore treatment.
By setting the elements in a vector, an implicit assumption is introduced, which is that the elements of vector are equally important with each other.
The DCPAn k and TCPAn k represent the normalised version of DCPA and TCPA for TS k ship (or shore) based on the selected safety domains and are estimated according to eqs. ( 12) and ( 13).
The physical meaning of eq. 12 and 13 is as follows.If the closest point of approach with the ship TS k is in the past or too far away in the future, ship TS k does not contribute to the risk (relevant value 0).If the TCPA k →0 from positive values of TCPA, as we already excluded TCPA<0, then the TCPA contributes the most to the risk (relevant value 1).For the intermediate values an exponential relationship is used to put greater emphasis to the conditions where the TS k is closest to the OS.If TS k lies within the safety domain, then the risk from DCPA k becomes maximum (relevant value 1).The larger the DCPA k value, the smaller is its contribution to the risk of collision.The parameter d r is a parameter depicting the distance for which the risk metric is estimated, which is set to 3 nm in line with (Wang et al., 2021).The multiplication between DCPAn k and TCPAn k is implemented to emphasise that the closer an encounter is in distance and in time, the higher is the risk of collision with the ship TS k .
It should be noted that the above risk metrics are relevant only when considering ships independently from each other.To address potential interactions between the ships, the risk metric, A, is calculated, which depicts the percentage of the area in the u − φ (speed-bearing) space that is not allowed for manoeuvres.A is calculated using concepts from velocity obstacle algorithms (Degre and Lefevre, 1981;Fiorini and Shiller, 1998) according to the following equations: where AC is the area in the u − φ space that is not available for safe navigation as collision with the ship TS k can occur according to the holonomic hypothesis.Therefore, AC is defined as: where N v denotes the number of ships interacting with the own ship, whereas A k denotes the area in the u-φ space where collision of the OS and the TS k occurs.This area is also graphically depicted in Fig. 4 as the intersection between the highlighted triangles and the circle.The metric A is used to depict the traffic complexity and the area that is not available for the safe manoeuvring of the own ship.
The normalised A n is calculated according to: The exponential normalisation is used to give greater emphasis on values of A n closer to 1.The parameter h u d is estimated according to: , if the OS speed vector lies within the AC area 0, otherwise where u d is the minimum distance between u(φ) ̅̅→ and the safe area, as depicted in Fig. 4.This metric is used to depict the easiness with which the ship can change its speed and find itself in a safe combination of speed and direction values.The normalisation with 0.25 u 0,max is implemented in line with the assumption that a ship cannot instantly change its velocity to the desired level.Herein, this threshold is set arbitrarily for simplicity reasons.Typically, it depends on the ship manoeuvrability characteristics (rudder size, engine power, length, etc.) and available manoeuvring time.
The normalised parameter h n is calculated by the following equation: The last metric of the risk vector is used to account for the weather conditions during manoeuvring and is estimated according to: where H s is the wave height that is considered for the selected scenario, V c is the current speed, V w is wind speed, and max denotes the maximum value of H s , V c and V w .The normalised parameter W n is calculated by: The risk metrics presented herein are novel and to the best of the authors' knowledge, have not been employed in previous studies.These metrics could be also employed in real CA system for the risk estimation.However, other metrics potentially could be considered, such as obedience to COLREGs of the present encounter traffic scenario (Du et al., 2021b), metrics having sinusoidal and parabolical relationship with DCPA, TCPA, distance (Abebe et al., 2021), metrics having polynomial relationship with DCPA and TCPA (Kearon, 1997;Park and Jeong, 2021), metrics having exponential relationship with DCPA and TCPA (Mou et al., 2010), risk metrics for the Degree of Domain Violation (Szlapczynski, 2006), the Time to Domain Violation (Lenart, 2015;Szlapczynski and Szlapczynska, 2016), or alternative approaches for risk estimation (Chun et al., 2021;Ha et al., 2021).However, they are outside the scope of this study, as the aim is to demonstrate the functionality of the proposed process for identify testing scenarios.As there are no definite guidance with respect to which risk metrics must be selected in the maritime community, and it constitutes an area of an intense research efforts, the conclusive selection of the appropriate risk metrics is left as a subject for future work.
Since the risk vector is multi-dimensional, it cannot easily be visualised.In order to enable its visualisation, the t-distributed stochastic neighbour embedding (t-SNE) algorithm is used (Hinton and Roweis, 2002;Van der Maaten and Hinton, 2008).This algorithm first calculates a probability distribution that represents similarities between neighbouring points, (the conditional probability that a point would pick another point as its neighbour), and subsequently defines a similar probability distribution over the points in the low-dimensional mapping.Finally, the t-SNE algorithm minimises the Kullback-Leibler divergence (Kullback, 1968) between the two distributions to eventually derive the final mapping.

Step 4: Equivalence classes identification
In this step, mathematical tools of unsupervised clustering analysis are employed for the determination of equivalence classes (Bhat and Quadri, 2015).For the purpose of this study, the k-means clustering is selected (Cheng, 1995;Fukunaga and Hostetler, 1975;MacQueen, 1967).This algorithm aims to partition any available observations into k clusters, in which each observation belongs to the cluster with the nearest mean.The parameter k, (denoting the number of clusters) is user selectable.
The following two approaches were used to identify the optimal number of clusters in the available dataset: the elbow method (Mac-Queen, 1967) and the silhouette method (Steinhaus, 1956).The elbow method involves the calculation of within-cluster error sum of squares (SSE) and identifying the number of clusters after which the SSE stops reducing rapidly.The relevant equations for estimating SSE can be found in (Stanford, 2022).
The silhouette method follows a similar working principle, however it employs the silhouette score to quantify the similarity of data points within a cluster and their distance to data points that belong to other clusters.In this way, the silhouette method calculates how similar an object is to its own cluster (cohesion) compared to other clusters (separation) (de Amorim and Hennig, 2015;Rousseeuw, 1987).The silhouette score ranges from −1 to +1, where a high value indicates that the object matches to the dedicated cluster (de Amorim and Hennig, 2015; Rousseeuw, 1987).For a clustering method to be considered effective, most of the points must have a high value.The mean of silhouette score is used to assess the clustering technique quality.The silhouette score can be calculated as in (de Amorim and Hennig, 2015;Rousseeuw, 1987).
The number of cluster zones is determined according to the following formula: where s denotes the number of employed shore points.
The rationale behind eq. ( 22) is as follows.When a number of k TSs are in proximity to the OS, the OS may collide either with one TS, or with any combination of two or more TSs.Eq. ( 22) is developed by considering that the potential to collision is expressed using binary values (Yes or No).The subtraction of 1 is used to address the scenario when there is no collision with any TS or shore (all values are No).The multiplication with 3 k is followed to distinguish if the scenario with the kth TS is classified as crossing, following or heading.This distinction of traffic conditions is reported in (Namgung and Kim, 2021).
Another indication of the potential cluster number is associated with the risk vector (RV) length.By considering that each risk value in the RV can either have a high or low value (binary assumption) and are independent from one another, the total number of clusters can be calculated by: The subtraction of 1 is employed to account for conditions where all the RV elements exhibit low values.These conditions are already filtered using the rules in step 2 of the proposed process.

Step 5: Scenarios selection
Once the equivalence classes (clusters or groups) have been identified, for each class (cluster/group), the sample with its RV closest to the mean RV value of the whole class (cluster/group) is used as representative.However, this does not guarantee to provide the riskiest scenario in the cluster.For this reason, in line with the boundary value analysis (Bhat and Quadri, 2015), the most critical scenario is determined by considering the total risk (TR), which is calculated by: Practically, the average of the RV elements in eq. ( 11) is considered as the risk measure for each scenario, without assigning any weight to the risk vector elements.In this way, more complex scenarios are associated with higher risk.It is assumed that if the collision is avoided for a scenario with high risk, it will also be avoided for the scenarios with lower risk in the same group.

Investigated case and selected parameters
This study considers the case of the sort ship shipping (SSS) cargo ship as the own ship (OS), which is employed at the AUTOSHIP project (AUTOSHIP, 2019).The OS is considered to operate outside the coasts of Norway, and interacts with a high speed craft (TS 1 ) and a sailing boat (TS 2 ).
The input parameters for the investigated scenarios are provided in Table 3.The random parameters with their ranges are provided in Table 4.These 18 parameters are assumed to vary from 0 to their maximum value and are sampled using the Sobol technique as described in section 2. The test area is set to 3 nm × 3 nm in line with (Namgung and Kim, 2021).A shore is also considered, which is represented by a simple spline line.It should be noted that both the sailing boat and the high-speed craft are not required to have an AIS transponder.Hence, they are not visible on the AIS data, which justify the developed approach use.

Step 1: Development of traffic scenario using sampling techniques
A typical sampled traffic scenario is provided in Fig. 6.This is a nonhazardous traffic scenario for the OS, which sails far away from the highspeed craft (TS 1 ) whereas the sailing boat (TS 2 ) slowly approaches the OS.The shore boundary was not shown for demonstration purposes.This scenario can potentially occur during navigation.Therefore, it can be concluded that the artificial sampling can generate realistic traffic conditions, which can be categorised as both safe and unsafe (several unsafe scenarios are shown in the next figures).As the OS is sampled uniformly over the investigated location space (the same applies for both target ships), the process generates scenarios, in which no hazardous situations are observed.This can be viewed as a drawback of the process, as it results in a certain degree of overhead, which can be addressed in a future study.
The sampled positions for the OS (without presenting the target ships TS 1 and TS 2 ) with N = 100 (Sobol samples number) are presented in Fig. 5.This figure was developed by illustrating only the OS positions, speed vectors and safety domains for the sampled scenarios.As it can be observed, the Sobol sampling method provided a uniform coverage of the OS positions, speeds and speed directions in the selected area.This is in line with findings from previous studies (Bolbot and Theotokatos, 2021;Burhenne et al., 2011;Kucherenko et al., 2015;Qian and Mahdi, 2020), which justifies the use of the Sobol sampling compared to alternative sampling techniques.The estimations for the parameter dif min , the total number of scenarios, and the sampling ratio for several values of the samples number are provided in Table 5.As it can be observed, the average normalised minimum difference between points (dif min ) reduces with increasing samples numbers, however, the sampling ratio remains approximately the same.This also means that with a specified dif min , the Sobol sequence method requires far less samples     than the crude sampling of all potential combinations for the desired parameters, thus considerably reducing the number of scenarios.Due to the computational limitations of the employed hardware, this study considers the Sobol samples generated using N = 100, 1000, and 10,000.
It should be also noted that the estimated sampling ratio, dif min and T depend on the number of parameters p, as was revealed based on the calculations presented in Table 6.The sampling ratio R decreases whereas the parameter dif min increases with increasing number of parameters.This results in the following two conclusions.First, the benefit from using sampling increases with the number of parameters, as the sampling ratio decreases for the same dif min compared to the exhaustive combinations number.Secondly, the coverage of the potential parameters becomes scarcer, as p and dif min increase, which means that the sampling results in higher uncertainty.

Step 2: Identification of the hazardous scenarios
The number of hazardous scenarios found using the rules from the second step is provided in Table 7.As the number of samples increases, the identified number of hazardous scenarios increases accordingly.However, it can be observed that the percentage of identified hazardous scenarios does not significantly change with the sample size.This indicates that there should be a similarity between the different scenarios, therefore their grouping in clusters is possible.However, it should be noted that the required computational time on a standard desktop computer increases exponentially, with the number of samples, which constitutes an important limitation.For this reason, also the estimation of safety metrics based on the holonomic assumption is necessary, whereas the interactions between ships need to be left outside the analysis scope, unless a supercomputer or work/station is employed.

Step 3: Risk vectors calculations
As mentioned in methodology section, for each hazardous scenario a risk vector is calculated.The risk vector has 6 elements, as two ships and one shoreline are considered.As the visualisation of the estimated risk vector in a multi-dimensional space is impossible, the t-distributed stochastic neighbour embedding (t-SNE) algorithm is applied to reduce the dimensionality of the data and allow for their plotting.The results of this algorithm are presented in Fig. 7.The colour scale represents the normalised Euclidean distance of each point from the origin, prior to the application of the t-SNE algorithm.This is done to enable the visual evaluation of the algorithm results.The visualisation demonstrates that there are distinguished groups of potential traffic scenarios, which can be identified using the clustering techniques.

Step 4: Equivalence clusters identification
For the range of the considered sample sizes, it was investigated how the selected cluster number affects the performance metrics, in specific the sum of squared error (SSE) and the silhouette score, described in section 2.6.These results are provided in Fig. 8.The SSE metric rapidly reduces with the cluster number, whereas the relationship between cluster number and SSE is monotonic.The SSE also asymptomatically tends to 0 with increasing cluster size, which indicates that after a specific cluster number, the improvement in SSE is limited.
The response is different for the silhouette score, where its optimal or convergent values are exhibited.This can be justified by the fact that the grouping converges to well-defined groups.This is anticipated as described in section 2.6, and the expected cluster number for the investigated case study is between 63 according to eq. ( 22) and 576 according to eq. ( 23).The silhouette score is constantly on the positive metric side, which indicates that the convergence progresses relatively well.
Based on the optimal values of the silhouette score for the minimum number of clusters, the selected cluster number is provided in Table 7.The results demonstrate that as the number of samples increases, the optimal number of clusters for each sample does not increase in the same manner, but instead also demonstrates a trend towards convergence.This constitutes another evidence that the grouping of scenarios in clusters is possible.However, further investigation is required by using a  V. Bolbot et al. work station of greater computational capacity, so that increased sample sizes are evaluated.

Step 5: Scenarios selection
The selected scenarios for N = 10,000 are presented for some of the equivalent classes/clusters/groups in Fig. 9.This clustering is selected, as it is the closest to the minimum number of classes/clusters according to eq. ( 22) (C N = 63).The selected scenarios depict the different traffic conditions.Some of the traffic encounters are simple, as depicted for the case of Sobol numbers of 7080 and 4640.However, some include both target ships in the proximity to OS, such as, the traffic conditions generaed using Sobol numbers of 795, 831, and 4183.This is in line with our expectations, since, as elaborated in Section 2.6, we anticipate interactions with both TS.Some of these scenarios demonstrate high complexity and total risk (TR) value, whilst others exhibit lower risk and complexity, however they are different in nature.It should be noted that the present classes/clusters/equivalence groups have been identified based on the considered risk vector and risk metrics.It is expected that the inclusion of additional risk metrics would result in different identified scenarios.This constitutes a subject for future research.Additionally, the Sobol numbers are included in Fig. 9 plots for enhancing traceability.In another case study considering other ships (types and numbers), different Sobol samples are required.

Discussion and limitations
The main advantage of the presented process is that it follows a deductive, and not inductive, logic compared to the traffic scenarios generated by using the AIS data.It starts by considering the whole set of potential traffic scenarios, and progresses to more specific scenarios, which are not expected to be the most frequently occurring, as it is expected when AIS data is used.This is a more robust process compared to the case of deriving the potential traffic scenarios based on AIS data.The identification and testing of scenarios that have not been encountered in the past, but might occur in the future, belongs to the 'known unknowns' region of the Johari window (Luft and Ingham, 1961).The consideration of these types of scenarios during testing will contribute to the safety assurance of the CA system.
The reduction of the potential scenarios is achieved by using more effective coverage of the traffic scenarios (use of sampling techniques during step 1), by excluding non-hazardous scenarios (application of rules in step 2), by grouping the scenarios based on similarity, as well as by using clustering algorithms and selecting the riskiest scenario (steps 3-5).A similar approach, with some modifications could be applied by using AIS data, as demonstrated in (Gao and Shi, 2020;Goerlandt et al., 2017;Han et al., 2021;Kulkarni et al., 2020; Mieczyńska and V. Bolbot et al. Czarnowski, 2021;Mou et al., 2010) for other types of problems; however, it would result in identifying different scenarios due to the use of different input.An important advantage of the proposed process is that it can generate data for the ships, for which AIS equipment is not required, and therefore, no AIS data are collected, e.g., sailboats and leisure high speed crafts.This is important, as these types of ships constitute an important source of hazards for MASS (Bolbot et al., 2021a(Bolbot et al., , 2021c)).
The proposed process was demonstrated by considering only 3 ships.Essentially, the analysis needs to be repeated by considering a number of other ships, ship types and shore types to derive a set of testing scenarios.This alone contributes to the complexity of the problem, and the reduction in the considered scenarios is therefore useful.Testing of 60 scenarios for a combination of OS with two TS and shore is far less stricter compared to the requirements reported in (Torben et al., 2022), where more than 250 testing scenarios had to be simulated for a simple head-on encounter situation with one ship, without considering the impact of weather conditions and the presence of other ships/shore.
The process is rather flexible and allows for using alternative methods.Advanced sampling techniques can be employed in step 1, more elaborate sets of rules compared to the ones employed in step 2 can be considered, advanced risk metrics based on the Degree of Domain Violation (Szlapczynski, 2006) and the Time to Domain Violation (Lenart, 2015;Szlapczynski and Szlapczynska, 2016) can be included in step 3, more effective clustering techniques can be employed in step 4, whereas and multiple criteria can be employed in step 5 for selecting the scenarios for testing.The presented process can be further enhanced, and its adaptations constitute a subject of further research.However, we would advocate strongly against the use of Latin Hypercube samples and random samplings, as the Sobol sampling offers much better results performance and stability (Bolbot and Theotokatos, 2021;Burhenne et al., 2011;Kucherenko et al., 2015;Qian and Mahdi, 2020).
The selected safety parameters, such as, d s , t s , the safety domain and the associated risk metrics, constitute critical decisions for the proposed approach, as they influence the identified scenarios and their clusters.The plethora of approaches for defining the risk metrics and safety domains reported on the pertinent literature poses a considerable challenge.Standardisation in this area is needed for defining internationally agreed COLREGs requirements for the MASS as well as to promote the use of MASS technologies.
Further investigation is required for defining acceptable values for the parameter dif min and the sampling number N, as they influence the number of identified traffic scenarios.As was demonstrated in the preceding sections, there is an indication that the optimal zones number/ metrices converge with the increased sample size, which constitutes an important factor for selecting the sample number in future studies.
An important weakness to the analysis stems from the holonomic assumption use, as the previous ship track and the potential future interactions are not considered during all the process steps, such as during the risk metrics calculations.The use of holonomic assumption was necessitated by the limitations in the computational capacity.It is left as a subject for future research to consider the incorporation of these factors to greater detail.It is expected though that a dense sampling, as demonstrated herein, might be sufficient for effectively identifying the critical scenarios.
Lastly, the application of the process revealed that there are fundamental mathematical problems that need to be addressed.Firstly, we could not locate an analytical expression for calculation of the parameter dif min in the Sobol sequences, although we could identify some other metrics representing the sampling uniformity coverage.Additionally, there is a need for a rigorous mathematical proof that the successful testing of the riskiest scenarios can be extrapolated with a specific confidence level to the less risky scenarios.Proving mathematically that the process converges and identifying the conditions for which the process converges are areas for future research.

Conclusions
In this study, a process for identifying a reduced number of traffic scenarios to be considered during the testing of CA system was proposed.This process was based on several steps that include sampling, filtering, and clustering.The process was applied to a cargo ship operating in close proximity to shore, demonstrating a significant reduction in the identified number of scenarios that can be selected for testing either in a virtual environment or full-scale trials.The main findings of this study are summarised as follows.
The proposed process constitutes an effective tool for identifying traffic scenarios and replacing the AIS data, especially for testing the CA system interactions with target ships, AIS data for which is not available.
The process offers a significant reduction in the considered traffic scenarios compared to the crude testing, through the use of sampling techniques especially with increasing number of simulated parameters, without sacrifice in sampling coverage.The process results in much fewer scenarios for testing compared to some other automatic scenario generation schemes.
The categorisation of the traffic scenarios based on their navigational risk metrics using clustering techniques into well distinguished groups was effectively demonstrated.
Identification of complex traffic scenarios that might not be considered by a human, where multiple ships are on a collision track was demonstrated.
The limitations of this study include the following: (a) The derived results depend on the selected criteria/rules, the navigational risk metrics and the clustering algorithm settings; careful selection of these parameters is required to avoid inconsistency of the identified scenarios; (b) Computational limitations necessitate the use of holonomic assumption for the ship motions; advanced computers are needed to facilitate cases with considerable sample sizes.
It is expected that the proposed process for the automatic traffic scenarios identification constitutes a useful tool in the hands of the CA systems designers and certifiers.However, as also highlighted in the preceding sections, there is a potential for the process further improvement.Future studies are recommended to explore: (a) the standardisation of the navigational risk metrics and the considered safety thresholds, (b) the consideration of alternative safety domains (e. g., COLREGs-reflecting domains) and advanced metrics (e.g., time to domain violation and degree of domain violation), (c) complex scenarios with more than two target ships.

Acknowledgement
The study was carried out in the framework of the AUTOSHIP project, which is funded by the European Union's Horizon 2020 research and innovation programme under agreement No 815012.The authors also greatly acknowledge the funding from DNV AS and RCCL for the MSRC establishment and operation.The authors acknowledge the feedback received from Kongsberg Maritime, SINTEF Ocean and Bureau Veritas.The opinions expressed herein are those of the authors and should not be construed to reflect the views of EU, DNV AS, RCCL, Kongsberg Maritime, SINTEF Ocean, Bureau Veritas or the other involved partners in the AUTOSHIP project.

Fig. 2 .
Fig. 2. Sampling space for two parameters (p 1 , p 2 ) and physical meaning of parameter dif min with uniform sampling.

Fig. 7 .
Fig. 7. 2D visualisation of the risk vectors of each scenario through the t-SNE algorithm.

Table 3
Input parameters.

Table 4
Random parameters.

Table 5
Estimation of dif min , T and R, when p = 18 for several Sobol sample numbers.

Table 6
Estimation of dif min , T and R, with p varying and N = 10,000,000.

Table 7
Number of hazardous scenarios and clustering zones.

A. Nomenclature and abbreviation list
Radius of safety domain around the OS (m) A % of area in the u − φ (speed-bearing) space (-) AC area in the u − φ (speed-bearing) space (m/s degree) C N Minimum number of cluster zones (-) u d Minimum distance between u(φ) ̅̅→ and the safe area in the u − φ (speed-bearing) space (m/s) d s Distance threshold (nm) D i,k Distance between OS and TS k ship for i sampled scenario (m) DCPA i,k Distance at the closest point of approach between the OS and TS k (m) DCPAn k Normalised version of DCPA i,k (-) dif min Average normalised minimum difference in parameter p l of the range (p l,max − p l,min ) h u d Normalised version of u d (-) Time to the closest point of approach between the OS and TS k (min) TCPAn k Normalised version of TCPA i,k (-)