Predictive Vehicle Safety—Validation Strategy of a Perception-Based Crash Severity Prediction Function

Abstract: Traffic accident avoidance and mitigation are the main targets of accident research and vehicle safety development worldwide. Despite improving advanced driver assistance systems (ADAS) and active safety systems, it will not be possible to avoid all vehicle accidents in the near future. Innovative Pre-Crash systems (PCS) should contribute to the mitigation of unavoidable accidents. However, there are no standardized testing methods for Pre-Crash systems. In particular, irreversible Pre-Crash systems lead to great challenges in the verification and validation (V&V) process. The reliable and precise real-time crash severity prediction (CSP) is, however, the basic prerequisite for irreversible PCS activation. This study proposes a novel validation and safety assessment strategy for a perception-based crash severity prediction function. In doing so, the intended functionality, safety and validation requirements of PCS are worked out in the context of the ISO 26262 and ISO/PAS 21448 standards. In order to reduce the testing effort, a real-data-driven scenario-based testing approach is applied. To this end, the authors present a novel unsupervised machine learning methodology for the creation of concrete and logical test scenario catalogs based on K-Means++ and k-NN algorithms. The developed methodology is applied to the GIDAS database to extract 35 representative clusters of car-to-car collision scenarios, which are utilized for virtual testing. The limitations of the presented method are disclosed afterwards to help future research set the right focus.


Introduction
According to a World Health Organization (WHO) report, traffic accidents cause approximately 1.35 million road deaths every year and rank eighth among the causes of death worldwide. Among people aged 5 to 29 years, traffic accidents are the most common cause of death [1]. To address this problem, WHO has released a global plan for a decade of action for road safety. The United Nations General Assembly Resolution 74/299 recognizes the importance of global road safety and aims to improve it [2]. In addition to legal initiatives, innovative vehicle safety systems contribute to increasing road safety. Knowledge of the exact accident events enables targeted development and testing of driver assistance and vehicle safety systems. Passive safety systems, such as passenger airbags and seat belts, are designed to mitigate the consequences of unavoidable accidents for vehicle occupants. Active safety systems, such as autonomous emergency braking (AEB), primarily pursue the goal of accident avoidance. Even if active safety systems cannot prevent every accident, they can contribute to accident mitigation, for example by reducing the collision velocity. However, since not every detected collision can be prevented, further possibilities are being sought to optimize occupant protection. Integrated safety systems are intended to combine the benefits of active and passive vehicle safety and minimize the occupant injury risk.
The current trend in vehicle safety development is moving in the direction of integrated Pre-Crash systems (PCS). PCS are activated before the crash, in the so-called Pre-Crash phase, and require a highly precise and reliable real time prediction of the potential crash [3]. Figure 1 shows the five phases of traffic accidents from regular driving to Post-Crash phase. Each of these phases is addressed by different driver assistance, active, passive or integrated safety systems. Moreover, relevant variables within the five phases, which are used for accident analysis and further test scenario development, are illustrated. A conventional In-Crash restraint activation strategy is based on a measurable technical crash severity, such as a crash pulse or a pressure increase for side impact collisions, during the In-Crash phase of the accident. Acceleration sensors and pressure sensors in the vehicle are conventionally used for this purpose. Pre-Crash safety systems make the decision to trigger based on detected Pre-Crash and real-time predicted In-Crash conditions. For this purpose, data from the environmental sensors must be accessed. The prediction could be performed either individually for each Pre-Crash system or by a central function for crash prediction (CP) and crash severity prediction (CSP). In this paper, the central crash severity prediction function is considered as a prerequisite for PCS activation.
Appl. Sci. 2023, 13, x FOR PEER REVIEW 2 of 20
In order to release a safety relevant Pre-Crash system, international safety standards for road vehicle systems development and validation must be met. The PCS must be proven at least as safe as conventional safety systems during the validation process. The decisive factor here is the performance of the crash severity prediction function in different situations. For validation of the CSP function, a data-driven scenario-based testing approach is applied. In general, synthetic scenarios can be generated for all conceivable traffic situations within the defined parameter space. However, the phenomenon of parameter space explosion in the scenario generation for highly automated driving systems was identified in the PEGASUS project [4]. It is not achievable to test all possible scenarios within the parameter space ranges [4]. As a consequence, representative and relevant scenarios for the Pre-Crash system under test should be identified to reduce the test effort. Since no standardized testing scope for Pre-Crash systems based on crash severity prediction performance exists, the authors propose a novel validation strategy.

Pre-Crash Systems in Compliance with International Safety Standards
The concept of perception-based Pre-Crash safety systems goes back at least two decades. A few examples of potential Pre-Crash systems are the seat belt pre-tensioner [5], the airbag preset [5], unavoidable collision mitigation through crash constellation optimization [6] and Pre-Crash seat adjustment in highly automated vehicles [7]. Pre-Crash sensing based on a radar sensor is described by Moritz [3]. The necessity to reproduce real world collisions to test a Pre-Crash prediction performance on representative scenarios was recognized by [5]. Due to safety risks and high costs, crash prediction cannot be tested on a sufficiently high number of real crash tests. Reference [5] presents a scenario-based Pre-Crash system validation approach in a fully virtual environment. The predicted parameters in this study are the collision unavoidability, relative collision velocity, impact zone and impact angle, but not the technical crash severity. A real time technical crash severity estimation based on predicted crash constellation was presented by [8]. The approach combines machine learning and a 2D mass-spring-damper model to predict the distribution of expected crash pulses after the crash. Mages, Seyffert and Class show the benefit of a reduced occupant forward displacement for reversible Pre-Crash seat belt pre-pretensioning (PPT) in comparison with a conventional belt system [9]. The effects of automated emergency braking (AEB) in combination with PPT forces of 0, 300 and 600 N, in high severity frontal crashes, were examined by [10]. The rib fracture risk has been found to reduce in proportion to the PPT force increase. Nevertheless, even the non-invasive use of the crash severity prediction function can lead to an increase in road safety. A method of preventive identification of accident black spots using the CSP function and four approaches to road safety increase is presented by Putter [11].
A holistic top-down approach in which occupant protection is paramount was presented by [12]. This was to be achieved through perception-based crash prediction and the optimization of restraint strategies. To test the effectiveness of innovative restraint strategies, relevant accident test cases were extracted from the GIDAS (German In-Depth Accident Study) database. Subsequently, finite element method (FEM) vehicle to vehicle collision simulations and vehicle occupant simulations, including triggering the restraint systems, are carried out with conventional and optimized restraint strategies, which require a predictive estimation of crash severity. By analyzing the injury values, the safety potential of optimized restraint strategies is demonstrated [12]. The question of a reliable validation concept for a crash severity prediction function still remains open.
Safety relevant electrical and electronic (E/E) road vehicle systems are developed to meet international functional safety requirements which are defined by the ISO 26262 standard [13]. The standard considers systematic and random faults on system, hardware and software levels, to determine whether the system meets functional and technical safety requirements. As a part of this process, hazard analysis and risk assessment (HARA) is performed to identify potential hazards and risks in the operating conditions. Based on HARA, the system is classified with an automotive safety integrity level (ASIL). ASIL classification helps to specify the safety requirements of the system and identify the tolerable risk as well as the accepted probability of system failure [13]. Thus, ASIL can be understood as a risk classification, which is a function of the three factors: Severity of possible injuries (S), probability of Exposure (E) and Controllability (C). The lowest ASIL safety relevant classification is A and the highest is D. Additionally, there is a QM level which means that the system has a low safety relevance and the risk should be considered during quality management process.
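The dependence of the ASIL on the three factors S, E and C can be illustrated with a small lookup. The following is a minimal sketch, not taken from the paper: the function name `determine_asil` is hypothetical, the classes are passed as integers (S1-S3, E1-E4, C1-C3), and the class-sum shortcut is a commonly used way to reproduce the determination table of ISO 26262-3 rather than the normative table itself. The classes S0, E0 and C0, for which no ASIL is assigned, are deliberately left out.

```python
def determine_asil(severity: int, exposure: int, controllability: int) -> str:
    """Return "QM" or the ASIL letter for classes S1-S3, E1-E4 and C1-C3.

    Assumption: the ISO 26262-3 determination table is reproduced by the
    sum of the class indices (7 -> A, 8 -> B, 9 -> C, 10 -> D, below 7 -> QM).
    """
    if not (1 <= severity <= 3 and 1 <= exposure <= 4 and 1 <= controllability <= 3):
        raise ValueError("expected S in 1..3, E in 1..4 and C in 1..3")
    total = severity + exposure + controllability
    return {7: "A", 8: "B", 9: "C", 10: "D"}.get(total, "QM")


# Highest severity, highest exposure, hardest to control: strictest level.
print(determine_asil(3, 4, 3))
# Same hazard, but rarely encountered (E1): much lower classification.
print(determine_asil(3, 1, 3))
```

The sketch only shows how the three factors interact; an actual classification is the result of the HARA and has to be taken from the standard.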
In addition to ISO 26262, ISO/PAS 21448 focuses on the safety of the intended functionality (SOTIF) of road vehicles and addresses functional insufficiencies of the system on the vehicle level as well as foreseeable misuse [14]. The SOTIF standard is not intended to replace the established standards, but to enhance them, especially for advanced driver assistance systems (ADAS) and autonomous driving (AD) development. It considers system boundaries, in particular sensor, controller and actuator boundaries. For this purpose, the intended functionality of the system is defined and the SOTIF HARA, which can identify hazards and risks in addition to those covered by ISO 26262, is performed. SOTIF also focuses on scenario-based testing and classifies the relevant use cases into four categories: known safe scenarios, known unsafe scenarios, unknown unsafe scenarios and unknown safe scenarios.
While known safe scenarios should be maximized, known unsafe scenarios should be minimized through functional modification and unknown unsafe scenarios should be identified through test data acquisition [14]. Since Pre-Crash systems are perception-based and safety-relevant systems, development according only to ISO 26262 is not sufficient, and the international standard ISO/PAS 21448 should be taken into account [12].
A schematic representation of the multi-level crash prediction function and a generic Pre-Crash system is shown in Figure 2. The crash prediction function estimates the probability of the collision, the time to collision (TTC) and the collision configuration for both host and opponent vehicles based on sensors' perceptions and real-time data processing. The collision configuration is described by In-Crash variables at the time $t_0$, the time of the first impact. In-Crash variables, such as impact zone or impact points, impact angle and collision velocity for both accident participants, define the collision configuration. The crash severity prediction function is designed to predict the technical crash severity, which can be defined by various crash severity parameters, such as crash velocity, crash pulse, delta-v, occupant load criterion (OLC), energy equivalent speed (EES), intrusion zone and depth and others. The reliable and accurate prediction of relevant crash severity parameters enables Pre-Crash activation of irreversible restraint systems. However, activation of QM systems is possible based only on crash prediction without the associated crash severity. Once the targeted PCS is classified as ASIL A+ (ASIL of at least A or higher), system activation is inconceivable without an additional crash severity estimation. Collision constellation optimization based on crash prediction should contribute to the lowest injury severity within the possible collisions. A machine learning approach to estimate the trajectory of the lowest crash severity for unavoidable collisions is shown in [6]. However, the injury severity of the occupants is unknown even for the host vehicle, as there are no real-time models for the predictive estimation of injury severity.
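To make one of the listed crash severity parameters tangible, the sketch below estimates delta-v from a predicted collision configuration. It is an illustrative simplification and not the CSP function discussed here: it assumes a one-dimensional central impact, momentum conservation and a coefficient of restitution chosen by the user. The function name `delta_v` and all numbers are hypothetical.

```python
def delta_v(m_host: float, m_opp: float, v_host: float, v_opp: float,
            restitution: float = 0.0) -> tuple[float, float]:
    """Return (delta_v_host, delta_v_opp) in m/s for a 1D central impact.

    Masses in kg, velocities in m/s along the impact line.
    restitution = 0.0 models a perfectly plastic impact (assumption).
    """
    if m_host <= 0 or m_opp <= 0:
        raise ValueError("masses must be positive")
    if not 0.0 <= restitution <= 1.0:
        raise ValueError("restitution must lie in [0, 1]")
    closing = v_host - v_opp              # relative velocity at first impact
    total = m_host + m_opp
    # Velocity changes follow from momentum conservation; the lighter
    # vehicle experiences the larger change.
    dv_host = -(1.0 + restitution) * m_opp / total * closing
    dv_opp = (1.0 + restitution) * m_host / total * closing
    return dv_host, dv_opp


# Equal masses, host at 20 m/s hits a standing opponent (plastic impact).
print(delta_v(1500.0, 1500.0, 20.0, 0.0))
```

Such a closed-form estimate ignores impact angle, overlap and structural stiffness, which is why a dedicated prediction of the technical crash severity is required for irreversible systems.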
Figure 2. Schematic representation of the multi-level crash prediction function and a generic Pre-Crash system. The illustration is modified and extended according to [5].
In addition, a conflict of interest arises between the safety of host and opponent. The protection of other road users cannot be neglected and is directly addressed by the SOTIF standard. Predicting injury severity distributions for host and opponent occupants for all possible collision constellations in real time is necessary for targeted collision constellation optimization. This is not possible according to the current state of the art and is therefore not part of the testing scope, but could be considered in the future. Additionally, ethical decision-making challenges in automated vehicle crashes were addressed by [15] and remain an unsolved problem to this day.

Machine Learning Methods for Scenario Extraction
Supervised and unsupervised machine learning methods are particularly relevant in the context of accident data analysis. Commonly used supervised learning algorithms on accident data are classification and regression decision trees (CART), random forest, k-nearest neighbors (k-NN) and support vector machines. Supervised machine learning algorithms applied on accident data frequently have an aim to build a model in order to predict accident-related output features based on given inputs; for example, the prediction of the injury severity of the passengers based on the technical crash severity. Jeong et al. present an approach to the classification of motor vehicle crash injury severity [16]. Assi et al. propose a supervised machine learning model to predict crash injury severity based on 15 crash-related parameters [17]. However, supervised learning algorithms require labeled training data to build a mathematical model.
Unsupervised learning, on the other hand, aims to learn patterns from unlabeled data. In contrast to supervised learning, there is no separation between the training and testing data sets; all data is used for model training. One of the main applications of unsupervised learning on traffic accident data is the extraction of unobserved patterns from accident events. A commonly used unsupervised machine learning technique is cluster analysis or clustering. Cluster analysis is an exploratory data analysis technique, which aims to separate a dataset into groups as different as possible, while the data points within the groups should be as similar as possible. The fundamental concepts of clustering and clustering results validation techniques are presented by Halkidi, Batistakis and Vazirgiannis [18]. The main groups of clustering methods are partitioning clustering, hierarchical clustering, density-based clustering, model-based clustering and spectral methods. The most well-known clustering method is the K-means algorithm, which performs iterative steps to assign data points to clusters based on the previously defined number of clusters k.
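The iterative assignment and update steps of K-means can be sketched in a few lines. The example below is a minimal NumPy illustration with K-Means++ seeding, not the implementation used in this study; the function names `kmeans_pp_init` and `kmeans` as well as the toy data are assumptions made for illustration.

```python
import numpy as np


def kmeans_pp_init(X: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """K-Means++ seeding: spread the initial centers over the data.

    Assumes X contains at least k distinct points.
    """
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        chosen = np.array(centers)
        # Squared distance of every point to its nearest chosen center.
        d2 = ((X[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Far-away points are picked with higher probability.
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers, dtype=float)


def assign(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the nearest center for every data point."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k: int, n_iter: int = 100, seed: int = 0):
    """Return (labels, centers) after iterating assignment and update steps."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers


# Toy data: two well-separated groups of three points each.
X_demo = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels_demo, centers_demo = kmeans(X_demo, k=2)
print(labels_demo, centers_demo)
```

The number of clusters k has to be fixed beforehand, which is why cluster validity measures are needed to compare runs with different k.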
Since no labels are used, the evaluation of unsupervised learning results appears to be a challenging problem. The evaluation is not a universal procedure and depends on the used data in the first place. Different evaluation metrics can be applied. The three main approaches to investigate cluster validity are based on external, internal and relative criteria. For external criteria, pre-specified benchmark data are needed, for example externally provided labeling. The internal criteria clustering validity approach involves the vectors of the data set themselves. It is based on characteristics such as cohesion, separation, distortion and likelihood. The common internal metrics are silhouette coefficient or the Davies-Bouldin index. The third, relative criteria approach evaluates a clustering structure by comparing it with other clustering results produced using the same algorithm but with different parameterization, for example a different number of clusters.
In order to understand the distortion and silhouette score algorithms used in Section 3 to evaluate the clustering results, the mathematical foundations behind them are explained. First of all, the terms cluster cohesion and separation are introduced. Cluster cohesion is an intra-cluster similarity score, which measures how closely related the data points in the cluster are. It is measured by the within-cluster sum of squares (WCSS):
$$\mathrm{WCSS} = \sum_{k} \sum_{i} \lVert x_i - c_k \rVert^2$$
with:
• $\sum_k$: the sum over all clusters $k$
• $\sum_i$: the sum over all data points $i$ within cluster $k$
• $x_i$: a data point within cluster $k$
• $c_k$: the cluster center of cluster $k$
Cluster separation is an inter-cluster dissimilarity score, which measures how well separated the clusters are. It is measured by the between-cluster sum of squares (BSS):
$$\mathrm{BSS} = \sum_{k} n_k \lVert c_k - c \rVert^2$$
with:
• $\sum_k$: the sum over all clusters $k$
• $n_k$: the number of data points in cluster $k$
• $c_k$: the cluster center of cluster $k$
• $c$: the overall centroid or mean of all data points
The distortion score is the sum of squared distances from each data point to its cluster center. A decreasing distortion score indicates better cluster allocation and helps to identify the optimal cluster number.
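Cohesion and separation can be computed directly from their definitions. The following sketch assumes that each cluster center is the mean of its members, as is the case for a converged K-means result; the function name `cohesion_and_separation` and the toy data are illustrative assumptions.

```python
import numpy as np


def cohesion_and_separation(X, labels) -> tuple[float, float]:
    """Return (WCSS, BSS) for data X and integer cluster labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    overall = X.mean(axis=0)                      # overall centroid c
    wcss = 0.0
    bss = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        center = members.mean(axis=0)             # cluster center c_k
        wcss += float(((members - center) ** 2).sum())
        bss += len(members) * float(((center - overall) ** 2).sum())
    return wcss, bss


X_demo = [[0, 0], [0, 2], [10, 0], [10, 2]]
wcss, bss = cohesion_and_separation(X_demo, [0, 0, 1, 1])
print(wcss, bss)  # 4.0 100.0
```

With mean-based centers, WCSS and BSS add up to the total sum of squares around the overall centroid (here 104), so lowering the distortion for a fixed data set is equivalent to increasing the separation.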
The silhouette coefficient is a measure of how near a data point is to its own cluster in comparison with the other clusters [19]. The silhouette coefficient ranges from its minimum value of −1.00 to its maximum value of 1.00. A large silhouette coefficient, close to 1.00, means that the clustering has a very strong structure, while 0.00 indicates no structure at all. A negative silhouette coefficient indicates that an object has probably been assigned to the wrong cluster. The silhouette score is the mean of all silhouette coefficients within the clustered data.
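$$\text{Silhouette Coefficient}(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
$$\text{Silhouette Score} = \frac{\sum_{i} \text{Silhouette Coefficient}(i)}{n_i} \tag{5}$$
with:
• $i$: the data points within the cluster
• $a(i)$: the average intra-cluster distance
• $b(i)$: the smallest inter-cluster distance
• $n_i$: the number of all data points in the dataset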
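A direct computation of the silhouette score may help to interpret the value range. The sketch below follows the standard definition by Rousseeuw [19] with Euclidean distances: a(i) is the mean distance to the other members of the own cluster and b(i) the smallest mean distance to the members of any other cluster. The function name `mean_silhouette`, the convention of a zero coefficient for single-member clusters and the toy data are assumptions made for illustration.

```python
import numpy as np


def mean_silhouette(X, labels) -> float:
    """Mean silhouette coefficient over all data points (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    # Pairwise distance matrix between all data points.
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    coeffs = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = int(own.sum())
        if n_own == 1:
            continue                      # convention: coefficient 0 for singletons
        a = dist[i, own].sum() / (n_own - 1)          # distance to itself is 0
        b = min(dist[i, labels == k].mean() for k in clusters if k != labels[i])
        coeffs[i] = (b - a) / max(a, b)
    return float(coeffs.mean())


X_demo = [[0, 0], [0, 2], [10, 0], [10, 2]]
good = mean_silhouette(X_demo, [0, 0, 1, 1])   # compact, well separated clusters
bad = mean_silhouette(X_demo, [0, 1, 0, 1])    # points grouped across the gap
print(good, bad)
```

For the toy data, the sensible grouping yields a score of roughly 0.8, whereas the grouping across the gap yields a negative score, in line with the interpretation given above.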

Crash Prediction Testing Methodology
International safety standards explain what steps should be fulfilled, but not how. The proposed validation strategy aims to define concrete validation steps for a perception-based crash severity prediction function. A systems engineering approach is pursued; the CSP function is considered as a safety system in the context of the entire vehicle, thus partner functions are considered in the validation process. In particular, the focus of the proposed method is on mining representative and relevant test scenarios for the specified crash severity prediction function under test. In doing so, the area of action of the function is defined and unsupervised machine learning algorithms are applied on the GIDAS database to extract test scenarios. Figure 3 shows the defined validation strategy steps, which are explained in detail in Sections 3.1-3.8.

Definition of the Intended Functionality
The steps of item definition are defined by ISO 26262 and ISO/PAS 21448. They include, among others, the goals and description of the intended functionality. In addition, they describe dependencies and interactions with other functions and systems. Menzel, Bagschik and Maurer present three scenario abstraction levels for different process steps defined by ISO 26262 [20]. Functional scenarios on a semantic level are used for item definition and HARA. Thus, a knowledge-driven approach, in which experts define the item and the functional scenarios, should be utilized.

Definition "Area of Action" or Operational Design Domain (ODD)
The selection of the database for the creation of the test cases catalog is based on the limits within which the tested system is designed to operate. For driving automation systems level one to five, the SAE J3016 defines the operational design domain as "Operating conditions under which a given driving automation system, or feature thereof, is specifically designed to function, including, but not limited to, environmental, geographical, and time-of-day restrictions, and/or the requisite presence or absence of certain traffic or roadway characteristics." [21]. The ODD parameters are defined, among others, by:
• Road and traffic conditions
• Weather and lighting conditions
• Static (e.g., shape) and dynamic (e.g., velocity) conditions of the host vehicle and other road participants
• Host occupant configuration
For integral safety systems, the limits within which the tested system is designed to operate are defined by the area of action. The area of action contains similar parameters to the ODD, with an additional focus on the Pre-Crash and In-Crash variables. The following are particularly relevant in addition:
• Concrete collision configuration at $t_0$
• Safety systems configuration
• Passengers' Pre-Crash and In-Crash configuration

Hazard Analysis and Risk Assessment
Hazard analysis and risk assessment is performed based on ISO 26262 and ISO/PAS 21448 guidance. The development is carried out according to ISO 26262-6, whereby the ASIL classification can only take place in the context of the targeted Pre-Crash systems. Here, a distinction is made between QM and ASIL A+ classified target systems. Systems such as predictive eCall or a reversible belt pre-pretensioner with a low force level can be classified as QM Pre-Crash systems. However, a reversible seat belt with a high force level, which replaces the conventional pyrotechnic seat belt, is classified as ASIL B [22].
Thus, it is not sufficient to consider only whether the Pre-Crash system is reversible or irreversible; the relevant question is whether there is a risk of harm for all traffic participants and which risk factors of the system can lead to it. The target Pre-Crash system hazard and risk should be assessed according to the five proposed risk characteristics, which include:
• Risk of injury by the PCS for host vehicle occupants
• Risk of injury by the PCS for opponents
Depending on the ASIL classification of the targeted Pre-Crash system, different requirements arise for the accepted probability of system failure and thus for the robustness and accuracy of the crash severity prediction function. Examples of addressed PCS and a classification of risk characteristics are shown in Table 1. While the predictive eCall function is classified as QM, the Pre-Crash airbag activation receives an ASIL D classification. The fallback level defines a secondary system, which is engaged when the primary system fails. In the case of Pre-Crash airbag activation, there is no fallback level if the system is activated as a result of a false positive prediction, since the airbag is already deployed. In addition, a false positive Pre-Crash activation of the airbag can provoke a collision, which leads to an injury risk not only for the host, but also for other traffic participants. In the case of a false negative prediction, i.e., no Pre-Crash airbag activation, the conventional In-Crash activation of the airbag is still possible as a fallback level.
Thus, a crash prediction function developed and validated according to ASIL B meets the safety requirements for the reversible seat belt deployment with a high force level, but not for the Pre-Crash activation of the airbags or collision constellation optimization. Based on the HARA results, a functional modification of the system can be made before further development and validation steps are performed.
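The dependency between the ASIL of the crash prediction function and the Pre-Crash systems it may serve can be expressed as a simple ordering check. The sketch below is an illustrative simplification: the names `ASIL_ORDER`, `csp_is_sufficient` and `TARGET_SYSTEMS` are hypothetical, only classifications stated in the text are used, and mechanisms such as ASIL decomposition are not considered.

```python
# Integrity levels ordered from lowest to highest safety relevance.
ASIL_ORDER = ["QM", "A", "B", "C", "D"]


def csp_is_sufficient(csp_asil: str, pcs_asil: str) -> bool:
    """True if a prediction function validated to csp_asil covers pcs_asil."""
    return ASIL_ORDER.index(csp_asil) >= ASIL_ORDER.index(pcs_asil)


# Classifications as stated in the text (illustrative subset).
TARGET_SYSTEMS = {
    "predictive eCall": "QM",
    "reversible seat belt, high force level": "B",
    "Pre-Crash airbag activation": "D",
}

# Which target systems could rely on a prediction function validated to ASIL B?
covered = [name for name, asil in TARGET_SYSTEMS.items()
           if csp_is_sufficient("B", asil)]
print(covered)
```

For a prediction function validated to ASIL B, the check admits the predictive eCall and the high-force reversible seat belt, but not the Pre-Crash airbag activation.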

Test Scenario Requirements
A data-driven approach to scenario generation is chosen. Logical scenarios are utilized to describe the parameter ranges of the state values used for scenario representation and concrete test scenarios for the verification and validation (V&V) tests [20]. In order to describe driving scenarios, a five-layer model is proposed by Bagschik et al. [23]. The scenario is described by five levels: road (L1), traffic infrastructure (L2), temporary manipulation of L1 and L2 (L3), objects (L4) and environment (L5). Scholtes et al. have proposed a six-layer model by adding digital information as a sixth layer and placing static objects on L2 [24]. To create concrete test scenarios with sufficient data depth for the perception-based crash prediction function, information from levels one to five is required. L6 information is not currently part of this study; considering the increasing use of V2X communication in modern vehicles, however, it should be used in future testing.
Test scenarios for the CSP function must represent real-world driving situations and are divided into three scenario groups: collision scenarios, critical situations and regular driving. For collision scenarios, all collisions in the area of action of the function should be considered. A critical situation extends from potential conflicts to near accident scenarios. Various criticality metrics are discussed by Hruschka, Töpfer and Zug [25]. The most common metrics with a focus on collision avoidance are time to collision (TTC), time to distance (TTD), time to brake (TTB) and time to steer (TTS). More advanced metrics, such as crash severity and collision configuration distribution, assess the critical situation risk based on the potential accident severity. To validate the Pre-Crash system, various critical situations from a low to a high criticality level are required. Regular driving is the test baseline; it represents the different states in the area of action, such as different environmental, road, traffic infrastructure, traffic flow, lighting and other conditions.
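As an illustration of the simplest of these criticality metrics, the time to collision for a car-following situation can be computed from the gap and the closing speed. The sketch assumes constant velocities and movement along one lane; the function name `time_to_collision` and the example values are hypothetical.

```python
import math


def time_to_collision(gap_m: float, v_follow_mps: float, v_lead_mps: float) -> float:
    """TTC in seconds under a constant-velocity assumption.

    Returns math.inf when the following vehicle is not closing in.
    """
    if gap_m < 0:
        raise ValueError("gap must be non-negative")
    closing_speed = v_follow_mps - v_lead_mps
    if closing_speed <= 0:
        return math.inf                  # no collision course
    return gap_m / closing_speed


print(time_to_collision(20.0, 15.0, 5.0))   # 2.0
print(time_to_collision(20.0, 10.0, 12.0))  # inf
```

A low TTC marks a highly critical situation, whereas an infinite TTC corresponds to regular driving without a collision course.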
Considering the addressed Pre-Crash systems, concrete scenarios are classified by must fire, may fire and no fire scenarios. Exclusively for PCS classified as QM, the condition may fire is established. Considering the crash prediction function, must fire and may fire collisions, as well as may fire nearby collisions, are required to test TP (true positive) prediction. No fire collisions, nearby collisions and regular driving are required to test TN (true negative) crash prediction. Figure 4 shows different test scenario classifications and true prediction classes according to the system under test and their activation requirements. Any prediction assignments other than those shown in Figure 4 lead to false positive (FP) or false negative (FN) results.

Test Scenario Requirements
A data-driven approach to scenario generation is chosen. Logical scenarios are utilized to describe the parameter ranges of the state values used for scenario representation, and concrete test scenarios are utilized for the verification and validation (V&V) tests [20]. In order to describe driving scenarios, a five-layer model is proposed by Bagschik et al. [23]. The scenario is described by five levels: road (L1), traffic infrastructure (L2), temporary manipulation of L1 and L2 (L3), objects (L4) and environment (L5). Scholtes et al. have proposed a six-layer model by adding digital information as a sixth layer (L6) and placing static objects on L2 [24]. To create concrete test scenarios with sufficient data depth for the perception-based crash prediction function, information from levels one to five is required. L6 information is not currently part of this study, but considering the increasing use of V2X communication in modern vehicles, it should be used in future testing.
Test scenarios for the CSP function must represent real-world driving situations and are divided into three scenario groups: collision scenarios, critical situations and regular driving. For collision scenarios, all collisions in the area of action of the function should be considered. A critical situation extends from potential conflicts to near accident scenarios. Various criticality metrics are discussed by Hruschka, Töpfer and Zug [25]. The most common metrics with a focus on collision avoidance are time to collision (TTC), time to distance (TTD), time to brake (TTB) and time to steer (TTS). More advanced metrics, such as crash severity and collision configuration distribution, assess the risk of a critical situation based on the potential accident severity. To validate the Pre-Crash system, various critical situations from a low to a high criticality level are required. Regular driving is the test baseline; it represents the different states in the area of action, such as different environmental, road, traffic infrastructure, traffic flow, lighting and other conditions.
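As an illustration of the simplest of these metrics, the following minimal sketch computes TTC under a constant-velocity assumption, i.e., the remaining gap divided by the closing speed. The function names and the two threshold values used to bin situations into criticality levels are hypothetical and chosen for illustration only; they are not taken from [25] or from this study.

```python
import math


def time_to_collision(gap_m: float, closing_speed_mps: float) -> float:
    """Constant-velocity TTC in seconds: remaining gap / closing speed.

    Returns math.inf when the two objects are not approaching each other.
    """
    if gap_m < 0:
        raise ValueError("gap_m must be non-negative")
    if closing_speed_mps <= 0:
        return math.inf
    return gap_m / closing_speed_mps


def criticality_level(ttc_s: float, thresholds_s=(1.0, 3.0)) -> str:
    """Bin a TTC value into a coarse criticality level.

    The thresholds are hypothetical placeholders, not validated values.
    """
    high, low = thresholds_s
    if ttc_s <= high:
        return "high"
    if ttc_s <= low:
        return "low"
    return "uncritical"


# Example: 20 m gap, closing at 10 m/s
ttc = time_to_collision(20.0, 10.0)
level = criticality_level(ttc)
```

A test catalog that spans low to high criticality could then be checked for coverage by counting the scenarios that fall into each level.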
Considering the addressed Pre-Crash systems, concrete scenarios are classified as must fire, may fire and no fire scenarios. The condition may fire is established exclusively for PCS classified as QM. Considering the crash prediction function, must fire and may fire collisions, as well as may fire near collisions, are required to test TP (true positive) prediction. No fire collisions, near collisions and regular driving are required to test TN (true negative) crash prediction. Figure 4 shows different test scenario classifications and true prediction classes according to the system under test and their activation requirements. Any prediction assignments other than those shown in Figure 4 lead to false positive (FP) or false negative (FN) results.
This results in different requirements for the information that is contained within the different scenario groups. All groups require dynamic and static information about the road network, road users and environment to build the test scenario, as defined for L1 to L5 [23]. In addition, for collision scenarios, In-Crash information is required. For the testing of ASIL A+ Pre-Crash systems, it is necessary to define the ground truth of the technical crash severity, such as the crash pulse or delta-v values. For QM systems, a ground truth of at least the collision configuration is sufficient.

Data Acquisition
Representative test scenarios are required to test Pre-Crash systems. One of the biggest challenges in scenario mining for ADAS/AD and safety systems is the collection of high quality and in-depth real-world data on a large scale. Test scenario data sources on macroscopic, mesoscopic and microscopic levels are presented by Mai et al. [26]. A modified and expanded illustration with additional types of data sources and concrete dataset examples is shown in Figure 5.
A common limitation is the inverse relationship between the amount of data and the data depth. While the official accident statistics cover all traffic accidents recorded by the police, the depth of data is usually not sufficient to define concrete test scenarios. More important for test scenario extraction is not the amount, but the combination of the representativeness of the data and the data depth. These properties are fulfilled by in-depth accident databases. The GIDAS database contains detailed information on German traffic accidents with injured participants. The accidents are recorded at the scene of the accident and reconstructed in order to obtain information about all five phases of the accident, as shown in Figure 1. The GIDAS PCM (Pre-Crash Matrix) database contains a large sample of reconstructed GIDAS Pre-Crash scenarios in a specific PCM format. It describes the Pre-Crash phase over about 5 s before the collision and contains the information on participants, their dynamics and the environment [27]. Currently, the GIDAS PCM database contains 11,074 collision scenarios, 40% of which are car to car collisions [28]. The PCM scenarios are used for the simulative testing of driver assistance and active safety systems in a virtual environment [29]. However, GIDAS includes neither accidents without injured persons nor near accident scenarios.
Naturalistic driving studies and field operational tests collect real-world data during regular driving. Nevertheless, critical situations and even a small number of collisions also occur in large traffic data collection campaigns. For example, the SHRP2 study contains over 36,000 regular driving, near collision and collision events [30]. However, the collisions represent only a small part of the SHRP2 dataset.
Modern vehicles are equipped with advanced vehicle dynamics, interior and environmental sensors, which enable the collection of real-world driving data on a mesoscopic level. The amount and depth of the data is flexible and depends on the vehicle equipment and the definition of the data collection campaign. Thus, the highest possible amount of data corresponds to the entire vehicle fleet, which would vastly exceed conventional data sources. However, the protection of personal data must be ensured.

Machine Learning Scenario Extraction
One possibility for the extraction of the relevant test scenarios from the acquired accident database is the explorative machine learning approach. Since our goal is to investigate the occurrence of collision configurations in real-world accident data, no external labels should be defined for the data. Thus, an unsupervised machine learning approach without data labeling was chosen. Furthermore, the property of clustering to organize similar cases into one group makes the extraction of representative test scenarios possible. The clustering aim and input data selection are derived from the test scenario requirements previously defined in step C. The proposed scenario clustering methodology is shown schematically in Figure 6.
Depending on the database and the selection of the input features, both accident scenarios and critical situations can be grouped by cluster analysis. However, the selected input features directly determine the clustering results. To test a Pre-Crash function, both the Pre-Crash and In-Crash information in the test scenario are required. The scale level of the different features should be considered. Ratio scale features are well suited for cluster analyses since they show rankings and interpretable distances and can be transformed in the preprocessing. Thus, variables of both participants, such as collision velocity, impact angle, floating angle, impact point along the contour of the vehicles, vehicle mass and other In-Crash variables, can be used directly for collision configuration clustering.
Additionally, the k-nearest neighbors (k-NN) algorithm is utilized to determine the representative cases within the clusters. Concrete representative cases from each cluster are selected to create a concrete test scenario catalog. For logical test scenarios, the parameter ranges of the features used for scenario representation should be limited by the standard deviation of these features within the clusters. Through k-NN, it is possible to select more than one representative per cluster instead of only the cluster center. In this way, test case catalogs with both concrete and logical scenarios can be created.
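The selection step can be sketched as follows. This is a minimal illustration, assuming Euclidean distance on already standardized features and a logical parameter range of mean plus/minus one standard deviation per feature; the function names `nearest_to_center` and `logical_ranges`, the parameter `n_std` and the sample values are hypothetical and not part of the study.

```python
import math
import statistics


def nearest_to_center(cases, k=3):
    """Return the cluster mean and the k cases closest to it (Euclidean)."""
    dims = len(cases[0])
    center = [statistics.fmean(c[d] for c in cases) for d in range(dims)]
    ranked = sorted(cases, key=lambda c: math.dist(c, center))
    return center, ranked[:k]


def logical_ranges(cases, n_std=1.0):
    """Per-feature range mean +/- n_std * standard deviation in the cluster."""
    ranges = []
    for d in range(len(cases[0])):
        values = [c[d] for c in cases]
        mu = statistics.fmean(values)
        sigma = statistics.pstdev(values)
        ranges.append((mu - n_std * sigma, mu + n_std * sigma))
    return ranges


# Made-up cluster members with two features each (not GIDAS data)
cluster = [(10.0, 0.0), (20.0, 10.0), (15.0, 5.0), (15.0, 5.0)]
center, representatives = nearest_to_center(cluster, k=2)
ranges = logical_ranges(cluster)
```

The representatives would feed the concrete scenario catalog, while the ranges bound the parameter space of the corresponding logical scenario.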
A proof-of-concept clustering on the GIDAS dataset of vehicle to vehicle crashes, collected in the years 2000 to 2019, was performed to demonstrate the challenges in the evaluation of clustering results. The goal of this clustering was to identify representative collision configurations and to find the optimal number of clusters, in order to reduce the test effort of the system without losing relevant information. The dataset was filtered by the area of action of the function under test. A sample of 20,239 cases on the passenger level was used for clustering. The K-means++ clustering algorithm, which was introduced by Arthur and Vassilvitskii [31], was utilized. It differs from the conventional K-means algorithm in the initialization method of the cluster centers, which helps to increase the cluster quality and reduces the risk of converging to a poor local minimum. The feature input set consisted of ten overall collision configuration features, such as the impact points, collision velocity, collision angle and technical crash severity delta-v for both the host and opponent vehicles. Additionally, the input features were standardized.
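To make the difference in initialization concrete, the following is a minimal pure-Python sketch of K-means++ seeding followed by standard Lloyd iterations on z-score standardized features. It is an illustration under simplifying assumptions, not the implementation used in this study: the helper names, the fixed seed and the six two-feature sample rows are hypothetical, whereas the study used ten features and 20,239 cases.

```python
import math
import random
import statistics


def standardize(rows):
    """Z-score every feature column (zero mean, unit variance)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]


def kmeanspp_init(points, k, rng):
    """Pick k start centers; each new one is drawn with probability
    proportional to its squared distance to the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd iterations started from a K-means++ seeding."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            updated.append(tuple(statistics.fmean(col) for col in zip(*members))
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    labels = assign(points, centers)
    # Distortion: sum of squared distances to the assigned center.
    distortion = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, distortion


# Made-up (collision velocity, delta-v) pairs in km/h, not GIDAS data.
rows = [(15, 9), (18, 10), (12, 8), (130, 60), (125, 58), (135, 63)]
labels, centers, distortion = kmeans(standardize(rows), k=2)
```

Standardizing first matters because features with large numeric ranges, such as velocities, would otherwise dominate the Euclidean distance over features with small ranges.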
In unsupervised learning, no labels are used, which makes the evaluation of the clustering results more complicated. The external clustering validity criteria cannot be used here because no labeling of the generated clusters is available. Relative and internal criteria, such as distortion and silhouette scores, are applied to assess the cluster validity. Figure 7 shows the distortion and silhouette scores for the clustering results, from 1 to 100 clusters. While the distortion score is continuously reduced with an increasing number of clusters, the silhouette score reaches the peak at four clusters and drops rapidly with an increasing number of clusters.
According to the elbow method on the distortion score plot, the optimal number of clusters is about 15. This would also be a very rudimentary representation of the diversity of all vehicle to vehicle collision configurations. From the silhouette score plot, four clusters show the best clustering structure. Obviously, only four clusters cannot represent all relevant vehicle to vehicle collisions. Similar results have been obtained with various clustering algorithms on the same dataset. It is clear that common distance and similarity-based clustering evaluation metrics, such as distortion or silhouette scores, are not sufficient to validate the traffic scenario clustering results in detail. However, they can help to show a tendency. Nevertheless, the evaluation of the clusters by expert knowledge is required.
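For reference, the silhouette score used above can be written out in a few lines. The sketch below follows the standard definition s = (b - a) / max(a, b) per point, where a is the mean distance to the other members of the own cluster and b is the smallest mean distance to the members of any other cluster. The function name and the one-dimensional sample points are hypothetical; scoring singleton clusters as 0 is a common convention and is adopted here as an assumption.

```python
import math
import statistics


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of another cluster
        b = min(statistics.fmean([math.dist(p, q) for q in other])
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return statistics.fmean(scores)


# Made-up one-dimensional points forming two obvious groups
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])  # compact, well separated
bad = silhouette_score(points, [0, 1, 0, 1])   # groups mixed up
```

Because the score rewards compact, well-separated groups, it tends to favor a few coarse clusters, which is consistent with the peak at four clusters reported above and illustrates why it cannot replace the expert assessment.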
The current GIDAS clustering results already allow the reduction of thousands of collision configurations to fewer than 50 representative test cases. For this study, a catalog of 35 clusters representing car to car collision configurations was created. The clustered collision configurations were evaluated by experts based on the system definition and concrete testing requirements. In doing so, the collision velocity, impact points, collision angle, delta-v values and injury severity of the cluster representatives were examined. To better understand the results, the collision configurations of all the accident cases within the clusters were visualized.
A few examples of the clusters are presented. Figure 8 shows, on the left, the cluster C1 with the highest number of cases, which represents low speed rear-end collisions. The blue box is the host vehicle, and the red boxes are all the opponent vehicles within the cluster. By determining the cluster center, a concrete collision configuration for this cluster is derived. The median velocity in this cluster is 15 km/h, the delta-v is 9 km/h and the percentage of seriously injured passengers in the host vehicle is 2%. On the right of Figure 8 is the cluster C2, which represents the most severe accidents in the database. The median velocity in this cluster is 132 km/h, the delta-v value is 61 km/h and the percentage of seriously injured passengers is 52%.
However, the implemented clustering algorithm also manages to form unique clusters for some critical accident constellations. This is shown in Figure 9 by cluster C3. This is a cluster with accidents at a very high collision velocity; however, the percentage of serious injuries is in the low-medium range. The median velocity in this cluster is 130 km/h, the delta-v value is 15 km/h and the percentage of seriously injured passengers is 12%. This can be explained by sliding effects during collision. This cluster demonstrates that, with a careful selection of the clustering parameterization and the relevant input features, such special cases can be identified as test cases.
Nevertheless, clustering can still blur some rare and unique scenarios. That is why we used a k-NN algorithm that helps to create the logical test scenario catalog. The use of logical test scenarios in the parameter space of the defined clusters prevents the disappearance of the corner cases as test cases.
Regular driving scenarios do not necessarily require machine learning analysis. For the baseline, a sufficiently large number of driving hours and the diversity of different driving conditions should be represented. Therefore, a representative assessment of a regular driving dataset is required.

Scenario Relevance Assessment
After the scenario representatives had been extracted, the relevance of each scenario in the testing context was evaluated. The relevance of a test scenario is directly related to the tested system and depends on several influencing factors. Test cases that are legally required for the release of the function receive the highest relevance rating. This is followed by test cases that are rated by car safety performance assessment programs, such as EuroNCAP. In addition to these standardized test scenarios, the relevance of the identified collision, critical and regular driving test sequences in the field data, i.e., in real-world traffic, should be evaluated. For this purpose, a relevance metric is required. For collision scenarios, the field relevance of a cluster can be represented through the combination of the injury severity and the probability of exposure, which is defined as a risk function. Additionally, the technical crash severity (delta-v) of the cluster is considered as a third relevance criterion. The relevance assessment is necessary for potential functional modification decisions.
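One possible reading of this risk function is sketched below. It assumes, for illustration only, that the combination is a simple product of the exposure (share of field cases in the cluster) and the injury severity (share of seriously injured passengers), with delta-v used as a tie-breaker. The injury shares and delta-v values are those reported for clusters C1 to C3; the case counts are made up, since the cluster sizes are not stated here, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    n_cases: int                 # hypothetical number of field cases
    serious_injury_share: float  # share of seriously injured passengers, 0..1
    delta_v_kmh: float           # technical crash severity of the cluster


def field_relevance(cluster: Cluster, total_cases: int) -> float:
    """Risk as exposure probability times injury severity (assumed product form)."""
    exposure = cluster.n_cases / total_cases
    return exposure * cluster.serious_injury_share


def rank_by_relevance(clusters):
    """Sort clusters by risk, using delta-v as a secondary criterion."""
    total = sum(c.n_cases for c in clusters)
    return sorted(clusters,
                  key=lambda c: (field_relevance(c, total), c.delta_v_kmh),
                  reverse=True)


catalog = [
    Cluster("C3", n_cases=200, serious_injury_share=0.12, delta_v_kmh=15.0),
    Cluster("C1", n_cases=3000, serious_injury_share=0.02, delta_v_kmh=9.0),
    Cluster("C2", n_cases=100, serious_injury_share=0.52, delta_v_kmh=61.0),
]
ranking = [c.name for c in rank_by_relevance(catalog)]
```

With these made-up counts, a frequent but mild cluster can outrank a rare but severe one, which is why the weighting between exposure and severity would need to be agreed on before the metric is used for modification decisions.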

Validation Tests
Validation tests are performed in virtual simulations based on the developed test scenario catalog, and contain the crash severity prediction algorithm as well as the virtual vehicles and the environment. For this purpose, the software IPG CarMaker and the clustered GIDAS collision configuration scenarios were utilized. Concrete collision scenarios were extracted as cluster representatives and the case number was searched in the GIDAS PCM database. Since not every GIDAS case contains a GIDAS PCM case, the k-NN algorithm was used here again and the closest cases to the cluster center were considered as representatives of those that can be found in the GIDAS PCM database. This also covered the diversity of the different Pre-Crash phases, since the clustering represents the collision configuration, but the Pre-Crash phase may be different. The GIDAS PCM scenarios were converted into the IPG CarMaker scenario formats. In addition to real accident scenarios from the GIDAS PCM database, generic accident scenarios were created for the identified representative collision configurations.
The tests pursued the goal of verifying the predicted outputs with regard to a defined ground truth. Each predicted variable was compared against the nominal ground truth value from the database at each prediction timestep in the Pre-Crash phase. An example is shown in Table 2, where VREL is a GIDAS variable for the collision velocity. Thus, each predicted variable can be verified independently of the others. The time before the collision instant t0, known as TTC, from which the prediction performance is sufficiently good should be determined. However, a purely quantitative assessment of the prediction performance is not sufficient without a connection to the targeted Pre-Crash system. Crash severity prediction contains a predicted distribution of potential collision configurations and their severity values. Methods to generate fire, may fire and no fire signals from these distributions must be developed. As a result, the requirements for prediction performance must be determined by the Pre-Crash systems. Moreover, simulations composed of the complete system under test, including the individual components, such as vehicle sensors, controllers, data transmission and processing, crash severity prediction and Pre-Crash actuators, are required for reliable validation.
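The per-timestep comparison can be sketched as follows. The function walks backwards in time from the collision and returns the largest TTC from which the prediction stays within a tolerance of the ground truth until impact. The function name, the tolerance of 5 km/h and the prediction values are hypothetical and are not taken from Table 2.

```python
def earliest_reliable_ttc(predictions, ground_truth, tolerance):
    """Largest TTC from which all later predictions stay within tolerance.

    predictions: iterable of (ttc_s, predicted_value) pairs in any order.
    Returns None if even the last prediction before impact is out of tolerance.
    """
    reliable = None
    # Ascending TTC means starting at the collision and moving back in time.
    for ttc, value in sorted(predictions):
        if abs(value - ground_truth) > tolerance:
            break
        reliable = ttc
    return reliable


# Hypothetical collision velocity predictions in km/h over the Pre-Crash phase
vrel_predictions = [(1.0, 62.0), (0.8, 56.0), (0.6, 53.0), (0.4, 51.0), (0.2, 50.0)]
ttc_ok = earliest_reliable_ttc(vrel_predictions, ground_truth=50.0, tolerance=5.0)
```

Running this check per predicted variable keeps the variables decoupled; the resulting TTC values can then be compared with the activation time that a given Pre-Crash actuator needs.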

Safety and Effectiveness Assessment
The results of the validation tests were analyzed as a part of the safety and effectiveness assessment argumentation. Figure 10 shows the confusion matrix for PCS activation. True positive predicted cases define the area of efficiency of the system under test, which is usually smaller than the originally defined area of action. The degree of efficiency (DoE) of the system under test is defined in [32] as the quotient of the area of efficiency and the area of action. In addition to the simulative determination of the DoE, a retrospective determination on the basis of the accident data collection is possible. However, the DoE shows the efficiency of the system but does not provide proof of safety. The acceptable probabilities for FP and FN cases are defined based on the ASIL classification of the concrete system and must be proven during the testing. For example, when considering FN classification, the main consideration is whether a fallback level is present or not. However, there is a trade-off between the advantage of the TP cases and the disadvantage of the FN and FP cases. Thus, a positive risk balance proof is necessary. TN cases have neither a negative nor a positive effect, while the FP and FN cases lead to potential hazards. For the positive risk balance, the sum of the injury reduction in all the TP cases has to be higher than the sum of injuries potentially caused by all the FP and FN cases.
Figure 10. Pre-Crash system triggering confusion matrix.
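The bookkeeping behind the confusion matrix, the DoE and the positive risk balance can be sketched as follows. This is a simplified illustration with several assumptions: only must fire and no fire scenarios are distinguished (may fire is left out), the area of action is counted as all cases in which activation is required, and the injury benefit and harm values are unitless made-up numbers. None of the names are taken from [32].

```python
from collections import Counter


def outcome(should_fire: bool, fired: bool) -> str:
    """Confusion-matrix cell for one test case."""
    if should_fire:
        return "TP" if fired else "FN"
    return "FP" if fired else "TN"


def degree_of_efficiency(counts: Counter) -> float:
    """Area of efficiency (TP) over area of action (assumed here as TP + FN)."""
    in_area = counts["TP"] + counts["FN"]
    return counts["TP"] / in_area if in_area else 0.0


def positive_risk_balance(benefit_tp, harm_fp, harm_fn) -> bool:
    """True if the summed injury reduction of the TP cases exceeds the
    summed potential injury caused by the FP and FN cases."""
    return sum(benefit_tp) > sum(harm_fp) + sum(harm_fn)


# Hypothetical test results as (activation required, system activated)
results = [(True, True), (True, True), (True, False), (False, False), (False, True)]
counts = Counter(outcome(s, f) for s, f in results)
doe = degree_of_efficiency(counts)
balance_ok = positive_risk_balance([0.3, 0.2], harm_fp=[0.1], harm_fn=[0.2])
```

A high DoE with a failed risk balance would indicate an effective but unsafe function, which reflects the distinction between efficiency and proof of safety made above.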

Functional Modification
Considering the effectiveness and safety assessment of the function under test and the relevance assessment of the test scenarios, functional modification requirements can be defined to improve the function and increase the safety potential. Functional modifications can lead to a change in the intended functionality or the field of action of the function. Therefore, the presented testing method should be carried out again, after each functional modification, starting with step A.

Discussion of Limitations
This chapter discusses the limitations of the methodological steps described above and provides an outlook for future research.

Crash Severity Validation
A major challenge in validating the predicted crash severity is obtaining ground truth crash severity data. In-depth accident studies reconstruct traffic accidents, and these reconstructions are only an approximation of reality. Additionally, the GIDAS database contains reconstructed scalar values of delta-v, not crash pulses as time series. In order to obtain the crash pulses measured In-Crash, the event data recorder (EDR) signals must be collected. This has been performed only on a small number of GIDAS cases [33]. Another possibility for estimating the crash pulses of the identified representative collision configurations is FEM simulation of car to car collisions; however, this is very cost-intensive.
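The difference between a scalar delta-v and a crash pulse can be illustrated with a short Python sketch. It relies only on the general relation that delta-v is the time integral of the acceleration during the collision. The two pulses, their sampling interval and all names are hypothetical and are not taken from GIDAS or EDR data.

```python
def delta_v(accel_mps2, dt_s):
    """Integrate a sampled acceleration pulse (trapezoidal rule) to get delta-v in m/s."""
    half_sums = sum(0.5 * (a0 + a1) for a0, a1 in zip(accel_mps2, accel_mps2[1:]))
    return half_sums * dt_s


# Hypothetical deceleration pulses in m/s^2, sampled every 10 ms (not measured data).
DT_S = 0.01
short_hard_pulse = [0.0, -200.0, -400.0, -200.0, 0.0]
long_soft_pulse = [0.0, -100.0, -200.0, -200.0, -200.0, -100.0, 0.0]

for name, pulse in [("short/hard", short_hard_pulse), ("long/soft", long_soft_pulse)]:
    print(f"{name}: delta-v = {delta_v(pulse, DT_S):.2f} m/s, peak = {min(pulse):.0f} m/s^2")
```

Both pulses integrate to the same delta-v, although their peak decelerations and durations differ. A reconstructed scalar delta-v therefore cannot distinguish between them, which is why time series data are needed as ground truth.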

Prospective Test Scenarios
Accident databases provide only a retrospective view of traffic accidents. Leledakis et al. present a method for predicting collision configurations in vehicles with crash avoidance systems and identifying the changes in new collision configurations [34]. Many prospective studies use retrospective accident scenarios to create a new prospective test scenario catalog by means of virtual simulations. As addressed by SOTIF, even after data acquisition and test sequence extraction, there is still a possibility of missing relevant scenarios, especially unknown unsafe scenarios. Therefore, continuous traffic data collection and analysis remains necessary in addition to prospective simulation studies.

Global Traffic Data
Differences in traffic accidents between countries are caused by different types of roads, infrastructure, traffic rules, road users and their regional behavior. Thus, accident scenarios that are representative for one country cannot be considered representative for another country without validation. Additionally, the data collection and coding formats differ between databases, which complicates the analysis and scenario extraction. Therefore, the initiative for the global harmonization of accident data (iGLAD) collects and harmonizes accident data from 12 different countries [35]. PCM created from the iGLAD dataset can be utilized for the simulations of international test scenarios [36]. However, the question of representativeness remains unanswered. The same challenges apply to critical driving and regular driving scenarios.

Pre-Crash Phase Clustering
While the clustering of collision configurations has the potential to determine representative retrospective collision configurations, as shown in this study, and even prospective collision configurations, as shown by Leledakis et al. [34], the diversity of Pre-Crash maneuvers that can lead to the same collision is still not captured. Information on the Pre-Crash phase should therefore be additionally considered in the clustering algorithm. As shown in Figure 1, Pre-Crash variables consist mainly of the time series of the participants or of categorically scaled, slowly changing variables. For the clustering of categorical variables, one-hot encoding can be applied to convert the categorical data to numerical form. The original categorical variable is removed and one new binary variable per category is added to the dataset. However, this can lead to a strong increase in the number of input features, which increases the risk of the curse of dimensionality.
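The one-hot encoding step described above can be sketched in plain Python as follows. The records, the variable names and the maneuver categories are hypothetical and serve only to show how the number of input features grows with the number of categories.

```python
def one_hot_encode(rows, column):
    """Replace one categorical column by one binary column per observed category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Drop the original categorical variable and keep all other variables.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical Pre-Crash records (illustrative names and values only).
records = [
    {"initial_speed_kmh": 50.0, "maneuver": "braking"},
    {"initial_speed_kmh": 30.0, "maneuver": "turning_left"},
    {"initial_speed_kmh": 70.0, "maneuver": "lane_change"},
]
encoded = one_hot_encode(records, "maneuver")
print(len(records[0]), "features before,", len(encoded[0]), "features after")
```

In this toy example, a single categorical variable with three categories turns two input features into four. With many categorical Pre-Crash variables, each with many categories, the feature space grows accordingly, which is the dimensionality risk noted above.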
The first three limitations mentioned above can be completely solved by targeted in-depth fleet data collection. Thus, fleet data has the greatest potential for further representative traffic data collection and is considered essential for the V&V process of ADAS/AD and novel safety systems.

Conclusions
In this paper, the authors have elaborated on the safety potentials and risks of innovative crash severity prediction and Pre-Crash systems. A novel validation strategy for the crash severity prediction function was proposed in the context of the ISO 26262 and ISO/PAS 21448 standards. The relevance of a reliable and precise real-time crash severity prediction function was demonstrated using the potential Pre-Crash use cases. In doing so, the authors identified fundamental differences in the validation requirements of QM- and ASIL A+-related CSP functions. Test scenario classification criteria for collisions, critical situations and regular driving, combined with must fire, may fire and no fire signals, were presented. In addressing a scenario-based testing approach, unsupervised machine learning was chosen as an exploratory data analysis technique to extract representative and relevant test scenarios from real-world data. A novel unsupervised machine learning methodology for the creation of concrete and logical test scenario catalogs was developed with K-Means++ and k-NN algorithms. The methodology was implemented on the GIDAS database to create 35 clusters of representative collision configurations in the area of action of the CSP function under test. The presented methodology for machine learning scenario extraction is transferable to other databases and can be carried out for different systems under test. Furthermore, not only the fulfillment of the ASIL-classified probability of failure requirements but also the evidence of a positive risk balance were defined as essential conditions for the proof of safety of Pre-Crash systems. Finally, the limitations of the presented method were disclosed.
Future research should focus on the identified limitations. The difficulties in mining representative traffic data with the corresponding data depth, such as crash severity information, should be solved as soon as possible. In this respect, fleet data appears to be the most promising source of representative traffic data. Moreover, the combination of In-Crash and Pre-Crash variables should be considered in further scenario extraction. One question that remains unanswered is what injury reduction benefit the Pre-Crash systems can provide for a specific prediction performance. For this reason, methods to generate fire signals from crash prediction distributions must be developed in the future. The complete chain from crash severity prediction to the activation of Pre-Crash systems and passenger injury analysis should be taken into account for a valid effectiveness and safety assessment of the crash severity prediction function.

Conflicts of Interest:
The authors declare no conflict of interest.