Analysis of Full-scale Riser Responses in Field Conditions Based on Gaussian Mixture Model

experience complex and combined load conditions from waves, current and vessel motions that may result in both wave frequency and vortex shedding response patterns. Field measurements often consist of records of environmental conditions and riser responses, typically with 30-minute intervals. These data can be represented in a high-dimensional parameter space. However, it is difficult to visualize and understand the structural responses, as they are affected by many of these parameters. It becomes easier to identify trends and key parameters if the measurements with the same characteristics can be grouped together. Cluster analysis is an unsupervised learning method, which groups the data based on their relative distance, density of the data space, intervals, or statistical distributions. In the present study, a Gaussian mixture model guided by domain knowledge has been applied to analyze field measurements. Using the 242 measurement events of the Helland-Hansen riser, it is demonstrated that riser responses can be grouped into 12 clusters by the identification of key environmental parameters. This results in an improved understanding of complex structure responses. Furthermore, the cluster results are valuable for evaluating the riser response prediction accuracy.


Introduction
Slender marine structures, such as marine risers or power cables, are exposed to complex load conditions, i.e., wave, current, vessel motions and vortex induced vibrations (VIV) due to current.Among these, VIV in particular is often a critical design consideration as it can lead to fast accumulation of fatigue damage and amplified drag loads in face of strong currents.Extensive research has been performed over the years on this topic.However, most research efforts have been based on laboratory tests with a simplified structure model and flow conditions, and compromising scaling effects.There is also lack of a model test that can combine different load processes simultaneously as experienced in the field environmental conditions due to cost and physical limitations of the test setup.On the other hand, field measurements become increasingly available with the increased use of sensors.However, field measurement data have larger uncertainty compared to laboratory tests due to the relatively low sensor density and measurement quality for the complex response and load conditions.
Prediction accuracy using Frequency Domain (FD) semi-empirical prediction programs, such as VIVANA-FD (Larsen et al., 2009), Shear7 (Vandiver and Li, 2007) and VIVA (Triantafyllou et al., 1999) are continuously improved through comparison against laboratory test data (Voie et al., 2017(Voie et al., , 2016)).However, there exist larger uncertainties in VIV response predictions when comparing with field measurement data, see e.g.(Tognarelli et al., 2008;Ge et al., 2014).In (Tognarelli et al., 2008) it was concluded that "For full scale drilling risers without VIV suppression, data show that state-of-the-art analysis methods are, on average, inherently 30X (30 times) conservative on a maximum fatigue damage basis.This average bias may be reduced by adjusting the maximum lift in the lift curve utilized in the method; however, the ability to do this is limited due to the significant scatter in measured fatigue damage due to VIV."It is therefore important to reduce such uncertainties and increase safety in structure design against VIV loads.There is also a need to obtain insights on the complex physical processes experienced by the structure under field conditions.Some questions to be addressed are: 1) When will significant VIV responses occur?2) How will different load conditions affect the structural responses?3) How will the findings affect present prediction practices?
Field measurements often consist of records of environmental conditions and structural responses typically with 30-minute intervals.Many environmental conditions will affect the structural responses, e.g., current profiles (speed, directionality and shearedness), direction and magnitude of wave force, and vessel motion induced loads.These parameters form a high-dimensional parameter space, which is difficult to visualize and interpret.Traditionally, the kurtosis value of the signal has been used as an indication in the case of a lock-in event.Kurtosis is defined as  4 ∕ 2 2 , where  4 and  2 are the 4th and 2nd moments of the signal's distribution, respectively.For a Gaussian process like the typical multi-frequency random vibrations, the kurtosis value is 3.0.For a sinusoidal process like the typical single mode lock-in event, the kurtosis value is equal to 1.5.However, this parameter alone provides limited information as multiple loads are normally present at the same time and the corresponding kurtosis will be between 1.5 -3.0.
It is easier to identify trends and key parameters if measurements with similar characteristics can be grouped together instead of examining cases individually.Cluster analysis is the process of grouping together data by common or similar characteristics (Estivill-Castro, 2002;Xu and Tian, 2015;Gan et al., 2021).The applications of cluster analysis are wide and across many different fields.Oftentimes clustering is an early key step within a larger framework.In data mining however, the identification of the different groups is the main purpose of doing cluster analysis.In some applications clustering is mainly used to identify similarities and differences between the individual data points.In the field of machine learning, cluster analysis is a subclass of unsupervised learning, since it is not dependent on training data assigned to predefined groups.Depending on the specific choice of algorithm, the data can be grouped based on different measures of distance, density of the data space, graphs, or statistical distributions.Based on the results of a previous study, on experimental data (Wu et al., 2020b), density and distribution based methods were applied for the field data in the present study.
It is known that the underlying mathematical models and empirical hydrodynamic coefficients of the present frequency domain VIV prediction programs are based on simplified assumptions.VIV responses of a slender beam are influenced by both the structural properties and flow characteristics.It was demonstrated in a concept study that the adaptive empirical hydrodynamic parameters may be applied to reduce the prediction discrepancy (Wu et al., 2020b).A semi-empirical time domain VIV model based on the synchronization concept was proposed by (Thorsen et al., 2014(Thorsen et al., , 2016(Thorsen et al., , 2017)).The pure cross-flow (CF) load model has been implemented in the prediction software VIVANA-TD after a systematic validation based on various laboratory test data in both constant flow (Wu et al., 2020a) and oscillatory flow conditions (Wu et al., 2022b).The model has been updated to describe CF and in-line (IL) coupled motions (Ulveseter et al., 2019;Kim et al., 2021b) and validated with respect to various deep water riser tests in 2D uniform and shear flow (Ulveseter et al., 2018(Ulveseter et al., , 2019;;Kim et al., 2021b,a).Reasonable VIV response prediction in 3D flow conditions were also obtained by this model (Kim et al., 2022).It can potentially also account for multiple simultaneously acting load processes (Wu et al., 2022a;Yin et al., 2022b).
The objective of the present study is to investigate whether cluster analysis can help to 1) gain good insights of the physical process of different types of riser responses; 2) cluster field measurement data based on environmental load conditions.Gaussian mixture model (GMM) guided by domain knowledge has been applied to analyze the field measurements based on data of a drilling riser from the Helland-Hansen field in 1998 (Pettersen and Saudland, 1998).The cluster results are also used to explain the reasons of the riser response prediction discrepancy.

Riser system and instrumentation setup
The Helland-Hansen drilling riser is 688 m long and the diameter of the bare riser section and the buoyancy element is 0.5 m and 1.13 m respectively.Most parts of the riser are covered by the buoyancy elements.The riser system is shown in Fig. 1 and the properties of the riser are summarized in Table 1.The eigenfrequency of the riser system has been estimated and is shown in Table 2; the eigenfrequency can vary due to the changes in weight and tension during the drilling operations.The tension was not measured during the measurement campaign, but it was estimated to be about 3000 kN to 4000 kN.The riser and vessel motions were measured as well as the current.A total of 916 sets of records were collected, each of 34 minutes in length (Pettersen and Saudland, 1998).
The instrument system to measure riser responses on Helland-Hansen consisted of six instrument containers attached to the riser in positions shown on Fig. 1.Each instrument unit contained motion sensors, data acquisition hardware and batteries.The main sensors were accelerometers for measurement of horizontal acceleration in two or-  (Kaasen et al., 2000), depth given in (m) thogonal axes, X and Y, as shown in Fig. 2. In three of the instrument containers the accelerometers were supplemented with sensors for measuring angular velocity about the X-and Y-axes.
A seventh instrument container was installed in the drilling vessel, including both accelerometers and angular velocity measurement.Current was measured by a number of acoustic doppler current profilers (ADCP), and a rotortype current meter measured current near the seafloor.The current speed and direction were measured at a number of depths.Current measurement for the top 100 metres of the water column was excluded due to the disturbance from the drilling vessel's thrusters.Eigenfrequency of the Helland-Hansen riser system (Kaasen et al., 2000) Eigenfrequency Value (Hz) f 1 0.022 -0.031 f 2 0.047 -0.056 f 3 0.071 -0.079 f 4 0.1 -0.11 f 5 0.12 -0.13 Figure 2: Orientation of measurement on the riser (Kaasen et al., 2000).

Field measurement data processing
The field measurement data has been analyzed in earlier studies (Pettersen and Saudland, 1998;Solaas and Kaasen, 1999;Kaasen et al., 2000), but additional analysis was required in this study, as summarised in this section.
242 sets of the measured data have been analyzed and used.The riser experienced multiple environmental loads simultaneously, i.e., wave loads, VIV loads, vessel motions, etc.Low frequency vessel motion appears at 0.02 Hz, which is also close to the eigenfrequency of the first mode.The response at wave load frequency was typically broad-banded, which is between 0.08 -0.2 Hz.VIV can occur at single or multiple frequencies depending on the flow speed and its profile.The highest expected VIV frequency is in the range of the wave frequency.

Current data analysis
The measured current data has been processed to identify a main current direction.The shearedness and directionality of the current were defined and calculated.
Main current direction: The riser motion is measured in a coordinate system X-Y which is defined by the orientation of the instrument cylinders, see Fig. 2. A new coordinate system   -  is found by a counter-clockwise rotation about the vertical axis.The axis   is defined as the principal or main current direction and axis   is normal to the principal direction.
Directionality: The horizontal current (   ) that is not in the main direction (   ) is given by a spread parameter, sprcoeff, defined as follows: It is assumed that the shedding process is insensitive to the sign of the velocity.For a pure 2-D current profile, sprcoeff becomes zero.For a case with RMS(   ) equal to RMS(   ), sprcoeff becomes unity.
Shearedness: The shear of a current profile is defined as the ratio between the velocity variation △ and the average velocity    .In many of the observed current profiles, current velocity varied a lot with a rather random pattern and a more general shear parameter is therefore defined: The current profile tends to be uni-directional with increasing speed and the shearedness of the profile also increases.

VIV response analysis
Using a modal approach, the lateral displacements along the length of the riser were reconstructed from the gravitycontaminated acceleration measurements, and the response was calculated as a weighted sum of a set of eigenfunctions.
The VIV response frequency (  =    ∕) can be estimated based on the equivalent Strouhal number (  ) of an oscillating cylinder (Potts et al., 2018) and the current speed ( ).Large scatter on Strouhal number is observed as it is affected by many parameters, e.g., the Reynolds number, roughness and model test setups.Strouhal number values around 0.25 -0.3 can be expected for the Reynolds number range 1 − 7 × 10 5 based on the maximum current speed and the external diameter of 1.13 m.This means that the expected VIV response frequency (  ) is about 0.025 Hz when the current speed is 0.1 m/s.This is close to the first eigenfrequency (0.022 -0.031 Hz) of the riser and the first mode may be excited, without considering other factors.E.g., the actual dominating frequency may deviate from the Strouhal frequency when the flow speeds vary strongly along the length of the riser.The highest estimated VIV load frequency is in the range of wave frequencies.

Vessel motion estimation
The drilling vessel subject to wave, current and wind loads moves around its equilibrium position above the wellhead.Such motions can occur both at wave frequency (around 0.1 Hz) and at much lower frequency around 0.02 Hz.The low frequency vessel motion also introduces relative flow speeds at the riser top end.This speed can be the same magnitude as the current speed in some cases.Direct integration of acceleration measurement at vessel is sensitive to errors at such a low frequency.Therefore, the vessel motion was estimated by the modal approach using all of the acceleration measurements on the riser.The measured vessel acceleration represents the resulted magnitude and direction of the wave loads.

Data used in the present study
All of the data has been transformed to the coordinate system defined by the main current direction.The data can be classified into two categories, environmental load parameters and riser response parameters, as shown in Table 3.The environmental load parameters were applied as inputs to the clustering algorithm, while the riser response parameters were used to evaluate the cluster results.It is expected that riser responses will have the same characteristics if it is subject to similar load combinations, provided that the key load processes are correctly represented by these parameters.

Choice of clustering algorithms
The many clustering algorithms from the literature can be divided into different categories.Among these categories there are partitioning methods, hierarchical clustering, spectral clustering, density-based clustering and distributionbased clustering.
Partitioning methods use a distance-based metric to cluster the points based on their similarity.The clusters obtained are spherical-shaped in the parameter space, and not overlapping.K-means clustering belongs to this category (Lloyd, 1982).When the data consists of a known number of clearly separated clusters, this method works well.For the Helland-Hansen data, however, partitioning methods were deemed not effective as we do not expect clearly seperated clusters.
Hierarchical methods form clusters by evaluating the maximum distance needed to connect parts of the cluster (Ward Jr, 1963;Press et al., 2007), so that different clusters are formed at different distances.Thus, the algorithm does not provide a single partitioning of the data set, but instead provide a hierarchy of clusters.Hierarchical methods can be further divided into two types: agglomerative and divisive.Spectral clustering uses the spectrum of the similarity matrix of the data to reduce the number of dimensions before clustering in the low-dimension space (Von Luxburg, 2007).Based on the results of using these of the experimental data considered in (Wu et al., 2020b), neither of these categories are deemed promising in our case.
In density-based methods, HDBSCAN is an example method where clusters are defined as areas of higher density of data points in the parameter space (Kriegel et al., 2011).Different clusters are separated by areas of low density, and data points in such sparse areas are classified as noise.Unlike partitioning methods and hierarchical clustering, density-based clustering can capture arbitrarily shaped clusters.
Gaussian mixture is a form of distribution-based clustering (Attias, 2000;Press et al., 2007).It is a probabilistic model that assumes that the data points originate from a given number of overlapping Gaussian distributions with unknown locations and sizes.Thus the number of clusters must be given as an input parameter, and no data points will be classified as noise.Since the Helland-Hansen data is from continuous distributions of physical origin and has strong overlap in a high dimensional parameter space, and can thus be reasonably approximated by overlapping Gaussian distributions, this algorithm is chosen in the present study.

Clustering with Gaussian mixture models
The Gaussian mixture models used here are based on the seminal papers by Banfield and Raftery (Banfield and Raftery, 1993) and Celeux and Govaert (Celeux and Govaert, 1995).The description of the method given in the following is mostly based on the latter reference.In GMM, it is assumed that the data  = {  1 ,  2 , … ,   } , with independent observations   ∈ ℝ  for  = 1, … , , comes from a random vector with density where   > 0 are the mixing proportions such that ∑  =1   = 1, and Φ( | , Σ) is the density of a Gaussian distribution with mean  and variance Σ. Common assumptions on Σ are given in (Celeux and Govaert, 1995).Then the goal is to find the parameters This optimization task is usually performed in GMM, as in other mixture models, by the expectation-maximization (EM) algorithm of Dempster et al. (Dempster et al., 1977).In an iterative procedure and starting with an initial guess  0 , the expectation (E) step at iteration  of the algorithm consists of estimating the conditional probability   (  ) that   comes from the -th mixture component, given the estimated   .This probability is given by In the maximization (M) step, updated maximum likelihood estimates p , μ , Σ are computed using   (  ) as conditional mixing weights, and this gives  +1 .
Here, -means (MacQueen et al., 1967;Hartigan and Wong, 1979) was applied to initialize the parameters  0 , as suggested in (McLachlan et al., 2019).100 random initialization of the -means algorithm were used to make the results more consistent.Then 100 iterations of the EM algorithm was performed for each  0 obtained.Only the best result was kept, i.e. the parameters with the largest likelihood.

Validation of cluster analysis
Statistical methods for cluster evaluation are commonly divided into two categories: internal and external evaluation (Halkidi et al., 2002a;Gan et al., 2021).These terms reflect whether the test data are obtained internally or externally with respect to the clustering process.Sometimes the terms supervised and unsupervised evaluation are used, but it is important to not confuse supervised validation of clustering with supervised learning.In addition there is a third class of approaches, called relative evaluation, which is not based on statistical tests and can be preferable when computational cost is an issue (Halkidi et al., 2002b).This was not considered in the present study.
An internal evaluation typically results in a single quality score for the clustering in question.Several different methods with corresponding scores exist, including the Davies-Bouldin index, the Dunn index and the silhouette index (Arbelaitz et al., 2013).The motivation behind each of these methods is the intuitive idea that an item in a given cluster should be more similar to the other items in that cluster than items in other clusters.On the other hand, an external evaluation approach can typically consist of comparing cluster results with an existing ground truth classification, manual evaluation by a human expert, or indirect evaluation by evaluating the utility of the clustering in its intended application (Pfitzner et al., 2009).
Both internal and external evaluation suffer from fundamental limitations.Internal evaluation is effective for comparing different clusters against each other, but the scores obtained will be highly biased towards assumptions made in the clustering processes.Furthermore, it is effective for comparing the results with respect to the different optimization objectives used for clustering and evaluation, where you get a high score if they would yield similar results.Similarly, external evaluation is biased towards the ground truth used.This ground truth is not always reliable; if it was very reliable, unsupervised learning would probably not be the preferred method.Furthermore, if the clusters are evaluated by humans, this evaluation will be highly subjective and not necessarily consistent.With this in mind, the evaluation performed in this study is most useful for identifying bad clusters, more than giving an accurate measure of the quality or utility of the end results.

Internal evaluation method
The clusters obtained by GMM were evaluated using silhouette scores (Rousseeuw, 1987) in this study.There is no known efficient algorithm for performing clustering with the objective given by the silhouette index.However, it is one of the most used internal criteria for evaluation, and can be seen as a measure of how well-defined the clusters are.Specifically, a silhouette value is calculated for each item in the data set.It is based on: 1) the mean distance between an item and all other items in the same cluster; 2) the mean distance between an item and all other items in the next nearest cluster.Formally, given a distance metric  ∶ ℝ  × ℝ  → ℝ, 1) is found by calculating for the item  ∈ , where || denotes the number of items in cluster .Then 2) is given by where () is the smallest mean distance of  to all points in any other cluster, of which  is not a member.The cluster with this smallest mean dissimilarity is said to be the "neighboring cluster" of  because it is the next best fit cluster for point .
The silhouette value is given by and the closer the score is to one, the more well-defined the clustering is.Values near zero indicate overlapping clusters.
By calculating silhouette values for all data points, an average for the whole data set and averages for each of the clusters can be obtained.The former provides a measure of how well-defined the clustering is as a whole, and thus scores for comparing different initializations and assumptions, e.g. with respect to the number of clusters.The latter gives a measure of how well-defined individual clusters are, and can thus be used to the quantify the confidence of the different clusters.

External evaluation method
As mentioned above, external evaluation also comes with some issues: if we have accurate "ground truth" labels, then clustering is not needed.However, accurate ground truth labels usually do not exist.In addition, pre-existing labels only reflect one possible partitioning of the data set, which does not imply that there does not exist a different, and maybe better, clustering.
Here, manual clustering was carried out by examining the frequency contents of the measured riser acceleration signals.Examples of acceleration spectra at both unit 3 and 4 in cross flow (CF) direction are shown in Fig. 3, from the measurement record taken on 3rd of May 1998 with maximum current velocity 0.2 m/s, current shearedness 0.37 and directionality parameter 0.79.For this case, the acceleration responses consist of responses due to VIV load at 0.05 Hz, wave load excitation at 0.1-0.2Hz and low frequency motion excitation around 0.02 Hz.The maximum acceleration spectrum value at unit 3 or 4 in CF direction (normal to the main current direction) is used as reference quantity in the manual clustering.In the example case, the peak spectrum value is 0.079 m 2 ∕s 3 at unit 3 and 0.027 m 2 ∕s 3 at unit 4. The larger value of the two, at unit 3, is taken as reference quantity for this case.This is found efficient in finding those cases with responses dominated by VIV load, as the acceleration spectra often have high peaks and narrow bandwidth.On the other hand, the acceleration spectra for those cases dominated by wave load often have round peaks and wide bandwidth.For this reason, a relatively high level of acceleration spectrum value was applied to identify cases dominated by VIV load while a lower level was applied in identifying cases dominated by wave loads.
To obtain an overview of the maximum acceleration spectrum values for all 242 cases, the corresponding cumulative distribution functions (CDF) for both unit 3 and 4 are shown in Fig. 4. The CDFs of both units are in general close to each other.At CDF level of 0.7, the corresponding maximum acceleration spectrum value is 0.06 m 2 ∕s 3 and at CDF level of 0.3, it is 0.02 m 2 ∕s 3 , for both units.The manual clustering was defined based on these two CFD levels: 0.7 for cases with relatively large responses; 0.3 for cases with relatively small responses; and 0.3-0.7 for cases with intermediate responses.Strouhal frequency was used to estimate the expected VIV response frequency.Based on these conditions, the response was manually classified as follows: • Responses dominated by VIV load: spectrum peak value at either unit 3 or 4 larger than 0.06 m 2 ∕s 3 and the corresponding frequency is close to the Strouhal frequency (difference < 0.04 Hz).75 cases were identified as VIV load dominated cases.
• Small response cases: cases at both units smaller than 0.02 m 2 ∕s 3 .61 cases were identified as small response cases.
• Responses dominated by wave load: for the remaining 107 cases, cases with frequency at maximum acceleration spectrum value at either unit 3 or 4 larger in wave frequency range and far away from Strouhal frequency (difference > 0.04 Hz) were recognized as wave load dominated cases.52 cases were identified as wave load dominated cases.
• Responses subject to VIV, wave load and low frequency top motion: The remaining 54 cases were observed to include combined load effects and the response magnitude was intermediate in general.
The above criteria are based on the domain knowledge and can be subjective.The local measurements may not represent the global responses in some cases, e.g., the sensor happens to be located at a small response area, and the actual frequency may be lower than Strouhal frequency.In addition, there can be multiple loads acting simultaneously and multi-frequency response can be present even for VIV load dominant cases, as shown in Fig. 3.However, this method provided an independent evaluation based only on measured responses.Confidence on the cluster results is obtained when GMM using only environmental load parameters gives consistent clustering results compared to the manual evaluation.
242 evaluated cases were further used for verifying the application of GMM for its possibly first-time application on field measurement data.The identified VIV load dominated cases were used as criteria to evaluate the cluster results.It is the intention to reduce the need of manual evaluation.Even so, GMM still provides more details and better interpretability on the riser response due to different load combinations.

Overview of input parameters
A total of 7 parameters were selected representing environmental loads due to current, waves and vessel motions.They represent key load processes that can affect riser response.The distributions of cluster analysis input parameters are presented in Fig. 5.It is difficult to visualize this high dimensional data, and hence the data is presented by the histogram of each parameter and scatter plots between all pairs of parameters.The plot also reveals the relationships of some parameters, e.g., high current speed tends to be associated with high shearedness of the current profile.Larger directionality of the current was observed for low current speed.The number of parameters can be further increased, which may reveal more features, but it also makes the results interpretation more difficult.Significant overlap of these parameters can also be seen and visual separation of data groups is not feasible.The strong overlap of the data indicates that hard separation may be difficult.Gaussian mixture models are probabilistic models and it uses the soft clustering approach for distributing the points in different clusters.
Some of the parameters are also presented in Fig. 6, as a function of the measurement in chronological order.Both gradual and sudden changes of weather conditions are observed and there are not two cases having identical conditions.

Cluster analysis results
Silhouette score uses compactness of individual clusters (intra cluster distance) and separation amongst clusters (inter cluster distance) to measure an overall representative score of how well our clustering algorithm has performed.The silhouette score was calculated for different numbers of clusters, as presented in Fig. 7.It is seen that the silhouette score increases with the number of clusters, and we may conclude that a minimum of 8 clusters are needed to get a good score.The selection of the number of clusters is based on the compromise that the data is sufficiently separated with a minimum number of clusters.A sensitivity study on the selection of the number of clusters is presented in Fig. 8.It can be seen that more differences can be identified with increasing number of clusters.However, the smaller clusters can be less representative due to the size of the data.The cases have also been evaluated manually based on domain knowledge.The frequency contents of the acceleration signals at sensors no. 3 and 4 were used to evaluate the characteristics of the riser responses.This may not reflect all aspects of riser responses, but it provides information based on local response measurements.The cases are shown to be dominated by VIV load (i.e., narrow-banded riser response dominated by the frequency close to Strouhal frequency).In addition, the identified VIV load dominated cases have different frequencies as they include cases with different current speed ranges.
The manual cluster results were compared with the GM cluster results.12 clusters were selected, at which point the gradient of the silhouette score trend becomes smaller.This means that less improvement is expected with further increase of the number of clusters.It can be seen that the manually identified VIV load dominating cases center around case no.45,125 and 200. This corresponds to GM cluster no. 6,8,10,11 in Fig. 8c.Some discrepancy can be seen: e.g., five cases in cluster no. 1 overlap with manual results.However, the cluster results using totally different methods and input parameters seem to be consistent.It is also noted that the silhouette score is an average of scores for individual clusters, as shown in Fig. 9.It can be seen that cluster no.11 is best separated with the highest silhouette score and cluster no. 1 has the largest uncertainty, i.e. it is the cluster most likely be mixed with other clusters.The horizontal bar is a measure of the variation and indicates the sensitivity on the silhouette score due to random initialization.
The distribution of the 12 clusters is presented in Fig. 10.It can be seen that some of the identified clusters seem to be separated from the others based on certain parameters.For example, cluster no. 12 has a high spreading coefficient and low current speeds.In general, however, the clusters have strong overlaps in this high dimensional parameter space.The mean values are also presented in Table 4 and marked with colors indicating the levels of magnitude, i.e., red for the highest value, brown for medium value and green for the lowest value.We observe that there are 12 cases in cluster no.11.The mean value of the maximum current speed of the 12 cases is about 0.67 m/s, which is highest compared to that of other clusters, which corresponds to a Strouhal frequency about 0.15 Hz.This is in the same range as the wave frequency and corresponding to the fifth mode.The actual response frequency will be affected by other parameters as well.The averaged shearedness is 0.57 and in the medium level, while the directionality of the current, the wave loads (vessel acceleration), and vessel motion induced velocity are relatively low.Cluster no. 4 has the lowest averaged current speed; the wave loads and vessel induced velocity are relatively low as well.The current profile in cluster no. 12 has the highest directionality and low current speed.Cases in cluster no. 2, 6 and 8 have medium current speeds and low wave loads and vessel motion induced velocity.In summary, environmental load conditions are separated in 12 clusters and the riser responses in each of the clusters are expected to be similar.
The characteristics of the riser response of each cluster are also summarized in Table 4.In general, clusters are able to separate different riser response characteristics.The differences in riser responses are also correlated to the corresponding environmental load parameters.
Cases in cluster no.6, 8, 10 and 11 are in general dominated by VIV load, which are related to 1) high current speed and medium/low wave loads (cluster no.11); 2) medium current speeds and relatively low wave loads (cluster no.6 and 8); 3) low current speed, but high vessel motion induced velocity (cluster no.10).Wave loads normal to the main current direction in cluster no. 8 do not seem to disturb VIV responses.Otherwise, VIV frequency are generally not present for low current speed cases as indicated in clusters no. 4 and 12.In such cases, the responses are small or dominated by the wave frequency.The associated environmental load parameters for the cases dominated by VIV loads are presented in Fig. 12.It is seen that these cases are in general associated with medium and high current speeds, low current directionality and small wave loads in the main current direction.In some cases, vessel motion induced velocity is of the same order of the current speed, e.g., case 213.
VIV load and wave load will compete or interact for cases in clusters no. 1, 3, 7 and 9, when the current speed is about the medium level.Cases in cluster no. 2 have medium current, mild shearedness and directionality as well as relatively low wave loads, which seems to be ideal for the presence of VIV responses as in cluster no. 6.However, only small responses are observed.The further investigation revealed that the current profile of these cases are special and cannot be properly represented by the shearedness parameter.One example is shown in Fig. 13.The maximum speed is 0.36 m∕s, but the current direction turns 180 • at a water depth of 450 m.The shearedness parameter was calculated with the absolute value of the current speed, which means that the actual local current speed variation is much higher than the calculated value.It is known that quick changes in the local current speeds make it difficult for vortex shedding to be synchronized along the length of the riser, which can be the reason for small responses.

Time domain VIV prediction model
The riser responses of the selected 242 full scale measurement events are predicted using the time domain VIV responses analysis tool: VIVANA-TD.We introduce first the theory background of VIVANA-TD, then the predicted riser responses including maximum displacement standard deviation along the riser and the corresponding dominant response frequencies.The responses are presented with respect to the 12 identified clusters.The prediction accuracy for different clusters varies and the possible reasons are explored and discussed with consideration of the characteristics of individual cluster and the limitations of the current time domain VIV calculations.

Hydrodynamic load formulation
The hydrodynamic load model combines a vortex shedding force term with Morison's equation based on the strip theory.The cross-sectional force of the cylinder are illustrated in Fig. 14, and the hydrodynamic forces on the cylinder (per unit length) are represented by Eq. ( 9) (Kim et al., 2022): where,   ,   , , and  are the inertia coefficient, drag coefficient, cylinder diameter, and fluid density, respectively. , and  , are in-line and cross-flow vortex shedding force coefficients;  exc, and  exc, are the in-line and crossflow instantaneous phases of the vortex shedding forces.  is normal to the cylinder axis as shown in Fig. 14.Each force term of the load model is defined in the direction of the incoming flow velocity,   , the structure local response,   , and the relative flow velocity,   (=   − ̇ ).
The first three force terms are the inertia force   (=    +    ) and drag force   , while  , and  , are inline and cross-flow vortex shedding forces.The cross-flow vortex shedding force represents a lift force in the local  direction, which is the direction normal to the relative flow vector   .The in-line vortex shedding force  , describes a fluctuating drag force in a local  direction, which is in the same direction as   .

Synchronization of vortex shedding forces
The synchronization model describes the phase-coupling between force and response to obtain lock-in.The instantaneous frequency of the vortex shedding force will be adjusted so that the phase of the force can maintain lock-in with the phase of the cylinder velocity.Part of the vortex shedding force will be in phase with velocity, which determines the energy input to the structure.The hydrodynamic damping is determined by the drag load term.The rest of the vortex shedding force is in phase with acceleration and contributes to the added mass term.CF synchronization model is as follow: fexc,  = f0,  + Δ f sin   (11) where, Δ  , and f0, determine the cross-flow synchronization range.  is the phase difference between the cylinder cross-flow velocity,  ẏrel and the cross-flow vortex shedding force,  exc,y , (  =  ẏrel −  exc, ). ẏrel can be numerically approximated at each time step (Thorsen et al., 2017).f0, refers to the point of full synchronization, where the vortex shedding force is almost completely in phase with cylinder velocity or   = 0.
The IL synchronization model (Kim et al., 2021b) has a similar formulation as CF model.IL response frequency is assumed to be twice that of CF frequency, which is normally the case for marine risers.

Establish empirical parameters to be used in VIVANA-TD
It is well known that hydrodynamic coefficients are strongly affected by Reynolds (Re) effects.The Re number of the Helland-Hansen riser ranges from 1 to 7e5 based on the maximum current speed and the diameter of the buoyancy element.
A review of the available test data (Ding et al., 2004;Potts et al., 2018;Gu et al., 2020;Lie et al., 2013;Yin et al., 2018) showed variation of responses is dependent on the Reynolds number, roughness and model test set-up, i.e. mass ratio, structural damping, etc.The highest displacement amplitude ratio is in the range of 0.5D -0.8D depending on roughness for Re > 15.Higher displacement up to 1.8D of a smooth cylinder was also observed close to Re = 15, which was considered not relevant for marine riser application.Strouhal number up to 0.25 -0.3 is observed with a large scatter of data.The empirical parameter set defined in Table 5 is proposed based on the evaluation of realistic VIV response amplitude and frequency at high Re.
The effect of using this empirical parameter set was evaluated by simulation of a rigid cylinder.The simulated cylinder has a mass ratio of 2.0, ratio of the natural frequency in CF and IL direction is 2.0.The predicted displacement amplitude ratio (∕ = √ 2  ∕) is plotted against the reduced velocity (  =  ∕  ) in Fig. 15.The maximum cross-flow displacement amplitude ratio is 0.69, which corresponds to normalized frequency ( ℎ =   ∕ ) of 0.25.

Riser response prediction
The actual measured 3-dimensional current profiles and estimated vessel motions were used in the simulation.The predicted maximum displacement standard deviation along the length of the riser and the corresponding dominating response frequency are compared with measurements in Fig. 16 and Fig. 17.The results are presented in different clusters.The percentage of cases having a large error in the predicted maximum displacement (over-prediction or under-prediction larger than a factor of 1.5) and dominant frequency (difference larger than 0.03 Hz or miss-prediction of one mode) compared to measurement were also calculated and presented in Table 6.In general, the prediction agrees well with the measurement.The deviation between predicted and measured displacement is within a factor of 1.5 for 141 among the total 242 cases (marked by the black lines in the figure).Under-prediction of displacement larger than a factor of 1.5 is seen for 10 cases.It is clear to see that VIV prediction accuracy varies for different environmental load combinations.Response prediction of VIV load dominating cases is good for clusters no.6, 8, 10 and 11 in terms of displacement and/or frequency, which means that 70 % of the cases have small displacement error and 94 % of the cases have small frequency error.Larger deviations are observed for clusters no. 1, 4 and 12, which experience low current speeds and responses are dominated by wave loads.This deviation is expected as the wave loads were not included in the present simulation due to lack of monitored data.Such deviation seems to be smaller for higher current speed cases in clusters no. 3 and 9. Influence of wave loads on the prediction accuracy are reduced in such cases.Some of the cases in clusters no. 2 and 6 show large over-prediction; current changes direction in these cases, as seen in Fig. 13.The measured frequency is very low in some of the cases in cluster no. 4, e.g., between case no.50 -100.This is partially related to small and chaotic responses under small environmental loads.The estimated Strouhal frequency based on the maximum current speed gave an indication of the expected response frequency.The prediction discrepancy is related to 1) wave loads are not included in the simulation; 2) interaction of wave and VIV load processes requires different hydrodynamic parameters.
Further improvement on the prediction can be achieved by including wave loads and applying adaptive hydrodynamic parameters (Wu et al., 2020b) based on the different environmental load conditions indicated by the identified clusters.Optimization algorithm can be applied to obtain optimal hydrodynamic parameters based on the field measurements (Yin et al., 2022a).The obtained empirical hydrodynamic parameters also need to be further consolidated based on extensive analysis of both field and laboratory test data.The internal flow of the drilling mud was not considered in this work, as it is not in present VIV prediction practice.However, several numerical studies (Thorsen et al., 2019;Duan et al., 2021) show potential impact of internal flow on riser responses which may be considered in further analysis.

Conclusions
The riser in the field experiences complex environmental load conditions.The current profile and its directionality, wave loads and vessel motion induced flow velocities all affect riser responses, which are represented by a 7-parameter space.
Gaussian mixture models can separate environmental conditions into clusters based on the statistical similarity of the data assisted by domain knowledge.A total of 242 measurement events were investigated and grouped into 12 clusters using only the environmental condition parameters.The number of cluster selected was based on the quality of the separation by the GM algorithm and the data size of the identified clusters.The cluster results were also evaluated manually based on the local acceleration measurements.Both approaches gave consistent results.
Insights on riser responses were obtained by evaluating the characteristic values of environmental load parameters in the identified clusters.Cluster analysis is a valuable tool to reduce the parameter dimensions so that the underlying physics can be better understood.
Riser response prediction using the time domain VIV prediction tool was conducted using the actual 3-directional current and vessel motions, and the comparison with measurements is satisfactory.The potential reasons for prediction deviations were better explained based on the cluster results.This indicates the necessity of performing integrated analysis, in which multiple load processes can be included simultaneously.Adaptive empirical hydrodynamic parameters may also be required when the load process becomes different than in a constant flow condition, and this will be a topic for further studies.

Acknowledgement
The authors are grateful for the financial support to the research project PRAI (Prediction Riser-response by Artificial Intelligence, project no.308832) from the Research Council of Norway, Equinor, BP, Subsea7, Kongsberg Maritime and Aker Solutions, and their permission to publish this work.The authors are thankful to Dr. Guttorm Grytøyr for his help to provide data and information on the Helland-Hansen riser measurements.

Figure 3 :
Figure 3: Example of measured acceleration spectrum at sensor unit 3 and 4 from 3rd of May 1998.

Figure 4 :
Figure 4: Cumulative distribution of maximum acceleration spectrum at sensor unit 3 and 4

Figure 5 :
Figure 5: Distribution of parameters used in cluster analysis.

Figure 6 :
Figure 6: Variation of the environmental load parameters in chronological order.

Figure 7 :
Figure 7: Silhouette score for various numbers of clusters.

Figure 8 :
Figure 8: Sensitivity on the selection of number of clusters

Figure 9 :
Figure 9: Silhouette score for individual clusters when 12 clusters are selected.

Figure 12 :
Figure 12: Environmental load parameters corresponding to VIV load dominating cases

Figure 13 ::
Figure 13: Current profile for case 159 in cluster no.2

Figure 15 :
Figure 15: Predicted A/D based on a simulated elastic supported rigid cylinder

Figure 17 :
Figure 17: Comparison of dominant frequency.The measured frequency is compared with prediction organized in 12 clusters (C1-C12) and the estimated Strouhal frequency

Table 3
Selected data in present study

Table 4
Riser response characteristics of each cluster

Table 5
Empirical hydrodynamic parametersParameters      ,  , Comparison of displacement prediction.The measured displacement is compared with prediction organized in 12 clusters (C1-C12)

Table 6
Prediction evaluation Cluster no.Number of cases Large displacement prediction error (%) Large frequency prediction error (%)