A classification system for global wave energy resources based on multivariate clustering

Better understanding of the global wave climate is required to inform wave energy device design and large-scale deployment. Spatial variability in the global wave climate is analysed here to provide a range of characteristic design wave climates. K-means clustering was used to split the global wave resource into 6 classes in a device-agnostic, data-driven method using data from the ECMWF ERA5 reanalysis product. Classification using two sets of input data was considered: a simple set (based on significant wave height and peak wave period) and a comprehensive set including a wide range of relevant wave climate parameters. Both classifications gave resource classes with similar characteristics; 55% of tested locations were assigned to the same class. Two classes were low energy, found in enclosed seas and sheltered regions. Two classes were moderate wave energy classes: one swell dominated and the other in areas with wave action often generated by more local storms. Of the two higher energy classes, one was more often found in the northern hemisphere and the other, the most energetic, predominantly on the tips of continents in the southern hemisphere. These classes match existing regional understanding of the resource. Consideration of publicly available device power matrices showed that good performance was primarily realised for the two highest energy resource classes (25–30% of potential deployment locations); it is suggested that effort should focus on optimising devices for additional resource classes. The authors hypothesise that the low-risk, low-variability, swell-dominated moderate wave energy class would be most suitable for future exploitation.


Introduction
Deployment of wave energy converters (WECs) has vast potential for renewable energy generation. To understand this potential and identify deployment locations, a range of global resource assessments have been conducted, e.g. [1][2][3]. Traditionally, focus has been on resource magnitude using annual mean power. Increasingly, resource assessments are considering variability: spatial, e.g. [4], temporal, e.g. [5,6], and spatio-temporal, e.g. [7], variability have all been considered. When considering the broader-spatial scales associated with theoretical resource assessment, [8] demonstrated that the traditionally considered storm exposed areas may not be the most beneficial compared to equatorial regions with higher consistency.
To demonstrate the implications of differences in temporal variability at specific sites, Fig. 1 shows the theoretical resource at two sites: Wave Hub, UK (50.50 N, 5.00 W), a well-researched site described by Refs. [9,10], and Southwest Java, Indonesia (7.50 S, 105.50 E), a less storm-exposed area described by Ref. [11]. Both sites used in Fig. 1 have the same mean available power (18.30 kW/m) but very different temporal variability: the standard deviation of the power timeseries is 11.06 kW/m for the SW Java site and 27.92 kW/m for Wave Hub. Therefore, in the case of Fig. 1, it could be hypothesised that the SW Java site might allow for deployment of lower cost devices (lower survivability requirements) and would result in a steadier supply of electricity. Furthermore, [12] demonstrate that consideration of power production can accentuate these differences. Recognition of this has led to a range of resource studies focussed on tropical regions [11,13–16]; however, these studies often utilise WEC technology developed for stormy/high latitude regions, e.g. [13].
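As an illustration of the kind of calculation behind Fig. 1, the sketch below applies the standard deep-water approximation P = ρ g² H s ² T e / (64π) and summarises a power time series by its mean and standard deviation. The sea-water density, the use of the energy period T e as input, and the function names are assumptions made for illustration; they are not taken from the processing used in this study.

```python
import math
import statistics

RHO = 1025.0  # assumed sea-water density (kg/m^3)
G = 9.81      # gravitational acceleration (m/s^2)


def wave_power_kw_per_m(hs, te):
    """Deep-water wave power per metre of wave crest (kW/m).

    hs: significant wave height (m); te: energy period (s).
    """
    return RHO * G ** 2 * hs ** 2 * te / (64.0 * math.pi) / 1000.0


def power_summary(hs_series, te_series):
    """Mean and population standard deviation of the power time series."""
    power = [wave_power_kw_per_m(h, t) for h, t in zip(hs_series, te_series)]
    return statistics.fmean(power), statistics.pstdev(power)
```

Two sites with the same mean power can be compared by the second value returned by `power_summary`; a lower standard deviation indicates a steadier resource.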
Various other sea-state characteristics may also influence choice of wave energy site. Design cost can be considered by use of a risk factor, a ratio between extreme wave height and mean wave height [8,17,18]; when the ratio is low, there is little difference between the extreme and the mean and thus designing for storm survivability should be less of a cost burden. Similarly, [19] defined the ETER parameter as the ratio of the exploitation wave height to the 112-year return period wave height; and [20] defined the Figure of Merit as the mean annual wave energy flux divided by the 100-year return period wave height. Spectral properties of the sea state can also influence WEC power output and hence design [21][22][23]. [22] demonstrate that Goda's peakedness parameter [24,25] is correlated with device performance. Some devices may also be sensitive to incident direction; while one would assume such devices would be orientated towards the dominant direction, sites with limited variation in direction over the long term (low standard deviation in mean wave direction) and short term (wave directional width) might be beneficial, especially when it comes to arrays of devices [26,27].
Unlike offshore wind and, to a lesser extent, tidal stream energy, there has been no convergence of WEC design. This is due to the diversity of wave climates, the range of deployment characteristics (e.g. water depth), and the variety of kinetic and potential energy resources available in water. Anecdotally, this divergence in design has increased investor confusion; therefore, methods to rationalise device characteristics should lead to enhanced investment. It is therefore attractive to attempt wave resource classification and attribute different types and scales of WEC device to different resource classes.
There are two other main reasons that wave resource classification is attractive. Firstly, global classification of wave resources would assist with large-scale roll-out of existing WEC technology; if a device was optimised for one wave resource class then it would perform well in other geographic locations with the same class. Secondly, global classification would identify alternative resource types that would inform future design. Much of the WEC technology development effort has focussed on deployment in NW Europe [28]; however, this means that relatively similar wave climates have been considered in the design process. Fig. 2 uses the ECMWF ERA5 dataset (Section 2.1) to show how wave energy test centres are spread out through the global coastal mean H s -mean T p parameter space. The considered wave energy test centres are listed in Table 1. The plot shows the joint occurrence matrix for mean values of H s and T p with the H s -T p pairs for test centre locations superimposed. While there were two centres in China with lower wave heights and periods, and two on the west coast of Australia with higher periods, the majority in Europe and the US are grouped in a small region of the global wave parameter space.

To date, two different approaches have been taken for WEC resource classification: either focussing on classifying WEC behaviour [29,30] or WEC resources [17,31]. [29,30] focus on classifying WECs based on power take off loadings. They consider wave model outputs for NW Europe and split these based on significant wave height (H s ) and peak wave period (T p ). A range of devices from [32] are then used to calculate loadings for the different resource bandings. These are then used to classify WECs. [17,31] focus on classifying the resource. Both cases consider United States waters using hindcast model outputs. Wave resources are classed based on bandings of annual available wave energy and wave period to provide a matrix of 12 classes. [17] also gives consideration to a risk metric, which is the ratio of an extreme wave height value to the mean wave height value. Clear geographical differentiation between East and West US coasts is shown. This approach [31] is attractive as it clearly mirrors wind resource classification (described in [29]) and hence is familiar to investors; however, [31] consider a shortcoming to be the delimitation between classes being 'somewhat arbitrary.' Neither approach has been applied on a global scale.

This contribution takes a new approach: we attempt to classify the global wave resource from a data-driven, device-agnostic perspective. The rationale of being device-agnostic relates to the last postulated objective of classification: informing future design. Therefore, we use the k-means clustering algorithm [33] to classify the global resource a priori based on data characteristics. In a related marine energy application, k-means clustering has previously been used successfully to reduce the range of observed conditions at a specific location to a set of characteristic conditions for tank testing [34].

Fig. 1. Theoretical wave power calculated using the deep-water approximation and data from the ERA5 dataset (see Section 2.1) at two sites: Wave Hub, UK (50.50 N, 5.00 W) and a site in SW Java (7.50 S, 105.50 E).

Fig. 2. A joint occurrence matrix for mean H s -T p values for the ECMWF ERA5 showing variation over the coastal globe with the characteristics of wave energy test facilities marked in black. The sites marked are described in Table 1.

Fairley, et al. Applied Energy 262 (2020) 114515

Material and methods
In this section, both the dataset and the clustering-based classification methodology are described. A note on the terminology used here and in the results is required. Clustering is the process by which the wave resource classification takes place. The clustering analysis assigns all datapoints to one of several clusters; we refer to the returned clusters as classes (each cluster is a separate resource class). Classification is attempted with different sets of input parameters (named W to WXSD, see below). The classification using, for example, the W input set is termed the W classification and its classes are referred to as W1-W6, and similarly for the other input sets.

Dataset
This study uses the publicly available ECMWF ERA5 dataset. This is the latest atmospheric and wave reanalysis available from the ECMWF [35]. It builds on the previous ERA-Interim reanalysis [36,37]. The available wave data spans the globe at 0.50° resolution. The ERA5 wave model is based on WAM [38]. Bathymetry is taken from the ETOPO2 dataset [39] with parametrisation of sub-grid bathymetry [40]. Wind forcing is achieved through 2-way coupling of the wave model with the ECMWF Integrated Forecasting System (IFS). Full details of the ERA5 wave model are given in [41]; validation of the model is given in [42,43].
12 years of data from 2000 to 2011 at 3 hourly intervals is used; dataset duration and temporal frequency were constrained by workstation performance (see Section 4). To remove areas unlikely to be developed from a logistical perspective, only data that is 'coastal' was used in the classification analysis. For a point to be considered coastal, the datapoint must be within 3 degrees of land; not on the Antarctic coastline or any Arctic Circle coastlines; and not a location where sea ice is predicted at any point during the analysis time period (due to ice limitation and end-user need uncertainties). The considered areas are shown as blue in Fig. 3. This definition resulted in 21,314 datapoints being used in the classification.
Based on the description in Section 1, the authors consider several parameters that are important to both WEC design and power generation. The considered parameters are split into four groups, and combinations of these groups are used to create the input parameter sets (Table 2). The four groups are:
1. Basic wave characteristics (denoted W): the square of significant wave height (H s 2 ), representing the energy contained within the wave field, peak wave period (T p ), and the coefficient of variation (CV) in these statistics. CV is the non-dimensional ratio of the standard deviation to the mean and has previously been used in wave resource assessments (e.g. [8]).
2. Extreme characteristics (denoted X): the 50-year return period extreme wave height (H 50 ) and the risk factor, the ratio between the extreme wave height and the mean wave height.
3. Spectral characteristics (denoted S): the mean and standard deviation of Goda's peakedness parameter Q p [24,25],
$$Q_p = \frac{2}{m_0^2}\int_0^\infty f\,E(f)^2\,\mathrm{d}f,$$
where m 0 is the zeroth moment of the wave spectrum, f is the frequency and E(f) is the 1-d wave energy spectrum in the frequency domain.
4. Directional characteristics (denoted D). It is assumed that devices will be moored such that optimal performance occurs for the dominant direction, and therefore only the circular standard deviation in mean wave direction is considered. The wave directional width (WDW) represents the directional spreading in wave energy about the mean direction. Both this quantity and its standard deviation are considered.
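Two of the statistics named above can be sketched in a few lines. The example below computes the coefficient of variation and a circular standard deviation for directions in degrees. The function names are hypothetical, and the circular standard deviation uses the common definition sqrt(-2 ln R), with R the mean resultant length; that this is the exact definition used in the study is an assumption.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Non-dimensional ratio of the standard deviation to the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


def circular_std_deg(directions_deg):
    """Circular standard deviation (degrees) of directions given in degrees."""
    n = len(directions_deg)
    c = sum(math.cos(math.radians(d)) for d in directions_deg) / n
    s = sum(math.sin(math.radians(d)) for d in directions_deg) / n
    # Mean resultant length, clipped to 1 to guard against rounding error.
    r = min(math.hypot(c, s), 1.0)
    if r == 0.0:
        return math.inf  # directions cancel completely
    return math.degrees(math.sqrt(-2.0 * math.log(r)))
```

A circular statistic is needed because directions wrap at 360°: waves arriving from 350° and 10° differ by only 20°, which an ordinary standard deviation would badly overstate.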
Individual device design constraints would determine which categorisation is most appropriate for a specific developer. Fig. 4 shows global plots of the parameters used in the analysis. The square of wave height is largest in the southern hemisphere, especially in the Southern Ocean, and the coefficient of variation is lowest in this hemisphere. Similar patterns are shown for peak period. The 50-year return period extreme wave height is slightly larger in the northern hemisphere. The risk factor is lower in the southern hemisphere. In general, risk factors are highest in more sheltered seas where there are episodic large storm events but otherwise low background wave heights.
The global pattern of the risk factor from this dataset is similar to that previously reported [8]. The maps presented in Fig. 4 suggest that key deployment areas of the southern hemisphere are better suited to wave energy extraction than deployment areas in the northern hemisphere, something that has previously been remarked upon [28]. Mean values of Q p are lower (wider banded spectrum) in the mid-latitudes, with the exception of western facing coasts exposed to the Southern Ocean. Standard deviation in Q p is higher in these areas. Mean wave directional width is lower on western coasts and higher on eastern coasts; the patterns link both with the west-east movement of extra-tropical storms and with areas of trade winds. Standard deviation in this parameter is lowest for the west coasts of South America and Africa. The mean value of mean wave direction shows highest values at locations where storms are moving off land onto the ocean (and hence the latitude of each individual storm will impact the direction of generated waves). Table 3 shows the correlation between the different parameters. Since the correlations are symmetric about the diagonal, duplicates and self-correlations are greyed out for ease of reading. r 2 values greater than 0.9 are marked in bold. The only r 2 value that exceeds this threshold is the correlation between the normalised variance in wave height and the risk metric. This is unsurprising given that they are similar parameters, both being an indication of variability divided by the mean. Overall, the lack of covariance means the parameter space is diverse and including all parameters should be beneficial to the clustering results.

k-means clustering
The k-means clustering algorithm [33] is an iterative process that assigns all datapoints to a specific cluster based on minimising the point-to-centroid distances. The number of clusters k is defined prior to running the algorithm. Initially, k cluster centroids are chosen; in this analysis, which used the MATLAB 'kmeans' function [44], the k-means++ algorithm [45] is used to compute the initial cluster locations. [45] show that this approach is faster and more accurate than the standard k-means approach. Once the initial centroid locations are selected, all datapoints are assigned to the cluster with the closest centroid. New cluster centroid locations are then computed from the average of the points within each cluster. Points are once again assigned to the cluster with the nearest centroid, and this iteration continues until the cluster assignments do not change. In this study, equal weight is given to each variable; therefore, since the k-means technique makes use of the distance between points, the data are normalised by their range in the dataset before clustering.
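The steps described above can be sketched in plain Python. This is a minimal illustration, not the MATLAB 'kmeans' implementation used in the study; the function names, the fixed random seed, the iteration cap and the handling of empty clusters are all assumptions.

```python
import random


def normalise_by_range(points):
    """Scale each variable to [0, 1] using its range in the dataset."""
    dims = list(zip(*points))
    lo = [min(d) for d in dims]
    span = [(max(d) - min(d)) or 1.0 for d in dims]  # avoid division by zero
    return [[(x - l) / s for x, l, s in zip(p, lo, span)] for p in points]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0.0:  # all points coincide with existing centroids
            centroids.append(list(rng.choice(points)))
        else:
            centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centroids


def kmeans(points, k, seed=0, max_iter=100):
    """Return (labels, centroids) once cluster assignments stop changing."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Normalising by range before calling `kmeans` gives each input variable the same influence on the point-to-centroid distance, which corresponds to the equal weighting described above.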
Results depend on the pre-selected value of k; the most suitable value of k is not necessarily obvious from the data, as is the case here. Therefore, two tests were used to determine the most suitable value of k: the Elbow test and the Silhouette test.
The Elbow test considers the variation in the mean sum of squared distances as the value of k increases. As the number of clusters increases, the mean sum of squared distances will reduce (when k equals the number of datapoints, the distance for all clusters would be 0). When plotted against k, one typically sees an 'elbow' where increasing k stops significantly reducing the mean sum of squares. In this case (Fig. 5), the elbow, and thus the most appropriate value of k, lies between k = 4 and k = 8.
The Silhouette test [46] assigns a value s between −1 and 1 to every datapoint based on the distance of points to clusters, whereby s = 1 if a datapoint should definitely be in the cluster it has been assigned to, s = 0 if a datapoint could equally well be in another cluster, and s = −1 if a point should definitely be in a different cluster. Therefore, one can take the average of s for clustering results with different k-values, and the highest average s gives the optimal k-value (Table 4).
Based on the results of both these tests, a k-value of 6 was chosen.
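Both tests reduce to simple quantities computed from a set of points and their cluster labels. The sketch below is illustrative only: the function names are hypothetical, Euclidean distance is assumed, and the silhouette follows the usual definition s = (b − a) / max(a, b), where a is the mean distance to other members of the same cluster and b is the smallest mean distance to any other cluster.

```python
import math


def _dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _group(points, labels):
    """Collect the points belonging to each cluster label."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def mean_sq_distance(points, labels):
    """Elbow-test quantity: mean squared distance of points to their centroid."""
    total = 0.0
    for members in _group(points, labels).values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(_dist(p, centroid) ** 2 for p in members)
    return total / len(points)


def mean_silhouette(points, labels):
    """Average silhouette value s over all datapoints."""
    clusters = _group(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(_dist(p, q) for q in own) / (len(own) - 1)  # self-distance is 0
        b = min(
            sum(_dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

In practice one would evaluate both functions on the clustering result for each candidate k and compare the values, as in Fig. 5 and Table 4.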

Table 2
The parameters used in the 8 input parameter sets; 1 indicates the parameter is included in the set and 0 that it is excluded.

While Table 4 shows that the optimal k-value varies depending on the input set, the most frequently occurring value in Table 4 was selected to enable comparison of results between sets.
After the clustering analysis, the returned clusters are ranked based on cluster-mean H s 2 and named classes 1-6, such that class 1 has the lowest mean H s 2 (a proxy for energy) and class 6 the largest.
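The relabelling step can be sketched as follows, assuming the clustering returns arbitrary integer labels and that a mean H s 2 value is available for every datapoint. The function and argument names are hypothetical.

```python
def rank_classes_by_energy(labels, hs2):
    """Relabel clusters as 1..k in order of increasing cluster-mean Hs^2."""
    sums, counts = {}, {}
    for lab, value in zip(labels, hs2):
        sums[lab] = sums.get(lab, 0.0) + value
        counts[lab] = counts.get(lab, 0) + 1
    # Sort the original labels by their cluster-mean Hs^2.
    ordered = sorted(sums, key=lambda lab: sums[lab] / counts[lab])
    mapping = {lab: rank for rank, lab in enumerate(ordered, start=1)}
    return [mapping[lab] for lab in labels]
```

Because k-means returns clusters in no particular order, this ranking is what makes class numbers comparable between input sets.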

Results
In this section, the properties and geographical spread of the 6 different classes are described. To clarify discussion of results, only the W and WXSD classifications are presented and discussed. These represent the most basic of the classification sets (W), a proxy for theoretical resource, and the most comprehensive (WXSD), with all parameters included. Key properties of classifications from the other input sets are given in Appendix A, and further rationale for the choice of the two sets focussed upon is given in the discussion. The percentage of the datapoints in each class is shown in Table 5. For both sets of categorisation data, the lowest percentage is for class 6, the highest energy class. For the W classification, greatest percentage is
The parameter space of the two classifications is shown by a box and whisker plot (Fig. 6); additionally, mean values are given in Table 6.
The boxes show the median value, 25th and 75th percentiles. The whiskers indicate the range of the data. For both classifications and all parameters, there is a fair amount of overlap in the range of parameters between different classes. This is due to the continuous nature of the global coastal dataset. Broadly speaking, the W and WXSD classifications show similar patterns.
Mean H s 2 (Fig. 6a)  Distributions of mean T p between classes are shown in Fig. 6c. Considering the W classification, values are lowest for class W1 and increase to a maximum for class W4. The class W5 distribution occupies a lower area in the parameter space, similar to class W3. Values of T p for class W6 are higher than both W3 and W5 but not as high as W4. The WXSD classification shows a similar pattern, with the exception that the distributions for classes WXSD3 and WXSD4 are switched compared to classes W3 and W4. The CV in T p (Fig. 6d) is again similar for both the W and WXSD classifications, with classes 3 and 4 being switched between classifications. Class 1 has the highest median CV values. Class 2's median values are similar but the range is much greater, covering the entire span of values. CV values for classes 3-6 all show substantial overlap between distributions, but class 6 has the lowest median value.
The H 50 parameter space is similar for both classifications (Fig. 6e). Classes 1-4 are all similar and have lower H 50 values compared to classes 5-6 (which are also similar). A stronger pattern is observed when considering the risk factor (Fig. 6f). For both classifications, risk factor is highest for class 1 and reduces to a lowest risk for class WXSD3/W4. Of the 2 most energetic classes, class 5 has a higher risk factor for both classifications.
The mean peakedness values (Fig. 6g) are similar for both classifications and all classes. Of note is that the median value is higher for class W4 than class W3 and similarly higher for class WXSD3 than class WXSD4. Additionally, the range of the distribution reduces for the higher energy classes (5 and 6). The standard deviation in this parameter (Fig. 6h) is similar for all classes apart from W4 and WXSD3, where the standard deviation is higher. For the WXSD classification, standard deviation in mean wave direction (Fig. 6i) is highest for class WXSD1 and class WXSD5 and lowest for class WXSD3, while classes WXSD2, WXSD4 and WXSD6 all show similar distributions. The W classification follows a similar pattern when focussing on median values, but the distributions are much wider such that there is large overlap between classes. Focussing on the WXSD classification, mean wave directional width (Fig. 6j) is lowest for WXSD3 and highest for WXSD4 and WXSD5. Values for classes WXSD1, WXSD2 and WXSD6 are all similar. For the W classification, the pattern is similar (excepting classes 3 and 4 being switched) but much less pronounced and with wider ranges for classes W4-W6. Standard deviation in wave directional width (Fig. 6k) follows the same pattern as mean wave directional width.
While the W classification did not use as many of the parameters as the WXSD classification, there are still similarities in the parameter distributions for the additional parameters (Fig. 6e-k) between the two classifications. Typically, the distributions of the W classes have wider interquartile and overall ranges for these parameters, but this does demonstrate that there is some linkage between the basic parameters (used for the W classification) and the other tested variables.
An alternative way to view the parameter space, albeit limited in parameters visualised for the WXSD classification, is shown in Fig. 7 (W classification) and Fig. 8 Fig. 6d, there is less obvious distinction in CV T p (marker size), though it can be seen that W6 has lower values than W1 and W2. Similar patterns are shown for the WXSD classification (Fig. 8), though there is less distinct clustering of classes due to the use of additional parameters in the classification that are not shown visually.
The geographic spread of the two classifications is shown in Fig. 9. Considering the W classification, classes W1 and W2 are largely found in enclosed seas, with class W2 also in the southern North Sea. Class W3 is found on the eastern coasts of the three oceans, particularly in the southern hemisphere, while class W4 is found on the western coasts and predominantly in the mid-latitudes. Class W5 is largely found in exposed areas of the northern hemisphere and class W6 in the southern hemisphere in areas near to the southern storm belt. Very exposed areas in the northern hemisphere are also classified as class W6; for example, northern California and Oregon, USA, and offshore regions to the west of Ireland and the south of Iceland. Equally, some wave exposed east coasts in the southern hemisphere are classified as class W5 (e.g. NSW, Australia). One unexpected classification is that class W5 is also found in the Arabian Sea and Bay of Bengal; this is believed to be caused by the high CV for H s 2 in this region (Fig. 4). Due to the grid resolution used in this study, there is little variation with distance from shore in the geographic spread; one exception is the eastern seaboard of the United States, which is classified as class W5 offshore and class W2 nearshore. This is also the case for the WXSD classification.
When additional parameters are considered in the WXSD classification, the geographical spread becomes more complex. Recall that, given we sought a device-agnostic classification, all parameters are weighted equally. Class WXSD1 is found in the most enclosed regions such as the Mediterranean. Class WXSD2 is located in semi-enclosed areas such as the Caribbean and Gulf of Mexico as well as lower energy open coasts such as the east coast of Africa. When including the additional parameters, classes WXSD3 and WXSD4 switch sides of the continents compared to the W classification. However, based on consideration of the parameter spaces and the similarity in mean H s 2 values, we consider class WXSD3 to be equivalent to class W4 (and vice versa). There is greater latitudinal variation on the west coasts of continents for the WXSD case, with class WXSD4 in areas exposed to wave energy from both hemispheres (greater directional variation) and class WXSD3 in areas exposed to energy only from one hemisphere. Classes WXSD5 and WXSD6 are found in both hemispheres, with class WXSD5 more on eastern coasts and class WXSD6 on western coasts. Fig. 9 shows there are similarities in the geographic spread for both classifications. For both cases (and the other cases shown in Appendix A), the repeating patterns on eastern and western coasts, especially in the southern hemisphere, give a physical verification of the clustering, since similarly orientated areas will be exposed to similar conditions generated by storms moving eastward around the globe. It is instructive to consider which areas show the same classes for both classifications, since one might have greater confidence in the classification of such areas. If (for this section of the analysis only) the labelling of WXSD3 and WXSD4 is switched, 55% of the tested datapoints are given the same class. These regions are shown in Fig. 10.
Areas that are the same include: the high-latitude southern Pacific areas as class 6; the west coast of Central and South America as class 4; the offshore regions of the east coast of North America as class 5; and the enclosed/semi-enclosed seas (thus predominantly fetch-limited wave climate) of the Mediterranean and Gulf of Mexico as class 1 or 2. It is noticeable that much of Europe is classified differently when additional parameters are added.
Fig. 11 gives the distribution of power for the 6 classes and for both classifications. The vertical grey line in this figure marks the 15 kW/m threshold often used for the viability of WEC deployment. The two different classifications produce similarly shaped power-class distributions. For both classifications, it is only classes 5 and 6 that largely exceed this threshold. Classes W1 and W2 are entirely below the threshold for the W set; for set WXSD the tail of the class WXSD2 distribution exceeds the threshold. In both sets, classes 3 and 4 straddle the threshold, with W3 being largely below the threshold. The lower energy classes (1 and 2) have lesser spread and higher peaks than the higher energy classes.
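The agreement figure quoted above amounts to a label comparison after swapping two class numbers in one classification. A minimal sketch, with hypothetical function and argument names, is:

```python
def class_agreement(w_labels, wxsd_labels, swap=(3, 4)):
    """Percentage of datapoints given the same class by both classifications,
    after switching the two WXSD labels named in `swap`."""
    a, b = swap
    relabel = {a: b, b: a}
    same = sum(w == relabel.get(x, x) for w, x in zip(w_labels, wxsd_labels))
    return 100.0 * same / len(w_labels)
```

For example, `class_agreement([1, 3, 4, 6], [1, 4, 3, 5])` gives 75.0, because the first three points agree once WXSD3 and WXSD4 are switched and the last point does not.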
Amalgamating the above results, we can provide conceptual descriptions of the 6 resource classes. Classes 1 and 2 are low energy classes. Class 1 has higher variability in H s 2 than class 2 and the lowest mean T p values of all the classes. Class 1 also has the highest risk factor, due to the low mean wave height, and the highest variability in mean wave direction. Class 1 is only found in enclosed seas. Class 2 has similar (W2) or slightly higher (WXSD2) values of H s 2 , and lower again similar and the lowest for all classes. Correspondingly the risk factors are also lowest for these two classes.
Class W4/WXSD3 is largely found on the western coast of continents but away from direct storm impact; these areas are well exposed to long swell waves from distant storms and such conditions make up the majority of their resource. Therefore, mean values of T p are larger for these classes than for W3/WXSD4 and are the highest of all classes. Equally, Q p values are higher for W4/WXSD3 than for W3/WXSD4, indicating a typically narrower banded spectrum associated with swell waves. Variability in mean wave direction is lowest for this class compared to all others, as is mean wave directional width for WXSD3 (when wave directional width was included in the classification).
Class W3/WXSD4 is found on ocean coasts away from frequent direct storm impact, but less well exposed to wave directions generated by these dominant storms, so much of the wave resource is caused by more local storm systems. Therefore peak periods are lower, wave spectra are typically wider banded (lower Q p ), and There is a large amount of variability in mean wave direction and in wave directional width. The class is largely found on storm exposed coastlines in areas where one would expect significant seasonality in resource.
Class 6, the highest energy class, is a combined swell and storm resource class. Mean peak period values are high and there is lesser variability in parameters such as H s 2 , T p , and mean wave direction than in class 5. This class is found on very exposed coastlines, particularly in the southern hemisphere, where exposure to the 'roaring forties' in the Southern Ocean leads to year-round swell.

Applications to WEC deployment and development
The first obvious application of these results is use of the geographic spread: a developer could utilise the maps from the case that best reflects the design constraints of the device in question and then assume that, if the device works well in a certain location, it would be reasonable to consider deployment in other locations of the same class.
One key motivation for this research was to provide a catalyst to inform device design. One way to do this is to define characteristic joint occurrence matrices that could be used as test cases, following [29]. As an example, Fig. 12 shows the mean H s -T p joint occurrence matrices for the W classification classes 3-6. Classes 1 and 2 are neglected as being low power (Fig. 11). Class 3 has a peak of occurrence between 5.00 and 10.00 s T p and 0.75-1.75 m H s . Class 4 has a peak between 1.00 and 2.00 m H s and the period range increases to between 10.00 and 15.00 s. Class 5 shows a more spread out joint probability distribution; the peak is between 5.00 and 10.00 s T p and 1.00-2.25 m H s . Class 6 has a peak around H s = 2.50 m and T p = 12.00 s. This mirrors the parameter space description in Figs. 6 and 7.
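A joint occurrence matrix of this kind is a two-dimensional histogram of paired H s and T p records. The sketch below shows one way to build it; the bin widths, the keying of cells by their lower edges and the function name are assumptions for illustration, not the binning used for Fig. 12.

```python
import math
from collections import Counter


def joint_occurrence(hs_series, tp_series, hs_step=0.5, tp_step=1.0):
    """Fraction of records in each (Hs, Tp) bin, keyed by the bin's lower edges."""
    counts = Counter(
        (math.floor(h / hs_step) * hs_step, math.floor(t / tp_step) * tp_step)
        for h, t in zip(hs_series, tp_series)
    )
    n = sum(counts.values())
    return {cell: c / n for cell, c in counts.items()}
```

A class-mean matrix could then be formed by averaging the matrices of all datapoints assigned to that class.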
Another motivation was to use the different classes of wave climate to explain the wide range of WECs currently existing, with the aim of increasing investor confidence. Using openly available power matrices for the Aquabuoy, Wavedragon and Pelamis devices (taken from [47]) and the mean power matrices for the 6 classes, capacity factors were calculated (Fig. 13). These values neglect any estimate of operations and maintenance downtime, and it is assumed that no power is generated for sea-states outside of the bounds of each device matrix. For both the W and WXSD classifications, all devices perform best in resource class 6, which is unsurprising because these designs were initially developed for deployment in the high wave resource areas of western Europe. The figure also demonstrates that, given the limited coverage of classes 5 and 6 (30%/25% of tested datapoints) and the low capacity factors for the lower classes, alternative designs for lower resource classes are required if wave energy extraction is to become a truly global industry.
The same analysis was conducted for theoretical power matrices (taken from [32]), as shown in Fig. 14. While similar patterns to the real-world devices are seen for the W classification, for the WXSD classification 3 devices (F-3OF, F-HBA, Bref-SHB) have higher capacity factors for class 5 than for class 6. This demonstrates the concept that the highest resource levels may not be the best for all devices. Note that in both these plots, the lines showing ± 1 standard deviation demonstrate that there are large amounts of variability and overlap between classes.
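The capacity factor calculation can be sketched as an occurrence-weighted mean of a device power matrix divided by the device's rated power. The dictionary representation, the function name and the explicit rated-power argument are assumptions for illustration; as stated above, sea-states outside the device matrix are taken to produce no power.

```python
def capacity_factor(occurrence, power_matrix, rated_power_kw):
    """Capacity factor for one resource class and one device.

    occurrence:   {(hs_bin, tp_bin): fraction of time in that sea-state}
    power_matrix: {(hs_bin, tp_bin): device output in kW}
    """
    total = sum(occurrence.values())  # normalise in case fractions do not sum to 1
    mean_power = sum(
        frac * power_matrix.get(cell, 0.0)  # zero output outside the device matrix
        for cell, frac in occurrence.items()
    ) / total
    return mean_power / rated_power_kw
```

With purely hypothetical numbers, a device rated at 750 kW that produces 100 kW for half the time, 300 kW for a quarter of the time and nothing otherwise has a mean output of 125 kW, giving a capacity factor of about 0.17.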

Discussion
This research provides a global classification of wave energy resources that gives insight into the global wave climate and should help with expansion of the wave energy converter industry. The two examined classifications result in broadly similar geographic distributions, with some regions classified identically (Fig. 10). This strengthens the argument for region-specific WEC designs, as the wave climate can be grouped in the same way for some regions regardless of input parameter set.
The two classifications can be ground-truthed using the wide literature base that exists on wave energy resource assessments. For European seas, set W classifies most exposed ocean areas as class 5 (second highest resource class) except the far offshore areas around Ireland (class 6), whereas set WXSD classifies most western facing coasts as class 6 and Norwegian and Scottish coasts as class 5 (slightly lower energy, higher variability, particularly in direction). There has been extensive analysis of the European resource on site-specific, country and continental scales, e.g. [4,5,9,10][48][49][50][51][52][53]. Given the variability in wave conditions over European coasts, multiple classes (WXSD classification) seem sensible; while energy levels for areas marked 6 in Europe are lower than class 6 areas in South America or Australia, other parameters, particularly directional and spectral, are similar. Both classifications class the Mediterranean Sea as class 1; there have been a range of resource assessments for this area, e.g. [54][55][56][57], all of which indicate low power availability, which makes class 1 appropriate.
There has been minimal research effort focussed on the African continent. [58] consider an area that spans from Mozambique in the [59] developed a high-resolution wave model of the South African coastline. They find highest wave power on the southern coasts with lower wave power and greater seasonality on the western coasts. This fits better with the WXSD classification, which classifies the southern coast as class 6 and the western coast as class 3, than with the W classification, which classifies the whole of South Africa as class 6. Similarly, in Northern Africa, [60] considered wave energy potential for Morocco. They highlight a reduction in wave power in the north due to shadowing from the Iberian Peninsula and a reduction in wave power in the south due to shadowing from the Canary Islands. Both classifications pick up the reduction caused by the Canary Islands but only the WXSD classification picks up the reduction in power caused by the Iberian Peninsula.
For North American waters, the spatial distribution of the classification compares favourably with local resource assessments in Canada [61,62], the US Pacific Northwest [63] and the US East Coast. On the west coast, the higher class is associated with high-latitude extratropical cyclones, which are relatively common between northern California and British Columbia. The finer scale variation on the west coast around California is better represented by the WXSD classification. On the US East Coast, the dependence of class (regardless of classification) on depth is apparent. The US East Coast features a wide continental shelf, which significantly attenuates incoming wave energy flux; this is represented by the consistent reduction in class from offshore to nearshore.
For South American waters, the impact of the energetic Southern Ocean is immediately apparent in the consistent class 6 rating for the SW coast. This high energy flux has been previously noted by [64,65]. The eastern seaboards of Brazil, Uruguay and Argentina change class between 3 and 4 depending on classification system. Given the historically low occurrence of major cyclonic activity in the South Atlantic, these interchangeable classes are expected. However, with increasing numbers of recorded storms in recent years [66], this class rating may increase in the WXSD classification due to the risk parameters.
Japan is described in the same way for both classifications: class 2 on the more sheltered west coast and class 5 on the exposed east coast. [67] use measured wave data to consider wave power feasibility; there is a noticeable difference in annual mean power between the two coasts. A 20-year hindcast of seas adjacent to China [68] predicts mean wave energy flux values that match both classifications well in this region.
Areas of the global ocean that display the most energetic class, regardless of the classification set, occur on the continental margins exposed to the waves generated in the Southern Ocean. This corresponds with the energetic wave resource identified along Australia's southern margin by [69,70]. This energetic resource of Australia's southern margin contrasts with the low energy classes 1 and 2 identified in Australia's north, in the Gulf of Carpentaria and Arafura Sea. Australia's populated coast in the south-east of that continent is consistently class 5. Hemer et al. [65], while recognising that this region is not as energetic as the southern coasts, comment on the consistency of the resource in this region. Along Australia's west coast, we see a gradient of classes, particularly in the WXSD scheme, as the relative energy of the wave field changes from the energetic classes in the south through to the less energetic classes in the north. These results demonstrate the consistent representation of wave field characterisation between this global study and the more focused regional study.
Both the W and WXSD classifications class all the Indonesian coast as the same moderate energy, long period, limited directional variation class (WXSD3/W4). Wave energy in Indonesia has been described in detail using numerical modelling by [11] and via satellite data by [71]; both works demonstrate the moderate energy, relatively high mean period, low variability wave climate in the area and corroborate the classification.
While the two classifications give the same or similar classes (3 and 4) around the southern tip of India and Sri Lanka, there is a marked divergence further north in the Arabian Sea and Bay of Bengal. The W classification gives these areas as class W5 whereas the WXSD classification gives them as WXSD2. [72] analyse in detail the ECMWF ERA-Interim dataset for a series of points around the Indian coast. They show that the mean wave power is in general low (< 15 kW/m), which means that the WXSD classification is more appropriate here. [72] also demonstrate that there are high levels of variability in the areas that the W classification marks as class W5; given the small number of W classification input parameters and the equal weighting given to each, this variability explains the W5 assignment.
It is interesting to note that, in some areas, the WXSD classification seems to pick up some of the finer variability in mean wave power reported in the literature better than the W classification, despite the required variables ($H_s^2$, $T_p$) being included in the W classification.
Figs. 13 and 14 (Section 3.1) make use of power matrices to calculate capacity factors; however, these matrices only take into account wave height and period. Since it is recognised that the other parameters used in the WXSD classification affect power production, the variation between classes is likely to change once they are included. For example, multidirectional waves cause a performance reduction, as do multipeaked sea states [73,74]. Depending on WEC design, performance can be sensitive to spectral bandwidth, and it has been suggested that 3D performance tables including a spectral shape parameter may be more appropriate.
Many of the data points included in this analysis are below the commonly quoted threshold for wave energy deployment viability of 15 kW/m. This was a conscious decision; firstly, because such thresholds are evolving and, secondly, because the authors wished to enable consideration of WEC designs for lower energy waters. Some lower energy areas are already being considered for wave energy extraction, for example the Mediterranean [54], the China Seas [75] and the Caribbean [76]. In some cases, the application is not grid connected electricity, but power for other facilities such as an aquaculture facility [77]. Therefore, inclusion of these lower energy regions is important for a global wave resource classification.
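To make the 15 kW/m threshold concrete, the sketch below evaluates the standard deep-water approximation for wave energy flux per metre of wave crest, P = ρ g² Hs² Te / (64π). This is an illustration only: the deep-water formula, the use of an energy period Te as input, the sea water density and the example sea states are all assumptions made here, and the sketch is not a description of how wave power was derived from ERA5 in this study.

```python
import math

RHO = 1025.0   # assumed sea water density, kg/m^3
G = 9.81       # gravitational acceleration, m/s^2
THRESHOLD_KW_PER_M = 15.0  # commonly quoted viability threshold from the text


def deep_water_wave_power(hs_m, te_s):
    """Deep-water wave energy flux in kW per metre of wave crest.

    hs_m: significant wave height in metres
    te_s: energy period in seconds (assumed input; not the peak period)
    """
    watts_per_m = RHO * G**2 * hs_m**2 * te_s / (64.0 * math.pi)
    return watts_per_m / 1000.0


def below_threshold(hs_m, te_s, threshold=THRESHOLD_KW_PER_M):
    """True if the sea state falls below the quoted viability threshold."""
    return deep_water_wave_power(hs_m, te_s) < threshold


# Hypothetical sea states, for illustration only.
p_moderate = deep_water_wave_power(2.0, 8.0)
p_low = deep_water_wave_power(1.5, 8.0)
print(f"Hs=2.0 m, Te=8 s: {p_moderate:.1f} kW/m")
print(f"Hs=1.5 m, Te=8 s: {p_low:.1f} kW/m")
```

Because power scales with the square of wave height, a modest reduction in height at the same period moves a site from just above the threshold to well below it, which is why a fixed cut-off would exclude many of the lower energy regions discussed here.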
The authors recognise that confidence in the absolute values used in this analysis would be increased with a longer time period and higher temporal resolution dataset. Dataset length and temporal resolution were dictated by workstation performance. However, the dataset sufficiently captures global variation in resource, and the accuracy of derived parameters is suitable for such a high-level study; previous studies, e.g. [5], have adequately described interannual variation and the impact of long-term atmospheric cycles using datasets of similar duration. Importantly, it is the spatial variation of parameters, rather than their absolute accuracy, that matters for the clustering analysis. Spatial resolution is appropriate for a study such as this: fine-scale variability would reduce clarity in findings while not being sufficiently accurate for use by developers without inclusion of more detailed processes, such as tidal effects, and rigorous local validation.
Classification results based on the other sets of input parameters are included in Appendix A since they may be of interest to device developers. To give further rationale to the choice of focussing on set W and set WXSD: some other sets of input parameters give very similar classifications to the two that are discussed in detail here. WS gives almost the same classification as the W set; WD is the same as WXSD; and WX classification is geographically similar to the W classification. The WSD, WXD and WXS sets gave counter-intuitive class boundaries: in all cases areas of known low wave energy were classified as classes 5 and 6.
Beyond wave energy applications, a global classification of coastal wave climate would have benefits for the coastal engineering and coastal processes communities. Currently in such research, study sites are classified rigorously based on tidal range, but wave conditions are often described as high or low energy without a clear definition of the boundary between the two. This work may allow for a more rigorous description.

Conclusions
k-means clustering has been used to classify global wave resource based on the ECMWF ERA5 wave reanalysis. Two different sets of input parameters to the classification are considered and for 55% of datapoints the same class is returned. The classifications give classes that are oceanographically intuitive and are supported by a literature review of existing resource assessments. Of the two classifications, the WXSD classification is considered to provide a slightly better match with existing knowledge. The 6 classes are then: 1. Enclosed seas; 2. Semi-enclosed seas and sheltered ocean coasts; 3. Moderate energy ocean coasts primarily influenced by local storm systems; 4. Moderate energy ocean coasts primarily influenced by long distance swell; 5. Higher energy ocean coasts, with variable conditions; 6. Highest energy ocean coasts, influenced by large long period swell and storm conditions.
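For readers unfamiliar with the method, the following is a minimal sketch of k-means clustering (Lloyd's algorithm) applied to per-location wave parameters, followed by the kind of class agreement comparison summarised above. It is not the implementation used in this study: the standardisation step, the random initialisation, the ordering of classes by the first feature, the function names and the toy data (two features and two classes rather than six) are all assumptions made for illustration.

```python
import random


def standardise(rows):
    """Scale each column to zero mean and unit variance (assumed pre-processing)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids


def order_by_energy(labels, centroids, feature=0):
    """Renumber clusters 1..k by ascending centroid value of one feature."""
    ranked = sorted(range(len(centroids)), key=lambda c: centroids[c][feature])
    new_id = {old: rank + 1 for rank, old in enumerate(ranked)}
    return [new_id[lab] for lab in labels]


def agreement(classes_a, classes_b):
    """Fraction of locations assigned the same class by two classifications."""
    return sum(a == b for a, b in zip(classes_a, classes_b)) / len(classes_a)


# Hypothetical (Hs in m, Tp in s) pairs: three sheltered and three exposed sites.
sites = [[0.5, 4.0], [0.6, 4.5], [0.4, 3.8], [3.0, 11.0], [3.2, 12.0], [2.9, 11.5]]
labels, centroids = kmeans(standardise(sites), k=2)
classes = order_by_energy(labels, centroids)
print(classes)
```

Cluster indices returned by k-means are arbitrary, so some renumbering step such as `order_by_energy` is needed before two classifications can be compared location by location with `agreement`.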
The spatial distribution and parameter spaces of these classes can be used to inform future industrial developments. Application of existing device power matrices shows that all tested devices are optimised for the two highest energy resource classes, which are only present in a limited area of the globe; therefore, there is significant deployment opportunity if devices optimised for the lower energy classes can be developed. It is recommended to design for class W4/WXSD3, where, although energy is slightly lower, the period is higher, variability in both quantities is low, the risk factor is low, the spectrum is narrow banded and there is less directional variation. These factors raise the possibility of developing lower cost wave energy converters for this class.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
In the following appendix, geographic spread and class mean values are given for the other input sets tested in the cluster-based classification.