Categorizing global MPAs: A cluster analysis approach

Marine Protected Areas (MPAs) are a widely used and flexible policy tool to help preserve marine biodiversity. They range in size and governance complexity from small communally managed MPAs to massive MPAs on the High Seas managed by multinational organizations. As of August 2018, the Atlas of Marine Protection (MPAtlas.org) had catalogued information on over 12,000 Marine Protected Areas. We analyzed this global database to determine groups of MPAs whose characteristics best distinguished the diversity of MPA attributes globally, based upon our comprehensive sample. Groups were identified by pairing a Principal Components Analysis (PCA) with a k-means cluster analysis using five variables: age of MPA, area of MPA, no-take area within MPA, latitude of the MPA's center, and Human Development Index (HDI) of the host country. Seven statistically distinct groups of MPAs emerged from this analysis, and we describe and discuss the potential implications of their respective characteristics for MPA management. The analysis yields important insights into patterns and characteristics of MPAs around the world, including clusters of especially old MPAs (greater than 25 and 66 years of age), clusters distributed across nations with higher (HDI ≥ 0.827) or lower (HDI ≤ 0.827) levels of development, and majority no-take MPAs. Our findings also include statistical verification of Large Scale Marine Protected Areas (LSMPAs, approximately >180,000 km²) and a sub-class of LSMPAs we call "Giant MPAs" (GMPAs, approximately >1,000,000 km²). As a secondary outcome, future research may use the clusters identified in this paper to track variability in MPA performance indicators across clusters (e.g., biodiversity preservation/restoration, fish biomass) and thereby identify relationships between cluster and performance outcomes. MPA management can also be improved by creating communication networks that connect similarly clustered MPAs for sharing common challenges and best practices.


Introduction
Marine Protected Areas (MPAs) are clearly defined areas of the ocean and coastal environments that are governed or managed with the distinct purpose of "the long-term conservation of nature with associated ecosystem services and cultural values" [1]. Under this definition, MPAs help conserve key areas for marine biodiversity, aid in the recovery of degraded areas, and also help increase the resilience of some ecosystems to the impacts of climate change [2]. These benefits are often achieved by restriction of potentially harmful activities, like fishing, within the MPA, and they give MPAs the potential to be highly effective tools for marine conservation around the world. Global commitments have been made to protect 10% of the ocean by the year 2020 (e.g. SDG target 14.5 under the United Nations Sustainable Development Goals [3,4]), alongside many other regional or national level protection targets. MPA proposals, designations, and implementations have accelerated over the last ten years [5], with conservation databases reporting nearly 15,000 MPAs (Marine Protected Areas coverage in 2018, https://protectedplanet.net/marine, accessed March 15th, 2019).
With such a high number of MPAs in many regions of the world, the diversity of MPA characteristics is not surprising. For example, the internationally managed Ross Sea MPA that covers 1.55 million km² in the remote Southern Ocean has features that are quite distinct from those of a small indigenous-governed MPA in a tropical developing country such as Naru Reef in the Solomon Islands [7,8]. There are also cases where some MPAs are distinguished by a key policy or management characteristic. Examples of subcategories of MPAs include "no-take" MPAs, which prohibit all forms of capture fishing within their boundaries, and "Large Scale Marine Protected Areas" (LSMPAs), which are defined as MPAs exceeding a given area-based threshold [2,9]. In this paper, rather than grouping MPAs by a single variable, we developed a more comprehensive perspective of MPA characteristics and diversity by considering multiple attributes simultaneously.
Other studies have conducted multivariate analyses for analyzing MPA characteristics around the world with relation to MPA management and performance. Two notable recent works include Edgar et al. and Gill et al. [9,10]. However, these studies investigated the characteristics that most influence ecological performance and impacts of MPAs, whereas here we investigated how MPAs fall into representative groups or categories based on similar key attributes. Our research also differs from many past analyses in that we used a comprehensive global database rather than a relatively small sample of MPAs.
Our primary aim in conducting this analysis was to deliver valuable insight into the present status or state of affairs of MPAs around the world through the lenses of our chosen variables. These insights are highly relevant in light of high rates of MPA proposals, designations, implementations, and expansions in recent years. Investigating the commonalities of MPAs that cluster together may also serve as a useful platform for investigating the potential status, strengths, and challenges for MPA management within each cluster. As a secondary outcome, such a study may also have practical relevance for MPA research. Many highly cited studies pertaining to MPA management and governance have been based on case study focused research [11][12][13]. MPA diversity is especially consequential for case study research when findings and outputs are intended to have broad scale relevance and global implications. Case study based research can only truly be scaled up to a global level if the studies selected are representative of the diversity of MPAs around the world. Therefore, our process and findings will provide objective guidance towards identifying representative groups of MPAs from which case studies may be selected. Looking across representative groups can also help investigate the effect of certain features on performance metrics, or control for influential variables that may affect outcomes. Thus, in addition to applying this approach to obtain a global view of MPA characteristics, diversity, and implications for management, we also describe its use as a replicable, systematic method of identifying representative groups to help guide future MPA research.

Analytical approach
We classified MPAs based on the results of a cluster analysis using publicly available, globally tracked MPA data. Our methodological approach builds upon a cluster analysis conducted for urban parks [14], which we found analogous to MPAs as another form of spatial planning and management for uses other than industrial or exploitative activities. Ibes (2015) used a two-step clustering method, combining Principal Components Analysis (PCA) and a cluster analysis. This approach is considered statistically rigorous [14].
We conducted a PCA first to help verify the contributions of our selected variables to variance within the chosen data set. We then performed a k-means analysis as a partitional clustering method using Euclidean distance to establish clusters among the finalized list of explanatory variables. The PCA and k-means analyses were performed in R using the packages 'factoextra' and 'FactoMineR' [15,16]. Following cluster assignment, clusters were verified and analyzed with ANOVA, pairwise t-tests, and descriptive statistics. The analysis was performed using R version 3.5.1 (2018-07-02), "Feather Spray".
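To make the partitional clustering step concrete, the following is a minimal sketch of k-means (Lloyd's algorithm) with Euclidean distance. It is written in Python for illustration only; the study itself used R, and the function names `euclidean` and `kmeans`, the `init` and `seed` parameters, and the toy points are assumptions rather than the authors' code.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, init=None, seed=0, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids).

    `init` optionally fixes the starting centroids; otherwise k points
    are sampled at random (hypothetical choice, not from the source).
    """
    if init is None:
        init = random.Random(seed).sample(points, k)
    centroids = [list(c) for c in init]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy standardized data: two well-separated groups in two dimensions.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = kmeans(points, 2, init=[points[0], points[3]])
```

Because k-means results depend on the starting centroids, practical analyses typically repeat the procedure from several random starts and keep the solution with the lowest within-cluster sum of squares.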

Variable selection
We selected the following initial set of variables based on the combination of analytical practicality, tractability, global availability, and relevance to MPA policy and management: Age (years); Area (km²); Proportional No-Take Area (%); Latitude (degrees); and Human Development Index (0-1).
Edgar et al. (2014) demonstrated that highly effective MPAs are often large, isolated, old, well-enforced, and have no-take status [9]. Our analysis tracked "old", "large", and "no-take" MPAs by inclusion of area (km²), age (years), and proportional no-take area as variables. Area and no-take status have also been demonstrated to influence MPA management costs [17]. Gill et al. (2017) also identified sufficient staff capacity and operating budgets as especially important indicators of MPA performance [10]. Some of the influential variables in these studies, among factors in other literature deemed relevant to MPA management and performance [12,18], were not included in our analysis, including enforcement, isolation, and governance type. Variables were excluded due to limited data availability and restrictions of combining categorical and numeric data in k-means analyses, among other reasons (see section 4.2 for further explanation).
We also considered that patterns may emerge based on differentiating habitats and environments. For example, MPAs in cooler polar seas may have different characteristics than those in tropical environments, such that tropical MPAs may generally be more attractive for industries like tourism. Tropical regions also typically contain higher biodiversity according to a latitudinal diversity gradient hypothesis [19]. Latitude was thereby selected as a proxy to broadly represent the range of environments (polar seas to tropical) and their associated characteristics.
Human Development Index (HDI) has been used in many studies on MPAs and MPA management [10,13,20,21]. HDI permits ranking of the level of overall societal development of a country based on a collection of variables including income, standard of living, education, and health. HDI has also been used as an informative indicator of economic development, and the strength of government frameworks including legal and judicial systems [13]. GDP, while widely available, is solely focused on economic production and output rather than measuring broader societal well-being. Past research has also identified a strong positive correlation (Kendall's τ = 0.698, p < 0.0001) between HDI and Yale's Environmental Performance Index, a ranking index comprised of 24 indicators (including MPAs) that gauges national progress towards established environmental policy goals (https://epi.envirocenter.yale.edu/) [22,23]. HDI may therefore serve as a useful indicator of environmental progress of a given country including for MPAs, as well as the potential ability to further progress on MPAs and marine conservation goals (though we do not intend to unequivocally present it as such). One recent MPA management study identified a weak correlation between HDI and the authors' chosen management indicators [10]. However, other studies effectively incorporated HDI into MPA research to contextualize MPA management and governance scenarios [13,20].

Data sources
The Atlas of Marine Protection (MPAtlas, MPAtlas.org) uses the marine portion of the WDPA database (www.protectedplanet.net), but independently validates the data and includes some additional useful metrics. Experts have cited MPAtlas specifically for its accuracy and wide acceptance among MPA databases [5]. MPAtlas (August 2018) provided shapefiles, from which the attribute table was exported to Excel and used in our analysis. MPA Area was taken directly from MPAtlas. Latitude, Proportional No-take Area, and Age were also gathered via the MPAtlas dataset, but required varying degrees of manipulation.
MPAtlas also segmented some multi-use MPAs based on their separate management zones; for example, the Great Barrier Reef was included as seven distinct MPAs according to its seven different management areas that have unique goals and policies (http://www.mpatlas.org/mpa/sites/7700301/). Recent expansions were also sometimes included as distinct MPAs, such as the 2016 expansion of Papahanaumokuakea. For consistency, we considered these distinctions as individual MPAs within our analysis as well. We also limited the analysis to locations that, per the verification and validation process of MPAtlas, were considered true MPAs. This procedure resulted in the exclusion of areas that protect only a single species group, such as shark sanctuaries, which, while a form of spatial protection significant to marine conservation, are often distinguished from MPAs in the literature [24]. MPAtlas lists intention to protect the entire ecosystem as an important component of the definition of MPAs (http://mpatlas.org/about/why-mpas/). Other types of spatial protection that MPAtlas determined were not truly MPAs included marine mammal sanctuaries, bottom trawl closures, and general fishery management areas, among others. When these were removed, our data set included 10,825 "true" MPAs.
HDI indices and calculation methodology have occasionally been revised, especially in 2010 and 2014 [25]. However, HDI reports (among many other societal indicators) often ignore many Small Island Developing States (SIDS) with large Exclusive Economic Zones (EEZs), including autonomous governments that do not officially have UN representation. Using the most recent HDI reports would therefore remove globally significant MPAs including the Coral Sea in New Caledonia, the St. Helena Marine Protected area, and the Cayman Islands. A separate 2009 report expanded the 2008 HDI calculation methodology to include many typically unreported SIDS [26] and other countries not typically included in HDI reports, and we used those HDI metrics in our study.
Our selected variables were defined and calculated or modified as follows (see supplementary material for more information): Human Development Index (0-1): taken from a 2009 report that expanded HDI to many countries that are not typically accounted for under the official HDI reports but are highly relevant to marine conservation.
All variables were standardized and assigned z-scores relative to their respective variance to remove influences of differing scales across variables. The z-scores were used for the PCA and k-means analysis for assigning clusters. After cleaning the data and removing MPAs with incomplete data, our final data set with which we conducted the analysis included a total of 2938 MPAs. Lack of data on the area in no-take zones and on the age of the MPA was primarily responsible for the decrease in sample size (7800 and 665 MPAs, respectively, did not have these data).
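The standardization step can be illustrated with a short sketch. It is written in Python for illustration (the study used R); the function name `zscores` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which estimator was applied.

```python
import statistics


def zscores(values):
    """Return z-scores: (value - mean) / standard deviation.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Variables on very different scales become comparable after scaling.
print(zscores([1.0, 2.0, 3.0]))  # → [-1.0, 0.0, 1.0]
```

Without this step, a variable measured in km² with values in the hundreds of thousands would dominate the Euclidean distances used by k-means over a variable such as HDI, which ranges from 0 to 1.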

Results
The PCA and k-means process identified a preferred analysis using four variables (dropping latitude) and 7 clusters (Fig. 1, Table 1, and supplementary information). The 7 clusters contained a total of 2938 MPAs (Figs. 1 and 2, Table 2), and were converted from numeric designations to alphabetical from A to G, from highest to lowest number of MPAs within the cluster. For example, cluster A contains nearly half of the MPAs in the data set, while Cluster G has the fewest (n = 3).
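The relabeling from numeric cluster IDs to letters ordered by cluster size could be done along the following lines. This is an illustrative Python sketch, not the authors' R code; the function name `relabel_by_size` and the toy labels are assumptions.

```python
import string
from collections import Counter


def relabel_by_size(labels):
    """Map cluster IDs to letters A, B, C, ... from largest to smallest cluster.

    Ties in size keep the order in which the IDs first appear.
    """
    ordered = [lab for lab, _ in Counter(labels).most_common()]
    mapping = {lab: string.ascii_uppercase[i] for i, lab in enumerate(ordered)}
    return [mapping[lab] for lab in labels], mapping


# Toy numeric assignments: cluster 1 is the largest, cluster 2 the smallest.
letters, mapping = relabel_by_size([3, 1, 1, 2, 1, 3])
print(mapping)  # → {1: 'A', 3: 'B', 2: 'C'}
```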
A series of ANOVAs identified significant differences across cluster means for all variables, with pairwise means comparison t-tests revealing statistically significant differences between individual cluster pairings (Tables 3 and 4), suggesting that the clusters represent distinct groups.
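For readers unfamiliar with the verification step, the sketch below computes the one-way ANOVA F statistic for a single variable across clusters. It is an illustrative Python example under stated assumptions: the function name `anova_f` and the toy values are hypothetical, and the p-value (which requires the F distribution with k - 1 and n - k degrees of freedom) is not computed here.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of numeric groups.

    F = (between-group sum of squares / (k - 1))
        / (within-group sum of squares / (n - k))
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy example: three clusters with group means of 2, 3, and 6.
f_stat = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat)
```

A large F indicates that variation between cluster means is large relative to variation within clusters; pairwise t-tests then identify which specific cluster pairs differ.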
Among cluster and variable results (Fig. 3, Table 4), some patterns were notably distinct. For example, the average area of MPAs in Clusters F and G was much larger than that of other clusters, with average areas of 345,282 km² and 1,142,051 km² respectively, compared to a sample-wide average of 4420 km². The other clusters contained a wide range of MPA sizes from 1 to nearly 100,000 km² or more, though Clusters B and E were notably smaller on average than others, at 191 and 186 km² respectively. However, we also tracked the proportional distribution of MPAs per cluster across different size ranges (Table 5). The majority of MPAs in Clusters A and B are less than 1 km². Clusters F and G exclusively contain MPAs greater than 100,000 km². The majority of MPAs in the remaining clusters are less than 10 km² (Clusters C and E) and 100 km² (Cluster D).
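The size-range tabulation can be sketched as follows. This is an illustrative Python example, not the authors' procedure: the function name `size_shares` is hypothetical, and the intermediate bin edges (powers of ten between 1 and 100,000 km²) are an assumption, since the text specifies only the smallest and largest ranges.

```python
from bisect import bisect_right

# Assumed bin edges in km²; only "0-1" and "100,000+" are stated in the text.
EDGES = [1, 10, 100, 1_000, 10_000, 100_000]
LABELS = ["0-1", "1-10", "10-100", "100-1,000",
          "1,000-10,000", "10,000-100,000", "100,000+"]


def size_shares(areas):
    """Percentage of MPAs falling in each size range (lower edge inclusive)."""
    counts = [0] * len(LABELS)
    for area in areas:
        counts[bisect_right(EDGES, area)] += 1
    return {label: 100 * c / len(areas) for label, c in zip(LABELS, counts)}


# Toy cluster: two very small MPAs, one small MPA, one very large MPA.
shares = size_shares([0.5, 0.2, 5, 180_300])
print(shares["0-1"], shares["1-10"], shares["100,000+"])  # → 50.0 25.0 25.0
```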
MPAs in Clusters B and E were notably older than the others, with average ages of 35 and 95 years respectively. In contrast, MPAs in Clusters F and G had an average age of only 4 and 2 years, with a maximum of 12 and 6 years. The other clusters had similar MPA age distributions, with averages ranging from ~13 to 20 years.
Cluster C had a distinctly high proportion of MPAs with high no-take area, with a minimum of 76% no-take and an average of over 99% no-take. Some other clusters contained fully no-take MPAs (100%), but all averaged at 50% or less no-take. MPAs in Clusters A and B averaged less than 1% no-take.
Based on HDI, Clusters A, B, and E were exclusively located among more developed countries with an average HDI of 0.952-0.961, and minimum of 0.811 in Cluster E. Cluster D was centered around developing countries with an average of 0.711 and maximum of 0.827.

Cluster descriptions
The statistically significant and verified clusters from the k-means analysis lead us to describe MPAs on a global scale in seven general groups (Table 6). With thousands of MPAs around the world, these classifications are not intended to account for all relevant features and characteristics. But they can function as lenses through which to assess the current state of affairs of MPAs around the world via factors relevant to management and performance. Each cluster had at least one major defining characteristic within our results that distinguished it from its peers, and sometimes had less prominent secondary characteristics that deviated from sample wide averages or otherwise merited consideration. These characteristics are used to define the nature of MPAs within each cluster and potential consequences for management. We have also selected sample MPAs from each cluster within our dataset that are characteristic of their respective cluster's key defining features (Table 7).

Cluster A - MPAs in highly developed countries
Cluster A is the largest cluster by number of MPAs, which at n = 1260 amounts to 43% of the MPAs in the final data set. According to our results, MPAs in this cluster ranged from 0 to nearly 150,000 km² in area, but more than half of the MPAs in the cluster were less than 1 km². MPAs in Cluster A ranged from 0 to 25 years old. Cluster A contained some partial no-take areas up to ~50% of total area, but the vast majority of MPAs in this cluster did not report any no-take area, which averaged less than 1% across the cluster. HDI had a relatively narrow range within Cluster A, centered on more developed countries, which can be observed visually with most of Cluster A dispersed around North America, Western Europe, and Australia. The average HDI score for the cluster at 0.951 was similar to that of countries such as Spain and the United States (0.950, 0.947). The lower range of development in this cluster includes more developed countries in Latin America (Mexico, 0.854) and lesser developed countries in Europe (Latvia, 0.876). Based on HDI, we refer to Cluster A as MPAs in highly developed countries (Fig. 4).
From a management perspective, MPAs in Cluster A may be located in countries with better government infrastructure and greater institutional capacity. MPAs in this cluster may also have better access to financial resources to support management activities (including enforcement) than some other groups of MPAs due largely to the wealth of the specific countries as indicated by HDI. Prior research also suggests that MPAs in more developed countries receive a greater proportion of their financial support from the national government [20]. Education, another contributor to HDI that is indicated by variables such as literacy and school enrollment rates, may also suggest that MPAs in this group are located in countries with a more educated population and/or better education infrastructure. Educating the populace on the importance of the marine environment and MPAs as a method of public outreach is often an important indicator of success and therefore a typical focus of MPA administration [27]. Prior research has also found that level of education can directly influence the amount that people are willing to pay to support marine conservation efforts, even when controlling for income [28].
These inferences on the relation between HDI and MPA management do not mean that MPAs in this cluster are necessarily better managed and enforced. For example, many MPAs in Mexico are reported to be ineffectively managed [18]. Rather, MPAs within Cluster A may be particularly well positioned for potential success. To the extent that HDI is correlated with the Environmental Performance Index [23], governments in countries with high HDIs may also more frequently achieve their respective environmental goals and policies including national targets for ocean protection.

Clusters B and E - "Middle-aged" and "Senior" MPAs
We discuss Clusters B and E simultaneously due to their similar characteristics and distinction based on their age compared with other clusters. According to our results, both consisted of older MPAs than other clusters, with averages of 35 and 95 years respectively. Cluster B MPAs ranged from 26 to 65 years old, whereas Cluster E encapsulated all MPAs older than 65. These age ranges influence our decision to refer to Clusters B and E as "Middle-aged" and "Senior" MPAs. MPAs in Clusters B and E were generally located in more developed countries and returned a mean, standard deviation, and min/max range for HDI closely resembling Cluster A (see Fig. 5).
While some exceptionally large MPAs existed within Clusters B and E, such as in Greenland, both of these clusters had similarly small average areas, which at ~190 km² is nearly an order of magnitude smaller than Cluster A's average. In all, they contributed only ~1% of the total MPA coverage despite making up ~31% of MPAs in our sample. But like Cluster A, the majority of MPAs in Cluster B are less than 1 km², while Cluster E MPAs were actually more evenly distributed from 0 to 1000 km² (Table 5). Therefore, we attribute the smaller average size of these clusters not to "smaller" MPAs, but rather to a lack of the larger, more expansive MPAs (~1000 km² or more) that are more frequent in all other clusters. Prior research has suggested that MPAs were historically small extensions of terrestrial PAs in coastal regions, designed to protect an adjacent local feature like an individual bay [29]. That approach contrasts with modern efforts to specifically protect marine environments, including under the UN SDGs, and to protect larger swaths of the ocean that encompass entire ecosystems or ocean regions. The larger MPAs covering thousands of km² or more, which are more prevalent in the other younger clusters, are more likely to provide that type of protection. Our findings for Clusters B and E thereby provide some objective insight and support for the claim that early MPAs, as extensions of or adjacent to terrestrial PAs, seldom protected larger areas of ocean (thousands of km² or more) that are the emphasis of current marine conservation efforts (including UN SDGs). Clusters B and E also suggest that MPAs, in their modern legal form, were largely exclusive to more developed nations until recent decades. Though some cultures in developing countries conducted spatial closures long before any of these developed country MPAs existed, they have not been registered and are therefore not included in the global databases on which we based our analysis.
Should consistent data become available from these MPAs, they could be included in future cluster analyses using our methods. Stakeholder or communal participation and cooperation is often heralded as key to effective management [18], and MPAs which have been established for several decades or more may have further integrated within coastal communities and culture than their younger generational counterparts. This consideration is particularly relevant for Cluster E, which contains MPAs that have been in existence for multiple generations and predate all but the very eldest of local community members. Environmental performance is also known to increase with age [9]. Therefore, Cluster B and E MPAs may receive the benefits of having more time to demonstrate environmental benefits, in addition to being located primarily in more developed countries and having political and societal longevity. These features would all be beneficial towards effective management.

Cluster C - No-take MPAs
Cluster C is defined by a high proportion of no-take area and contained most MPAs with full no-take coverage. While Cluster C included MPAs with as little as 75% no-take in our results, the mean of 99.7% no-take with standard deviation of 2.29% was consistent with a similar observation from Clusters A, B, and E that partial no-take MPAs are rare and that MPAs typically are either fully no-take or do not have any no-take area at all. This observation may partially have been a result of the way MPAtlas segments multi-use MPAs as distinct data points. Nevertheless, consistent with prior published results, we still observed fully no-take MPAs to be the minority of MPAs around the world [5], and Cluster C represented only 17% of examples in our data set. Other clusters did contain some fully no-take MPAs, but these were few and occurred only where one of the other three variables was especially prominent (see Fig. 6).
Previous studies have demonstrated that no-take MPAs are substantially more expensive to manage than MPAs that do not fully protect from fishing when controlling for other factors that affect the cost of operations [17]. As a result, management activities for Cluster C are likely to require more financial resources to enforce the more restrictive nature of these MPAs. No-take MPAs may also face greater political opposition from sectors restricted by no-take status. However, no-take status has been identified as a key feature for achieving conservation goals [9], and all MPAs in this category have this as a strong attribute towards effective management. Therefore, if these challenges can be surmounted, then MPAs in this category may be particularly well positioned for effective management and subsequent performance.

Cluster D - MPAs in developing countries
Cluster D was most distinguished in the results by having low HDI values among its assigned MPAs, which indicated an association with developing countries. American Samoa, Ecuador, and Colombia were the most developed countries to contain a Cluster D MPA, with HDIs of 0.827, 0.816, and 0.812 respectively (ranked #102, #107, and #112 out of 230 countries in our HDI data set). Examples of countries within Cluster D closer to the cluster's average HDI of 0.711 included Indonesia (0.746, #150), Egypt (0.714, #165), and Tuvalu (0.711, #166). Visually, we observed Cluster D MPAs to be distributed primarily in more tropical zones of Latin America, Africa, South East Asia, and parts of Oceania. Cluster D MPAs also had a modest amount of no-take area, including over 50 fully no-take MPAs, or ~20% of the cluster sample. However, these were exclusively in lesser developed countries within the cluster, at or below the average HDI (see Fig. 7). Recent research has suggested that MPAs in developing countries with lower HDIs often rely on funding from international sources or at the sub-national level [20], likely due to limited resources within their national governments. Another study on MPA governance theorized that MPAs in developing countries have trended towards various forms of decentralization due to weaker state capacity [13]. Research conducted on some MPAs within Cluster D has highlighted some of these alternative management and financial strategies that reduce reliance on the central government [30][31][32].
The findings from this prior research, combined with our objective results from the cluster analysis and implications of HDI, suggest that Cluster D MPAs may frequently have fewer financial resources than MPAs in other clusters, or otherwise are less likely to get the support needed from their respective national governments. This may be of particular concern for the no-take MPAs within the cluster, which may require more financial resources than their non-no-take counterparts while also being located in the least developed countries in this group.
Table 2. Frequency and coverage of each MPA cluster. Includes total number of MPAs counted in the cluster, % of total cluster sample (n = 2938), total area covered by MPAs in the cluster (km²), and % of total sample area (12,986,102 km²). Numbers rounded to nearest whole number or %.

Clusters F and G - Large Scale MPAs (LSMPAs) and Giant MPAs (GMPAs)
We discuss Clusters F and G simultaneously due to their similar characteristics and distinction based on area compared with other clusters. Clusters F and G only contained 17 and 3 MPAs respectively, but combined encompassed more than 70% of the entire area in our data set. The most distinctive characteristic of these clusters was the size of the MPAs, with a minimum area of 180,300 km² for Cluster F and 989,842 km² for Cluster G, with average areas ~2 orders of magnitude greater than the sample mean. These were very young groups of MPAs, the oldest in Cluster F being only 12 years old and Cluster G even more recent at a maximum of 6 years old. In addition, while having a mix of no-take coverage ranging from 0 to 100%, 12 of 20 MPAs in the two clusters combined had at least some no-take area, 7 of which were completely no-take. These clusters also included many high profile MPAs such as Papahanaumokuakea in the USA [33] and the Phoenix Islands Protected Area in Kiribati [34] (see Fig. 8). Management needs and approaches for Clusters F and G are likely to differ from the other five clusters primarily due to the expansive ranges that such large MPAs can encompass. While larger MPAs are overall more expensive to manage, they are far less expensive than smaller MPAs on a per area basis [17]. It is therefore difficult to project just how much greater the costs of managing MPAs in Clusters F and G may be compared to others. Also due to their expansive range, Cluster F and G MPAs likely cover wide areas of more remote offshore waters. This may require, as well as enable the use of, management surveillance and equipment appropriate for such remote conditions, which can include offshore-equipped vessels and satellite monitoring.
Table 3. Significance results of pairwise t-tests (p-values) for each variable. Significance defined as: *p < 0.05, **p < 0.01, ***p < 0.001.
Additionally, the performance of such MPAs that cover large swaths of ocean has also been the subject of scientific debate [2], and is all the more difficult to ascertain considering the younger age of these MPAs. Therefore, we speculate that MPAs in Cluster F and G may require special emphasis on scientific monitoring in the short to medium term. All of these implications may be especially magnified for Cluster G due to the particularly expansive ranges of MPAs in that cluster.
Our results also contribute to the growing body of literature on Large Scale Marine Protected Areas (LSMPAs or LMPAs), especially towards better defining the group. LSMPAs are MPAs beyond a certain size threshold, but the minimum size that constitutes an LSMPA remains under debate with different sources ranging from 30,000 km² to as high as 250,000 km² [35]. Our analysis suggests that, with our other variables considered, MPAs become large enough to statistically distinguish themselves by area alone at approximately 180,000 km². While within range of historical LSMPA definitions, this definition differs from previous literature by arriving at a minimum size threshold via mathematical methods rather than by arbitrary selection [35]. The results also identified an even larger threshold for Cluster G at 1,000,000 km², suggesting that an additional group of especially large LSMPAs has recently emerged. While Cluster G is a small group, it may continue to grow as countries pursue MPA protection goals of 10% or more of the ocean. Some known MPAs that may have qualified for Cluster G were also excluded due to restrictions related to HDI (as explained in section 4.2). Our findings suggest that perhaps the minimum threshold for LSMPAs be in the range of 180,000 km² as per Cluster F, while a sub-class of especially large LSMPAs be designated with a minimum of around 1,000,000 km² as according to Cluster G, which we refer to here as Giant MPAs (GMPAs). While only a small number (3) of MPAs currently populate this statistically identified cluster, the addition of future, new LSMPAs to the database will assist in evaluating the robustness of this group as a distinct cluster.
Fig. 3. Boxplots of explanatory variables across clusters. In order from left to right, first to second row: Area (km²), Age (years), No-take %, HDI. Y axis for Area (km²) is logarithmic.
Table 5. Distribution of MPAs per cluster across size ranges from 0 to 1 km², to 100,000+ km². Color scale indicates higher % in red, lower % in blue.

Limitations
Our analysis could not make use of all potentially relevant or informative variables due to technical constraints and data limitations. For example, k-means analyses can be used with categorical data like "Governance Type", but via a process that is distinct from the numeric approaches used in this study [36]. Therefore, "Governance Type" could not be paired with numeric variables like "Area in km²" in the same cluster analysis. Data on MPAs are also notoriously scarce, especially at the global scale that our analysis requires.

Table 6
Summary of MPA cluster characteristics (primary and secondary) and potential implications for management.

Cluster A
Primary Characteristic: High HDI
Secondary Characteristics: MPAs 25 years or younger, few with partial no-take areas. Some larger MPAs but majority very small (< 1 km²). Little no-take area.
Potential Implications for Management: High HDI suggests better access to financial resources to support MPA operations, as well as stronger institutional infrastructure including governance, enforcement, and education which can also be beneficial to MPA management.

Cluster B
Primary Characteristic: Older (26-65 years)
Secondary Characteristics: High HDI, smaller than other clusters on average, majority very small (< 1 km²). Little no-take area.
Potential Implications for Management: Have political and societal precedence or longevity which may indicate political and social support. Older also means more time to demonstrate potential environmental benefits. High HDI suggests better access to financial resources and stronger institutional infrastructure for governance, enforcement, and education.

Cluster C
Primary Characteristic: Full or almost fully no-take
Secondary Characteristics: Some up to 65 years old but average less than 20 years.
Potential Implications for Management: No-take status may require more financial and human resources for management and enforcement. May also face more political resistance from commercial fishing and other extractive industries.

Isolation is another variable that is considered key to MPA performance [9] and would have been appropriate to include, but it is difficult to incorporate because of limited data availability and reliability. Some studies have measured isolation in quantitative terms [10,17,37], but the reliability of these calculations remains unverified, the measurements are difficult to obtain, and there remains no consensus within the marine science and conservation community on how to objectively measure isolation. The complexity of measuring and interpreting isolation also means that it is not an easily tractable variable. Given the limitations of not including these potentially relevant variables, future research using our results should expand upon the variables noted here as needed when applying our findings and methods to individual MPA analysis, comparisons, or case study selection.
Another limitation of the study was the exclusion of some MPAs because data were not available for all variables in our analysis. Most excluded MPAs were removed due to a lack of no-take data, and some countries were left out because they lacked an assigned HDI. For this reason, high seas MPAs like the Ross Sea in Antarctica were not included in our results, nor were MPAs in some Small Island Developing States (SIDS), such as Bonaire National Marine Park. However, this highlights one of the advantages of using widely available and easily tractable variables: it is fairly easy to assign such examples to our MPA clusters based on other key characteristics for which data are available. For example, the Ross Sea would likely belong to Cluster G due to its expansive area of 1.55 million km² [7].
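One way to make this kind of after-the-fact assignment explicit is a nearest-centroid rule that compares only the variables available for a given MPA. The sketch below is an illustration under stated assumptions, not the procedure used in this study: the function name `assign_with_missing`, the variable keys, and the two centroids are hypothetical placeholders (expressed as standardized scores), not values from our results.

```python
import math


def assign_with_missing(mpa, centroids):
    """Return the label of the nearest centroid (Euclidean distance),
    comparing only the variables that are observed (not None) in `mpa`."""
    observed = [key for key, value in mpa.items() if value is not None]
    if not observed:
        raise ValueError("at least one variable must be observed")

    def distance(centroid):
        return math.sqrt(sum((mpa[k] - centroid[k]) ** 2 for k in observed))

    return min(centroids, key=lambda label: distance(centroids[label]))


# Hypothetical standardized centroids for two made-up groups;
# these numbers are placeholders, NOT results from the study.
centroids = {
    "small_young": {"log_area": -0.5, "age": -0.3, "no_take": -0.2, "hdi": 0.4},
    "very_large": {"log_area": 3.0, "age": -0.8, "no_take": 0.9, "hdi": 0.1},
}

# A very large, young, mostly no-take MPA with no assigned HDI.
mpa = {"log_area": 3.2, "age": -1.0, "no_take": 1.0, "hdi": None}
label = assign_with_missing(mpa, centroids)
print(label)
```

Dropping a variable from the distance changes the geometry of the comparison, so assignments made this way are tentative and are most defensible when one observed variable, such as area, clearly dominates.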

Future directions for research
Opportunities exist for improvement and refinement of our methodology, including further insight into the definition of each cluster and how one arrives at those definitions. For example, how might the method be improved, especially with more complete data and better data access that would allow us to explore other variables more easily? Possible refinements include performing additional cluster analyses within our defined clusters, and including additional variables as they become available through improved data access and refined measurement methodologies.
Our study also provides a platform from which to steer future research in MPA management and policy. Might we be able to compare performance indicators (like long-term financing and achievement of environmental objectives) through the lens of these cluster assignments, such that different outcomes might be associated with each of the clusters? Would these outcomes verify or refute our speculations on the management implications of these cluster assignments? For example, researchers could investigate whether certain clusters contribute more towards biodiversity targets. Or perhaps MPA clusters may demonstrate performance in different ways, such that some perform better on strictly environmental parameters, whereas others contribute more towards socioeconomic goals. There is also the potential to use cluster results directly in future case study based research on MPAs. By isolating different 'types' of representative MPAs that may be predisposed to different management outcomes, as explained above, one can more fairly compare performance indicators across clusters and control for certain factors that may influence performance outside of management decision making. The clusters can also help guide case study selection by defining representative groups of global MPAs from which case studies can be selected to maximize diversity within case study samples.
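As a starting point for such comparisons, a performance indicator can be summarized per cluster. The following is a minimal sketch using only the Python standard library; the function name `summarize_by_cluster`, the record layout, the indicator name `biomass_ratio`, and all values are hypothetical and serve only to show the shape of the calculation.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_cluster(records, indicator):
    """Group records by their 'cluster' label and report n, mean, and
    sample standard deviation of `indicator`, skipping missing values."""
    groups = defaultdict(list)
    for rec in records:
        value = rec.get(indicator)
        if value is not None:
            groups[rec["cluster"]].append(value)
    return {
        cluster: {
            "n": len(values),
            "mean": mean(values),
            # stdev needs at least two observations
            "sd": stdev(values) if len(values) > 1 else None,
        }
        for cluster, values in sorted(groups.items())
    }


# Hypothetical records; the indicator values are invented for illustration.
records = [
    {"cluster": "A", "biomass_ratio": 1.0},
    {"cluster": "A", "biomass_ratio": 1.5},
    {"cluster": "C", "biomass_ratio": 2.0},
    {"cluster": "C", "biomass_ratio": 2.5},
    {"cluster": "C", "biomass_ratio": None},  # missing value is skipped
]
summary = summarize_by_cluster(records, "biomass_ratio")
print(summary)
```

A descriptive summary like this only indicates where clusters may differ; formal comparison would require an appropriate statistical test and attention to unequal cluster sizes.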

Conclusions
The k-means analysis successfully segmented over 2938 MPAs around the world into seven statistically significant clusters from the perspective of MPA management based on age, area, proportional no-take area, and HDI of the host country. These clusters were derived from a comprehensive global database, allowing us to view the present state of MPAs on a global scale (given data limitations) through the lenses of these variables. Each cluster had at least one distinguishing characteristic by which it could generally be described. We were also able to infer potential differences in management needs and approaches for each cluster based on its defining characteristics. Among these, three groups of MPAs (C, F and G) emerged that embody the practices of no-take protection and Large Scale MPAs (LSMPAs), which have become increasingly popular due to the demonstrated positive contributions of large area and no-take status to MPA performance. We also identified a new threshold from which to define LSMPAs, as well as a subcategory of especially large LSMPAs dubbed here as GMPAs.
Our findings also provide valuable practical contributions to future research. Our approach provides objective guidance in case study selection for MPA research. Using these clusters, researchers can track MPA performance and management across clusters and relate any differences in management or performance to our selected variables. By accounting for factors that may influence management or performance beyond ground level management activities, these clusters may provide a more objective comparison of performance outcomes and lead to better planning for MPA viability.

Declarations of interest
None.