How many metazoan species live in the world’s largest mineral exploration region?

The global surge in demand for metals such as cobalt and nickel has created unprecedented interest in deep-sea habitats with mineral resources. The largest area of activity is a 6 million km² region known as the Clarion-Clipperton Zone (CCZ) in the central and eastern Pacific, regulated by the International Seabed Authority (ISA). Baseline biodiversity knowledge of the region is crucial to effective management of environmental impact from potential deep-sea mining activities, but until recently this has been almost completely lacking. The rapid growth in taxonomic outputs and data availability for the region over the last decade has allowed us to conduct the first comprehensive synthesis of CCZ benthic metazoan biodiversity for all faunal size classes. Here we present the CCZ Checklist, a biodiversity inventory of benthic metazoa vital to future assessments of environmental impacts. An estimated 92% of species identified from the CCZ are new to science (436 named species from a total of 5,578 recorded). This is likely to be an overestimate owing to synonyms in the data but is supported by analysis of recent taxonomic studies suggesting that 88% of species sampled in the region are undescribed. Species richness estimators place total CCZ metazoan benthic diversity at 6,233 (±82 SE) species for Chao1 and 7,620 (±132 SE) species for Chao2, most likely representing lower bounds of diversity in the region. Although uncertainty in estimates is high, regional syntheses become increasingly possible as comparable datasets accumulate. These will be vital to understanding ecological processes and risks of biodiversity loss.


In brief
Species-level biodiversity information is key to understanding ecosystems and tracking environmental impacts. Rabone et al. provide the first checklist (436 species) and total species estimates (>6,000 to >8,000) for the world's largest mineral exploration region, the CCZ. These estimates provide a baseline for building biodiversity knowledge at a regional scale.

INTRODUCTION
The Clarion-Clipperton Zone (CCZ) is an area of seabed roughly twice the size of India (approx. 6 million km²), spanning 5° to 20° N between the Clarion and Clipperton oceanic fracture zones, and 115° to 160° W. This vast region, between Hawaii, Kiribati, and Mexico, lies entirely within areas beyond national jurisdiction (ABNJ), legally designated under the United Nations Convention on the Law of the Sea (UNCLOS). The region is composed of abyssal seafloor at depths of 4,000-6,000 m, characterized by muddy sediments overlain by potato-sized polymetallic nodules rich in minerals. Despite the darkness and low food availability, nodule field habitats contain diverse communities of benthic invertebrate fauna, albeit at low densities compared with coastal and shelf ecosystems. 1 Mineral exploration began in the CCZ in the 1960s and was later formalized under the International Seabed Authority (ISA). 2 Currently, there are 17 contracts for mineral exploration covering 1.2 million km². Despite decades of intensive exploration, there has been a historical lack of taxonomic work in the region. Large-scale CCZ environmental surveys conducted from the late 1970s to the early 1990s produced lists of informal species names, 3 but few species were formally described. Informal names refer to species differentiated by morphology and/or molecular data and recorded with temporary names before formal description 4,5 (hereafter "unnamed species"). These names present challenges to taxonomic standardization and regional-scale synthesis of biological data. Molecular work provides an arbiter for compatibility across identifications, 6-8 but is not without challenges. Adding to this complexity, cryptic species, that is, those with similar or identical morphology but separate molecular lineages, are numerous in deep-sea environments, 9,10 including the CCZ. 11,12
Other factors contributing to the lack of comparability across datasets are variable sampling methods 13 and, more fundamentally, a lack of data. 14 As a result, CCZ synthetic works are rare and primarily focus on particular taxa, size classes, and/or regions. 15-19 Information gaps span all size classes, from small meiofauna (typically >150 µm) and macrofauna (>300 µm) to large megafauna (typically >10 mm). 20 The data deficiency is particularly notable for the network of Areas of Particular Environmental Interest (APEIs), regions protected from mining 21 (but see Bonifácio et al., 15 Brix et al., 17 Błażewicz et al., 22 and Hauquier et al. 23). This has hampered assessment of their representativeness, with clear implications for environmental management. 20 Biodiversity knowledge is essential to robust assessments of species ranges and rarity over time and space, and therefore to evidence-based Regional Environmental Management Plans (REMPs) and future environmental impact assessments (EIAs) in the event of mining operations. 24,25 The need for regional-scale environmental management has been increasingly recognized by policymakers and the ISA, 21 supporting a recent resurgence of comparative taxonomic work, including incorporation of DNA methods that allow for a more comparable methodology. 13,26 Critical to the development of CCZ biodiversity knowledge is the creation of a curated checklist of known taxa and estimates of total undescribed species. Building on recent regional syntheses, 20 we present the first comprehensive synthesis of benthic metazoan biodiversity and checklist for this vast region on the eve of possible large-scale mining operations. We make these data and interpretations open to all stakeholders to inform the ongoing debate on deep-sea mineral extraction and to grow our knowledge of the largest ecosystem on our planet.

RESULTS
How many animal species are known to live in the CCZ?
The synthesis produced >100,000 records compiled from seven data sources (Figure 1; key resources table). Recent growth in taxonomic effort for the CCZ is evident, particularly over the past 5 years (Figure 2A). To date, 219 taxa new to science (families, genera, and species) have been described from the CCZ. Most of these new taxa have been described in recent years, with only seven descriptions prior to the year 2000. The CCZ Checklist presented here comprises 436 named benthic metazoan species of all size classes (Table 1; Data S1). The Checklist includes the 185 species, three families, and 31 genera described from the CCZ (see Figures 2 and 3). Only two of the 185 new CCZ species have also been recorded elsewhere, namely the sea cucumbers Psychronaetes hanseni (Pawson, 1983) 41 and Psychropotes dyscrita (Clark, 1920) (Figure 3).
The CCZ Checklist records 27 phyla, 49 classes, 163 orders, 501 families, and 1,119 genera in total (Table 1). Of all species-level identifications in the Checklist, 42% are based on both morphological and molecular data (185/436), 50% on morphology only (217/436), and for the remainder, data are not available (Data S1). Of the new species, 51% are described solely from morphology; for meiofauna, this rises to 86%. For the key macrofaunal groups (tanaids, isopods, and polychaetes), 23% of species in the Checklist have type localities outside the CCZ, including other ocean basins (33/145). In total, 5,142 unnamed species are recorded, an estimated 3.9% of which are synonyms (sensu named [...] 45 there are currently 36,579 named metazoan deep-sea species found globally at depths >500 m. Within WoRDSS, the most speciose phyla are Arthropoda (31%), Mollusca (17%), Chordata (15%), and Annelida and Echinodermata (10%). Key differences include relatively more annelids, nematodes, and echinoderms in the CCZ (and, to a lesser degree, sponges and bryozoans) and, conversely, more molluscs (class Gastropoda) and chordates (class Teleostei) in WoRDSS (Figure 4). Another notable difference at class level is that Holothuroidea (named and unnamed) are relatively more speciose in the CCZ than other key echinoderm classes (Asteroidea, Ophiuroidea) compared with WoRDSS. Although many faunal gaps are evident in the CCZ Checklist across phyla (e.g., no Pycnogonida in Arthropoda), these groups are recorded from the CCZ in the unnamed species list (Data S2).
See also Figures S4-S6, the key resources table, and supplemental information.
Examining common faunal groupings, 50% of the species in the Checklist are macrofauna (220), followed by megafauna, 28% (122), and meiofauna, 22% (96). Similarly, most studies primarily assess macrofauna (46%), followed by megafauna (30%) and meiofauna (22%). Descriptions by size class (families, genera, and species combined) number 153 for macrofauna, 24 for megafauna, and 42 for meiofauna. A dominant feature of the CCZ is the unusual combination of mud and hard substrate/nodule fauna. Overall, 14% of named species and 13% of unnamed species in the CCZ are estimated to be primarily nodule dwellers (Data S1 and S2). Several descriptions of nodule megafauna (cnidarians and sponges) have recently been published, 37,39,40,46,47 but there are only two quantitative studies of metazoan nodule fauna. 48,49 The majority of CCZ macrofaunal nodule dwellers (primarily bryozoans and sponges) are undescribed (Figure 2; Data S2), a rare exception being a recent monograph on Bryozoa describing 16 species, nine genera, and two families new to science. 30
How many species might live in the CCZ?
The Chao1 estimator (abundance-based) places total species richness in the CCZ at 6,233 (±82 SE; N = 112,429 individuals, S(obs) = 4,716), and Chao2 (sample-based) at 7,620 (±132 SE; N = 1,668 samples, S(obs) = 4,779; Table 1). Species rarefaction and accumulation curves are far from reaching an asymptote (Figures 5 and S1). Other species estimates range from 6,109 (±42 SE) for ACE to 8,514 (±438 SE) for Jackknife2 (Table 1). At higher taxonomic levels, the family accumulation curve approaches an asymptote, with an estimated total family richness of 469 (±18 SE; N = 70,597 individuals, F(obs) = 406) for Chao1 and 544 (±24 SE; N = 2,179 samples, F(obs) = 423) for Chao2 (Figures 5 and S1). These estimates are based on a subset of the data for which abundance and site information are available. In comparison, the CCZ Checklist, which incorporates all records, includes 501 families in total.
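To make the estimators named above concrete, here is a minimal Python sketch of one common published form of each: Chao1 (abundance) and Chao2 (incidence) with an (n-1)/n or (m-1)/m small-sample factor and a bias-corrected fallback when there are no doubletons, plus the second-order jackknife on incidence data. It is an illustration under those assumptions, not the software or variant used in the study, and it omits the standard errors reported above. The toy inputs are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple


def chao1(abundances: Sequence[int]) -> float:
    """Abundance-based Chao1: S_obs + (n-1)/n * f1^2 / (2 f2)."""
    counts = [a for a in abundances if a > 0]
    n = sum(counts)                                 # individuals
    s_obs = len(counts)                             # observed species
    f1 = sum(1 for a in counts if a == 1)           # singletons
    f2 = sum(1 for a in counts if a == 2)           # doubletons
    k = (n - 1) / n
    if f2 > 0:
        return s_obs + k * f1 * f1 / (2 * f2)
    return s_obs + k * f1 * (f1 - 1) / 2            # bias-corrected form when f2 == 0


def incidence(samples: Iterable[Set[str]]) -> Tuple[Counter, int]:
    """Number of samples in which each species occurs, plus the sample count m."""
    freq: Counter = Counter()
    m = 0
    for sample in samples:
        m += 1
        freq.update(set(sample))                    # presence/absence: once per sample
    return freq, m


def chao2(samples: List[Set[str]]) -> float:
    """Incidence-based Chao2: S_obs + (m-1)/m * Q1^2 / (2 Q2)."""
    freq, m = incidence(samples)
    q1 = sum(1 for q in freq.values() if q == 1)    # species found in exactly one sample
    q2 = sum(1 for q in freq.values() if q == 2)    # species found in exactly two samples
    k = (m - 1) / m
    if q2 > 0:
        return len(freq) + k * q1 * q1 / (2 * q2)
    return len(freq) + k * q1 * (q1 - 1) / 2


def jackknife2(samples: List[Set[str]]) -> float:
    """Second-order jackknife on incidence data."""
    freq, m = incidence(samples)
    q1 = sum(1 for q in freq.values() if q == 1)
    q2 = sum(1 for q in freq.values() if q == 2)
    return len(freq) + q1 * (2 * m - 3) / m - q2 * (m - 2) ** 2 / (m * (m - 1))


# Hypothetical toy data, not CCZ records.
print(round(chao1([1, 1, 1, 2, 2, 5, 10]), 3))       # 9.148
samples = [{"a", "b"}, {"a", "c"}, {"a", "d"}, {"b"}]
print(chao2(samples), round(jackknife2(samples), 3))  # 5.5 6.167
```

Because the Chao estimators depend on the ratio of the rarest classes (singletons and doubletons, or uniques and duplicates), they respond strongly to how rare species are counted.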
Estimates of total genera range from 947 (±26 SE) for Chao1 to 1,034 (±32 SE) for Chao2, with relatively more flattening of rarefaction curves than for species but still far from an asymptote (Figures 5 and S2). This compares with 1,119 genera in the Checklist (Table 1). Sampling completeness curves show higher completeness for family-level estimates than for species, and higher completeness for Chao1 than for Chao2 estimates (Figure S3).
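One simple way to express sampling completeness is the ratio of observed to estimated richness. The sketch below applies that ratio to the observed and Chao-estimated values reported in the Results; it is an illustrative assumption, and the curves in Figure S3 may be computed differently (for example, with a coverage-based method).

```python
# Observed richness and Chao estimates as reported in the Results.
REPORTED = {
    ("species", "Chao1"): (4716, 6233),
    ("species", "Chao2"): (4779, 7620),
    ("family", "Chao1"): (406, 469),
    ("family", "Chao2"): (423, 544),
}


def completeness(observed: int, estimated: float) -> float:
    """Share of estimated richness already observed (S_obs / S_est)."""
    return observed / estimated


for (level, estimator), (obs, est) in REPORTED.items():
    print(f"{level:<8}{estimator}: {completeness(obs, est):.1%}")
# species Chao1: 75.7%
# species Chao2: 62.7%
# family  Chao1: 86.6%
# family  Chao2: 77.8%
```

Even this crude ratio reproduces the qualitative pattern described: completeness is higher for families than for species, and higher for Chao1 than for Chao2.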

Distribution of sampling effort
Sampling effort, measured as the density of unique sampling sites, shows a highly uneven distribution across the region. Samples are concentrated in central and eastern CCZ contract areas, and large regions with very few samples are evident (Figure S4). APEIs have a very low density of sampling or no samples at all. Large regions, particularly between the west and central CCZ, are close to unsampled (Figures 1 and S4). The density of sampling is highest at certain depths (e.g., ~4,200 m and ~5,000 m; Figure S5). These densities correlate with the depths of the contract areas in the eastern and central CCZ (Figure S6). Where abundance data are available, 36% of species occur as singletons, i.e., represented by a single specimen across all sampling deployments (1,586/4,409), indicating extensive under-sampling. Of these singletons, 91% (1,441) are in mining/reserved areas or their vicinity, and the remainder (145) are found only in APEIs (Data S3 and S4). Most species are recorded in the eastern CCZ, closely followed by the central CCZ, with few in the west (Figure 6). The majority of all species are recorded from contract or reserved areas, with few in APEIs. Overall, 95% of named/unnamed species have not been recorded in the APEIs.
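The singleton tally above amounts to summing individuals per species across all deployments and then recording where each single-specimen species was found. A minimal Python sketch follows; the (species, area_type, individuals) record layout is a hypothetical simplification, not the structure of the study's datasets.

```python
from collections import Counter, defaultdict
from typing import Iterable, Tuple


def singleton_summary(records: Iterable[Tuple[str, str, int]]):
    """Count singleton species and the area type in which each was found.

    records: (species, area_type, individuals) tuples; this layout is hypothetical.
    """
    totals: Counter = Counter()
    areas = defaultdict(set)
    for species, area, individuals in records:
        if individuals > 0:
            totals[species] += individuals
            areas[species].add(area)
    singletons = [s for s, n in totals.items() if n == 1]
    # A singleton is one specimen, so it occurs in exactly one area.
    by_area = Counter(next(iter(areas[s])) for s in singletons)
    return len(singletons), len(totals), by_area


toy = [
    ("sp1", "contract", 1),
    ("sp2", "APEI", 1),
    ("sp3", "contract", 2),
    ("sp4", "contract", 1),
    ("sp4", "APEI", 1),   # sp4 totals two specimens, so it is not a singleton
]
print(singleton_summary(toy))

# Proportion implied by the reported counts: 1,586 singletons of 4,409 species.
print(f"{1586 / 4409:.1%}")  # 36.0%
```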

DISCUSSION
Species richness estimates are likely to increase as the data improve
This synthesis of all published biodiversity data from the CCZ has allowed the first estimates of both the known and unknown species richness across the region. This is important as it sets a baseline for the current state of knowledge while placing the CCZ in a global context. At species level, it is clear that sampling of the CCZ is very far from complete. Species are accumulating rapidly with increasing samples, with rarefaction and accumulation curves far from asymptote (Figures 5 and S1). Estimates at family level may be more robust given the lower likelihood of synonyms and misidentifications than for species. 59 The Chao1 total family estimate of 469 (±18 SE) falls short of the current total in the Checklist at 501, but Chao2 at 544 (±24 SE) exceeds it. Family-level diversity is expected to be higher than is currently recorded in the Checklist given evidence of extensive under-sampling and the observation that curves have not reached an asymptote (Figures 5, S1, and S2). The Chao2 (sample-based) estimates, which exceed the Checklist, appear more robust, possibly in part because Chao2 better accounts for data missed in surveys. However, few species records in the dataset represent whole-sample analyses (i.e., only select taxa are identified), which likely also contributes to underestimation of diversity in these estimates. 60,61 Data duplication can contribute to underestimates of diversity, as relative proportions of rare species, including singletons, will be affected. 15,62-66 Extensive record duplication is evident in the ISA database DeepData, estimated to be at least a quarter of the total. Although known duplicates were removed for the final analysis, further duplication is suspected but cannot be definitively identified owing to underlying limitations of the database. 67 When the known duplicates are included, species estimates are >1,000 lower.
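The direction of the duplication effect can be shown with a toy example: when a record is exported twice, a species seen once appears to have been seen twice, so singletons become doubletons and the Chao1 correction term f1²/(2f2) shrinks. The minimal Python sketch below assumes, hypothetically, one record per individual and a 'record_id' field as the deduplication key; neither reflects the actual DeepData schema.

```python
from collections import Counter
from typing import Dict, List, Sequence


def chao1_from_records(records: List[Dict]) -> float:
    """Chao1 assuming (hypothetically) that each record is one individual."""
    counts = Counter(r["species"] for r in records)
    n = sum(counts.values())
    f1 = sum(1 for c in counts.values() if c == 1)   # singletons
    f2 = sum(1 for c in counts.values() if c == 2)   # doubletons
    k = (n - 1) / n
    extra = k * f1 * f1 / (2 * f2) if f2 else k * f1 * (f1 - 1) / 2
    return len(counts) + extra


def deduplicate(records: List[Dict], key_fields: Sequence[str]) -> List[Dict]:
    """Keep the first record for each combination of key fields."""
    seen = set()
    unique = []
    for r in records:
        key = tuple(r[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


species = ["A", "B", "C", "D", "E", "E", "F", "F", "F"]
clean = [{"record_id": i, "species": s} for i, s in enumerate(species)]
exported_twice = clean + clean[:2]   # two singleton records duplicated

print(round(chao1_from_records(exported_twice), 2))                                # 6.61
print(round(chao1_from_records(deduplicate(exported_twice, ["record_id"])), 2))  # 13.11
```

Only exact duplicates sharing a key can be removed this way; duplicates that differ in their identifiers stay hidden, which is why residual duplication may still depress the estimates.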
Perhaps most importantly for these estimates, some regions and habitats of the CCZ have barely been sampled at all. For example, there are only six published studies of rocky seamounts and outcrops, which appear to host very different communities. 68-73 The CCZ, with abundant nodules and rocky outcrops, exhibits high habitat heterogeneity 16,74 compared with sedimented abyssal plains 75,76 (although a recent study suggests rocky outcrops may be more common than widely assumed 77). This unusual "mosaic" habitat of nodule and sediment at local scales supports relatively higher benthic biodiversity. 16,74,78,79 Overall, many regions of the CCZ are almost unsampled (Figures 1 and S4), and this data deficiency will contribute to underestimation of diversity for the region.
Estimates of species richness are subject to other biases, which can either inflate or reduce projections. Synonyms for unnamed species appear rare at 4%, but additional synonyms yet to be identified are inevitable, which would inflate the species estimate. Informal names can also proliferate over time as designations change. Misidentifications could either increase or reduce the diversity estimates, but in either case add to overall uncertainty. An unknown proportion of the named species in the CCZ Checklist will be misidentified, owing in part to the lack of regional field guides. Conversely, some of the unnamed species may be known species yet to be correctly identified. The lack of field guides can also contribute to range inflation of cosmopolitan species. 80,81 For the key macrofaunal groups in the CCZ Checklist (polychaetes, tanaids, and isopods), 23% of species have type localities outside the region, including other ocean basins (33/145). Although wide-ranging benthic species have been confirmed, 80,81 including in the CCZ, 26 some of these 23% may be undescribed cryptic species (or species complexes), which are particularly prevalent in the deep sea 9,10,82,83 and have previously been recorded from the CCZ. 11,12 Resolving these identifications requires genetic data from both the CCZ specimen and the type locality of the species it most closely resembles. Diversity based solely on morphological assessment can underestimate biodiversity by 20%-25%. 82,84 Although most of the new CCZ species have been described since the advent of DNA taxonomy methods (Figure 2A), 51% are described from morphology only. This figure rises to 86% for meiofauna, partly reflecting the challenges of molecular sub-sampling from small specimens. 85 Undetected cryptic speciation may be high in this size fraction in the CCZ, 33 but this may be quite taxon-specific. 86
The figure of 92% of species undescribed may overestimate the true proportion owing to synonyms, but may also underestimate it given known levels of cryptic species 11 and under-sampling in the CCZ (Figure S4). In the subset analysis of taxonomic studies, the potential for misidentification is greatly reduced, as groups are examined by their specialists (Table S1). This provides an additional line of evidence that ~90% of CCZ species are undescribed.
Where does CCZ biodiversity fit in a global context?
Species composition of the CCZ Checklist differs from WoRDSS, even at phylum level (Figure 4). Although some trends (such as the relatively high diversity of holothurians) may be real, they will be heavily influenced by taxonomic trends, size fractions assessed, sampling bias, and availability of specialists. The majority of species (named and unnamed) are macrofauna, reflecting numerous studies on this size class. Megafauna, comprising the largest and thereby least abundant species, 87 are rarely collected, compromising species-level identification. Aside from descriptions, there are only two synthetic taxonomic checklist studies with archived vouchers that cover multiple megafaunal taxa, 88,89 and three covering specific taxa. 90-92 This reflects the challenges of collecting larger animals, typically involving remotely operated vehicles (ROVs), which are expensive and require specialists to operate, or trawls, which are inherently destructive of animals. 93 Meiofauna, often regarded as the dominant component of deep-sea ecosystems, at least in terms of biomass if not diversity, 94,95 are also likely to have considerable undocumented species richness given significant sampling challenges. 96 Biases may also be present in WoRDSS (Figure 4) given chronic under-sampling in the deep sea 97,98 and taxon-specific factors, e.g., Nematoda being highly speciose but notoriously difficult to identify to species level. 36,96
There are few comparable estimates of biodiversity for other broad-scale regions of the deep sea. One study of the Southern Ocean deep sea reported 674 isopod species, of which a high proportion (87%) were new to science. 83 The proportion of undescribed CCZ isopods is higher, with an estimated 96% being new species 3,17 (23 named species, 474 unnamed). Total marine species richness estimates reviewed in Appeltans et al. 58 range from 300,000 99 to 10 million, 100 the latter regarded as a significant overestimate and the former a significant underestimate. 97,101 Our figure of 92% is similar to the proportion of undescribed marine species obtained by comparing the number of currently known marine species in WoRMS (241,129) 45 with the global estimate of Mora et al., 57 at 89%. The current CCZ Checklist represents just 1% of currently recorded deep-sea species in WoRDSS (36,579). 44 Including unnamed species, this would rise to 15%, or to 17%-24% using the species estimators.
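The percentages in the preceding paragraph follow from simple ratios of the cited totals. The Python sketch below reproduces them; the one input not stated in this paper, a global total of roughly 2.2 million marine species from Mora et al., is an assumption taken from that study's headline figure.

```python
NAMED_CCZ = 436            # named species in the CCZ Checklist
RECORDED_CCZ = 5_578       # named + unnamed species recorded from the CCZ
WORMS_MARINE = 241_129     # known marine species in WoRMS, as cited
WORDSS_DEEP = 36_579       # named deep-sea species in WoRDSS, as cited
# Assumption: Mora et al. (2011) estimated roughly 2.2 million marine species.
MORA_MARINE = 2_200_000


def pct(part: float, whole: float) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


undescribed_ccz = 100 - pct(NAMED_CCZ, RECORDED_CCZ)
undescribed_global = 100 - pct(WORMS_MARINE, MORA_MARINE)
print(f"CCZ undescribed: {undescribed_ccz:.1f}%")                           # 92.2%
print(f"Global undescribed: {undescribed_global:.1f}%")                     # 89.0%
print(f"Checklist share of WoRDSS: {pct(NAMED_CCZ, WORDSS_DEEP):.1f}%")     # 1.2%
print(f"All recorded CCZ species: {pct(RECORDED_CCZ, WORDSS_DEEP):.1f}%")   # 15.2%
for name, estimate in [("Chao1", 6233), ("Chao2", 7620)]:
    print(f"{name}: {pct(estimate, WORDSS_DEEP):.1f}%")                     # 17.0%, then 20.8%
```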
Clearly the CCZ represents significant undescribed biodiversity. With 31 new genera and three new families (and several additional new genera and at least one additional new family known to the authors), the Checklist illustrates the novelty of the region at deep taxonomic levels. Evolutionary novelty has been previously recorded in the CCZ for echinoderms, 12 but it is noteworthy that this extends across further taxa. A diversity of life-history strategies is beginning to be recorded in the CCZ, 102 as elsewhere in the deep sea, such as association with sponge stalks. 89,103 Characteristic sediment-dwelling infauna such as nematodes, isopods, and polychaetes are now being found living in and on nodules, illustrating the interconnectivity of nodule- and sediment-dwelling lifestyles. 27,104,105 Beyond nodule dwellers, many suspension-feeding forms depend on nodules. Spatial ecology studies report 60%-80% of the megafauna (largely dominated by suspension feeders in the CCZ) to be found growing attached to nodules. 78,79 Pertinent questions remain on the relative vulnerability of nodule and sediment fauna to mining impacts. 106 Remarkably little is known of the life-history traits of these species, and answering these questions is an immense challenge in a region where most species are rare and a third appear to have been found only once.

Conclusions
The proportion of undescribed species in the CCZ has been reported as being over 80% within taxa. 11,17,22 Our study provides the first quantitative support for that figure across multiple taxonomic groups, with two estimates (88% and 92%) clearly illustrating the remaining taxonomic impediment to an understanding of CCZ biodiversity. Addressing the "lost decades" of CCZ taxonomy will require extensive collaboration between stakeholders, supported by regulatory bodies/governments and appropriate and sustained funding. 17,67,85,107,108 Programs such as the new ISA Sustainable Seabed Knowledge Initiative (SsKI), 109 recognized under the UN Ocean Decade, should be leveraged to fund descriptions in all taxonomic groups. As the new species will take years to be formally described, a robust approach to open nomenclature in the medium term is also important to ensure that species-level taxa can be referenced and that datasets are comparable and linked to open data and specimen vouchers. 5,85,90,91 The CCZ Checklist is a key step forward in an iterative process towards field guides for the region, which will dramatically improve identifications and reduce uncertainty. Our study provides the first regional estimates of species diversity for all size classes. Although uncertainty is high, these estimates provide a starting point to be developed as additional data and approaches become available. Developments in statistical methods for estimating species richness will be critical to future assessments of diversity in such poorly sampled environments. 110,111 Given that mining operations may be imminent, a key consideration for the CCZ is the application of biodiversity data to environmental management, in particular to assessing species extinction risk. Extinction risk is often assumed to be lower in marine environments, but this appears largely to be an artefact of lower taxonomic knowledge compared with terrestrial ecosystems. 112
The UNCLOS states that "no serious harm" can occur from any mining activities and that necessary measures must be taken to protect the environment from any harmful effects. Although sometimes equated with no loss of biodiversity, the definition of the term "serious harm" (and that of "lower environmental thresholds") remains to be clarified. 24,113 Accurately quantifying species ranges and rarity, key components of extinction risk, requires a comprehensive approach to taxonomy, 114 extensive molecular studies, 115 and standardized quantitative methods 20 enabling regional analyses. This is particularly important given that the CCZ remains one of the few remaining areas of the global ocean with high wilderness intactness. 116 Sound data and understanding are essential to shed light on this unique region and secure its future protection from human impacts.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

ACKNOWLEDGMENTS
Full funding for this study was provided by The Pew Charitable Trusts (Contract ID 34394). This work was possible through collaboration of The Pew Charitable Trusts, the Natural History Museum London, and the International Seabed Authority, the first formal partnership of these organizations. We are very grateful to Andrew Friedman, Chris Pickens, and Peter Edwards of The Pew Charitable Trusts for their support and assistance throughout the project. We would also like to thank Luciana Genio, Sheldon Carter, Tamique Lewis, and Ansel Cadien of the ISA Secretariat for their cooperation and assistance.
The separate 'Point' and 'Trawl Line' data downloads were combined into the same dataset. Data and column headings varied between the two datasets, e.g., 'actual latitude' in the 'Point' data, and 'start latitude' and 'end latitude' in the 'Trawl Line' data. Data were harmonized; for example, for coordinates and depth, the end point was used, and additional columns were added to the 'Point' data to allow the datasets to be combined. Initial data exploration found that the database export did not contain a record identifier. To examine the data, it was first necessary to establish a unique key or record identifier for every individual record (or row of data) in the dataset. A composite key was therefore created by combining the following DeepData identifier fields: 'ContractorID' + 'StationID' + 'SampleID'. The composite key was checked for duplicates, and none were found. Data columns were checked and edited where necessary; for example, missing depth values were recorded as -9, and these were replaced with 'NA'. Where possible this was scripted in R, but where multiple entries for character variables were present, this was done in Microsoft Excel 365. Any data point needing cleaning or editing was copied so that the original data column and the processed data column were in the same dataset, with the latter renamed with the suffix '_ed'.
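The key-building and sentinel-replacement steps above were carried out in R and Excel; the following Python sketch only illustrates the same logic. The three identifier fields come from the text, while the 'Depth' column name, the '|' separator, and the use of None to stand for NA are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional

KEY_FIELDS = ("ContractorID", "StationID", "SampleID")   # fields named in the text
MISSING_DEPTH = -9.0                                      # sentinel used in the export


def add_composite_key(rows: List[Dict], sep: str = "|") -> List[str]:
    """Add a 'record_key' to each row and return any keys that are not unique."""
    for row in rows:
        # A separator avoids collisions such as "1"+"23" versus "12"+"3".
        row["record_key"] = sep.join(str(row[f]) for f in KEY_FIELDS)
    counts = Counter(row["record_key"] for row in rows)
    return [key for key, n in counts.items() if n > 1]


def clean_depth(rows: List[Dict], column: str = "Depth") -> None:
    """Write a cleaned copy to '<column>_ed', leaving the original column untouched."""
    for row in rows:
        raw = row.get(column)
        value: Optional[float]
        try:
            value = float(raw)
        except (TypeError, ValueError):
            value = None                  # blank or non-numeric -> NA
        if value == MISSING_DEPTH:
            value = None                  # -9 sentinel -> NA
        row[column + "_ed"] = value


rows = [
    {"ContractorID": "C01", "StationID": "ST1", "SampleID": "BC01", "Depth": "-9"},
    {"ContractorID": "C01", "StationID": "ST1", "SampleID": "BC02", "Depth": "4100"},
]
print(add_composite_key(rows))            # []
clean_depth(rows)
print([r["Depth_ed"] for r in rows])      # [None, 4100.0]
```

Keeping the edited value in a separate '_ed' column, as the text describes, means every change can be audited against the original export.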
Initial examination of taxonomic information found variable recording of data. Taxonomic information was cleaned with the 'taxonmatch' tool in WoRMS, a QA/QC function on the web portal with which scientific names can be validated against the database. As above, data columns were copied and edits made on the copied column, with spelling and formatting mistakes removed. Taxonomy was mapped to the correct column; e.g., class names in the order column were moved to the class column. No column was present for the scientific name, i.e., the actual identification of the specimen referenced in a given record, so such a column was added and populated with the lowest taxonomic level reported (i.e., the species name if recorded, rather than the genus name only). If a name was noted with a question mark, recorded with a qualifier indicating uncertainty in identification (e.g., Incerta), or written as two names, then the next highest taxonomic level was recorded; e.g., if two family names were recorded, the order name was recorded instead. For informal names or open nomenclature designations, the scientific name was also recorded, mapped to the lowest taxonomic level recorded above species level. For example, if an informal species name such as Paralicella cf. caperesca no 5 was present, the genus name was recorded as the scientific name. This resulted in a final dataset of 40,518 records for DeepData (https://github.com/howlerMoonkey/CCZ_BIODIVERSITY/blob/main/Data-fin/Data_S6_DeepData.csv).
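The fallback rule for the scientific-name column can be expressed as a walk up the ranks until a reliable name is found. The Python sketch below is illustrative only: the rank column names, the regular expression used to flag open-nomenclature species names, and the treatment of "/" or " or " as "two names" are all assumptions rather than the study's exact criteria.

```python
import re
from typing import Dict, Optional

RANKS_LOW_TO_HIGH = ("species", "genus", "family", "order", "class", "phylum")
# Assumed markers of informal/open-nomenclature species names (qualifiers or numbers).
OPEN_NOMENCLATURE = re.compile(r"\b(?:cf|aff|sp|spp|nov|indet)\b|\d", re.IGNORECASE)


def is_uncertain(name: str) -> bool:
    """Question mark, an 'incerta'-type qualifier, or two names (assumed '/' or ' or ')."""
    lowered = name.lower()
    return "?" in name or "incerta" in lowered or "/" in name or " or " in lowered


def scientific_name(record: Dict[str, str]) -> Optional[str]:
    """Lowest reliably reported rank, falling back to the next rank up."""
    for rank in RANKS_LOW_TO_HIGH:
        value = (record.get(rank) or "").strip()
        if not value or is_uncertain(value):
            continue
        if rank == "species" and OPEN_NOMENCLATURE.search(value):
            continue                        # informal name -> use genus or higher
        return value
    return None


print(scientific_name({"species": "Paralicella cf. caperesca no 5", "genus": "Paralicella"}))
print(scientific_name({"family": "Nereididae/Syllidae", "order": "Phyllodocida"}))
print(scientific_name({"species": "Psychropotes dyscrita", "genus": "Psychropotes"}))
```

The three calls print Paralicella, Phyllodocida, and Psychropotes dyscrita, mirroring the informal-name, two-name, and formal-name cases described in the text.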
For contextual spatial data, shapefiles for all mining exploration contract areas, both active and reserved, and for Areas of Particular Environmental Interest (APEIs) were downloaded from the ISA website (https://www.isa.org.jm/exploration-contracts/maps/) and combined into one shapefile in QGIS version 3.10, Coruña (QGIS.org, 2020). Bathymetric data were sourced from GEBCO (General Bathymetric Chart of the Oceans; https://www.gebco.net/). A search area was created covering the entire CCZ region. A polygon covering the CCZ, including the combined CCZ shapefile, was defined by the following corner coordinates (longitude, latitude in decimal degrees): northwest -164.01462, 15.70629; southwest -155.04998, -5.51238; southeast -101.9181, 6.05623; northeast -117.66088, 23.72549 (see R script, Data S5).
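The polygon search itself was done with R and the GBIF/OBIS tools. As an illustration only, the Python sketch below tests whether a record's coordinates fall inside the four-corner search polygon using the standard even-odd (ray-casting) rule. It treats longitude and latitude as planar coordinates, which is adequate for a polygon that does not cross the antimeridian; reading the southwest latitude as -5.51238 is an assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees

# Corners listed in the text; the sign of the southwest latitude is an assumption.
CCZ_SEARCH_POLYGON: List[Point] = [
    (-164.01462, 15.70629),   # northwest
    (-155.04998, -5.51238),   # southwest
    (-101.9181, 6.05623),     # southeast
    (-117.66088, 23.72549),   # northeast
]


def point_in_polygon(lon: float, lat: float, polygon: List[Point]) -> bool:
    """Even-odd ray casting on planar lon/lat (adequate away from the antimeridian)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):                  # edge straddles the eastward ray
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


print(point_in_polygon(-130.0, 12.0, CCZ_SEARCH_POLYGON))   # True
print(point_in_polygon(-157.8, 21.3, CCZ_SEARCH_POLYGON))   # False (near Hawaii)
```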
Data were collected from the Ocean Biodiversity Information System (OBIS) and the Global Biodiversity Information Facility (GBIF). OBIS occurrence data were downloaded as a Darwin Core file, also on the 12th of July, 2021, using the 'occurrence' function in the robis package, 119 with the CCZ polygon as delineated above, for all depths. DeepData records have been harvested by OBIS since June 2021 and published on the OBIS ISA node. 120 These records were analyzed separately in the parallel study by Rabone et al. 67 to examine ISA data mapping procedures. To avoid duplication of DeepData records across the databases, they were not included in the dataset for analysis (they were identified as records tagged as owned by the ISA in the Darwin Core 'accessRights' field). GBIF occurrence data were also downloaded from the web portal on the 12th of July, 2021, from all depths, using the polygon search function with the CCZ polygon coordinates.
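Excluding the ISA-owned records is a filter on the Darwin Core 'accessRights' field. The minimal Python sketch below shows that split; the matching rule and the example field values are hypothetical, since the exact wording OBIS uses in 'accessRights' is not given here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical matching rule: the exact 'accessRights' wording is not specified in the text.
ISA_RIGHTS = re.compile(r"International Seabed Authority|\bISA\b")


def split_isa_records(records: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate records whose Darwin Core 'accessRights' names the ISA."""
    kept, excluded = [], []
    for record in records:
        rights = record.get("accessRights") or ""
        (excluded if ISA_RIGHTS.search(rights) else kept).append(record)
    return kept, excluded


records = [
    {"occurrenceID": "a1", "accessRights": "Data owned by the International Seabed Authority"},
    {"occurrenceID": "b2", "accessRights": "CC-BY"},
    {"occurrenceID": "c3"},   # no accessRights value at all
]
kept, excluded = split_isa_records(records)
print([r["occurrenceID"] for r in kept], [r["occurrenceID"] for r in excluded])  # ['b2', 'c3'] ['a1']
```

Returning the excluded records rather than discarding them makes it easy to check how many DeepData records the rule removed.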
All records from GBIF and OBIS were mapped together with the CCZ shapefile using the following R packages: 'GADMTools', 121 'sp', 122 'spatialEco', 123 'maptools', 124 'rgdal', 125 and 'rgeos'. 126 All dataset records were sub-selected by depth, with depths of 3,000 m and greater included. Some records without depth values were present; those falling within or near the CCZ shapefile were reviewed and included if valid, for example, if a benthic taxon was associated with a publication and a benthic collection method (e.g., a box core sample) and/or a relevant reference in the 'datasetName' or 'associatedReferences' column. As an additional check to ensure that all relevant benthic records were selected and pelagic records removed, the scientific names recorded were cross-referenced to habitat information recorded in WoRMS (the World Register of Marine Species). 45 Following record selection by depth, datasets were remapped. The data selection by depth resulted in a significant reduction in records, with all records at depth falling within or close to contract areas/APEIs. Records falling outside the CCZ shapefile were reviewed to check that all relevant records were captured. In the final data selection, all non-metazoan and fossil records were excluded from datasets. This resulted in a final dataset of 2,185 records for OBIS (https://github.com/howlerMoonkey/CCZ_BIODIVERSITY/blob/main/Data-fin/Data_S7_OBIS.csv) and 2,405 records for GBIF (https://github.com/howlerMoonkey/CCZ_BIODIVERSITY/blob/main/Data-fin/Data_S8_GBIF.csv).
[...] records by depth was computed in R to visualize sampling effort by depth, by subdividing data into 10 sample quantiles (Figure S5). Total sample records by contract area/APEI and depth were also plotted to visualize differences by contract area, given the known depth gradient in the CCZ (Figure S6).
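The depth selection and the decile breakdown can be sketched in a few lines of Python. This is an illustration under assumptions, not the R code used: the record layout with a 'depth' key is hypothetical, records lacking depth are set aside for the manual review described above rather than judged automatically, and statistics.quantiles with method="inclusive" (linear interpolation between order statistics) is one reasonable choice of quantile definition.

```python
import statistics
from typing import Dict, List, Optional, Tuple

MIN_DEPTH_M = 3000.0


def select_by_depth(records: List[Dict], min_depth: float = MIN_DEPTH_M) -> Tuple[List[Dict], List[Dict]]:
    """Keep records at >= min_depth; set aside records with no depth for manual review."""
    keep, review = [], []
    for record in records:
        depth: Optional[float] = record.get("depth")
        if depth is None:
            review.append(record)      # checked by hand (gear, references, WoRMS habitat)
        elif depth >= min_depth:
            keep.append(record)
    return keep, review


def depth_deciles(depths: List[float]) -> List[float]:
    """Nine cut points splitting the depths into 10 groups (linear interpolation)."""
    return statistics.quantiles(depths, n=10, method="inclusive")


records = [{"depth": 2500.0}, {"depth": 4200.0}, {"depth": None}, {"depth": 3000.0}]
keep, review = select_by_depth(records)
print(len(keep), len(review))                                    # 2 1
print(depth_deciles([4000.0 + 100 * i for i in range(10)])[4])  # 4450.0
```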
Comparison of CCZ Checklist with global checklists
To provide a degree of global context to CCZ biodiversity as currently recorded, the proportion of named to unnamed CCZ species was compared with the proportion of known to estimated global marine species, using published estimates of global marine species diversity and the current total of known marine (eukaryotic) species in WoRMS (241,129). 44 Relevant literature was searched to identify estimates and any assessments of their accuracy. No published global estimates of deep-sea species richness were identified in the search; therefore, global marine species richness estimates were examined. Estimates from Mora et al. 57 and Appeltans et al. 58 were primarily used, on the basis of the analysis by Poore et al. 101 To examine the taxonomic composition of the Checklist in relation to global datasets, data were requested from WoRMS, and a database copy of WoRDSS 44 dated the 1st of January, 2023 was provided and archived on GitHub (https://github.com/howlerMoonkey/CCZ_BIODIVERSITY/blob/main/Data-fin/Data_S11_WoRDSS.csv). Non-metazoans were removed from the dataset, and relative proportions of species by phylum were calculated and plotted to compare the CCZ with all deep-sea metazoan species recorded to date.
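The phylum comparison reduces to counting distinct metazoan species per phylum in each list and differencing the shares. The Python sketch below assumes a hypothetical (kingdom, phylum, species) row layout and uses invented toy rows; it is not the study's R workflow.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def phylum_proportions(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Share of distinct metazoan species per phylum.

    rows: (kingdom, phylum, species) tuples; a hypothetical layout.
    """
    # Non-metazoans are dropped; the set removes repeated rows for the same species.
    species = {(phylum, name) for kingdom, phylum, name in rows if kingdom == "Animalia"}
    counts = Counter(phylum for phylum, _ in species)
    total = sum(counts.values())
    return {phylum: n / total for phylum, n in counts.items()}


def compare(ccz: Dict[str, float], global_: Dict[str, float]) -> Dict[str, float]:
    """Percentage-point difference (CCZ minus global) for every phylum in either set."""
    phyla = set(ccz) | set(global_)
    return {p: 100 * (ccz.get(p, 0.0) - global_.get(p, 0.0)) for p in sorted(phyla)}


# Toy rows, invented for illustration.
ccz = phylum_proportions([
    ("Animalia", "Annelida", "sp1"),
    ("Animalia", "Annelida", "sp2"),
    ("Animalia", "Arthropoda", "sp3"),
    ("Chromista", "Foraminifera", "sp4"),   # removed as non-metazoan
])
deep = phylum_proportions([
    ("Animalia", "Annelida", "x1"),
    ("Animalia", "Arthropoda", "x2"),
    ("Animalia", "Arthropoda", "x3"),
    ("Animalia", "Mollusca", "x4"),
])
print({p: round(d, 1) for p, d in compare(ccz, deep).items()})
```

Reporting differences in percentage points keeps phyla absent from one list (such as Mollusca in the toy CCZ rows) visible in the comparison.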

ADDITIONAL RESOURCES
The CCZ Checklist created in this study is published as a webpage, available via the World Register of Deep-Sea Species (WoRDSS), 44 a subregister of the World Register of Marine Species (WoRMS), 45 at https://www.marinespecies.org/deepsea/ccz_checklist.php.