Circumpolar dataset of sequenced specimens of Promachocrinus kerguelensis (Echinodermata, Crinoidea)

Abstract This circumpolar dataset of the comatulid (Echinodermata: Crinoidea) Promachocrinus kerguelensis (Carpenter, 1888) from the Southern Ocean documents biodiversity associated with the specimens sequenced in Hemery et al. (2012). The aim of Hemery et al. (2012) was to use phylogeographic and phylogenetic tools to assess the genetic diversity, demographic history and evolutionary relationships of this very common and abundant comatulid, in the context of the glacial history of the Antarctic and Sub-Antarctic shelves (Thatje et al. 2005, 2008). The 1307 specimens used in that study were collected during seventeen cruises from 1996 to 2010, in eight regions of the Southern Ocean: Kerguelen Plateau, Davis Sea, Dumont d’Urville Sea, Ross Sea, Amundsen Sea, West Antarctic Peninsula, East Weddell Sea and Scotia Arc, including the tip of the Antarctic Peninsula and the Bransfield Strait. We give here the metadata of this dataset, which lists sampling sources (cruise ID, ship name, sampling date, sampling gear), sampling sites (station, geographic coordinates, depth) and genetic data (phylogroup, haplotype, sequence ID) for each of the 1307 specimens. The identification of the specimens was checked by an expert crinoid taxonomist (Marc Eléaume, Muséum national d’Histoire naturelle, Paris) and all the COI sequences were matched against those available on the Barcode of Life Data System (BOLD: http://www.boldsystems.org/index.php/IDS_OpenIdEngine). This dataset can be used in studies dealing with, among other topics, Antarctic and/or crinoid diversity (species richness, distribution patterns), biogeography or habitat / ecological niche modeling. This dataset is accessible through the GBIF network at http://ipt.biodiversity.aq/resource.do?r=proke.

Study area descriptions/descriptor: The 1307 specimens in this dataset were collected from the Southern Ocean, south of the Sub-Antarctic Front (SAF): Kerguelen Plateau (Kerguelen and Heard islands), Davis Sea, Dumont d'Urville Sea, Ross Sea, Amundsen Sea, West Antarctic Peninsula, East Weddell Sea and Scotia Arc (from the tip of the Antarctic Peninsula and the Bransfield Strait to South Georgia). The sampling depths ranged from 65 m to 1162 m.
Design description: This dataset was gathered to conduct a circumpolar phylogeographic study of the crinoid species Promachocrinus kerguelensis and was designed to extend the spatial coverage of the sampling of Wilson et al. (2007), which was limited to the Atlantic sector of the Southern Ocean. The aim of the study was to test the circumpolarity of the genetic lineages of Wilson et al. (2007), and to test whether these lineages were an under-sampling artifact of a large and genetically diverse metapopulation or whether they were truly representative of the Southern Ocean. The authors used a sampling strategy designed to cover the broadest possible genetic variation and to explore the evolutionary relationships among the seven lineages, in order to be able to conduct population analyses (Meyer and Paulay 2005). They also wanted to understand the distributional limits of each phylogroup in Promachocrinus kerguelensis, to assess the connectivity displayed throughout their range, and to test the "multiple refugia" theory by studying the demographic history of each phylogroup. For this purpose, more than two thousand specimens were provided by several taxonomists and benthologists from different institutions. These specimens had been sampled during the most recent Antarctic cruises focused on benthic biodiversity, and had been fixed and preserved in a way that allows DNA extraction and amplification (fixed in ethanol or frozen). Specimens were identified during the sampling cruises to the highest level allowed by the taxonomic skills of the collectors, and the identifications were then checked, principally at the Muséum national d'Histoire naturelle, Paris, by taxonomists trained to deal with Antarctic crinoids. The cytochrome c oxidase subunit I (COI) gene was successfully sequenced for 1307 of these specimens. Both the collection data and the sequences produced were digitized in appropriate databases, used or ready to be used for publication (Figure 1).
Data published through GBIF: the dataset is available as an Excel spreadsheet at http://ipt.biodiversity.aq/resource.do?r=proke and in the Darwin Core Archive format at http://ipt.biodiversity.aq/archive.do?r=proke.
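For readers who want to work with the Darwin Core Archive version, the following is a minimal Python sketch of how such an archive could be read. It rests on assumptions that are not stated in this paper: that the core file inside the zip archive is named occurrence.txt and is tab-delimited with a header row. The authoritative file name, delimiter and column mapping are declared in the archive's own meta.xml and should be checked there. The demo archive and its two rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
import zipfile


def read_dwca_core(archive, core_name="occurrence.txt", delimiter="\t"):
    """Return the rows of a Darwin Core Archive core file as dictionaries.

    `archive` is a path or a binary file-like object for the zip archive.
    `core_name` and `delimiter` are assumptions; the authoritative values
    are declared in the archive's meta.xml.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(core_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # Materialize the rows before the file handles are closed.
            return list(csv.DictReader(text, delimiter=delimiter))


def build_demo_archive():
    """Build a tiny in-memory archive with made-up rows (illustration only)."""
    buffer = io.BytesIO()
    rows = (
        "catalogNumber\tdecimalLatitude\tdecimalLongitude\n"
        "DEMO-1\t-66.5\t140.0\n"
        "DEMO-2\t-49.3\t70.2\n"
    )
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("occurrence.txt", rows)
    buffer.seek(0)
    return buffer


demo_rows = read_dwca_core(build_demo_archive())
print(len(demo_rows), demo_rows[0]["catalogNumber"])
```

With a downloaded copy of the real archive, the same function could be called with the local file path in place of the demo buffer. All values are returned as strings, so coordinates and depths would need to be converted to numbers before analysis.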

Taxonomic coverage
General taxonomic coverage description: This dataset focuses on the Antarctic comatulid species Promachocrinus kerguelensis (Carpenter, 1888), the most abundant and morphologically variable comatulid species in the Southern Ocean (Speel and Dearborn 1983). It corresponds to the 1307 specimens sequenced in Hemery et al. (2012).

General spatial coverage
The specimens of Promachocrinus kerguelensis gathered in this dataset were collected from most of the strategic regions in the Southern Ocean (triangles in Figure 2): the Antarctic continental shelf (East Weddell Sea, Davis Sea, Dumont d'Urville Sea, Ross Sea, Amundsen Sea, West Antarctic Peninsula), the Scotia Arc islands (South Shetland, South Orkney and South Sandwich) and the Sub-Antarctic islands (South Georgia, Kerguelen and Heard). Specimens were sampled at depths ranging from 65 m to 1162 m. This covers most of the known distribution area of the species (black circles in Figure 2), but only a portion of its known bathymetric range, which extends from 10 m to 2100 m (Speel and Dearborn 1983).
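To put a number on "only a portion" of the bathymetric range, the short Python sketch below compares the sampled depth interval (65 m to 1162 m) with the known interval (10 m to 2100 m) quoted above. The function name is hypothetical, and the calculation only compares interval lengths: it says nothing about how evenly the specimens are spread within the sampled interval.

```python
def depth_overlap_fraction(sampled, known):
    """Fraction of the `known` depth interval covered by the `sampled` one.

    Both arguments are (shallowest, deepest) tuples in meters.
    """
    overlap_top = max(sampled[0], known[0])
    overlap_bottom = min(sampled[1], known[1])
    overlap = max(0, overlap_bottom - overlap_top)
    return overlap / (known[1] - known[0])


sampled_range_m = (65, 1162)  # depths sampled in this dataset
known_range_m = (10, 2100)    # known range (Speel and Dearborn 1983)

fraction = depth_overlap_fraction(sampled_range_m, known_range_m)
print(f"{fraction:.1%}")
```

The sampled interval spans 1097 m of the 2090 m known range, that is, roughly half of it.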

General temporal coverage
The specimens were collected during one to four different cruises per sampling region, for a total of 17 cruises from 1996 to 2010 (Figure 3). However, the number of specimens varied too much among cruises to allow statistical comparison (see the specimen counts in the Methods section).
Sampling description: The specimens were sampled using several types of gear, depending on the cruise: Agassiz trawls, beam trawls, bottom trawls, box corers and epibenthic sledges. During each cruise, specimens were sorted onboard and then either fixed and preserved in 70-95% ethanol, or first frozen and subsequently preserved in ethanol. Once back from the field, each institution curated its specimens and digitized them in its own database, before the authors gathered the specimens for the purpose of the molecular study. Metadata associated with each specimen were extracted from the cruise reports. The molecular data (barcoding) were generated following the protocols described in Ivanova et al. (2006) and Eléaume et al. (2011).
Quality control description: The initial geo-referencing was done by means of the vessels' onboard GPS systems. Sample identification was supervised and checked by Marc Eléaume, crinoid taxonomist at the Muséum national d'Histoire naturelle, Paris, following the taxonomic description of the species by Clark and Clark (1967), and matched to the World Register of Marine Species (WoRMS). The barcoding was done by Lenaïg G. Hemery at the Muséum national d'Histoire naturelle, Paris, by the Canadian Center for DNA Barcoding, Toronto, and by the Scripps Institution of Oceanography, San Diego, and the sequences were matched to those already available on the Barcode of Life Data System (BOLD: http://www.boldsystems.org/index.php/IDS_OpenIdEngine). All sequences, specimen occurrences and identifications are linked together through unique numbers in BOLD under the public project name PROKE.

Datasets
Dataset description: This dataset was generated for a molecular study of the Antarctic comatulid species Promachocrinus kerguelensis, improving on the geographic coverage of the previous study by Wilson et al. (2007). Every specimen is identified by several types of numbers that are linked together: Sample ID (unique to each individual), BOLD ID, GenBank ID and SeqID (each unique to a sequence, in three different databases), Field Number (when available) and Museum ID. In some cases, the last two identifiers are shared by several individuals, which can then be told apart by their own Sample ID. The dataset also includes the name of the institution storing the specimens, the complete taxonomy, the names of identifiers and collectors, and information on the sampling itself: cruise names, vessel names, sampling gears, dates, regions, sectors, exact sites (when available), station numbers, latitudes and longitudes in decimal degrees, and depths in meters. This dataset is suitable for studies dealing with, for example, Antarctic and/or crinoid diversity (species richness, distribution patterns), biogeography or habitat / ecological niche modeling.
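The linkage between identifiers described above can be illustrated with a small Python sketch. The class, its field names and the example values are all hypothetical: they mirror the identifier types listed in this section, not the actual column names of the published spreadsheet. The sketch shows how individuals that share a Museum ID can still be separated by their Sample ID.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpecimenRecord:
    """One sequenced individual; field names are illustrative assumptions."""
    sample_id: str                      # unique to each individual
    bold_id: str                        # sequence identifier in BOLD
    genbank_id: str                     # sequence identifier in GenBank
    seq_id: str                         # sequence identifier used in the study
    museum_id: str                      # may be shared by several individuals
    field_number: Optional[str] = None  # only when available


def group_by_museum_id(records):
    """Map each Museum ID to the Sample IDs of the individuals it covers."""
    groups = defaultdict(list)
    for record in records:
        groups[record.museum_id].append(record.sample_id)
    return dict(groups)


# Made-up records: two individuals stored in the same museum lot, one alone.
records = [
    SpecimenRecord("S1", "B1", "G1", "Q1", "MUS-A", "F-10"),
    SpecimenRecord("S2", "B2", "G2", "Q2", "MUS-A", "F-10"),
    SpecimenRecord("S3", "B3", "G3", "Q3", "MUS-B"),
]

shared = {
    museum_id: sample_ids
    for museum_id, sample_ids in group_by_museum_id(records).items()
    if len(sample_ids) > 1
}
print(shared)
```

Keying any per-individual analysis on the Sample ID, and not on the Museum ID or Field Number, avoids merging distinct individuals that were stored or collected together.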