Data on elemental concentrations in marine sediments from the South and South West of England

This Data in Brief methodological paper details the acquisition, mining and pre-processing of elemental concentration data in marine sediments (coastal and open sea) of Southern England, presented and discussed in the co-submitted Environment International paper entitled: “Three decades of trace element sediment contamination: the mining of governmental databases and the need to address hidden sources for clean and healthy seas” [1]. Elemental sediment concentration data were obtained from the two main UK environmental sources, i.e. the Environment Agency (EA) and the Marine Environment Monitoring and Assessment National (MERMAN) database managed by the British Oceanographic Data Centre (BODC). The merged database is the result of a rigorous data selection-validation process and provides spatially and temporally extensive records of arsenic (As), cadmium (Cd), chromium (Cr), copper (Cu), iron (Fe), mercury (Hg), nickel (Ni), lead (Pb) and zinc (Zn) concentrations for hundreds of sites over 31 years (1983–2013). Additional records of manganese (Mn), aluminium (Al), lithium (Li), tin (Sn) [and tributyltin (TBT)], barium (Ba), antimony (Sb), boron (B), calcium (Ca), molybdenum (Mo), cobalt (Co), selenium (Se), potassium (K), magnesium (Mg), beryllium (Be), vanadium (V), titanium (Ti), sodium (Na), silver (Ag), thallium (Tl) and strontium (Sr) are also included. The full secondary database is hosted in the Mendeley Data repository, and the geo-spatial information needed to map sites is given in supplementary files to the paper. To provide end-users with the relevant context on spatial and temporal coverage, monitoring statistics are given for the nine trace elements (TEs). Site-specific statistics include the first and last year of sediment monitoring, the number of years monitored, and the minimum, maximum, mean and median numbers of years monitored. Also given are summary data on the number of sites monitored each year, from the first records in 1983 to 2013. For the nine TEs (total and strong acid digestion techniques are considered separately for Cr and Fe), monitoring statistics are presented separately for coastal and open sea sites. The data are relevant to diverse end-users who need to assess local and regional contaminant loads and to contextualize anthropogenic threats to benthic systems in multiple locations across the French/English Channel, the southern North Sea and the Celtic Sea.

Specifications Table
Subject: Environmental Science - Pollution
Specific subject area: Analysis of elemental concentration data from public repositories to assess the contamination in marine sediments.
Type of data: Figure; Table; Database
How data were acquired: Data on sediment elemental concentrations were acquired from the Environment Agency and the British Oceanographic Data Centre.
Data format: Analyzed; Filtered; Secondary data
Parameters for data collection: Only data provided by the two UK key public repositories identified were included in the analysis and processing stages.

Description of data collection
Coastal and open sea monitoring data recording elemental concentrations in sediment samples were requested from two key UK public repositories: a) the Environment Agency (EA) and b) the Marine Environment Monitoring and Assessment National (MERMAN) database managed by the British Oceanographic Data Centre (BODC). Sediment elemental concentration files were requested for all UK marine waters (BODC, .xls(x) extension files) and for the two regions South England and South-West England (EA, .mdb extension files).

Value of the Data
• The database provides spatially and temporally extensive records of arsenic (As), cadmium (Cd), chromium (Cr), copper (Cu), iron (Fe), mercury (Hg), nickel (Ni), lead (Pb) and zinc (Zn) concentrations in sediments from coastal and open sea sites for 320 UK sites over 31 years.
• Additional records of manganese (Mn), aluminium (Al), lithium (Li), tin (Sn) [and tributyltin (TBT)], barium (Ba), antimony (Sb), boron (B), calcium (Ca), molybdenum (Mo), cobalt (Co), selenium (Se), potassium (K), magnesium (Mg), beryllium (Be), vanadium (V), titanium (Ti), sodium (Na), silver (Ag), thallium (Tl) and strontium (Sr) represent approximately 13% of the full database (334 UK sites when considering the twenty-nine chemicals).
• The database can be used to assess the contaminant load for specific sites, but also to strengthen and target current and future legislative control measures for anthropogenic contaminant inputs.
• Sediment contamination assessment is necessary to understand potential anthropogenic threats and, subsequently, to manage contaminant impacts upon benthic habitats and trophic bioaccumulation at local, regional and national levels.
• Information published in this paper is relevant to marine ecotoxicologists, coastal ecologists (practitioners, scientists and policy makers) and government decision makers.

Data Description
The secondary data linked to this article and hosted in the Mendeley Data repository (http://dx.doi.org/10.17632/m68k63nnk3.1) provide a summary of >45,000 contaminant concentration data points for twenty-nine marine sediment chemicals from 334 Southern England (UK) sites (Fig. 1; sites within the English/French Channel and the southern North and Celtic Seas), covering a survey period of 31 years (1983–2013). The geo-spatial information contained in geographic data files (.kml extension files, Supplementary Materials) enables end-users to visualize sites directly on Google Earth (Google LLC) and to select sites of interest from their geolocation. The data were obtained from two UK key public repositories, the Environment Agency (EA) and the Marine Environment Monitoring and Assessment National (MERMAN) database managed by the British Oceanographic Data Centre (BODC), and were subjected to a rigorous selection-validation process. That process is fully described in the Experimental Design, Materials and Methods section of this paper.
The secondary data (.csv extension file) have been organized and labelled for interrogation and searching by end-users; the content is interpreted as follows. The first variable is the Southern England sampling 'Area', the second a unique number-letter code for each site ('SITEnb_db'), followed by the 'Latitude' and 'Longitude' coordinates (WGS84). These are followed by the sampling 'Location': either a coastal site (i.e. in transitional, estuarine and coastline waters) or an open sea site (distant from the coastline). The 'SITEnb_db' code has no specific significance except for its _EA or _ME component, which indicates the original source, i.e. the EA or MERMAN database. The 'Site Name' variable is the full site name from the original database (this could be useful for local studies). After the sampling 'Date' comes 'Dete. Desc.', the determinand descriptor variable from the original database (the 'ME' character string was added to MERMAN determinand descriptors). This variable enables end-users to select data according to the chemical and the related sample processing technique, e.g. the grain size fraction used (<2000 μm or <63 μm) or the digestion method (e.g. total hydrofluoric acid digestion, as specified for EA data). From the categorized data, summary statistics are presented in Tables 1 and 2. The last variables are the 'Chemical' name, the concentration, or the analytical detection limit (DL) when the concentration is below it ['Result (ppm, < DL)'], and the concentration with values under the DL replaced by half the DL [2] ['Result (ppm)'].
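As an illustration of how the variables described above might be interrogated, the following minimal sketch derives site-specific monitoring statistics (first year, last year, number of years monitored) for one chemical. The column names come from the description above; the sample rows, the site codes '1_EA' and '2_ME', the ISO date format and the helper name `monitoring_stats` are assumptions made for the example, not properties confirmed for the hosted file.

```python
import csv
import io
from collections import defaultdict

# Tiny in-memory stand-in for the .csv database; all values are invented.
SAMPLE = """SITEnb_db,Location,Date,Chemical,Result (ppm)
1_EA,coastal,1985-06-01,Cu,12.0
1_EA,coastal,1985-09-15,Cu,14.0
1_EA,coastal,1990-05-20,Cu,10.0
2_ME,open sea,2001-07-02,Cu,5.0
2_ME,open sea,2004-07-10,Zn,40.0
"""


def monitoring_stats(rows, chemical):
    """Return {site: (first_year, last_year, n_years)} for one chemical."""
    years = defaultdict(set)
    for row in rows:
        if row["Chemical"] == chemical:
            # Assumes ISO dates (YYYY-MM-DD); adapt if the file differs.
            years[row["SITEnb_db"]].add(int(row["Date"][:4]))
    return {site: (min(y), max(y), len(y)) for site, y in years.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
stats = monitoring_stats(rows, "Cu")
for site, (first, last, n_years) in sorted(stats.items()):
    # The _EA / _ME suffix of the site code identifies the source database.
    source = "EA" if site.endswith("_EA") else "MERMAN"
    print(site, source, first, last, n_years)
```

To run this against the real database, the in-memory sample would be replaced by the downloaded .csv file opened with `open(...)`.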
The overall aim of this paper is to give ecotoxicologists, coastal ecologists (practitioners, scientists and policy makers) and government decision makers a ready-to-use .csv database detailing the elemental composition of sediments (http://dx.doi.org/10.17632/m68k63nnk3.1), useful for local, regional and global case studies. The data for nine trace elements (TEs; As, Cd, Cr, Cu, Fe, Hg, Ni, Pb and Zn) represent 87% of the database, and all except Fe are included on the US EPA priority pollutant list [3]. On average, each of the nine TEs has 4434 ± 607 (SD) measurements in the silt and clay sediment fraction [4], from 320 of the 334 sites. In addition, data on 20 supplementary elements (Mn, Al, Li, Sn [including tributyltin (TBT)], Ba, Sb, B, Ca, Mo, Co, Se, K, Mg, Be, V, Ti, Na, Ag, Tl and Sr) are included.
To provide end-users with the relevant context on spatial and temporal coverage, monitoring statistics are given for the nine TEs in Table 1. These include the first and last year of sediment monitoring, the number of years monitored, and summary statistics showing the minimum, maximum, mean and median numbers of years monitored. Statistics are given separately for coastal and open sea sites. Table 1 also contains site geographic coordinates and their regional location (Area). Used in conjunction with the map of the sites (Supplementary Materials), Table 1 allows end-users to identify sites and the elements for which the monitoring effort was greatest, without having to analyse the complete .csv database. Finally, Table 2 gives, for each of the nine TEs (total digestion [td] and strong acid digestion [sad] techniques considered separately for Cr and Fe), the number of sites monitored per year, from the first records in 1983 to 2013. Statistics are again given separately for coastal and open sea sites.

Experimental Design, Materials and Methods
Data stored in the MERMAN database cover marine waters only (coastal and open sea waters), from 1999 onwards. They were provided as a list of files (.xls(x) extension files), each corresponding to one year of sediment monitoring, with a unique 'Determinand Full Name' per element. Georeferenced MERMAN sites (WGS84 coordinates) were projected on to maps (Google Earth projection, Google LLC) and subsampled northwards from the Celtic Sea to the Bristol Channel and Thames Estuary. A single site could correspond to several georeferenced sampling points, resulting in multiple close coordinates for the same location. These multiple coordinates were averaged so that each site corresponded to a unique geolocation. The resulting secondary MERMAN dataset for Southern England contained sediment concentration data for thirteen TEs (Al, As, Cd, Cr, Cu, Fe, Pb, Li, Mn, Hg, Ni, V and Zn) from 95 sites, for a total of 12,540 data points.
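The coordinate-averaging step for MERMAN sites can be sketched as follows. This is a minimal illustration, assuming that sampling points are already grouped by a site name and that a plain arithmetic mean of latitude and longitude is adequate for closely spaced points; the function name and the example coordinates are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_site_coordinates(points):
    """Average multiple sampling points per site.

    points: iterable of (site_name, latitude, longitude) in WGS84 decimal
    degrees. Returns {site_name: (mean_latitude, mean_longitude)}.
    """
    by_site = defaultdict(list)
    for site, lat, lon in points:
        by_site[site].append((lat, lon))
    return {
        site: (
            mean(lat for lat, _ in coords),
            mean(lon for _, lon in coords),
        )
        for site, coords in by_site.items()
    }


# Invented example: two nearby sampling points for one site, one for another.
points = [
    ("Site A", 50.25, -4.5),
    ("Site A", 50.75, -4.0),
    ("Site B", 51.5, 1.0),
]
site_coords = average_site_coordinates(points)
print(site_coords)
```

A simple mean is reasonable only because the points for one site lie close together; it would not be appropriate for points spread over large distances.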
The two South and South West England datasets sent by the EA (codes SO and SW, held in one dataset separated by region [.mdb extension files]) contained elemental concentration data from the mid-1980s onwards. To avoid losing any data along the estuarine to coastal water continuum, both inland and marine site data were requested. Georeferenced EA sites were projected on to maps (Google Earth) after coordinate transformation from OSGB36 to WGS84, for ease of use with Google Earth and ArcGIS (Esri, Redlands, CA). Only sites in transitional, estuarine and coastal waters were considered, i.e. all sites under marine and tidal influence (terrestrial and freshwater sites were discarded). Because sites were selected according to their geographic position, we assumed that the geographic coordinates of EA-sampled sites were correct, although in earlier sampling years site coordinates sometimes corresponded to the highest point of the tide on the shoreline. Nowadays, EA site coordinates correspond to sampling locations within water bodies (EA, C. Ashcroft pers. comm.).
Early EA records preceded computer-recording processes, which sometimes resulted in a mismatch of sample material codes with more recent records. We therefore selected EA data as follows. Each data point, for previously georeferenced selected sites only, had a unique material description identifier, or 'MATERIAL_DESC'. EA data with a 'MATERIAL_DESC' related to sediments were selected first, using nine sediment identifiers. Limiting our request to these nine sediment identifiers would have missed the older data for which the 'MATERIAL_DESC' identifier had not been properly encoded. Because the main objective of the co-submitted Environment International paper [1] was to investigate the temporal trend of TE contamination, additional data with material identifiers not linked to sediments were also selected. Through an extended question-and-answer dialogue with the EA, we were able to select additional sediment elemental concentration data belonging to nine supplementary 'MATERIAL_DESC' identifiers, such as 'UNCODED', 'SOIL' and 'SEA WATER'. This time-consuming but necessary approach required checks of all selected mapped sites for concentration data that did not correspond directly to one of the sediment-related identifiers. If any doubt remained about a site and/or an elemental concentration data point, it was discarded from the final filtered EA dataset. Filtering on sediment identifiers only resulted in a dataset of 30,395 data points; with the supplementary nine 'MATERIAL_DESC' identifiers, we generated a dataset of 39,910 data points, i.e. 31% larger, for twenty-nine chemicals (including the thirteen in the MERMAN dataset) at 254 sites. These chemicals were: As, Cd, Cr, Cu, Pb, Hg, Ni, Zn, Fe, Mn, Al, Li, Sn [including tributyltin (TBT)], Ba, Sb, B, Ca, Mo, Co, Se, K, Mg, Be, V, Ti, Na, Ag, Tl and Sr.
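The two-stage selection of EA records can be pictured with the sketch below. It is only an illustration under stated assumptions: 'SEDIMENT' is a hypothetical stand-in for the sediment identifiers, 'record_id' is a hypothetical key, and the set of manually verified records stands in for the checks carried out with the EA; only 'UNCODED', 'SOIL' and 'SEA WATER' are identifiers named in the text.

```python
# Hypothetical stand-in for the sediment-related MATERIAL_DESC identifiers.
SEDIMENT_IDS = {"SEDIMENT"}
# Three of the supplementary identifiers named in the text.
SUPPLEMENTARY_IDS = {"UNCODED", "SOIL", "SEA WATER"}


def select_records(records, verified_ids):
    """Keep sediment-coded records, plus supplementary-coded records that
    were individually verified as sediment data (verified_ids)."""
    kept = []
    for rec in records:
        desc = rec["MATERIAL_DESC"]
        if desc in SEDIMENT_IDS:
            kept.append(rec)
        elif desc in SUPPLEMENTARY_IDS and rec["record_id"] in verified_ids:
            kept.append(rec)
        # Anything else, or anything still in doubt, is discarded.
    return kept


records = [
    {"record_id": 1, "MATERIAL_DESC": "SEDIMENT"},
    {"record_id": 2, "MATERIAL_DESC": "UNCODED"},    # verified as sediment
    {"record_id": 3, "MATERIAL_DESC": "SEA WATER"},  # doubt remains
    {"record_id": 4, "MATERIAL_DESC": "RIVER WATER"},
]
selected = select_records(records, verified_ids={2})
print([rec["record_id"] for rec in selected])

# Relative gain reported in the text: 30,395 -> 39,910 data points.
gain_percent = 100 * (39910 - 30395) / 30395
print(round(gain_percent))
```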

Generating a new database on sediment elemental concentrations for Southern England
The resulting EA and MERMAN datasets were merged into a single EA-MERMAN database after harmonization of the names of shared variables (e.g. the latitude variable was labelled 'X' in the EA dataset and 'Sample.Latitude' in the MERMAN dataset). The EA dataset contained 21 variables, the MERMAN dataset 23 variables and the merged database 29 variables; three new variables created for data analysis were subsequently included in the merged database. In particular, a new code was assigned to each site with an identifier for EA or MERMAN origin ['SITEnb_db' variable, a unique number/letter code ending in 'EA' or 'ME']. Elemental concentration units were standardized, because some concentrations were in ppm and others in ppb or in %, with differences of units between datasets. Data for some coastal sites were duplicated between the EA and MERMAN datasets; these duplicates were removed, giving priority to the EA data (the time series dataset).
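The unit standardization and the EA-priority rule for duplicates might be sketched as below. The record keys ('site', 'date', 'chemical', 'value', 'unit') and the matching of duplicates on site, date and chemical are assumptions made for the example; the text does not specify how duplicates were identified. The conversion factors follow from the definitions of the units (1 ppm = 1000 ppb; 1% = 10,000 ppm).

```python
import math

# Multiplicative factors converting each unit to ppm (mg/kg).
TO_PPM = {"ppm": 1.0, "ppb": 1e-3, "%": 1e4}


def to_ppm(value, unit):
    """Convert a concentration to ppm."""
    return value * TO_PPM[unit]


def merge_with_ea_priority(ea_records, merman_records):
    """Merge two lists of records, keeping the EA record when the same
    (site, date, chemical) key appears in both datasets."""
    merged = {}
    # MERMAN first, EA second: an EA record overwrites a MERMAN duplicate.
    for source, records in (("ME", merman_records), ("EA", ea_records)):
        for rec in records:
            key = (rec["site"], rec["date"], rec["chemical"])
            merged[key] = {
                **rec,
                "value_ppm": to_ppm(rec["value"], rec["unit"]),
                "source": source,
            }
    return list(merged.values())


# Invented records: one duplicated Cu measurement and one MERMAN-only Fe value.
ea = [{"site": "A", "date": "2005-06-01", "chemical": "Cu",
       "value": 12.0, "unit": "ppm"}]
me = [
    {"site": "A", "date": "2005-06-01", "chemical": "Cu",
     "value": 11500.0, "unit": "ppb"},
    {"site": "B", "date": "2005-06-02", "chemical": "Fe",
     "value": 2.5, "unit": "%"},
]
merged = merge_with_ea_priority(ea, me)
for rec in merged:
    print(rec["site"], rec["chemical"], rec["source"], rec["value_ppm"])
```

Note that keying on (site, date, chemical) would also collapse genuine replicate measurements within one dataset, so a real implementation would need a finer key.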
Once merged, the resulting EA-MERMAN database consisted of 45,962 data points from 334 sites, for twenty-nine chemicals over 31 years (1983–2013) of environmental monitoring. The full database of twenty-nine chemicals is saved in a ready-to-use .csv format for further analysis (http://dx.doi.org/10.17632/m68k63nnk3.1). For the co-submitted Environment International paper [1], we selected the nine most monitored TEs, namely Cu, Zn, As, Cd, Cr, Fe, Hg, Ni and Pb, together representing 87% of the data (320 sites). Subsequent data analysis for this subset is fully detailed in [1].

Background detail on sample processing and trace element analysis
All competent UK authorities undertaking monitoring are expected to use the same programme monitoring manual, the 'Clean Seas Environment Monitoring Programme (CSEMP) - Green Book' [5]. Whilst detailed information on procedural guidelines for sediment TE analysis is available in Appendices 6 and 7 of the Green Book, we provide relevant summary information here as context for use of the EA and MERMAN data. The analytical considerations shared across the secondary, compiled EA-MERMAN database are the concentration unit (mg kg⁻¹ dw) and, for the nine TEs, the median detection limit values (DL, in mg kg⁻¹ dw): 0.1 for Cd, Cr (sad) and Hg; 1 for Cu; 2 for As; 3 for Fe (sad); 5 for Ni and Zn; and 8 for Pb. For TE concentrations below the DL of the analytical procedure, we used half the DL value [2].
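The half-DL substitution can be illustrated with a short sketch. It assumes that a result below the detection limit is recorded as a string with a leading '<' followed by the DL (e.g. '<8'); this encoding and the function name are assumptions for the example and may differ from the actual files.

```python
def substitute_half_dl(raw):
    """Convert a raw result to a float, replacing '<DL' by DL / 2."""
    text = raw.strip()
    if text.startswith("<"):
        # Below the detection limit: use half the DL value.
        return float(text[1:]) / 2
    return float(text)


# Invented examples: a Pb result below a DL of 8 and a measured value.
print(substitute_half_dl("<8"))
print(substitute_half_dl("12.3"))
```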

MERMAN samples
Sediment TE concentration data stored in the MERMAN database are acquired following the Green Book guidelines. Briefly, sediment samples are wet or freeze-dry sieved through a nylon 63 μm mesh, and the <63 μm silt and clay fraction [4], i.e. the grain size fraction that accumulates contaminants, is retained for analysis (a small minority of MERMAN sediment samples were 'untreated' and were therefore removed from the analysis). A total digestion procedure, most often hydrofluoric acid (HF) digestion, is required to allow data to be normalized (e.g. to Al or Li) and so facilitate inter-site comparison of anthropogenic contamination levels. A partial extraction method is acceptable for the determination of long-term trends at sites where this method has traditionally been used (see the case of the EA database below). No analytical technique is mandatory, but most laboratories now use Inductively Coupled Plasma Mass Spectrometry (ICP-MS) for TE determination. Hg can be determined by cold vapour atomic absorption spectrometry or atomic fluorescence.

EA samples
For EA samples, the protocol has evolved since the acquisition of the oldest (1980s) data. EA sediment samples were formerly wet sieved and the <90 μm fraction retained, but this was changed to the <63 μm fraction in the 1990s; a small minority of samples were sieved through a 2000 μm mesh and were therefore removed from the analysis. Sediment requiring analysis for TE contaminants would have been analysed following digestion with hot nitric acid (HNO3) or aqua regia (sad technique). The intention would have been to maximise the extractable TE components that could be considered as bioavailable [8] in the environment. According to the Green Book, a partial extraction method is acceptable for the determination of long-term trends at sites where this method has traditionally been used (see above), which was the aim of the present analysis and of the co-submitted paper [1], which studies TE concentration trends over time. For more recent sediment samples acquired in the framework of the CSEMP programme (see above),