Long-term chloride concentrations in North American and European freshwater lakes

Anthropogenic sources of chloride in a lake catchment, including road salt, fertilizer, and wastewater, can elevate the chloride concentration in freshwater lakes above background levels. Rising chloride concentrations can affect lake ecology and ecosystem services such as fisheries and the use of lakes as drinking water sources. To analyze the spatial extent and magnitude of increasing chloride concentrations in freshwater lakes, we amassed a database of 529 lakes in Europe and North America that had at least ten years of chloride data. For each lake, we calculated climate statistics of mean total annual precipitation and mean monthly air temperatures from gridded global datasets. We also quantified land cover metrics, including road density and impervious surface, in buffer zones of 100 to 1,500 m surrounding the perimeter of each lake. This database represents the largest global collection of lake chloride data. We hope that long-term water quality measurements from areas outside Europe and North America can be added to the database as they become available.


Background & Summary
Recent analyses estimate there are 27 million naturally formed lakes and human-constructed reservoirs (hereafter together referred to as lakes) on Earth with surface areas greater than 0.01 km² (ref. 1). Although these lakes cover less than four percent of Earth's land surface, they provide many important social and economic resources such as recreation, fisheries, irrigation, and energy production. Moreover, the vast majority of these lakes are freshwater lakes (salinity < 1 g l⁻¹) 2 , which are routinely used as drinking water sources. It is therefore critically important to protect the ecosystem services that lakes provide for humanity. One important measure of water quality is the amount of salt, or salinity, in a lake. Chloride is often used as a proxy for salinity because it is a conservative and highly soluble ion that can be measured easily and with high accuracy 3,4 . While natural chloride concentrations vary among lakes because of geological factors 2 , chloride levels can also change in response to climate 5 and anthropogenic influences 6 . Anthropogenic sources of chloride include road salt, fertilizer, wastewater, industrial effluents, and hydrochloric acid derived from coal combustion [7][8][9][10][11] . Increasing chloride concentrations in lakes can change the acid neutralizing capacity and pH of the water, increase the transport and bioavailability of heavy metals 12 , increase lake stratification 13 , and alter ecological communities 4,14,15 , thereby degrading water quality and habitat.
Many long-term studies from around the world have identified anthropogenically induced salinization of lakes, including many large freshwater bodies such as the Laurentian Great Lakes 7,16 , Lake Champlain 17 , Lake George 18 , Lake Constance 19 , and the deep subalpine lakes in Northern Italy 20 . To quantify the prevalence and drivers of long-term salinization in lakes worldwide, we collected chloride data from government organizations, universities, and lake associations; however, datasets covering at least ten years were available only in Europe and North America. The database presented here encompasses long-term chloride data from 529 lakes in Europe and North America, a geodatabase of lake polygon shapefiles, and a variety of lake characteristics, land cover metrics, and climate statistics. This database can be used to answer many questions about the salinization of freshwater lakes at different scales, and to address the impacts of salinization on the ecology of freshwater lakes 21 . We hope that other researchers will add lakes to expand the geographical coverage of the database, so that the effects of lake salinization can be more easily addressed at a global scale.

Methods
The organizational workflow and data requirements for inclusion of lakes in the global lake chloride database are depicted in Fig. 1. All analyses were performed in the R statistical programming language 22 , using a suite of packages to access data and perform statistical analyses.

Acquisition
Three primary approaches were used to collect lake chloride concentration data. Data were obtained from 1) online repositories containing datasets from multiple lakes, 2) online repositories containing data from single lakes, or 3) individual researchers, including members of the Global Lake Ecological Observatory Network (GLEON), and other organizations that responded to data requests. This range of approaches accommodated the many ways in which data were stored and shared, and thereby maximized the number of lakes included in our database.
For inclusion in this global lake chloride database, a site had to meet four criteria (Fig. 1):
1. The lake must have a surface area of at least 0.04 km² (4 ha). This was the original size cutoff used by the United States Environmental Protection Agency (EPA) in its 2007 National Lakes Assessment.
2. The long-term mean chloride concentration must be less than 1 g l⁻¹. This removes brackish and saline lakes from the database; saline lakes are often defined as lakes with total dissolved solids > 3 g l⁻¹ (ref. 2).
3. The dataset must span at least ten years and contain at least five data points. These criteria ensure both a robust measure of chloride concentrations and the ability to detect a long-term trend.
4. The dataset must include at least one chloride record after 2000. This criterion ensures that site records are comparable in time.
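The four inclusion criteria can be expressed as a single screening function. The sketch below is illustrative, not the authors' R code: the `LakeRecord` container and its field names are hypothetical, the ten-year span is assumed to mean a difference of at least ten calendar years between the first and last samples, and "after 2000" is assumed to mean a sample dated 2001 or later.

```python
from dataclasses import dataclass, field
from datetime import date
from statistics import mean


@dataclass
class LakeRecord:
    # Hypothetical container for one candidate lake; not the database schema.
    name: str
    area_km2: float
    samples: list = field(default_factory=list)  # (date, chloride in mg/l) pairs


def meets_inclusion_criteria(lake: LakeRecord) -> bool:
    """Return True if a candidate lake passes all four inclusion criteria."""
    # 1. Surface area of at least 0.04 km2 (4 ha).
    if lake.area_km2 < 0.04:
        return False
    # 3b. At least five data points.
    if len(lake.samples) < 5:
        return False
    dates = [d for d, _ in lake.samples]
    chloride = [c for _, c in lake.samples]
    # 2. Long-term mean chloride below 1 g/l, i.e. 1000 mg/l.
    if mean(chloride) >= 1000:
        return False
    # 3a. Record spans at least ten years (assumed: difference of calendar years).
    if max(dates).year - min(dates).year < 10:
        return False
    # 4. At least one record after 2000 (assumed: dated 2001 or later).
    return any(d.year > 2000 for d in dates)
```

Checking the cheap criteria (area, sample count) first means obviously ineligible lakes are rejected before any date arithmetic is done.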
Once the data were collected, they were processed using an a priori metadata protocol. Each lake was assigned a project-specific identification number and was paired with the associated data source name, source contact/acquisition information, and public accessibility designation (publicly available/private). We were unable to find any sites that met the inclusion criteria in South America, Africa, Asia, or Oceania. Many sites in these regions had lake chloride measurements, but none had records spanning at least ten years, so they were not included in this database. In total, we collected 529 long-term datasets from ten countries (Fig. 2).
United States (n = 315). The majority of lakes included in this database are located in the United States (US). Data were obtained from the sources described below.
• The Water Quality Portal (WQP, http://www.waterqualitydata.us/) is a database that amalgamates data from federal, state, tribal, and local sources in the United States. It includes data from the EPA Storage and Retrieval (STORET) data warehouse and the United States Geological Survey (USGS) National Water Information System (NWIS). We searched the WQP for the characteristic name 'chloride' in 'lakes, reservoirs, and impoundments' using the dataRetrieval package in R (v2.5.5 (ref. 23)). Data extracted from the WQP included 67 lakes in Minnesota monitored by the Minnesota Pollution Control Agency. Lakes in the Minneapolis/St Paul region are some of the most urban lakes in the database (as determined by road density and land cover).
• The North Temperate Lakes Long Term Ecological Research site (LTER) collects long-term chloride data for eleven lakes in Wisconsin 24 . Data for Lake Wingra, Lake Monona, and Lake Mendota were augmented with earlier data from the Wisconsin Department of Natural Resources. In some cases, only annual averages were available; these sampling dates were labeled as July-01 (Lake Mendota: 1957-1972, Lake Monona: 1940-1987).
• Hubbard Brook LTER (http://www.hubbardbrook.org/) hosts 43 years of chloride measurements from Mirror Lake, New Hampshire.
• The Lake Champlain Basin Program provides a publicly accessible database of chloride measurements from Lake Champlain (http://www.lcbp.org/water-environment/data-monitoring/lake-and-watersheddata/). We obtained data for the Burlington Bay sampling site and combined these with earlier data from USGS site 04295000. Lake Champlain (744 km²) is the only US lake among the ten largest lakes in our database.
• The EPA manages two long-term monitoring programs that focus specifically on the effects of atmospheric deposition on surface waters: the Temporally Integrated Monitoring of Ecosystems program and the Long Term Monitoring Project. The lakes included in these programs tend to be located in remote areas of New York, Vermont, Maine, and New Hampshire.
Canada (n = 37). Alberta Environment and Sustainable Resource Development provided chloride data for eleven lakes in Alberta. The Water Quality Management Section of Manitoba Conservation and Water Stewardship provided chloride data for 16 lakes in Manitoba. Manitoba lakes represent four of the ten largest lakes in the database, including Lake Winnipeg, the largest (24,514 km²). Kawartha Conservation provided datasets spanning more than 40 years for Balsam Lake, Cameron Lake, and Sturgeon Lake in eastern Ontario. The International Institute for Sustainable Development provided data for four small lakes in the IISD Experimental Lakes Area (ELA) in northern Ontario. The Ontario Ministry of the Environment and Climate Change provided 15 years of data from Lake Simcoe. Of the ten largest lakes in the database, Lake Simcoe (722 km²) has the highest road density surrounding its shoreline. In 2008, the Ontario government introduced the Lake Simcoe Protection Act as a strategy to improve water quality.
Sweden (n = 102). A publicly available database of water chemistry was available through the Swedish Department of Water and Environment (http://info1.ma.slu.se/db.html). We included long-term monitoring data from a series of lakes designated as trend stations, which resulted in a collection of 101 lakes distributed throughout the country. We also included data from Vänern, the largest lake in Sweden. Swedish coordinates were provided in the Swedish grid RT90 (EPSG:2400) at 10-m resolution and were converted to WGS84.
Germany, France, and Switzerland (n = 32). Chloride data for Lake Constance were obtained from the Institut für Seenforschung in Langenargen. Data for Lake Zurich were provided by the City of Zurich Water Supply (WVZ) and the Amt für Abfall, Wasser, Energie und Luft (AWEL) of the Canton of Zurich, Switzerland. Chloride data for 28 lakes in Mecklenburg-Western Pomerania, northern Germany, were compiled for a review of German sulfate trends; all authorities and persons who supplied data are listed in the acknowledgements of Kleeberg 25 . Data from Lake Geneva and Lake Bourget were accessed through the Alpine Lakes Observatory (SOERE OLA, Observation and Experimentation System for long-term Environmental Research). SOERE OLA is a member of AnaEE-FRANCE and is sponsored by AllEnvi, the National Research Alliance for the Environment. Physical, chemical, and biodiversity data can be downloaded using the information system developed by the Eco-Informatique ORE team of the French National Institute for Agronomical Research (INRA, https://si-ola.inra.fr). INRA collects data from Lake Bourget with its partner CISALB, the intercommunal association of Lake Bourget, and from Lake Geneva (also known as Lac Léman) with its partner CIPEL, the International Commission for the Protection of the Waters of Lake Geneva.
Finland (n = 11). Water resource and environmental data for Finnish lakes are publicly available through the Finnish Environment Institute (SYKE) (http://www.syke.fi/en-US/Open_information). Chloride data for lakes included in our analysis were compiled by Lauri Arvola at the University of Helsinki through the online SYKE portal. Lakes Saimaa, Päijänne, and Inari are three of the largest lakes in the database; all three are over 1,000 km².
United Kingdom (n = 8). The United Kingdom Upland Waters Monitoring Network (UK UWMN) is a consortium led by University College London (UCL), the NERC Centre for Ecology and Hydrology (CEH), Marine Scotland, and Queen Mary University of London (QMUL). Originally funded by the UK Government Department for Environment, Food and Rural Affairs, the network is currently supported by the partner organizations and a range of regional governments, agencies, and research institutes. The UK UWMN provided chloride data for eight UK lakes.
Hungary (n = 3). Water quality data are publicly available through the Middle-Transdanubian and West-Transdanubian Water Directorates. We obtained long-term data from yearly water quality reports for the Kis-Balaton Reservoir System and Lake Balaton. Chloride concentrations were determined by argentometric methods according to Hungarian ISO standards.
Italy (n = 1). The Institute of Ecosystem Study of the National Research Council (CNR ISE) in Verbania, Italy, provided chloride data for Lake Maggiore 20 . These data are collected in the framework of the long-term research program on Lake Maggiore funded by the International Commission for the Protection of Waters between Italy and Switzerland (CIPAIS). Lake Maggiore is the westernmost of the deep subalpine lakes in Northern Italy (LTER Europe site LTER_EU_IT_008). The lake area is 212.5 km² and its maximum depth is 370 m.

Chloride data
Each lake entry contains, at minimum, the sample date and the chloride concentration, converted to standard units (mg l⁻¹). Sampling depth was included for most sites. In some datasets, gaps of a decade or more existed between sampling dates because early data points were sparse. Because trend analyses can be overly sensitive to outlying data points, we applied additional quality control criteria: if a decadal gap was present in a dataset, at least five sampling points had to exist prior to the gap; otherwise, the earlier data were not included. Additionally, the tsoutliers package in R (v0.6 (ref. 26)) was used to find and remove additive outliers, which, although rare, skewed the fit of the linear and additive models to the data. The approach uses a t-statistic to test the significance of a potential outlier at each time step and flags those above a critical value; we conservatively chose a critical value of 20. In total, the data from 483 lakes were unchanged, and 64 lakes had 1-9 data points removed. The number of outliers removed from each original dataset is included in the metadata.
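The decadal-gap rule can be sketched as a small filter over a lake's samples. This is a minimal Python illustration, not the authors' implementation: the function name and parameters are hypothetical, a "decadal gap" is assumed to be ten or more years (measured as days / 365.25) between consecutive samples, and the outlier screening done with tsoutliers is not reproduced here.

```python
from datetime import date


def apply_decadal_gap_rule(samples, min_gap_years=10.0, min_points_before=5):
    """Drop sparse early data that precede a gap of a decade or more.

    samples: list of (date, value) pairs. Returns a new, date-sorted list.
    """
    kept = sorted(samples)
    i = 1
    while i < len(kept):
        gap_years = (kept[i][0] - kept[i - 1][0]).days / 365.25
        # kept[:i] are the points before this gap.
        if gap_years >= min_gap_years and i < min_points_before:
            kept = kept[i:]  # too few earlier points: discard them
            i = 1            # rescan the shortened record from the start
        else:
            i += 1
    return kept
```

For example, three samples from the early 1970s followed by a 1990-2001 annual record would be trimmed to the 1990-2001 record, whereas five early samples before the same gap would be retained.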

Lake data
Each lake was identified by its latitude and longitude (in decimal degrees) and given a unique ID number. The country, or the province/state for North American lakes, was determined from the geographic coordinates.
In the United States, the county was added to the file name, because there are often lakes with duplicate names within a state. Lake areas were obtained from a variety of published data sources. In rare cases where lake area was not available, it was calculated from custom shapefiles in ArcGIS (see section on shapefiles). Lakes were classified as natural lakes or reservoirs. This distinction was made by visually inspecting satellite images for the presence of a dam, or from published government reports or articles that identified the waterbody as a reservoir. Of the 529 lakes, 407 had maximum lake depth data (m). The distance from each lake to the nearest coast (km) was calculated from the lake coordinates and the 1:10m (1:10 million) coastline vector shapefile available from Natural Earth (http://www.naturalearthdata.com/downloads/10m-physical-vectors/10m-coastline/), using the equal-area Mollweide projection. The coastline vector was edited to exclude the St Lawrence Seaway.
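The distance-to-coast step can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' GIS workflow: it assumes a spherical Earth of radius 6,371 km and a coastline already loaded as lists of (lon, lat) vertices, and the helper names are hypothetical. Points are projected with the forward Mollweide equations and the shortest planar distance to any coastline segment is returned. Because Mollweide preserves area rather than distance, the result is an approximation whose distortion varies with location.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical radius; an assumption of this sketch


def mollweide(lon_deg, lat_deg, lon0_deg=0.0, radius=EARTH_RADIUS_KM):
    """Forward Mollweide projection on a sphere; returns (x, y) in km."""
    lam = math.radians(lon_deg - lon0_deg)
    phi = math.radians(lat_deg)
    if abs(abs(phi) - math.pi / 2) < 1e-12:
        theta = math.copysign(math.pi / 2, phi)  # exact solution at the poles
    else:
        # Solve 2*theta + sin(2*theta) = pi*sin(phi) with Newton's method
        # (converges quickly at lake latitudes; slow very close to the poles).
        theta = phi
        for _ in range(50):
            f = 2 * theta + math.sin(2 * theta) - math.pi * math.sin(phi)
            delta = f / (2 + 2 * math.cos(2 * theta))
            theta -= delta
            if abs(delta) < 1e-12:
                break
    x = radius * (2 * math.sqrt(2) / math.pi) * lam * math.cos(theta)
    y = radius * math.sqrt(2) * math.sin(theta)
    return x, y


def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_coast_km(lake_lon, lake_lat, coastlines):
    """coastlines: iterable of polylines, each a list of (lon, lat) vertices."""
    p = mollweide(lake_lon, lake_lat)
    best = math.inf
    for line in coastlines:
        pts = [mollweide(lon, lat) for lon, lat in line]
        for a, b in zip(pts, pts[1:]):
            best = min(best, point_segment_distance(p, a, b))
    return best
```

Measuring to segments rather than to vertices matters for a coarse coastline, where vertices can be tens of kilometres apart.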

Climate data
To ensure homogeneity across regions, all climate data were obtained from gridded global datasets.
Temperature and precipitation. Monthly mean temperatures and annual precipitation totals were obtained from the WorldClim dataset, which contains high-resolution, interpolated global climate data for 1960-1990 (ref. 27). We used the highest-resolution (30 arc-second) data, which corresponds to a grid cell area of about 0.86 km² at the equator. Lakes in the interior of North America, including those in North Dakota, Alberta, and Colorado, receive less than 500 mm of precipitation a year. In contrast, some lakes in Vermont and the UK receive more than 1,300 mm a year on average.
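The quoted cell size can be checked with a short calculation. The sketch below assumes a spherical approximation in which one degree spans about 111.32 km at the equator; a 30 arc-second cell is 1/120 of a degree on a side, and its east-west extent shrinks with the cosine of latitude. The function name is hypothetical.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree at the equator (spherical)


def cell_area_km2(lat_deg, cell_arcsec=30):
    """Approximate area of a square lat/lon grid cell centred at lat_deg."""
    side_deg = cell_arcsec / 3600                      # 30 arc-seconds = 1/120 degree
    ns_km = side_deg * KM_PER_DEGREE                   # north-south extent
    ew_km = ns_km * math.cos(math.radians(lat_deg))    # east-west extent narrows poleward
    return ns_km * ew_km
```

At the equator this gives about 0.93 km × 0.93 km ≈ 0.86 km²; at 60° N, typical of many Swedish and Finnish lakes, the same cell covers roughly half that area.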

Sea salt deposition.
A global dataset of wet and dry sea salt (NaCl) deposition was obtained from the World Data Centre for Precipitation Chemistry 28 (http://www.wdcpc.org/assessment). We used the north grid of the HTAP 2001 ensemble-mean model results for emissions, deposition, and concentration, which has a spatial resolution of 1.0° latitude × 1.0° longitude 28 . Sea salt deposition is given as combined total wet and dry deposition (kg ha⁻¹ yr⁻¹). Overall, deposition was lower for North American lakes (mean = 13 kg ha⁻¹ yr⁻¹) than for European lakes (mean = 38 kg ha⁻¹ yr⁻¹). Deposition was highest in the United Kingdom, where lakes receive an average of 98 kg ha⁻¹ yr⁻¹ of sea salt deposition.
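Because the deposition values are expressed as NaCl, a worked conversion helps relate them to chloride. This sketch is an illustration rather than part of the database workflow: it treats sea salt as pure NaCl (as the dataset's labelling implies), uses standard molar masses, and introduces hypothetical helper names.

```python
MOLAR_MASS_NA = 22.990  # g/mol
MOLAR_MASS_CL = 35.453  # g/mol
# Mass fraction of chloride in NaCl, about 0.607.
CL_FRACTION_OF_NACL = MOLAR_MASS_CL / (MOLAR_MASS_NA + MOLAR_MASS_CL)


def nacl_to_cl_deposition(nacl_kg_ha_yr):
    """Chloride deposition (kg ha-1 yr-1) implied by an NaCl deposition rate."""
    return nacl_kg_ha_yr * CL_FRACTION_OF_NACL


def kg_ha_to_g_m2(value_kg_ha):
    """Convert kg ha-1 to g m-2 (1 ha = 10,000 m2, 1 kg = 1,000 g)."""
    return value_kg_ha * 1000 / 10_000
```

Under these assumptions, the UK mean of 98 kg ha⁻¹ yr⁻¹ of NaCl corresponds to roughly 59 kg ha⁻¹ yr⁻¹ of chloride, or 9.8 g m⁻² yr⁻¹ of NaCl.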

Shapefiles
Shapefiles that geographically delimit the perimeter of each lake were required to quantify surrounding land use patterns. Shapefiles were acquired via five methods:
1. Lakes in the continental United States are included in the National Hydrography Dataset (NHD, n = 336, http://nhd.usgs.gov/). The high-resolution NHD data are available at 1:24,000 scale.
2. Shapefiles for lakes in Canada were downloaded from the 1:50,000 National Hydro Network (NHN).
3. Shapefiles for Swedish lakes were available through the Swedish Water Archives (n = 102, http://www.smhi.se/klimatdata/hydrologi/sjoar-och-vattendrag/). We used surface water body layer version Vy_y_2012_2. Multiple shapefiles were merged to create a single shapefile for Vänern (pers. comm. B. Denfield).
4. Shapefiles from the Global Lakes and Wetlands Database (GLWD), in which lakes are presented at a scale of 1:1,000,000.
5. Shapefiles for Lake Fenek (12.5 km²) and Lake Hidvegi (17.1 km²) in Hungary, and for eleven other small European lakes in the UK, Germany, and Finland, were not publicly available. These shapefiles were generated manually using ArcMap 10.2.2.
The method by which shapefiles were acquired is included in the metadata as NHD, NHN, Sweden, GLWD, or Manual.

Land cover
We constructed buffer zones surrounding each lake to approximate the anthropogenic influence of watershed and shoreline runoff of chloride into lakes. Buffer widths of 100, 200, 300, 400, 500, 1,000, and 1,500 m were chosen based on previous findings. For example, Kelting et al. 29 found that >70% of the total variation in chloride concentrations in Adirondack Park, NY, could be explained by road density in 320- to 1,280-m buffer zones; Regalado and Kelting 30 used a 100-m buffer to approximate vehicular spray distance; and Read et al. 31 used a 200-m buffer to examine local drivers of lake characteristics across the United States. We calculated both the percent impervious surface and the total road density within the buffer zones surrounding each lake as metrics of urban development. Buffer zones were constructed using the gBuffer function in the rgeos package in R (v0.3-19) (ref. 32) in local WGS84 Universal Transverse Mercator (UTM) zone projections.
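Buffering in metres requires choosing a local UTM zone for each lake. A minimal sketch of that choice follows, assuming the standard 6° zone grid and the EPSG convention of 326xx for northern-hemisphere and 327xx for southern-hemisphere WGS84 / UTM zones. It deliberately ignores the Norway (32V) and Svalbard exceptions, and the function name is hypothetical; the buffering itself is not shown.

```python
def utm_epsg(lon_deg, lat_deg):
    """EPSG code of the WGS84 / UTM zone containing a point (regular 6-degree zones).

    Ignores the Norway (32V) and Svalbard exceptions to the regular zone grid.
    """
    zone = int((lon_deg + 180) // 6) + 1
    zone = min(max(zone, 1), 60)  # lon = 180 would otherwise give zone 61
    return (32600 if lat_deg >= 0 else 32700) + zone
```

For example, Lake Mendota (about 89.4° W, 43.1° N) falls in zone 16N (EPSG:32616), and Lake Constance (about 9.4° E, 47.6° N) in zone 32N (EPSG:32632).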

Impervious surface
Impervious surfaces are all impenetrable artificial surfaces, including roadways and parking lots (concrete and asphalt) and building roofs 33 . Because impervious surfaces prevent water from infiltrating into soils, the increased runoff from these urban landscapes can threaten water resources 34 . Impervious surfaces have previously been linked to road salt application and the salinization of fresh waters 9,35,36 .
In land cover datasets, impervious surface is typically represented in one of two ways: Method 1) each pixel is classified as either impervious or not impervious (a Boolean value), or Method 2) impervious surface is a continuous variable within each pixel, from 0 to 100 percent 37 . The latter gives a more robust quantification of impervious surface 37 .
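The difference between the two representations is easy to see on a handful of pixels. In this sketch, which is illustrative and not the NLCD processing used for the database, each pixel in a buffer carries a percent-impervious value (0-100) and all pixels have equal area. For Method 1 it assumes a hypothetical rule that any pixel above a threshold (here 0%) counts as fully impervious.

```python
def percent_impervious_boolean(pixels, threshold=0):
    """Method 1 (Boolean): percentage of pixels classified as impervious.

    A pixel counts as impervious if its value exceeds `threshold`
    (a hypothetical classification rule for this sketch).
    """
    if not pixels:
        raise ValueError("no pixels in buffer")
    return 100 * sum(p > threshold for p in pixels) / len(pixels)


def percent_impervious_continuous(pixels):
    """Method 2 (continuous): mean per-pixel impervious percentage."""
    if not pixels:
        raise ValueError("no pixels in buffer")
    return sum(pixels) / len(pixels)
```

For four pixels with values 0, 10, 50, and 100, Method 1 returns 75% while Method 2 returns 40%. A partly paved pixel counts as entirely impervious under Method 1, which is why the Boolean approach tends to give higher absolute values.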
United States. The 2011 United States National Land Cover Database (NLCD) is a national land cover product with a spatial resolution of 30 m (http://www.mrlc.gov/nlcd11_data.php). The NLCD allows impervious surface to be quantified by either of the methods above. Method 1 is implemented via the …
Figure 5. Comparison of impervious surface and road density calculations for Trout Lake and Lake Wingra, Wisconsin, USA. Values are given for a 500-m buffer. Impervious surface calculations via Method 1 have higher absolute values than Method 2, but relative differences are similar (see Fig. 4).
Road density. Road density has been used in previous studies as a proxy for urbanization and the application of road salt 29,40,41 . Road density was defined as the ratio of the total length of the road network in a given area (km) to the land area (km²). Worldwide road data were downloaded from OpenStreetMap via the osmar package in R (v1.1-7 (ref. 42)). In each buffer zone, we selected 'ways' tagged as highway, which included all primary, secondary, residential, and service roads. In our database, the highest road densities in a 500-m buffer, up to 27 km km⁻², were found around lakes in Minneapolis/St Paul, Minnesota. Road density and percent impervious surface in the buffer zones surrounding a lake were positively correlated overall (r² = 0.69 in the 500-m buffer, r² = 0.79 in the 1,500-m buffer); however, many lakes with low road density (0-5 km km⁻²) had 0% impervious surface.
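The road density definition reduces to summing road lengths and dividing by land area. The sketch below assumes road geometries have already been clipped to the buffer and projected to metres (for example in a UTM zone), and that the buffer's land area (excluding the lake) is known in km². It does not query OpenStreetMap, and the function names are hypothetical.

```python
import math


def polyline_length_km(coords_m):
    """Length of a projected polyline given as (x, y) vertices in metres."""
    return sum(math.dist(a, b) for a, b in zip(coords_m, coords_m[1:])) / 1000


def road_density(roads_m, land_area_km2):
    """Road density (km km-2): total road length divided by buffer land area."""
    if land_area_km2 <= 0:
        raise ValueError("land area must be positive")
    total_km = sum(polyline_length_km(road) for road in roads_m)
    return total_km / land_area_km2
```

For instance, two 1-km road segments inside a buffer with 0.5 km² of land give a density of 4 km km⁻².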

Data Records
The final database includes three files (Data Citation 1).
1. All descriptive lake data are formatted as a wide (horizontal) data table and are provided in lake_characteristics.csv (Table 1).
2. Chloride time series data are formatted as a long data table and are provided in chloride_concentrations.csv (Table 2).
3. Shapefiles for all 529 lakes are provided in a zip file.
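As an example of working with the long-format time series, the sketch below computes an ordinary least-squares chloride trend (mg l⁻¹ per year) for each lake. It is an illustration only: the column names `lakeID`, `date`, and `chloride_mg_l` are hypothetical placeholders (consult Table 2 for the actual fields), and a simple linear slope stands in for whatever trend model an analysis requires.

```python
import csv
import io
from collections import defaultdict
from datetime import date


def decimal_year(d):
    """Convert a date to a fractional year, e.g. 2000-07-02 -> about 2000.5."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days
    return d.year + (d - start).days / days_in_year


def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct sampling times")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def chloride_trends(csv_text):
    """Return {lake ID: slope in mg/l per year} from a long-format table."""
    by_lake = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        t = decimal_year(date.fromisoformat(row["date"]))
        by_lake[row["lakeID"]].append((t, float(row["chloride_mg_l"])))
    return {lake: ols_slope([t for t, _ in obs], [c for _, c in obs])
            for lake, obs in by_lake.items()}
```

Because the table is long (one row per sample), grouping by lake ID is all that is needed before fitting; the resulting per-lake slopes could then be joined to the wide lake_characteristics.csv table by lake ID.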
The database is designed so additional lake data can be easily added as they become available. All climate and land cover metrics were derived using open source datasets, and processed using open source tools. As new long-term datasets become available, or current datasets are expanded, the database can be updated with minimal time investment.

Technical Validation
For sites where sampling depth was not available, we assumed that samples were collected at the surface, although this cannot be verified. Where samples were collected at multiple depths, we visually examined a subset of these lakes and found that chloride trends were consistent throughout the water column. For example, in Lake Constance and Vänern, both large freshwater lakes, chloride concentrations in the hypolimnion (bottom waters) and the epilimnion (surface waters) show similar trends through time (Fig. 3).
A comparison of the two methods used for calculating impervious surface reveals that, although Method 1 typically returns higher estimates of the impervious surface surrounding a lake, these estimates are highly correlated with values calculated using Method 2, and the correlation strengthens as buffer width increases (Figs 4 and 5).
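The agreement between the two impervious surface estimates, like the road density comparison reported above, can be summarized by the coefficient of determination of a simple linear fit, which equals the squared Pearson correlation. This is a minimal, dependency-free sketch with a hypothetical function name; the example values are illustrative, not database values.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)
```

A perfectly proportional pair such as Method 1 = 2 × Method 2 yields r² = 1 even though the absolute values differ, which is the pattern described here: higher Method 1 values, but tightly correlated rankings.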