Historical and projected datasets of the United States electricity-water-climate nexus

This article describes datasets that were produced in connection with the research article: “Visualizing the United States electricity-water-climate nexus” published in Environmental Modeling and Software (https://doi.org/10.1016/j.envsoft.2021.105128). Data cover 9,961 individual power plants across the United States, including monthly values for electricity generation, greenhouse gas emissions, water withdrawal, and water consumption between 2003 and 2020, as well as projections out to 2050. Data were retrieved from publicly available sources and processed for the purpose of providing plant-level information that can be aggregated according to various user needs. Power plant information was retrieved from the US EPA Facility Registry Service (FRS) web service through the filter of “EIA860.” For these plants, we retrieved electricity generation, greenhouse emission, water consumption, and water withdrawal of each plant from heterogeneous data sources, including web services and files, clean and process them, and save them in our database tables. We filled remaining data gaps using a coefficient-based approach. This data article describes metadata and methods for producing the historical and projected datasets in the format of CSV files. The datasets are beneficial for researchers to view electricity generation in the context of emissions and water usage at the granularity of power plants, such as for data analysis and machine learning. These data also can be aggregated to different spatial scales, such as watershed, county, state, and national level, according to different analytical needs. In addition, decision makers can use these data for future energy and resource allocations with the awareness of emission and water constraints.

watershed, county, state, and national level, according to different analytical needs. In addition, decision makers can use these data for future energy and resource allocations with the awareness of emission and water constraints.
© 2021 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ ) Table   Subject Renewable Energy, Sustainability and the Environment Specific subject area Energy-Water Nexus Type of data CSV files How data were acquired Raw data were retrieved from heterogeneous, publicly available data sources, including web services and files. Data format Mix of raw and processed data Parameters for data collection We prioritized publicly available data from government sources that were accessible with web services through computer programming interfaces. When web services were not available, we used downloadable data. To fill data gaps, we also used water use coefficients from existing peer-reviewed literature, as discussed in the methods section.

Description of data collection
We collected the following information through web services: Facility Information: EPA Facility Registry Service (FRS) web service [1] Historical electricity generation data: U.S. Energy Information Administration (EIA) Plant-level Generation web service [2] Historical emission data: EPA GHGRP (EnviroFacts) greenhouse emissions web service [3] Projected electricity generation data: EIA Annual Energy Outlook 2019 [4] We collected the following through CSV files: Historical water consumption and withdrawal data: EIA thermoelectric cooling water data [5] Data source location Primary data sources: • US Environmental Protection Agency, "Facility Registry Service," from: https://www.epa.gov/frs/frs-data-resources [

Value of the Data
• Water use in the electricity sector is not fully captured in existing energy-water assessments, which can be complemented by our modeling approach and data provided in HydroShare. Our data sets cover 9961 power plants with different plant types across the United States, from 2003 to 2050. It has the granularity of monthly data at the plant level that can be aggregated to different levels to satisfy various user requirements.
• Potential Users of these data include: • Power systems researchers seeking to extend or verify their models in terms of water usage. • Government agencies tasked with monitoring current power systems and planning for environmental impacts of future ones. • Water resource stakeholders tasked with inventorying power system water use at various geographic scales. • Electric utilities seeking to inventory water demand from power sources and assess portfolio options in terms of water use and greenhouse gas emissions. • Electricity Balancing Authorities and Regional Entities tasked with ensuring reliability of and managing natural resource constraints on power systems. • Potential data uses: • Users can use the data directly for plant-level generation, emission, water consumption, and water withdrawal, for data analysis and machine learning. • Users can aggregate the data into different spatial scales including watershed (HUC-8), county, state, and national. • Once users aggregate the data into the HUC level, users can compare water consumption and water withdrawal with water availability and other hydrologic metrics from other models.

Data Description
(1) CSV file named historical.csv: This file contains the historical data from 2003 to 2020. name of the cooling water source generation electricity generation in the given year and month (MWh) emissions greenhouse gas emission in the given year and month (MtCO2e) emissionsDerived emissions flag: 0 = data that is based on yearly raw data; 1 = derived data produced by coefficient waterWithdrawal water withdrawal in the given year and month (MGal) ( continued on next page ) Field Description waterWithdrawalDerived water withdrawal flag: 0 = raw data; 1 = derived data produced by dynamic coefficient; 2 = derived data produced by static coefficient except where fuel type is "water"; 3 = derived data produced by static coefficient where fuel type is "water" waterConsumption water consumption in the given year and month (MGal) waterConsumptionDerived water consumption flag: 0 = raw data; 1 = derived data produced by dynamic coefficient; 2 = derived data produced by static coefficient except where fuel type is "water"; 3 = derived data produced by static coefficient where fuel type is "water" Hydropowercentroid georeferenced centroid for hydroelectric power plants (2) Table 1 Static coefficients of water withdrawal and consumption by plant type, adapted from a study by Grubert and Sanders [6] . Lifecycle stages refer water uses that take place at the power plant water related to the pre-conversion (1), conversion (2), and post-conversion (3) stages of electricity production, as specified in the study.  projected.csv includes the electricity generation projections based on the EIA Reference Case, which includes the most likely prediction based on the current information, compared to other cases.

Experimental Design, Materials and Methods
We develop different computer programs to retrieve and clean the data from heterogeneous data sources. The data are stored in database tables using Microsoft SQL server. For any facilities reported to EIA FRS by the filter of "EIA 860", we retrieve the power plants and saved the details of each plant in the Facility table. Only power plants in this table are included in our system and presented in the two CSV files in HydroShare repository. The CSV files were produced by querying the results of joining multiple database tables. This section describes our data processing methods for two components: historical data and projected data.

• Greenhouse Gas Emissions
Emission data that is retrieved from GHG [3] is yearly data. We disaggregate it to monthly data and handle gap years using a coefficient approach. Firstly, we calculate the ratios:

EmissionCoef year(i) = (Emission year(i) /Generation year(i) )
A gap year means that a plant does not have any emission data in a specific year but it may have emissions in other years. To estimate the unavailable emission data for the gap year, the emission coefficient value is further processed as follows: If emYear < = minimal year in the available data then EmissionCoef unavailableYear = EmissionCoef minYear else EmissionCoef unavailableYear = EmissionCoef maxYear Next, this formula converts yearly emission value to monthly:

• Water Consumption and Withdrawal
• Raw Data After retrieving available reported water consumption and withdrawal data from EIA's thermoelectric cooling water CSV files [5] , we aggregate the consumption and withdrawal values, respectively, grouping them by plant code, year, and month. Each aggregate result is presented as one row in the file of historical.csv with a derived flag of 0, which signifies that this water consumption and withdrawal are raw data.

• Dynamic Coefficients
For any power plant that has been reported in EIA's thermoelectric cooling water CSV files, but has a gap year, we use the dynamic coefficient approach for the gap year. Firstly, we use the following database query to calculate the dynamic coefficient as the ratio of known monthly water withdrawal (or consumption) to corresponding monthly generation: Select w.plantCode, SUM(w.waterConsumption)/SUM(g.generation) as waterConsumpPer-Gen, SUM(w.waterWithdrawal)/SUM(g.generation) as waterWithdrawPerGen from cool-ing_summary_raw_per_year as w, [genPerYear] as g where w.plantCode = g.plantCode and w.usageYear = g.genYear and g.generation ! = 0 Group by w.plantCode Next, we use database stored procedures to multiple this coefficient by the known monthly generation value to produce water consumption and withdrawal values:

• Static Coefficients
Only about 10% of power plants have raw water use data (those that report to EIA are larger ( > 100MW) fossil, nuclear, biomass, and solar thermal plants); water use at the remaining 90% is estimated using static coefficients from a comprehensive study of water use in the U.S. energy sector [6] as well as another study specifically for hydroelectric facilities [7] . First, since many power plants have multiple fuel and prime mover types, we characterize plant type (denoted as plantType in the historic.csv file) by querying yearly generation for each fuel-prime mover combination at each plant and choosing the plant type with the largest generation value. Next (except for hydroelectric power plants), each power plant's monthly generation is multiplied by the corresponding coefficient -notated as Static_Consumption_Ratio and Static_Withdrawal_Ratio in Table 1 -using the following stored procedures. Hydroelectric power plants (i.e. those with a fuel type of "water") were assigned regional withdrawal and consumption coefficients from hydropower-specific study by Grubert [7] , which account for evaporation and seepage of reservoirs, as shown in Table 2 . • Water type We retrieve cooling system type, water type, water source, and water source name from the same EIA's thermoelectric cooling water CSV files [5] as water consumption and withdrawal. If a facility has more than one cooling system type, the cooling system type with the maximum total generation is used as the cooling system type, which is described as the dominant cooling system type in the second column in Table 1 . Water type, water source, and water source name are produced in the same way. For a gap year, we use either minimal or maximum year of available data to fill in the blank, which has the same logic as processing gap years in emissions.

WaterConsumption year(i),month(j) = Static_Consumption_Ratio
(2) Projected data (2020 to 2050) This section corresponds to the data in the file of projected.csv EIA AEO 2019 yearly projections of generation data per EMM region and fuel type. We disaggregated the yearly regional data into plant-level monthly generation data by the following two steps: (1) calculating the percent contribution to the total EMM-regional generation per fuel-type by a power plant for each month in the historic data for year 2018, (2) Apply this percentage to regional data for each future year and month.
With individual plant-level generation data, we calculate projected emissions similar to the calculation of historical emissions. Specifically, the plant-level coefficient from the most recent year (denoted as "maxYear" below) of emissions data is multiplied by generation of a given month and year to produce the projected corresponding emission value. For example, for a given power plant, if no emission data after 2018 is available, then "max year" of this plant is 2018.

EmissionCoef projected = Emission maxYear /Generation maxYear
A predicted emission value is the multiplication of the ratio and the predicted generation value.

Emission year(i), month(j) = EmissionCoef projected * Generation year(i), month(j)
Regarding water consumption and withdrawal, all coefficients are the same as the historical system since it already considering all the available years until 2018. A projected water consumption or withdrawal value is the product of the coefficient and the predicted generation value. The projected cooling system type, water type, water source, and water source name are the same as that in the historical data from the most recent available year for each power plant.

Ethics Statement
None.

Declaration of Competing Interest
This work was supported in part by the United States Environmental Protection Agency Exchange Network Grant Program [Grant number OS-83923301].
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.