Dataset of 2012-2020 U.S. National- and State-Level Greenhouse Gas Emissions by Sector

The dataset contains ∼1.1 million records of total greenhouse gases directly emitted annually by economic sectors and households in the US from 2012-2020. Data are given for 16 unique greenhouse gases by 118 aggregate sectors for each state, and as totals by these aggregate sectors as well as by 540 detailed sectors at the national level. The dataset is a product of updated sector attribution models that improve upon the National Greenhouse Gas Industry Attribution Model. This paper provides documentation of the methods used to produce these datasets and proof of validation of the dataset, along with relevant supporting tables, figures, and source code.


a b s t r a c t
The dataset contains ∼1.1 million records of total greenhouse gases directly emitted annually by economic sectors and households in the US from 2012-2020.Data are given for 16 unique greenhouse gases by 118 aggregate sectors for each state, and as totals by these aggregate sectors as well as by 540 detailed sectors at the national level.The dataset is a product of updated sector attribution models that improve upon the National Greenhouse Gas Industry Attribution Model.This paper provides documentation of the methods used to produce these datasets and proof of validation of the dataset, along with relevant supporting tables, figures, and source code.
Published by Elsevier Inc.

Value of the Data
• Tracking greenhouse gas emissions is important for governments and organizations seeking to estimate progress in reducing the drivers of climate change-related impacts.• While the U.S. GHG Inventory (GHGI) is an essential resource for understanding the total annual GHG emissions attributable to the U.S., it does not fully attribute these emissions to economic sectors at a highly detailed resolution, which is the purpose of the modeling and resulting datasets described here.• Researchers looking to assess U.S. greenhouse gas emission trends associated with sector/industry activities will find these data valuable.• The data can be used in sector-and national-scale models of GHG emissions associated with economic activity including environmentally-extended input-output models, life cycle assessment models, or to produce environmental-economic accounts for GHGs for the U.S.

Background
This dataset provides an update to the output of what was originally the National Greenhouse gas Industry Attribution Model (NGIAM) [1] .The model attributes GHG emissions (more generically flows for sector attribution model) from primary data sources to specific industries, where not all flows are at the target level of industry detail or not given by the industry classification that is required, and hence additional sources are used to attribute or allocate the flows to a given industry.The most recent NGIAM was published in 2020 [1] in association with the Supply Chain GHG Emission Factors [2] .The 2020 NGIAM is an Excel © model that provides totals of GHGs for years 2010-2016 by industries as defined in the 2012 BEA Detailed Input-Output tables [3] , which are aggregations of 2-to-6 digit North American Industry Classification System (NAICS) codes.The objective of this work was to update the data and architecture of NGIAM model to create a dataset of GHG emissions by U.S. industries.This document also improves upon the documentation of the NGIAM, which has not been previously described in a peer-reviewed format.[8 , 9] .Names for the sectors that correspond to the codes can be found in the dataset.Columns with identical values across all records are omitted here.

Data Description
The 2012-2020 Greenhouse Gas (GHG) National-and State-Level Emission Totals By Sector dataset [4] contains ∼1.1 million records of GHGs directly emitted in the U.S. by economic sectors and households across three Excel © files.The data provide greenhouse gas totals by sector across a U.S. state for a given year at an aggregate level with 118 unique sectors.At the national scale, totals by sector are provided at the same aggregate industry level as well as at a more detail level, with 540 unique U.S. sectors.Sectors are defined using the sector code from the 2012 version of the NAICS.The 2022 release of the U.S. GHG Inventory (GHGI) is the primary source the emissions quantities in the dataset [5] .The 16 unique GHG chemicals or groups are reported using the nomenclature from the Federal LCA Commons Elementary Flow List v1.2.0 [6] , where all individual GHGs present in the GHGI are reported.The dataset uses the Flow-By-Sector collapsed format [7] .Emissions of each GHG are reported in kilograms (kg) per year.One exception is groups of hydrofluorocarbons (HFCs) where the original data were only available in CO 2 -equivalents (CO2e).The geographic boundary includes the 50 U.S. states and the District of Columbia.Even more detailed versions of the Excel datasets are available in Apache parquet files, where each record provides a total for a given GHG, sector, year, location, and data source and provides all the fields in the Flow-By-Sector collapsed format.The Excel versions of the data provide records aggregated at the level of GHG, sector, year, and location.The Excel versions also provide supporting definitions of sector codes.
The first 15 rows of the full tables (Apache parquet format) for detailed sector-level national values for year 2020 are shown in Table 1 .The first three records show CO 2 emissions for sector 11111, which is soybean farming, from three different sources, including applying lime to fields, applying urea fertilizer, and combustion of petroleum in farming operations.The data come from the EPA GHGI tables 2.1 and A5.Users can refer to the description of the data in the GHGI for more information on the emission source.
For the summary tables and figures shown here, we have converted the GHG emissions in the datasets into carbon dioxide equivalents (CO2e) using estimates of their potential impact on global warming.We provide a related dataset, IPCC AR4, AR5, and AR6 20-, 10 0-, and 50 0-year GWPs [10] , containing the most frequently used global warming potentials (GWPs) from the International Panel on Climate Change 4th, 5th, and 6th Assessment Reports that are aligned for   3 for the national aggregate data.Analogous figures produced for the U.S. detailed dataset are available on figshare [11] .
use with this dataset or others using the Federal LCA Commons nomenclature.The 5th Assessment report 100-year (AR5-100) GWPs from the aforementioned dataset are used to aggregate the GHGs in our dataset to provide totals in CO2e and simplify presentation of the aggregate national models for 2012-2020 in Fig. 1 and the 2020 state models in Fig. 2 .We also further aggregated sectors in these figures where GHG totals were too low to discern any differences or sector is split into emissions that are directly attributed and emissions that are allocated.The presentation in these figures reveals quantities of GHGs that are directly attributed to industries without allocation, and those amounts that are attributed after allocation.The three largest CO2 emitting sectors groups are the electricity (221) and transport (48), and households (F010).The three largest CH4 emitting sector groups are animal agriculture (112), mining (211), waste management (562).The three largest N2O emitting sector groups are crop agriculture (111), animal agriculture (112), and electricity (221).Emissions of other GHGs, such as HFCs, fluorocarbons, sulfur hexafluoride (hereafter noted as "minor GHGs") are more dispersed across a wide range of sectors.While most of the CO2 and N2O emissions are attributed using allocation approaches, most CH4 emissions are directly attributed.Minor GHGs from certain manufacturing and transportation sectors and households are directly attributed; however, they must be allocated to other manufacturing and service sectors.trends.High resolution versions of these figures along with analogous figures produced for the U.S. detailed dataset and additional lists of sector names are available on figshare [11] .

Experimental Design, Materials and Methods
Multiple flow sector attributions models [7] , also referred to as Flow-By-Sector (FBS) models, were created to produce the datasets.
The models used to generate this dataset were created in the Flow Sector Attribution (FLOWSA) tool v2.0.0 [7 , 12] .The modeling approach is generally consistent across data years, with only slight variations in data years represented in attribution data sources or source table or activity name changes, in keeping with changes in the annual GHGI report.The data flow through the models is represented in Fig. 4 .
The creation of an aggregate level 118 sector national model is the first step.This model is a parent model for both a more detailed national-level model, and an aggregate state-level model.This parent-child relationship between the models assures consistency in calculated GHG totals by sector at the more refined industry and geographic resolutions given in the child models.The 2022 release of the GHGI is the primary emissions source for all models and is a direct input into the national aggregate model [4] .This GHGI release covers emissions from 1990-2020, although only data from 2012 to 2020 are used, which are the years available for the other datasets.The GHGI includes 100s of tables of emissions data describing various activities.GHG emissions data from these tables are carefully selected to choose emissions with activities associated with the most specific industries as possible to avoid the need for allocation.For example, emissions from Natural Gas Systems are reported by stage in Table 3-68 of the GHGI, as opposed to in aggregate in Table 2-1, which allows for attribution to specific activities of the natural gas supply chain directly, such as extraction and transmission.Emissions from 147 unique activities found in 25 unique GHGI tables are selected.Each of the selected GHGI tables are downloaded directly and converted into flow-by-activity (FBA) datasets, where records from the tables are translated into FLOWSA-defined FBA fields.For example, the year 2020 GHGI Table 2.1 indicates 0.5 CO2 MMT CO2e for Lead Production.In this example, "EPA Table_2_1" is the Source Name, "Lead Production" is the Activity-Produced-By, "CO 2 " is the Flow Name, "0.5" is the Flow Amount, and "MMTCO2e" is the Unit.Additional information is added to the FBA, such as "Chemicals" as the Class of the flows, "ELEMENTARY_FLOW" as the Flow Type, and "air" as the compartment the flow is emitted to.Metadata are also collected and stored for each FBA that includes the acquisition location (URL), date, and the githash identifier of the FLOWSA source code used to acquire and store the FBA.In a similar procedure, FBAs are also created and stored for each of the data sources used to perform attribution.
Crosswalks are created between the activities in the selected GHGI tables and the 2012 industry codes [13] .The crosswalks map an activity to the most appropriate NAICS sector(s) which could be an aggregate 2-digit sector or a more detailed 3, 4, 5, or 6-digit sector.This crosswalk is referred to in FLOWSA as an Activity-to-Sector mapping file.The crosswalks link an activity to one or more NAICS codes, where the level of resolution (from 2-6 digits) of the NAICS industry is selected based on the appropriate fit, where some activities can best be described as matching more general industries (e.g., NAICS 31-33 'Manufacturing'), whereas other activities can be mapped to more specific industries (e.g., NAICS 331410 'Nonferrous Metal (except Aluminum) Smelting and Refining').Where an activity is linked to more than one sector in the crosswalk, or where the model specifies a higher resolution of industries (e.g., 6-digit NAICS) than specified in the crosswalk (e.g.4-digit NAICS), attribution is required to split the GHG emissions from the original source activity across the target sectors.Crosswalks are also created for all the datasets used for attribution, where they are not already reported as NAICS codes.
The rules for attribution, which specify the flow source tables (as FBAs) that are used, which activities from the sources to use, and how to allocate them where relevant, are encapsulated in Flow-by-Sector (FBS) method files stored in FLOWSA.These method files are written in YAML, a simple human and machine-readable text format using a key-value like format.The primary method for the first step, creation of the aggregate national model, is captured in a common YAML file, GHG_national_m1_common.yaml and year-specific YAML files (e.g., GHG_national_2016_m1 , the year-specific file for the 2016 data year implementation) are included that inherit all the instructions from GHG_national_m1_common.yaml but provide yearspecific customization of flow source and attribution source data years.The Appendix provides an annotated excerpt of the GHG_national_m1_common.yaml .The subsequent state aggregate method includes GHG_state_m1_common.yml and associated year-specific YAML files, and the national detail method can be found in GHG_national_m2_common , also associated with yearspecific YAML files.Note that "m1" and "m2" are simply notations meant to distinguish alternate methods which cover the same emissions scope.
Table 5, found in the Appendix, summarizes to which sectors GHGs from each activity are attributed and lists the data sources used when the GHGs from the activity require attribution.There are 25 unique data tables from the GHGI from which GHG emissions data are drawn.Table 5 does not specify the GHGs used from each table, and there are cases where certain gases are drawn from one table, and others from another table for the same activity.Those details can be found in GHG_national_m1_common.yaml .
The primary attribution sources are the same as used in the 2020 NGIAM, although data source years are updated and/or more recent releases (data updates) of the source data are used.

General Approaches to Allocation
When allocation is required because a higher level of industry detail is needed than provided in the data source, the allocation method is primarily proportional allocation based on a secondary flow (e.g., expenditures on some sector input).The generalized equation for proportional allocation for sector attribution modeling is shown in Eq. ( 1) , where e s is the flow total for the sector s of interest, e t is the flow total across all sectors, α t is the allocation flow total across all sectors, and α s is the allocation flow for the sector of interest.
This can be illustrated with an example from the aggregated national model for the attribution of methane emissions from combustion for the activity "Agricultural Equipment Non-Road" from GHGI table 3.14 where the Detailed Use (henceforth "Detail Use") table is used as the attribution source.In this case, the sector of interest is any of a number of industries that purchase Farm Machinery, for instance Oilseed Farming.A row from the Detail Use table containing dollars of industry purchases of the Farm Machinery commodity is used for the allocation.α s in Eq. ( 1) would be the purchases of Farm Machinery commodities by the Oilseed Farming industry, and α t would be the sum of all purchases across the industries of Farm equipment.The dollar units cancel out leaving a fraction, representing a relative purchases of farm machinery in relation to other agricultural sectors (as a proxy for agricultural non-road equipment) by Oilseed Farming, which is the multiplied by the total methane flow amount from "Agricultural Equipment Non-Road."When the attribution source still does not provide information specific enough to achieve the target level of allocation, a second attribution source may be provided.
Where an attribution source cannot be identified to provide enough detail to provide flows for the target level of resolution, then equal allocation is used to allocate the remainder of the unallocated flow.In these cases, if an activity is mapped to multiple sectors, the flow amount is first equally divided across the relevant sectors.Then, the flows are equally attributed to child NAICS, based on sector length, until reaching the target sector levels.For example, if an activity with a flow amount of 16 is mapped to industries 236115, 236116, 236117, 236118, and 236220, a flow amount of 8 will be attributed to 236220 and a flow amount of 2 each to 236115, 236116, 236117 and 236118 (Box A in Fig. 5 ).Notably, if 236117 and 236118 were dropped as target sectors, the amount attributed to 236220 would not change, but rather 236115 and 236116 would each be attributed 4 (Box B in Fig. 5 ).In this way, the number of sectors that share a parent NAICS for a given sector length does not impact attribution to other "cousin" sectors.Eq. ( 2) provides the formula for equal allocation, where n t is the total number of relevant child sectors.
The national detail model ( m2 ) attributes GHGs to more detailed sectors (540) than used in the 2020 NGIAM model, which attributed emissions to BEA Detail level industries ( ∼400).Because a primary use case for this dataset is for use in USEEIO models, previously the BEA Detail schema, which is the schema used by USEEIO v1 models, was used as the target industry classification set.However, BEA uses categories like 1111A0 , which include selected 5-digit NAICS codes within the 4-digit industry 1111 .We attribute to the more resolved NAICS codes, like 11111 and 11112 , that are associated with the aggregate BEA code.This results in a higher sector resolution.
Table 2 shows the data years used for attribution sources when necessary for the new models alongside the NGIAM year 2016 model for reference.Data from the latest available year of BEA Make and Use tables from BEA are used wherever the Make and Use tables are used as an allocation source.Because the official Detail input-output tables are only issued every 5 years, we imputed annual Detail tables using the monetary flows from the Summary tables and proportionally-allocated those across the respective detail sectors using most recent detail tables (2012 or 2017).This imputation is performed using the flow-by-sector methods, Detail_Use.yaml and Detail_Supply.yaml .A common year of data from the most recently published results of the quadrennial Manufacturing Energy Consumption Survey (MECS) 2018 [14] , was used for 2016 and later year models, and the 2014 MECS data were used for all earlier models [15] .Similarly, the 2017 Census of Agriculture data [8 , 9] are used for years 2016 and beyond, while the 2012 data are used for prior years.The annual USGS Mineral Yearbook report for Lead [16] is used corresponding with the GHGI year, except for 2020, where 2019 data are used because they were the most recent available at the time of publication.The data acquisition dates can also result in changes if the allocation data provider revised the data source, even if revision information is not provided.The data sources shown in Table 2 in the NGIAM 2016 were acquired at different times than those used in the national aggregated model ( m1 ).

Estimating suppressed data from EIA MECS
The EIA MECS reports energy use by 3-to 6-digit NAICS and fuel type, such as coal or natural gas.MECS data for select NAICS and fuels are suppressed because the estimated energy use is less than 0.5 trillion Btu, the value represents a single establishment, or because the energy estimate's relative standard error (RSE) is greater than 50% [14] .When the energy use is suppressed because the value is less than 0.5 trillion Btu, we estimate this suppressed data by assigning the NAICS a value of 0.25 trillion Btu.These estimates of suppressed data represent a small fraction of energy use by sectors.For example, coal consumption by industries 31-33 in 2018 was approximately 432 trillion Btu [14] .EIA MECS suppressed 19 sectors at various sector lengths for being less than 0.5 trillion Btu, which, at most, would represent 1.7% of national coal consumption.Values suppressed for any other reason are estimated by proportionally attributing any unallocated reported parent values to the suppressed child values using a secondary source, in this case, Detail Use data.For example, national coal consumption for NAICS 325211 and 325312 are suppressed because the values represent single establishments, but MECS publishes coal consumption by the parent NAICS 325.Any unallocated flows from the parent 325 are proportionally attributed to the suppressed child values, as demonstrated in Eq. (1) .

Attributing industrial emissions to manufacturing sectors
Industrial stationary combustion emissions are sourced from GHGI Tables A-6, 3-8, 3-9.These tables report emissions by fuel and sector.In the GHGI, the "Industrial" sector includes manufacturing sectors (NAICS 31-33) but also non-manufacturing industrial sector such as Agriculture, Mining, and Construction.EIA MECS, the primary attribution source for industrial sources, only captures manufacturing sectors, so a second source is necessary for the non-manufacturing sectors.As a result, industrial emissions are first attributed between manufacturing and nonmanufacturing sectors by comparing fuel use in EIA MECS with that reported in GHGI Table A-6 for all industrial sectors.For 2018, we estimate that 76% of coal consumption and 68% of natural gas consumption is from manufacturing sectors.The remaining emissions are attributed to non-manufacturing sectors and attributed within those sectors based on the Use table.

Estimating hydrofluorocarbons by species
Emissions of hydrofluorocarbons (HFCs) and perflurocarbons (PFCs) as ozone depleting substance (ODS) substitutes are sourced primarily from GHGI Table A-97 for transportation and GHGI Table 4-102 for other activities.These tables report HFCs and PFCs in aggregate (in units of CO2e).In alignment with the goal of reporting all flows in units of raw mass, the specific flow breakdown from these sources is estimated based on the emissions reported in GHGI Table 4-100, which includes the emissions of seven specific HFCs and PFCs from ODS Substitutes.The remaining non-specified flows (described as "Others" in this table) are maintained as "HFCs and PFCs, unspecified" in units of CO2e.

Description of Changes from 2020 NGIAM
Methodological updates were made to correct for errors in the previous version, to make use of higher resolution GHGI data now available, and to account for other corrections, data updates, and methodological updates.Notable revisions to emissions estimates in the GHGI since the publication of the NGHGIAM are also presented.A list of changes including subsections describing method changes, data updates, and corrections is included below.

Methodology changes
1. Map to NAICS industries in place of direct mapping of emissions to BEA Detail industries (described above).2. Revise the flow used from EIA MECS to attribute stationary combustion emissions from petroleum products.3. Revise the approach to impute missing or non-reported data by sector from EIA MECS. 4. Refine the attribution approach for HFCs and PFCs from ODS Substitutes (GHGI Table 4-102) to account for increased resolution of data reporting.5. Simplify handling of gasoline and diesel fuel for transport allocation to be consistent with other Use table allocation methods.

State GHG Models
The state level models rely on the national aggregate model as the primary emissions source.For attribution to specific states, they use EPA's Inventory of U.S. Greenhouse Gas Emissions and Sinks by State (State GHGI), a recently developed dataset that is meant to provide state level estimates of GHG emissions that aligns with the national GHGI [17] .State level datasets are presented here for the first time, as they were not present in the NGIAM.The national aggregate model is chosen as a starting point for the state models, rather than the State GHGI.The national model provides additional detail in some cases that make it more appropriate as a starting place and assures that state totals aggregate to national totals for each sector.Like the national inventory, the state inventories provide data annually from 1990-2020 and the full range of GHGs.Because the structures of the source datasets (or FBAs) are not the same between the state and national inventories, additional Activity-to-Sector mapping files are required for the Greenhouse Gas Totals By Summary Industry by State dataset described here.
As shown in Fig. 4 , the state model uses the aggregate national FBS as its starting point.National emissions by sector are attributed across all states proportionally based on the emissions reported for specific activities in the State GHGI.In select cases, where the granularity of the State GHGI is less than that in the national GHGI, emissions are instead attributed using state-level Make and Use tables from the State IO models [18] .For example, mobile source CO2 emissions from aircraft and boats are attributed across states based on the use of petroleum fuels by sectors 481 and 483, respectively, instead of based on reported CO2 emissions in the State GHGI, for which the most granular emission activity is "Transportation -CO2 from Fossil Fuel Combustion -Fossil Fuel Combustion -Petroleum."Using this single activity from the State GHGI as an attribution source would yield emissions for aircraft and boats in the same ratio across all states, a result unlikely to be accurate.In this way, the state model can reflect state-specific consumption of commodities by industries as a means of attribution.

Technical Validation
We validate the datasets though comparisons to reference GHG totals from the EPA GHGI and state GHGI, compare to the NGIAM output, and we compare the total across the state datasets to our national totals.

Comparison to Reported Totals
Table 3 presents the results of the validation check against the GHGI totals by gas or groups of minor gas.All differences can be explained by rounding errors, except for the HFCs, due to the lack of attribution of some activities to industries with HFC emissions.Rounding errors occur due to two reasons within the GHGI: 1) totals of line items may not add to respective subtotals or grand totals due to lack of precision, and 2) in a number of tables a ' + ' symbol is present indicating that GHG emissions are present but at a smaller quantity than can be represented as non-zero by the decimal unit in the table.

Comparison to Previous Model
High-resolution images are provided in the supporting figures that show the relative changes in the results by sector for each major GHG and for minor gases in CO2e [11] .For this analysis, the NGIAM 2016 model emissions are set to 1 for a given sector, and all other models results are shown as relative changes in comparison with this model.Table 4 provides a summary of significant data changes by sector for each gas and the associated explanation from the list above.

Comparison Across Years
The change in emissions by sector across years is generally very minor from 2016 to 2020 and less significant than the difference between 2016 in the old and new models.Images are provided in the supporting figures [11] .There is frequently a steady increase in totals by sectors from 2016 to 2019 which then falls back in 2020.This mirrors the change in the GHGI over this period for industry for what are defined in the GHGI as industry, transportation, agriculture, residential and commercial sectors (GHGI Table 2-1).GHG emissions from electricity are an exception to this trend, and they decrease steadily through the entire period, with a larger drop from 2019 to 2020.

State comparison to National
As a form of validation and review of the state GHG model, we confirmed that, for all GHGs: 1.The sum totals for a given GHG and sector across all states (and the District of Columbia) equals the national total for the GHG and sector, as in Eq. ( 3) , where e represents a vector of GHG totals, s is a sector, f is a flow, st is a state, and n is the nation.
2. The sum totals for a given GHG across all states equals the national total for the given GHG, as in Eq. ( 4) . 51 The national totals for these validation tests were derived from the aggregate level model, where the sectors match the state model.We performed this test for each year of data from 2012-2020.The result of the test in Eq. (3) show that state totals are < 1% different than national totals for all gases and sectors in all years with some exceptions for the smaller quantity GHGs (HFCs, CF4, SF6, NF3, etc.) in the equipment sectors, like primary metal manufacturing.The result of the second test in Eq. ( 4) showed that these differences are cancelled out when combining sectors, such that totals by gas across all states equal those at the national level.The only exception to the second test is when the reported precision for emissions of small quantities in the national inventory (e.g., for Perfluorocyclobutane and Perfluoropropane) is insufficient when compared to the data in the state GHG inventories.Future work will seek to resolve these data precision issues with the data providers.

State comparison to EPA's Inventory
In developing the state model, we use the State GHGI as an attribution source and not a primary emission source in order to ensure that the sum of all state totals by sector is equal to the national totals by sector (see previous section).Therefore, here we compare the modeled state totals to the reported emissions in EPA's State GHGI in Fig. 6 .All state totals are within + /-10% of emissions reported in the State GHGI, with most states yielding much closer results.

Code Availability
The models used to produce the datasets are incorporated into FLOWSA v2.0.0 and available in the FLOWSA source code repository .The model code is represented in a number of YAML files that can be found in the flowsa/methods/flowbysector folder.GHG_national_m1_common is national aggregate common model with specific implementations for each year from 2012-2020.GHG_state_m1 is the state aggregate common model with specific implementations for each year from 2012-2020.GHG_national_m2 is the national detail common model with specific implementations for each year from 2012-2020.The specific GHGI-to-NAICS crosswalk file can be found in flowsa/data/activitytosectormapping .Instructions in the repository README and the associated FLOWSA Wiki describe how flow-by-sector methods can be executed.

Limitations
The data represent emissions estimates from models and not real measurements.Limitations of estimates from the U.S. GHG Inventory and State Inventories apply to these datasets.The choice of datasets for attribution are subject to interpretation and are limited by data availability.Emissions from U.S. Territories that are included in the GHGI are excluded here, due to a difference in geographical boundary between the GHGI and this dataset.Biogenic carbon emissions, or emissions from the combustion or decay of wood or other bio-based materials, are excluded just as they are excluded UNFCCC national inventory emissions.HFCs and PFCs related to solvent use, aerosols, and fire protection are not included, due to lack of data for sector attribution.These emissions represent ∼1% of total HFC and PFC emissions reported in the GHGI.Some HFCs and PFCs are reported as a total CO2e and not by individual species; due to a lack of species resolution.

Appendix
Detail Attribution Summary Table Table 5 .

Annotation of National Aggregate model YAML ( GHG_national_m1_common )
The following excerpt is from the main instructions file, GHG_national_m1_common.yaml , that defines the general method used for national aggregate sector attribution of GHGs.This excerpt specifies use of selected data from the GHG     The first lines specify the target level of sectors in the result, classification system used and geographic scale for this method.These targets apply to all attribution that is dictated in later statements.The flow source EPA_T_2_1 is specified under a source_name .A mapping file to map the flow names and units to the FEDEFL is given under fedefl_mapping , which corresponds to this mapping file name contained within the fedelemflowlist package.Multiple sets of activities (called activity sets) can be specified from a given source; in this case one set of activities are given under direct that are from the EPA_T_2_1 source.Specific flows and activity names are selected for each activity set using the selection_fields tag.In this first example, this attribution applies to three source flows CO2, CH4, N2O .All emissions are directly attributed to NAICS sector per the allocation method.The third activity set is electric_power.electric_power uses electricity production from the Detail Supply table to further allocate it proportionally based on the production amounts in dollars from sectors that produce that commodity.Only sectors that consume electricity ("221100") are included as a data point for calculating attribution.Notably, the year of the attribution source does not necessarily need to match the year of the primary data.The final activity set shown is liming of agricultural soils.A proportional allocation approach based on the dollars consumed of lime from the Detail Use table is used with a similar set of instructions as in the previous allocation.

Fig. 2 .
Fig.2.GHG totals by state and sector for 2020, presented in CO 2 -equivalent for summary purposes.U.S. totals by sector for 2020 for the three largest global warming contributors and the other GHGs combined are presented in Fig.3for the national aggregate data.Analogous figures produced for the U.S. detailed dataset are available on figshare[11]  .

Fig. 3 .
Fig. 3.U.S. GHG emissions by aggregate NAICS sector: (a) CO 2 ; (b) CH 4 ; (c) N 2 O; (d) other GHGs.Each emission bysector is split into emissions that are directly attributed and emissions that are allocated.The presentation in these figures reveals quantities of GHGs that are directly attributed to industries without allocation, and those amounts that are attributed after allocation.The three largest CO2 emitting sectors groups are the electricity (221) and transport (48), and households (F010).The three largest CH4 emitting sector groups are animal agriculture (112), mining (211), waste management (562).The three largest N2O emitting sector groups are crop agriculture (111), animal agriculture (112), and electricity (221).Emissions of other GHGs, such as HFCs, fluorocarbons, sulfur hexafluoride (hereafter noted as "minor GHGs") are more dispersed across a wide range of sectors.While most of the CO2 and N2O emissions are attributed using allocation approaches, most CH4 emissions are directly attributed.Minor GHGs from certain manufacturing and transportation sectors and households are directly attributed; however, they must be allocated to other manufacturing and service sectors.

Fig. 4 .
Fig. 4. Data flow diagram for creation of aggregate and detail level national models and aggregate state level model.
f,st = e s, f,n

collection
This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ ) Data are modeled by combining several sources.The modeling approach is flow sector attribution modeling where flows such as greenhouse gas (GHG) emissions are attributed or allocated to economic sectors based on direct emissions reports or other sector-level activity data.The primary source of annual flows is the U.S. Greenhouse Gas Inventory.Secondary sources for attribution include the disaggregated state-level GHG emission inventories, the Manufacturing Energy Consumption Survey, U.S. Supply/Make and Use input-output tables, employment data and various other minor sources.

Table 1
First 15 records for 2020 detailed sector estimates in dataset.The fields are defined in the Flow-by-Sector format .The MetaSources pattern is the EPA GHGI Table name following by the activity name in the table corresponding with the data used.The AttributionSources 'Detail Use' is another model described below; USDA_CoA_Cropland are Cropland data from the Census of Agriculture

Table 2
Data Years for Allocation Sources By Model and Year.

Table 3
Relative difference between sum of GHG or groups between the national aggregate model (m1) and GHGI.

Table 4
Noteworthy changes in 2016 emissions with model update.D = Data update, M = Methodology update, C = Correction.The "Reason #" refers to one or more changes in the Description of Changes section above.

Table 5
Attributions by Activity for national aggregate model (m1).