Stepwise guidance for data collection in the life cycle inventory (LCI) phase: Building technology-related LCI blocks



Introduction
The life cycle inventory (LCI) analysis is one of the four main phases of the International Organization for Standardization (ISO) life cycle assessment (LCA) methodology (ISO, 2006a; 2006b). It is typically the most data-driven and time-consuming phase (e.g., Bicalho et al., 2017; Miah et al., 2018) because of the data collection process. Data collection, defined as the "process of gathering data for a specific purpose" (UNEP, 2011), represents an essential step because data is the backbone of each LCA study and is the primary driver of quality as well as uncertainty (e.g., Ciroth and Arvidsson, 2021). Despite its high relevance, the methodological framework for the LCI analysis phase as described in ISO 14040/44 has been considered in past works as insufficiently detailed, and dedicated procedural guidance for a systematic collection of data is missing (Zamagni et al., 2008).
Indeed, until now, literature on LCI data gathering steps has mostly remained abstract and generic, as Ciroth and Arvidsson (2021) also recognized. For instance, Klöpffer and Grahl (2014) emphasize the importance of data collection; however, no information is given on how to collect those data. Likewise, in several textbooks, like Jolliet et al. (2015), the authors briefly describe data collection in a chapter dedicated to the LCI analysis phase without elucidating the requirements needed for data gathering. Although examples of more detailed guidance to plan and execute data collection exist (e.g., Wenzel et al., 1997; Guinée, 2002; Curran, 2012; Hauschild et al., 2018), these still do not provide a clear and systematic stepwise approach to guide LCA practitioners through the essential step of data collection when conducting an LCA. The same level of genericity is associated with reference documents, like the International Life Cycle Data (ILCD) Handbook and other guidance documents from the European Commission - Joint Research Centre (EC-JRC) (EC-JRC, 2010a), often leaving the LCA practitioner with little practical guidance (Finnveden et al., 2009; Horne et al., 2009). Such guidance, therefore, is prone to different interpretations by LCA practitioners on how to collect and handle data to build their system model. This can be observed in current literature, where numerous systematic literature reviews have confirmed the lack of transparency and reproducibility of LCI data (e.g., Laurent et al., 2014; Bohnes and Laurent, 2019; Thonemann, 2020). These previous calls and observations, therefore, demonstrate the lack of sufficiently detailed guidance that can enable LCA practitioners to ensure transparency and reproducibility of their LCI.
To fill this gap, a systematic, comprehensive, and detailed guidance for LCA practitioners to facilitate the data collection process in the LCI phase is proposed. Building on a critical review of current LCI practice from the scientific literature (Section 2), a stepwise guidance is developed, along with recommended methods and tools to assist LCA practitioners in building LCI blocks (Section 3). An LCI block is defined in this study as the collection of (single or multiple) unit processes at an aggregation level that is technology-wise appropriate (e.g., component level). Unit processes are defined as the "smallest element considered in the LCI analysis for which input and output data are quantified" (ISO, 2006a; 2006b). The application of the stepwise guidance is illustrated in a data collection case, using the manufacturing of a battery pack as a proof-of-concept (Section 4), before the guidance applicability and further research needs are discussed (Section 5).

State-of-the-art in LCA data collection practices
Two main types of literature were reviewed to establish a detailed state-of-the-art, namely guidance documents (see Section 2.1) and LCA case studies (see Section 2.2). In both, the purpose was to evaluate how LCI data collection has been addressed from methodological guidance and application points of view and what potential issues may exist.

LCI data collection in guidance documents
The guidance documents include book chapters, conference proceedings, LCA textbooks, reference documents, and scientific papers published until October 2021, which consider LCA as a whole and/or discuss the LCI analysis phase in detail. The pool of reference documents from Laurent et al. (2020), complemented with an additional literature search, was used to identify such relevant guidance documents. Google Scholar (https://scholar.google.de/) was used to search for further literature, using the terms "life cycle inventory data collection/gathering" and "LCI data collection/gathering". Google Scholar was chosen to ensure that relevant books, conference papers, and peer-reviewed journal articles were found.
A total of 24 guidance documents that address data collection within the LCI phase were collected and are compiled in Table 1. The documents were reviewed to assess if a description of the data collection step was included (extensively, briefly, or no) and if a data collection support tool was provided (yes or no). These guidance documents are mainly reference documents, such as the ISO 14040/44 standards (ISO, 2006a; 2006b) and the ILCD handbook edited by the EC-JRC (2010a). Most of the selected guidance documents (22 out of 24) include a description of the data collection step to some extent, with only a few exceptions (Finnveden et al., 2009; Horne et al., 2009). How the LCI data collection is described varies across the reviewed guidance documents. For instance, the Environmental Protection Agency (EPA, 1993), Jolliet et al. (2015), and Rodrigues et al. (2021) briefly describe the data collection step, while Wenzel et al. (1997) and Hauschild et al. (2018) have a dedicated section providing more insight into this step and some practical examples in the form of support tools to plan data collection and to collect the LCI. In summary, only 7 of the 24 reviewed guidance documents provide a data collection scheme. However, none reports step-by-step, hands-on support to guide practitioners regarding LCI data collection.

LCI data collection in past LCA case studies
A full systematic review to estimate how past LCA studies have handled LCI data collection was performed. Fig. 1 provides an overview of the review methodology followed. First, LCA case studies from the peer-reviewed scientific literature were retrieved. The Python pybliometrics library (Rose and Kitchin, 2019) in a Jupyter notebook (see Supplementary Material 4 and 5 for the notebook in ipynb and html format, respectively) was used to search the terms "Life cycle assessment" and "Case stud*" in the Scopus database for the field codes "titles, abstracts, and keywords", limiting results to journal articles (srctype and doctype) in English (language) that were published after 2015 (pubyear) (see Fig. 1). The search yielded 2,284 results, which were then ranked using Automated Systematic Review (ASReview), an open-source machine learning tool (van de Schoot et al., 2021). ASReview allows studies to be ranked based on an indicative set of included or excluded articles. The inclusion-exclusion criteria were applied to retrieve the indicative set of studies and ensure the selection of scientific articles that performed LCA studies, those reporting sustainability assessments with a dedicated section regarding environmental impacts, and studies with an assessment of at least one impact category, e.g., water footprint. Articles only conducting social LCA, life cycle costing, multi-criteria decision analysis, or optimization algorithms without any LCA case study, as well as method developments and review articles, were excluded. After ranking the 2,284 results using ASReview, titles and abstracts of these ranked results were screened manually to discard irrelevant publications; this yielded a shortlist of 1,850 articles.
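The search described above can be written as a Scopus advanced-search string. The following is a minimal sketch that only assembles the query text; the exact string used in the study is documented in its Supplementary Material and is not reproduced here. The helper name `build_scopus_query` and the field-code values `j` (journal) and `ar` (article) are assumptions made for illustration.

```python
def build_scopus_query(terms, year_after, language="english"):
    """Assemble a Scopus advanced-search string (illustrative helper)."""
    # Require every quoted term in the title, abstract, or keywords.
    quoted = " AND ".join(f'"{term}"' for term in terms)
    parts = [
        f"TITLE-ABS-KEY({quoted})",
        "SRCTYPE(j)",               # assumption: source type = journal
        "DOCTYPE(ar)",              # assumption: document type = article
        f"LANGUAGE({language})",
        f"PUBYEAR > {year_after}",  # published after the given year
    ]
    return " AND ".join(parts)


query = build_scopus_query(["Life cycle assessment", "Case stud*"], 2015)
print(query)
```

With a configured Scopus API key, a string of this kind could presumably be passed to `pybliometrics.scopus.ScopusSearch`; that call is left out here so the sketch runs without network access.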
A sampling procedure was then performed on this total pool of 1,850 shortlisted articles to keep the review manageable (see Fig. 1). To estimate an appropriate and representative sample size, Equation (1), as described in Naing et al. (2006), was used:

$$ n = \frac{N Z^2 p (1 - p)}{e^2 (N - 1) + Z^2 p (1 - p)} \qquad (1) $$

where n is the sample size, N is the population size (1,850 potentially relevant LCA case studies), Z is the level of confidence (95% leading to a Z-score of 1.96), p is the expected proportion (0.5 is used as it is the most conservative assumption and applicable when previous knowledge about the population is lacking), and e is the margin of error (0.05, based on a confidence level of 95%). Stratification was assumed unnecessary, as data collection practices were not expected to differ between case studies according to article-describing parameters (such as publishing journal, year, or citation count). The application of this sampling thus led to the random shortlisting of 319 articles. Out of the 319 sampled articles, a total of 285 articles were identified as LCA case studies, the remainder being studies not meeting the selection criteria (e.g., papers not displaying any case despite being captured during the search process; see Supplementary Material 1). To better understand how the LCI data collection was commonly performed, the 285 shortlisted LCA case studies were reviewed according to three main criteria: (i) the level of description of the data collection (3 levels: extensive, brief, no description), (ii) the transparency in the LCI data by assessing whether or not the data is provided (3 levels: complete, incomplete, not provided), and (iii) which data sources are used for completing LCI data (qualitative evaluation). The details about the above-mentioned review criteria and their different attributes are provided in the Supplementary Material 1.
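The reported sample of 319 articles can be checked numerically. The sketch below assumes that Equation (1) is the finite-population sample size formula n = N Z² p(1 - p) / (e² (N - 1) + Z² p(1 - p)) and that the result is rounded up to a whole article; the function name `sample_size` is illustrative.

```python
import math


def sample_size(population, z=1.96, p=0.5, e=0.05):
    """Finite-population sample size, rounded up to a whole unit."""
    spread = z**2 * p * (1 - p)  # Z^2 * p * (1 - p)
    n = population * spread / (e**2 * (population - 1) + spread)
    return math.ceil(n)


print(sample_size(1850))  # 319
```

Under these assumptions, the formula reproduces the sample size stated in the text for N = 1,850, Z = 1.96, p = 0.5, and e = 0.05.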
The results of this review process are presented in Fig. 2. They show that only 40% of the reviewed LCA case studies provide a complete description of their data collection process and a transparent account of their LCI data. As a direct interpretation, this calls for a stricter peer review process that can guarantee transparency and reproducibility of the studies. More generally, several scientific journals have also taken additional initiatives to promote and/or ensure transparency and reproducibility of the studies, including data documentation and data accessibility, e.g., Journal of Industrial Ecology (Hertwich et al., 2018) or Nature (Nature, 2018). From a long-term perspective, if taken up by most journals publishing LCA case studies, such a trend can be expected to dramatically increase transparency, reproducibility, and, de facto, reuse of LCI data and LCA case studies.
Concerning the data collection step, 42.5% of the LCA case studies are found not to provide sufficient detail ("brief" and "no" flags in Fig. 2). A good example of transparent reporting of the data collection step is Bacenetti et al. (2018), as they detail the data collection for each unit process built for the study. As for the inclusion of LCI data, only 45% of the studies reported a complete LCI (green areas in Fig. 2). It can be inferred that studies that either report incomplete LCI data or do not report data at all are not reproducible. On the other hand, studies that report sufficient LCI data, such as Egas et al. (2020), might not be reproducible due to other missing information (e.g., assumptions, LCI block connections, or system boundaries).
The general lack of harmonization in data reporting is another relevant aspect. Such harmonization could help ensure transparency in the data collection process and enable the reuse of LCI data across studies. Guidance of this kind could take the form of an LCI data collection sheet, which could also support LCI reporting. To date, two studies touch upon this aspect, namely Uusitalo and Leino (2019), who reported using a specific support tool for data collection but did not disclose it, and Sonderegger et al. (2020), who provided multiple MS Excel spreadsheet files, although these remain specific to their system of investigation and are not transferable to another field of study.
Overall, these findings demonstrate that much effort is needed to bring more transparency and reproducibility into LCA studies published in the peer-reviewed scientific literature. Although several initiatives, e.g., new journal policies/initiatives for transparency, may contribute to this effort, it is hypothesized that a generic, transferable, and stepwise guidance for LCI data collection and reporting can provide additional support to bridge this major gap in the quality and usefulness of LCA studies, as properly reported LCIs could be reused by other LCA practitioners.

Methodological guidance for data collection in the LCI phase
Table 1 (excerpt).
Presents ways to collect the inventory data in steps: i) flowchart, ii) data collection planning using a spreadsheet, and iii) data validation (quality and uncertainties). (B || N)
Zhang et al. (2021) 3: The data collection step is exemplified in the case of building unit processes. However, the overall guidance and data collection support tool are missing. (E || N)
Babu (2006) 4: Talks about a 4-step framework for performing an inventory analysis and assessing the data quality: i) flow diagram of process evaluated, ii) data collection plan, iii) collect data, iv) evaluate and report results.
Using the critical review in Section 2 as an overarching frame, the authors propose a 3-step guidance to assist LCA practitioners during the data collection process (see Fig. 3). A customizable generic LCI data collection template was developed (see Saavedra-Rubio et al. (2022)) based on the needs identified in the critical review to support the application of the methodology. The LCI template directs the data collection (Vigon et al., 1994) in a defined and reusable structure. In Step 1, the planning and preparation of the tools needed for the data collection process are laid out. This step is initiated in the goal and scope phase of the LCA methodology, where the data requirements are determined after the functional unit and system boundaries have been defined, and continues into the LCI phase, in which the data requirements are used as inputs to draft the LCI blocks and fine-tune the generic LCI template. This results in a case-specific customized LCI data collection template, which will serve as the primary interface to exchange data between the data provider and the LCA practitioner. It is essential to highlight that if the inventory dataset is intended to be included in an external database, such as ecoinvent, specific prerequisites from the external database should be considered when determining the data requirements (e.g., naming conventions or good documentation practices, among others, to ensure coherent data acquisition and reporting (Weidema et al., 2013)). The main output of Step 1, namely the customized LCI template (i.e., not yet filled), is used in Step 2 to carry out the actual data collection process. Finally, in Step 3, the consistency and completeness of the LCI dataset (i.e., the filled customized LCI template) are reviewed in a pre-step to the interpretation phase to finalize the modeling of the LCI blocks. Each step of the methodology is described in detail in the following sections.
With the above scoping, the study therefore exclusively focuses on the LCI phase within the LCA framework although the connection to relevant steps in other LCA phases, e.g., in scope definition or interpretation phases, is made to ensure consistency in the LCA application.

Step 1: planning of data collection
The first step of the proposed methodology, namely the data planning step, starts after gaining the overview and scoping of the system under study (e.g., defined by system boundaries and functional unit). It consists of 3 sub-steps, specifically: (i) defining the LCI data requirements, (ii) preliminarily drafting the LCI blocks, reflecting the structure of the system under study while potentially refining the LCI data requirements set in (i) as the data planning is an iterative process, and (iii) utilizing the outlined LCI blocks to customize the generic LCI template provided by the guidance (cf. Fig. 3).

LCI data requirements
With LCI data requirements, the types of data necessary to build the LCI of the system under study are specified. An LCI consists of inputs (e.g., natural resources, materials, or energy) and outputs (e.g., products, by-products, or emissions) respectively required and generated by a product or system throughout its entire life cycle (EC-JRC, 2010a). The inputs or outputs are reported in terms of amounts associated with a specific reference (product) and treated as either intermediate flows (e.g., wastes, products, or by-products within the technological system, also called "technosphere") or elementary flows (e.g., emissions to air or extraction of raw materials from the ground, which represent flows of material and energy from/to the environment, also called "ecosphere"). The LCI data requirements of an exemplary product or system are shown in Fig. 4, which covers all ideal life cycle stages of a product. For each life cycle stage, different LCI data requirements exist. The collected data for the LCI is either primary or secondary data, depending on the time and data resources available and the detail needed in the study to satisfy the goal and scope.
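The distinction between intermediate and elementary flows, and between primary and secondary data, can be made explicit when recording inventory entries. The following is a minimal sketch of one possible record structure; the class and field names (`FlowType`, `Exchange`, `data_origin`) and all numeric values are hypothetical and are not taken from the LCI template.

```python
from dataclasses import dataclass
from enum import Enum


class FlowType(Enum):
    INTERMEDIATE = "technosphere"  # e.g., products, by-products, wastes
    ELEMENTARY = "ecosphere"       # e.g., emissions, resource extraction


@dataclass
class Exchange:
    name: str
    amount: float        # amount per reference product
    unit: str
    direction: str       # "input" or "output"
    flow_type: FlowType
    data_origin: str     # "primary" or "secondary"


# Hypothetical values for illustration only.
exchanges = [
    Exchange("electricity", 2.5, "kWh", "input", FlowType.INTERMEDIATE, "primary"),
    Exchange("steel sheet", 1.2, "kg", "input", FlowType.INTERMEDIATE, "secondary"),
    Exchange("carbon dioxide to air", 0.8, "kg", "output", FlowType.ELEMENTARY, "secondary"),
]

# Select the flows exchanged with the environment.
elementary = [x.name for x in exchanges if x.flow_type is FlowType.ELEMENTARY]
print(elementary)
```

Keeping the flow type and data origin as explicit fields makes it straightforward to filter an inventory, for example to list all entries that still rely on secondary data.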
Raw material extraction: The first stage of a product's life cycle covers all the activities required to obtain the raw materials from the lithosphere (Vigon et al., 1994) and any processing activity that the raw materials may need before being used in the manufacturing of the system of study. It also includes transporting these materials to the manufacturing or production site (Babu, 2006; EC, 2012). To fulfill the data requirements of this life cycle stage, secondary data from LCI background databases and literature is typically used. However, if access to primary data is feasible, the practitioner should consider prioritizing these data instead. This decision will also depend on the time resources and the goal and scope of the study. For example, in steel production, where iron ore is used as raw material, if an iron ore mining organization is conducting an LCA, primary data for the raw material extraction is available and its use is highly recommended. In contrast, if a steel producer is conducting the study, secondary data from literature or LCI databases will usually be used to report on the LCI of the iron ore mining, unless the steel producer has a trustworthy and close business relationship with the iron ore miner that allows primary data to be accessed via the miner directly. In summary, primary data should always be favored over secondary data when accessible. If only secondary data is available, it is necessary to check whether the chosen dataset (e.g., from literature or LCI background databases) fits properly, i.e., according to the geographical, temporal, and technological scopes of the study (specified in the scope definition phase).
Manufacturing/production stage: Product manufacturing encompasses all the activities required to process raw and/or basic materials (e.g., steel) into intermediate products and ultimately into the final product of the studied system (Babu, 2006; EC, 2012; Vigon et al., 1994). First, the manufacturing process needs to be described by process steps, temporal and geographical representativeness, and a process flow diagram, and characterized by its performance, efficiency, production volume, yield, and maturity. Second, the inputs and outputs of each manufacturing process step need to be specified. LCA practitioners divide inputs of the manufacturing process into material and energy inputs. Material inputs are, for example, raw materials, auxiliaries (e.g., for production-maintenance needs), or intermediate products and are retrievable from the bill of materials (BOM). Energy inputs are, for example, electricity or heat needed to manufacture each component of the final product. The main products, by-products, wastes, and emissions are quantified on the output side. The first three elements are considered as intermediate flows, as defined above, and the latter as elementary or environmental flows in LCA. The distinction between intermediate and elementary flows is relevant, as these are taken into account differently (Edelen et al., 2017). For instance, within the technical flows, the main product is either part of the functional unit and hence the subject of interest or an intermediate product further processed in the next step. Valuable by-products need to be handled by a procedure for multifunctional processes. ISO 14040/44 describes the handling of multifunctional processes with a hierarchy. Additional data requirements for the manufacturing stage are the lifetime of the final product, recyclability of individual components, and insights into waste management routes (e.g., reuse, recycling, incineration, landfilling) for the waste generated during manufacturing.
Use/Operation stage: This stage consists of the activities in which the final product is used, operated, or consumed by the end-user. The activities to extend its useful life, such as maintaining, reconditioning, and servicing, are also included and accounted for here (EC, 2012; Vigon et al., 1994). To describe the use stage, the intended use of the final product and, if applicable, description of its operation, lifetime, technology readiness level (TRL), manufacturing readiness level (MRL), temporal representativeness, location of usage, efficiency, performance, and production volume (referred to the specified time and geographical scale) of the final product should be noted. Other aspects to consider when defining data requirements for the LCI in the use stage are maintenance requirements and emissions produced. Maintenance requirements could be the amount of material (e.g., lubricants) or energy demands (e.g., delivered via fuels or electricity).
Recycling and end-of-life (EOL): This stage relates to all the activities needed after the product has served its intended purpose and is discarded by the user. Data regarding the efficiency in collecting the decommissioned product and information on material and energy inputs and outputs are gathered for this stage. Depending on various factors (e.g., product's characteristics), the product will either be returned to nature as a waste product after entering a waste management system or enter another product's life cycle (EC, 2012; Vigon et al., 1994). Usually, this stage relies on secondary data, unless primary data can be collected from waste treatment or recycling sites specialized in treating the product or system of study, or if the manufacturing sites also take care of the EOL of the product. Regardless of the situation, collecting insights from the manufacturers is always helpful, even though it is not their area of expertise. Data requirements include information about the disposal route (e.g., reuse, recycling, incineration, landfilling), the characterization of the final product's EOL treatment process, involving its performance parameters, efficiency, maturity, percentage of material recovered, process steps (e.g., covered by a process flow diagram), and temporal and geographical representativeness.
Transverse activities: These activities correspond to all the activities that could occur within each of the life cycle stages or in between these stages and are considered as auxiliaries instead of as a principal element needed in the specific activity. Transverse activities include transport, energy or water supply, and waste management (solid waste and wastewater treatment). As an example, data requirements for transport activities are the mode of transport (e.g., via sea, road, air, or a combination of all), the means of transportation used (e.g., railway, barge, truck, pipeline, transmission lines, or a combination), including specifications regarding amount and type of fuel (e.g., petrol, gas, or diesel), emission class (e.g., EURO 4), the distance of travel, and the volume, mass, or density of transported goods.
Metadata on data provider: When the inventory data is collected from multiple production sites, companies, organizations, etc., such as the case of the aluminum industry described by Pomper (1998), it is recommended to collect the details of all the companies/organizations involved in the data provision step. Helpful information entails the name, address, and market share of the company or organization and name, job title, primary responsibilities, email address, and telephone number of the contact persons. This information can help keep track of and further differentiate the data origin and improve data handling, thus increasing transparency and easing the communication process if further clarifications or data exchanges are needed from the data provider later in the process.

Outlining of LCI blocks
Once the LCI data requirements, discussed in Section 3.1.1 and derived from the scope definition, are identified, further familiarization with the product system is advised, for example through literature research or expert interviews. This additional information, along with the goal and scope of the study, will assist the practitioner in outlining the different building blocks of the LCI and, if needed, refining the data requirements to ensure that all the essential information is addressed.
The outlining of the LCI blocks consists of identifying and differentiating the components and subcomponents of the product under study or processes and subprocesses of the studied system.This can be done by ideally "zooming" into the product, starting from the finished product (higher level of complexity), and decomposing it into the different components or process steps (lower level) needed for its manufacturing (see Fig. 5).
Each LCI block represents single or multiple unit processes at aggregation levels that are defined depending on the technology under study and the goal and scope of the LCA study, including the foreseen scenarios. Each LCI block is linked, according to its relative contribution, to the main system to avoid omissions and double counting. When different scenarios are foreseen within the scope of the study, LCI blocks must be drafted with an aggregation level that allows the differentiation of all elements of the main system that are expected to change. For example, to perform the LCA of road vehicles using different propulsion technologies within the powertrain, the LCA practitioner should outline LCI blocks for all the variable powertrain components to reflect the changes in this technology. This is in line with the modular LCA approach described by Steubing et al. (2016).
Crenna et al. (2021) followed a similar modular approach for the LCIs of lithium-ion batteries (LIB) for electric vehicles (EV). This approach allowed the modeling of a complex system in a simplified way, specifically by partitioning it into its different components, and provided higher flexibility. The LCI blocks are outlined to represent the elements varying due to, for example, technology development, use of alternative materials, or scenarios. They can thus be rapidly replaced without the need to update the whole system or the elements that remain constant. Moreover, the blocks can be reused in other compatible systems. A substantial additional benefit observed is the possibility to perform detailed contribution and hotspot analysis at different resolutions or levels of detail depending on the aggregation level considered for the LCI blocks.
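The modular idea described above, in which each block is linked to the main system by its relative contribution and can be swapped for a scenario, can be sketched in a few lines. The block names, flows, and amounts below are hypothetical placeholders and do not come from the cited studies; `aggregate` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical blocks: flow name -> amount per unit of the block.
blocks = {
    "cell": {"electricity [kWh]": 4.0, "cathode material [kg]": 0.5},
    "housing": {"aluminium [kg]": 2.0, "electricity [kWh]": 1.0},
}
# Hypothetical main system: units of each block per unit of the system.
system = {"cell": 10, "housing": 1}


def aggregate(system, blocks):
    """Scale each block by its contribution and sum the flows."""
    total = defaultdict(float)
    for block_name, units in system.items():
        for flow, amount in blocks[block_name].items():
            total[flow] += units * amount
    return dict(total)


baseline = aggregate(system, blocks)

# Scenario: replace only the block that changes; the others are reused.
scenario_blocks = {**blocks, "housing": {"steel [kg]": 3.0, "electricity [kWh]": 1.5}}
scenario = aggregate(system, scenario_blocks)

print(baseline)
print(scenario)
```

Because every block enters the total exactly once with an explicit multiplier, omissions and double counting are easier to spot, and a per-block breakdown for contribution analysis follows from the same loop.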
Parallel to the outlining of the LCI blocks, it is vital to map the data requirements to specific data providers according to their expertise. This mapping will give the practitioner an overview of who will provide what and assist with planning the data collection process and the needed LCI templates. Furthermore, it can help in the early identification of potential data gaps. As the data collection planning step is iterative, refinements of data requirements and thus reformulations of the LCI blocks can be foreseen. Refinements could be due to different views from the LCA practitioner, new findings in the literature, or LCI database-specific requirements.
Fig. 5. Visualization of the different LCI blocks of a generic system. Each LCI block represents (single or multiple) unit processes at an aggregation level that is technology-wise appropriate and consistent with the goal and scope.

Customization of the generic LCI template
The outcome of the planning of data collection (Step 1) is the custom LCI template for data collection, and the building blocks will provide the foundation for the customization of the generic LCI template. The generic LCI template prepared in an MS Excel spreadsheet format is included in Saavedra-Rubio et al. (2022). Spreadsheet solutions other than MS Excel could also be used. It is essential to highlight that depending on the complexity of the system, the desired resolution, and the goal and scope of the study, the template can be developed in one or more MS Excel files.
Overall structure.
The general structure of both the generic and custom LCI data collection templates proposes the use of several worksheets. A summary of the contents and purpose of each is detailed below:
• Read me: provides detailed instructions, definitions, and guidelines on completing and navigating the LCI data collection template.
• Contact details: refers to the metadata on the data provider and collects information of the LCA practitioner and the main person(s) responsible for the data provision.
• Use: data pertaining to the "use stage" of the system is collected here.
• Manufacturing: collects information related to the production and assembly of the system and its subcomponents or sub-processes. Customized according to the outlined LCI blocks. The raw materials needed in the manufacturing process are also collected here.
• End of life (EOL): collects information or any insight related to this life cycle stage for the main system and, if applicable, for its components.
• References: compiles all the sources from where the LCI data originates to ensure complete transparency and reproducibility.
• Feedback: collects and directs all additional comments, questions, and further clarifications needed by the LCA practitioner regarding the data provided throughout the several iterations to complete the template.

Process inputs and outputs.
The table structure provided in the generic LCI template will be adapted according to the outlined LCI blocks to mimic the system of study and ultimately obtain the custom LCI template. The LCI blocks are thus represented in several tables that describe unit processes, and each table collects specific data relevant for the LCI. Rows are used to detail material, energy, and auxiliary inputs, as well as by-product and emission outputs. For each item described in the rows, the columns request information about quantities needed or produced, waste management routes during manufacturing, technical datasheets (if applicable), comments or clarifications about the data provided, and references used. Text boxes were added next to each table to enable technology experts to provide a detailed description of the process step that the corresponding table or block represents. Technology experts are also encouraged to provide any visual aid (e.g., process flow chart, BOM, etc.) that complements the inputs in the file.
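The row and column layout described above can be mirrored in a plain CSV table, which may be convenient when a spreadsheet application is not available. This is only a sketch of one possible layout, not the structure of the published template: the column names, the added `unit` column, the block name, and the helper `blank_table` are all assumptions.

```python
import csv
import io

# Assumed column names, following the information requested per item.
COLUMNS = ["category", "item", "quantity", "unit",
           "waste_management_route", "technical_datasheet",
           "comments", "reference"]


def blank_table(block_name, items):
    """Return CSV text for one unfilled table of an LCI block."""
    buffer = io.StringIO()
    buffer.write(f"# LCI block: {block_name}\n")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for category, item in items:
        row = dict.fromkeys(COLUMNS, "")  # every cell starts empty
        row.update(category=category, item=item)
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical block with three pre-labelled rows for the data provider.
table = blank_table("module assembly", [
    ("material input", "cells"),
    ("energy input", "electricity"),
    ("emission output", ""),
])
print(table)
```

Pre-filling the category and item columns while leaving the remaining cells empty reflects the idea that the practitioner customizes the table and the data provider fills it in.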

Handling of uncertainties.
In LCA, uncertainty originates from parameters, scenarios, and/or models and is analyzed in the interpretation phase of an LCA (Huijbregts et al., 2003;Laurent et al., 2020).
With the accompanying LCI data collection template, the LCA practitioner can cover all three types of uncertainty. Parameter uncertainty arises from erroneous input data, which is handled hierarchically in the LCI data collection template. If the data collector conducts frequent measurements, the variability in the measured data can be used directly to derive the uncertainty distribution type. Therefore, the data collection template provides columns to specify the range of the measured data and the distribution type. Potential distribution types are lognormal, normal, uniform, or triangular distributions (Muller et al., 2016). If only single data points are available, either at the measuring site or from literature, the pedigree matrix approach, also included in the template, allows data quality to be assessed (Funtowicz and Ravetz, 1990; Weidema and Wesnaes, 1996), as in the ecoinvent database (Ciroth et al., 2016; Wernet et al., 2016). The pedigree matrix accounts for the reliability, completeness, and temporal, geographical, and technological correlation of each collected data point and is defined on a scale from 1 (low) to 5 (high) (Wernet et al., 2016). The values determined by the matrix are used directly to describe data quality (Chen and Lee, 2021) or can be translated into uncertainty factors and used for a global sensitivity analysis based on uncertainty propagation via Monte Carlo sampling (Igos et al., 2019; Kim et al., 2022). Yet, the uncertainty related to data based on technology experts is not directly covered by the pedigree matrix approach. Hence, as introduced by Laner et al. (2016), an expert-based data quality evaluation, which assigns coefficients of variation to the reliability of expert estimates, is also available in the template.
Scenario uncertainty refers to normative choices made in the initial design of the LCI model (Huijbregts et al., 2003), and hence to the considered LCI blocks. In the LCI data collection template (see Saavedra-Rubio et al. (2022)), scenario uncertainty is captured by filling an additional LCI data collection file for each scenario. Scenario uncertainty can then be analyzed in the LCA interpretation phase after the LCI block finalization and the LCIA phase have been conducted. In addition, different EOL practices can be captured directly in each LCI data collection file. Model uncertainty refers to the assumptions and simplifications made to allow for conducting an LCA (Huijbregts et al., 2003). For instance, if process data is lacking entirely, this would fall into model uncertainty. Section 3.3 explains how to fill data gaps by following a hierarchical approach. These three types of uncertainty are typically analyzed in the interpretation phase of a given LCA study and fed back to the data collection step for further refinement of the inventory (e.g., increasing data quality on specific activities or processes).
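The translation of pedigree scores into uncertainty factors and their propagation via Monte Carlo sampling, as described for parameter uncertainty, can be sketched as follows. This is a minimal illustration and not the procedure implemented in the template: the factor table `ILLUSTRATIVE_FACTORS`, the function names, the basic factor, and the combination rule (each factor treated as an independent geometric standard deviation of a lognormal distribution) are assumptions made here. The numeric factors are placeholders and should be replaced by the values published for the database or method actually applied.

```python
import math
import random

# Hypothetical uncertainty factors per pedigree indicator for scores 1..5.
# Placeholder values for illustration only, not taken from the source.
ILLUSTRATIVE_FACTORS = {
    "reliability":   [1.00, 1.05, 1.10, 1.20, 1.50],
    "completeness":  [1.00, 1.02, 1.05, 1.10, 1.20],
    "temporal":      [1.00, 1.03, 1.10, 1.20, 1.50],
    "geographical":  [1.00, 1.01, 1.02, 1.05, 1.10],
    "technological": [1.00, 1.10, 1.20, 1.50, 2.00],
}


def pedigree_to_gsd(scores, basic_factor=1.05, factors=ILLUSTRATIVE_FACTORS):
    """Combine pedigree scores into one geometric standard deviation (GSD).

    Assumption: each factor is the GSD of an independent multiplicative
    lognormal error, so the squared log-factors add up.
    """
    total = math.log(basic_factor) ** 2
    for indicator, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"score for {indicator} must be between 1 and 5")
        total += math.log(factors[indicator][score - 1]) ** 2
    return math.exp(math.sqrt(total))


def sample_flow(median, gsd, n=10_000, seed=0):
    """Draw n lognormal samples with the given median and GSD."""
    rng = random.Random(seed)  # seeded for reproducible draws
    mu, sigma = math.log(median), math.log(gsd)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


# Hypothetical data point: a single literature value with assumed scores.
scores = {"reliability": 2, "completeness": 3, "temporal": 1,
          "geographical": 2, "technological": 4}
gsd = pedigree_to_gsd(scores)
samples = sample_flow(median=12.0, gsd=gsd, n=1000)
```

The resulting samples could then be passed through the LCI model to propagate the uncertainty of that flow; measured variability, where available, would replace the pedigree-derived distribution.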

Step 2: data gathering process
Step 2 of the proposed methodology corresponds to the data gathering process in which the primary outcome from the previous step, i.e., the custom LCI template, is used and completed. An iterative approach, illustrated in Fig. 1 (highlighted in green), is followed to streamline the data gathering process. First, a workshop and an overview of the custom LCI template are conducted to provide some basic LCA knowledge and sensitize the data providers to the requirements for completing the different fields of the template. Subsequently, the data provider fills the template, and the LCA practitioner provides feedback on any questions that may arise, refining the custom LCI template when needed. Refinements during this step relate to the level of aggregation and detail of the data provided and its subsequent mapping to existing LCI databases to link the foreground and the background systems. This means that removing some LCI blocks or adding new ones may be needed to map the process inputs and outputs to an LCI database process. To overcome potential challenges to data provision, the LCA practitioner can consider using non-disclosure agreements (NDAs) to ease concerns and secure the data exchange.

Step 3: LCI block finalization
Once the data provision is completed and no additional questions or refinements of the template are needed, the LCI block finalization step (i.e., Step 3) takes place. First, the LCI block datasets (i.e., the filled custom LCI template) received from the data provider(s) are collected and reviewed. The collection of the LCI block datasets usually takes place electronically. For example, a data-sharing platform ensuring data security could be used. After their collection, a pre-step to the actual interpretation phase of an LCA ensures a match with the data requirements defined in the scope definition. In the actual interpretation phase of an LCA, more thorough completeness, consistency, and sensitivity checks are performed (Laurent et al., 2020). In this proposed pre-step to the interpretation phase, we advocate relying on the completeness and consistency check guidelines (Laurent et al., 2020; Weidema, 2019). In short, completeness is checked with respect to (i) LCI unit process coverage and system modeling, (ii) coverage of intermediate and elementary flows, and (iii) requirements for comparative assertions (Laurent et al., 2020).
The consistency check in the LCI phase includes checking and reporting the approaches for converting the reference flow and associated parameters (Weidema, 2019). The outcome of the consistency check is the validation of the collected data. Additionally, data quality is reviewed using either quantitatively reported uncertainty values or a semi-quantitative data quality assessment via the pedigree matrix. Specific data quality thresholds could be defined to guide iterations of the data collection procedure aimed at increasing data quality. Otherwise, the LCA practitioner should discuss the uncertainty of the LCA in the interpretation phase via a local or global sensitivity analysis (Igos et al., 2019; Kim et al., 2022; Laurent et al., 2020).
Due to insufficient completeness or data quality, data gaps typically remain at this stage of the process (beginning of Step 3). Some may be bridged directly with the support of the data provider in an additional iteration. The remaining data gaps typically exist because the data provider lacks expertise or knowledge, or because the data sources are not controlled by the data provider. In such cases, approaches to bridge data gaps in LCA can be applied. Milà i Canals et al. (2011) differentiate between two main approaches: the use of proxy data sets and extrapolation methods. For the former, Milà i Canals et al. (2011) distinguish between scaled, direct, and averaged proxies. Scaled proxies apply linear scaling of available data to fill the data gap, although the available data might differ from the targeted data. Examples of scaled proxies include multiple or local linear regression models, as explained by Steinmann et al. (2014). Direct proxies use the data for one product to represent the target product, assuming similar impacts and functions. Averaged proxies use data for a product in, for instance, other geographical locations (Milà i Canals et al., 2011). Extrapolation methods use similarities between products and their associated LCI, for example, taking the LCI of apple cultivation to obtain an altered LCI of pear cultivation by varying cultivation parameters (Milà i Canals et al., 2011). These approaches to fill data gaps can be ranked according to Parvatker and Eckelman (2019) as follows: (i) get plant (primary) data, (ii) use process simulations, (iii) use process calculations, (iv) apply stoichiometry, (v) use molecular structure models (neural networks), (vi) use proxies, and (vii) omit the data point or process.
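The ranking of gap-filling approaches and the idea of a linearly scaled proxy can be sketched in a few lines. This is an illustrative sketch only: the method labels, the function names `fill_gap` and `scaled_proxy`, and the example numbers are hypothetical and not part of the template or the cited works.

```python
from typing import Optional

# Order follows the ranking listed in the text (after Parvatker and
# Eckelman, 2019); the string labels themselves are assumptions.
GAP_FILLING_HIERARCHY = [
    "plant_data",
    "process_simulation",
    "process_calculation",
    "stoichiometry",
    "molecular_structure_model",
    "proxy",
]


def fill_gap(estimates: dict[str, Optional[float]]):
    """Return (method, value) for the highest-ranked available estimate.

    If no approach yields a value, the data point is marked as omitted,
    which corresponds to the last option in the ranking.
    """
    for method in GAP_FILLING_HIERARCHY:
        value = estimates.get(method)
        if value is not None:
            return method, value
    return "omitted", None


def scaled_proxy(proxy_value: float, proxy_size: float, target_size: float) -> float:
    """Linearly scale a proxy flow to the size of the target (e.g., by mass)."""
    if proxy_size <= 0:
        raise ValueError("proxy_size must be positive")
    return proxy_value * target_size / proxy_size


# Hypothetical example: no plant data, but a stoichiometric estimate and a
# proxy are available; the stoichiometric estimate ranks higher.
method, value = fill_gap({"plant_data": None, "stoichiometry": 1.5, "proxy": 2.0})
```

Recording the chosen method alongside the value keeps the origin of each bridged data point traceable for the later sensitivity analysis.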
The knowledge from the pre-step of preliminary consistency and completeness checks can additionally help anticipate potential points (e.g., remaining data gaps) that need to be further investigated in a sensitivity analysis within the interpretation phase (to evaluate their influence on the overall result). Additionally, local and global sensitivity analyses are prepared through the LCI data collection template, e.g., by considering different EOL pathways. When these preliminary consistency and completeness checks are performed and the LCI blocks are completed to the extent possible, Step 3 is considered finalized. These LCI blocks can then feed into the LCI modeling per se and support the potential integration of the model in LCA software; these further steps are, however, considered outside the scope of the current study.

Illustrative case on batteries
To test the stepwise guidance and the template for LCI data collection, a case of a LIB with an 80% nickel, 10% manganese, and 10% cobalt (NMC811) chemistry was considered. The battery manufacturing process usually goes from cell to module production and then to pack assembly. A lithium-ion cell is composed of two electrodes (e.g., graphite anode and lithium alloy cathode), a polymeric separator preventing the transfer of electrons inside the cell, an electrolyte providing the necessary conductivity for the transfer of ions, and current collectors (e.g., Al or Cu foils) to generate electricity. Individual cells are connected in series and parallel to form independent modules regrouping their battery management and diagnostic systems. The battery pack is then formed by coupling multiple modules together. The production process can be separated into three main phases, as shown in Fig. 6 (Löbberding et al., 2020; Miao et al., 2019).
The LCI data collection process for manufacturing a LIB pack using an NMC811 cathode and a graphite anode (i.e., "cradle to gate" system boundary) using the proposed guidance is described below. As this is a real-life case study carried out for an EU-funded project, the LCI data collection process was conducted by the Technical University of Denmark (DTU), representing the LCA practitioner, in collaboration with Bern University of Applied Sciences (BFH), operating as the technology expert and data provider.

Step 1: planning of data collection
Starting from the data requirements of the system of study, the LCI blocks corresponding to the manufacturing stage of the battery pack were outlined to mimic the process flow in Fig. 6. Once the LCI blocks were outlined, the generic LCI template was customized.

Step 2: data gathering process
The data gathering process started with an LCA theory workshop for BFH and a meeting to introduce the custom LCI template and explain specific data needs. These were followed by a series of meetings to clarify any questions and challenges encountered by the data provider, such as:
• misconceptions when filling the template
• lack of data at the requested level of resolution of the LCI block
• missing LCI blocks for new process inputs (i.e., processes not foreseen during the outlining of the LCI blocks)
• high uncertainty in the BOM due to variations encountered across manufacturers and fast-paced technology development
These challenges were addressed by clarifying the misconceptions, amending the level of resolution of the LCI blocks (i.e., merging some LCI blocks due to limited data availability or creating new LCI blocks to allow traceability to LCI databases), and showcasing alternatives for providing uncertainty information (e.g., the pedigree matrix and expert data quality assessment). Furthermore, specific comments and feedback were provided in the "Feedback" worksheet after each revision, which consisted mainly of cross-checking values through mass balances to ensure data consistency, making sure all material inputs and outputs are specified, and ensuring that all the information provided is clear.
As detailed in Section 3.1.3, quantities for process inputs and outputs, such as material, energy, by-products, and emissions, were collected. The (qualitative or quantitative) uncertainty estimate of the given quantities was also collected, along with the input-to-output material ratio, the waste management route of the waste generated during manufacturing, the lifetime, the location(s) of the production sites, and the references used for the data provided.
The customized and filled LCI template is provided as an MS Excel file in the Supplementary Material 3.
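The mass-balance cross-check used during the feedback rounds could be automated along the following lines. This is a sketch under stated assumptions: the function names, the 5% tolerance, and the example flows are hypothetical and are not taken from the case study or the template.

```python
def mass_balance_gap(inputs_kg, outputs_kg):
    """Relative difference between total input and total output mass."""
    total_in = sum(inputs_kg.values())
    total_out = sum(outputs_kg.values())
    if total_in <= 0:
        raise ValueError("total input mass must be positive")
    return (total_in - total_out) / total_in


def check_block(inputs_kg, outputs_kg, tolerance=0.05):
    """Return (is_balanced, gap) for one LCI block.

    The tolerance is an assumed acceptance threshold, not a value
    prescribed by the guidance.
    """
    gap = mass_balance_gap(inputs_kg, outputs_kg)
    return abs(gap) <= tolerance, gap


# Hypothetical LCI block with made-up quantities (kg per reference flow).
inputs = {"active_material": 0.90, "binder": 0.05, "solvent": 0.45}
outputs = {"coated_electrode": 0.92, "solvent_emissions": 0.45, "scrap": 0.03}
balanced, gap = check_block(inputs, outputs)
```

A positive gap means more mass enters the block than leaves it, which may point to unreported waste or emissions; a negative gap suggests a missing input.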

Step 3: LCI block finalization
For the LCI block finalization, DTU did a final review of the LCI block dataset completed by BFH to ensure that the data requirements defined in the scope definition were fulfilled and that the data provided was complete. The main data gaps identified were energy consumption, water use, emissions, input materials for some LCI blocks, and waste production and management during the manufacturing process (e.g., input-to-output manufacturing ratio and waste management route). Some of the gaps, specifically the energy consumption and emissions, were addressed by finding aggregated values for LCI blocks at a higher level (i.e., battery pack level). For the data gaps related to input materials of some LCI blocks (i.e., BMS), ecoinvent database processes were discussed with the data provider to understand whether such processes could be used as proxies. Data gaps related to waste production and routing could not be bridged at this point. However, some assumptions regarding the input-to-output manufacturing ratio can be made and cross-checked with the technology experts in an additional iteration. Such assumptions will only be made at a level of the LCI blocks where they are meaningful, for example, electrode manufacturing and cell assembly, as the processing steps suggest that material losses or waste will mainly originate from there.

Operability of the stepwise guidance
The proposed stepwise guidance allows for full-fledged data collection in the LCI phase and provides an adjustable generic LCI data collection template to direct data collection. As illustrated in the battery case (see Section 4), the outlining of the LCI blocks and the use of the customized template, together with the iterative consultation and feedback process, make it possible to obtain comprehensive and structured datasets of LCI blocks. These datasets can directly serve as inputs to LCI modeling and integrate relevant parameters, such as uncertainties. The guidance also allows for flexibility due to the modularity of the LCI blocks, which permits, for example, a swift exchange when another electrolyte is used in the battery. In addition, the LCI block datasets can be reused in other systems with compatible components, thus potentially reducing the modeling time. For example, in the battery case study, this applies to reusing the battery pack LCI block datasets for different application options (i.e., once scaled, the LCI could be used in EV, aircraft, or stationary energy storage solutions). Another example from the battery case illustrating the flexibility of the LCI template is its adaptation over time when specific LCI blocks need to be changed. Ultimately, the final LCI block datasets ensure complete transparency and optimize reproducibility, although this remains conditional on potential confidentiality requirements (see below). The stepwise guidance also facilitates communication with LCI data providers and helps increase their understanding of LCA theory by clarifying requirements and expectations and structuring the workflow within the LCI data collection step.
Although the framework was demonstrated to be applicable and to bring significant value to the LCA/LCI practice, several challenges and limitations remain. For example, when performing comparative LCA studies, the level of detail in the LCI blocks for each system being compared needs to be aligned. Therefore, it would not make sense to dedicate a considerable amount of time to preparing highly disaggregated LCI blocks for one system when the level of detail of the LCI blocks for the comparison system(s) (e.g., another product or technology, especially from literature) is relatively low. However, one could argue for conducting a detailed assessment together with a sensitivity analysis to check whether a more aggregated LCI would change the results of the comparison with the alternative. In addition, keeping the LCI blocks detailed helps track all environmental impacts, which should be favored. In the end, the necessary level of detail and data quality of LCI blocks depends on the goal and scope of the study, including the chosen products or systems of comparison. The iterative nature of LCA may also be used to refine and increase the level of detail in specific parts of the system's LCI, depending on the preliminary impact indicator results obtained.
Data scarcity is a common issue when performing conventional LCAs, and it becomes more challenging when conducting specific types of LCA, like prospective, dynamic, or consequential LCA studies. To complete such LCAs, the proposed LCI data collection template needs to be amended. For prospective and dynamic LCA, the time aspect needs to be explicitly considered, and for consequential LCA, it might need to be indicated whether a market is constrained or not. Data quality, or the suitability of data for a specific application, is also of concern and goes along with data uncertainty, which is covered by either measured variability or the pedigree matrix. However, the pedigree matrix should only be applied when no uncertainty estimate is retrievable, because it tends to underestimate uncertainty (Kuczenski, 2019a).
In addition, incomplete disclosure of LCI data could lead to uncertainty in studies building on such a dataset. LCI data might not be disclosed because companies fear revealing their know-how and opt for keeping their data confidential, thus restricting transparency and reproducibility. An agreement such as an NDA can be set up to overcome transparency issues and stipulate (i) which data might be disclosed, and at which level of detail, when publishing the LCA and (ii) a list of reviewers with full access to the data when the LCA study is reviewed. If know-how needs to be protected, it should be discussed whether averaging, proxies, or aggregation could be applied to mask the data. However, these might compromise transparency. Such approaches should be considered a last resort, as they also jeopardize integration in LCI databases due to LCI database-specific data requirements. If applied, the guidance of Kuczenski (2019b), which describes a reviewable aggregation scheme, should be followed.

Conclusions, recommendations, and outlook
In this study, a critical review of guidance documents and LCA case studies was performed to investigate how LCI data collection has been addressed in terms of both its methodological description and its application. For the former, it was found that the description of the existing methodology is not very detailed and that significant differences exist across guidance documents. In terms of its application, results showed that only 40% of the reviewed LCA case studies provide a complete description of their data collection process and a transparent account of their LCI data, leaving 60% of studies with an incomplete data description and/or non-transparent data. These findings are partially caused by the lack of a detailed procedure for LCI data collection and handling, evidencing the need to develop detailed guidance to aid LCA practitioners and LCI data providers along the time-consuming and resource-intensive step of LCI data collection. To this aim, a stepwise guidance consisting of three main steps and accompanied by a reusable generic LCI template (openly available in Saavedra-Rubio et al. (2022)), which can be customized and adapted to the specificities of each case study, has been developed in the present study. Each step, with recommendations for its application (e.g., hierarchical procedures for filling data gaps), has been explained in detail to facilitate uptake by future LCA practitioners. In addition, the generic template, among other features, can be adapted to different technologies or systems, offers a tiered approach for handling uncertainties, covers all life cycle stages, and allows harmonization of datasets across the foreground system.
A case study for battery manufacturing was used as a proof of concept to demonstrate the framework's applicability. In addition to its value in supporting future LCA, the framework was found to enable and facilitate the communication and structuring of the workflow between the LCA practitioner and the LCI data provider. Furthermore, comprehensive and structured LIB datasets of LCI blocks were obtained, showing added flexibility due to the modularity of the LCI blocks, which allows them to be reused or easily replaced in other systems with compatible (e.g., casing) or varying (e.g., electrolyte) components, respectively.
Overall, the proposed framework and its application can be considered an additional step toward more robust and reliable LCA studies. Based on its demonstrated operability in the LIB case study and additional ones (not shown in the current study) and the benefits for LCA practice, it is recommended for broad utilization. Such use is expected to (i) increase harmonization, transparency, and accessibility of LCI data within the scientific community and beyond and (ii) accumulate experience to help fine-tune specific framework elements and address remaining challenges.
In parallel, future research needs were identified, particularly the need to adapt the guidance elements to cater to specific LCA applications or modes, such as prospective, consequential, or dynamic LCA. Efforts are also required to make industry and authorities realize the value of making data available (even partially). Nevertheless, the provided guidance and data collection template lay the foundation for more transparent and reproducible LCIs.

CRediT authorship contribution statement
Karen Saavedra-Rubio: Conceptualization, Methodology, Formal analysis, Investigation, Data curation, Writing - original draft, Writing -

Table 1 (continued)
… of data used according to modeling choices, talks about critical issues related to inventory data (quality & consistency), and suggests format conversion tools to enhance data exchange. (N || N)
Crenna et al. (2021) 5 Explains modular approach used for LCI of a LIB to increase data detail, from where the data was collected and how the data was processed. (E || N)
a Document types indicated in indices: (1) Reference document, (2) LCA textbook, (3) Book chapter, (4) Conference proceedings, (5) Scientific paper.
b Answers to questions: E = extensively, B = briefly; Y = yes, N = no.

Fig. 1. Methodology to identify and shortlist LCA case studies in the current review (post-2015 publications).

Fig. 2. Overview of reviewed LCA case studies regarding LCI data collection and LCI data reporting practices (n = 285). See Supplementary Material 1 for a complete overview of the reviewed articles.

Fig. 3. Decision diagram of the proposed methodological guidance for LCI data collection and its interaction with the goal and scope, LCI analysis, life cycle impact assessment (LCIA), and interpretation phase of the LCA methodology.

Fig. 4. Data requirements in the LCI phase along the different life cycle stages. Bullet points are meant to be used as examples and not to be understood as an exhaustive list.

Fig. 6. LCI block diagram representing the battery pack manufacturing of the selected case study.

Table 1
Reviewed LCA guidance documents addressing LCI data collection.
… chapters for LCI and data collection. One covers the theory, and the other gives practical examples (e.g., filled LCI datasheet and advice for the data collection process). (E || Y)
Ciambrone (1997) 2 Touches upon the topic when presenting a general chapter about LCI analysis. (E || N)
Guinée (2002) 2 Detailed guidance on collecting data for different levels of LCA (e.g., simplified vs. detailed LCA). (E || N)
Heijungs and Suh (2002) 2 Strong focus on inventory analysis but no dedicated subchapter regarding data collection. (E || N)
Baumann and Tillman (2004) 2 Dedicated chapter for LCI phase. Proposes three main steps: process flowchart, data collection, and calculation of environmental loads. (E || N)
Horne et al. (2009) 2 Reflections on LCA practice, briefly touching upon LCI analysis as part of LCA framework presentation (LCA methodological guidance being outside the book's scope). (N || N)
Curran (2012) 2 Explains data collection step in detail. Provides data collection support tool which is based on ISO. (E || Y)
Klöpffer and Grahl (2014) 2 Dedicated chapter on LCI analysis, detailing the data collection phase referring to unit processes. The data collection support tool is based on unit process datasheet descriptions. (E || Y)
Jolliet et al. (2015) 2 Briefly discusses data collection in the chapter dedicated to the LCI phase. A more detailed explanation on collecting data can be found in one of the two examples in the book. (B || N)
Hauschild et al. (2018) 2 Entails a dedicated sub-chapter on the collection of data. A support tool for planning data collection is provided. The focus of the chapter is also on LCI databases. (E || N)
Matthews et al. (2018) 2 Dedicated sub-chapter for explaining the data collection based on goal and scope and data collection itself. (E || N)