
Biogas production from organic materials by anaerobic digestion is both a developed technology and an active area of research. In this contribution we describe an R package designed to help standardize biogas research. A web-based application provides access to the main functions. The software can be used to accurately calculate biochemical methane potential (BMP) from a range of biogas measurement types. Additionally, methane potential can be predicted from substrate composition, facilitating experimental design and interpretation of results. By providing access to flexible, efficient, standardized, and transparent algorithms, this software may make biogas research more accurate and efficient.


Motivation and significance
Anaerobic digestion is an established technology for the stabilization of organic materials and production of renewable energy. Energy is recovered in the form of biogas, a mixture of methane (CH4) and carbon dioxide (CO2), and optimization of biogas production is an active area of research [1]. Research on anaerobic digestion includes laboratory and pilot-scale experiments, as well as theoretical calculations and process modeling. Laboratory experiments are used to determine how much CH4 can be obtained from a particular substrate, or to study the effect of a particular treatment [2,3]. Many of these experiments use biochemical methane potential (BMP) tests to estimate the maximum quantity of CH4 that can be obtained. Transformation of laboratory measurements into BMP is done using a sequence of simple calculations [4]. Although conceptually simple, calculations differ among groups, and are seldom described in detail in publications, leading to a lack of reproducibility and a high likelihood of systematic bias in results. Many groups use custom spreadsheet templates for calculations. Prediction of CH4 production can be carried out using substrate composition (simple stoichiometry) but there are multiple steps involved [5]. A simple means of quickly making these calculations would facilitate the design of experiments as well as evaluation and interpretation of results. In this contribution, we describe a package for the R environment [6] that addresses these problems and needs: the biogas package [7].

Software description
The biogas package is written in R and can be used directly in the R environment. The heart of the package is a set of ten functions, which require some proficiency in R for use. However, a web-based interface called OBA (from Online Biogas App, https://biotransformers.shinyapps.io/oba1/) provides access to the main functions, albeit with less flexibility.

Software architecture
Package functions can be divided into three groups: (1) basic vectorized functions for common conversions and calculations, (2) two data processing functions that calculate BMP or similar results from laboratory measurements, and (3) a function for predicting CH4 production.

Basic functions
Seven functions (Table 1) assist with data processing and experimental design, and are used by the data processing functions (Section 2.2.2). Most perform a conversion of an input vector (the first argument) and are vectorized, and most include optional arguments, making them both flexible and convenient.
The stdVol() function is used for converting measured gas volume or pressure into a standardized volume that corresponds to a fixed quantity (44.72 mmol or 717.4 mg CH4 per L under standard conditions of 0 °C and 1.0 atm, as calculated using the vol2mol() function). Standardized volume is often referred to as normal volume, and given in publications without details on the standardization conditions. However, because of inconsistency in the use of terms, correction for water vapor, and standard conditions (IUPAC now uses 1 bar = 0.9869 atm [8]), it is prudent to simply provide the conditions used for standardization. Standard conditions in the biogas package are 0 °C and 1.0 atm (101.325 kPa) by default, but can be set by the user in a function call or globally. Accurate standardization requires correction for water vapor, pressure, and temperature [9,10]. Water vapor content is based on saturation vapor pressure by default, and is calculated using a Magnus form equation [11]. This function can be used for volumetric (typically fixed pressure and variable volume) or manometric measurements (typically fixed volume and variable pressure). Multiple temperature and pressure units are supported (°C, °F, K, atm, kPa, bar, and others). Unit conversion is carried out by the hidden function unitConvert().
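The quantities quoted above can be checked with a few lines of arithmetic. The following base R sketch is only an illustration and does not use the biogas package; the variable names are hypothetical, the molar mass of CH4 is taken from standard atomic weights, and ideal gas behavior is assumed for the comparison of 1 bar and 1 atm.

```r
# Values quoted in the text
mmol_ch4_per_l <- 44.72          # mmol CH4 per standardized L (0 degrees C, 1 atm)
atm_kpa <- 101.325               # 1 atm in kPa
bar_kpa <- 100                   # 1 bar in kPa

# Molar mass of CH4 from standard atomic weights (g/mol, equal to mg/mmol)
mm_ch4 <- 12.011 + 4 * 1.008

mg_ch4_per_l <- mmol_ch4_per_l * mm_ch4   # about 717.4 mg per L
bar_in_atm <- bar_kpa / atm_kpa           # about 0.9869 atm

# Ratio of standardized volumes reported at 1 bar versus 1 atm
# (same standard temperature, ideal gas behavior assumed)
vol_ratio <- atm_kpa / bar_kpa            # about 1.013

print(c(mg_ch4_per_l = mg_ch4_per_l, bar_in_atm = bar_in_atm, vol_ratio = vol_ratio))
```

The roughly 1.3% difference between the two pressure conventions is one reason to report standardization conditions explicitly.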
Two functions convert between biogas mass and volume: mass2vol() and vol2mass(). These functions are intended for gravimetric measurement of biogas production (mass2vol()), and checking for leaks (vol2mass()). The vol2mol() function facilitates standardization to molar quantity, and is based on molar volume data from NIST [12].
Given the importance of substrate energy content, it is useful to be able to easily determine the oxygen demand of any substrate. The calcCOD() function returns the theoretical (or "calculated" [5]) oxygen demand COD′ based on a chemical formula [5]. The molMass() function calculates molar mass based on a chemical formula. An internal database of atomic weights is from [13] (for "normal materials"). Both functions rely on a hidden function (readFormula()) for reading chemical formulas. Interpolation of biogas composition, cumulative production, or any other variable can be done using interp(), which provides a simple integrated interface to both linear interpolation and spline functions from the stats package [6].
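The oxygen demand calculation can be sketched in base R as follows. This is a simplified illustration, not the calcCOD() implementation: the helper cod_sketch() is hypothetical, it takes element counts rather than parsing a formula string, it handles only C, H, O, and N, and it assumes that nitrogen is released as ammonia rather than oxidized.

```r
# Standard atomic weights (g/mol), abridged values
atom_wt <- c(C = 12.011, H = 1.008, O = 15.999, N = 14.007)

# Hypothetical helper: COD' (g O2 per g substrate) from element counts.
# comp is a named numeric vector, e.g. c(C = 6, H = 12, O = 6);
# elements that are not named are taken as zero.
cod_sketch <- function(comp) {
  n <- c(C = 0, H = 0, O = 0, N = 0)
  n[names(comp)] <- comp
  mol_mass <- sum(n * atom_wt[names(n)])
  # mol O2 needed to oxidize 1 mol substrate to CO2, H2O, and NH3
  mol_o2 <- n[["C"]] + n[["H"]] / 4 - n[["O"]] / 2 - 3 * n[["N"]] / 4
  mol_o2 * 2 * atom_wt[["O"]] / mol_mass
}

cod_glucose <- cod_sketch(c(C = 6, H = 12, O = 6))
cod_ethanol <- cod_sketch(c(C = 2, H = 6, O = 1))
print(c(glucose = cod_glucose, ethanol = cod_ethanol))
```

Small differences from package output are to be expected because the atomic weights used here are rounded.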

Functions for biogas data processing
Calculation of BMP (or related measures of biogas production) from laboratory data is done using two functions sequentially: cumBg() (for cumulative biogas) and summBg() (for summarize biogas). Their use requires some familiarity with the design of BMP experiments [4,14,15]. Inputs, outputs, and operations are shown in Fig. 1. These functions carry out data manipulation, and calls to basic functions perform the necessary conversions.
Input data will generally consist of one or two data files (once in the R environment, data frames), containing original measurements of biogas quantity and composition ("biogas" and "composition" data, respectively; Fig. 1). All files must contain a unique key to identify individual bottles. Any type of file that can be read by R can be used (spreadsheet files, tab- or comma-delimited text files, and others). Spreadsheets are convenient for data entry, but data import is simpler with text files. Measurement data and related variables such as biogas measurement temperature are passed to cumBg() to calculate biogas production. Input and output data are interval-level, meaning each observation corresponds to a single measurement interval. Calculated cumulative CH4 production is primarily an intermediate object that is used to calculate BMP with summBg(). However, some users may work with these data directly to study kinetics, and users interested in BMP may check the shape of the CH4 production curves by plotting cumulative production. Model fitting and plotting tasks can currently be carried out with functions from other packages.
Measurements may be volumetric [16], manometric [16], gravimetric [9], or gas-chromatography-based [17]. Both cumulative (biogas quantity is the total up to the end of the interval represented by a row in a data frame) and interval (biogas quantity is only that produced during a single interval) data can be handled by cumBg(). For most methods, there are two options for calculating CH4 production. If measured CH4 concentrations are normalized so the sum of CH4 and CO2 concentration is unity, calculations can be based on the volume of gas that exited each bottle [18]. With absolute CH4 concentrations, production is determined as the sum of the volume of CH4 that exited the bottle and CH4 in the bottle headspace. These two methods will give identical results if headspace volume is accurately known and biogas contains only CH4, CO2, H2O, and flushing gas.
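The first of the two options (normalized CH4 concentration applied to the gas that exited the bottle) can be illustrated with a short base R sketch. All values and variable names below are hypothetical, the biogas volumes are assumed to be already standardized and dry, and the headspace correction needed for the second option is not shown.

```r
# Hypothetical measurements for one bottle
cum_biogas <- c(120, 210, 260)      # cumulative standardized biogas volume (mL)
x_ch4 <- c(0.55, 0.60, 0.62)        # measured CH4 mole fraction
x_co2 <- c(0.40, 0.36, 0.35)        # measured CO2 mole fraction

# Interval production is the difference between successive cumulative values
int_biogas <- diff(c(0, cum_biogas))

# Normalize so that CH4 + CO2 = 1, then apply to the gas that exited the bottle
x_ch4_norm <- x_ch4 / (x_ch4 + x_co2)
int_ch4 <- int_biogas * x_ch4_norm
cum_ch4 <- cumsum(int_ch4)

print(data.frame(int_biogas, x_ch4_norm, int_ch4, cum_ch4))
```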
Structure of input data is flexible: "wide" and "long" formats [19] can be used for biogas and composition data, or the two can be combined. The default long format is the most flexible (because sampling times may differ among bottles) but the wide format is more compact when measurements are made at fixed times, and is used in the automated AMPTS II system (Bioprocess Control, Lund, Sweden).
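The difference between the two structures can be seen in a small base R sketch that converts a wide data frame to a long one. The bottle names, column names, and values are hypothetical, and the package functions accept either structure directly, so this conversion is shown only to clarify the terms.

```r
# Hypothetical wide-format data: one row per time, one column per bottle
wide <- data.frame(time = c(1, 2, 3),
                   B1 = c(10, 25, 33),
                   B2 = c(12, 27, 36))

bottles <- setdiff(names(wide), "time")

# Long format: one row per combination of bottle and time
long <- data.frame(
  id = rep(bottles, each = nrow(wide)),
  time = rep(wide$time, times = length(bottles)),
  vol = unlist(wide[bottles], use.names = FALSE)
)

print(long)
```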
The final data file required for calculating BMP ("setup" data; Fig. 1) is bottle-level, and contains at least the bottle key, a description of the substrate or treatment (a grouping variable), the quantity of inoculum, and mass of substrate added (typically as volatile solids (VS)). The bottle key is used by summBg() to merge setup data with the cumulative biogas data (output from cumBg()). Actual calculation of BMP is described elsewhere [4,10,15]. Time at which to evaluate BMP can be specified as a numeric value (e.g., 30 d), set to the maximum, or selected automatically when CH4 relative production rate drops below a cutoff for a specified duration (e.g., <1% of net cumulative per d for 3 d [15]). The last option is considered the best practice. Alternatively, BMP can be returned for all measurement times. Linear interpolation is carried out as needed using interp() to estimate CH4 production at a specified time. By default mean values and standard deviation for each level of the grouping variable are returned, but results for individual bottles can be returned instead. Total random error (standard deviation) in the final BMP estimates may include contributions from substrate mass determination, CH4 production by inoculum-only bottles, and CH4 production from bottles with substrate. Presently, it is not possible to calculate BMP by model fitting and extrapolation using cumBg(), but this can be done using existing functions from other packages, and future versions of the biogas package may include functions for this task.
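The core of the BMP calculation (merging by bottle key, subtracting the inoculum contribution, and normalizing by substrate VS) can be sketched in base R. This is a minimal illustration with hypothetical data and column names, not the summBg() implementation; in particular, the standard deviation below reflects only variability among substrate bottles and omits the other error contributions listed above.

```r
# Hypothetical setup data (one row per bottle)
setup <- data.frame(
  id = c("I1", "I2", "S1", "S2"),
  descrip = c("inoculum", "inoculum", "substrate", "substrate"),
  m_inoc = c(400, 400, 400, 400),   # inoculum mass (g)
  m_vs = c(NA, NA, 2.0, 2.1)        # substrate VS mass (g)
)

# Hypothetical cumulative CH4 (mL) at the chosen evaluation time
cum_ch4 <- data.frame(id = c("I1", "I2", "S1", "S2"),
                      cv_ch4 = c(95, 105, 850, 900))

# The bottle key links the two tables
dat <- merge(setup, cum_ch4, by = "id")

# Mean CH4 produced per g inoculum in the inoculum-only bottles
inoc <- dat[dat$descrip == "inoculum", ]
ch4_per_g_inoc <- mean(inoc$cv_ch4 / inoc$m_inoc)

# Subtract the inoculum contribution and normalize by substrate VS
sub <- dat[dat$descrip == "substrate", ]
sub$bmp <- (sub$cv_ch4 - ch4_per_g_inoc * sub$m_inoc) / sub$m_vs

bmp_mean <- mean(sub$bmp)
bmp_sd <- sd(sub$bmp)
print(sub[, c("id", "bmp")])
```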

Function to predict methane production
The predBg() function estimates CH4 production from substrate stoichiometry. In its simplest usage the function can be used to determine the maximum theoretical CH4 potential (sometimes called "theoretical BMP") from a defined substrate with a known chemical formula. Complex substrates can be described using empirical chemical formulas, or by their macromolecular composition, e.g., the content of carbohydrates, proteins, and lipids. Internally, an empirical chemical formula is determined from a specified macromolecular composition and fixed chemical formulas for each component: C6H10O5 for carbohydrates [20], C4H6.1O1.2N for proteins [21], and C57H104O6 for lipids [20]. However, users can specify other formulas. Stoichiometry is based on Eq. (13.5) in [5]. Additional input arguments can be used to include substrate partitioning between energy and biomass production, limited degradation of a substrate, and partitioning of CO2 between solution and biogas (based on [5] and [22]). With these additions the function can be used to estimate actual biogas yield and composition in a continuous reactor.
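For the simplest case (complete degradation and no biomass synthesis), the stoichiometric calculation can be sketched in base R. This is a simplified illustration and not the predBg() implementation or the full Eq. (13.5) in [5]: the helper ch4_potential_sketch() is hypothetical, it takes element counts for C, H, O, and N only, it assumes nitrogen is released as ammonia, and it converts moles to volume using the 44.72 mmol per L value quoted earlier.

```r
atom_wt <- c(C = 12.011, H = 1.008, O = 15.999, N = 14.007)  # g/mol
mol_per_l <- 0.04472   # mol CH4 per standardized L (0 degrees C, 1 atm), as quoted in the text

# Hypothetical helper: theoretical CH4 potential (mL per g substrate)
# for a substrate given as a named vector of element counts.
ch4_potential_sketch <- function(comp) {
  n <- c(C = 0, H = 0, O = 0, N = 0)
  n[names(comp)] <- comp
  mol_mass <- sum(n * atom_wt[names(n)])
  # mol CH4 per mol substrate for complete conversion to CH4, CO2, and NH3
  mol_ch4 <- (4 * n[["C"]] + n[["H"]] - 2 * n[["O"]] - 3 * n[["N"]]) / 8
  1000 * mol_ch4 / mol_mass / mol_per_l
}

pot_cellulose <- ch4_potential_sketch(c(C = 6, H = 10, O = 5))
print(round(pot_cellulose, 1))
```

For cellulose this gives 3 mol CH4 per mol of C6H10O5. Substrate partitioning to biomass or incomplete degradation would lower the result.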

Illustrative examples
Package functions can be used in many ways. The cumBg() function can process more than twenty different types of measurements. Most functions are flexible and contain optional arguments. The vectorized basic functions exploit vector recycling rules to facilitate a wide range of operations. Due to space constraints, this section presents only a small number of limited examples. For more details, readers should refer to the online supplementary material and documentation available elsewhere (see reference manual and vignettes on the biogas package at https://cran.r-project.org/package=biogas, and videos on OBA at https://www.youtube.com/channel/UCxNGlwTnSkEa1GaFuAKM_3A).

Basic functions
Oxygen demand (g O2 per g substrate) for glucose, ethanol, palmitic acid, and food waste with a known composition can be calculated with:
    > calcCOD(c("C6H12O6", "H3CCH2OH", "CH3(CH2)14COOH",
    + "C14.25 H23.74 O7.16 N"))
    [1] 1.065743 2.083876 2.870336 1.567671
Note the flexibility in specification of chemical formulas, which applies to all functions that accept formulas.
In volumetric methods, biogas volume is generally measured at the temperature of the bottles or else at room temperature, and must be standardized to be useful. The standard volume of 100 mL of water vapor-saturated gas measured at 20, 35, or 55 °C at atmospheric pressure (1 atm) can be calculated with stdVol().
In manometric methods, pressure varies among observations, but gas volume is fixed for each bottle. In this case stdVol() can be given pressure in kPa, and standard conditions are converted automatically to the new units.
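The volumetric and manometric cases can be sketched in base R as follows. This is not the stdVol() implementation: the helpers sat_vap_kpa() and std_vol_sketch() are hypothetical, ideal gas behavior is assumed, the Magnus-form coefficients are those of Alduchov and Eskridge (1996) and may differ from those used in the package, and the headspace volume and pressures in the manometric case are invented for illustration.

```r
# Saturation vapor pressure of water (kPa), Magnus-form equation.
# Coefficients from Alduchov and Eskridge (1996); an assumption, not taken from the package.
sat_vap_kpa <- function(temp_c) {
  0.61094 * exp(17.625 * temp_c / (243.04 + temp_c))
}

# Hypothetical helper: dry gas volume at standard conditions
# (0 degrees C and 101.325 kPa by default), in the same units as vol.
std_vol_sketch <- function(vol, temp_c, pres_kpa, rh = 1,
                           temp_std_c = 0, pres_std_kpa = 101.325) {
  pres_dry <- pres_kpa - rh * sat_vap_kpa(temp_c)  # partial pressure of dry gas
  vol * (pres_dry / pres_std_kpa) *
    ((temp_std_c + 273.15) / (temp_c + 273.15))
}

# Volumetric case: 100 mL of saturated gas at 20, 35, and 55 degrees C and 1 atm
v_vol <- std_vol_sketch(100, temp_c = c(20, 35, 55), pres_kpa = 101.325)

# Manometric case: hypothetical 300 mL headspace at 35 degrees C,
# with absolute pressure (kPa) varying among observations
v_man <- std_vol_sketch(300, temp_c = 35, pres_kpa = c(150, 200, 250))

print(round(v_vol, 1))
print(round(v_man, 1))
```

Both arguments are vectorized, so a single call handles several temperatures or pressures, in the same spirit as the package functions.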

Data processing for BMP calculation
This example shows the calculation of BMP from volumetric measurements made with an AMPTS II system (Bioprocess Control, Lund, Sweden). The data file and R code can be found in the supplementary material (MMC S1 and MMC S2). Bottles contained inoculum only, inoculum and cellulose, or inoculum and a particular type of food waste. Details can be found in [23]. The objective was to determine the BMP of the food waste, and cellulose was included as a positive control [15].
Input data were entered into two worksheets in a single spreadsheet (xlsx) file and read into R to give two input data frames. The first data frame ("biogas") contains cumulative biogas volume reported at the end of each incubation interval (Table 2). The second ("setup") links the bottle key (unique bottle identification code) to the general contents, used for grouping (e.g., cellulose), and also contains the quantity of inoculum (wet mass) and substrate (VS mass) added to each bottle. Correctly using the cumBg() function requires knowledge about the nature and structure of the data:
1. What method was used for biogas measurement? Here, it was a volumetric method, and values are cumulative. Values are already standardized, and CH4 content is assumed to be 100% due to CO2 removal.
2. What is the structure of the interval-level data? Here, the measurements were organized in a wide format (Table 2). (The "setup" data file should always have the same structure, with one row for each bottle.)
Cumulative biogas production is calculated for all bottles with a call to cumBg(). Output is a data frame with all original columns in the input "biogas" data frame, along with new columns containing biogas and CH4 volume produced during each interval, cumulative values (equal to input values in this case), and production rates.
Cumulative CH4 production can then be used to calculate BMP by subtracting CH4 production by inoculum from total production and normalizing the result by substrate VS using summBg(). Resulting mean BMP estimates, evaluated when net relative CH4 production rate has dropped below 1% of cumulative for 3 d (when = "1p3d") [15], are given in Table 3. Estimates for all measurement times can be returned by setting when = "meas", and show.obs = TRUE will return results for individual bottles as well (Fig. 2). Variability is acceptably low in this example: the standard deviation for food waste BMP is about 4% of the mean, which is below the 5% criterion recommended by Holliger et al. [15]. The measured BMP of cellulose is 92% of the theoretical maximum (413.7 mL g⁻¹ is the value returned by predBg()), which meets the criterion recommended by Holliger et al. [15] (85-100% of the theoretical maximum). (Software users may prefer to apply other criteria, e.g., [10].)
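The two validation checks described above can be expressed in a few lines of base R. The replicate BMP values below are hypothetical and are not the results in Table 3; only the theoretical maximum for cellulose and the two criteria are taken from the text.

```r
# Hypothetical BMP values (mL CH4 per g VS) for replicate bottles
bmp_food <- c(455, 470, 440)
bmp_cellulose <- c(378, 384, 380)
theo_cellulose <- 413.7   # theoretical maximum for cellulose (mL per g), from the text

# Relative standard deviation (%) of the substrate BMP
rsd_food <- 100 * sd(bmp_food) / mean(bmp_food)

# Cellulose BMP as a percentage of the theoretical maximum
recov_cellulose <- 100 * mean(bmp_cellulose) / theo_cellulose

# Criteria described in the text: relative sd below 5%, cellulose within 85-100%
ok_rsd <- rsd_food < 5
ok_cellulose <- recov_cellulose >= 85 && recov_cellulose <= 100

print(c(rsd_food = rsd_food, recov_cellulose = recov_cellulose))
print(c(ok_rsd = ok_rsd, ok_cellulose = ok_cellulose))
```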

Stoichiometric calculation of methane potential
How much CH4 should be expected from this food waste? An empirical chemical formula of C14H24O7N was estimated from nutritional analysis. Theoretical maximum CH4 potential (mL g⁻¹) can therefore be determined from:
    > predBg(form = "C14H24O7N")
Comparing this potential to the measured BMP (Table 3) shows that BMP was 83% of the maximum theoretical potential. CH4 potential could also be calculated from oxygen demand, which could be measured or calculated, but use of a chemical formula provides more information. Setting value = "all" returns additional output. Output includes substrate mass (mass, g), molar mass (mol.mass, g mol⁻¹) and moles of substrate (moles), substrate COD′ (COD, g g⁻¹), water mass consumed in the reaction (hydro, g), the fraction of CH4 in the reaction products (CH4 and CO2, including inorganic carbon that remains in solution, so fCH4 is not equal to biogas composition, but see xCH4 in the help file for an estimate of this value), the volume of CH4 (vCH4, dry mL at 1 atm of pressure and 0 °C), and the mass of CH4 and CO2 produced (g). Also, the overall reaction used for the calculations can be returned as a character string or numeric vector:
    > predBg("C14H24O7N", value = "reactionc")
These results are for complete substrate degradation. But in practice some fraction will not be degraded (as in the experiment described above), and a part of the degraded fraction is used in (net) synthesis of new microbial biomass (shown as C5H7O2N in the reaction above), and so not converted to biogas. These and other effects can be included in the calculation through the use of optional arguments, but the focus here is on CH4 potential only.
Table 3. Biochemical methane potential (BMP) of cellulose and food waste calculated using the cumBg() and summBg() functions. The first six columns are included in the default output from summBg().
Nutritional analysis of this substrate showed a composition of 56% carbohydrates, 27% proteins, and 17% lipids (percentage of VS). This information could be used directly with the mcomp argument for mass-based mixtures and macromolecular composition. The chemical formula C14H24O7N was actually calculated from these data using predBg() by setting value = "all". The difference between the two estimates of CH4 potential is due to rounding in the empirical formula.
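The effect of rounding can be explored with a base R sketch that compares a mass-weighted estimate from the macromolecular composition with an estimate from the rounded empirical formula. This is an illustration under simplifying assumptions and not the predBg() implementation: the helper ch4_ml_per_g() is hypothetical, complete degradation with nitrogen released as ammonia is assumed, and the component formulas and the 44.72 mmol per L conversion are those given earlier in the text.

```r
atom_wt <- c(C = 12.011, H = 1.008, O = 15.999, N = 14.007)  # g/mol
mol_per_l <- 0.04472   # mol CH4 per standardized L (0 degrees C, 1 atm), as quoted in the text

# Hypothetical helper: theoretical CH4 potential (mL per g) from element counts
ch4_ml_per_g <- function(comp) {
  n <- c(C = 0, H = 0, O = 0, N = 0)
  n[names(comp)] <- comp
  mol_mass <- sum(n * atom_wt[names(n)])
  mol_ch4 <- (4 * n[["C"]] + n[["H"]] - 2 * n[["O"]] - 3 * n[["N"]]) / 8
  1000 * mol_ch4 / mol_mass / mol_per_l
}

# Component formulas given in the text
formulas <- list(
  carbohydrate = c(C = 6, H = 10, O = 5),
  protein = c(C = 4, H = 6.1, O = 1.2, N = 1),
  lipid = c(C = 57, H = 104, O = 6)
)

# Composition of the food waste (mass fraction of VS)
mass_frac <- c(carbohydrate = 0.56, protein = 0.27, lipid = 0.17)

# Mass-weighted potential of the mixture
pot_each <- sapply(formulas, ch4_ml_per_g)
pot_mix <- sum(mass_frac[names(pot_each)] * pot_each)

# Potential from the rounded empirical formula
pot_formula <- ch4_ml_per_g(c(C = 14, H = 24, O = 7, N = 1))

print(round(c(mixture = pot_mix, rounded_formula = pot_formula), 1))
```

In this sketch the two estimates differ by about 1%, which is the order of magnitude expected from rounding the formula coefficients to integers.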

Impact
The biogas package and OBA can help improve the efficiency of biogas research and the accuracy of results. Within our own groups, the software has made data processing significantly easier, and has facilitated evaluation of results. In teaching, it has been used to demonstrate concepts, and to check students' calculations.
Optimization of biogas plants requires accurate estimates of substrate BMP and simple tools for estimating CH4 potential. Both are provided by the software described here.
As part of an open-source package, the functions described above clarify and standardize common operations that presently differ among groups. Already, the software is being used as the "reference" approach for data processing in a large inter-laboratory study (33 institutes) [24]. We anticipate that the package may serve as the base for a larger platform that includes model fitting and other functionality in the future.
Research activity on biogas is on the rise. The Web of Science lists more than 8000 papers published in 2000 or later with "methane potential", "anaerobic digestion", or "biogas" in the title. Academia represents a large group of potential users, and significant interest in our software already exists. Since it was uploaded in May 2015, the biogas package has been downloaded more than 22,000 times, and usage of OBA has gradually increased to more than 150 h per month (time app is running, regardless of number of concurrent users). YouTube videos on the app (https://www.youtube.com/channel/UCxNGlwTnSkEa1GaFuAKM_3A) have been viewed more than 3000 times in total.

Conclusions
Research on biogas production by anaerobic digestion includes both measurement and prediction of methane production potential. The biogas package and web-based interface OBA can improve the efficiency of biogas research, and accuracy of methane potential measurements and predictions by providing access to standardized algorithms for data processing and stoichiometric calculations. This R package may serve as a platform for future software tools, including functions for extraction of kinetic data, model-based estimation of BMP, and empirical models for prediction of BMP.